	OWL-ViT builds on CLIP, using it as the backbone for zero-shot object detection. After pretraining, an object detection head is added to make a set prediction over (class, bounding box) pairs.
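As a rough illustration of what a set prediction over (class, bounding box) pairs can look like with text queries, the sketch below scores each candidate object slot against a few query embeddings and keeps the confident matches. It is a simplified toy, not the model's actual head: the function names, the 3-dimensional embeddings, the `(x_min, y_min, x_max, y_max)` box format, and the `0.8` threshold are all assumptions made for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_detect(candidates, text_queries, threshold=0.8):
    """Return (label, box, score) for candidates that match a text query.

    candidates:   list of (class_embedding, box) pairs, one per object slot
    text_queries: dict mapping a label to its text embedding
    threshold:    hypothetical cutoff below which a slot is discarded
    """
    detections = []
    for class_embedding, box in candidates:
        # Score this slot against every query and keep the best match.
        label, score = max(
            ((name, cosine(class_embedding, emb)) for name, emb in text_queries.items()),
            key=lambda pair: pair[1],
        )
        if score >= threshold:
            detections.append((label, box, score))
    return detections

# Made-up embeddings; a real model would produce these with its encoders.
text_queries = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}
candidates = [
    ([0.9, 0.1, 0.0], (10, 20, 110, 220)),    # close to "cat"
    ([0.1, 0.95, 0.0], (200, 40, 320, 260)),  # close to "dog"
    ([0.0, 0.1, 1.0], (0, 0, 50, 50)),        # matches neither query
]

detections = zero_shot_detect(candidates, text_queries)
for label, box, score in detections:
    print(f"{label}: box={box} score={score:.2f}")
```

Because the labels come from text queries rather than a fixed class list, swapping the keys in `text_queries` changes what the detector looks for without retraining, which is the sense in which the detection is zero-shot.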
	Encoder-decoder[[mm-encoder-decoder]]
	Optical character recognition (OCR) is a long-standing text recognition task that typically involves several components to understand the image and generate the text. TrOCR simplifies the process using an end-to-end Transformer.
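The end-to-end idea can be sketched as a single encode step followed by a token-by-token decoding loop. The code below is a minimal sketch under stated assumptions: `toy_encoder` and `toy_decoder_step` are hypothetical stand-ins for a trained image encoder and text decoder, the "image" is just a list of characters, and greedy decoding is used for simplicity.

```python
BOS, EOS = "<s>", "</s>"

def greedy_decode(encoder, decoder_step, image, max_len=16):
    """Encode the image once, then generate text one token at a time."""
    encoder_states = encoder(image)  # computed a single time
    tokens = [BOS]
    while len(tokens) < max_len:
        # The decoder sees the encoder output and all tokens generated so far.
        next_token = decoder_step(encoder_states, tokens)
        if next_token == EOS:
            break
        tokens.append(next_token)
    return "".join(tokens[1:])  # drop the start token

def toy_encoder(image):
    """Hypothetical encoder: treats each list element as one patch state."""
    return list(image)

def toy_decoder_step(encoder_states, tokens):
    """Hypothetical decoder: emits the next patch's character, then EOS."""
    position = len(tokens) - 1  # number of characters generated so far
    if position < len(encoder_states):
        return encoder_states[position]
    return EOS

text = greedy_decode(toy_encoder, toy_decoder_step, ["h", "i"])
print(text)
```

A trained decoder would predict a distribution over a vocabulary at each step instead of copying patches, but the control flow (encode once, decode until an end token or a length limit) has the same shape.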