
OWL-ViT builds on CLIP, using it as the backbone for zero-shot object detection. After pretraining, an object detection head is added that makes a set prediction over (class, bounding box) pairs.
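The "zero-shot" part comes from scoring each predicted box against free-text queries instead of a fixed label set. The sketch below illustrates that idea only. It assumes each predicted box already carries an embedding (in OWL-ViT these come from the detection head over CLIP image tokens) and that each text query has been embedded (by CLIP's text encoder). Here both are hard-coded toy vectors, and plain cosine similarity with a fixed cut-off stands in for the model's actual scoring. The names `zero_shot_detect` and `threshold` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_detect(
    box_embeddings: List[Sequence[float]],
    boxes: List[Box],
    query_embeddings: Dict[str, Sequence[float]],
    threshold: float = 0.5,  # hypothetical confidence cut-off
) -> List[Tuple[str, float, Box]]:
    """Label each predicted box with its most similar text query, if similar enough."""
    detections = []
    for emb, box in zip(box_embeddings, boxes):
        # Score this box against every free-text query.
        scores = {label: cosine(emb, q) for label, q in query_embeddings.items()}
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            detections.append((best, scores[best], box))
    return detections


# Toy inputs: two text queries and three predicted boxes (all values made up).
queries = {"a cat": [1.0, 0.0], "a remote control": [0.0, 1.0]}
box_embs = [[0.9, 0.1], [0.1, 0.95], [-0.5, -0.5]]
pred_boxes = [(0, 0, 10, 10), (20, 20, 30, 30), (5, 5, 15, 15)]

dets = zero_shot_detect(box_embs, pred_boxes, queries)
for label, score, box in dets:
    print(f"{label}: {score:.3f} at {box}")
```

Because the label set is just a dictionary of text embeddings, detecting a new category means adding a new query string rather than retraining a classifier head. The third box matches neither query well and is dropped by the threshold.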
Encoder-decoder[[mm-encoder-decoder]]
Optical character recognition (OCR) is a long-standing text recognition task that typically involves several components to understand the image and generate the text. TrOCR simplifies the process using an end-to-end Transformer.
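"End-to-end" here means one model maps image pixels straight to text: an image encoder produces hidden states once, and a text decoder generates tokens one at a time, each conditioned on those states and on the tokens produced so far. The sketch below shows only that control flow with greedy decoding. `toy_encode` and `toy_decode_step` are hypothetical stand-ins (a pass-through encoder and a scripted decoder), not TrOCR's networks. In Hugging Face Transformers the real model is loaded as a `VisionEncoderDecoderModel` and decoded with `generate()`.

```python
from typing import Callable, List, Sequence

EncodeFn = Callable[[Sequence[Sequence[float]]], List[List[float]]]
DecodeStepFn = Callable[[List[List[float]], List[int]], List[float]]


def greedy_ocr(
    image_patches: Sequence[Sequence[float]],
    encode: EncodeFn,
    decode_step: DecodeStepFn,
    bos_id: int,
    eos_id: int,
    max_len: int = 32,
) -> List[int]:
    """Encode the image once, then greedily decode token ids until EOS."""
    encoder_states = encode(image_patches)
    tokens = [bos_id]
    for _ in range(max_len):
        # Next-token scores conditioned on the image and the prefix so far.
        logits = decode_step(encoder_states, tokens)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the start token


# Hypothetical toy vocabulary and components.
VOCAB = ["<s>", "</s>", "h", "i"]
BOS, EOS = 0, 1


def toy_encode(patches):
    # Stand-in encoder: passes patch features through unchanged.
    return [list(p) for p in patches]


def toy_decode_step(encoder_states, prefix):
    # Stand-in decoder: scripted to "read" the word "hi", then stop.
    script = [2, 3, EOS]
    target = script[min(len(prefix) - 1, len(script) - 1)]
    return [1.0 if i == target else 0.0 for i in range(len(VOCAB))]


ids = greedy_ocr([[0.1, 0.2], [0.3, 0.4]], toy_encode, toy_decode_step, BOS, EOS)
text = "".join(VOCAB[i] for i in ids)
print(text)  # → hi
```

Compared with a classic OCR pipeline of separate detection, segmentation, and recognition stages, the whole loop above is driven by a single model, which is the simplification the paragraph describes.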