Square crop?
Looking at this Colab interactive demo, the model doesn't seem to be limited to square inputs.
It would be nice to support arbitrary sizes in this demo as well (let me know if there is a part I can help with).
Hi @ceyda, great catch!
The model only accepts a fixed square input size, which is defined in the config file of each model version. That's why we have a post-processing method that rescales the predicted bounding boxes to a target image size.
And you are right, the app is not post-processing the output predictions correctly because (1) a square target size is passed in to rescale the predicted bounding boxes and (2) there is a bug in the post_processing method.
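For reference, here is a minimal sketch of how the rescaling is meant to work with the transformers OWL-ViT API (the checkpoint name and image path are placeholders; older versions expose post_process on the feature extractor, newer ones also offer post_process_object_detection):

```python
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

image = Image.open("example.jpg").convert("RGB")  # any aspect ratio
texts = [["a photo of a cat", "a photo of a dog"]]

# The processor resizes the image to the fixed square size from the config
inputs = processor(text=texts, images=image, return_tensors="pt")
outputs = model(**inputs)

# Rescale boxes back to the ORIGINAL (height, width), not a square size;
# PIL's .size is (width, height), hence the reversal
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process(outputs=outputs, target_sizes=target_sizes)
```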
Would you like to work on fixing the post_processing bug? Otherwise, I can fix it shortly.
Cheers,
Alara
@adirik
I'm not sure the problem is with the post_processing.
It seems to produce correct bbox sizes when we resize images manually before passing them to the processor, as done here:
https://huggingface.co/spaces/adirik/OWL-ViT/discussions/2/files
Although the processor is supposed to be resizing inputs internally 🤔, as seen here: https://github.com/huggingface/transformers/blob/ab2006e3d6db88654526a4169e65d4bfc52da2e3/src/transformers/models/owlvit/feature_extraction_owlvit.py#L197
So maybe there is still a bug somewhere.
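For concreteness, the manual-resize workaround looks roughly like this (a sketch; the 768x768 square size is an assumption taken from the base checkpoint's config, and the image path is a placeholder):

```python
from PIL import Image
from transformers import OwlViTProcessor

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")

# Resize to the model's square input size ourselves, so the processor's
# internal resize becomes a no-op and nothing gets cropped downstream.
image = Image.open("example.jpg").convert("RGB").resize((768, 768))
inputs = processor(text=[["a photo of a cat"]], images=image, return_tensors="pt")
```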
The issue is now fixed! The bug was due to defining the target size within OwlViTFeatureExtractor as a single value instead of a tuple (768 instead of (768, 768)), which led to the input image being cropped later in the pipeline.
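To see why the single value matters: image resize helpers commonly interpret an int as "shortest edge" and a tuple as an exact size, so an int leaves non-square images non-square and a later center crop then discards part of them. An illustrative sketch of the two conventions (not the library's exact code):

```python
from PIL import Image

image = Image.new("RGB", (1024, 640))  # dummy non-square input

def shortest_edge_resize(img, size):
    # Common convention for an int `size`: scale so the shorter side
    # equals `size`, preserving the aspect ratio.
    w, h = img.size
    scale = size / min(w, h)
    return img.resize((round(w * scale), round(h * scale)))

print(shortest_edge_resize(image, 768).size)  # (1229, 768): still non-square, cropped later
print(image.resize((768, 768)).size)          # (768, 768): exact square, no crop needed
```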