Hello and congratulations on getting accepted to ECCV!
I have indexed your paper here: https://huggingface.co/papers/2309.05300
If you merge this PR, the paper will be linked to the model. It would also be great if you could apply this change across your other models.

Hi! Thanks for the indexing and the PR. I noticed a couple of corrections:

  • "pipeline_tag: zero-shot-image-classification": it's more suitable as "self-supervised learning" or something similar since we do not have zero-shot reports.
  • "The official dataset release": should be model release.

Regarding the other models: I'm not super familiar with the indexing system here. Are there any instructions I can follow?

@wangyi111 thanks for the response! This is a joint image-text encoder model, I think, no? I saw you comparing against CLIP, hence the tag (CLIP models carry the zero-shot-image-classification task tag on the Hub). Self-supervised learning isn't a task on the Hub; tasks essentially depend on the input/output types a model has.
Sorry for the typo; feel free to merge this PR and I'll open a follow-up, or you can edit it quickly yourself. This PR was opened primarily to show you how paper pages & model releases work. Essentially, you just need to apply the change I've made here to the model cards (README.md) of your other models, as in the sketch below.
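For reference, here's a minimal sketch of the front-matter change, assuming a standard model card layout (the license value below is a placeholder, not something from your repo):

```yaml
---
# README.md metadata (YAML front matter) -- sets the task tag shown on the Hub
pipeline_tag: zero-shot-image-classification
license: apache-2.0  # placeholder; use your actual license
---
```

As for the paper linking: mentioning the paper's arXiv link (https://arxiv.org/abs/2309.05300) anywhere in the README body is what connects the model to its paper page.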

Cool, I see. Yeah, this was a bit confusing, sorry :) In our case CLIP was used in a cross-modal contrastive learning style, and we proposed a new multimodal representation learning / multimodal pretraining method. I just checked some other models on HF; this work is probably more like DINO and DINOv2. So I guess the tag could be "Image Feature Extraction"?

I will merge and update then. Thanks for the info and the help!

@wangyi111 it's perfectly fine. As long as the model is multimodal, it has to be zero-shot-image-classification; DINO is an image-only backbone (tasks aren't concerned with pre-training techniques).

OK, cool. I'll stick with it 👍 Thanks for the help!

wangyi111 changed pull request status to merged
