Hello and congratulations on getting accepted to ECCV!
I have indexed your paper here: https://huggingface.co/papers/2309.05300
If you merge this PR, the paper will be linked to the model. It would also be great if you could apply this change across your other models.

Hi! Thanks for the indexing and the PR. I noticed a couple of corrections:

  • "pipeline_tag: zero-shot-image-classification": it's more suitable as "self-supervised learning" or something similar since we do not have zero-shot reports.
  • "The official dataset release": should be model release.

Regarding the other models: I'm not super familiar with the indexing system here. Are there any instructions I can follow?

@wangyi111 thanks for the response! This is a joint image-text encoder model, I think, no? I saw you comparing against CLIP, hence the tag (CLIP models carry the zero-shot-image-classification task tag on the Hub). Self-supervised learning isn't a task on the Hub; tasks essentially depend on the input/output types a model has.
Sorry for the typo; feel free to merge this PR and I'll open a follow-up, or you can edit it quickly yourself. This PR was opened primarily to show you how paper pages & model releases work. Essentially, you just need to apply the change I've made here to the model cards (README.md) of your other models, as in the sketch below.
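For reference, here's a minimal sketch of the front-matter change, assuming a standard model card layout (the license value below is a placeholder, not something from your repo):

```yaml
---
# README.md metadata (YAML front matter) -- sets the task tag shown on the Hub
pipeline_tag: zero-shot-image-classification
license: apache-2.0  # placeholder; use your actual license
---
```

As for the paper linking: mentioning the paper's arXiv link (https://arxiv.org/abs/2309.05300) anywhere in the README body is what connects the model to its paper page.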

Cool, I see. Yeah, this was a bit confusing, sorry :) In our case CLIP was used in a cross-modal contrastive learning style, and we proposed a new multimodal representation learning / multimodal pretraining method. I just checked some other models on HF; this work is probably more like DINO and DINOv2. So I guess the tag could be "Image Feature Extraction"?

I will merge and update then. Thanks for the info and the help!

@wangyi111 it's perfectly fine. As long as the model is multimodal, it has to be zero-shot-image-classification; DINO is an image-only backbone (tasks aren't concerned with pre-training techniques).

OK, cool. I'll stick with it 👍 Thanks for the help!

wangyi111 changed pull request status to merged
