Expanding inputs for image tokens in LLaVa-NeXT should be done in processing.

#34
by miniTsl - opened

I was using the example code in the model card for image understanding, but I get the messages below and I'm not sure whether they need any attention. If so, what should I do? Thanks a lot!

Expanding inputs for image tokens in LLaVa-NeXT should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly with `processor.patch_size = {{patch_size}}` and `processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.47.

The same message appears when I use LLaVA 1.5 models, which is strange because I stick strictly to the code provided in the model cards.

I solved this problem by adding two lines during llava-1.5-7b-hf initialization:

`self.processor.patch_size = self.model.config.vision_config.patch_size`

`self.processor.vision_feature_select_strategy = self.model.config.vision_feature_select_strategy`

This sets `patch_size` and `vision_feature_select_strategy` on the processor manually, using the same values from `model.config`.
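For anyone hitting the same warning, here is a minimal standalone sketch of that fix, assuming the `llava-hf/llava-1.5-7b-hf` checkpoint and the standard model-card loading code:

```python
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
model = LlavaForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

# Copy the vision parameters from the model config onto the processor so that
# image-token expansion happens during processing and the warning goes away.
processor.patch_size = model.config.vision_config.patch_size
processor.vision_feature_select_strategy = model.config.vision_feature_select_strategy
```

The same two assignments should also work for the LLaVa-NeXT checkpoints, since the warning refers to the same processor attributes.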

Llava Hugging Face org

Hey everyone, the official model configs will be updated with the new params soon, most probably in 2-3 weeks. That should eliminate any recent bugs with latency/indexing etc.

@RaushanTurganbay Hi! Could you confirm whether the quick fix mentioned above is sufficient until the model config is updated, or could you help update the config soon? We're on a tight deadline and want to prevent any potential issues; we would really appreciate your help with updating it.

Llava Hugging Face org

@AsteriaCao I'm updating the configs this week, and yes, the above quick fix should work for now on the latest release. From the next release onward, the official llava checkpoints will not throw that warning.
