add "mm_spatial_pool_mode" to config.

#3

"mm_spatial_pool_mode" is still a configuration, similar to the llava_next_video.

Llava Hugging Face org

@litianjian yes, but all LLaVA-OV models use only bilinear interpolation, and the HF code therefore doesn't support any other pooling modes.
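
For context, the bilinear pooling referred to here is a plain `torch.nn.functional.interpolate` over the vision tower's patch grid. A minimal sketch of the operation (illustrative only, not the exact HF modeling code; the shapes are example values):

```python
# Minimal sketch of bilinear spatial pooling over a square patch grid.
# Illustrative only -- not the exact HF implementation; shapes are examples.
import math
import torch
import torch.nn.functional as F

def bilinear_pool(image_features: torch.Tensor, scale: int = 2) -> torch.Tensor:
    """image_features: (batch, num_patches, hidden) with a square patch grid."""
    batch, num_patches, hidden = image_features.shape
    side = int(math.sqrt(num_patches))  # e.g. 27 for 729 SigLIP patches
    grid = image_features.view(batch, side, side, hidden).permute(0, 3, 1, 2)
    pooled = F.interpolate(grid, size=(side // scale, side // scale), mode="bilinear")
    return pooled.permute(0, 2, 3, 1).reshape(batch, -1, hidden)

# 729 patches pooled down to a 13x13 grid -> torch.Size([1, 169, 1152])
print(bilinear_pool(torch.randn(1, 729, 1152)).shape)
```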

Thank you for your reply. This flag affects the execution of the code in the llava-ov git repo, and it also takes effect in my project.
Of course, all released llava-ov models use only bilinear interpolation, and the HF code adopts the default value. I can open an issue in the HF repo after this.

Llava Hugging Face org

@litianjian can you elaborate please on how this affects the llava-ov repo, as the llava repo is not expected to support HF-style checkpoints? If you're training/tuning a model in the llava repo and want to deploy with HF, you can convert the weights to HF style with our conversion script. It is available in the transformers repo under models/llava_onevision/convert_llava_onevision_weights_to_hf.py.

Thank you for your reply. We are training llava-ov models with the llava-ov repo. Meanwhile, I focus on model development. vLLM, which depends on HF, is our first choice. The entire workflow is "llava-ov -> HF -> vLLM".

Llava Hugging Face org

I see now. In that case I recommend converting your weights to HF format using this script (https://github.com/huggingface/transformers/blob/main/src/transformers/models/llava_onevision/convert_llava_onevision_weights_to_hf.py), which also uploads them to the Hub after conversion. Since vLLM works with HF models, I don't think it expects the config key 'mm_spatial_pool_mode'.

Also, I am not sure the llava-ov architecture is supported in vLLM; I just found this issue in the repo (https://github.com/vllm-project/vllm/issues/7420). LMK if the suggestions help :)
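
Once the weights are converted with that script, a quick way to sanity-check the HF-format checkpoint before trying vLLM is to load it with transformers; a rough sketch (the model id below is a placeholder for your converted or uploaded repo):

```python
# Sanity check: load the converted HF-format checkpoint with transformers.
# "your-org/your-llava-ov-hf" is a placeholder for the converted repo or a local path.
import torch
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

model_id = "your-org/your-llava-ov-hf"  # placeholder
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
# The HF config has no mm_spatial_pool_mode key; the model loads without it.
print(model.config)
```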

The PR (https://github.com/vllm-project/vllm/pull/8486) to support the llava-ov architecture in vLLM will be merged soon.
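
Once that PR lands, serving the converted checkpoint should go through vLLM's usual offline-inference API; a rough sketch (the model id is a placeholder, and image inputs would follow vLLM's multimodal input format rather than the plain text prompt shown here):

```python
# Rough sketch of offline inference with vLLM once llava-onevision support is merged.
# The model id is a placeholder; pass the converted HF-format repo or a local path.
from vllm import LLM, SamplingParams

llm = LLM(model="your-org/your-llava-ov-hf")
params = SamplingParams(temperature=0.2, max_tokens=64)
outputs = llm.generate(["Describe what LLaVA-OneVision is in one sentence."], params)
print(outputs[0].outputs[0].text)
```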

Llava Hugging Face org

@litianjian Super cool! From the PR it seems like the pooling_mode no longer needs to be in the config file?

vLLM's implementation is based on the LlavaOnevisionConfig and LlavaOnevisionForConditionalGeneration classes in HF. "pooling_mode" is not supported in HF, so I cannot include it in vLLM's implementation.
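
To illustrate: the HF config class simply has no such attribute, so there is nothing for the vLLM port to read (assuming a transformers version that ships LLaVA-OneVision, i.e. >= 4.45):

```python
# LlavaOnevisionConfig in transformers does not define mm_spatial_pool_mode;
# bilinear interpolation is the only pooling behavior in the HF modeling code.
from transformers import LlavaOnevisionConfig

config = LlavaOnevisionConfig()
print(hasattr(config, "mm_spatial_pool_mode"))  # expected: False
```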

Llava Hugging Face org

OK, cool. Then I guess you can use the same code as in HF for packing the image features. Feel free to close the PR, and thanks a lot for working on vLLM support :)

litianjian changed pull request status to closed
litianjian changed pull request status to open

In the recent release, the llava-video models contain more config keys, such as "mm_spatial_pool_mode" and others.

Llava Hugging Face org

Can you give examples of such checkpoints please? We probably don't have them in the HF format yet, and I will change the config only when the code starts supporting mm_spatial_pool_mode. Otherwise mm_spatial_pool_mode will have no effect and will not be saved unless it is specified in the config within the library code.

The new models in https://huggingface.co/collections/lmms-lab/llava-video-661e86f5e8dabc3ff793c944 essentially share the model structure of LLaVA-Onevision. Due to the lack of configs, they cannot be used with transformers directly.

Llava Hugging Face org

Ah, I see, for the new llava-video models we first have to change the code on the Hub to support the model. Otherwise changing the config will not make any difference, as the values will not be used anyway.

I have a PR to port the model to its respective folder (llava-next-video) here; it should not be llava-onevision. I will continue working on that soon.

Thanks for your reply. Having discussed this with students on the LLaVA team, the structures of the LLaVA-Video models are essentially the same as those of LLaVA-Onevision. Perhaps it would be a better choice for LLaVA-Onevision to support both?
