Unofficial LLaMAfied Version in HF format - 非官方的LLaMA化HF格式版本

#5
by JosephusCheung - opened

https://huggingface.co/JosephusCheung/Qwen-VL-LLaMAfied-7B-Chat

Similar to the LLaMAfied Qwen-7B-Chat, the visual part and the LLM are separated, and the LLM is restructured and recalibrated into the standard LLaMA/LLaMA-2 format (with a GPT-2 tokenizer). It can be used with any tool that is compatible with LLaMA, such as streaming output, llama.cpp quantization, and so on.

Thanks for your work. I think the vision.bin file is the visual part. How do I fine-tune this model and run inference?

The structure of the LLM part is identical to that of LLaMA, allowing you to utilize HF transformers with the LlamaForCausalLM. For the vision part, you can utilize visual.py from the original Qwen-VL repository. This allows you to convert images into LM input embeddings. You can then manually concatenate these with your text instruction input for the LLM.

It is quite obvious I think.
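The pipeline above can be sketched in Python. This is a minimal sketch, assuming the visual encoder from visual.py has already produced `image_embeds` with shape `(batch, num_image_tokens, hidden_size)` matching the LLM hidden size; the function names and prompt handling here are hypothetical illustrations, not part of either repository:

```python
# Sketch: feed Qwen-VL image embeddings into the LLaMAfied LLM.
# Assumes `image_embeds` comes from visual.py in the original Qwen-VL repo
# and already matches the LLM hidden size (4096 for a 7B LLaMA-style model).
import torch


def build_multimodal_inputs(image_embeds: torch.Tensor,
                            text_embeds: torch.Tensor) -> torch.Tensor:
    """Concatenate image embeddings in front of the text embeddings
    along the sequence dimension (dim 1)."""
    return torch.cat([image_embeds, text_embeds], dim=1)


def generate_with_image(model_path: str, image_embeds: torch.Tensor,
                        prompt: str, max_new_tokens: int = 128) -> str:
    # Hypothetical helper: load the LLaMAfied LLM like any LLaMA model.
    # Imported lazily so the pure helper above works without transformers.
    from transformers import AutoTokenizer, LlamaForCausalLM
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = LlamaForCausalLM.from_pretrained(model_path,
                                             torch_dtype=torch.float16)
    text_ids = tokenizer(prompt, return_tensors="pt").input_ids
    # Look up the text token embeddings so they can be concatenated with
    # the image embeddings and passed to generate() via `inputs_embeds`.
    text_embeds = model.get_input_embeddings()(text_ids)
    inputs_embeds = build_multimodal_inputs(
        image_embeds.to(text_embeds.dtype), text_embeds)
    output_ids = model.generate(inputs_embeds=inputs_embeds,
                                max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Passing `inputs_embeds` instead of `input_ids` is what lets you splice the image embeddings in front of the text; the concatenation order (image first, then text) mirrors how Qwen-VL places image tokens in the prompt.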

JosephusCheung changed discussion status to closed
JosephusCheung changed discussion status to open

how to use?

Use the LLM the way you use LLaMA-2, and use visual.py from Qwen-VL for the VL part, which is obvious for literate people.
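For the text-only path, a minimal sketch using the standard transformers API; the single-turn prompt layout in `build_prompt` is a guess for illustration and may differ from the model's actual chat template:

```python
# Sketch: use the LLaMAfied LLM exactly like a LLaMA-2 checkpoint.


def build_prompt(user_message: str) -> str:
    # Hypothetical single-turn layout; check the model card for the
    # real chat template before relying on this.
    return f"User: {user_message}\nAssistant:"


def chat(model_path: str, user_message: str, max_new_tokens: int = 128) -> str:
    # Imported lazily so build_prompt works without torch/transformers.
    import torch
    from transformers import AutoTokenizer, LlamaForCausalLM
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = LlamaForCausalLM.from_pretrained(model_path,
                                             torch_dtype=torch.float16)
    inputs = tokenizer(build_prompt(user_message), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens from the front of the generation.
    new_tokens = output_ids[0][inputs.input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Because the checkpoint is in standard LLaMA format, the same path also works with any LLaMA-compatible tool (streaming servers, llama.cpp conversion, and so on), as the original post notes.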

This is difficult for me; I don't know how to combine the two models for inference. Could you give me some code examples? Thanks!

Great work, thank you very much.