Is it possible to merge MiniCPM-Llama3-V-2_5 with a Llama-3.1 based model using MoE?

#68
by rameshch - opened

Is it possible to merge MiniCPM-Llama3-V-2_5 with a Llama-3.1 based model using MoE? I have a fine-tuned MiniCPM-Llama3-V-2_5 based model and I would like to merge it with one of our domain fine-tuned Llama-3.1-8B text generation models, but mergekit throws an error.

This is becoming a show-stopper for us in moving forward with our vision fine-tuned model. Any guidance would help.

Alternatively, please confirm whether we can use merge_and_unload to merge the MiniCPM-V 2.5 adapters into a Llama 3.1 based model.

rameshch changed discussion status to closed
rameshch changed discussion status to open
OpenBMB org

I haven't tried it. Most of the errors with this strategy come from the fact that the model architectures of Llama 3.1 and Llama 3 are not exactly the same; based on my reading of the Llama 3.1 paper, they made very minor changes to the architecture, and this may be the reason.
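For context, a quick way to see how small those differences are is to diff the two text configs directly. A minimal sketch, assuming you have access to the gated Meta repos (the exact model IDs below are an assumption about which checkpoints are being compared):

```python
# Compare the Llama 3 and Llama 3.1 text configs field by field.
from transformers import AutoConfig

cfg_llama3 = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
cfg_llama31 = AutoConfig.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

for key in sorted(set(vars(cfg_llama3)) | set(vars(cfg_llama31))):
    old, new = getattr(cfg_llama3, key, None), getattr(cfg_llama31, key, None)
    if old != new:
        print(f"{key}: {old!r} -> {new!r}")
```

The differences that show up are mostly the rope_scaling / context-length settings rather than the layer shapes, which is consistent with the "very minor changes" noted above.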

Thanks @Cuiunbo for your reply. Could you please try merging a MiniCPM-Llama3-V-2_5 based model / adapter with a Llama-3.1 / 3.0 text generation model and guide me if possible? Thanks in advance for your assistance.

I had issues merging an adapter of the MiniCPM-Llama3-V-2_5 model with a Llama-3 based text model. If you can help make this work, it will remove a hurdle and let us move forward.

OpenBMB org

Try loading the weights and replacing llm.model.layers.x with your own.
I will try it later, maybe in the next two weeks.
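I haven't verified this end to end, but a minimal sketch of that weight-replacement idea could look like the following. The Llama path is a placeholder, and it assumes every "llm.*" tensor in MiniCPM-Llama3-V-2_5 has a same-shaped counterpart in your fine-tuned Llama-3 checkpoint (embedding / lm_head shapes can differ if the vocabulary was extended, hence the shape check):

```python
import torch
from transformers import AutoModel, AutoModelForCausalLM

# Vision-language model; its Llama backbone lives under the "llm." prefix.
minicpmv = AutoModel.from_pretrained(
    "openbmb/MiniCPM-Llama3-V-2_5", trust_remote_code=True, torch_dtype=torch.float16
)
# Your domain fine-tuned Llama-3 text model (placeholder path).
llama = AutoModelForCausalLM.from_pretrained(
    "path/to/your-llama3-finetune", torch_dtype=torch.float16
)

llama_sd = llama.state_dict()
new_sd = {}
for name, tensor in minicpmv.state_dict().items():
    if name.startswith("llm."):
        src = name[len("llm."):]  # "llm.model.layers.0..." -> "model.layers.0..."
        if src in llama_sd and llama_sd[src].shape == tensor.shape:
            new_sd[name] = llama_sd[src]
            continue
    new_sd[name] = tensor  # keep vision tower, resampler, and any mismatched tensors

minicpmv.load_state_dict(new_sd)
minicpmv.save_pretrained("minicpmv25-with-my-llm")
```

Note that this only swaps weights; whether the vision projection still behaves sensibly on top of a different LLM is a separate question.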

Thanks @Cuiunbo. Can you please guide me on how to do the above steps? I would appreciate your updates here.

@Cuiunbo Additional info: when I try to merge the MiniCPM-V 2.5 adapter with a Llama 3 base model, I see the error below:
"ValueError: Target modules llm..*layers.\d+.self_attn.(q_proj|k_proj|v_proj|o_proj) not found in the base model. Please check the target modules and try again."

The error is raised by the statement
model = PeftModel.from_pretrained(model, new_model_name)

Here, model is a Llama-3 base model and new_model_name points to an adapter from MiniCPM-V 2.5.
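One hedged reading of that error: the adapter was trained against the MiniCPM-V wrapper, where the Llama weights sit under an llm. prefix, while a plain Llama-3 base model has no such prefix, so the regex in target_modules matches nothing. An untested workaround is to rewrite the adapter so its module paths drop that prefix (file names assume the standard PEFT layout; even if it then loads, an adapter trained on MiniCPM-V's fine-tuned LLM weights may not transfer cleanly to a different base model):

```python
import json
from safetensors.torch import load_file, save_file

adapter_dir = "path/to/minicpmv25-adapter"  # placeholder path

# 1) Drop the "llm" prefix from the target_modules regex in adapter_config.json.
with open(f"{adapter_dir}/adapter_config.json") as f:
    cfg = json.load(f)
cfg["target_modules"] = r".*layers\.\d+\.self_attn\.(q_proj|k_proj|v_proj|o_proj)"
with open(f"{adapter_dir}/adapter_config.json", "w") as f:
    json.dump(cfg, f, indent=2)

# 2) Drop the ".llm." segment from the LoRA weight keys
#    ("base_model.model.llm.model.layers..." -> "base_model.model.model.layers...").
sd = load_file(f"{adapter_dir}/adapter_model.safetensors")
sd = {k.replace(".llm.", ".", 1): v for k, v in sd.items()}
save_file(sd, f"{adapter_dir}/adapter_model.safetensors")
```

After this, PeftModel.from_pretrained(model, adapter_dir) at least targets module names that exist in a plain LlamaForCausalLM.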

> Try loading the weights and replacing llm.model.layers.x with your own.
> I will try it later, maybe in the next two weeks.

@Cuiunbo I tried, as you suggested, loading the weights from our domain fine-tuned Llama-3 text model into the MiniCPM-Llama3-V-2_5 model folder, and edited the model.safetensors.index.json file so that only the "llm.*" entries point to our text model's weight files, leaving the other entries untouched.

When I try to load the model for inference, I get the error below.
Error stack:


File "C:\ProgramData\anaconda3\envs\llava\lib\site-packages\accelerate\utils\modeling.py", line 354, in set_module_tensor_to_device
raise ValueError(f"{tensor_name} is on the meta device, we need a value to put in on {device}.")
ValueError: weight is on the meta device, we need a value to put in on 0.

Can you please guide me?
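For what it's worth, one hedged reading of that error: from_pretrained resolves each entry in weight_map by looking up the parameter name inside the shard it points to, and the Llama shards store their tensors under "model.layers..." rather than "llm.model.layers...", so the "llm.*" entries are never found and those parameters are left on the meta device. A quick consistency check of a hand-edited index (folder path is a placeholder):

```python
import json, os
from collections import defaultdict
from safetensors import safe_open

model_dir = "path/to/merged-minicpmv25-folder"
with open(os.path.join(model_dir, "model.safetensors.index.json")) as f:
    weight_map = json.load(f)["weight_map"]

# Group parameter names by the shard the index points them to.
by_shard = defaultdict(list)
for name, shard in weight_map.items():
    by_shard[shard].append(name)

# Every parameter name must exist, under exactly that name, in its shard.
missing = []
for shard, names in by_shard.items():
    with safe_open(os.path.join(model_dir, shard), framework="pt") as f:
        keys = set(f.keys())
    missing.extend((n, shard) for n in names if n not in keys)

print(f"{len(missing)} index entries not found in their shard")
for name, shard in missing[:10]:
    print(" ", name, "->", shard)
```

If the "llm.*" entries show up here, the index edit alone is not enough; the tensors would also need to be re-saved under the "llm."-prefixed names.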

@Cuiunbo Please respond.

OpenBMB org

You may try first loading the full state dicts of both models (your model and MiniCPM-V 2.5) onto the GPU, then replacing every LLM layer.
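A minimal sketch of that suggestion, assuming the attribute layout matches the llm.model.layers.x naming above, that both fp16 models fit on one GPU at the same time, and with a placeholder path for the text model:

```python
import torch
from transformers import AutoModel, AutoModelForCausalLM

# Materialise both models fully (no device_map="auto"), then move them to GPU.
minicpmv = AutoModel.from_pretrained(
    "openbmb/MiniCPM-Llama3-V-2_5", trust_remote_code=True, torch_dtype=torch.float16
).to("cuda")
llama = AutoModelForCausalLM.from_pretrained(
    "path/to/your-llama3-finetune", torch_dtype=torch.float16
).to("cuda")

# Swap the decoder stack of the vision model's LLM for your fine-tuned one.
minicpmv.llm.model.layers = llama.model.layers
minicpmv.llm.model.norm = llama.model.norm
# Embeddings / lm_head only if the vocabularies line up.
if minicpmv.llm.model.embed_tokens.weight.shape == llama.model.embed_tokens.weight.shape:
    minicpmv.llm.model.embed_tokens = llama.model.embed_tokens
    minicpmv.llm.lm_head = llama.lm_head

minicpmv.save_pretrained("minicpmv25-llm-swapped")
```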


Thanks @Cuiunbo. Can you give more details on how we can do this (first load all model dicts onto the GPU for both models, then replace every LLM layer)?
