Issues when running the LTU-AS model
Hi all,
I have installed the latest SpeechBrain development version, but when I tried to run inference with LTU-AS, I ran into an issue. The first command (`from speechbrain.inference.multimodal import LTU_AS`) fails with "No module named 'speechbrain.inference.multimodal'". Does the current SpeechBrain version support LTU-AS? I also noticed that the link to the training information (https://github.com/speechbrain/speechbrain/tree/develop/recipes/OpenASQA/ltu-as) is empty.
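For reference, this is exactly what I ran and the error it raises:

```python
# Minimal repro on my latest SpeechBrain develop install:
from speechbrain.inference.multimodal import LTU_AS
# -> ModuleNotFoundError: No module named 'speechbrain.inference.multimodal'
```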
Looking forward to your reply. :)
Hi @Yoohao, thank you for your interest. The PR https://github.com/speechbrain/speechbrain/pull/2550 is not yet merged into the develop branch; we plan to merge it soon. In the meantime, you can install this branch https://github.com/BenoitWang/speechbrain/tree/speech_llm and run the recipe from there.
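One way to install it, as a sketch (installing directly from the branch with pip should be equivalent to cloning and installing from source):

```python
# A sketch: install the speech_llm branch (run in a shell), e.g.
#   pip install git+https://github.com/BenoitWang/speechbrain.git@speech_llm
# After that, the multimodal inference module should be importable:
from speechbrain.inference.multimodal import LTU_AS
```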
Best,
Yingzhi
Hi Yingzhi, thank you for your timely help. The previous problem is solved, but unfortunately another one has come up.
When I run the command `ltu_as = LTU_AS.from_hparams(source="speechbrain/speech-llm-LTU-AS-openasqa")` to load the model, it reports a mismatch between the architecture and the checkpoint parameters:
"RuntimeError: Error(s) in loading state_dict for LLAMA2:
size mismatch for model.base_model.model.model.layers.0.self_attn.q_proj.base_layer.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([8388608, 1])."
I found that the model is built by speechbrain/lobes/models/huggingface_transformers/llama2.py, while the LLM used in this project is Llama 3. I hope this feedback helps improve the project.
Hi @Yoohao, I didn't run into this problem. Could you try the Hugging Face versions listed here https://github.com/BenoitWang/speechbrain/blob/speech_llm/recipes/OpenASQA/ltu-as/extra_requirements.txt and see if that works? As for the LLAMA2 script, we use it for all models in the Llama series, since their architectures are the same.
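For example, something along these lines (a sketch, assuming you run it from the repository root of the speech_llm branch):

```python
# A sketch: install the pins from the recipe first, e.g. (in a shell)
#   pip install -r recipes/OpenASQA/ltu-as/extra_requirements.txt
# then reload the model; the state_dict size mismatch should disappear.
from speechbrain.inference.multimodal import LTU_AS

ltu_as = LTU_AS.from_hparams(source="speechbrain/speech-llm-LTU-AS-openasqa")
```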
Best,
Yingzhi
Hi Yingzhi,
Thanks again for your warm help. The previous suggestion worked well, but I'm sorry to say there is still an issue. When I run `processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")`, it raises the error "Wrong index found for <|0.02|>: should be None but found 50366." It seems to be a problem with the timestamp tokens in the vocabulary. Is there something I'm still missing?
The above problem is solved, but a new one has emerged:
```
model_id = "openai/whisper-large-v3"
processor = AutoProcessor.from_pretrained(model_id)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Lib\site-packages\transformers\models\auto\processing_auto.py", line 287, in from_pretrained
    return processor_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Lib\site-packages\transformers\processing_utils.py", line 226, in from_pretrained
    args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Lib\site-packages\transformers\processing_utils.py", line 270, in _get_arguments_from_pretrained
    args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Lib\site-packages\transformers\tokenization_utils_base.py", line 1854, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "Lib\site-packages\transformers\tokenization_utils_base.py", line 2066, in _from_pretrained
    raise ValueError(
ValueError: Wrong index found for <|0.02|>: should be None but found 50366.
```
Hi @Simon13456, I used the following combination without issues. Could you try it and see if it works?
```
torch==2.2.2
transformers==4.34.0
tokenizers==0.14.1
```
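With that combination installed, the processor should load cleanly, e.g.:

```python
from transformers import AutoProcessor

# With torch==2.2.2, transformers==4.34.0 and tokenizers==0.14.1 installed,
# this should no longer raise the "<|0.02|>" timestamp-token ValueError.
processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")
```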
@yingzhi Thanks for your help. In addition to pinning these, I upgraded huggingface-hub, which solved the problem.
Good to know! Thank you as well for testing!