Issues when running the LTU-AS model
Hi all,
I have installed the latest SpeechBrain development version, but when I tried to run inference with LTU-AS, I ran into an issue. The first command (`from speechbrain.inference.multimodal import LTU_AS`) fails with "No module named 'speechbrain.inference.multimodal'". Does the current SpeechBrain version support LTU-AS? I also noticed that the link to the training information (https://github.com/speechbrain/speechbrain/tree/develop/recipes/OpenASQA/ltu-as) is empty.
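For reference, this is exactly what I ran and the error it raises:

```python
# Minimal repro on my latest SpeechBrain develop install:
from speechbrain.inference.multimodal import LTU_AS
# -> ModuleNotFoundError: No module named 'speechbrain.inference.multimodal'
```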
Looking forward to your reply. :)
Hi @Yoohao, thank you for your interest. The PR https://github.com/speechbrain/speechbrain/pull/2550 is not yet merged into the develop branch; we plan to merge it soon. In the meantime, you can install this branch https://github.com/BenoitWang/speechbrain/tree/speech_llm and run the recipe from there.
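One way to install it, as a sketch (installing directly from the branch with pip should be equivalent to cloning and installing from source):

```python
# A sketch: install the speech_llm branch (run in a shell), e.g.
#   pip install git+https://github.com/BenoitWang/speechbrain.git@speech_llm
# After that, the multimodal inference module should be importable:
from speechbrain.inference.multimodal import LTU_AS
```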
Best,
Yingzhi
Hi Yingzhi, thank you for your timely help. The previous problem is solved, but unfortunately another one has come up.
When I run the command `ltu_as = LTU_AS.from_hparams(source="speechbrain/speech-llm-LTU-AS-openasqa")` to load the model, it reports a mismatch between the architecture and the checkpoint parameters:
"RuntimeError: Error(s) in loading state_dict for LLAMA2:
size mismatch for model.base_model.model.model.layers.0.self_attn.q_proj.base_layer.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([8388608, 1])."
I found that the model is built by speechbrain/lobes/models/huggingface_transformers/llama2.py, while the LLM used in this project is Llama 3. I hope this feedback helps improve the project.
Hi @Yoohao, I didn't run into this problem. Could you try the Hugging Face versions listed here https://github.com/BenoitWang/speechbrain/blob/speech_llm/recipes/OpenASQA/ltu-as/extra_requirements.txt and see if that works? As for the LLAMA2 script, we use it for all models in the Llama series, since their architectures are the same.
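For example, something along these lines (a sketch, assuming you run it from the repository root of the speech_llm branch):

```python
# A sketch: install the pins from the recipe first, e.g. (in a shell)
#   pip install -r recipes/OpenASQA/ltu-as/extra_requirements.txt
# then reload the model; the state_dict size mismatch should disappear.
from speechbrain.inference.multimodal import LTU_AS

ltu_as = LTU_AS.from_hparams(source="speechbrain/speech-llm-LTU-AS-openasqa")
```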
Best,
Yingzhi
Hi Yingzhi,
Thanks again for your warm help. The previous suggestion worked well, but I'm sorry to say there is still an issue. When I run `processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")`, it raises the error "Wrong index found for <|0.02|>: should be None but found 50366." It seems to be a problem with the timestamp tokens in the vocabulary. Is there something I'm still missing?
The above problem is solved, but a new one has emerged:
```
model_id = "openai/whisper-large-v3"
processor = AutoProcessor.from_pretrained(model_id)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Lib\site-packages\transformers\models\auto\processing_auto.py", line 287, in from_pretrained
    return processor_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Lib\site-packages\transformers\processing_utils.py", line 226, in from_pretrained
    args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Lib\site-packages\transformers\processing_utils.py", line 270, in _get_arguments_from_pretrained
    args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Lib\site-packages\transformers\tokenization_utils_base.py", line 1854, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "Lib\site-packages\transformers\tokenization_utils_base.py", line 2066, in _from_pretrained
    raise ValueError(
ValueError: Wrong index found for <|0.02|>: should be None but found 50366.
```
Hi @Simon13456, I used the following combination without issues. Could you try it and see if it works?
```
torch==2.2.2
transformers==4.34.0
tokenizers==0.14.1
```
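With that combination installed, the processor should load cleanly, e.g.:

```python
from transformers import AutoProcessor

# With torch==2.2.2, transformers==4.34.0 and tokenizers==0.14.1 installed,
# this should no longer raise the "<|0.02|>" timestamp-token ValueError.
processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")
```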
@yingzhi Thanks for your help. In addition to pinning these, I upgraded huggingface-hub, which solved the problem.
Good to know! Thank you as well for testing!