OpenGVLab/InternVL2-40B · CUDA error: device-side assert triggered

When i solved the multi-gpu err, this error occured:
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
File "/home/yhzhang/xu_liu/Crossmodal_lingual/halu_recognition/intern-40b/test.py", line 130, in
response, history = model.chat(tokenizer, None, question, generation_config, history=None, return_history=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yhzhang/.cache/huggingface/modules/transformers_modules/InternVL2-40B/modeling_internvl_chat.py", line 286, in chat
generation_output = self.generate(
^^^^^^^^^^^^^^
File "/home/yhzhang/anaconda3/envs/lx/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/yhzhang/.cache/huggingface/modules/transformers_modules/InternVL2-40B/modeling_internvl_chat.py", line 336, in generate
outputs = self.language_model.generate(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yhzhang/anaconda3/envs/lx/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/yhzhang/anaconda3/envs/lx/lib/python3.11/site-packages/transformers/generation/utils.py", line 1525, in generate
return self.sample(
^^^^^^^^^^^^
File "/home/yhzhang/anaconda3/envs/lx/lib/python3.11/site-packages/transformers/generation/utils.py", line 2622, in sample
outputs = self(
^^^^^
File "/home/yhzhang/anaconda3/envs/lx/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yhzhang/anaconda3/envs/lx/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yhzhang/anaconda3/envs/lx/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 1183, in forward
outputs = self.model(
^^^^^^^^^^^
File "/home/yhzhang/anaconda3/envs/lx/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yhzhang/anaconda3/envs/lx/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yhzhang/anaconda3/envs/lx/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 1070, in forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "/home/yhzhang/anaconda3/envs/lx/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yhzhang/anaconda3/envs/lx/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yhzhang/anaconda3/envs/lx/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yhzhang/anaconda3/envs/lx/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 798, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
^^^^^^^^^^^^^^^
File "/home/yhzhang/anaconda3/envs/lx/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yhzhang/anaconda3/envs/lx/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yhzhang/anaconda3/envs/lx/lib/python3.11/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yhzhang/anaconda3/envs/lx/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 439, in forward
attn_output = attn_output.transpose(1, 2).contiguous()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.