Error. Crash. "The attention mask is not set and cannot be inferred from input

#8
by MartialTerran - opened

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
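
To illustrate why the mask "cannot be inferred" when the pad and eos tokens share an id, here is a small pure-Python sketch. It is not the transformers implementation; the token ids, `guess_mask`, and the sample batch are all hypothetical and chosen only to show the ambiguity.

```python
# Hypothetical ids for illustration only: pad and eos share the same id.
PAD_ID = 0
EOS_ID = 0


def guess_mask(input_ids, pad_id):
    """Naively treat every token equal to pad_id as padding (mask value 0)."""
    return [[0 if tok == pad_id else 1 for tok in row] for row in input_ids]


batch = [
    [11, 12, EOS_ID, 13, 14],  # a real EOS in the middle of the text
    [21, 22, 23, PAD_ID, PAD_ID],  # two trailing padding tokens
]

# What the mask should be: only the trailing padding is ignored.
true_mask = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0],
]

guessed = guess_mask(batch, PAD_ID)
print(guessed)  # the real EOS in the first row is wrongly masked out
print(guessed == true_mask)
```

Because the guess and the true mask differ, the only reliable source for the mask is the tokenizer output itself, which is what the warning asks for.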

C:\Users\User\OneDrive\Desktop\Qwen2>pip show transformers
Name: transformers
Version: 4.46.3
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by:

C:\Users\User\OneDrive\Desktop\Qwen2>python Qwen2_model.py

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.1.3 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "C:\Users\User\OneDrive\Desktop\Qwen2\Qwen2_model.py", line 1, in <module>
    from transformers import AutoModelForCausalLM, AutoTokenizer
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\utils\import_utils.py", line 1767, in __getattr__
    value = getattr(module, name)
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\utils\import_utils.py", line 1766, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\utils\import_utils.py", line 1778, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\models\auto\modeling_auto.py", line 21, in <module>
    from .auto_factory import (
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\models\auto\auto_factory.py", line 40, in <module>
    from ...generation import GenerationMixin
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\utils\import_utils.py", line 1766, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\utils\import_utils.py", line 1778, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\generation\utils.py", line 28, in <module>
    from ..cache_utils import (
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\cache_utils.py", line 1853, in <module>
    class OffloadedStaticCache(StaticCache):
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\cache_utils.py", line 1918, in OffloadedStaticCache
    offload_device: Union[str, torch.device] = torch.device("cpu"),
C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\cache_utils.py:1918: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ..\torch\csrc\utils\tensor_numpy.cpp:84.)
  offload_device: Union[str, torch.device] = torch.device("cpu"),
C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\torchvision\datapoints\__init__.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
  warnings.warn(_BETA_TRANSFORMS_WARNING)
C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\torchvision\transforms\v2\__init__.py:54: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
  warnings.warn(_BETA_TRANSFORMS_WARNING)
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
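
The NumPy message at the top of this log is a separate problem from the attention-mask warning: a compiled module built against NumPy 1.x is being loaded under NumPy 2.1.3. A minimal sketch for checking which major version is installed follows; the helper names `major_version` and `numpy_is_2x_or_newer` are hypothetical, and the suggested command simply repeats the `numpy<2` advice printed in the log.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string):
    """Return the leading integer of a version string such as '2.1.3'."""
    return int(version_string.split(".")[0])


def numpy_is_2x_or_newer(installed=None):
    """Return True for NumPy 2.x or newer, False for 1.x or when NumPy is absent.

    `installed` can be passed explicitly; otherwise the installed
    package metadata is queried.
    """
    if installed is None:
        try:
            installed = version("numpy")
        except PackageNotFoundError:
            return False
    return major_version(installed) >= 2


if numpy_is_2x_or_newer():
    # Mirrors the advice in the log; upgrading the affected module is the alternative.
    print('NumPy 2.x detected. The log suggests: pip install "numpy<2"')
```

Downgrading NumPy or upgrading the affected compiled packages are the two options the message itself names; which one fits depends on the rest of the environment.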

The code is copied from the README.md file on this model page:

from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-1.5B-Instruct",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B-Instruct")

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
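
The warning asks for the input's `attention_mask`, which the tokenizer call already returns alongside `input_ids`. One possible adjustment, sketched below, passes both to `generate`. The function names `load_model` and `generate_reply` are hypothetical wrappers added for illustration, and this is an assumption about what silences the warning rather than a confirmed fix from the model authors.

```python
def load_model(model_id="Qwen/Qwen2-1.5B-Instruct"):
    """Load model and tokenizer as in the snippet above (needs transformers and torch)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return model, tokenizer


def generate_reply(model, tokenizer, prompt, max_new_tokens=512):
    """Generate a reply while passing the tokenizer's attention mask explicitly."""
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ]
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    # The tokenizer output carries both input_ids and attention_mask.
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

    generated_ids = model.generate(
        input_ids=model_inputs.input_ids,
        attention_mask=model_inputs.attention_mask,  # the piece the warning asks for
        max_new_tokens=max_new_tokens,
    )
    # Drop the prompt tokens so only the newly generated tokens are decoded.
    new_tokens = [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    return tokenizer.batch_decode(new_tokens, skip_special_tokens=True)[0]
```

With transformers and torch installed, a call would look like `model, tokenizer = load_model()` followed by `print(generate_reply(model, tokenizer, "Give me a short introduction to large language model."))`. Using `model.device` instead of a hard-coded `"cuda"` is a design choice here so the inputs follow wherever `device_map="auto"` placed the model.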
