Qwen2 sample model.py does not work.

#7
by MartialTerran - opened

C:\Users\User\OneDrive\Desktop\Qwen2>python Qwen2_model.py
C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\huggingface_hub\file_download.py:797: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
config.json: 100%|██████████| 660/660 [00:00<?, ?B/s]
C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\huggingface_hub\file_download.py:139: UserWarning: huggingface_hub cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\User\.cache\huggingface\hub\models--Qwen--Qwen2-1.5B-Instruct. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the HF_HUB_DISABLE_SYMLINKS_WARNING environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
warnings.warn(message)
Traceback (most recent call last):
File "C:\Users\User\OneDrive\Desktop\Qwen2\Qwen2_model.py", line 4, in
model = AutoModelForCausalLM.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\models\auto\auto_factory.py", line 456, in from_pretrained
config, kwargs = AutoConfig.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\models\auto\configuration_auto.py", line 957, in from_pretrained
config_class = CONFIG_MAPPING[config_dict["model_type"]]
~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\models\auto\configuration_auto.py", line 671, in getitem
raise KeyError(key)
KeyError: 'qwen2'

C:\Users\User\OneDrive\Desktop\Qwen2>

from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-1.5B-Instruct",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B-Instruct")

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

The error "KeyError: 'qwen2'" indicates that the Transformers library you're using doesn't recognize 'qwen2' as a valid model type. This is likely because you haven't installed the necessary dependencies or are using an older version of Transformers that doesn't include support for Qwen-2.

Here's how to fix it:

1. Install the correct dependencies:

Make sure you have the transformers library installed along with accelerate (which device_map="auto" relies on). You can install them using pip:

pip install transformers accelerate

Important: You need a recent version of transformers; support for the qwen2 model type was added in 4.37.0. Try upgrading:

pip install --upgrade transformers

If you're still encountering issues after upgrading, explicitly request a version known to work with Qwen2:

pip install "transformers>=4.37.0"
2. Use the correct model identifier:

Double-check the model identifier you're using in AutoModelForCausalLM.from_pretrained(). It needs to be the exact Hugging Face Hub identifier from the model card. For this model it is:

"Qwen/Qwen2-1.5B-Instruct" (note "Qwen2", not "Qwen-2")

3. Verify your Qwen2_model.py script:

Ensure that your script correctly imports the necessary classes and uses the correct model identifier:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-1.5B-Instruct"  # the exact Hub identifier

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# ... rest of your code for using the model ...

inputs = tokenizer("Hello, I am", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
4. Restart your Python kernel or environment:

After installing new packages, it's crucial to restart your Python kernel (if using Jupyter Notebook or similar) or your Python environment to ensure the changes are properly loaded.

5. Check for typos:

Ensure you've typed the model identifier correctly. Even a small typo can cause a KeyError.

If you're still facing issues, provide the following information for further assistance:

The exact version of the transformers library (run pip show transformers)

The full content of your Qwen2_model.py script

The exact model identifier you are using (e.g., "Qwen/Qwen2-1.5B-Instruct")

Any other relevant error messages

By following these steps, you should be able to resolve the KeyError and successfully load the Qwen2 model. Remember to consult the official Qwen2 documentation for the most up-to-date instructions and model identifiers.
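
As a quick sanity check before re-running the full script, a small snippet like the one below (not part of the original sample; it reuses the "Qwen/Qwen2-1.5B-Instruct" id from the traceback above) prints the installed transformers version and fetches only the config, so you can confirm that 'qwen2' is recognized without downloading any weights:

# Sanity check (sketch): is this transformers install new enough for Qwen2?
import transformers
from transformers import AutoConfig

print("transformers version:", transformers.__version__)  # Qwen2 needs >= 4.37.0

try:
    # Downloads only config.json, not the model weights.
    cfg = AutoConfig.from_pretrained("Qwen/Qwen2-1.5B-Instruct")
    print("OK: model_type =", cfg.model_type)
except (KeyError, ValueError) as err:
    print("'qwen2' is not registered in this install; upgrade transformers:", err)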

MartialTerran changed discussion status to closed

The error "KeyError: 'qwen2'" indicates that the Transformers library you're using doesn't recognize 'qwen2' as a valid model type. This is likely because you haven't installed the necessary dependencies or are using an older version of Transformers that doesn't include support for Qwen-2.

Here's how to fix it:

  1. Install the correct dependencies:

Make sure you have the transformers library installed along with the specific dependencies for Qwen-2. You can install them using pip:

pip install transformers accelerate
Use code with caution.
Bash
Important: You might need a more recent version of transformers. Try upgrading:

pip install --upgrade transformers
Use code with caution.
Bash
If you're still encountering issues after upgrading, explicitly install a version known to work with Qwen-2:

pip install transformers>=4.34.0
Use code with caution.
Bash
2. Use the correct model identifier:

Double-check the model identifier you're using in AutoModelForCausalLM.from_pretrained(). It needs to be the correct Hugging Face model hub identifier. Refer to the official Qwen-2 documentation or the model card on Hugging Face for the exact identifier. It might be something like:

"Qwen/Qwen-2-1.5B-Instruct" (example, confirm the correct one)

  1. Verify your Qwen2_model.py script:

Ensure that your script correctly imports the necessary classes and uses the correct model identifier:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen-2-1.5B-Instruct" # Replace with the accurate model ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

... rest of your code for using the model ...

inputs = tokenizer("Hello, I am", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Use code with caution.
Python
4. Restart your Python kernel or environment:

After installing new packages, it's crucial to restart your Python kernel (if using Jupyter Notebook or similar) or your Python environment to ensure the changes are properly loaded.

  1. Check for typos:
    Ensure you've typed the model identifier correctly. Even a small typo can cause a KeyError.

If you're still facing issues, provide the following information for further assistance:

Exact version of the transformers library: (run pip show transformers)

The full content of your Qwen2_model.py script:

The exact model identifier you are using: (e.g., "Qwen/Qwen-2-1.5B-Instruct")

Any other relevant error messages:

By following these steps, you should be able to resolve the KeyError and successfully load the Qwen-2 model. Remember to consult the official Qwen-2 documentation for the most up-to-date instructions and model identifiers.

The error you initially encountered, KeyError: 'qwen2', indicates that the Transformers library you were using didn't have the configuration for the qwen2 model architecture registered; support for it was added in transformers 4.37.0. You fixed this by upgrading Transformers as suggested above (pip install --upgrade transformers; conda users can use conda install -c conda-forge transformers).

Your subsequent code successfully loads and uses the Qwen2 model. Let's break down why it now works:

from transformers import AutoModelForCausalLM, AutoTokenizer: This line imports the necessary classes for loading a pre-trained language model and its tokenizer. AutoModelForCausalLM is used for causal language modeling (text generation), and AutoTokenizer automatically selects the appropriate tokenizer based on the model.

device = "cuda": This specifies that you want to use your GPU ("cuda") for running the model. If you don't have a compatible GPU, change this to "cpu".

model = AutoModelForCausalLM.from_pretrained(...): This loads the pre-trained Qwen2 model from the Hugging Face Model Hub.

"Qwen/Qwen2-1.5B-Instruct": Specifies the model name.

torch_dtype="auto": Lets Transformers choose the best data type for your hardware (usually fp16 for GPUs).

device_map="auto": Automatically distributes the model across available GPUs if you have multiple. Crucial for large models that might not fit on a single GPU.

tokenizer = AutoTokenizer.from_pretrained(...): This loads the tokenizer corresponding to the Qwen2 model.

Prompt formatting: This section correctly formats your prompt using the chat template expected by Qwen2:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(device): Tokenizes the input text, converts it to PyTorch tensors, and moves it to the specified device (GPU in this case).

generated_ids = model.generate(...): This generates text based on the input. max_new_tokens=512 limits the generated output to 512 tokens.

Post-processing: The code extracts the generated tokens, decodes them back into text using the tokenizer, and prints the response.
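
Putting those pieces together, here is a lightly adapted sketch of the same flow (not the official sample verbatim): it falls back to CPU when CUDA is unavailable and passes the tokenizer's full output to generate, which forwards the attention mask as well:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-1.5B-Instruct"
device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU if needed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto").to(device)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language model."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

model_inputs = tokenizer([text], return_tensors="pt").to(device)
# **model_inputs forwards input_ids AND attention_mask to generate.
generated_ids = model.generate(**model_inputs, max_new_tokens=512)
new_tokens = generated_ids[:, model_inputs.input_ids.shape[1]:]  # strip the prompt tokens
print(tokenizer.batch_decode(new_tokens, skip_special_tokens=True)[0])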

To run this code:

Ensure you have a compatible GPU and PyTorch with CUDA support if you're using device="cuda". If not, use device="cpu". If using CUDA, ensure the correct CUDA toolkit is installed and configured for PyTorch.

Install the required libraries: You've already done this: pip install transformers torch.

Run the Python script.

This revised code provides a good foundation for using the Qwen2 model for text generation tasks. Remember that running large language models often requires significant computational resources (a GPU with sufficient memory).

After pip install --upgrade transformers, the model.py script downloaded the model files, including model.safetensors.

Now it seems to ask for a missing attention mask: "The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results."

from transformers import AutoModelForCausalLM, AutoTokenizer

File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\utils\import_utils.py", line 1767, in getattr
value = getattr(module, name)
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\utils\import_utils.py", line 1766, in getattr
module = self._get_module(self._class_to_module[name])
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\utils\import_utils.py", line 1778, in get_module
return importlib.import_module("." + module_name, self.name)
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\importlib_init
.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\models\auto\modeling_auto.py", line 21, in
from .auto_factory import (
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\models\auto\auto_factory.py", line 40, in
from ...generation import GenerationMixin
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\utils\import_utils.py", line 1766, in getattr
module = self._get_module(self._class_to_module[name])
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\utils\import_utils.py", line 1778, in get_module
return importlib.import_module("." + module_name, self.name)
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\importlib_init
.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\generation\utils.py", line 28, in
from ..cache_utils import (
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\cache_utils.py", line 1853, in
class OffloadedStaticCache(StaticCache):
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\cache_utils.py", line 1918, in OffloadedStaticCache
offload_device: Union[str, torch.device] = torch.device("cpu"),
C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\cache_utils.py:1918: UserWarning: Failed to initialize NumPy: ARRAY_API not found (Triggered internally at ..\torch\csrc\utils\tensor_numpy.cpp:84.)
offload_device: Union[str, torch.device] = torch.device("cpu"),
C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\torchvision\datapoints_init
.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
warnings.warn(BETA_TRANSFORMS_WARNING)
C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\torchvision\transforms\v2_init
.py:54: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
warnings.warn(_BETA_TRANSFORMS_WARNING)
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.

The error message "A module that was compiled using NumPy 1.x cannot be run in NumPy 2.1.3..." indicates a compatibility issue between a module (likely within Transformers or a related library) that was built with an older version of NumPy and your current NumPy 2.1.3.

Here's how to address this issue:

1. Fix the NumPy / PyTorch version mismatch:

Open your terminal or command prompt.

Either pin NumPy back below 2.0 so it matches the installed PyTorch/torchvision wheels:

pip install "numpy<2"

or upgrade torch and torchvision to builds that support NumPy 2:

pip install --upgrade torch torchvision

Note that pip install --upgrade --force-reinstall transformers on its own won't help here: transformers is pure Python and isn't compiled against NumPy; the mismatch lives in the torch/torchvision wheels. If you have a CUDA-enabled GPU, make sure the upgraded torch build matches your CUDA version.
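
To see which side of the mismatch you're on, a quick diagnostic like this (just an illustration) prints the versions installed in the environment you run the script from:

# Diagnostic sketch: the "compiled using NumPy 1.x" message appears when numpy is 2.x
# but the installed torch/torchvision wheels were built against NumPy 1.x.
import numpy
import torch
import torchvision

print("numpy      :", numpy.__version__)
print("torch      :", torch.__version__)
print("torchvision:", torchvision.__version__)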

2. If the above doesn't work, create a new virtual environment:

Create a new environment:

python -m venv .venv # Or conda create -n qwen-env python=3.9
Activate the environment:

.venv\Scripts\activate # Or conda activate qwen-env (on Windows)
source .venv/bin/activate # Or conda activate qwen-env (on macOS/Linux)
Install dependencies:

pip install transformers torch
Try running your script again. This isolates the dependencies and ensures you're working with compatible versions.

Explanation of other warnings:

_ARRAY_API not found: This warning comes from PyTorch: it could not initialize its NumPy bridge because the installed torch wheel was built against NumPy 1.x and NumPy 2.x no longer exposes the C-API symbol it expects. It is resolved by pinning numpy<2 or by upgrading torch/torchvision as described above.

torchvision Beta warning: These warnings tell you that some parts of the torchvision library you're using are still in beta and might change. You can safely ignore them, or suppress them by calling torchvision.disable_beta_transforms_warning(); upgrading to a newer torchvision release also makes them go away.

Attention mask warning: The "attention mask is not set..." warning is important. Qwen2 needs an attention mask to know which positions are real tokens and which are padding. Because the model's padding token is the same as its end-of-sequence token, Transformers cannot infer the mask from the input IDs alone, so you should pass the attention_mask that the tokenizer already returns into model.generate. Keep using the tokenizer's apply_chat_template method to format the prompt, as in your previous successful run; the only missing piece is forwarding the mask.
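
For example, a minimal change to the generate call in the script from earlier in this thread (a sketch; everything else stays the same) would be:

model_inputs = tokenizer([text], return_tensors="pt").to(device)

# Forward the attention mask the tokenizer already produced.
generated_ids = model.generate(
    input_ids=model_inputs.input_ids,
    attention_mask=model_inputs.attention_mask,
    max_new_tokens=512,
)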

By addressing the NumPy compatibility issue and the attention mask warning, your Qwen2 model should run without errors. If you continue to have errors, consider providing your full, updated Qwen2_model.py script for review.

The code you provided is correct and should work well for generating text with the Qwen2 model. However, you previously encountered a NumPy compatibility error. Before running this code again, make sure you've followed the steps outlined in my previous response to resolve that error: pin numpy below 2 (or upgrade torch/torchvision), or create a fresh virtual environment.
