Facing the error "`rope_scaling` must be a dictionary with two fields" when running the example code

#1 opened by ardasevinc

Transformers, optimum, and flash_attn are all installed from source.

```
(torch) arda@toprak:/data/arda/projects/test/Yarn-Llama-2-13B-128K-GPTQ$ python main.py 
Traceback (most recent call last):
  File "/data/arda/projects/test/Yarn-Llama-2-13B-128K-GPTQ/main.py", line 7, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/arda/envs/torch/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 519, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/arda/envs/torch/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 1037, in from_pretrained
    return config_class.from_dict(config_dict, **unused_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/arda/envs/torch/lib/python3.11/site-packages/transformers/configuration_utils.py", line 747, in from_dict
    config = cls(**config_dict)
             ^^^^^^^^^^^^^^^^^^
  File "/data/arda/envs/torch/lib/python3.11/site-packages/transformers/models/llama/configuration_llama.py", line 149, in __init__
    self._rope_scaling_validation()
  File "/data/arda/envs/torch/lib/python3.11/site-packages/transformers/models/llama/configuration_llama.py", line 167, in _rope_scaling_validation
    raise ValueError(
ValueError: `rope_scaling` must be a dictionary with with two fields, `type` and `factor`, got {'factor': 32.0, 'original_max_position_embeddings': 4096, 'type': 'yarn', 'finetuned': True}
(torch) arda@toprak:/data/arda/projects/test/Yarn-Llama-2-13B-128K-GPTQ$ 
```
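For context, the failure happens inside `AutoConfig.from_pretrained`, before any weights are touched: the stock `LlamaConfig` validator only accepts the two keys `type` and `factor` in `rope_scaling`, while this checkpoint's config carries the extra YaRN keys `original_max_position_embeddings` and `finetuned`. Below is a minimal loading sketch, assuming the repo ships its own configuration/modeling code for the `yarn` scaling type, in which case `trust_remote_code=True` routes around the stock validator:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Local path taken from the traceback above; the hub id
# "TheBloke/Yarn-Llama-2-13B-128K-GPTQ" should behave the same.
model_name_or_path = "/data/arda/projects/test/Yarn-Llama-2-13B-128K-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

# trust_remote_code=True lets transformers import the checkpoint's own
# config/model classes (if the repo provides them) instead of the stock
# LlamaConfig, whose _rope_scaling_validation rejects the 'yarn' type.
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    device_map="auto",
    trust_remote_code=True,
)
```

If the repo does not ship remote code, this call fails the same way and the config has to be patched by hand.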

I'm hitting the same issue, including with transformers 4.33. @TheBloke, are you seeing this on your end?

I also opened an issue on transformers: https://github.com/huggingface/transformers/issues/25957

The solution for the 13B model is in the GitHub issue linked above.
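For anyone who can't enable remote code, a blunt workaround that at least gets past the validator is to rewrite `rope_scaling` into one of the two shapes stock transformers accepts. This is only a sketch, not the fix from the issue: swapping `yarn` for `linear` changes the scaling math, so expect degraded quality at long context lengths.

```python
import json
from transformers import AutoModelForCausalLM, LlamaConfig

# Patch the checkpoint's config.json so the stock validator accepts it.
# NOTE: 'linear' is not the same math as 'yarn'; this only bypasses the
# ValueError shown above.
with open("Yarn-Llama-2-13B-128K-GPTQ/config.json") as f:  # local clone
    raw = json.load(f)

raw["rope_scaling"] = {
    "type": "linear",                         # a type stock transformers knows
    "factor": raw["rope_scaling"]["factor"],  # keep the 32.0 factor
}

config = LlamaConfig(**raw)
model = AutoModelForCausalLM.from_pretrained(
    "Yarn-Llama-2-13B-128K-GPTQ",
    config=config,
    device_map="auto",
)
```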

The 7B model still has an `inf` issue during generation.
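If it helps narrow that down, here is a quick diagnostic sketch (reusing the `model` and `tokenizer` from the snippet above; the prompt string is arbitrary) that checks whether any generation step produces non-finite logits:

```python
import torch

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=16,
    do_sample=False,
    output_scores=True,            # keep the per-step logits
    return_dict_in_generate=True,
)
# out.scores is one logits tensor per generated token.
for step, scores in enumerate(out.scores):
    if not torch.isfinite(scores).all():
        print(f"non-finite logits at generation step {step}")
```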
