Tokenizer class CodeLlamaTokenizer does not exist or is not currently imported.
When trying to run the model, I get the error
"Tokenizer class CodeLlamaTokenizer does not exist or is not currently imported."
It is raised in "Lib\site-packages\transformers\models\auto\tokenization_auto.py", line 724, because "CodeLlamaTokenizer" is indeed not contained in the TOKENIZER_MAPPING_NAMES OrderedDict.
The requested tokenizer "CodeLlamaTokenizer" is defined in "models\codellama_CodeLlama-7b-Instruct-hf\tokenizer_config.json".
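For reference, a minimal sketch of what triggers it on my side (assuming a transformers release that predates CodeLlama support; the path is the local model folder mentioned above):

from transformers import AutoTokenizer

# Fails with: ValueError: Tokenizer class CodeLlamaTokenizer does not exist
# or is not currently imported.
tokenizer = AutoTokenizer.from_pretrained("models/codellama_CodeLlama-7b-Instruct-hf")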
Can you please help me with this issue?
After running the pip install, use it as you would any other Python package:

from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-Instruct-hf")
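For completeness, a minimal generation sketch (the prompt and dtype are illustrative, and device placement is omitted for brevity; the 7B weights need substantial memory):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-Instruct-hf")
# float16 halves memory relative to float32; move the model to a GPU if available.
model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-Instruct-hf", torch_dtype=torch.float16
)

prompt = "def fibonacci(n):"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))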
@saksham-lamini doesn't from_pretrained automatically download the model from the API? What is the point of also downloading the git repo, then?
@Emrys95 you're right that from_pretrained will download the model (or, in this case, the tokenizer files) from the API, but it would then try to load them into a CodeLlamaTokenizer class, which does not exist if you did a normal pip install. Install from the main branch instead:
pip install git+https://github.com/huggingface/transformers.git@main
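A quick way to sanity-check the install afterwards (the exact version string will vary; any main-branch build that includes CodeLlama support will do):

import transformers
from transformers import CodeLlamaTokenizer  # should now import cleanly

print(transformers.__version__)  # expect a dev version newer than 4.32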
Thank you @pcuenq, that worked perfectly! I am using this LLM with the oobabooga web UI, and its installer didn't ship the correct transformers version yet.
@pcuenq I've been trying for days to get one of these models running, always hitting one problem or another, such as Python package conflicts (I'm new at this, yes). Could you please give me some valid code I can just copy/paste and work from to get it running? So far only GPT-2 has worked for me, the very old version, but fine-tuning it resulted in catastrophic forgetting, where it can't answer anything except questions about the document I fed to it. If you could point me in the right direction I'd appreciate it.
@Emrys95 yeah, I too need some valid, full code, as a lot of dependency issues have been coming up.
Same issue.
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
Cell In[9], line 1
----> 1 from transformers import CodeLlamaTokenizer
ImportError: cannot import name 'CodeLlamaTokenizer' from 'transformers' (/Users/hawei/miniconda3/envs/lang/lib/python3.10/site-packages/transformers/__init__.py)
Same issue here.
This error is fixed after I re-install transformers from the main branch, but then I get a new error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[5], line 1
----> 1 tokenizer = AutoTokenizer.from_pretrained(model)
File ~/miniconda3/envs/lang/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py:735, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
731 if tokenizer_class is None:
732 raise ValueError(
733 f"Tokenizer class {tokenizer_class_candidate} does not exist or is not currently imported."
734 )
--> 735 return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
737 # Otherwise we have to be creative.
738 # if model is an encoder decoder, the encoder tokenizer class is used by default
739 if isinstance(config, EncoderDecoderConfig):
File ~/miniconda3/envs/lang/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1854, in PreTrainedTokenizerBase.from_pretrained(cls, pretrained_model_name_or_path, cache_dir, force_download, local_files_only, token, revision, *init_inputs, **kwargs)
1851 else:
1852 logger.info(f"loading file {file_path} from cache at {resolved_vocab_files[file_id]}")
-> 1854 return cls._from_pretrained(
1855 resolved_vocab_files,
1856 pretrained_model_name_or_path,
1857 init_configuration,
1858 *init_inputs,
1859 token=token,
1860 cache_dir=cache_dir,
1861 local_files_only=local_files_only,
1862 _commit_hash=commit_hash,
1863 _is_local=is_local,
1864 **kwargs,
1865 )
File ~/miniconda3/envs/lang/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:2017, in PreTrainedTokenizerBase._from_pretrained(cls, resolved_vocab_files, pretrained_model_name_or_path, init_configuration, token, cache_dir, local_files_only, _commit_hash, _is_local, *init_inputs, **kwargs)
2015 # Instantiate tokenizer.
2016 try:
-> 2017 tokenizer = cls(*init_inputs, **init_kwargs)
2018 except OSError:
2019 raise OSError(
2020 "Unable to load vocabulary from file. "
2021 "Please check that the provided vocabulary is accessible and not corrupted."
2022 )
File ~/miniconda3/envs/lang/lib/python3.10/site-packages/transformers/models/code_llama/tokenization_code_llama_fast.py:154, in CodeLlamaTokenizerFast.__init__(self, vocab_file, tokenizer_file, clean_up_tokenization_spaces, unk_token, bos_token, eos_token, prefix_token, middle_token, suffix_token, eot_token, fill_token, add_bos_token, add_eos_token, **kwargs)
151 self.update_post_processor()
153 self.vocab_file = vocab_file
--> 154 self.can_save_slow_tokenizer = False if not self.vocab_file else True
156 self._prefix_token = prefix_token
157 self._middle_token = middle_token
AttributeError: can't set attribute 'can_save_slow_tokenizer'
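The failing line shows that can_save_slow_tokenizer became a read-only property, so CodeLlamaTokenizerFast.__init__ can no longer assign to it. A possible workaround sketch, assuming the bug only affects the fast tokenizer (the slow path requires the sentencepiece package):

from transformers import AutoTokenizer

# Fall back to the slow (sentencepiece-based) tokenizer to sidestep the
# fast tokenizer's __init__ bug shown in the traceback above.
tokenizer = AutoTokenizer.from_pretrained(
    "codellama/CodeLlama-7b-Instruct-hf", use_fast=False
)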
This was fixed on main!
Actually, I still face this issue.
Pip installing from the main branch fixes the issue, but installing from main also introduces a latency bug that slows down inference when using 4-bit quantization.
Edit: fixed by pip installing directly from the branch that added CodeLlama support: https://github.com/huggingface/transformers/pull/25740
Installed using: pip install git+https://github.com/ArthurZucker/transformers.git@add-llama-code
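For context, a minimal 4-bit loading sketch of the kind the latency comment refers to (assumes the bitsandbytes and accelerate packages are installed; the compute dtype is an illustrative choice):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # illustrative; bfloat16 is also common
)
model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-Instruct-hf",
    quantization_config=bnb_config,
    device_map="auto",  # needs accelerate
)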
This was fixed on main!