Issues while deploying on AWS SageMaker with TGI
I've been trying to deploy codellama/CodeLlama-13b-Instruct-hf
on AWS SageMaker with the TGI container for a while now. I am facing two issues in particular:
- The tokenizer class mismatch:
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'CodeLlamaTokenizer'.
The class this function is called from is 'LlamaTokenizer'.
- Model loading error with TGI:
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 142, in serve_inner model = get_model( File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 185, in get_model return FlashLlama( File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_llama.py", line 65, in __init__ model = FlashLlamaForCausalLM(config, weights) File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 452, in __init__ self.model = FlashLlamaModel(config, weights) File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 390, in __init__ [ File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 391, in <listcomp> FlashLlamaLayer( File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 326, in __init__ self.self_attn = FlashLlamaAttention( File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 183, in __init__ self.rotary_emb = PositionRotaryEmbedding.load( File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 395, in load inv_freq = weights.get_tensor(f"{prefix}.inv_freq") File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 62, in get_tensor filename, tensor_name = self.get_filename(tensor_name) File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 49, in get_filename raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight model.layers.0.self_attn.rotary_emb.inv_freq does not exist
Any ideas on how these can be resolved?
I have tried using the latest transformers version, 4.33.1, as well.
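For reference, this is roughly how I'm deploying (a minimal sketch; the image version, instance type, and env values are from my setup and may need adjusting):

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# TGI (LLM) container image; the version here is an assumption,
# use whatever is available in your region
image_uri = get_huggingface_llm_image_uri("huggingface", version="0.9.3")

model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "codellama/CodeLlama-13b-Instruct-hf",
        "SM_NUM_GPUS": "4",          # tensor-parallel degree
        "MAX_INPUT_LENGTH": "4096",
        "MAX_TOTAL_TOKENS": "8192",
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    container_startup_health_check_timeout=600,
)
```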
cc @philschmid
Same issue here... Any help would be greatly appreciated.
I already tried pip installing different transformers versions, but none of them fixed the problem.
!pip install git+https://github.com/huggingface/transformers.git@main
!pip install git+https://github.com/ArthurZucker/transformers.git@main
!pip install git+https://github.com/ArthurZucker/transformers.git@add-llama-code
You should only need pip install git+https://github.com/huggingface/transformers.git@main;
my branch was just for development.
This warning is safe to ignore.
Both tokenizers are the same (for TGI purposes), as TGI doesn't use CodeLlama's code-infilling capabilities; you would need to send the pre-prompt yourself.
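For example, a rough sketch of sending the pre-prompt yourself from the SageMaker side (the [INST]/<<SYS>> template is the Llama-2 chat format that CodeLlama-Instruct uses; the system message and generation parameters are just illustrative, and `predictor` is the endpoint from the deployment snippet above):

```python
# Build the instruct pre-prompt yourself (Llama-2 chat style,
# which CodeLlama-Instruct was trained with)
prompt = (
    "<s>[INST] <<SYS>>\nYou are a helpful coding assistant.\n<</SYS>>\n\n"
    "Write a Python function that reverses a string. [/INST]"
)

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 256, "temperature": 0.2},
}

response = predictor.predict(payload)
print(response[0]["generated_text"])
```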
As for the missing inv_freq:
CodeLlama's weights didn't include those tensors (architecturally it's essentially Llama v2), and old TGI versions expected inv_freq to be present.
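For the curious: inv_freq is a pure function of the head dimension and the RoPE base, which is why newer TGI can recompute it from the config instead of loading it as a checkpoint weight. A sketch of the standard formula (not TGI's exact code; base is rope_theta, 1e6 for CodeLlama versus 1e4 for Llama 2):

```python
import torch

def rotary_inv_freq(dim: int, base: float = 1_000_000.0) -> torch.Tensor:
    # Standard RoPE inverse frequencies: one per pair of hidden dims.
    # base comes from rope_theta in the model config.
    return 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))

print(rotary_inv_freq(128)[:4])  # head_dim = 128 for the 13B model
```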
This should all be solved with the upcoming SageMaker release of the latest TGI.
Soon I hope, but I can't make any promises (it's not in our hands at this point)
1.0.3 is now available on SageMaker.
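If you pin the image version explicitly, something like this should pick it up (a sketch, using the same SDK helper as in the deployment snippet above):

```python
from sagemaker.huggingface import get_huggingface_llm_image_uri

# Request the TGI 1.0.3 image, which includes the inv_freq fix
image_uri = get_huggingface_llm_image_uri("huggingface", version="1.0.3")
print(image_uri)
```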