---
license: llama2
language:
- it
tags:
- text-generation-inference
---
<img src="https://i.ibb.co/6mHSRm3/llamantino53.jpg" alt="llamantino53" border="0" width="200px">
# LLaMAntino-2-70b-hf-UltraChat-ITA 🇮🇹 🌟
*Last Update: 02/02/2024*<br>
<hr>
## Model description
**LLaMAntino-2-70b-hf-UltraChat-ITA** is an instruction-tuned *Large Language Model (LLM)* built on **LLaMAntino-2-70b** (an Italian-adapted **LLaMA 2 - 70B**).
This model aims to provide Italian NLP researchers with an improved model for Italian dialogue use cases.
The model was trained with *QLoRA* on [UltraChat](https://github.com/thunlp/ultrachat), translated into Italian with [Argos Translate](https://pypi.org/project/argostranslate/1.4.0/).
If you are interested in more details regarding the training procedure, you can find the code we used at the following link:
- **Repository:** https://github.com/swapUniba/LLaMAntino
**NOTICE**: the code has not been released yet. We apologize for the delay; it will be available as soon as possible.
- **Developed by:** Pierpaolo Basile, Elio Musacchio, Marco Polignano, Lucia Siciliani, Giuseppe Fiameni, Giovanni Semeraro
- **Funded by:** PNRR project FAIR - Future AI Research
- **Compute infrastructure:** [Leonardo](https://www.hpc.cineca.it/systems/hardware/leonardo/) supercomputer
- **Model type:** LLaMA-2
- **Language(s) (NLP):** Italian
- **License:** Llama 2 Community License
- **Finetuned from model:** [swap-uniba/meta-llama/Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf)
## Prompt Format
The following prompt format, based on the [LLaMA 2 prompt template](https://gpus.llm-utils.org/llama-2-prompt-template/) and adapted to Italian, was used:
```python
" [INST] <<SYS>>\n" \
"Sei un assistente disponibile, rispettoso e onesto di nome Llamantino. " \
"Rispondi sempre nel modo più utile possibile, pur essendo sicuro. " \
"Le risposte non devono includere contenuti dannosi, non etici, razzisti, sessisti, tossici, pericolosi o illegali. " \
"Assicurati che le tue risposte siano socialmente imparziali e positive. " \
"Se una domanda non ha senso o non è coerente con i fatti, spiegane il motivo invece di rispondere in modo non corretto. " \
"Se non conosci la risposta a una domanda, non condividere informazioni false.\n" \
"<</SYS>>\n\n" \
f"{user_msg_1} [/INST] {model_answer_1} </s> <s> [INST] {user_msg_2} [/INST] {model_answer_2} </s> ... <s> [INST] {user_msg_N} [/INST] {model_answer_N} </s>"
```
We recommend using the same prompt at inference time to obtain the best results!
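The multi-turn layout above can be assembled programmatically. The following is a minimal sketch, not the authors' code: `build_prompt` and its `turns` argument are hypothetical names introduced here for illustration, and the leading `<s>` is assumed to be added by the tokenizer rather than by this function.

```python
# Sketch only: build_prompt and SYSTEM_PROMPT are illustrative names,
# not part of the released model or of any library.
SYSTEM_PROMPT = (
    "Sei un assistente disponibile, rispettoso e onesto di nome Llamantino. "
    "Rispondi sempre nel modo più utile possibile, pur essendo sicuro. "
    "Le risposte non devono includere contenuti dannosi, non etici, razzisti, "
    "sessisti, tossici, pericolosi o illegali. "
    "Assicurati che le tue risposte siano socialmente imparziali e positive. "
    "Se una domanda non ha senso o non è coerente con i fatti, spiegane il "
    "motivo invece di rispondere in modo non corretto. "
    "Se non conosci la risposta a una domanda, non condividere informazioni false."
)


def build_prompt(turns, system_prompt=SYSTEM_PROMPT):
    """Build a prompt string from (user_msg, model_answer) pairs.

    Pass None as the answer of the last pair to leave the prompt open
    for the model to complete.
    """
    if not turns:
        raise ValueError("at least one turn is required")
    # The system block appears once, inside the first [INST] segment.
    prompt = f" [INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
    for i, (user_msg, answer) in enumerate(turns):
        if i > 0:
            # Later turns open with an explicit <s> after the previous </s>.
            prompt += " <s> [INST] "
        prompt += f"{user_msg} [/INST]"
        if answer is not None:
            prompt += f" {answer} </s>"
    return prompt


print(build_prompt([("Ciao!", "Salve, come posso aiutarti?"),
                    ("Cosa sono i word embeddings?", None)]))
```

Leaving the last answer as `None` ends the string at `[/INST]`, which is where generation is expected to start.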
## How to Get Started with the Model
Below you can find an example of model usage:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"
model = "swap-uniba/LLaMAntino-2-70b-hf-UltraChat-ITA"
tokenizer = AutoTokenizer.from_pretrained(model)
tokenizer.add_special_tokens({"pad_token":"<unk>"})
tokenizer.chat_template = "{% set ns = namespace(i=0) %}" \
"{% for message in messages %}" \
"{% if message['role'] == 'user' and ns.i == 0 %}" \
"{{ bos_token +' [INST] <<SYS>>\n' }}" \
"{{ 'Sei un assistente disponibile, rispettoso e onesto di nome Llamantino. ' }}" \
"{{ 'Rispondi sempre nel modo più utile possibile, pur essendo sicuro. ' }}" \
"{{ 'Le risposte non devono includere contenuti dannosi, non etici, razzisti, sessisti, tossici, pericolosi o illegali. ' }}" \
"{{ 'Assicurati che le tue risposte siano socialmente imparziali e positive. ' }}" \
"{{ 'Se una domanda non ha senso o non è coerente con i fatti, spiegane il motivo invece di rispondere in modo non corretto. ' }}" \
"{{ 'Se non conosci la risposta a una domanda, non condividere informazioni false.\n' }}" \
"{{ '<</SYS>>\n\n' }}" \
"{{ message['content'] + ' [/INST]' }}" \
"{% elif message['role'] == 'user' and ns.i != 0 %} " \
"{{ bos_token + ' [INST] ' + message['content'] + ' [/INST]' }}" \
"{% elif message['role'] == 'assistant' %}" \
"{{ ' ' + message['content'] + ' ' + eos_token + ' ' }}" \
"{% endif %}" \
"{% set ns.i = ns.i+1 %}" \
"{% endfor %}"
model = AutoModelForCausalLM.from_pretrained(
    model,
    torch_dtype=torch.float16,
    device_map='balanced',
    use_flash_attention_2=True
)
pipe = transformers.pipeline(model=model,
                             device_map="balanced",
                             tokenizer=tokenizer,
                             return_full_text=False,  # return only the newly generated text, without the prompt
                             task='text-generation',
                             max_new_tokens=512,  # max number of tokens to generate in the output
                             temperature=0.7  # sampling temperature
                             )
messages = [{"role": "user", "content": "Cosa sono i word embeddings?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False)
sequences = pipe(text)
for seq in sequences:
    print(f"{seq['generated_text']}")
```
If you run into issues when loading the model, you can try loading it **quantized**:
```python
model_id = "swap-uniba/LLaMAntino-2-70b-hf-UltraChat-ITA"
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_8bit=True)
```
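To see why 8-bit loading can help, here is a rough back-of-the-envelope sketch of the memory needed for the weights alone. It assumes a nominal 70 billion parameters (taken from the "70b" in the model name) and ignores activations, the KV cache, and any quantization overhead, so real usage will be higher. `weight_memory_gb` is a hypothetical helper written for this illustration.

```python
# Assumption: nominal parameter count inferred from the "70b" in the model name.
NOMINAL_PARAMS = 70e9

# Bytes used to store one parameter in each format.
BYTES_PER_PARAM = {"float16": 2.0, "int8": 1.0}


def weight_memory_gb(n_params, dtype):
    """Approximate memory (in GB, 1 GB = 1e9 bytes) for the weights only."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(NOMINAL_PARAMS, dtype):.0f} GB")
```

Under these assumptions, 8-bit weights take about half the memory of float16 weights.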
*Note*:
1) The model loading strategy above requires the [*bitsandbytes*](https://pypi.org/project/bitsandbytes/) and [*accelerate*](https://pypi.org/project/accelerate/) libraries.
2) By default, the tokenizer adds the '\<BOS\>' token at the beginning of the prompt. If it does not, add the *\<s\>* string at the start of the prompt yourself.
## Evaluation
For a detailed comparison of model performance, check out the [Leaderboard for Italian Language Models](https://huggingface.co/spaces/FinancialSupport/open_ita_llm_leaderboard).
Here's a breakdown of the performance metrics:
| Benchmark | hellaswag_it (acc_norm) | arc_it (acc_norm) | m_mmlu_it (5-shot acc) | Average |
|:----------|:------------------------|:------------------|:-----------------------|:--------|
| **Score** | 0.6566                  | 0.5004            | 0.6084                 | 0.588   |
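
As a quick sanity check, the Average column is consistent with an unweighted mean of the three benchmark scores. That it is computed this way is an assumption, not something the leaderboard states here:

```python
# Scores copied from the table above.
scores = {"hellaswag_it": 0.6566, "arc_it": 0.5004, "m_mmlu_it": 0.6084}

# Assumption: the reported average is the plain arithmetic mean.
average = sum(scores.values()) / len(scores)
print(round(average, 3))  # prints 0.588
```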
## Citation
If you use this model in your research, please cite the following:
```bibtex
@misc{basile2023llamantino,
title={LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language},
author={Pierpaolo Basile and Elio Musacchio and Marco Polignano and Lucia Siciliani and Giuseppe Fiameni and Giovanni Semeraro},
year={2023},
eprint={2312.09993},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
*Notice:* Llama 2 is licensed under the LLAMA 2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved. [*License*](https://ai.meta.com/llama/license/)