---
license: apache-2.0
---

<h1 align="center"> Moxin 7B </h1>

<p align="center"> <a href="https://github.com/moxin-org/Moxin-LLM">Home Page</a>    |    <a href="https://github.com/moxin-org/Moxin-LLM/blob/main/report/Moxin_7B.pdf">Technical Report</a>    |    <a href="https://huggingface.co/moxin-org/moxin-7b">Base Model</a>    |    <a href="https://huggingface.co/moxin-org/moxin-chat-7b">Chat Model</a> </p>

## Model

Download the [base 7B model](https://huggingface.co/moxin-org/moxin-7b) and the [chat 7B model](https://huggingface.co/moxin-org/moxin-chat-7b) from Hugging Face.

## Inference

Use the following code to run inference with the model. The example loads the model from the Hugging Face Hub by its repository ID; to load a local copy instead, set `model_name` to its directory (for example, `./model/`).

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Hugging Face Hub repository ID, or the path to a local model directory
model_name = 'moxin-org/moxin-7b'

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # load the weights in bfloat16
    device_map="auto",           # place the model on the available devices
    trust_remote_code=True,
)

# Wrap the loaded model and tokenizer in a text-generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Can you explain the concept of regularization in machine learning?"

# Sample one continuation of up to 100 new tokens
sequences = pipe(
    prompt,
    do_sample=True,
    max_new_tokens=100,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    num_return_sequences=1,
)
print(sequences[0]['generated_text'])
```
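
By default, the `generated_text` returned by a text-generation pipeline contains the prompt followed by the continuation. The sketch below shows one way to keep only the continuation. The helper name `strip_prompt` and the sample continuation are hypothetical and used only for illustration; they are not part of the Moxin code or its real output.

```python
def strip_prompt(generated_text: str, prompt: str) -> str:
    """Return only the continuation when the text starts with the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    # Leave the text unchanged if the prompt is not a prefix.
    return generated_text


# Stand-in for a pipeline result, so the sketch runs without loading the model.
# The continuation here is made up for illustration, not real model output.
prompt = "Can you explain the concept of regularization in machine learning?"
sequences = [{"generated_text": prompt + " Regularization penalizes model complexity."}]

print(strip_prompt(sequences[0]["generated_text"], prompt))
```

Alternatively, passing `return_full_text=False` in the `pipe(...)` call asks the pipeline to return only the newly generated text.
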
## Evaluation

We evaluate the model with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness). The table below reports results on AI2 Reasoning Challenge (ARC-C, 25-shot), HellaSwag (10-shot), MMLU (5-shot), and WinoGrande (5-shot).

| Models             | ARC-C | HellaSwag | MMLU  | WinoGrande | Average |
|:------------------:|:-----:|:---------:|:-----:|:----------:|:-------:|
| Mistral-7B         | 57.59 | 83.25     | 62.42 | 78.77      | 70.51   |
| LLaMA 3.1-8B       | 54.61 | 81.95     | 65.16 | 77.35      | 69.77   |
| LLaMA 3-8B         | 55.46 | 82.09     | 65.29 | 77.82      | 70.17   |
| LLaMA 2-7B         | 49.74 | 78.94     | 45.89 | 74.27      | 62.21   |
| Qwen 2-7B          | 57.68 | 80.76     | 70.42 | 77.43      | 71.57   |
| gemma-7b           | 56.48 | 82.31     | 63.02 | 78.3       | 70.03   |
| internlm2.5-7b     | 54.78 | 79.7      | 68.17 | 80.9       | 70.89   |
| Baichuan2-7B       | 47.87 | 73.89     | 54.13 | 70.8       | 61.67   |
| Yi-1.5-9B          | 58.36 | 80.36     | 69.54 | 77.53      | 71.48   |
| Moxin-7B-original  | 53.75 | 75.46     | 59.43 | 70.32      | 64.74   |
| Moxin-7B-finetuned | 59.47 | 83.08     | 60.97 | 78.69      | 70.55   |

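
As a rough guide to reproducing the few-shot settings above, the following sketch builds one `lm_eval` command per benchmark. It is not the authors' actual evaluation script. The task names (`arc_challenge`, `hellaswag`, `mmlu`, `winogrande`), the `hf` backend, the batch size, and the helper name `build_commands` are assumptions based on recent lm-evaluation-harness releases; check them against your installed version.

```python
import shlex

# Few-shot counts as stated in the text. The task names are assumed to
# match those used by lm-evaluation-harness.
FEW_SHOT = {
    "arc_challenge": 25,
    "hellaswag": 10,
    "mmlu": 5,
    "winogrande": 5,
}


def build_commands(model_id="moxin-org/moxin-7b", batch_size=8):
    """Return one lm_eval command (as an argument list) per benchmark."""
    commands = []
    for task, shots in FEW_SHOT.items():
        commands.append([
            "lm_eval",
            "--model", "hf",
            "--model_args", f"pretrained={model_id},dtype=bfloat16",
            "--tasks", task,
            "--num_fewshot", str(shots),
            "--batch_size", str(batch_size),  # assumed value; tune for your GPU
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be inspected first.
    for cmd in build_commands():
        print(shlex.join(cmd))
```

Each printed line can be run in a shell, or passed to `subprocess.run` as an argument list. For zero-shot runs, the same pattern applies with `--num_fewshot 0`.
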
We also evaluate zero-shot performance on HellaSwag, WinoGrande, PIQA, ARC-Easy (ARC-E), and ARC-Challenge (ARC-C). The results are shown below.

| Models             | HellaSwag | WinoGrande | PIQA  | ARC-E | ARC-C | Average |
|:------------------:|:---------:|:----------:|:-----:|:-----:|:-----:|:-------:|
| Mistral-7B         | 80.39     | 73.4       | 82.15 | 78.28 | 52.22 | 73.29   |
| LLaMA 2-7B         | 75.99     | 69.06      | 79.11 | 74.54 | 46.42 | 69.02   |
| LLaMA 2-13B        | 79.37     | 72.22      | 80.52 | 77.4  | 49.06 | 71.71   |
| LLaMA 3.1-8B       | 78.92     | 74.19      | 81.12 | 81.06 | 53.67 | 73.79   |
| gemma-7b           | 80.45     | 73.72      | 80.9  | 79.97 | 54.1  | 73.83   |
| Qwen 2-7B          | 78.9      | 72.38      | 79.98 | 74.71 | 50.09 | 71.21   |
| internlm2.5-7b     | 79.14     | 77.9       | 80.52 | 76.16 | 51.37 | 73.02   |
| Baichuan2-7B       | 72.25     | 67.17      | 77.26 | 72.98 | 42.15 | 66.36   |
| Yi-1.5-9B          | 77.86     | 73.01      | 80.74 | 79.04 | 55.03 | 73.14   |
| deepseek-7b        | 76.13     | 69.77      | 79.76 | 71.04 | 44.8  | 68.3    |
| Moxin-7B-original  | 72.06     | 66.31      | 78.07 | 71.47 | 48.15 | 67.21   |
| Moxin-7B-finetuned | 80.03     | 75.17      | 82.24 | 81.12 | 58.64 | 75.44   |

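
The last column of the zero-shot table can be checked with a short calculation. The sketch below assumes that column is the unweighted mean of the five task scores, rounded to two decimals, which matches the reported values for the two Moxin rows. The dictionary keys and the helper name `average` are illustrative only.

```python
# Zero-shot scores copied from the table above, in column order:
# HellaSwag, WinoGrande, PIQA, ARC-E, ARC-C.
ZERO_SHOT = {
    "original": [72.06, 66.31, 78.07, 71.47, 48.15],
    "finetuned": [80.03, 75.17, 82.24, 81.12, 58.64],
}


def average(scores):
    """Unweighted mean of the task scores, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


for name, scores in ZERO_SHOT.items():
    print(f"Moxin-7B-{name}: {average(scores)}")
```
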