---
license: apache-2.0
---


<h1 align="center"> Moxin 7B </h1>

<p align="center"> <a href="https://github.com/OminiX-ai/OminiX-LLM">Home Page</a>    |    <a href="https://github.com/OminiX-ai/OminiX-LLM/blob/main/report/Moxin_7B.pdf">Technical Report</a>    |    <a href="https://huggingface.co/moxin-org/moxin-7b">Base Model</a>    |    <a href="https://huggingface.co/moxin-org/moxin-chat-7b">Chat Model</a> </p>

## Model

You can download the base 7B model from [moxin-org/moxin-7b](https://huggingface.co/moxin-org/moxin-7b) and the chat 7B model from [moxin-org/moxin-chat-7b](https://huggingface.co/moxin-org/moxin-chat-7b).
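
If you want the weights on disk ahead of time rather than downloaded on first use, a minimal sketch using `huggingface_hub` is shown below (the `./model/` path is only an example; change it as needed):

```python
from huggingface_hub import snapshot_download

# Download the chat model into ./model/ (an example path; change as needed).
local_dir = snapshot_download(
    repo_id="moxin-org/moxin-chat-7b",
    local_dir="./model/",
)
print(f"Model files saved to {local_dir}")
```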

## Inference

You can run inference with the following code. It loads the chat model directly from the Hugging Face Hub; to run from a local copy instead (for example, one saved under './model/'), point `model_name` at that directory.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_name = 'moxin-org/moxin-chat-7b'

# Load the tokenizer and model; dtype and device placement are set here,
# so they do not need to be passed to the pipeline again below.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

prompt = "Can you explain the concept of regularization in machine learning?"

# Sample up to 100 new tokens with temperature plus top-k/top-p sampling.
sequences = pipe(
    prompt,
    do_sample=True,
    max_new_tokens=100,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    num_return_sequences=1,
)
print(sequences[0]['generated_text'])
```
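
For multi-turn use of the chat model, a sketch along these lines may help, assuming the tokenizer ships a chat template (check `tokenizer.chat_template` first; if it is unset, fall back to plain prompts as above):

```python
# Multi-turn prompting sketch; reuses `tokenizer` and `model` from above.
messages = [
    {"role": "user",
     "content": "Can you explain the concept of regularization in machine learning?"},
]
# apply_chat_template formats the conversation into the model's expected
# prompt layout, assuming the tokenizer defines a chat template.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
# Strip the prompt tokens and decode only the newly generated reply.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```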

## Evaluation

We evaluate the model with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) on AI2 Reasoning Challenge (25-shot), HellaSwag (10-shot), MMLU (5-shot), and Winogrande (5-shot). The results on these common benchmarks are shown below.

| Models             | ARC-C | HellaSwag | MMLU  | WinoGrande | Avg   |
|:------------------:|:-----:|:---------:|:-----:|:----------:|:-----:|
| Mistral-7B         | 57.59 | 83.25     | 62.42 | 78.77      | 70.51 |
| LLaMA 3.1-8B       | 54.61 | 81.95     | 65.16 | 77.35      | 69.77 |
| LLaMA 3-8B         | 55.46 | 82.09     | 65.29 | 77.82      | 70.17 |
| LLaMA 2-7B         | 49.74 | 78.94     | 45.89 | 74.27      | 62.21 |
| Qwen 2-7B          | 57.68 | 80.76     | 70.42 | 77.43      | 71.57 |
| gemma-7b           | 56.48 | 82.31     | 63.02 | 78.30      | 70.03 |
| internlm2.5-7b     | 54.78 | 79.70     | 68.17 | 80.90      | 70.89 |
| Baichuan2-7B       | 47.87 | 73.89     | 54.13 | 70.80      | 61.67 |
| Yi-1.5-9B          | 58.36 | 80.36     | 69.54 | 77.53      | 71.48 |
| Moxin-7B-original  | 53.75 | 75.46     | 59.43 | 70.32      | 64.74 |
| Moxin-7B-finetuned | 59.47 | 83.08     | 60.97 | 78.69      | 70.55 |

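To reproduce numbers like these, a sketch of a single few-shot run with the harness's Python API follows (v0.4-style; the task name, model args, and keyword names here follow the harness docs, so verify them against your installed version):

```python
import lm_eval

# Few-shot evaluation of the base model; adjust num_fewshot per task
# (25 for ARC-Challenge, 10 for HellaSwag, 5 for MMLU/Winogrande).
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=moxin-org/moxin-7b,dtype=bfloat16",
    tasks=["arc_challenge"],
    num_fewshot=25,
    batch_size=8,
)
print(results["results"]["arc_challenge"])
```
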
We also test zero-shot performance on AI2 Reasoning Challenge (0-shot), AI2 Reasoning Easy (0-shot), HellaSwag (0-shot), PIQA (0-shot), and Winogrande (0-shot). The results are shown below.

| Models             | HellaSwag | WinoGrande | PIQA  | ARC-E | ARC-C | Avg   |
|:------------------:|:---------:|:----------:|:-----:|:-----:|:-----:|:-----:|
| Mistral-7B         | 80.39     | 73.40      | 82.15 | 78.28 | 52.22 | 73.29 |
| LLaMA 2-7B         | 75.99     | 69.06      | 79.11 | 74.54 | 46.42 | 69.02 |
| LLaMA 2-13B        | 79.37     | 72.22      | 80.52 | 77.40 | 49.06 | 71.71 |
| LLaMA 3.1-8B       | 78.92     | 74.19      | 81.12 | 81.06 | 53.67 | 73.79 |
| gemma-7b           | 80.45     | 73.72      | 80.90 | 79.97 | 54.10 | 73.83 |
| Qwen 2-7B          | 78.90     | 72.38      | 79.98 | 74.71 | 50.09 | 71.21 |
| internlm2.5-7b     | 79.14     | 77.90      | 80.52 | 76.16 | 51.37 | 73.02 |
| Baichuan2-7B       | 72.25     | 67.17      | 77.26 | 72.98 | 42.15 | 66.36 |
| Yi-1.5-9B          | 77.86     | 73.01      | 80.74 | 79.04 | 55.03 | 73.14 |
| deepseek-7b        | 76.13     | 69.77      | 79.76 | 71.04 | 44.80 | 68.30 |
| Moxin-7B-original  | 72.06     | 66.31      | 78.07 | 71.47 | 48.15 | 67.21 |
| Moxin-7B-finetuned | 80.03     | 75.17      | 82.24 | 81.12 | 58.64 | 75.44 |