halbihn committed on
Commit 6728502 · verified · 1 Parent(s): b1cbf5e

Create README.md

Files changed (1)
  1. README.md +123 -0
README.md ADDED
@@ -0,0 +1,123 @@
---
base_model: teknium/OpenHermes-2.5-Mistral-7B
tags:
- mistral
- instruct
- finetune
- chatml
- gpt4
- synthetic data
- distillation
- dpo
- rlhf
license: apache-2.0
language:
- en
datasets:
- mlabonne/chatml_dpo_pairs
---

<center><img src="https://i.imgur.com/qIhaFNM.png"></center>

# NeuralHermes 2.5 - Mistral 7B

NeuralHermes is based on the [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) model, further fine-tuned with Direct Preference Optimization (DPO) on the [mlabonne/chatml_dpo_pairs](https://huggingface.co/datasets/mlabonne/chatml_dpo_pairs) dataset. It surpasses the original model on most benchmarks (see Results).

It is directly inspired by the RLHF process that the authors of [Intel/neural-chat-7b-v3-1](https://huggingface.co/Intel/neural-chat-7b-v3-1) describe for improving performance. I used the same dataset and reformatted it to apply the ChatML template.
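
To make the reformatting step concrete, here is a minimal sketch of how one preference record could be wrapped in ChatML. The field names (`system`, `question`, `chosen`, `rejected`) and the helper names are assumptions for illustration; the exact preprocessing used for this model lives in the training notebook and may differ.

```python
# Hypothetical sketch: wrap one preference record in the ChatML template.
# Field names and helper names are assumptions, not taken from this card.
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def chatml_turn(role: str, content: str) -> str:
    """Render one complete ChatML turn."""
    return f"{IM_START}{role}\n{content}{IM_END}\n"


def to_chatml_pair(record: dict) -> dict:
    """Build the prompt/chosen/rejected strings that a DPO trainer consumes."""
    prompt = ""
    if record.get("system"):
        prompt += chatml_turn("system", record["system"])
    prompt += chatml_turn("user", record["question"])
    # Leave the assistant turn open so each answer completes it.
    prompt += f"{IM_START}assistant\n"
    return {
        "prompt": prompt,
        "chosen": record["chosen"] + IM_END + "\n",
        "rejected": record["rejected"] + IM_END + "\n",
    }


example = {
    "system": "You are a helpful assistant.",
    "question": "What is 2 + 2?",
    "chosen": "2 + 2 equals 4.",
    "rejected": "It is 5.",
}
pair = to_chatml_pair(example)
print(pair["prompt"] + pair["chosen"])
```

Both answers share the same prompt, so the preferred and rejected completions differ only in the assistant turn.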

The code to train this model is available on [Google Colab](https://colab.research.google.com/drive/1h4tAJStIef_BcO-OkY97X9_OFgKnFrLl). Training took about an hour on an A100 GPU.

## Quantized models

* **GGUF**: https://huggingface.co/TheBloke/NeuralHermes-2.5-Mistral-7B-GGUF
* **AWQ**: https://huggingface.co/TheBloke/NeuralHermes-2.5-Mistral-7B-AWQ
* **GPTQ**: https://huggingface.co/TheBloke/NeuralHermes-2.5-Mistral-7B-GPTQ
* **EXL2**:
  * 3.0bpw: https://huggingface.co/LoneStriker/NeuralHermes-2.5-Mistral-7B-3.0bpw-h6-exl2
  * 4.0bpw: https://huggingface.co/LoneStriker/NeuralHermes-2.5-Mistral-7B-4.0bpw-h6-exl2
  * 5.0bpw: https://huggingface.co/LoneStriker/NeuralHermes-2.5-Mistral-7B-5.0bpw-h6-exl2
  * 6.0bpw: https://huggingface.co/LoneStriker/NeuralHermes-2.5-Mistral-7B-6.0bpw-h6-exl2
  * 8.0bpw: https://huggingface.co/LoneStriker/NeuralHermes-2.5-Mistral-7B-8.0bpw-h8-exl2
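
As a rough guide to choosing among the EXL2 variants, the sketch below estimates weight storage from bits per weight (bpw). The parameter count (about 7.24 billion for Mistral-7B) and the helper name `approx_weight_gib` are assumptions added for illustration. Real files will differ because of metadata and layers kept at other precisions, and inference needs extra memory for the KV cache.

```python
# Back-of-the-envelope estimate: bytes = parameters * bits_per_weight / 8.
# Assumption: Mistral-7B has roughly 7.24e9 parameters.
N_PARAMS = 7.24e9


def approx_weight_gib(bits_per_weight: float, n_params: float = N_PARAMS) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


for bpw in (3.0, 4.0, 5.0, 6.0, 8.0):
    print(f"{bpw:.1f} bpw: about {approx_weight_gib(bpw):.1f} GiB of weights")
```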

## Results

**Update:** NeuralHermes-2.5 became the best Hermes-based model on the Open LLM Leaderboard and one of the very best 7B models. 🎉

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/yWe6VBFxkHiuOlDVBXtGo.png)

Teknium (author of OpenHermes-2.5-Mistral-7B) benchmarked the model ([see his tweet](https://twitter.com/Teknium1/status/1729955709377503660)).

Results improve on all three benchmarks: **AGIEval** (from 43.07% to 43.62%), **GPT4All** (from 73.12% to 73.25%), and **TruthfulQA**.
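
For scale, the two quoted gains can be expressed in percentage points and relative terms. This is a small sketch that uses only the figures stated above; TruthfulQA is omitted because its scores are not given in the text.

```python
# Scores quoted above, as (OpenHermes-2.5, NeuralHermes-2.5) in percent.
scores = {
    "AGIEval": (43.07, 43.62),
    "GPT4All": (73.12, 73.25),
}

# Absolute gain in percentage points for each benchmark.
gains = {name: after - before for name, (before, after) in scores.items()}

for name, (before, _) in scores.items():
    delta = gains[name]
    print(f"{name}: {delta:+.2f} points ({delta / before:+.2%} relative)")
```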

### AGIEval
![](https://i.imgur.com/7an3B1f.png)

### GPT4All
![](https://i.imgur.com/TLxZFi9.png)

### TruthfulQA
![](https://i.imgur.com/V380MqD.png)

You can check the [Weights & Biases project](https://wandb.ai/mlabonne/NeuralHermes-2-5-Mistral-7B/overview?workspace=user-mlabonne).

## Usage

You can run this model using [LM Studio](https://lmstudio.ai/) or any other frontend.

You can also run it with the following code:

```python
import transformers
from transformers import AutoTokenizer

# Assumption: Hub ID of the original NeuralHermes release (the source leaves
# `new_model` undefined); replace it with the repository you want to load.
new_model = "mlabonne/NeuralHermes-2.5-Mistral-7B"

# Format the prompt with the tokenizer's chat template
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained(new_model)
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Create pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer
)

# Generate text (max_length counts the prompt tokens as well as the new ones)
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]['generated_text'])
```

## Training hyperparameters

**LoRA**:
* r=16
* lora_alpha=16
* lora_dropout=0.05
* bias="none"
* task_type="CAUSAL_LM"
* target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
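
To give a sense of what these LoRA settings imply, the sketch below counts the adapter parameters they would add. The layer dimensions come from the public Mistral-7B configuration (hidden size 4096, MLP size 14336, 8 key/value heads of dimension 128, 32 layers), which this card does not state, so treat them as assumptions. A rank-`r` adapter on a `d_in x d_out` linear layer adds `r * (d_in + d_out)` parameters.

```python
# Assumed Mistral-7B dimensions (not stated in this card).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128  # 8 key/value heads of dimension 128
N_LAYERS = 32

# LoRA settings listed above.
R = 16
LORA_ALPHA = 16

# (input dim, output dim) of each targeted projection in one decoder layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(d_in: int, d_out: int, r: int = R) -> int:
    """Parameters in the two low-rank matrices A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


per_layer = sum(lora_param_count(*shape) for shape in TARGET_SHAPES.values())
total_trainable = per_layer * N_LAYERS
scaling = LORA_ALPHA / R  # factor applied to the low-rank update

print(f"trainable LoRA parameters: {total_trainable:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
```

Under these assumptions the adapters amount to well under 1% of the base model's weights, which is consistent with training fitting on a single A100.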

**Training arguments**:
* per_device_train_batch_size=4
* gradient_accumulation_steps=4
* gradient_checkpointing=True
* learning_rate=5e-5
* lr_scheduler_type="cosine"
* max_steps=200
* optim="paged_adamw_32bit"
* warmup_steps=100
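
The arguments above determine the effective batch size and the shape of the learning-rate curve. The following sketch works both out, assuming a single GPU and the usual linear-warmup-then-cosine-decay shape for a `"cosine"` scheduler. The function name `lr_at` is hypothetical, and the exact scheduler implementation used in training may differ in detail.

```python
import math

# Values from the training arguments above.
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 4
BASE_LR = 5e-5
MAX_STEPS = 200
WARMUP_STEPS = 100

# Assumption: one GPU, so the effective batch is batch size x accumulation.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
samples_seen = effective_batch * MAX_STEPS


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"effective batch size: {effective_batch}")
print(f"preference pairs seen: {samples_seen}")
for step in (0, 50, 100, 150, 200):
    print(f"step {step:3d}: lr = {lr_at(step):.2e}")
```

With `warmup_steps=100` and `max_steps=200`, half of the run is warmup, so under this assumed schedule the peak rate of 5e-5 is reached only at step 100.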

**DPOTrainer**:
* beta=0.1
* max_prompt_length=1024
* max_length=1536
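
To show what `beta` controls, here is a minimal sketch of the standard sigmoid DPO objective for a single preference pair. It assumes the trainer's default loss variant was used, which this card does not state. The function name and the example log-probabilities are made up for illustration.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """-log(sigmoid(beta * margin)) for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full answer under the
    policy being trained or under the frozen reference model.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    x = beta * margin
    # Numerically stable softplus(-x), which equals -log(sigmoid(x)).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Illustrative numbers: the policy favors the chosen answer more than the
# reference model does, so the loss falls below log(2).
print(dpo_loss(-10.0, -20.0, -12.0, -15.0))
print(dpo_loss(0.0, 0.0, 0.0, 0.0))  # policy equals reference: log(2)
```

A larger `beta` penalizes drifting from the reference model more strongly, while a smaller value lets the preference signal move the policy further.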