mlabonne committed
Commit 0ec7bb8 · 1 Parent(s): 36aa009

Create README.md

Files changed (1): README.md (+96, -0)

---
base_model: mistralai/Mistral-7B-v0.1
tags:
- mistral
- instruct
- finetune
- chatml
- gpt4
- synthetic data
- distillation
- dpo
- rlhf
model-index:
- name: NeuralHermes-2.5-Mistral-7B
  results: []
license: apache-2.0
language:
- en
datasets:
- mlabonne/chatml_dpo_pairs
---

<center><img src="https://i.imgur.com/qIhaFNM.png"></center>

# NeuralHermes 2.5 - Mistral 7B

NeuralHermes is an [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) model that has been further fine-tuned with Direct Preference Optimization (DPO) using the [mlabonne/chatml_dpo_pairs](https://huggingface.co/datasets/mlabonne/chatml_dpo_pairs) dataset.

It is directly inspired by the RLHF process that the authors of [neural-chat-7b-v3-1](https://huggingface.co/Intel/neural-chat-7b-v3-1) described to improve performance. I used the same dataset and reformatted it to apply the ChatML template, shown below. I haven't performed a comprehensive evaluation of the model yet, but in my tests it works great and nothing appears to be broken! :)

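For reference, ChatML wraps each conversation turn in `<|im_start|>` and `<|im_end|>` tokens, with the role name right after the opening token. A prompt in this format looks as follows (the conversation itself is only illustrative):

```
<|im_start|>system
You are a helpful assistant chatbot.<|im_end|>
<|im_start|>user
What is a Large Language Model?<|im_end|>
<|im_start|>assistant
```
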
The code to train this model is available on [Google Colab](https://colab.research.google.com/drive/15iFBr1xWgztXvhrj5I9fBv20c7CFOPBE?usp=sharing) and [GitHub](https://github.com/mlabonne/llm-course/tree/main). Training took about an hour on an A100 GPU.

GGUF versions of this model are available here: [mlabonne/NeuralHermes-2.5-Mistral-7B-GGUF](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B-GGUF).

## Usage

You can run this model using [LM Studio](https://lmstudio.ai/) or any other frontend.

You can also run this model using the following code:

```python
import transformers
from transformers import AutoTokenizer

model_name = "mlabonne/NeuralHermes-2.5-Mistral-7B"

# Format the prompt with the model's ChatML chat template
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained(model_name)
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Create a text-generation pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=model_name,
    tokenizer=tokenizer
)

# Generate a completion
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]['generated_text'])
```

## Training hyperparameters

**LoRA**:
* r=16
* lora_alpha=16
* lora_dropout=0.05
* bias="none"
* task_type="CAUSAL_LM"
* target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']

**Training arguments**:
* per_device_train_batch_size=4
* gradient_accumulation_steps=4
* gradient_checkpointing=True
* learning_rate=5e-5
* lr_scheduler_type="cosine"
* max_steps=200
* optim="paged_adamw_32bit"
* warmup_steps=100

**DPOTrainer**:
* beta=0.1
* max_prompt_length=1024
* max_length=1536
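
For reference, here is a minimal sketch of how these settings fit together with `peft` and `trl`. It assumes the `DPOTrainer` constructor from the trl version available at the time of training (which accepted `beta`, `max_prompt_length`, and `max_length` directly); `output_dir` is a placeholder and quantized loading is omitted. The complete script is in the Colab and GitHub links above.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

# Start from the SFT model that NeuralHermes fine-tunes
base_model = "teknium/OpenHermes-2.5-Mistral-7B"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# LoRA configuration (values from the list above)
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj'],
)

# Training arguments (values from the list above; output_dir is a placeholder)
training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    output_dir="./results",
)

# DPO fine-tuning on the preference pairs; the dataset is assumed to expose
# the prompt/chosen/rejected columns that DPOTrainer expects
dataset = load_dataset("mlabonne/chatml_dpo_pairs")["train"]
trainer = DPOTrainer(
    model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)
trainer.train()
```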