BlouseJury committed
Commit 52d145d
1 Parent(s): 53b5dc8

Upload README.md with huggingface_hub

Files changed (1): README.md (+443 -0)

README.md ADDED

---
license: other
license_name: tongyi-qianwen
license_link: >-
  https://huggingface.co/Qwen/Qwen1.5-110B/blob/main/LICENSE
base_model: Qwen/Qwen2-72B
tags:
- generated_from_trainer
- axolotl
datasets:
- cognitivecomputations/Dolphin-2.9
- teknium/OpenHermes-2.5
- m-a-p/CodeFeedback-Filtered-Instruction
- cognitivecomputations/dolphin-coder
- cognitivecomputations/samantha-data
- microsoft/orca-math-word-problems-200k
- Locutusque/function-calling-chatml
- internlm/Agent-FLAN
---

# Dolphin 2.9.2 Qwen2 72B 🐬

Curated and trained by Eric Hartford, Lucas Atkins, Fernando Fernandes, and Cognitive Computations

[![Discord](https://img.shields.io/discord/1156064224225808488?logo=Discord&logoColor=%23ffffff&label=Discord&link=https%3A%2F%2Fdiscord.gg%2FtCMkMDDHwm)](https://discord.gg/cognitivecomputations)
Discord: https://discord.gg/cognitivecomputations

<img src="https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/ldkN1J0WIDQwU4vutGYiD.png" width="600" />

Our appreciation for the sponsors of Dolphin 2.9.2:
- [Crusoe Cloud](https://crusoe.ai/) - provided an excellent on-demand 8xH100 node

This model is based on Qwen2-72B and is governed by the [tongyi-qianwen license](LICENSE).

The base model has 128k context; this full-weight fine-tuning used an 8k sequence length.

This model was trained with full-weight fine-tuning (FFT) on parameters selected by [Laser Scanner](https://github.com/cognitivecomputations/laserRMT/blob/main/laser_scanner.py), using the ChatML prompt template format.

Example:

```
<|im_start|>system
You are Dolphin, a helpful AI assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

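If you prefer to build this prompt programmatically, here is a minimal sketch using transformers' chat templating; the repo id is an assumption for illustration, not taken from this card:

```python
# Minimal sketch: render the ChatML prompt above via the tokenizer's
# chat template. The repo id below is an assumed example.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "cognitivecomputations/dolphin-2.9.2-qwen2-72b"  # assumed repo id
)

messages = [
    {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
    {"role": "user", "content": "{prompt}"},
]

# add_generation_prompt=True appends the trailing <|im_start|>assistant turn
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(text)
```
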
Dolphin-2.9.2 has a variety of instruction-following, conversational, and coding skills. It also has initial agentic abilities and supports function calling.

Dolphin is uncensored. We have filtered the dataset to remove alignment and bias, which makes the model more compliant. You are advised to implement your own alignment layer before exposing the model as a service: it will be highly compliant with any request, even unethical ones. Please read my blog post about uncensored models: https://erichartford.com/uncensored-models. You are responsible for any content you create using this model. Enjoy responsibly.

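One hedged sketch of such an alignment layer is a moderation wrapper around generation; `is_disallowed` here is a placeholder for whatever policy classifier you choose, not something shipped with the model:

```python
# Sketch of a service-side alignment layer around an uncensored model.
# is_disallowed is a stand-in for your own moderation model or policy engine.
from typing import Callable

def is_disallowed(text: str) -> bool:
    # Placeholder: call your moderation classifier here.
    raise NotImplementedError

def guarded_generate(generate: Callable[[str], str], user_prompt: str) -> str:
    # Screen the request before it reaches the model...
    if is_disallowed(user_prompt):
        return "This request is not permitted by the service policy."
    reply = generate(user_prompt)
    # ...and screen the completion before it reaches the user.
    if is_disallowed(reply):
        return "The generated content was withheld by the service policy."
    return reply
```
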
Dolphin is licensed according to Qwen's tongyi-qianwen license. We grant permission for any use, including commercial, that complies with that license. Dolphin was trained on data generated by GPT-4, among other models.

## Evals

![image/png](https://i.ibb.co/B4x1Ddr/file-2ao0fl-K2-B2-Hmka-Epd0ja-QY0x.webp)

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: Qwen/Qwen2-72B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

trust_remote_code: true

# load_in_8bit: true
# load_in_4bit: false
# strict: false

datasets:
  - path: /workspace/datasets/dolphin-2.9.2/dolphin201-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.2/dolphin-coder-codegen-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.2/dolphin-coder-translate-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.2/m-a-p_Code-Feedback-sharegpt-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.2/m-a-p_CodeFeedback-Filtered-Instruction-sharegpt-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.2/not_samantha_norefusals.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.2/openhermes200k_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.2/Orca-Math-resort-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.2/SystemChat_sharegpt.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.2/toolbench_instruct_j1s1_3k_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.2/toolbench_negative_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.2/toolbench_react_10p_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.2/toolbench_tflan_cot_30p_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.2/agent_instruct_react_unfiltered.jsonl
    type: sharegpt
    conversation: chatml

unfrozen_parameters:
  - ^lm_head.weight$
  - ^model.embed_tokens.weight$
  # mlp.down_proj layers
  - model.layers.62.mlp.down_proj
  - model.layers.63.mlp.down_proj
  - model.layers.66.mlp.down_proj
  - model.layers.65.mlp.down_proj
  - model.layers.64.mlp.down_proj
  - model.layers.67.mlp.down_proj
  - model.layers.68.mlp.down_proj
  - model.layers.60.mlp.down_proj
  - model.layers.31.mlp.down_proj
  - model.layers.69.mlp.down_proj
  - model.layers.61.mlp.down_proj
  - model.layers.59.mlp.down_proj
  - model.layers.70.mlp.down_proj
  - model.layers.30.mlp.down_proj
  - model.layers.76.mlp.down_proj
  - model.layers.72.mlp.down_proj
  - model.layers.77.mlp.down_proj
  - model.layers.71.mlp.down_proj
  - model.layers.29.mlp.down_proj
  - model.layers.58.mlp.down_proj
  - model.layers.75.mlp.down_proj
  - model.layers.32.mlp.down_proj
  - model.layers.56.mlp.down_proj
  - model.layers.28.mlp.down_proj
  - model.layers.26.mlp.down_proj
  - model.layers.33.mlp.down_proj
  - model.layers.34.mlp.down_proj
  - model.layers.57.mlp.down_proj
  - model.layers.27.mlp.down_proj
  - model.layers.25.mlp.down_proj
  - model.layers.35.mlp.down_proj
  - model.layers.73.mlp.down_proj
  - model.layers.24.mlp.down_proj
  - model.layers.78.mlp.down_proj
  - model.layers.74.mlp.down_proj
  - model.layers.54.mlp.down_proj
  # mlp.gate_proj layers
  - model.layers.78.mlp.gate_proj
  - model.layers.77.mlp.gate_proj
  - model.layers.76.mlp.gate_proj
  - model.layers.79.mlp.gate_proj
  - model.layers.75.mlp.gate_proj
  - model.layers.74.mlp.gate_proj
  - model.layers.73.mlp.gate_proj
  - model.layers.70.mlp.gate_proj
  - model.layers.72.mlp.gate_proj
  - model.layers.71.mlp.gate_proj
  - model.layers.69.mlp.gate_proj
  - model.layers.54.mlp.gate_proj
  - model.layers.68.mlp.gate_proj
  - model.layers.57.mlp.gate_proj
  - model.layers.63.mlp.gate_proj
  - model.layers.49.mlp.gate_proj
  - model.layers.55.mlp.gate_proj
  - model.layers.53.mlp.gate_proj
  - model.layers.44.mlp.gate_proj
  - model.layers.46.mlp.gate_proj
  - model.layers.67.mlp.gate_proj
  - model.layers.58.mlp.gate_proj
  - model.layers.56.mlp.gate_proj
  - model.layers.45.mlp.gate_proj
  - model.layers.50.mlp.gate_proj
  - model.layers.62.mlp.gate_proj
  - model.layers.64.mlp.gate_proj
  - model.layers.48.mlp.gate_proj
  - model.layers.66.mlp.gate_proj
  - model.layers.52.mlp.gate_proj
  - model.layers.40.mlp.gate_proj
  - model.layers.47.mlp.gate_proj
  - model.layers.43.mlp.gate_proj
  - model.layers.65.mlp.gate_proj
  - model.layers.61.mlp.gate_proj
  - model.layers.59.mlp.gate_proj
  # mlp.up_proj layers
  - model.layers.69.mlp.up_proj
  - model.layers.70.mlp.up_proj
  - model.layers.71.mlp.up_proj
  - model.layers.68.mlp.up_proj
  - model.layers.67.mlp.up_proj
  - model.layers.66.mlp.up_proj
  - model.layers.46.mlp.up_proj
  - model.layers.63.mlp.up_proj
  - model.layers.72.mlp.up_proj
  - model.layers.64.mlp.up_proj
  - model.layers.62.mlp.up_proj
  - model.layers.45.mlp.up_proj
  - model.layers.65.mlp.up_proj
  - model.layers.73.mlp.up_proj
  - model.layers.47.mlp.up_proj
  - model.layers.44.mlp.up_proj
  - model.layers.49.mlp.up_proj
  - model.layers.48.mlp.up_proj
  - model.layers.53.mlp.up_proj
  - model.layers.74.mlp.up_proj
  - model.layers.75.mlp.up_proj
  - model.layers.57.mlp.up_proj
  - model.layers.76.mlp.up_proj
  - model.layers.43.mlp.up_proj
  - model.layers.42.mlp.up_proj
  - model.layers.61.mlp.up_proj
  - model.layers.40.mlp.up_proj
  - model.layers.56.mlp.up_proj
  - model.layers.60.mlp.up_proj
  - model.layers.31.mlp.up_proj
  - model.layers.54.mlp.up_proj
  - model.layers.55.mlp.up_proj
  - model.layers.32.mlp.up_proj
  - model.layers.41.mlp.up_proj
  - model.layers.33.mlp.up_proj
  - model.layers.58.mlp.up_proj
  # self_attn.k_proj layers
  - model.layers.79.self_attn.k_proj
  - model.layers.36.self_attn.k_proj
  - model.layers.35.self_attn.k_proj
  - model.layers.74.self_attn.k_proj
  - model.layers.34.self_attn.k_proj
  - model.layers.78.self_attn.k_proj
  - model.layers.77.self_attn.k_proj
  - model.layers.37.self_attn.k_proj
  - model.layers.39.self_attn.k_proj
  - model.layers.41.self_attn.k_proj
  - model.layers.38.self_attn.k_proj
  - model.layers.33.self_attn.k_proj
  - model.layers.69.self_attn.k_proj
  - model.layers.42.self_attn.k_proj
  - model.layers.32.self_attn.k_proj
  - model.layers.25.self_attn.k_proj
  - model.layers.70.self_attn.k_proj
  - model.layers.22.self_attn.k_proj
  - model.layers.63.self_attn.k_proj
  - model.layers.29.self_attn.k_proj
  - model.layers.68.self_attn.k_proj
  - model.layers.24.self_attn.k_proj
  - model.layers.30.self_attn.k_proj
  - model.layers.66.self_attn.k_proj
  - model.layers.31.self_attn.k_proj
  - model.layers.23.self_attn.k_proj
  - model.layers.65.self_attn.k_proj
  - model.layers.57.self_attn.k_proj
  - model.layers.28.self_attn.k_proj
  - model.layers.64.self_attn.k_proj
  - model.layers.44.self_attn.k_proj
  - model.layers.27.self_attn.k_proj
  - model.layers.75.self_attn.k_proj
  - model.layers.40.self_attn.k_proj
  - model.layers.26.self_attn.k_proj
  - model.layers.61.self_attn.k_proj
  # self_attn.o_proj layers
  - model.layers.14.self_attn.o_proj
  - model.layers.39.self_attn.o_proj
  - model.layers.19.self_attn.o_proj
  - model.layers.16.self_attn.o_proj
  - model.layers.17.self_attn.o_proj
  - model.layers.15.self_attn.o_proj
  - model.layers.69.self_attn.o_proj
  - model.layers.12.self_attn.o_proj
  - model.layers.42.self_attn.o_proj
  - model.layers.23.self_attn.o_proj
  - model.layers.22.self_attn.o_proj
  - model.layers.29.self_attn.o_proj
  - model.layers.13.self_attn.o_proj
  - model.layers.46.self_attn.o_proj
  - model.layers.52.self_attn.o_proj
  - model.layers.26.self_attn.o_proj
  - model.layers.38.self_attn.o_proj
  - model.layers.41.self_attn.o_proj
  - model.layers.18.self_attn.o_proj
  - model.layers.49.self_attn.o_proj
  - model.layers.11.self_attn.o_proj
  - model.layers.28.self_attn.o_proj
  - model.layers.25.self_attn.o_proj
  - model.layers.47.self_attn.o_proj
  - model.layers.53.self_attn.o_proj
  - model.layers.27.self_attn.o_proj
  - model.layers.37.self_attn.o_proj
  - model.layers.20.self_attn.o_proj
  - model.layers.43.self_attn.o_proj
  - model.layers.44.self_attn.o_proj
  - model.layers.45.self_attn.o_proj
  - model.layers.30.self_attn.o_proj
  - model.layers.24.self_attn.o_proj
  - model.layers.21.self_attn.o_proj
  - model.layers.10.self_attn.o_proj
  - model.layers.3.self_attn.o_proj
  # self_attn.q_proj layers
  - model.layers.1.self_attn.q_proj
  - model.layers.2.self_attn.q_proj
  - model.layers.3.self_attn.q_proj
  - model.layers.5.self_attn.q_proj
  - model.layers.4.self_attn.q_proj
  - model.layers.0.self_attn.q_proj
  - model.layers.6.self_attn.q_proj
  - model.layers.8.self_attn.q_proj
  - model.layers.7.self_attn.q_proj
  - model.layers.9.self_attn.q_proj
  - model.layers.10.self_attn.q_proj
  - model.layers.12.self_attn.q_proj
  - model.layers.19.self_attn.q_proj
  - model.layers.18.self_attn.q_proj
  - model.layers.25.self_attn.q_proj
  - model.layers.11.self_attn.q_proj
  - model.layers.15.self_attn.q_proj
  - model.layers.61.self_attn.q_proj
  - model.layers.17.self_attn.q_proj
  - model.layers.55.self_attn.q_proj
  - model.layers.54.self_attn.q_proj
  - model.layers.16.self_attn.q_proj
  - model.layers.68.self_attn.q_proj
  - model.layers.49.self_attn.q_proj
  - model.layers.48.self_attn.q_proj
  - model.layers.52.self_attn.q_proj
  - model.layers.13.self_attn.q_proj
  - model.layers.42.self_attn.q_proj
  - model.layers.57.self_attn.q_proj
  - model.layers.60.self_attn.q_proj
  - model.layers.53.self_attn.q_proj
  - model.layers.64.self_attn.q_proj
  - model.layers.66.self_attn.q_proj
  - model.layers.62.self_attn.q_proj
  - model.layers.59.self_attn.q_proj
  - model.layers.50.self_attn.q_proj
  # self_attn.v_proj layers
  - model.layers.15.self_attn.v_proj
  - model.layers.16.self_attn.v_proj
  - model.layers.23.self_attn.v_proj
  - model.layers.24.self_attn.v_proj
  - model.layers.25.self_attn.v_proj
  - model.layers.26.self_attn.v_proj
  - model.layers.27.self_attn.v_proj
  - model.layers.28.self_attn.v_proj
  - model.layers.29.self_attn.v_proj
  - model.layers.30.self_attn.v_proj
  - model.layers.31.self_attn.v_proj
  - model.layers.32.self_attn.v_proj
  - model.layers.33.self_attn.v_proj
  - model.layers.34.self_attn.v_proj
  - model.layers.35.self_attn.v_proj
  - model.layers.36.self_attn.v_proj
  - model.layers.37.self_attn.v_proj
  - model.layers.38.self_attn.v_proj
  - model.layers.39.self_attn.v_proj
  - model.layers.41.self_attn.v_proj
  - model.layers.42.self_attn.v_proj
  - model.layers.48.self_attn.v_proj
  - model.layers.53.self_attn.v_proj
  - model.layers.57.self_attn.v_proj
  - model.layers.58.self_attn.v_proj
  - model.layers.59.self_attn.v_proj
  - model.layers.61.self_attn.v_proj
  - model.layers.63.self_attn.v_proj
  - model.layers.64.self_attn.v_proj
  - model.layers.65.self_attn.v_proj
  - model.layers.66.self_attn.v_proj
  - model.layers.69.self_attn.v_proj
  - model.layers.74.self_attn.v_proj
  - model.layers.75.self_attn.v_proj
  - model.layers.76.self_attn.v_proj
  - model.layers.72.self_attn.v_proj

chat_template: chatml
dataset_prepared_path: qwen2-72b-data
val_set_size: 0.01
output_dir: qwen2-72b

sequence_len: 8192 # supports up to 8192
sample_packing: true
pad_to_sequence_len: true

# adapter: lora
# lora_model_dir:
# lora_r: 32
# lora_alpha: 16
# lora_dropout: 0.05
# lora_target_linear: true
# lora_fan_in_fan_out:

wandb_project: qwen2-72b
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 3
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 1e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 4
save_total_limit: 2
debug:
deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16_cpuoffload_params.json
weight_decay: 0.05
fsdp:
fsdp_config:
special_tokens:
  pad_token: "<|endoftext|>"
  eos_token: "<|im_end|>"
```

</details>
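
The `unfrozen_parameters` list in the config above is how the full-weight fine-tune was restricted to the Laser-Scanner-selected modules. A rough sketch of the equivalent PyTorch freezing logic (patterns abbreviated; axolotl's actual matching may differ):

```python
# Rough sketch: freeze everything except parameters whose names match
# the unfrozen_parameters patterns. List abbreviated for illustration.
import re

UNFROZEN_PATTERNS = [
    r"^lm_head.weight$",
    r"^model.embed_tokens.weight$",
    r"model.layers.62.mlp.down_proj",  # ...plus the rest of the list above
]

def apply_unfrozen_parameters(model, patterns=UNFROZEN_PATTERNS):
    compiled = [re.compile(p) for p in patterns]
    for name, param in model.named_parameters():
        # A parameter trains only if some pattern matches its name.
        param.requires_grad = any(rx.search(name) for rx in compiled)
```

For scale: with `micro_batch_size: 1` and `gradient_accumulation_steps: 8`, each GPU contributes 8 packed 8192-token sequences per optimizer step; on the sponsored 8xH100 node that works out to an effective batch of 64 sequences per step (assuming one process per GPU).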