Commit e43fda3 by andreaskoepf (1 parent: 35b23e8)

Update README.md

Files changed (1)
  1. README.md +74 -0
README.md CHANGED
@@ -1,3 +1,77 @@
 ---
 license: apache-2.0
 ---
+
+Supervised fine-tuning of falcon40b with a mix of OASST top-2 threads and synthetic instruction datasets. Exported at end of 2nd epoch.
+
+
+- base model: [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b)
+- wandb (internal): https://wandb.ai/open-assistant/supervised-finetuning/runs/feplc450
+- checkpoint: 1226 steps
+
+
+Model:
+```
+falcon-40b:
+  dtype: bf16
+  log_dir: "falcon_log_40b"
+  learning_rate: 1e-5
+  model_name: "tiiuae/falcon-40b"
+  deepspeed_config: configs/zero3_config_falcon.json
+  output_dir: falcon
+  weight_decay: 0.0
+  max_length: 2048
+  warmup_steps: 20
+  gradient_checkpointing: true
+  gradient_accumulation_steps: 1
+  per_device_train_batch_size: 18
+  per_device_eval_batch_size: 10
+  eval_steps: 120
+  #save_steps: 80
+  num_train_epochs: 8
+  save_total_limit: 4
+  use_flash_attention: false
+  residual_dropout: 0.3
+  residual_dropout_lima: true
+  sort_by_length: false
+  save_strategy: steps
+```
+
+
+Dataset:
+```
+sft9-stage2:
+  # oasst_export: 100.00% (29899)
+  # vicuna: 50.00% (16963)
+  # code_alpaca: 50.00% (9510)
+  # oa_wiki_qa_bart_10000row: 100.00% (9434)
+  # grade_school_math_instructions: 100.00% (8351)
+  # dolly15k: 100.00% (14250)
+
+  save_strategy: steps # epoch seems not to work, gets stuck with DS 0.9.1
+  save_steps: 613
+  use_custom_sampler: true
+  datasets:
+    - oasst_export:
+        lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk" # sft-8.0
+        input_file_path: 2023-06-02_oasst_all_labels.jsonl.gz
+        val_split: 0.05
+        top_k: 2
+    - vicuna:
+        fraction: 0.5
+        val_split: 0.025
+        max_val_set: 250
+    - code_alpaca:
+        fraction: 0.5
+        val_split: 0.05
+        max_val_set: 250
+    - oa_wiki_qa_bart_10000row:
+        val_split: 0.05
+        max_val_set: 250
+    - grade_school_math_instructions:
+        val_split: 0.05
+    - dolly15k:
+        val_split: 0.05
+        max_val_set: 300
+
+```
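
The README text says the model was exported at the end of the 2nd epoch, at checkpoint step 1226, and the dataset config saves every 613 steps. The sketch below only checks that these numbers agree with each other. Reading the commented sample counts as per-epoch training samples is an assumption, and so are the device count (`ASSUMED_WORLD_SIZE`, hypothetical, not stated in the README) and the dropping of a final partial batch.

```python
# Values copied from the README diff above.
SAVE_STEPS = 613
CHECKPOINT_STEP = 1226
PER_DEVICE_TRAIN_BATCH_SIZE = 18
GRADIENT_ACCUMULATION_STEPS = 1

# Sample counts from the commented lines under sft9-stage2.
# ASSUMPTION: these are the per-epoch training sample counts.
DATASET_SAMPLES = {
    "oasst_export": 29899,
    "vicuna": 16963,
    "code_alpaca": 9510,
    "oa_wiki_qa_bart_10000row": 9434,
    "grade_school_math_instructions": 8351,
    "dolly15k": 14250,
}

# ASSUMPTION: hypothetical number of devices; the README does not state it.
ASSUMED_WORLD_SIZE = 8


def saves_completed(step: int, save_steps: int) -> float:
    """Number of save intervals that fit into `step` optimizer steps."""
    return step / save_steps


def steps_per_epoch(total_samples: int, world_size: int) -> int:
    """Optimizer steps per epoch, dropping a final partial batch (assumption)."""
    global_batch = (
        PER_DEVICE_TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * world_size
    )
    return total_samples // global_batch


total_samples = sum(DATASET_SAMPLES.values())
intervals = saves_completed(CHECKPOINT_STEP, SAVE_STEPS)
estimated_steps = steps_per_epoch(total_samples, ASSUMED_WORLD_SIZE)

print(f"total samples: {total_samples}")
print(f"save intervals at checkpoint: {intervals}")
print(f"estimated steps per epoch (assumed world size): {estimated_steps}")
```

The checkpoint step is exactly two save intervals, which fits the "end of 2nd epoch" statement if one save interval corresponds to one epoch. Under the assumed device count the estimated steps per epoch also come out at 613, but that is a consistency check under stated assumptions, not confirmation of the actual hardware setup.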