tuanna08go committed on
Commit 2a31a9f · verified · 1 Parent(s): 11a5ca6

End of training

Files changed (2)
  1. README.md +11 -18
  2. adapter_model.bin +1 -1
README.md CHANGED
@@ -40,7 +40,7 @@ flash_attention: false
  fp16: null
  fsdp: null
  fsdp_config: null
- gradient_accumulation_steps: 16
+ gradient_accumulation_steps: 4
  gradient_checkpointing: false
  group_by_length: false
  hub_model_id: tuanna08go/3f2636b7-6e5d-449d-a20a-40714d8d2c2f
@@ -51,7 +51,7 @@ learning_rate: 0.0001
  load_in_4bit: false
  load_in_8bit: false
  local_rank: null
- logging_steps: 10
+ logging_steps: 5
  lora_alpha: 16
  lora_dropout: 0.05
  lora_fan_in_fan_out: null
@@ -59,8 +59,8 @@ lora_model_dir: null
  lora_r: 8
  lora_target_linear: true
  lr_scheduler: cosine
- max_steps: 50
- micro_batch_size: 8
+ max_steps: 1
+ micro_batch_size: 2
  mlflow_experiment_name: /tmp/5178402b4c4ddad7_train_data.json
  model_type: AutoModelForCausalLM
  num_epochs: 1
@@ -84,7 +84,7 @@ wandb_name: 3f2636b7-6e5d-449d-a20a-40714d8d2c2f
  wandb_project: Gradients-On-Demand
  wandb_run: your_name
  wandb_runid: 3f2636b7-6e5d-449d-a20a-40714d8d2c2f
- warmup_steps: 2
+ warmup_steps: 1
  weight_decay: 0.0
  xformers_attention: null
  
@@ -95,8 +95,6 @@ xformers_attention: null
  # 3f2636b7-6e5d-449d-a20a-40714d8d2c2f
  
  This model is a fine-tuned version of [unsloth/Qwen2.5-Math-7B-Instruct](https://huggingface.co/unsloth/Qwen2.5-Math-7B-Instruct) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: nan
  
  ## Model description
  
@@ -116,26 +114,21 @@ More information needed
  
  The following hyperparameters were used during training:
  - learning_rate: 0.0001
- - train_batch_size: 8
- - eval_batch_size: 8
+ - train_batch_size: 2
+ - eval_batch_size: 2
  - seed: 42
- - gradient_accumulation_steps: 16
- - total_train_batch_size: 128
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 8
  - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_steps: 2
- - training_steps: 50
+ - training_steps: 1
  
  ### Training results
  
  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:------:|:----:|:---------------:|
- | No log | 0.0010 | 1 | nan |
- | 0.0 | 0.0104 | 10 | nan |
- | 0.0 | 0.0209 | 20 | nan |
- | 0.0 | 0.0313 | 30 | nan |
- | 0.0 | 0.0417 | 40 | nan |
- | 0.0 | 0.0521 | 50 | nan |
+ | No log | 0.0001 | 1 | nan |
  
  
  ### Framework versions
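
Reviewer note on the README.md change: the `total_train_batch_size` values on both sides of the diff (128 before, 8 after) match `micro_batch_size` times `gradient_accumulation_steps`. The sketch below reproduces that arithmetic. It assumes a single training device, which the diff does not state; the function names and the `num_devices` parameter are illustrative and not part of the training config.

```python
def total_train_batch_size(micro_batch_size: int,
                           gradient_accumulation_steps: int,
                           num_devices: int = 1) -> int:
    """Examples consumed per optimizer step.

    num_devices=1 is an assumption; the commit does not say how many
    devices were used.
    """
    return micro_batch_size * gradient_accumulation_steps * num_devices


def examples_seen(max_steps: int, batch_size: int) -> int:
    """Total training examples processed after max_steps optimizer steps."""
    return max_steps * batch_size


# Values taken from the old (-) and new (+) sides of the diff.
old_batch = total_train_batch_size(micro_batch_size=8, gradient_accumulation_steps=16)
new_batch = total_train_batch_size(micro_batch_size=2, gradient_accumulation_steps=4)

old_examples = examples_seen(max_steps=50, batch_size=old_batch)
new_examples = examples_seen(max_steps=1, batch_size=new_batch)

print(old_batch, new_batch)        # batch size per optimizer step, before and after
print(old_examples, new_examples)  # examples processed over the whole run
```

Under that assumption, the new configuration runs a single optimizer step over 8 examples, compared with 50 steps of 128 examples before the change.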
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:093776466e64765833df9d4b0e62804cc050ded4d4896ef74ced8194cdedfc08
+ oid sha256:620a154f71bcb39e72998dc2d4e57018c07678735cabee12cbe98a7362ff2c4e
  size 80881450
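
Reviewer note on the adapter_model.bin change: the file is tracked with Git LFS, so the diff shows only the pointer file, not the weights. A minimal sketch of comparing the two pointers follows; `parse_lfs_pointer` is a hypothetical helper written for this note, and the pointer texts are copied from the diff above.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its key/value fields.

    Each pointer line has the form "<key> <value>".
    """
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    return fields


OLD_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:093776466e64765833df9d4b0e62804cc050ded4d4896ef74ced8194cdedfc08
size 80881450
"""

NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:620a154f71bcb39e72998dc2d4e57018c07678735cabee12cbe98a7362ff2c4e
size 80881450
"""

old = parse_lfs_pointer(OLD_POINTER)
new = parse_lfs_pointer(NEW_POINTER)

oid_changed = old["oid"] != new["oid"]
size_unchanged = old["size"] == new["size"]
print(f"oid changed: {oid_changed}, size unchanged: {size_unchanged}")
```

A different `oid` with an identical `size` means the stored bytes changed while the file length stayed the same, which is what one would expect if the adapter was retrained with the same LoRA shape. The diff alone does not confirm that interpretation.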