init adapters

Files changed (12)

README.md +74 -0
adapter_config.json +23 -0
adapter_model.bin +3 -0
all_results.json +14 -0
eval_results.json +9 -0
info.txt +2 -0
special_tokens_map.json +17 -0
tokenizer.json +0 -0
tokenizer_config.json +7 -0
train_results.json +8 -0
trainer_state.json +0 -0
training_args.bin +3 -0

README.md ADDED

	@@ -0,0 +1,74 @@

+---
+license: apache-2.0
+base_model: tiiuae/falcon-7b
+tags:
+- generated_from_trainer
+datasets:
+- yhavinga/mc4_nl_cleaned
+model-index:
+- name: tiny-3e-4lr+1152tbs+1ep+0.1wd
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# tiny-3e-4lr+1152tbs+1ep+0.1wd
+This model is a fine-tuned version of [tiiuae/falcon-7b](https://huggingface.co/tiiuae/falcon-7b) on the yhavinga/mc4_nl_cleaned micro dataset.
+It achieves the following results on the evaluation set:
+- Loss: 2.0928
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0003
+- train_batch_size: 12
+- eval_batch_size: 24
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 16
+- gradient_accumulation_steps: 6
+- total_train_batch_size: 1152
+- total_eval_batch_size: 384
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.03
+- num_epochs: 1
+### Training results
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| 2.6094        | 0.1   | 170  | 2.5980          |
+| 2.4503        | 0.19  | 340  | 2.4405          |
+| 2.3243        | 0.29  | 510  | 2.3428          |
+| 2.2822        | 0.39  | 680  | 2.2752          |
+| 2.238         | 0.49  | 850  | 2.2248          |
+| 2.2015        | 0.58  | 1020 | 2.1865          |
+| 2.1678        | 0.68  | 1190 | 2.1560          |
+| 2.1301        | 0.78  | 1360 | 2.1312          |
+| 2.1161        | 0.88  | 1530 | 2.1112          |
+| 2.0997        | 0.97  | 1700 | 2.0928          |
+### Framework versions
+- Transformers 4.31.0.dev0
+- Pytorch 2.0.1+cu117
+- Datasets 2.13.1
+- Tokenizers 0.13.3
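The hyperparameters above name a cosine scheduler with a warmup ratio of 0.03 and a peak learning rate of 0.0003. The sketch below shows the shape such a schedule usually takes: linear warmup followed by a half-cosine decay to zero. It is an illustration under stated assumptions, not the Trainer's own code. `lr_at` is a hypothetical helper, and `TOTAL_STEPS = 1744` is an estimate (the last logged row is step 1700 at epoch 0.97), not a value reported by the run.

```python
import math

PEAK_LR = 3e-4        # learning_rate from the model card
WARMUP_RATIO = 0.03   # lr_scheduler_warmup_ratio from the model card
TOTAL_STEPS = 1744    # assumption: estimated from the results table, not reported


def lr_at(step: int, total_steps: int = TOTAL_STEPS,
          peak_lr: float = PEAK_LR, warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the rate at the evaluation steps listed in the results table.
    for step in range(170, 1701, 170):
        print(f"step {step:4d}: lr = {lr_at(step):.2e}")
```

Under these assumptions the rate peaks after roughly 53 steps, well before the first evaluation at step 170, so every row of the results table falls in the decay phase.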

adapter_config.json ADDED

	@@ -0,0 +1,23 @@

+{
+  "auto_mapping": null,
+  "base_model_name_or_path": "tiiuae/falcon-7b",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "lora_alpha": 16,
+  "lora_dropout": 0.1,
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 64,
+  "revision": null,
+  "target_modules": [
+    "query_key_value",
+    "dense",
+    "dense_h_to_4h",
+    "dense_4h_to_h"
+  ],
+  "task_type": "CAUSAL_LM"
+}
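As a rough sanity check, the LoRA settings in this config can be turned into a trainable-parameter count and compared with the size of `adapter_model.bin` (261185933 bytes). The sketch below uses the usual LoRA accounting of `r * (in_features + out_features)` parameters per adapted linear layer. The layer shapes are assumptions about the `tiiuae/falcon-7b` architecture (hidden size 4544, 32 layers, multi-query attention with a head dimension of 64) and do not appear in this commit. Storing weights in 2 bytes each is also an assumption.

```python
# Assumed falcon-7b shapes (not stated anywhere in this commit).
HIDDEN = 4544
N_LAYERS = 32
HEAD_DIM = 64
QKV_OUT = HIDDEN + 2 * HEAD_DIM   # multi-query: all Q heads plus one K and one V head
FFN = 4 * HIDDEN

# Values taken from adapter_config.json.
R = 64
LORA_ALPHA = 16

# (in_features, out_features) for each entry in target_modules.
TARGETS = {
    "query_key_value": (HIDDEN, QKV_OUT),
    "dense": (HIDDEN, HIDDEN),
    "dense_h_to_4h": (HIDDEN, FFN),
    "dense_4h_to_h": (FFN, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters in one LoRA pair: A is (r x in_features), B is (out_features x r)."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in TARGETS.values())
total_params = per_layer * N_LAYERS
scaling = LORA_ALPHA / R          # factor applied to the low-rank update
approx_bytes = total_params * 2   # assumption: 16-bit weights

print(f"LoRA parameters: {total_params:,}")
print(f"scaling (alpha / r): {scaling}")
print(f"approx. size at 2 bytes/param: {approx_bytes:,} bytes")
```

Under these assumptions the estimate lands within about 0.1% of the reported file size, which is consistent with the adapter weights being saved in a 16-bit format plus a small serialization overhead.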

adapter_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ac246f63bdeeb3c9050e5a778aae35fbdba1bad7081e19a52946f2c3146c1453
+size 261185933

all_results.json ADDED

	@@ -0,0 +1,14 @@

+{
+    "epoch": 1.0,
+    "eval_loss": 2.0928452014923096,
+    "eval_runtime": 2111.0546,
+    "eval_samples": 105484,
+    "eval_samples_per_second": 49.967,
+    "eval_steps_per_second": 0.13,
+    "perplexity": 8.107951132441189,
+    "train_loss": 0.05167265042312105,
+    "train_runtime": 3158.899,
+    "train_samples": 2008858,
+    "train_samples_per_second": 635.936,
+    "train_steps_per_second": 0.552
+}
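Several fields in this file can be derived from the others, which makes a quick consistency check possible. The sketch below recomputes perplexity as the exponential of the evaluation loss and recomputes both throughput figures from the sample counts and runtimes. It also estimates the number of optimizer steps in two ways. The total batch size of 1152 comes from the model card, and the step estimates are approximations, not values reported by the run.

```python
import math

# Values copied from all_results.json.
results = {
    "eval_loss": 2.0928452014923096,
    "eval_runtime": 2111.0546,
    "eval_samples": 105484,
    "perplexity": 8.107951132441189,
    "train_runtime": 3158.899,
    "train_samples": 2008858,
    "train_steps_per_second": 0.552,
}
TOTAL_TRAIN_BATCH_SIZE = 1152  # total_train_batch_size from the model card

# Perplexity is the exponential of the mean cross-entropy loss.
perplexity = math.exp(results["eval_loss"])

# Throughput is samples divided by wall-clock seconds.
eval_throughput = results["eval_samples"] / results["eval_runtime"]
train_throughput = results["train_samples"] / results["train_runtime"]

# Two rough estimates of the optimizer step count for the single epoch.
steps_from_rate = results["train_steps_per_second"] * results["train_runtime"]
steps_from_batch = results["train_samples"] / TOTAL_TRAIN_BATCH_SIZE

print(f"perplexity:       {perplexity:.6f}")
print(f"eval samples/s:   {eval_throughput:.3f}")
print(f"train samples/s:  {train_throughput:.3f}")
print(f"steps (rate):     {steps_from_rate:.1f}")
print(f"steps (batch):    {steps_from_batch:.1f}")
```

Both step estimates come out near 1744, which agrees with the results table, where step 1700 corresponds to epoch 0.97.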

eval_results.json ADDED

	@@ -0,0 +1,9 @@

+{
+    "epoch": 1.0,
+    "eval_loss": 2.0928452014923096,
+    "eval_runtime": 2111.0546,
+    "eval_samples": 105484,
+    "eval_samples_per_second": 49.967,
+    "eval_steps_per_second": 0.13,
+    "perplexity": 8.107951132441189
+}

info.txt ADDED

	@@ -0,0 +1,2 @@


+CMD: -n 4 -g 4 -t 80 -o falcon-7b-ft-mc4_nl_cleaned_tiny/tiny-3e-4lr+1152tbs+1ep+0.1wd -p falcon-7b-ft-mc4_nl_cleaned -e --preprocessed_dataset /dodrio/scratch/projects/2023_005/llm-finetuning/preprocessed_datasets/mc4_nl_cleaned--tiny-falcon-40b-2048 --learning_rate 3e-4 --model_name_or_path tiiuae/falcon-7b --per_device_train_batch_size 12 --per_device_eval_batch_size 24 --gradient_accumulation_steps 6 --eval_accumulation_steps 6 --save_total_limit 3 --eval_steps 170 --save_steps 170 --logging_first_step --weight_decay 0.1 --lr_scheduler_type cosine --early_stopping_patience 5 --warmup_ratio 0.03 --deepspeed ds_config_zero2.json --report_to none
+
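The recorded command can be cross-checked against the batch sizes in the model card. The sketch below parses an excerpt of the command into a flag dictionary and recomputes the effective batch size. `parse_flags` is a hypothetical helper written for this illustration. Reading `-n` as the node count and `-g` as GPUs per node is an assumption about the launcher script, though it is consistent with `num_devices: 16` in the model card.

```python
import shlex

# Excerpt of the CMD line from info.txt (some flags omitted for brevity).
CMD = (
    "-n 4 -g 4 --learning_rate 3e-4 --per_device_train_batch_size 12 "
    "--per_device_eval_batch_size 24 --gradient_accumulation_steps 6 "
    "--logging_first_step --weight_decay 0.1 --warmup_ratio 0.03"
)


def parse_flags(cmd: str) -> dict:
    """Map each flag to its value, or to True when the flag takes no value.

    Simplification: a token starting with '-' is always treated as a flag,
    so negative numeric values would not be handled.
    """
    tokens = shlex.split(cmd)
    flags = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("-"):
            has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("-")
            flags[token.lstrip("-")] = tokens[i + 1] if has_value else True
            i += 2 if has_value else 1
        else:
            i += 1
    return flags


flags = parse_flags(CMD)

# Assumption: -n is the number of nodes and -g the number of GPUs per node.
num_devices = int(flags["n"]) * int(flags["g"])
total_train_batch_size = (
    int(flags["per_device_train_batch_size"])
    * int(flags["gradient_accumulation_steps"])
    * num_devices
)
total_eval_batch_size = int(flags["per_device_eval_batch_size"]) * num_devices

print(num_devices, total_train_batch_size, total_eval_batch_size)
```

With that reading of `-n` and `-g`, the results match the `total_train_batch_size` of 1152 and `total_eval_batch_size` of 384 listed in the model card, as well as the `1152tbs` fragment of the run name.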

special_tokens_map.json ADDED

	@@ -0,0 +1,17 @@

+{
+  "additional_special_tokens": [
+    ">>TITLE<<",
+    ">>ABSTRACT<<",
+    ">>INTRODUCTION<<",
+    ">>SUMMARY<<",
+    ">>COMMENT<<",
+    ">>ANSWER<<",
+    ">>QUESTION<<",
+    ">>DOMAIN<<",
+    ">>PREFIX<<",
+    ">>SUFFIX<<",
+    ">>MIDDLE<<"
+  ],
+  "eos_token": "<|endoftext|>",
+  "pad_token": "<|endoftext|>"
+}

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "add_prefix_space": false,
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "<|endoftext|>",
+  "model_max_length": 2048,
+  "tokenizer_class": "PreTrainedTokenizerFast"
+}

train_results.json ADDED

	@@ -0,0 +1,8 @@

+{
+    "epoch": 1.0,
+    "train_loss": 0.05167265042312105,
+    "train_runtime": 3158.899,
+    "train_samples": 2008858,
+    "train_samples_per_second": 635.936,
+    "train_steps_per_second": 0.552
+}

trainer_state.json ADDED

The diff for this file is too large to render. See raw diff

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:46136d04b167007cede48d42d509fef829949160a7f2582a890f527d43f974f2
+size 5627