---
license: apache-2.0
---

```yaml
### model
model_name_or_path: Kendamarron/Width-Up-Scaled-llm-jp-3-2.3b

### method
stage: pt
do_train: true
finetuning_type: full
enable_liger_kernel: true
flash_attn: fa2

### dataset
dataset: abeja_test
cutoff_len: 4096
packing: true
overwrite_cache: true
preprocessing_num_workers: 64

### output
output_dir: saves/llm-jp/full/cpt/
logging_steps: 1
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 16
gradient_accumulation_steps: 4
learning_rate: 1.0e-4
num_train_epochs: 1.0
lr_scheduler_type: constant_with_warmup
adam_beta1: 0.9
adam_beta2: 0.95
optim: adamw_bnb_8bit
warmup_steps: 500
bf16: true
ddp_timeout: 180000000

### eval
val_size: 1000
per_device_eval_batch_size: 2
eval_strategy: steps
eval_steps: 500

### logging
report_to: wandb
```
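
The config above follows the LLaMA-Factory YAML schema. As a minimal sketch, assuming the file is saved under a hypothetical name `cpt_full.yaml` and LLaMA-Factory is installed, the run can be launched with the `llamafactory-cli` entry point:

```bash
# Hypothetical filename: save the YAML config above as cpt_full.yaml.
# Single-process launch via the LLaMA-Factory CLI:
llamafactory-cli train cpt_full.yaml

# Multi-GPU launch: LLaMA-Factory dispatches through torchrun when
# FORCE_TORCHRUN is set (GPU count follows the visible devices).
FORCE_TORCHRUN=1 llamafactory-cli train cpt_full.yaml
```

Note that `optim: adamw_bnb_8bit` requires the `bitsandbytes` package, and `flash_attn: fa2` requires `flash-attn` to be installed for the target CUDA version.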