---
datasets:
  - lorinma/IE_Sharegpt_zh
language:
  - zh
pipeline_tag: text-generation
---

An LLM for Chinese Information Extraction.

Based on Baichuan-7B, trained with full-parameter SFT on 8× A800 GPUs. The goal is to reproduce zju cama using a strong base model.
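
For a quick try, here is a minimal inference sketch. The repo id and the plain prompt below are assumptions for illustration only (training used the codebase's `baichuan` template, so the exact prompt format may differ), and Baichuan models need `trust_remote_code=True`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: adjust to the actual repo id of this model.
model_id = "lorinma/ZjuCamaXBaichuan7B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
)

# Illustrative IE-style prompt; the training template may wrap prompts differently.
prompt = "从下面的句子中抽取所有(头实体, 关系, 尾实体)三元组:\n浙江大学位于杭州市。"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```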

The SFT data has been expanded:

*(figure: SFT data composition)*
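
The training set is published as lorinma/IE_Sharegpt_zh (listed in the metadata above). A minimal sketch for inspecting it with the `datasets` library, assuming the default config and ShareGPT-style `conversations` records:

```python
from datasets import load_dataset

# Assumption: the dataset loads with its default config and a "train" split.
ds = load_dataset("lorinma/IE_Sharegpt_zh", split="train")
print(ds)  # number of rows, column names

example = ds[0]
# ShareGPT convention: each turn carries a speaker tag ("human"/"gpt") and text.
for turn in example.get("conversations", []):
    print(turn)
```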

No eval has been run yet; contributions are welcome!
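
Since no eval has been run, one lightweight starting point would be micro-F1 over extracted (head, relation, tail) triples. This is a hypothetical sketch, not this card's protocol; the output format and its parsing are assumptions and task-specific parsing is not shown:

```python
# Hypothetical micro-F1 over sets of (head, relation, tail) triples.
# Assumes predictions and gold labels are already parsed into sets of tuples.
def triple_f1(pred_sets, gold_sets):
    tp = fp = fn = 0
    for pred, gold in zip(pred_sets, gold_sets):
        tp += len(pred & gold)   # correctly extracted triples
        fp += len(pred - gold)   # spurious triples
        fn += len(gold - pred)   # missed triples
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

p, r, f = triple_f1(
    [{("浙江大学", "位于", "杭州市")}],
    [{("浙江大学", "位于", "杭州市")}],
)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```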

The training codebase comes from shibing624.

The bash command used for training:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc_per_node 8 ../supervised_finetuning.py \
    --model_type baichuan \
    --model_name_or_path /data/llm/models/Pretrained/Baichuan-7B/ \
    --train_file_dir ../data/finetune/1124_IELLM/ \
    --per_device_train_batch_size 8 \
    --do_train \
    --use_peft False \
    --num_train_epochs 3 \
    --learning_rate 2e-5 \
    --warmup_ratio 0.03 \
    --weight_decay 0. \
    --fp16 \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy epoch \
    --save_total_limit 5 \
    --gradient_accumulation_steps 1 \
    --preprocessing_num_workers 8 \
    --output_dir ../results/20231124_IELLM \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --torch_dtype float16 \
    --device_map auto \
    --report_to tensorboard \
    --ddp_find_unused_parameters False \
    --gradient_checkpointing True \
    --cache_dir ./cache \
    --model_max_length 2048 \
    --deepspeed ../deepspeed_zero_stage2_config.json \
    --template_name baichuan \
    --flash_attn
```
```
***** train metrics *****
  epoch                    =                3.0
  train_loss               =             0.1012
  train_runtime            = 1 day, 14:16:59.20
  train_samples            =             376031
  train_samples_per_second =              8.185
  train_steps_per_second   =              0.128
```
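
For reference, `per_device_train_batch_size` 8 × 8 GPUs × `gradient_accumulation_steps` 1 gives a global batch of 64, so 376,031 samples × 3 epochs ÷ 64 ≈ 17,600 optimizer steps; this is consistent with the reported 0.128 steps/s over the roughly 38-hour runtime (0.128 × 137,819 s ≈ 17,640 steps).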


Test results:

*(figures: example test outputs)*