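Version information
Trained on new data that is (in theory) less noisy
LoRA uses a larger rank, r=32
DoRA has been dropped, because its gains were limited and it substantially slowed both training and inference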
Introduction
The LoRA adapter used by Riyuechang/Breeze-7B-PTT-Chat-v2, not merged into the base model MediaTek-Research/Breeze-7B-Instruct-v1_0
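Because the adapter is distributed unmerged, it has to be attached to the base model at load time. The following is a minimal sketch of one way to do that with `transformers` and `peft`. It assumes both libraries (plus `accelerate` for `device_map="auto"`) are installed and that the adapter is published under the repository id `Riyuechang/Breeze-7B-PTT-Chat-v2_lora`. The function name `load_model_with_adapter` and the `merge` flag are illustrative and are not taken from the author's code.

```python
BASE_MODEL_ID = "MediaTek-Research/Breeze-7B-Instruct-v1_0"
ADAPTER_ID = "Riyuechang/Breeze-7B-PTT-Chat-v2_lora"  # assumed adapter repository id


def load_model_with_adapter(merge: bool = False):
    """Load the base model and attach the LoRA adapter (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require the libraries.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL_ID,
        torch_dtype=torch.bfloat16,  # matches the bf16 training setting
        device_map="auto",           # requires the accelerate package
    )
    # Wrap the base model with the unmerged LoRA weights.
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    if merge:
        # Optionally fold the adapter into the base weights for plain inference.
        model = model.merge_and_unload()
    return tokenizer, model
```

Calling `load_model_with_adapter(merge=True)` would return a standalone model with the adapter weights folded in, at the cost of no longer being able to detach the adapter.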
Hardware
- Ubuntu 22.04.4 LTS
- NVIDIA GeForce RTX 3060 12G
LoRA parameters
r=32,
lora_alpha=32,
lora_dropout=0.1,
task_type="CAUSAL_LM",
target_modules="all-linear",
bias="none",
use_rslora=True
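To make the effect of `use_rslora=True` concrete, the sketch below compares the factor that scales the low-rank update under standard LoRA (`lora_alpha / r`) and under rank-stabilized LoRA (`lora_alpha / sqrt(r)`), using the values listed above. The helper name `lora_scaling` is illustrative and is not part of any library.

```python
import math


def lora_scaling(r: int, lora_alpha: float, use_rslora: bool) -> float:
    """Return the factor applied to the low-rank update B @ A."""
    if use_rslora:
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return lora_alpha / math.sqrt(r)
    return lora_alpha / r


standard = lora_scaling(r=32, lora_alpha=32, use_rslora=False)
rank_stabilized = lora_scaling(r=32, lora_alpha=32, use_rslora=True)
print(f"standard LoRA scaling: {standard:.3f}")
print(f"rsLoRA scaling:        {rank_stabilized:.3f}")
```

With `r` and `lora_alpha` both set to 32, the rank-stabilized factor is larger than the standard one, so the adapter update carries more weight than the same settings would give without `use_rslora`.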
Training parameters
per_device_train_batch_size=28,
gradient_accumulation_steps=1,
num_train_epochs=3,
warmup_ratio=0.1,
learning_rate=2e-5,
bf16=True,
save_strategy="steps",
save_steps=1000,
save_total_limit=5,
logging_steps=10,
output_dir=log_output,
optim="paged_adamw_8bit",
gradient_checkpointing=True
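As a rough illustration of what these settings imply, the sketch below estimates the effective batch size, the total number of optimizer steps, and the number of warmup steps. The dataset size is not stated in this card, so `num_examples` is a purely hypothetical placeholder, and a single GPU is assumed based on the hardware listed above. The helper `schedule_summary` is illustrative and only approximates how a trainer would count steps.

```python
import math


def schedule_summary(
    num_examples: int,
    per_device_batch: int = 28,
    grad_accum: int = 1,
    num_devices: int = 1,  # assumption: one RTX 3060
    epochs: int = 3,
    warmup_ratio: float = 0.1,
) -> dict:
    """Approximate the step counts implied by the training parameters."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# Hypothetical dataset size, chosen only to make the arithmetic easy to follow.
summary = schedule_summary(num_examples=28_000)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Since `save_steps=1000` and `save_total_limit=5`, a run of this shape would write a checkpoint every 1000 optimizer steps and keep at most the five most recent ones.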
Results
- loss: 0.9391