Based on Meta-Llama-3-8b-Instruct, and is governed by Meta Llama 3 License agreement: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
v0.2 version with better improved dolphin based dataset but only 150K for testing instead of the full 850K. Doesn't seem to work that well so I will need to add the rest of the dataset.
We are happy for anyone to try it out and give some feedback.
Training:
- 4096 sequence length, while the base model is 8192 sequence length. From testing it still performs the same 8192 context just fine.
- Trained on a modified and improved version of Cognitive Computations Eric Hartford's Dolphin dataset. https://huggingface.co/datasets/cognitivecomputations/dolphin
- Training duration is around 1 day on 2x RTX3090 on our own machine, using 4-bit loading and Qlora 64-rank 128-alpha resulting in ~2% trainable weights.
The goal for this model is to have the model less-censored and great at general tasks like the previous dolphin based models by Eric Hartford.
Instruct format:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>
{{ user_message_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
{{ model_answer_1 }}<|eot_id|><|start_header_id|>user<|end_header_id|>
{{ user_message_2 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Quants:
Axolotl Config:
base_model: /home/owen/models/Meta-Llama-3-8B-Instruct
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
train_on_inputs: false
group_by_length: false
load_in_8bit: false
load_in_4bit: true
strict: false
sequence_len: 4096
bf16: true
fp16: false
tf32: false
flash_attention: true
# Data
datasets:
- path: /home/owen/datasets/cleaned-dolphin201-sharegpt2-uuid-improved.jsonl
type:
field_instruction: input
field_output: output
format: "<|start_header_id|>user<|end_header_id|>\n\n{instruction}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
no_input_format: "<|start_header_id|>user<|end_header_id|>\n\n{instruction}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
warmup_steps: 10
dataset_prepared_path: ./last_run_prepared
# Iterations
num_epochs: 1
saves_per_epoch: 4
# Evaluation
val_set_size: 0.01
eval_table_size:
eval_table_max_new_tokens:
eval_sample_packing: false
evals_per_epoch: 4
# LoRA
output_dir: ./qlora-out
adapter: qlora
lora_model_dir:
lora_r: 64
lora_alpha: 128
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
save_safetensors: true
# Sampling
sample_packing: true
pad_to_sequence_len: true
# Batching
gradient_accumulation_steps: 32
micro_batch_size: 2
gradient_checkpointing: true
gradient_checkpointing_kwargs:
use_reentrant: true
# wandb
wandb_mode: # "offline" to save run metadata locally and not sync to the server, "disabled" to turn off wandb
wandb_project: llama-3-8b-instruct-dolphin-q
wandb_entity: # A wandb Team name if using a Team
wandb_watch:
wandb_name: 64-128-4096-1ep-v0.2
wandb_run_id: # Set the ID of your wandb run
wandb_log_model: # "checkpoint" to log model to wandb Artifacts every `save_steps` or "end" to log only at the end of training
# Optimizer
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0002
# Misc
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
debug:
deepspeed: /home/owen/axolotl/deepspeed_configs/zero3_bf16.json
weight_decay: 0.1
special_tokens:
pad_token: <|end_of_text|>
- Downloads last month
- 313
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.