Llama-3-8B-spectrum-25

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the yuvraj17/finetune_alpaca_1K dataset. It achieves the following results on the evaluation set:

Loss: 1.2791

Spectrum Fine-tuning:

I have used the Spectrum Fine-tuning method as described in Eric Hartford et. al 2024, which selectively targets some t% of the model layers with the highest Signal-to-Noise Ratio (SNR). By focusing on the most information-dense layers, this approach maximizes fine-tuning efficiency while minimizing compute resources.

The key goal of Spectrum Fine-tuning is: minimize the memory footprint and accelerate LLM training without sacrificing performance.

The 25% layer selection ensures minimal computational overhead for fine-tuning.

Training:

Trained on 2x A40s (48GB VRAM each) for over 1 hour using the Axolotl.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 2
gradient_accumulation_steps: 4
total_train_batch_size: 32
total_eval_batch_size: 8
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
num_epochs: 2

Framework versions

Axolotl 0.4.1
Transformers 4.44.2
Pytorch 2.4.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1

yuvraj17
/

Llama-3-8B-spectrum-25

Llama-3-8B-spectrum-25

Spectrum Fine-tuning:

Training:

Training hyperparameters

Framework versions

Model tree for yuvraj17/Llama-3-8B-spectrum-25

Evaluation results