Llama-3-8B-spectrum-25

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the yuvraj17/finetune_alpaca_1K dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2791

Spectrum Fine-tuning:

I used the Spectrum fine-tuning method described in Eric Hartford et al. (2024), which selectively targets the top t% of model layers ranked by Signal-to-Noise Ratio (SNR). By focusing on the most information-dense layers, this approach aims to maximize fine-tuning efficiency while minimizing compute resources.

The key goal of Spectrum fine-tuning is to minimize the memory footprint and accelerate LLM training without sacrificing performance.

For this model, t = 25: only the 25% of layers with the highest SNR are trained, which keeps the computational overhead of fine-tuning low.
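The selection step can be illustrated with a minimal sketch. This is not the Spectrum implementation: the function names, the interquartile-range noise estimate, and the Marchenko-Pastur-style cutoff below are assumptions chosen for illustration, and the layer names and scores in the example are hypothetical. The sketch takes the singular values of a weight matrix as input (these could be computed with any SVD routine), splits them into "signal" and "noise" around a threshold, and keeps the top fraction of layers by the resulting ratio.

```python
import math
import statistics
from typing import Dict, List, Sequence


def snr_from_singular_values(
    singular_values: Sequence[float], n_rows: int, n_cols: int
) -> float:
    """Ratio of 'signal' to 'noise' singular values for one weight matrix."""
    values = sorted(singular_values)
    # Assumed rule: estimate the noise scale from the interquartile range
    # (IQR / 1.349 equals the standard deviation for Gaussian data).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    sigma = (q3 - q1) / 1.349
    # Assumed rule: a Marchenko-Pastur-style cutoff based on the aspect ratio.
    beta = min(n_rows, n_cols) / max(n_rows, n_cols)
    threshold = sigma * (1.0 + math.sqrt(beta))
    signal = sum(v for v in values if v > threshold)
    noise = sum(v for v in values if v <= threshold)
    return signal / noise if noise > 0 else math.inf


def select_top_fraction(snr_by_layer: Dict[str, float], fraction: float) -> List[str]:
    """Return the names of the top `fraction` of layers, ranked by SNR."""
    k = max(1, int(len(snr_by_layer) * fraction))
    ranked = sorted(snr_by_layer, key=snr_by_layer.get, reverse=True)
    return ranked[:k]


# Hypothetical SNR scores for four layers; with fraction=0.25 one layer is kept.
example_scores = {
    "layers.0.q_proj": 0.5,
    "layers.1.q_proj": 3.0,
    "layers.2.q_proj": 1.0,
    "layers.3.q_proj": 2.0,
}
selected = select_top_fraction(example_scores, fraction=0.25)
```

In a real run, the selected layer names would be left trainable and every other parameter frozen. Any grouping of layers by module type before ranking is omitted here for brevity.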

Training:

  • Trained on 2x A40s (48GB VRAM each) for over 1 hour using Axolotl.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 2
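The totals in the list above follow from the per-device values, and the cosine schedule with warmup has a simple closed form. The sketch below reproduces the batch-size arithmetic and gives a hedged approximation of the learning-rate curve. The function `cosine_lr_with_warmup` is written for illustration and may differ in detail from the scheduler the trainer actually used. The total step count is not stated in this card, so `total_steps=500` is a purely hypothetical value.

```python
import math

# Values taken from the hyperparameter list.
train_batch_size = 4
eval_batch_size = 4
num_devices = 2
gradient_accumulation_steps = 4

# Effective batch sizes: per-device size x devices (x accumulation for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def cosine_lr_with_warmup(
    step: int, total_steps: int, base_lr: float = 2e-4, warmup_steps: int = 100
) -> float:
    """Linear warmup to base_lr, then cosine decay to zero (illustrative)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


# Hypothetical schedule length, used only to show the shape of the curve.
total_steps = 500
lr_samples = [cosine_lr_with_warmup(s, total_steps) for s in (0, 50, 100, 300, 500)]
```

The learning rate rises linearly to 0.0002 over the first 100 steps and then decays along a half cosine. If a run has fewer total steps than warmup steps, it never leaves the warmup phase under this formula.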

[Figure: train/loss curve]

[Figure: eval/loss curve]

Framework versions

  • Axolotl 0.4.1
  • Transformers 4.44.2
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1