TheBeagle-v2beta-32B-MGS

This model is an experimental version of our latest innovation: MGS. It's up to you to figure out what it means, but it's very explicit. We didn't apply our known UNA algorithm to the forward pass, but the two are entirely compatible: they operate in different parts of the neural network and in different ways, though both can be seen as regularization techniques.

MGS

MGS stands for... Many-Geeks-Searching... and that's it. Hint: 1+1 is 2, and 1+1 is not 3.

We still believe 1 epoch should be enough, so we trained for 1 epoch only.

Dataset

We used the first decent (corpora & size) dataset on the hub: Magpie-Align/Magpie-Pro-300K-Filtered. Kudos to the Magpie team for contributing some decent material that I personally think is very good to ablate.

It achieves the following results on the evaluation set:

  • Loss: 0.5378 (1 Epoch), outperforming the baseline model.

Quants

All versions available

... being uploaded ...

Licensing terms:

Quantized versions of this model must ONLY be distributed from the author's repository; submit a commit/PR and be credited for it.

Training

Built with Axolotl

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 25
  • num_epochs: 1
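
The two "total" batch sizes follow from the per-device values in the list. A minimal sketch of that arithmetic, assuming the usual convention that the training total multiplies per-device batch size, device count, and gradient accumulation steps, while evaluation uses no accumulation (the variable names mirror the list, the formula itself is our assumption):

```python
# Values copied from the hyperparameter list above.
train_batch_size = 2             # per-device micro-batch for training
eval_batch_size = 2              # per-device batch for evaluation
num_devices = 8
gradient_accumulation_steps = 4

# Assumed convention: samples consumed per optimizer step across all devices.
total_train_batch_size = (
    train_batch_size * num_devices * gradient_accumulation_steps
)

# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 64
print(total_eval_batch_size)   # 16
```

Both results match the reported total_train_batch_size and total_eval_batch_size.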

Training results

Training Loss   Epoch    Step   Validation Loss
9.8642          0.0012   1      0.7195
2.077           0.0507   42     0.6161
1.0325          0.1014   84     0.6093
0.8945          0.1520   126    0.5962
0.8532          0.2027   168    0.5869
0.8185          0.2534   210    0.5805
0.81            0.3041   252    0.5719
0.7901          0.3548   294    0.5663
0.7766          0.4054   336    0.5618
0.7687          0.4561   378    0.5590
0.7443          0.5068   420    0.5564
0.7494          0.5575   462    0.5525
0.7787          0.6081   504    0.5485
0.7381          0.6588   546    0.5466
0.7359          0.7095   588    0.5444
0.7447          0.7602   630    0.5435
0.7378          0.8109   672    0.5415
0.7302          0.8615   714    0.5398
0.7476          0.9122   756    0.5391
0.715           0.9629   798    0.5378
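
To put the table in context, the sketch below estimates the number of optimizer steps in the epoch from the last logged row and evaluates a cosine schedule with linear warmup at a few steps. It rests on assumptions not stated in this card: that steps scale linearly with the epoch fraction, and that the scheduler is a linear warmup followed by a half-cosine decay to zero. `lr_at` and `est_total_steps` are hypothetical names for illustration, not part of the training code.

```python
import math

# Last logged row of the table: step 798 at epoch 0.9629.
last_step, last_epoch = 798, 0.9629

# Assumption: steps scale linearly with epoch, so one epoch is about this many steps.
est_total_steps = round(last_step / last_epoch)

def lr_at(step, total_steps=est_total_steps, base_lr=8e-05, warmup_steps=25):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Relative drop in validation loss between the first and last evaluations.
first_val, last_val = 0.7195, 0.5378
rel_drop = (first_val - last_val) / first_val

print(est_total_steps)
for step in (1, 25, 420, 798):
    print(step, lr_at(step))
print(f"validation loss drop: {rel_drop:.1%}")
```

Under these assumptions the learning rate peaks at 8e-05 at step 25 and decays toward zero by the end of the epoch, while validation loss falls by roughly a quarter over the run.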

Leaderboard Evaluation:

We'll see them soon, stay tuned :)

Thanks

  • Qwen Team for their outstanding model
  • MagPie Team for contributing plenty of datasets
  • Cybertron Cloud Compute

Model tree for waldie/TheBeagle-v2beta-32B-MGS-4bpw-h6-exl2

  • Base model: Qwen/Qwen2.5-32B
  • Quantized: this model