# Boomer-4b: A Leap in Language Model Innovation
## Introduction
In the spirit of open innovation, we're thrilled to share our work on pretraining with a custom architecture and dataset. boomer-4b, our 3.51-billion-parameter model, represents a significant stride in the AI field. It was meticulously trained on custom synthetic data generated in a textbook style, and it exemplifies our commitment to advancing the boundaries of AI through both creative architecture and thoughtful data curation.
## Quick Start
Jump straight into using boomer-4b:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("budecosystem/boomer-4b")
model = AutoModelForCausalLM.from_pretrained(
    "budecosystem/boomer-4b", torch_dtype=torch.bfloat16
)

# Tokenize a prompt and generate a completion of up to 128 tokens.
inputs = tokenizer("Newton's second law", return_tensors="pt")
sample = model.generate(**inputs, max_length=128)
print(tokenizer.decode(sample[0]))
```
## Model Insights
Architecture Highlights (a configuration check is sketched after this list):
- Layers: 24
- Heads: 32
- Model Dimension: 2048
- Vocab Size: 32000
- Sequence Length: 2048
- Intermediate Size: 11008
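To verify these values locally, you can inspect the configuration published with the model. This is a minimal sketch: the attribute names assume the standard transformers naming convention, and since the card describes a custom architecture, the actual field names may differ.

```python
from transformers import AutoConfig

# Pull the published configuration from the Hugging Face Hub and print the
# fields corresponding to the numbers listed above. Attribute names assume
# the standard transformers convention; a custom architecture may differ.
config = AutoConfig.from_pretrained("budecosystem/boomer-4b")
print(config.num_hidden_layers)        # Layers: 24
print(config.num_attention_heads)      # Heads: 32
print(config.hidden_size)              # Model Dimension: 2048
print(config.vocab_size)               # Vocab Size: 32000
print(config.max_position_embeddings)  # Sequence Length: 2048
print(config.intermediate_size)        # Intermediate Size: 11008
```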
## Training Configuration
The training run used the following hyperparameters; a hedged transformers sketch of this configuration follows the list:
- Per Device Train Batch Size: 6
- Gradient Accumulation Steps: 1
- Learning Rate: 2e-5
- Optimizer: AdamW
- Beta Values: 0.9, 0.99
- Mixed Precision (FP16): True
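For reference, here is how these hyperparameters would look expressed as a transformers `TrainingArguments` object. This is a hypothetical sketch, not the actual training script (which is not published), and `output_dir` is a placeholder.

```python
from transformers import TrainingArguments

# Hypothetical sketch of the hyperparameters listed above; the real training
# script is not published, and "boomer-4b-pretrain" is a placeholder path.
training_args = TrainingArguments(
    output_dir="boomer-4b-pretrain",
    per_device_train_batch_size=6,
    gradient_accumulation_steps=1,
    learning_rate=2e-5,
    optim="adamw_torch",  # AdamW optimizer
    adam_beta1=0.9,
    adam_beta2=0.99,
    fp16=True,            # mixed-precision training
)
```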
## Evaluations and Comparisons
boomer-4b was evaluated across several benchmarks; a sketch for reproducing the scores follows the table:
| Model | MMLU | ARC | HellaSwag | GSM8K | Winogrande | MATH | MathQA | DROP | LogiQA |
|---|---|---|---|---|---|---|---|---|---|
| boomer-4b | 55.59 | 58.53 | 74.70 | 47.76 | 72.22 | 4.00 | 35.98 | 0.74 | 31.80 |
| GeneZC/MiniChat-3B | 39.17 | 44.03 | 67.19 | 10.54 | 65.27 | - | - | - | - |
| openlm-research/open_llama_3b_v2 | 27.12 | 44.03 | 71.60 | 0.91 | 67.01 | - | - | - | - |
| microsoft/phi-2 | 58.11 | 61.09 | 75.11 | 54.81 | 74.35 | - | - | - | - |
| TinyLlama/TinyLlama-1.1B-intermediate | 26.04 | 33.87 | 60.31 | 1.44 | 59.51 | - | - | - | - |
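The card does not state which evaluation harness produced these scores. As a hedged starting point for reproducing them, here is a sketch using EleutherAI's lm-evaluation-harness (`pip install lm-eval`); task variants, few-shot settings, and harness version all affect the numbers, so exact matches are not guaranteed.

```python
import lm_eval

# Sketch only: the harness, task variants, and few-shot settings behind the
# reported scores are not stated in the card, so results may not match exactly.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=budecosystem/boomer-4b,dtype=bfloat16",
    tasks=["mmlu", "arc_challenge", "hellaswag", "gsm8k", "winogrande"],
)
print(results["results"])
```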
## Why boomer-4b? ✨
boomer-4b's performance across this variety of benchmarks showcases its robustness and versatility: it outperforms several similarly sized open models on reasoning and understanding tasks, while trailing microsoft/phi-2 on some of them. It stands as a continuation of our pursuit of excellence in AI, building on the foundation laid by boomer-1b.
## Limitations of boomer-4b
Despite these results, boomer-4b struggles with intricate mathematical problem-solving and sophisticated logical reasoning, as reflected in its low MATH, DROP, and LogiQA scores. This variability suggests the model does not apply its knowledge uniformly across reasoning and synthesis tasks, and it points to clear targets for further refinement.
## Acknowledgments
A special thanks to the open-source community and the researchers who paved the way for innovations like boomer. Our team's dedication to curating the dataset and training the model has been instrumental in achieving this milestone.
Dive into the future of AI with boomer-4b and explore its capabilities in pushing the boundaries of what's possible in language understanding and beyond.