---
datasets:
- nilq/babylm-10M
language:
- en
---

This autoregressive model belongs to a series of rather small language models trained on the [BabyLM](https://babylm.github) data:

- the [baby_llama](https://huggingface.co/bbunzeck/baby_llama) model has few parameters and was trained on a small data set (10M tokens)
- the [**t**eenie_llama](https://huggingface.co/bbunzeck/teenie_llama) model has the same number of parameters but was trained on more **t**okens of text (100M)
- the [**w**eenie_llama](https://huggingface.co/bbunzeck/weenie_llama) model was trained on the small data set, but has more parameters/**w**eights
- the [**tw**eenie_llama](https://huggingface.co/bbunzeck/tweenie_llama) model features both -- more **t**okens (the larger data set) and more **w**eights (*viz.* parameters)

|                 | baby_llama | teenie_llama | weenie_llama | tweenie_llama |
|-----------------|------------|--------------|--------------|---------------|
| Parameters      | 2.97M      | 2.97M        | 11.44M       | 11.44M        |
| Hidden layers   | 8          | 8            | 16           | 16            |
| Attention heads | 8          | 8            | 16           | 16            |
| Embedding size  | 128        | 128          | 256          | 256           |
| Context size    | 128        | 128          | 256          | 256           |
| Vocab size      | 16k        | 16k          | 16k          | 16k           |
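
As a quick-start illustration, the sketch below shows how one of these checkpoints could be loaded and sampled with the Hugging Face `transformers` auto classes. This assumes the repository ships a standard causal-LM configuration and tokenizer (not stated explicitly above); the prompt and sampling settings are illustrative only.

```python
# Minimal usage sketch, assuming the checkpoint works with the standard
# transformers auto classes; swap the model id for teenie/weenie/tweenie_llama.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bbunzeck/baby_llama"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Keep prompt plus continuation within the model's context window
# (128 tokens for baby/teenie_llama, 256 for weenie/tweenie_llama).
inputs = tokenizer("The little dog", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```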