bbunzeck/tweenie_llama

---
datasets:
- nilq/babylm-100M
language:
- en
---
This autoregressive model belongs to a series of small language models trained on the [BabyLM](https://babylm.github) data:
- the [baby_llama](https://huggingface.co/bbunzeck/baby_llama) model has few parameters and was trained on the small data set (10M tokens)
- the [teenie_llama](https://huggingface.co/bbunzeck/teenie_llama) model has the same number of parameters but was trained on the larger data set (100M tokens)
- the [weenie_llama](https://huggingface.co/bbunzeck/weenie_llama) model was trained on the small data set but has more parameters
- the [tweenie_llama](https://huggingface.co/bbunzeck/tweenie_llama) model combines both: it was trained on the larger data set and has more parameters


|                 | baby_llama | teenie_llama | weenie_llama | tweenie_llama |
|-----------------|------------|--------------|--------------|---------------|
| Parameters      | 2.97M      | 2.97M        | 11.44M       | 11.44M        |
| Hidden layers   | 8          | 8            | 16           | 16            |
| Attention heads | 8          | 8            | 16           | 16            |
| Embedding size  | 128        | 128          | 256          | 256           |
| Context size    | 128        | 128          | 256          | 256           |
| Vocab size      | 16k        | 16k          | 16k          | 16k           |
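
The parameter counts in the table can be roughly reconstructed from the other rows. The sketch below is a back-of-the-envelope calculation, not the author's code. It rests on several assumptions that this model card does not state: a standard Llama-style decoder without bias terms, a vocabulary of exactly 16,000 entries, input and output embeddings that share weights, and a feed-forward (intermediate) size equal to the embedding size. The function name `llama_param_count` is hypothetical. Under those assumptions the totals round to the reported 2.97M and 11.44M.

```python
def llama_param_count(vocab_size, hidden_size, num_layers,
                      intermediate_size=None, tie_embeddings=True):
    """Estimate the parameter count of a Llama-style decoder.

    Assumptions (not confirmed by the model card): no bias terms,
    standard multi-head attention, RMSNorm with one weight vector each.
    """
    if intermediate_size is None:
        # Assumption: the feed-forward width equals the embedding size.
        intermediate_size = hidden_size

    embedding = vocab_size * hidden_size
    # Query, key, value and output projections, each hidden x hidden.
    # The number of heads only splits these matrices, so it does not
    # change the count.
    attention = 4 * hidden_size * hidden_size
    # Gate, up and down projections of the gated feed-forward block.
    mlp = 3 * hidden_size * intermediate_size
    # Two normalisation layers per decoder block.
    norms = 2 * hidden_size
    per_layer = attention + mlp + norms

    final_norm = hidden_size
    # A separate output projection only exists if embeddings are untied.
    lm_head = 0 if tie_embeddings else vocab_size * hidden_size
    return embedding + num_layers * per_layer + final_norm + lm_head


# Smaller configuration (baby_llama, teenie_llama).
small = llama_param_count(vocab_size=16_000, hidden_size=128, num_layers=8)
# Larger configuration (weenie_llama, tweenie_llama).
large = llama_param_count(vocab_size=16_000, hidden_size=256, num_layers=16)

print(f"{small / 1e6:.2f}M")  # 2.97M
print(f"{large / 1e6:.2f}M")  # 11.44M
```

Note that in the smaller configuration the embedding matrix alone (16,000 × 128 ≈ 2.05M) accounts for most of the parameters, so the vocabulary size dominates the budget at this scale.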