bbunzeck committed b18b303 (1 parent: 4d3c425)

Update README.md

Files changed (1): README.md (+17 -1)
---
datasets:
- nilq/babylm-100M
language:
- en
---
This autoregressive model belongs to a series of rather small language models trained on the [BabyLM](https://babylm.github.io) data:
- the [baby_llama](https://huggingface.co/bbunzeck/baby_llama) model has few parameters and was trained on a small dataset (10M tokens)
- the [**t**eenie_llama](https://huggingface.co/bbunzeck/teenie_llama) model has the same number of parameters but was trained on more **t**okens of text (100M)
- the [**w**eenie_llama](https://huggingface.co/bbunzeck/weenie_llama) model was trained on the small dataset, but has more parameters/**w**eights
- the [**tw**eenie_llama](https://huggingface.co/bbunzeck/tweenie_llama) model features both: more **t**okens (the larger dataset) and more **w**eights (*viz.* parameters)
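The four models form a 2×2 grid over data size and model size. As a quick illustration, the tokens-per-parameter ratio of each model can be computed from the rounded figures reported in this card:

```python
# Rounded training-set sizes (tokens) and parameter counts from this model card.
models = {
    "baby_llama":    {"tokens": 10_000_000,  "params": 2_970_000},   # small data, small model
    "teenie_llama":  {"tokens": 100_000_000, "params": 2_970_000},   # more tokens
    "weenie_llama":  {"tokens": 10_000_000,  "params": 11_440_000},  # more weights
    "tweenie_llama": {"tokens": 100_000_000, "params": 11_440_000},  # more of both
}

for name, cfg in models.items():
    print(f"{name}: {cfg['tokens'] / cfg['params']:.2f} tokens per parameter")
```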
| | baby_llama | teenie_llama | weenie_llama | tweenie_llama |
|-----------------|------------|--------------|--------------|---------------|
| Parameters | 2.97M | 2.97M | 11.44M | 11.44M |
| Hidden layers | 8 | 8 | 16 | 16 |
| Attention heads | 8 | 8 | 16 | 16 |
| Embedding size | 128 | 128 | 256 | 256 |
| Context size | 128 | 128 | 256 | 256 |
| Vocab size | 16k | 16k | 16k | 16k |
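A sketch of how these checkpoints would typically be loaded with the standard `transformers` auto classes (a hedged example, not an official usage snippet from the authors; any of the four repo ids linked above can be substituted):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "bbunzeck/tweenie_llama"  # or baby_llama / teenie_llama / weenie_llama
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

inputs = tokenizer("The baby llama", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```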