bbunzeck committed b18b303 (1 parent: 4d3c425)

Update README.md

Files changed (1): README.md (+17 -1)
---
datasets:
- nilq/babylm-100M
language:
- en
---
This autoregressive model belongs to a series of rather small language models trained on the [BabyLM](https://babylm.github.io) data:
- the [baby_llama](https://huggingface.co/bbunzeck/baby_llama) model has few parameters and was trained on a small dataset (10M tokens)
- the [**t**eenie_llama](https://huggingface.co/bbunzeck/teenie_llama) model has the same number of parameters but was trained on more **t**okens of text (100M)
- the [**w**eenie_llama](https://huggingface.co/bbunzeck/weenie_llama) model was trained on the small dataset, but has more parameters/**w**eights
- the [**tw**eenie_llama](https://huggingface.co/bbunzeck/tweenie_llama) model features both: more **t**okens (the larger dataset) and more **w**eights (*viz.* parameters)
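The four models form a 2×2 grid over data size and model size. As a quick illustration, the tokens-per-parameter ratio of each model can be computed from the rounded figures reported in this card:

```python
# Rounded training-set sizes (tokens) and parameter counts from this model card.
models = {
    "baby_llama":    {"tokens": 10_000_000,  "params": 2_970_000},   # small data, small model
    "teenie_llama":  {"tokens": 100_000_000, "params": 2_970_000},   # more tokens
    "weenie_llama":  {"tokens": 10_000_000,  "params": 11_440_000},  # more weights
    "tweenie_llama": {"tokens": 100_000_000, "params": 11_440_000},  # more of both
}

for name, cfg in models.items():
    print(f"{name}: {cfg['tokens'] / cfg['params']:.2f} tokens per parameter")
```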
| | baby_llama | teenie_llama | weenie_llama | tweenie_llama |
|-----------------|------------|--------------|--------------|---------------|
| Parameters | 2.97M | 2.97M | 11.44M | 11.44M |
| Hidden layers | 8 | 8 | 16 | 16 |
| Attention heads | 8 | 8 | 16 | 16 |
| Embedding size | 128 | 128 | 256 | 256 |
| Context size | 128 | 128 | 256 | 256 |
| Vocab size | 16k | 16k | 16k | 16k |
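A sketch of how these checkpoints would typically be loaded with the standard `transformers` auto classes (a hedged example, not an official usage snippet from the authors; any of the four repo ids linked above can be substituted):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "bbunzeck/tweenie_llama"  # or baby_llama / teenie_llama / weenie_llama
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

inputs = tokenizer("The baby llama", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```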