---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---

# Model Card for DeterministicShuffle(s=84) GPT-2

<!-- Provide a quick summary of what the model is/does. -->

This is one model in a collection of models trained on the impossible
languages of [Kallini et al. 2024](https://arxiv.org/abs/2401.06416).

This model is a GPT-2 Small model trained from scratch on the *DeterministicShuffle(s=84)*
language. We include a total of 30 checkpoints over the course of
model training, from step 100 to 3000 in increments of 100 steps.
The main branch contains the final checkpoint (3000), and the other
checkpoints are accessible as revisions.

![languages.png](https://cdn-uploads.huggingface.co/production/uploads/6268bc06adb1c6525b3d5157/pBt38YYQL1gj8DqjyorWS.png)

## Model Details

- **Developed by:** Julie Kallini, Isabel Papadimitriou, Richard Futrell, Kyle Mahowald, Christopher Potts
- **Model type:** Causal Language Model
- **Language(s) (NLP):** English
- **GitHub Repository:** https://github.com/jkallini/mission-impossible-language-models
- **Paper:** https://arxiv.org/pdf/2401.06416

## Uses

This artefact is solely intended for the study of language learning
and acquisition in computational models. It should not be
used in any production setting.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the final-checkpoint model and its tokenizer
model_id = "mission-impossible-lms/deterministic-shuffle-s84-gpt2"
model = GPT2LMHeadModel.from_pretrained(model_id)
tokenizer = GPT2Tokenizer.from_pretrained(model_id)

# Set up the prompt and encode it
prompt = "He clean"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate text (passing the attention mask and pad token avoids warnings)
output = model.generate(
    **inputs, max_length=20, pad_token_id=tokenizer.eos_token_id
)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
```

By default, the `main` branch of this model repo loads the
last model checkpoint (3000). To access the other checkpoints,
use the `revision` argument:

```python
model = GPT2LMHeadModel.from_pretrained(model_id, revision="checkpoint-500")
```

This loads the model at checkpoint 500.
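
Since the revisions follow the pattern `checkpoint-100` through
`checkpoint-3000`, you can also sweep over all 30 checkpoints, for
example to trace how the model's loss on a probe sentence evolves
over training. A minimal sketch (the probe sentence is arbitrary):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

model_id = "mission-impossible-lms/deterministic-shuffle-s84-gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_id)

# Arbitrary probe sentence; any text works here
inputs = tokenizer("The children cleaned the house.", return_tensors="pt")

# Trace the language-modeling loss across all 30 checkpoints
for step in range(100, 3001, 100):
    model = GPT2LMHeadModel.from_pretrained(
        model_id, revision=f"checkpoint-{step}"
    )
    with torch.no_grad():
        loss = model(**inputs, labels=inputs.input_ids).loss
    print(f"checkpoint-{step}: loss = {loss.item():.3f}")
```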

## Training Details

### Training Data

This model was trained on the [100M-word BabyLM dataset](https://babylm.github.io/).
Before training, we transform the dataset into the corresponding
impossible language, as described in our paper.

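As a rough illustration of the idea: in a deterministic shuffle, the
seed `s` (here 84) fixes the permutation applied to each sentence's
tokens, so the transformation is fully reproducible. The sketch below
is a hypothetical stand-in, not the paper's exact pipeline (see the
GitHub repository for that); tying the permutation to sentence length
is just one possible design.

```python
import random

def deterministic_shuffle(tokens, seed=84):
    # Hypothetical sketch: derive the permutation from the seed and
    # the sentence length, so equal-length sentences are permuted
    # identically and the mapping is reproducible.
    rng = random.Random(f"{seed}:{len(tokens)}")
    indices = list(range(len(tokens)))
    rng.shuffle(indices)
    return [tokens[i] for i in indices]

print(deterministic_shuffle(["He", "cleans", "the", "house", "."]))
```
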
### Training Procedure

This model was trained for 3,000 gradient steps with
a batch size of 2^19 (524,288) tokens. The learning rate
warms up linearly from 0 to 6e-4 over the first 300 steps.

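For reference, a linear warmup like this can be expressed with
PyTorch's `LambdaLR`. The optimizer choice and the post-warmup
behavior below are assumptions, since the card only specifies the
peak rate and the warmup length:

```python
import torch

model = torch.nn.Linear(8, 8)  # stand-in module; AdamW is an assumption
optimizer = torch.optim.AdamW(model.parameters(), lr=6e-4)

warmup_steps = 300

def lr_lambda(step):
    # Multiplier on the peak LR (6e-4): ramps from 0 to 1 over the
    # first 300 steps, then holds constant (post-warmup schedule assumed)
    return min(1.0, step / warmup_steps)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```
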
## Environmental Impact

- **Hardware Type:** NVIDIA RTX 3090 (24GB) and NVIDIA RTX A6000 (48GB) GPUs
- **Hours used:** ~24

## Citation

```bibtex
@inproceedings{kallini-etal-2024-mission,
    title = "Mission: Impossible Language Models",
    author = "Kallini, Julie and
      Papadimitriou, Isabel and
      Futrell, Richard and
      Mahowald, Kyle and
      Potts, Christopher",
    editor = "Ku, Lun-Wei and
      Martins, Andre and
      Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.787",
    doi = "10.18653/v1/2024.acl-long.787",
    pages = "14691--14714",
}
```

## Model Card Authors

Julie Kallini

## Model Card Contact

kallini@stanford.edu