---
title: README
emoji: 🥷
colorFrom: blue
colorTo: pink
sdk: static
pinned: false
---
<div align="center">

# 🥷 Mission: Impossible Language Models 🥷

<img src="https://cdn-uploads.huggingface.co/production/uploads/6268bc06adb1c6525b3d5157/GfEHK3X6bD_5u4etaJY8c.png" alt="drawing" width="400"/>

</div>

This page hosts the models trained and used in the paper "[Mission: Impossible Language Models](https://arxiv.org/abs/2401.06416)" (Kallini et al., 2024).
If you use our code or models, please cite our ACL paper:

```bibtex
@inproceedings{kallini-etal-2024-mission,
    title = "Mission: Impossible Language Models",
    author = "Kallini, Julie  and
      Papadimitriou, Isabel  and
      Futrell, Richard  and
      Mahowald, Kyle  and
      Potts, Christopher",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.787",
    doi = "10.18653/v1/2024.acl-long.787",
    pages = "14691--14714",
}
```
## Impossible Languages

Our paper includes 15 impossible languages, grouped into three language classes:
1. **\*Shuffle languages** involve different shuffles of tokenized English sentences.
2. **\*Reverse languages** involve reversals of all or part of input sentences.
3. **\*Hop languages** perturb verb inflection with counting rules.

A simplified sketch of each perturbation class appears after the figure below.
![languages.png](https://cdn-uploads.huggingface.co/production/uploads/6268bc06adb1c6525b3d5157/pBt38YYQL1gj8DqjyorWS.png)
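To make the three classes concrete, here is an illustrative Python sketch of one possible perturbation per class. These are simplified stand-ins, *not* the exact definitions from the paper (which specifies, for example, deterministic shuffle variants and particular hop rules); the marker token and hop distance below are assumptions.

```python
import random


def shuffle_language(tokens: list[str], seed: int = 0) -> list[str]:
    # *Shuffle: permute the tokens of a sentence. (Simplified; the paper
    # defines several shuffle variants, including deterministic ones.)
    rng = random.Random(seed)
    shuffled = tokens[:]
    rng.shuffle(shuffled)
    return shuffled


def reverse_language(tokens: list[str]) -> list[str]:
    # *Reverse: reverse the sentence. (Some variants in the paper reverse
    # only part of the input.)
    return tokens[::-1]


def hop_language(tokens: list[str], verb_index: int,
                 marker: str = "S", hop: int = 4) -> list[str]:
    # *Hop: place a verb-inflection marker a fixed number of tokens after
    # the verb instead of on it. (The marker token and hop distance here
    # are illustrative assumptions.)
    out = tokens[:]
    out.insert(min(verb_index + 1 + hop, len(out)), marker)
    return out


sentence = ["the", "cat", "chase", "the", "small", "brown", "mouse"]
print(reverse_language(sentence))            # ['mouse', 'brown', 'small', ...]
print(hop_language(sentence, verb_index=2))  # marker inserted 4 tokens later
```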
## Models

For each language, we provide two models:

1. A [**standard GPT-2 Small model**](https://huggingface.co/collections/mission-impossible-lms/gpt-2-models-67270160d99170620f5a27f6).
2. A [**GPT-2 Small model trained without positional encodings**](https://huggingface.co/collections/mission-impossible-lms/gpt-2-models-no-positional-encodings-6727286b3d1650b1b374fdeb).
Each model is trained *from scratch* exclusively on data from one impossible language. This makes a total of 30 models: 15 standard GPT-2 models and 15 GPT-2 models without positional encodings. We separate these models into two collections below for ease of navigation.
Model names follow the pattern:

`mission-impossible-lms/{language_name}-{model_architecture}`

where `language_name` is the name of an impossible language from the table above, converted from PascalCase to kebab-case (e.g., NoShuffle → `no-shuffle`), and `model_architecture` is either `gpt2` (for the standard GPT-2 architecture) or `gpt2-no-pos` (for the GPT-2 architecture without positional encodings).
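As a quick start, here is a minimal sketch of loading one of these models with the Hugging Face `transformers` library (assuming the repos expose standard `transformers`-compatible weights; the repo ID below is constructed from the naming pattern above):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo ID built from the pattern: {language_name}-{model_architecture}.
repo_id = "mission-impossible-lms/no-shuffle-gpt2"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Score a sentence under the model; lower loss means higher probability.
inputs = tokenizer("The quick brown fox jumps over the lazy dog.",
                   return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss.item())
```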
### Model Checkpoints

On the main revision of each model, we provide the final model artefact we trained (checkpoint 3000). We also provide 29 intermediate checkpoints over the course of training, from checkpoint 100 to 3000 in increments of 100 steps. These checkpoints are provided as separate revisions in each model repo and can help you replicate the experiments in the paper.
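To load an intermediate checkpoint, pass the corresponding revision to `from_pretrained`. A minimal sketch follows; note that the revision name `checkpoint-500` is an assumption about the branch naming, so check a model repo's list of revisions for the exact names:

```python
from transformers import AutoModelForCausalLM

# Intermediate checkpoints live on separate revisions of each model repo.
# NOTE: "checkpoint-500" is an assumed revision name; verify it against
# the revisions listed in the repo before use.
model = AutoModelForCausalLM.from_pretrained(
    "mission-impossible-lms/no-shuffle-gpt2",
    revision="checkpoint-500",
)
```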