---
title: README
emoji: 🔥
colorFrom: blue
colorTo: pink
sdk: static
pinned: false
---
<div align="center">
# 💥 Mission: Impossible Language Models 💥
<img src="https://cdn-uploads.huggingface.co/production/uploads/6268bc06adb1c6525b3d5157/GfEHK3X6bD_5u4etaJY8c.png" alt="drawing" width="400"/>
</div>
This page hosts the models trained and used in the paper "[Mission: Impossible Language Models](https://arxiv.org/abs/2401.06416)" (Kallini et al., 2024).
If you use our code or models, please cite our ACL paper:
```bibtex
@inproceedings{kallini-etal-2024-mission,
    title = "Mission: Impossible Language Models",
    author = "Kallini, Julie  and
      Papadimitriou, Isabel  and
      Futrell, Richard  and
      Mahowald, Kyle  and
      Potts, Christopher",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.787",
    doi = "10.18653/v1/2024.acl-long.787",
    pages = "14691--14714",
}
```
## Impossible Languages
Our paper includes 15 impossible languages, grouped into three language classes:
1. **\*Shuffle languages** involve different shuffles of tokenized
English sentences.
2. **\*Reverse languages** involve reversals of all or part of input
sentences.
3. **\*Hop languages** perturb verb inflection with counting rules.
![languages.png](https://cdn-uploads.huggingface.co/production/uploads/6268bc06adb1c6525b3d5157/pBt38YYQL1gj8DqjyorWS.png)
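To make the three classes concrete, here is a rough, purely illustrative sketch of the kinds of token-level perturbations involved; the actual languages in the paper are defined precisely (see the table above), so treat this only as an informal picture.

```python
import random

# A toy tokenized English sentence (this tokenization is only illustrative).
tokens = ["the", "dog", "chase", "s", "the", "cat", "."]

# *Shuffle: permute the tokens of the sentence (the paper defines several shuffle variants).
shuffled = random.sample(tokens, k=len(tokens))

# *Reverse: reverse all (or a part) of the token sequence.
reversed_tokens = tokens[::-1]

# *Hop: displace a verb-inflection marker by a fixed token count
# (an assumed simplification of the paper's counting rules).
hopped = ["the", "dog", "chase", "the", "cat", "s", "."]
```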
## Models
For each language, we provide two models:
1. A [**standard GPT-2 Small model**](https://huggingface.co/collections/mission-impossible-lms/gpt-2-models-67270160d99170620f5a27f6).
2. A [**GPT-2 Small model trained without positional encodings**](https://huggingface.co/collections/mission-impossible-lms/gpt-2-models-no-positional-encodings-6727286b3d1650b1b374fdeb).
Each model is trained *from scratch* exclusively on data from
one impossible language. This makes a total of 30 models:
15 standard GPT-2 models and 15 GPT-2 models without
positional encodings. We separate these models into the two
collections below for easier navigation.
Model names follow the pattern:
`mission-impossible-lms/{language_name}-{model_architecture}`
where `language_name` is the name of an impossible language from the table above,
converted from PascalCase to kebab-case (e.g., NoShuffle -> `no-shuffle`), and
`model_architecture` is one of `gpt2` (for the standard GPT-2 architecture)
or `gpt2-no-pos` (for the GPT-2 architecture without positional encodings).
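For example, a model can be loaded with the 🤗 Transformers library. The snippet below is a minimal sketch using the `no-shuffle-gpt2` repo name built from the pattern above; it assumes each model repo ships compatible tokenizer files alongside the weights.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example repo name built from the pattern above: NoShuffle + standard GPT-2.
model_name = "mission-impossible-lms/no-shuffle-gpt2"

# Assumes each model repo includes tokenizer files alongside the weights.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Score a short string with the model (illustrative only).
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss.item())
```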
### Model Checkpoints
On the main revision of each model, we provide the final
model artefact from training (checkpoint 3000). We also provide
29 intermediate checkpoints over the course of training,
from checkpoint 100 to 3000 in increments of 100 steps.
These checkpoints are provided as separate revisions in each
model repo and can help you replicate the experiments
we report in the paper.
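As a rough sketch, an intermediate checkpoint can be loaded by passing a `revision` to `from_pretrained`. The revision label used below (`checkpoint-1000`) is an assumption about the naming scheme, not something specified here; check the revisions listed in a model repo for the exact names.

```python
from transformers import AutoModelForCausalLM

# "checkpoint-1000" is a hypothetical revision label for training step 1000;
# check a model repo's list of branches/revisions for the actual names.
model = AutoModelForCausalLM.from_pretrained(
    "mission-impossible-lms/no-shuffle-gpt2",
    revision="checkpoint-1000",
)
```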