From zero to GPT-hero
A reading list to fully understand GPT (and GPT-2) and to be able to implement them from scratch.
Neural Machine Translation of Rare Words with Subword Units
Paper • 1508.07909 • Published
Note: Useful for more insight into the tokenizer trained and used for GPT-2, which is a modified BPE as defined in this paper. It's also implemented in `tiktoken` at https://github.com/openai/tiktoken
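As a quick illustration of GPT-2's byte-level BPE in practice, here's a minimal sketch that round-trips some arbitrary text through `tiktoken`'s bundled `gpt2` encoding:

```python
# Minimal sketch: round-tripping text through GPT-2's byte-level BPE via tiktoken.
import tiktoken

enc = tiktoken.get_encoding("gpt2")  # the same merges/vocab used by GPT-2

text = "Byte Pair Encoding splits rare words into subword units."
ids = enc.encode(text)                 # list of integer token ids
print(ids)
print([enc.decode([i]) for i in ids])  # inspect how words break into subwords
assert enc.decode(ids) == text         # encoding/decoding is lossless
```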
Attention Is All You Need
Paper • 1706.03762 • Published
Note: Attention itself was introduced at https://arxiv.org/abs/1409.0473; this paper builds on it and proposes an encoder-decoder architecture, the Transformer. The whole architecture is interesting, but we'll transition to decoder-only architectures for GPT.
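To ground the core idea before moving on, here's a minimal NumPy sketch of the scaled dot-product attention described in the paper (single head, no projections or masking; the toy shapes are chosen only for illustration):

```python
# Minimal sketch of scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # (seq_q, seq_k) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # weighted sum of value vectors

# Toy example: 4 query/key/value vectors of dimension 8.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)     # (4, 8)
```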
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published
Note: Interesting even though not directly comparable to GPT-2, as it's built from encoder blocks and is not auto-regressive in nature; instead it uses context on both sides of a word to achieve better results. Still worth reading before GPT-2 to understand the differences and why it's been such a relevant architecture.
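A quick way to see the bidirectional, non-autoregressive behaviour in action is the `transformers` fill-mask pipeline; this is just a sketch, and the `bert-base-uncased` checkpoint and example sentence are arbitrary choices for illustration:

```python
# Minimal sketch: BERT predicts a masked word using context on both sides of it.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Both the left context ("The capital of France") and the right context ("is beautiful")
# inform the prediction for [MASK] -- something a left-to-right model can't do directly.
for pred in fill_mask("The capital of France, [MASK], is beautiful."):
    print(f"{pred['token_str']!r}  score={pred['score']:.3f}")
```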
Generating Wikipedia by Summarizing Long Sequences
Paper • 1801.10198 • Published
Note: Introduces the concept of decoder-only architectures, which was later adopted by GPT.
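The practical difference from the encoder blocks above is the causal mask: each position may only attend to itself and earlier positions. A minimal self-contained sketch of building and applying such a mask (shapes are arbitrary):

```python
# Minimal sketch: causal (decoder-only) self-attention mask.
import numpy as np

seq_len, d = 5, 8
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(seq_len, d))            # self-attention: same sequence for Q, K, V

scores = Q @ K.T / np.sqrt(d)
mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)  # True above the diagonal (future positions)
scores = np.where(mask, -np.inf, scores)             # forbid attending to future tokens

weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
print(np.round(weights, 2))                          # lower-triangular: token i only sees tokens <= i
```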
openai-community/gpt2
Text Generation • Updated
Note:
* GPT: "Improving Language Understanding by Generative Pre-Training" at https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf
* GPT-2: "Language Models are Unsupervised Multitask Learners" at https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
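To close the loop, here's a minimal sketch of sampling from the pretrained checkpoint with the `transformers` text-generation pipeline (the prompt and sampling settings are arbitrary):

```python
# Minimal sketch: sampling text from the pretrained GPT-2 checkpoint.
from transformers import pipeline

generator = pipeline("text-generation", model="openai-community/gpt2")

out = generator(
    "The Transformer architecture is",
    max_new_tokens=40,   # number of tokens to sample after the prompt
    do_sample=True,      # sample instead of greedy decoding
    top_k=50,
)
print(out[0]["generated_text"])
```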
openai-community/gpt2-medium
Text Generation • Updated
openai-community/gpt2-large
Text Generation • Updated
openai-community/gpt2-xl
Text Generation • Updated
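The four checkpoints above share the same architecture and differ only in depth and width (roughly 124M, 355M, 774M, and 1.5B parameters, respectively). A minimal sketch to compare them by reading only their configs, without downloading any weights:

```python
# Minimal sketch: compare the GPT-2 checkpoint sizes via their configs.
from transformers import AutoConfig

for name in ["gpt2", "gpt2-medium", "gpt2-large", "gpt2-xl"]:
    cfg = AutoConfig.from_pretrained(f"openai-community/{name}")
    print(f"{name:12s} layers={cfg.n_layer:2d} heads={cfg.n_head:2d} d_model={cfg.n_embd}")
```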