A little guide to building Large Language Models in 2024
Resources mentioned by @thomwolf in https://x.com/Thom_Wolf/status/1773340316835131757
Yi: Open Foundation Models by 01.AI
Paper • 2403.04652 • Published • 62
Note: check out their chat space: https://huggingface.co/spaces/01-ai/Yi-34B-Chat
A Survey on Data Selection for Language Models
Paper • 2402.16827 • Published • 4
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Paper • 2402.00159 • Published • 59
Note: check out the OLMo suite: https://huggingface.co/collections/allenai/olmo-suite-65aeaae8fe5b6b2122b46778
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
Paper • 2306.01116 • Published • 31
Note: check out datatrove: https://github.com/huggingface/datatrove (frees data processing from scripting madness by providing a set of platform-agnostic, customizable pipeline processing blocks)
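A minimal sketch of a datatrove pipeline, following the pattern from the project README; the input/output paths and the toy filter predicate are placeholder assumptions:

```python
from datatrove.executor import LocalPipelineExecutor
from datatrove.pipeline.readers import JsonlReader
from datatrove.pipeline.filters import LambdaFilter
from datatrove.pipeline.writers import JsonlWriter

# Read JSONL shards, keep documents matching a toy predicate, write the rest out.
executor = LocalPipelineExecutor(
    pipeline=[
        JsonlReader("data/"),                             # placeholder input dir
        LambdaFilter(lambda doc: "hugging" in doc.text),  # toy quality filter
        JsonlWriter("filtered-output/"),                  # placeholder output dir
    ],
    tasks=4,  # number of parallel worker tasks
)
executor.run()
```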
Bag of Tricks for Efficient Text Classification
Paper • 1607.01759 • Published
Note: read more: https://fasttext.cc/
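fastText classifiers of this kind are widely used for language identification and quality filtering of pretraining data. A minimal supervised-classification sketch with the fasttext Python package; the training file path is an assumption, and fastText expects one `__label__<class> text` line per example:

```python
import fasttext

# train.txt: one example per line, e.g. "__label__good This is clean prose."
model = fasttext.train_supervised(input="train.txt")  # placeholder path
labels, probs = model.predict("An example sentence to score.")
print(labels, probs)
```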
Breadth-First Pipeline Parallelism
Paper • 2211.05953 • Published
Note: check out https://github.com/huggingface/nanotron (minimalistic large language model 3D-parallelism training)
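This is not nanotron's actual API (it is driven by YAML configs); it is just an illustrative sketch of the arithmetic behind 3D parallelism, where the product of the data-, tensor-, and pipeline-parallel degrees must equal the GPU count:

```python
# Illustrative only: how a 3D-parallel layout partitions a GPU cluster.
world_size = 64               # total GPUs (assumed)
tp = 4                        # tensor parallelism: shards each layer's matmuls
pp = 4                        # pipeline parallelism: shards the stack of layers
dp = world_size // (tp * pp)  # data parallelism gets the remainder
assert dp * tp * pp == world_size
print(f"dp={dp}, tp={tp}, pp={pp}")  # dp=4, tp=4, pp=4
```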
Reducing Activation Recomputation in Large Transformer Models
Paper • 2205.05198 • Published
Sequence Parallelism: Long Sequence Training from System Perspective
Paper • 2105.13120 • Published • 5
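The selective-recomputation schemes in these papers are framework-specific, but the underlying primitive is activation checkpointing: drop intermediate activations in the forward pass and recompute them during backward, trading extra FLOPs for memory. A minimal PyTorch sketch; the module and sizes are made up for illustration:

```python
import torch
from torch.utils.checkpoint import checkpoint

class Block(torch.nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.ff = torch.nn.Sequential(
            torch.nn.Linear(dim, 4 * dim), torch.nn.GELU(), torch.nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        return x + self.ff(x)

block = Block(256)
x = torch.randn(8, 256, requires_grad=True)
# Activations inside `block` are not stored; they are recomputed during backward.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```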
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Paper • 2203.03466 • Published • 1
Note: from the creators of Grok: https://huggingface.co/xai-org/grok-1
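This is not the mup library's API, just an illustrative sketch of the µTransfer idea: tune hyperparameters on a small proxy model, then transfer them to the wide model, scaling the Adam learning rate of hidden matrix-like weights inversely with width. The widths and rates below are assumptions; see the paper or the `mup` package for the precise per-parameter rules:

```python
# Illustrative µTransfer-style scaling, not the exact mup implementation.
base_width, base_lr = 256, 3e-3  # tuned cheaply on the small proxy model
target_width = 4096              # width of the large target model
hidden_lr = base_lr * base_width / target_width  # ∝ 1/width for Adam on hidden weights
print(f"{hidden_lr:.2e}")        # 1.88e-04
```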
Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster
Paper • 2304.03208 • Published • 1
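Cerebras-GPT trains each model compute-optimally at roughly 20 tokens per parameter (the Chinchilla rule). A back-of-envelope sketch of that sizing using the standard ≈6·N·D training-FLOPs estimate; the model size below is an arbitrary example:

```python
params = 2.7e9                     # example model size N
tokens = 20 * params               # D ≈ 20·N, the compute-optimal ratio
train_flops = 6 * params * tokens  # ≈ 6·N·D total training FLOPs
print(f"{tokens:.1e} tokens, {train_flops:.1e} FLOPs")  # 5.4e+10 tokens, 8.7e+20 FLOPs
```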
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper • 2312.00752 • Published • 138
Note: check out the transformers-compatible Mamba checkpoints: https://huggingface.co/collections/state-spaces/transformers-compatible-mamba-65e7b40ab87e5297e45ae406
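A minimal sketch of loading one of those checkpoints with transformers, assuming a recent transformers version with Mamba support; the prompt is arbitrary:

```python
from transformers import AutoTokenizer, MambaForCausalLM

tokenizer = AutoTokenizer.from_pretrained("state-spaces/mamba-130m-hf")
model = MambaForCausalLM.from_pretrained("state-spaces/mamba-130m-hf")

input_ids = tokenizer("The key idea of selective state spaces is", return_tensors="pt")["input_ids"]
out = model.generate(input_ids, max_new_tokens=30)
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
```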
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 48
Note: check out https://huggingface.co/docs/trl (train transformer language models with reinforcement learning)
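A minimal DPO fine-tuning sketch with TRL. The exact DPOTrainer signature has shifted across TRL versions, and the model and dataset names here are placeholder assumptions; the dataset needs prompt/chosen/rejected columns:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")      # placeholder policy model
ref_model = AutoModelForCausalLM.from_pretrained("gpt2")  # frozen reference copy
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

dataset = load_dataset("your/preference-dataset", split="train")  # placeholder dataset

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=TrainingArguments(output_dir="dpo-out", per_device_train_batch_size=2),
    beta=0.1,  # strength of the implicit KL penalty from the paper
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```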
🪁 Zephyr Gemma Chat (Space)
Note: check out https://github.com/huggingface/alignment-handbook (robust recipes to align language models with human and AI preferences)
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
Paper • 2402.14740 • Published • 11
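This paper argues that plain REINFORCE with a good baseline can replace PPO for RLHF; its RLOO variant samples k completions per prompt and uses the leave-one-out mean of the other rewards as each sample's baseline. A small sketch of that advantage computation (the example rewards and the commented loss line are illustrative):

```python
import torch

def rloo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """rewards: (k,) scalar rewards for k completions of the same prompt."""
    k = rewards.numel()
    # Leave-one-out baseline: mean reward of the other k - 1 samples.
    baseline = (rewards.sum() - rewards) / (k - 1)
    return rewards - baseline

rewards = torch.tensor([1.0, 0.2, -0.5, 0.8])
adv = rloo_advantages(rewards)
# REINFORCE loss per sample: -(advantage) * sum of the completion's token log-probs,
# e.g. loss = -(adv.detach() * sequence_logprobs).mean()
print(adv)
```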
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Paper • 2210.17323 • Published • 8
Note: read more: https://huggingface.co/blog/gptq-integration (making LLMs lighter with AutoGPTQ and transformers)
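A minimal sketch of the transformers GPTQ integration described in that blog post: pass a GPTQConfig and the model is quantized on load, calibrated against a small dataset. The model id and settings are placeholder assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "facebook/opt-125m"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit GPTQ quantization, calibrated on samples from the C4 dataset.
config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=config
)
```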
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Paper • 2208.07339 • Published • 4
Note: read more: https://huggingface.co/docs/bitsandbytes (accessible large language models via k-bit quantization for PyTorch)
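LLM.int8() is exposed through transformers' bitsandbytes integration: load a supported model with 8-bit weights and the paper's mixed-precision decomposition is applied at matmul time. The model id is a placeholder assumption:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # placeholder model
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
print(model.get_memory_footprint())  # roughly half the fp16 footprint
```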
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Paper • 2401.10774 • Published • 54
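Medusa accelerates decoding with extra prediction heads and tree attention, and ships its own training and inference code in the linked repo. As a related speculative-decoding technique built directly into transformers, assisted generation drafts tokens with a small model and verifies them with the large one; the model ids below are placeholder assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")         # placeholder target
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
assistant = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # small draft model

inputs = tokenizer("Speculative decoding works by", return_tensors="pt")
out = model.generate(**inputs, assistant_model=assistant, max_new_tokens=30)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```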
🐠 Qwen1.5 MoE A2.7B Chat Demo (Space)
🏆 Open LLM Leaderboard 2 (Space)
Track, rank and evaluate open LLMs and chatbots
Note: check out lighteval: https://github.com/huggingface/lighteval (LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally, alongside the recently released LLM data processing library datatrove and the LLM training library nanotron)