---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- nlp
- llm
---

# K2 - Deciphering Llama 2 70B

K2 is a fully transparent large language model whose performance is on par with Llama 2 70B.

## Evaluations

<center><img src="eval_table_temp.png" alt="eval table"/></center>

## Datasets and Mix

The following data mix was used to train K2 and achieve results in line with Llama 2 70B. The full data sequence will be available soon.

| Dataset | Starting Tokens | Multiplier | Total Tokens | % of Total |
| ----------- | ----------- | ----------- | ----------- | ----------- |
| dm-math | 4.33B | 3x | 13B | 1.0% |
| pubmed-abstracts | 4.77B | 3x | 14.3B | 1.1% |
| uspto | 4.77B | 3x | 14.3B | 1.1% |
| pubmed-central | 26B | 1x | 26B | 2.0% |
| redpajama.arxiv | 27.3B | 1x | 27.3B | 2.1% |
| starcoder.spm | 67.6B | 0.5x | 33.8B | 2.6% |
| starcoder.fim | 67.6B | 0.5x | 33.8B | 2.6% |
| redpajama.stackexchange | 61.1B | 1x | 61.1B | 4.7% |
| starcoder | 132.6B | 0.5x | 66.3B | 5.1% |
| pile-of-law | 76.7B | 1x | 76.7B | 5.9% |
| redpajama.book | 80.6B | 1x | 80.6B | 6.2% |
| s2orc | 107.9B | 1x | 107.9B | 8.3% |
| redpajama.wikipedia | 22.1B | 6x | 132.6B | 10.2% |
| refinedweb | 612.3B | 1x | 612.3B | 47.1% |
| Totals | - | - | 1.3T | 100% |
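
In the table, Total Tokens is Starting Tokens multiplied by Multiplier (a multiplier above 1x upsamples a source, while 0.5x uses half of it), and % of Total is that product divided by the 1.3T grand total. The short Python sketch below recomputes both columns from the first two as a sanity check on the mix. It is illustrative only: `DATA_MIX` and `mix_summary` are hypothetical names, not part of any LLM360 tooling, and all figures are in billions of tokens as listed in the table.

```python
# Hypothetical encoding of the data-mix table:
# (dataset, starting tokens in billions, multiplier).
DATA_MIX = [
    ("dm-math", 4.33, 3.0),
    ("pubmed-abstracts", 4.77, 3.0),
    ("uspto", 4.77, 3.0),
    ("pubmed-central", 26.0, 1.0),
    ("redpajama.arxiv", 27.3, 1.0),
    ("starcoder.spm", 67.6, 0.5),
    ("starcoder.fim", 67.6, 0.5),
    ("redpajama.stackexchange", 61.1, 1.0),
    ("starcoder", 132.6, 0.5),
    ("pile-of-law", 76.7, 1.0),
    ("redpajama.book", 80.6, 1.0),
    ("s2orc", 107.9, 1.0),
    ("redpajama.wikipedia", 22.1, 6.0),
    ("refinedweb", 612.3, 1.0),
]


def mix_summary(mix):
    """Return (grand_total, {dataset: (total_tokens, percent_of_total)}), in billions."""
    # Total Tokens = Starting Tokens x Multiplier for each source.
    totals = {name: start * mult for name, start, mult in mix}
    grand_total = sum(totals.values())
    # % of Total = each source's share of the grand total.
    return grand_total, {
        name: (tokens, 100.0 * tokens / grand_total) for name, tokens in totals.items()
    }


if __name__ == "__main__":
    grand_total, rows = mix_summary(DATA_MIX)
    for name, (tokens, pct) in rows.items():
        print(f"{name:<24} {tokens:8.2f}B {pct:5.1f}%")
    print(f"{'Totals':<24} {grand_total:8.2f}B")
```

Exact multiplication gives 12.99B for dm-math and 14.31B for pubmed-abstracts and uspto, which the table rounds to 13B and 14.3B; the grand total comes out at about 1,300B, matching the 1.3T in the Totals row.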

## First 10 Checkpoints

| Checkpoints | |
| ----------- | ----------- |
| [Checkpoint 360](https://huggingface.co/LLM360/K2/tree/ckpt_360) | [Checkpoint 355](https://huggingface.co/LLM360/K2/tree/ckpt_355) |
| [Checkpoint 359](https://huggingface.co/LLM360/K2/tree/ckpt_359) | [Checkpoint 354](https://huggingface.co/LLM360/K2/tree/ckpt_354) |
| [Checkpoint 358](https://huggingface.co/LLM360/K2/tree/ckpt_358) | [Checkpoint 353](https://huggingface.co/LLM360/K2/tree/ckpt_353) |
| [Checkpoint 357](https://huggingface.co/LLM360/K2/tree/ckpt_357) | [Checkpoint 352](https://huggingface.co/LLM360/K2/tree/ckpt_352) |
| [Checkpoint 356](https://huggingface.co/LLM360/K2/tree/ckpt_356) | [Checkpoint 351](https://huggingface.co/LLM360/K2/tree/ckpt_351) |
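
Judging from the links above, each intermediate checkpoint lives on its own branch of the `LLM360/K2` repository on the Hugging Face Hub, named `ckpt_<N>`. The Python sketch below builds those branch names and URLs and shows how a branch could be passed as the `revision` argument to `from_pretrained` in `transformers`. It assumes, without confirmation from this card, that every branch holds a transformers-compatible config, weights, and tokenizer; `checkpoint_revision`, `checkpoint_url`, and `load_checkpoint` are hypothetical helper names.

```python
REPO_ID = "LLM360/K2"  # repository id taken from the checkpoint links


def checkpoint_revision(step: int) -> str:
    """Branch name for an intermediate checkpoint, following the ckpt_<N> pattern."""
    if step < 0:
        raise ValueError("checkpoint index must be non-negative")
    return f"ckpt_{step}"


def checkpoint_url(step: int) -> str:
    """Browsable Hub URL for a checkpoint branch."""
    return f"https://huggingface.co/{REPO_ID}/tree/{checkpoint_revision(step)}"


def load_checkpoint(step: int):
    """Load tokenizer and model from a checkpoint branch (assumes transformers format).

    transformers is imported inside the function so the name/URL helpers above
    can be used without it installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    revision = checkpoint_revision(step)
    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(REPO_ID, revision=revision)
    return tokenizer, model
```

For example, `[checkpoint_url(n) for n in range(360, 350, -1)]` yields the ten links in the table. Calling `load_checkpoint(360)` downloads the full weights, so expect a very large download and substantial memory use for a model of this size.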

## Additional Artifacts

We are preparing release-quality artifacts for the dataset, code, and analysis, which will be published over the next few weeks.

## Model Description

- **Model type:** Language model with the same architecture as LLaMA.
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Resources for more information:**
  - [Training Code]
  - [Data Preparation]
  - [Metrics]
  - [Fully processed Amber pretraining data]

## About LLM360

LLM360 is an initiative for comprehensive and fully open-source LLMs, where all training details, model checkpoints, intermediate results, and additional analyses are made available to the community. Our goal is to advance the field by inviting the community to deepen the understanding of LLMs together. As the first step of LLM360, we release all intermediate model checkpoints, our fully prepared pre-training dataset, all source code and configurations, and training details. We are committed to continually pushing the boundaries of LLMs through this open-source effort.

[Visit us](https://www.llm360.ai/)