---
library_name: transformers
license: apache-2.0
---
|
|
|
# llama-161M
|
|
|
llama-161M is a 161M-parameter Llama-architecture model trained on 100B tokens.
|
- Learning rate 1e-3
- Weight decay 0.1
- WSD (warmup-stable-decay) scheduler with a 10% decay phase (see the schedule sketch below)
- Data mix: 80% code, 10% natural language, 10% instruction data
- Dataset decontaminated against popular benchmarks following [bigcode](https://github.com/bigcode-project/bigcode-dataset/tree/main/decontamination)
- Trained on 8x RTX 3090s for ~110 hours
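A minimal sketch of what the WSD schedule looks like as a PyTorch `LambdaLR` multiplier. The card only states the 10% decay phase; the warmup fraction and the linear decay shape below are assumptions for illustration, not the exact training configuration.

```python
# Sketch of a warmup-stable-decay (WSD) learning-rate schedule.
# Assumptions: 1% linear warmup and a linear decay to zero over the last 10% of steps.
import torch
from torch.optim.lr_scheduler import LambdaLR


def wsd_lambda(total_steps: int, warmup_frac: float = 0.01, decay_frac: float = 0.10):
    warmup_steps = int(total_steps * warmup_frac)
    decay_steps = int(total_steps * decay_frac)
    stable_end = total_steps - decay_steps

    def lr_lambda(step: int) -> float:
        if step < warmup_steps:
            # linear warmup from 0 to the peak LR
            return step / max(1, warmup_steps)
        if step < stable_end:
            # hold at the peak LR (1e-3 in this run)
            return 1.0
        # linear decay to 0 over the final 10% of training
        return max(0.0, (total_steps - step) / max(1, decay_steps))

    return lr_lambda


model = torch.nn.Linear(8, 8)  # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.1)
scheduler = LambdaLR(optimizer, lr_lambda=wsd_lambda(total_steps=100_000))
```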
|
|
|
|
|
This is a *base* pretrained model and requires further fine-tuning to be useful.
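A minimal sketch of loading the checkpoint with `transformers` for completion-style generation or as a starting point for fine-tuning. The repository id below is a placeholder; substitute the actual Hub id of this model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "path/to/llama-161M"  # placeholder: replace with this repo's Hub id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Base-model behavior: expect raw continuation of the prompt, not instruction following.
inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=False,  # greedy decoding
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```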
|
|
|
## Model Details
|
|
|
Pass@1 with greedy decoding:

| [openai/openai_humaneval](https://huggingface.co/datasets/openai/openai_humaneval) | [mbpp](https://huggingface.co/datasets/google-research-datasets/mbpp) |
| :------------------ | :------------- |
| 9.2% | 9.8% |
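The scores above may come from a dedicated harness (e.g. the bigcode evaluation harness); the snippet below is only an illustration of the greedy-decoding setting on HumanEval, with the same placeholder repo id as above.

```python
# Sketch: generate greedy completions for a few HumanEval problems.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "path/to/llama-161M"  # placeholder for the actual Hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

humaneval = load_dataset("openai/openai_humaneval", split="test")

completions = []
for problem in humaneval.select(range(3)):  # first few problems for illustration
    inputs = tokenizer(problem["prompt"], return_tensors="pt")
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,  # greedy, i.e. one deterministic sample per problem
        pad_token_id=tokenizer.eos_token_id,
    )
    # keep only the generated continuation, not the echoed prompt
    text = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    completions.append({"task_id": problem["task_id"], "completion": text})
```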