---
library_name: transformers
license: apache-2.0
---
# llama-161M
Trained on 100B tokens.
- Learning rate: 1e-3
- Weight decay: 0.1
- WSD (warmup-stable-decay) scheduler with a 10% decay phase (see the sketch after this list)
- Data mix: 80% code, 10% natural language, 10% instruction data
- Dataset decontaminated against popular benchmarks following [bigcode](https://github.com/bigcode-project/bigcode-dataset/tree/main/decontamination)
- Trained on 8x RTX 3090s for ~110 hours
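
A minimal sketch of a warmup-stable-decay (WSD) schedule with a 10% decay phase, matching the scheduler listed above. The warmup fraction, step counts, and function name are illustrative assumptions, not values from this training run.

```python
# Illustrative sketch of a warmup-stable-decay (WSD) learning-rate schedule.
# The 10% decay fraction matches the list above; the warmup fraction and
# step counts are assumptions for demonstration only.
def wsd_lr(step: int, total_steps: int, peak_lr: float = 1e-3,
           warmup_frac: float = 0.01, decay_frac: float = 0.10) -> float:
    warmup_steps = int(total_steps * warmup_frac)
    decay_steps = int(total_steps * decay_frac)
    stable_end = total_steps - decay_steps

    if step < warmup_steps:
        # Linear warmup from 0 to the peak learning rate.
        return peak_lr * step / max(warmup_steps, 1)
    if step < stable_end:
        # Stable phase: hold the peak learning rate constant.
        return peak_lr
    # Decay phase: linear decay to 0 over the final fraction of steps.
    remaining = total_steps - step
    return peak_lr * remaining / max(decay_steps, 1)
```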
This is a *base* pretrained model and requires further fine-tuning to be useful.
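
A minimal loading sketch using the `transformers` library. The repository id `abacaj/llama-161M-100B` is assumed from this repo's path, and the prompt and generation settings are illustrative only.

```python
# Sketch: loading the base model for completion-style sampling or further fine-tuning.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abacaj/llama-161M-100B"  # assumed repo id, based on this repository's path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

# Base model: it completes text rather than following instructions.
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```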
## Evaluation
Scores with greedy decoding:
| [openai/openai_humaneval](https://huggingface.co/datasets/openai/openai_humaneval) (greedy) | [mbpp](https://huggingface.co/datasets/google-research-datasets/mbpp) (greedy) |
| :------------------ | :------------- |
| 9.2% | 9.8% |