---
license: mit
datasets:
- HuggingFaceFW/fineweb
---
|
|
|
This is [karpathy's](https://github.com/karpathy) model from [the llm.c project](https://github.com/karpathy/llm.c/discussions/580), converted to HF format to investigate [bfloat16 performance](https://github.com/karpathy/llm.c/pull/571).
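As a starting point, here is a minimal sketch of loading the converted checkpoint in bfloat16 with 🤗 Transformers. The repo id is a placeholder, and this assumes the conversion produced a standard `AutoModelForCausalLM`-compatible checkpoint.

```python
# Minimal sketch: load the converted checkpoint in bfloat16 with transformers.
# NOTE: the repo id below is a placeholder -- substitute this model's actual HF path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/llmc-gpt2-fineweb"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

# Quick sanity-check generation
inputs = tokenizer("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```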
|
|
|
The training run covered 150B tokens, i.e. 1.5 epochs over the 100B-token FineWeb sample dataset.
|
|
|
There's active work underway at [https://github.com/karpathy/llm.c](https://github.com/karpathy/llm.c), so I'd suggest following the developments there!