---
datasets:
- allenai/c4
language:
- en
license: apache-2.0
---
# nanoT5-mid-65kBPE-2048
> [!NOTE]
> This is a "raw" pretrained model intended to be fine-tuned on downstream tasks.

A "mid"-size T5 model pretrained on C4:
- trained at a context length of 2048
- 16 layers, hidden size 1024, feed-forward size 3072, SiLU activations
- pretrained on `allenai/c4` (`en` subset) for 65k steps
- uses an [adapted claude3 tokenizer](https://huggingface.co/BEE-spoke-data/claude-tokenizer-forT5); vocab size 65k
More details and training logs are under [checkpoints/](https://huggingface.co/pszemraj/nanoT5-mid-65kBPE-2048/tree/main/checkpoints).
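
A minimal loading sketch, assuming the standard `transformers` seq2seq API and that the adapted tokenizer keeps T5-style `<extra_id_*>` sentinel tokens (both are assumptions, not confirmed by this card). Since this is a raw pretrained checkpoint, expect span-denoising output rather than task answers until the model is fine-tuned:

```python
# Hypothetical usage sketch, not an official example from this repo.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

repo_id = "pszemraj/nanoT5-mid-65kBPE-2048"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSeq2SeqLM.from_pretrained(repo_id)

# Span-corruption style input: a raw T5 is trained to fill the
# masked span marked by the sentinel token, not to answer questions.
text = "The capital of France is <extra_id_0>."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```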