---
datasets:
  - allenai/c4
language:
  - en
license: apache-2.0
---

# nanoT5-mid-65kBPE-2048

This is a "raw" pretrained model intended to be fine-tuned on downstream tasks.

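A minimal loading sketch with 🤗 Transformers is shown below; the Hub id is an assumption inferred from this card's title, so substitute the actual repo id if it differs.

```python
# Minimal sketch: load the pretrained checkpoint for downstream fine-tuning.
# "pszemraj/nanoT5-mid-65kBPE-2048" is an assumed repo id, not confirmed by this card.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "pszemraj/nanoT5-mid-65kBPE-2048"  # assumed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

print(f"parameters: {model.num_parameters() / 1e6:.1f}M")
print(f"tokenizer vocab: {len(tokenizer)}")

# The checkpoint is "raw" (span-corruption pretraining only), so fine-tune it
# on a seq2seq task (e.g. with Seq2SeqTrainer) before expecting useful outputs.
```
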
A "mid" size T5 model pretrained on c4:

- trained at a context length of 2048
- 16 layers, hidden size 1024, feed-forward dimension 3072, SiLU activations
- pretrained on allenai/c4 (`en` subset) for 65k steps
- uses an adapted claude3 tokenizer with a vocab size of 65k (see the config sketch below)

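For reference, a rough `T5Config` sketch of the hyperparameters listed above; values the card does not state (attention heads, decoder depth, feed-forward gating, exact vocab size) are assumptions, and the repo's `config.json` is authoritative.

```python
from transformers import T5Config

# Rough sketch of the architecture described above; "assumed" values are not
# stated in this card -- defer to the uploaded config.json.
config = T5Config(
    d_model=1024,                    # hidden size
    d_ff=3072,                       # feed-forward dimension
    num_layers=16,                   # encoder layers ("16 layers" read as per stack; assumed)
    num_decoder_layers=16,           # assumed to mirror the encoder
    num_heads=16,                    # assumed: d_model 1024 / default d_kv 64
    feed_forward_proj="gated-silu",  # SiLU activation; gated variant assumed
    vocab_size=65_024,               # "65k" -- exact (padded) value assumed
)
```
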
More details and training logs are available under `checkpoints/`.