BPE Gets Picky: Efficient Vocabulary Refinement During Tokenizer Training • Paper • 2409.04599 • Published Sep 6
Toxicity of the Commons: Curating Open-Source Pre-Training Data • Paper • 2410.22587 • Published 9 days ago