Article: Efficient LLM Pretraining: Packed Sequences and Masked Attention, by sirluk (Oct 7)
Dataset: argilla/ultrafeedback-binarized-preferences-cleaned (updated Dec 11, 2023)
Dataset: skymizer/pretraining-50B-llama3.2-tokenized-padded-packed-2048 (updated 15 days ago)