Cautious Optimizers: Improving Training with One Line of Code • Paper • 2411.16085 • Published 8 days ago
RWKV-7 "Goose" preview rc2 => Peak RNN architecture? Will try to squeeze more performance out of it for the final release. Preview code & model: https://github.com/BlinkDL/RWKV-LM/tree/main/RWKV-v7
Qwen2.5 Collection: Qwen2.5 language models, including pretrained and instruction-tuned models in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated 5 days ago