Article • Releasing the largest multilingual open pretraining dataset • By Pclanglais • Nov 13, 2024
Article • A failed experiment: Infini-Attention, and why we should keep trying? • Aug 14, 2024
Article • Welcome FalconMamba: The first strong attention-free 7B model • Aug 12, 2024
Paper • TransformerFAM: Feedback attention is working memory • 2404.09173 • Published Apr 14, 2024
Article • Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent • Apr 22, 2024
Paper • Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks • 2402.04248 • Published Feb 6, 2024
Paper • Large Language Models as Generalizable Policies for Embodied Tasks • 2310.17722 • Published Oct 26, 2023
Paper • Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization • 2308.02151 • Published Aug 4, 2023