TURNA: A Turkish Encoder-Decoder Language Model for Enhanced Understanding and Generation Paper • 2401.14373 • Published Jan 25, 2024 • 11
Mamba: Linear-Time Sequence Modeling with Selective State Spaces Paper • 2312.00752 • Published Dec 1, 2023 • 138
Retentive Network: A Successor to Transformer for Large Language Models Paper • 2307.08621 • Published Jul 17, 2023 • 170