SMOTE: Synthetic Minority Over-sampling Technique Paper • 1106.1813 • Published Jun 9, 2011 • 1
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation Paper • 1406.1078 • Published Jun 3, 2014
Distributed Representations of Sentences and Documents Paper • 1405.4053 • Published May 16, 2014
Sequence to Sequence Learning with Neural Networks Paper • 1409.3215 • Published Sep 10, 2014 • 3
Neural Machine Translation by Jointly Learning to Align and Translate Paper • 1409.0473 • Published Sep 1, 2014 • 4
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Paper • 1804.07461 • Published Apr 20, 2018 • 4
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Paper • 1810.04805 • Published Oct 11, 2018 • 14
RoBERTa: A Robustly Optimized BERT Pretraining Approach Paper • 1907.11692 • Published Jul 26, 2019 • 7
Energy and Policy Considerations for Deep Learning in NLP Paper • 1906.02243 • Published Jun 5, 2019 • 1
XLNet: Generalized Autoregressive Pretraining for Language Understanding Paper • 1906.08237 • Published Jun 19, 2019
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter Paper • 1910.01108 • Published Oct 2, 2019 • 14
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Paper • 1910.10683 • Published Oct 23, 2019 • 8
AR-Net: A simple Auto-Regressive Neural Network for time-series Paper • 1911.12436 • Published Nov 27, 2019 • 1
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators Paper • 2003.10555 • Published Mar 23, 2020
SQuAD: 100,000+ Questions for Machine Comprehension of Text Paper • 1606.05250 • Published Jun 16, 2016 • 3
Mish: A Self Regularized Non-Monotonic Activation Function Paper • 1908.08681 • Published Aug 23, 2019 • 1
The Pile: An 800GB Dataset of Diverse Text for Language Modeling Paper • 2101.00027 • Published Dec 31, 2020 • 6
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity Paper • 2101.03961 • Published Jan 11, 2021 • 14
LoRA: Low-Rank Adaptation of Large Language Models Paper • 2106.09685 • Published Jun 17, 2021 • 30
Evaluating Large Language Models Trained on Code Paper • 2107.03374 • Published Jul 7, 2021 • 6
NeuralProphet: Explainable Forecasting at Scale Paper • 2111.15397 • Published Nov 29, 2021 • 1
LLaMA: Open and Efficient Foundation Language Models Paper • 2302.13971 • Published Feb 27, 2023 • 13
PyTorch: An Imperative Style, High-Performance Deep Learning Library Paper • 1912.01703 • Published Dec 3, 2019 • 1
TensorFlow: A system for large-scale machine learning Paper • 1605.08695 • Published May 27, 2016 • 1
Theano: A Python framework for fast computation of mathematical expressions Paper • 1605.02688 • Published May 9, 2016 • 1
Caffe: Convolutional Architecture for Fast Feature Embedding Paper • 1408.5093 • Published Jun 20, 2014 • 1
TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents Paper • 1901.08149 • Published Jan 23, 2019 • 3
Annotated History of Modern AI and Deep Learning Paper • 2212.11279 • Published Dec 21, 2022 • 1
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published May 2, 2024 • 116
Mamba: Linear-Time Sequence Modeling with Selective State Spaces Paper • 2312.00752 • Published Dec 1, 2023 • 138
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper • 2405.21060 • Published May 31, 2024 • 63
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25, 2024 • 86
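Several entries above name techniques compact enough to sketch. The SMOTE paper (1106.1813) oversamples a minority class by interpolating between a real sample and one of its nearest minority-class neighbors; the core interpolation step is just a convex combination. A minimal sketch (the neighbor search is assumed to have happened already; `gap` is the random factor the paper draws uniformly from [0, 1]):

```python
import random

def smote_sample(x, neighbor, gap=None):
    """Synthesize one minority-class point on the segment between
    `x` and one of its minority-class nearest neighbors (SMOTE)."""
    if gap is None:
        gap = random.random()  # uniform in [0, 1), as in the paper
    # new point = x + gap * (neighbor - x), componentwise
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]
```

With `gap=0.5` and inputs `[0, 0]` and `[2, 4]`, this yields the midpoint `[1.0, 2.0]`; repeated calls with random `gap` populate the line segment between the two minority samples.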
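The Mish paper (1908.08681) listed above defines the activation f(x) = x · tanh(softplus(x)), where softplus(x) = ln(1 + eˣ). A scalar sketch using only the standard library:

```python
import math

def mish(x):
    """Mish activation: x * tanh(softplus(x)) = x * tanh(ln(1 + e^x))."""
    return x * math.tanh(math.log1p(math.exp(x)))
```

The function is smooth and non-monotonic: mish(0) = 0, it is slightly negative for moderately negative inputs, and it approaches the identity for large positive inputs.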
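The LoRA paper (2106.09685) listed above freezes a pretrained weight matrix W and learns a low-rank update ΔW = B·A (B is d×r and zero-initialized, A is r×k), scaled by α/r, so the effective weight is W + (α/r)·B·A. A dependency-free sketch of the merged-weight computation (the function names here are illustrative, not the library's API):

```python
def matmul(a, b):
    """Plain nested-list matrix product (len(a[0]) must equal len(b))."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_merge(W, A, B, alpha, r):
    """Effective weight W + (alpha / r) * B @ A.
    B: d x r (zero-initialized in LoRA), A: r x k, W: d x k."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j]
             for j in range(len(W[0]))] for i in range(len(W))]
```

Because B starts at zero, the merged weight equals W exactly at initialization, so fine-tuning begins from the pretrained model's behavior.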