papers - a adamelliotfields Collection

papers

updated Oct 9

machine learning and neural network papers 📜

SMOTE: Synthetic Minority Over-sampling Technique

Paper • 1106.1813 • Published Jun 9, 2011 • 1
Scikit-learn: Machine Learning in Python

Paper • 1201.0490 • Published Jan 2, 2012 • 1
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

Paper • 1406.1078 • Published Jun 3, 2014
Distributed Representations of Sentences and Documents

Paper • 1405.4053 • Published May 16, 2014
Sequence to Sequence Learning with Neural Networks

Paper • 1409.3215 • Published Sep 10, 2014 • 3
Neural Machine Translation by Jointly Learning to Align and Translate

Paper • 1409.0473 • Published Sep 1, 2014 • 4
Text Understanding from Scratch

Paper • 1502.01710 • Published Feb 5, 2015
XGBoost: A Scalable Tree Boosting System

Paper • 1603.02754 • Published Mar 9, 2016 • 1
Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 47
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

Paper • 1804.07461 • Published Apr 20, 2018 • 4
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 15
RoBERTa: A Robustly Optimized BERT Pretraining Approach

Paper • 1907.11692 • Published Jul 26, 2019 • 7
Energy and Policy Considerations for Deep Learning in NLP

Paper • 1906.02243 • Published Jun 5, 2019 • 1
XLNet: Generalized Autoregressive Pretraining for Language Understanding

Paper • 1906.08237 • Published Jun 19, 2019
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Paper • 1910.01108 • Published Oct 2, 2019 • 14
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Paper • 1910.10683 • Published Oct 23, 2019 • 9
AR-Net: A simple Auto-Regressive Neural Network for time-series

Paper • 1911.12436 • Published Nov 27, 2019 • 1
GLU Variants Improve Transformer

Paper • 2002.05202 • Published Feb 12, 2020 • 1
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Paper • 2003.10555 • Published Mar 23, 2020
SQuAD: 100,000+ Questions for Machine Comprehension of Text

Paper • 1606.05250 • Published Jun 16, 2016 • 3
Mish: A Self Regularized Non-Monotonic Activation Function

Paper • 1908.08681 • Published Aug 23, 2019 • 1
The Pile: An 800GB Dataset of Diverse Text for Language Modeling

Paper • 2101.00027 • Published Dec 31, 2020 • 6
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

Paper • 2101.03961 • Published Jan 11, 2021 • 14
LoRA: Low-Rank Adaptation of Large Language Models

Paper • 2106.09685 • Published Jun 17, 2021 • 30
Evaluating Large Language Models Trained on Code

Paper • 2107.03374 • Published Jul 7, 2021 • 6
NeuralProphet: Explainable Forecasting at Scale

Paper • 2111.15397 • Published Nov 29, 2021 • 1
LLaMA: Open and Efficient Foundation Language Models

Paper • 2302.13971 • Published Feb 27, 2023 • 13
Searching for Activation Functions

Paper • 1710.05941 • Published Oct 16, 2017 • 1
PyTorch: An Imperative Style, High-Performance Deep Learning Library

Paper • 1912.01703 • Published Dec 3, 2019 • 1
TensorFlow: A system for large-scale machine learning

Paper • 1605.08695 • Published May 27, 2016 • 1
Theano: A Python framework for fast computation of mathematical expressions

Paper • 1605.02688 • Published May 9, 2016 • 1
Caffe: Convolutional Architecture for Fast Feature Embedding

Paper • 1408.5093 • Published Jun 20, 2014 • 1
Teaching Machines to Read and Comprehend

Paper • 1506.03340 • Published Jun 10, 2015 • 2
TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents

Paper • 1901.08149 • Published Jan 23, 2019 • 3
Annotated History of Modern AI and Deep Learning

Paper • 2212.11279 • Published Dec 21, 2022 • 1
KAN: Kolmogorov-Arnold Networks

Paper • 2404.19756 • Published Apr 30, 2024 • 108
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Paper • 2405.01535 • Published May 2, 2024 • 118
Your Transformer is Secretly Linear

Paper • 2405.12250 • Published May 19, 2024 • 150
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 86
Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper • 2312.00752 • Published Dec 1, 2023 • 138
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Paper • 2405.21060 • Published May 31, 2024 • 63
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25, 2024 • 86
Were RNNs All We Needed?

Paper • 2410.01201 • Published Oct 2, 2024 • 47