nbroad's Collections
attention and long context
- Efficient Streaming Language Models with Attention Sinks (Paper, arXiv:2309.17453)
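The attention-sinks idea amounts to a KV-cache eviction policy: always retain the first few "sink" tokens plus a recent sliding window, evicting everything in between. A minimal sketch under that reading (the function name and default sizes are illustrative, not from the paper's code):

```python
def streaming_keep_indices(seq_len, n_sinks=4, window=8):
    """StreamingLLM-style cache policy: keep the first `n_sinks` token
    positions (the attention sinks) plus the most recent `window`
    positions, evicting everything in between."""
    if seq_len <= n_sinks + window:
        return list(range(seq_len))
    return list(range(n_sinks)) + list(range(seq_len - window, seq_len))
```

The cache size stays bounded at `n_sinks + window` no matter how long the stream grows.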
- Effective Long-Context Scaling of Foundation Models (Paper, arXiv:2309.16039)
- allenai/longformer-base-4096 (Model)
- google/bigbird-roberta-base (Model)
- uw-madison/yoso-4096 (Fill-Mask model)
- Yukang/Llama-2-7b-longlora-100k-ft (Text Generation model)
- mosaicml/mpt-7b-storywriter (Text Generation model)
- allenai/led-base-16384 (Text2Text Generation model)
- RRWKV: Capturing Long-range Dependencies in RWKV (Paper, arXiv:2306.05176)
- Retentive Network: A Successor to Transformer for Large Language Models (Paper, arXiv:2307.08621)
- Hyena Hierarchy: Towards Larger Convolutional Language Models (Paper, arXiv:2302.10866)
- HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution (Paper, arXiv:2306.15794)
- Hungry Hungry Hippos: Towards Language Modeling with State Space Models (Paper, arXiv:2212.14052)
- Ring Attention with Blockwise Transformers for Near-Infinite Context (Paper, arXiv:2310.01889)
- LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models (Paper, arXiv:2309.12307)
- CoLT5: Faster Long-Range Transformers with Conditional Computation (Paper, arXiv:2303.09752)
- LongT5: Efficient Text-To-Text Transformer for Long Sequences (Paper, arXiv:2112.07916)
- Investigating Efficiently Extending Transformers for Long Input Summarization (Paper, arXiv:2208.04347)
- Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation (Paper, arXiv:2108.12409)
- amazon/MistralLite (Text Generation model)
- NousResearch/Yarn-Mistral-7b-128k (Text Generation model)
- YaRN: Efficient Context Window Extension of Large Language Models (Paper, arXiv:2309.00071)
- LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens (Paper, arXiv:2402.13753)
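YaRN and LongRoPE both extend the context window by rescaling RoPE's rotary frequencies. The baseline they refine is linear position interpolation, which compresses positions by the extension factor so longer sequences stay within the trained angle range; a minimal sketch of that baseline only (YaRN interpolates per frequency band and adds an attention temperature, which this omits, and the function name is mine):

```python
def rope_angles(position, head_dim, base=10000.0, scale=1.0):
    """Rotary angles for one token position. With scale < 1, positions
    are compressed (linear position interpolation), so a model trained
    on length L can address length L / scale with in-range angles."""
    return [position * scale * base ** (-2.0 * i / head_dim)
            for i in range(head_dim // 2)]
```

For example, extending a 4k-trained model to 16k would use scale = 4096 / 16384 = 0.25, mapping position 16383 back into the trained angle range.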