Characterizing Prompt Compression Methods for Long Context Inference. arXiv:2407.08892 (published Jul 11, 2024).
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference. arXiv:2407.14057 (published Jul 19, 2024).
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention. arXiv:2407.02490 (published Jul 2, 2024).
Block Transformer: Global-to-Local Language Modeling for Fast Inference. arXiv:2406.02657 (published Jun 4, 2024).
Meta Llama 3 Collection: hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases (5 items, updated Aug 2).
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens. arXiv:2402.13753 (published Feb 21, 2024).
Speculative Streaming: Fast LLM Inference without Auxiliary Models. arXiv:2402.11131 (published Feb 16, 2024).
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models. arXiv:2309.14717 (published Sep 26, 2023).