Melisa Russak

melisa

melisa-writer

AI & ML interests

I love definitions

melisa's activity

upvoted a paper 4 days ago

O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?

Paper • 2411.16489 • Published 5 days ago • 28

upvoted a paper 10 days ago

Adaptive Decoding via Latent Preference Optimization

Paper • 2411.09661 • Published 16 days ago • 10

upvoted an article 11 days ago

Fine-tuning LLMs with Singular Value Decomposition

Article • Jun 2 • 8

upvoted a paper 14 days ago

Cut Your Losses in Large-Vocabulary Language Models

Paper • 2411.09009 • Published 17 days ago • 40

upvoted 2 papers about 1 month ago

Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse

Paper • 2410.21333 • Published Oct 27 • 9

Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation

Paper • 2410.18565 • Published Oct 24 • 42

liked a model about 1 month ago

YuWangX/memoryllm-8b

Updated Sep 5 • 90 • 3

upvoted a paper about 1 month ago

Movie Gen: A Cast of Media Foundation Models

Paper • 2410.13720 • Published Oct 17 • 89

upvoted a paper about 2 months ago

Law of the Weakest Link: Cross Capabilities of Large Language Models

Paper • 2409.19951 • Published Sep 30 • 53

upvoted 3 papers 3 months ago

commented on a paper 3 months ago

Writing in the Margins: Better Inference Pattern for Long Context Retrieval

Paper • 2408.14906 • Published Aug 27 • 138

upvoted a paper 3 months ago

The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Paper • 2408.15237 • Published Aug 27 • 37

posted an update 3 months ago

Post

2978

🔥 Introducing "Writing in the Margins (WiM)" - a better inference pattern for long-context LLMs that solves the Lost-in-the-Middle problem 🔥

Paper page: Writing in the Margins: Better Inference Pattern for Long Context Retrieval (2408.14906)

TL;DR
Make your model write "margin notes" as you chunk-prefill the KV cache. Then ask it to reread all the notes before it speaks up.
Works with humans, works with AI 🤖

WiM leverages the chunked prefill of the key-value cache: at each prefill step it generates a query-based extractive summary, and these summaries are reintegrated at the end of the computation. We term these intermediate outputs “margins”, drawing inspiration from the practice of making margin notes for improved comprehension of long contexts in human reading. We show that this technique, which adds only minimal additional computation, significantly improves LLMs' long-context reasoning capabilities.

Think: Every chunk has a chance to be attended to / to be at the end of the context at least once. 🎉
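
A minimal sketch of the pattern described above, not the implementation from the paper or the linked repository. It assumes a hypothetical `generate(prompt)` callable standing in for an LLM, and it re-sends the growing text prefix on every step instead of reusing a real KV cache; in an actual chunked-prefill setup the prefix would stay cached and only the new chunk would be processed. The function names, the prompt wording, and the `NONE` convention for irrelevant chunks are all assumptions made for illustration.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def write_in_the_margins(
    context: str,
    query: str,
    generate: Callable[[str], str],  # hypothetical LLM call: prompt -> text
    chunk_size: int = 200,
) -> str:
    """Answer `query` over `context`, collecting one margin note per chunk."""
    prefix = ""  # stands in for the KV cache built up so far
    margins: List[str] = []

    for chunk in split_into_chunks(context, chunk_size):
        prefix = (prefix + " " + chunk).strip()  # "prefill" the next chunk
        # The newest chunk sits at the end of the context when the note is written.
        note = generate(
            f"{prefix}\n\nCopy any text relevant to the question "
            f"'{query}', or reply NONE."
        ).strip()
        if note and note != "NONE":  # keep only notes judged relevant
            margins.append(note)

    notes = "\n".join(f"- {m}" for m in margins)
    # Final step: the model rereads all notes before answering.
    return generate(f"{prefix}\n\nNotes:\n{notes}\n\nQuestion: {query}\nAnswer:")
```

Each loop iteration costs one short extra generation on top of the prefill, and the final prompt carries the collected notes right before the question.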

📊 Results:
- An average accuracy boost of 7.5% in multi-hop reasoning tasks like HotpotQA and MultiHop-RAG.
- Even a 30% increase in F1-score for summarisation-like tasks (CWE).

Plus, WiM fits seamlessly into interactive applications (think: progress bar!). It can provide real-time progress updates during data retrieval and integration, making it user-friendly and transparent - a stark contrast to feeding 1M tokens to an LLM and waiting 6 minutes for the first token. 🤯
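
One way the progress-bar idea could look in code: a generator that yields an update after each chunk, so a UI can show how far processing has got and which notes have been found so far. This is an illustrative sketch, not the API of the linked project; `make_note` is a hypothetical stand-in for the per-chunk margin step, and the text bar is just one possible rendering.

```python
from typing import Callable, Iterator, List, Optional, Tuple


def margins_with_progress(
    chunks: List[str],
    make_note: Callable[[str], Optional[str]],  # hypothetical per-chunk step
) -> Iterator[Tuple[int, int, Optional[str]]]:
    """Yield (chunks_done, chunks_total, note) after each chunk is processed."""
    total = len(chunks)
    for done, chunk in enumerate(chunks, start=1):
        note = make_note(chunk)  # None means nothing relevant in this chunk
        yield done, total, note


def render_bar(done: int, total: int, width: int = 20) -> str:
    """Render a plain-text progress bar for `done` out of `total` chunks."""
    filled = width * done // total
    percent = 100 * done // total
    return f"[{'#' * filled}{'-' * (width - filled)}] {percent}%"


if __name__ == "__main__":
    # Toy stand-in: treat a chunk as relevant if it mentions "margin".
    demo_chunks = ["intro text", "a margin note here", "closing text"]
    toy_note = lambda c: c if "margin" in c else None
    for done, total, note in margins_with_progress(demo_chunks, toy_note):
        print(render_bar(done, total), note or "")
```

Because updates arrive per chunk, the user sees intermediate notes long before the final answer is generated.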

👩‍💻🧑‍💻 Check it out and contribute to our open-source project here: https://github.com/writer/writing-in-the-margins

🧠 More about chunked prefill: https://docs.vllm.ai/en/latest/models/performance.html#chunked-prefill