arxiv:2404.13028

When Life gives you LLMs, make LLM-ADE: Large Language Models with Adaptive Data Engineering

Published on Apr 19

Authors:

Stephen Choi, William Gazeley

Abstract

This paper presents the LLM-ADE framework, a novel methodology for continued pre-training of large language models (LLMs) that addresses the challenges of catastrophic forgetting and double descent. LLM-ADE employs dynamic architectural adjustments, including selective block freezing and expansion, tailored to specific datasets. This strategy enhances model adaptability to new data while preserving previously acquired knowledge. We demonstrate LLM-ADE's effectiveness on the TinyLlama model across various general knowledge benchmarks, showing significant performance improvements without the drawbacks of traditional continuous training methods. This approach promises a more versatile and robust way to keep LLMs current and efficient in real-world applications.
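The abstract names selective block freezing and block expansion but does not say how either is done. The following is a minimal, framework-free sketch of the general idea only, not the authors' implementation. The `Block` class, the helper names, the choice of which blocks to freeze or expand, and initializing a new block as a copy of its neighbor are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence


@dataclass
class Block:
    """Hypothetical stand-in for one transformer block."""
    name: str
    weights: List[float]
    trainable: bool = True


def freeze_blocks(blocks: Sequence[Block], frozen_idx: Sequence[int]) -> List[Block]:
    """Return copies of the blocks, marking those at frozen_idx as not trainable."""
    frozen = set(frozen_idx)
    return [
        replace(b, weights=list(b.weights), trainable=(i not in frozen))
        for i, b in enumerate(blocks)
    ]


def expand_after(blocks: Sequence[Block], positions: Sequence[int]) -> List[Block]:
    """Insert a new trainable block after each listed position.

    Assumption: the new block starts as a copy of the block it follows.
    """
    wanted = set(positions)
    out: List[Block] = []
    for i, b in enumerate(blocks):
        out.append(b)
        if i in wanted:
            out.append(Block(f"{b.name}_expanded", list(b.weights), trainable=True))
    return out


def sgd_step(blocks: Sequence[Block], grads: Sequence[Sequence[float]], lr: float = 0.1) -> None:
    """Apply one plain gradient step in place, skipping frozen blocks."""
    for b, g in zip(blocks, grads):
        if b.trainable:
            b.weights = [w - lr * gi for w, gi in zip(b.weights, g)]


# Toy 4-block model: freeze the first two blocks, then add one block at the end.
base = [Block(f"block{i}", [1.0, 1.0]) for i in range(4)]
model = expand_after(freeze_blocks(base, frozen_idx=[0, 1]), positions=[3])
```

In this sketch, frozen blocks keep their weights through every update, which is the mechanism by which freezing is meant to preserve earlier knowledge, while the added block supplies fresh capacity for the new data.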
