LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA
Abstract
Although current long-context large language models (LLMs) have demonstrated impressive capabilities in answering user questions based on extensive text, the lack of citations in their responses makes user verification difficult, raising concerns about their trustworthiness given their potential to hallucinate. In this work, we aim to enable long-context LLMs to generate responses with fine-grained sentence-level citations, improving their faithfulness and verifiability. We first introduce LongBench-Cite, an automated benchmark for assessing current LLMs' performance in Long-Context Question Answering with Citations (LQAC), which reveals considerable room for improvement. To address this, we propose CoF (Coarse to Fine), a novel pipeline that uses off-the-shelf LLMs to automatically generate long-context QA instances with precise sentence-level citations, and we leverage this pipeline to construct LongCite-45k, a large-scale SFT dataset for LQAC. Finally, we train LongCite-8B and LongCite-9B on the LongCite-45k dataset, enabling them to generate accurate responses and fine-grained sentence-level citations in a single output. Evaluation results on LongBench-Cite show that our trained models achieve state-of-the-art citation quality, surpassing advanced proprietary models including GPT-4o.
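The abstract describes models that emit an answer together with its sentence-level citations in a single output. The following is a minimal sketch of how such output could be consumed, not the LongCite implementation. It assumes three things: the long context is split into numbered sentences, each answer statement is wrapped in hypothetical `<statement>`/`<cite>` tags, and each citation is an inclusive 1-based range such as `[3-5]`. The tag format, the naive regex sentence splitter, and the helper names `split_sentences`, `parse_response`, and `resolve` are illustrative assumptions, not the API of the LongCite repository.

```python
import re
from dataclasses import dataclass

# Hypothetical citation markup (an assumption for illustration):
# <statement>answer text<cite>[i-j][k-l]</cite></statement>
# where i..j and k..l are inclusive, 1-based sentence ids in the context.
STATEMENT_RE = re.compile(r"<statement>(.*?)<cite>(.*?)</cite></statement>", re.S)
RANGE_RE = re.compile(r"\[(\d+)-(\d+)\]")


def split_sentences(context: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]


@dataclass
class CitedStatement:
    text: str
    sentence_ids: list[int]  # 1-based ids into the numbered context


def parse_response(response: str) -> list[CitedStatement]:
    """Extract each statement and expand its citation ranges into sentence ids."""
    statements = []
    for text, cites in STATEMENT_RE.findall(response):
        ids: list[int] = []
        for start, end in RANGE_RE.findall(cites):
            ids.extend(range(int(start), int(end) + 1))
        statements.append(CitedStatement(text.strip(), ids))
    return statements


def resolve(statement: CitedStatement, sentences: list[str]) -> list[str]:
    """Return the cited sentences, silently skipping ids outside the context."""
    return [sentences[i - 1] for i in statement.sentence_ids if 1 <= i <= len(sentences)]


if __name__ == "__main__":
    context = "Alice founded the lab in 2010. It studies proteins. Bob joined in 2015."
    response = (
        "<statement>The lab was founded in 2010.<cite>[1-1]</cite></statement>"
        "<statement>Bob joined later.<cite>[3-3]</cite></statement>"
    )
    sentences = split_sentences(context)
    for st in parse_response(response):
        print(st.text, "->", resolve(st, sentences))
```

Citing sentence ranges rather than whole chunks is what makes each claim checkable against a few specific sentences; a real system would likely need a more robust sentence splitter than the regex used here.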
Community
Code: https://github.com/THUDM/LongCite
Dataset: https://huggingface.co/datasets/THUDM/LongCite-45k
Models: https://huggingface.co/THUDM/LongCite-glm4-9b and https://huggingface.co/THUDM/LongCite-llama3.1-8b
Huggingface Space: https://huggingface.co/spaces/THUDM/LongCite
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Learning Fine-Grained Grounded Citations for Attributed Large Language Models (2024)
- Citekit: A Modular Toolkit for Large Language Model Citation Generation (2024)
- DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems (2024)
- Fine-grained Hallucination Detection and Mitigation in Long-form Question Answering (2024)
- Improving Retrieval Augmented Language Model with Self-Reasoning (2024)