Abstract
The dominant text generation models compose the output by sequentially selecting words from a fixed vocabulary. In this paper, we formulate text generation as progressively copying text segments (e.g., words or phrases) from an existing text collection. We compute the contextualized representations of meaningful text segments and index them using efficient vector search toolkits. The task of text generation is then decomposed into a series of copy-and-paste operations: at each time step, we seek suitable text spans from the text collection rather than selecting from a standalone vocabulary. Experiments on the standard language modeling benchmark (WikiText-103) show that our approach achieves better generation quality according to both automatic and human evaluations. Moreover, its inference efficiency is comparable to that of token-level autoregressive models thanks to the reduction in decoding steps. We also show that our approach allows for effective domain adaptation by simply switching to a domain-specific text collection without extra training. Finally, we observe that our approach attains additional performance gains by simply scaling up to larger text collections, again without further training. Our source code is publicly available: https://github.com/gmftbyGMFTBY/Copyisallyouneed
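The copy-and-paste decoding loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`encode_context`, `build_phrase_index`, `generate`) are hypothetical, a hand-built bag-of-words vector over the preceding tokens stands in for the learned contextualized representations, and a brute-force NumPy inner product stands in for a vector search toolkit. It only shows the control flow: index text segments from a collection, then at each step retrieve the best-matching segment and append it to the output.

```python
import numpy as np

def build_vocab(docs):
    """Map every whitespace token in the collection to a vector dimension."""
    tokens = sorted({tok for doc in docs for tok in doc.split()})
    return {tok: i for i, tok in enumerate(tokens)}

def encode_context(tokens, vocab, window=2):
    """Toy stand-in for a learned encoder (assumption): a normalized,
    recency-weighted bag of the last `window` tokens."""
    vec = np.zeros(len(vocab))
    for offset, tok in enumerate(reversed(tokens[-window:])):
        if tok in vocab:
            vec[vocab[tok]] += 1.0 / (offset + 1)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def build_phrase_index(docs, vocab, max_len=3, window=2):
    """Index every segment of up to `max_len` tokens, keyed by the
    context that precedes it in its source document."""
    keys, phrases = [], []
    for doc in docs:
        toks = doc.split()
        for start in range(1, len(toks)):
            # Longest segments first, so ties resolve to longer copies.
            for length in range(max_len, 0, -1):
                if start + length <= len(toks):
                    keys.append(encode_context(toks[:start], vocab, window))
                    phrases.append(toks[start:start + length])
    return np.stack(keys), phrases

def generate(prefix, keys, phrases, vocab, steps=2, window=2):
    """Greedy decoding: each step copies one retrieved segment."""
    out = prefix.split()
    for _ in range(steps):
        query = encode_context(out, vocab, window)
        scores = keys @ query  # brute-force inner-product search
        if scores.max() <= 0:
            break  # nothing in the collection matches the prefix
        # Deterministic tie-break: earliest entry among the top scores.
        best = int(np.flatnonzero(scores >= scores.max() - 1e-9)[0])
        out.extend(phrases[best])
    return " ".join(out)

docs = ["the cat sat on the mat", "the dog slept on the sofa"]
vocab = build_vocab(docs)
keys, phrases = build_phrase_index(docs, vocab)
print(generate("the cat", keys, phrases, vocab))
```

In this toy setup the two decoding steps copy a three-token segment and then a one-token segment, which is the sense in which segment-level copying reduces the number of decoding steps. Swapping `docs` for a different collection and rebuilding the index changes what can be generated without any training, loosely mirroring the domain adaptation claim.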
Community
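A very practical technique, especially in domain-specific scenarios where the output must be restricted to existing text, such as legal provisions or medical terminology. But why does it remind me of the historical period of "reciting Mao's Quotations"? [doge]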
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Plug and Play with Prompts: A Prompt Tuning Approach for Controlling Text Generation (2024)
- FlashBack: Efficient Retrieval-Augmented Language Modeling for Long Context Inference (2024)
- Visually Guided Generative Text-Layout Pre-training for Document Intelligence (2024)
- LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders (2024)
- Q-PEFT: Query-dependent Parameter Efficient Fine-tuning for Text Reranking with Large Language Models (2024)