AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling Paper • 2412.15084 • Published 7 days ago • 12
LLM Pruning and Distillation in Practice: The Minitron Approach Paper • 2408.11796 • Published Aug 21, 2024 • 57
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities Paper • 2407.14482 • Published Jul 19, 2024 • 25
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model Paper • 2211.05100 • Published Nov 9, 2022 • 27
RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models Paper • 2308.07922 • Published Aug 15, 2023 • 17
Multi-Stage Prompting for Knowledgeable Dialogue Generation Paper • 2203.08745 • Published Mar 16, 2022
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model Paper • 2201.11990 • Published Jan 28, 2022 • 1
InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining Paper • 2310.07713 • Published Oct 11, 2023 • 3
Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study Paper • 2304.06762 • Published Apr 13, 2023 • 1
Reducing Activation Recomputation in Large Transformer Models Paper • 2205.05198 • Published May 10, 2022
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM Paper • 2104.04473 • Published Apr 9, 2021
End-to-End Training of Neural Retrievers for Open-Domain Question Answering Paper • 2101.00408 • Published Jan 2, 2021
Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases Paper • 2112.07868 • Published Dec 15, 2021