Mohammad Shoeybi

shoeybi

shoeybi's activity

authored a paper 7 days ago

AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling

Paper • 2412.15084 • Published 7 days ago • 12

authored a paper 3 months ago

NVLM: Open Frontier-Class Multimodal LLMs

Paper • 2409.11402 • Published Sep 17 • 72

authored a paper 4 months ago

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21 • 57

authored a paper 5 months ago

ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities

Paper • 2407.14482 • Published Jul 19 • 25

authored 16 papers 7 months ago

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Paper • 2211.05100 • Published Nov 9, 2022 • 27

BioMegatron: Larger Biomedical Domain Language Model

Paper • 2010.06060 • Published Oct 12, 2020 • 1

RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models

Paper • 2308.07922 • Published Aug 15, 2023 • 17

Multi-Stage Prompting for Knowledgeable Dialogue Generation

Paper • 2203.08745 • Published Mar 16, 2022

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

Paper • 2201.11990 • Published Jan 28, 2022 • 1

Retrieval meets Long Context Large Language Models

Paper • 2310.03025 • Published Oct 4, 2023 • 4

InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining

Paper • 2310.07713 • Published Oct 11, 2023 • 3

Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study

Paper • 2304.06762 • Published Apr 13, 2023 • 1

VILA: On Pre-training for Visual Language Models

Paper • 2312.07533 • Published Dec 12, 2023 • 20

FP8 Formats for Deep Learning

Paper • 2209.05433 • Published Sep 12, 2022

ChatQA: Building GPT-4 Level Conversational QA Models

Paper • 2401.10225 • Published Jan 18 • 34

Reducing Activation Recomputation in Large Transformer Models

Paper • 2205.05198 • Published May 10, 2022

ODIN: Disentangled Reward Mitigates Hacking in RLHF

Paper • 2402.07319 • Published Feb 11 • 13

Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM

Paper • 2104.04473 • Published Apr 9, 2021

End-to-End Training of Neural Retrievers for Open-Domain Question Answering

Paper • 2101.00408 • Published Jan 2, 2021

Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases

Paper • 2112.07868 • Published Dec 15, 2021