Xiaosen Zheng

xszheng2020

AI & ML interests

Data-Centric AI and AI Safety.

Recent Activity

liked a dataset 20 days ago

proj-persona/PersonaHub

liked a model 21 days ago

Qwen/Qwen2.5-3B-Instruct-AWQ

liked a model 21 days ago

Qwen/Qwen2.5-1.5B-Instruct-AWQ

xszheng2020's activity

upvoted a paper 22 days ago

Sample-Efficient Alignment for LLMs

Paper • 2411.01493 • Published 25 days ago • 10

upvoted a paper about 1 month ago

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12 • 65

upvoted a collection about 1 month ago

MagpieLM

Aligning LMs with Fully Open Recipe (data+training configs+logs) • 9 items • Updated Sep 22 • 15

upvoted a paper about 1 month ago

Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch

Paper • 2410.18693 • Published Oct 24 • 40

upvoted 3 collections about 1 month ago

ScaleQuest

We introduce ScaleQuest, a scalable and novel data synthesis method. Project Page: https://scalequest.github.io/ • 8 items • Updated Oct 25 • 4

C4AI Aya Expanse

Aya Expanse is an open-weight research release of a model with highly advanced multilingual capabilities. • 3 items • Updated Oct 24 • 26

BGE

23 items • Updated Oct 24 • 63

upvoted an article about 1 month ago

SmolLM - blazingly fast and remarkably powerful

Article • Jul 16 • 272

upvoted a paper about 1 month ago

Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling

Paper • 2401.16380 • Published Jan 29 • 48

upvoted an article about 2 months ago

Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models

Article • Mar 20 • 67

upvoted a paper about 2 months ago

Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates

Paper • 2410.07137 • Published Oct 9 • 7

upvoted a paper 2 months ago

Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale

Paper • 2409.17115 • Published Sep 25 • 59

upvoted a paper 3 months ago

Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler

Paper • 2408.13359 • Published Aug 23 • 22

upvoted a collection 3 months ago

Power-LM

Dense & MoE LLMs trained with power learning rate scheduler. • 4 items • Updated Oct 17 • 15

upvoted a paper 4 months ago

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

Paper • 2408.01800 • Published Aug 3 • 78

upvoted 2 collections 4 months ago

Llama 3.1

This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated Sep 25 • 627

Model with Circuit Breakers

SoTA models with circuit breakers inserted. Top safety performance without losing capabilities. • 3 items • Updated Oct 25 • 4

upvoted a paper 4 months ago

Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies

Paper • 2407.13623 • Published Jul 18 • 52

upvoted 2 articles 5 months ago

How NuminaMath Won the 1st AIMO Progress Prize

Article • Jul 11 • 104

RegMix: Data Mixture as Regression for Language Model Pre-training

Article • Jul 11 • 10