Yaowei Zheng

hiyouga

https://github.com/hiyouga

AI & ML interests

LLM Knowledge Management

Recent Activity

liked a model 6 days ago

Skywork/Skywork-o1-Open-Llama-3.1-8B

updated a model 6 days ago

hiyouga/Qwen2-VL-7B-Pokemon

updated a dataset 6 days ago

llamafactory/OpenO1-SFT

Articles

GaLore: Advancing Large Model Training on Consumer-grade Hardware

Mar 20

• 25

Organizations

hiyouga's activity

upvoted 2 papers 19 days ago

JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation

Paper • 2411.07975 • Published 20 days ago • 26

Hyper-Connections

Paper • 2409.19606 • Published Sep 29 • 20

upvoted a paper about 1 month ago

LLM-based Optimization of Compound AI Systems: A Survey

Paper • 2410.16392 • Published Oct 21 • 13

upvoted an article about 2 months ago

Article

A Short Summary of Chinese AI Global Expansion

Oct 3

• 15

upvoted a collection 3 months ago

Qwen2.5

Collection

Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated 5 days ago • 397

upvoted an article 3 months ago

Article

Meet Yi-Coder: A Small but Mighty LLM for Code

Sep 4

• 12

upvoted a paper 3 months ago

OLMoE: Open Mixture-of-Experts Language Models

Paper • 2409.02060 • Published Sep 3 • 77

upvoted an article 3 months ago

Article

Understanding Vector Quantization in VQ-VAE

Aug 28

• 11

upvoted 3 papers 3 months ago

Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

Paper • 2408.16725 • Published Aug 29 • 52

LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models

Paper • 2409.00509 • Published Aug 31 • 38

Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts

Paper • 2408.15664 • Published Aug 28 • 11

upvoted 2 papers 4 months ago

I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm

Paper • 2408.08072 • Published Aug 15 • 32

OpenDevin: An Open Platform for AI Software Developers as Generalist Agents

Paper • 2407.16741 • Published Jul 23 • 68

upvoted 3 papers 5 months ago

upvoted 2 papers 6 months ago

Mixture-of-Agents Enhances Large Language Model Capabilities

Paper • 2406.04692 • Published Jun 7 • 55

DiveR-CT: Diversity-enhanced Red Teaming with Relaxing Constraints

Paper • 2405.19026 • Published May 29 • 7

upvoted a paper 7 months ago

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Paper • 2405.11143 • Published May 20 • 34

upvoted a collection 7 months ago

ZeroGPU Spaces

Collection

ZeroGPU Spaces made by the community • 17 items • Updated Jun 6 • 231