Xiaotian Han

xiaotianhan

https://hanxiaotian.github.io/

hanxiaotian

AI & ML interests

Multimodal LLM

Recent Activity

upvoted a paper 2 days ago

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training

authored a paper 29 days ago

DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation

liked a model 29 days ago

shallowdream204/DreamClear

Organizations

Posts 3

Post

873

🚀 Excited to announce the release of InfiMM-WebMath-40B — the largest open-source multimodal pretraining dataset designed to advance mathematical reasoning in AI! 🧮✨

With 40 billion tokens, this dataset aims to enhance the reasoning capabilities of multimodal large language models in the domain of mathematics.

If you're interested in MLLMs, AI, and math reasoning, check out our work and dataset:

🤗 HF: InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning (2409.12568)
📂 Dataset: Infi-MM/InfiMM-WebMath-40B

Post

2089

🎉 🎉 🎉 Happy to share our recent work. We noticed that image resolution plays an important role, whether in improving the performance of multimodal large language models (MLLMs) or in Sora-style any-resolution encoder-decoders. We hope this work can help lift the 224x224 resolution limit in ViT.

ViTAR: Vision Transformer with Any Resolution (2403.18361)

Papers 8

models

None public yet

datasets

None public yet