Wenhao Chai

wchai

http://rese1f.github.io

AI & ML interests

computer vision, artificial intelligence

wchai's activity

upvoted a paper 13 days ago

LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding

Paper • 2410.17434 • Published 16 days ago • 24

upvoted an article 15 days ago

Allegro: Advanced Video Generation Model

Article • 17 days ago • 55

upvoted a collection 18 days ago

Aurora Series: AuroraCap

Efficient, Performant Video Detailed Captioning and a New Benchmark • 8 items • Updated 12 days ago • 1

upvoted 4 papers about 1 month ago

Selective Attention Improves Transformer

Paper • 2410.02703 • Published Oct 3 • 22

AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

Paper • 2410.03051 • Published Oct 4 • 3

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

Paper • 2410.02073 • Published Oct 2 • 40

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning

Paper • 2409.20566 • Published Sep 30 • 51

upvoted an article about 1 month ago

Llama can now see and run on your device - welcome Llama 3.2

Article • Sep 25 • 163

upvoted a collection about 1 month ago

LLaVA-Onevision

LLaVa_Onevision models for single-image, multi-image, and video scenarios • 9 items • Updated Sep 18 • 11

upvoted 4 papers about 2 months ago

Prithvi WxC: Foundation Model for Weather and Climate

Paper • 2409.13598 • Published Sep 20 • 37

CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets

Paper • 2406.13897 • Published May 30 • 12

InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning

Paper • 2409.12568 • Published Sep 19 • 47

See and Think: Embodied Agent in Virtual Environment

Paper • 2311.15209 • Published Nov 26, 2023 • 2

upvoted a collection about 2 months ago

Video Caption

Based on AuroraCap • 3 items • Updated Sep 20 • 1

upvoted a paper 6 months ago

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

Paper • 2403.15377 • Published Mar 22 • 22

upvoted a collection 7 months ago

Meta Llama 3

This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Sep 25 • 680

upvoted 2 papers 7 months ago

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Paper • 2404.07972 • Published Apr 11 • 44

sDPO: Don't Use Your Data All at Once

Paper • 2403.19270 • Published Mar 28 • 39

upvoted a paper 9 months ago

Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models

Paper • 2402.07865 • Published Feb 12 • 12

upvoted a collection 9 months ago

Qwen1.5

Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. • 55 items • Updated Sep 18 • 206