Xiaohan Wang

nicholswang

nicholswang's activity

upvoted a paper 8 days ago

Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration

Paper • 2412.13180 • Published 9 days ago • 12

authored 9 papers 10 days ago

Action Sensitivity Learning for Temporal Action Localization

Paper • 2305.15701 • Published May 25, 2023

Whitening-based Contrastive Learning of Sentence Embeddings

Paper • 2305.17746 • Published May 28, 2023

Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models

Paper • 2305.18010 • Published May 29, 2023

Describing Differences in Image Sets with Natural Language

Paper • 2312.02974 • Published Dec 5, 2023 • 13

Clustering based Point Cloud Representation Learning for 3D Analysis

Paper • 2307.14605 • Published Jul 27, 2023

JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery

Paper • 2307.16377 • Published Jul 31, 2023

Bird's-Eye-View Scene Graph for Vision-Language Navigation

Paper • 2308.04758 • Published Aug 9, 2023

VideoAgent: Long-form Video Understanding with Large Language Model as Agent

Paper • 2403.10517 • Published Mar 15 • 32

Why are Visually-Grounded Language Models Bad at Image Classification?

Paper • 2405.18415 • Published May 28

authored a paper 11 days ago

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published 13 days ago • 131

upvoted a paper 11 days ago

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published 13 days ago • 131

upvoted 8 papers 3 months ago

One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

Paper • 2409.19603 • Published Sep 29 • 18

Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation

Paper • 2410.00890 • Published Oct 1 • 18

SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs

Paper • 2410.00337 • Published Oct 1 • 10

Embodied-RAG: General non-parametric Embodied Memory for Retrieval and Generation

Paper • 2409.18313 • Published Sep 26 • 3

What the Harm? Quantifying the Tangible Impact of Gender Bias in Machine Translation with a Human-centered Study

Paper • 2410.00545 • Published Oct 1 • 5

Illustrious: an Open Advanced Illustration Model

Paper • 2409.19946 • Published Sep 30 • 13

ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer

Paper • 2410.00086 • Published Sep 30 • 10

Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration

Paper • 2410.00418 • Published Oct 1 • 9