Xin Li

lixin4ever

https://lixin4ever.github.io/

AI & ML interests

Natural Language Processing, Machine Learning

Recent Activity

upvoted a collection 1 day ago

Inf-CL

liked a dataset 5 days ago

longvideobench/LongVideoBench

liked a model 9 days ago

InstantX/FLUX.1-dev-IP-Adapter

lixin4ever's activity

upvoted a collection 1 day ago

Inf-CL

Collection

The corresponding demos/checkpoints/papers/datasets of Inf-CL. • 2 items • Updated Oct 25 • 3

upvoted a collection 21 days ago

OpenCoder Datasets

Collection

OpenCoder datasets! • 6 items • Updated 16 days ago • 37

upvoted a paper 26 days ago

TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models

Paper • 2410.23266 • Published Oct 30 • 19

upvoted 4 papers about 1 month ago

Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss

Paper • 2410.17243 • Published Oct 22 • 88

Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective

Paper • 2410.12490 • Published Oct 16 • 8

Movie Gen: A Cast of Media Foundation Models

Paper • 2410.13720 • Published Oct 17 • 89

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

Paper • 2410.12787 • Published Oct 16 • 30

upvoted 2 papers about 2 months ago

Differential Transformer

Paper • 2410.05258 • Published Oct 7 • 166

Video Instruction Tuning With Synthetic Data

Paper • 2410.02713 • Published Oct 3 • 37

upvoted 2 papers 2 months ago

Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published Sep 18 • 136

A Controlled Study on Long Context Extension and Generalization in LLMs

Paper • 2409.12181 • Published Sep 18 • 43

upvoted a paper 3 months ago

Language Model Can Listen While Speaking

Paper • 2408.02622 • Published Aug 5 • 37

upvoted a paper 4 months ago

SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages

Paper • 2407.19672 • Published Jul 29 • 55

upvoted 4 papers 6 months ago

3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

Paper • 2406.05132 • Published Jun 7 • 27

Depth Anything V2

Paper • 2406.09414 • Published Jun 13 • 92

What If We Recaption Billions of Web Images with LLaMA-3?

Paper • 2406.08478 • Published Jun 12 • 39

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Paper • 2406.07476 • Published Jun 11 • 32

upvoted a paper 10 months ago

Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization

Paper • 2402.03161 • Published Feb 5 • 14

upvoted 2 papers 11 months ago

A Vision Check-up for Language Models

Paper • 2401.01862 • Published Jan 3 • 9

VideoPoet: A Large Language Model for Zero-Shot Video Generation

Paper • 2312.14125 • Published Dec 21, 2023 • 44