Byung-Kwan Lee's picture

Byung-Kwan Lee

BK-Lee

·

https://sites.google.com/view/byungkwanlee

AI & ML interests

Computer Vision, Machine Learning, Large Language and Vision Models, Efficient Modeling

Organizations

BK-Lee's activity

upvoted a paper 21 days ago

FlatQuant: Flatness Matters for LLM Quantization

Paper • 2410.09426 • Published 29 days ago • 12

upvoted 3 papers 30 days ago

Pixtral 12B

Paper • 2410.07073 • Published Oct 9 • 59

Intriguing Properties of Large Language and Vision Models

Paper • 2410.04751 • Published Oct 7 • 16

MM-Ego: Towards Building Egocentric Multimodal LLMs

Paper • 2410.07177 • Published Oct 9 • 20

upvoted 5 papers about 1 month ago

VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models

Paper • 2409.17066 • Published Sep 25 • 27

MIO: A Foundation Model on Multimodal Tokens

Paper • 2409.17692 • Published Sep 26 • 49

Emu3: Next-Token Prediction is All You Need

Paper • 2409.18869 • Published Sep 27 • 89

MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models

Paper • 2409.17481 • Published Sep 26 • 46

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25 • 99

upvoted 5 papers about 2 months ago

MaskBit: Embedding-free Image Generation via Bit Tokens

Paper • 2409.16211 • Published Sep 24 • 16

Phantom of Latent for Large Language and Vision Models

Paper • 2409.14713 • Published Sep 23 • 27

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published Sep 18 • 73

NVLM: Open Frontier-Class Multimodal LLMs

Paper • 2409.11402 • Published Sep 17 • 71

Towards a Unified View of Preference Learning for Large Language Models: A Survey

Paper • 2409.02795 • Published Sep 4 • 72

upvoted a paper 2 months ago

Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

Paper • 2408.16725 • Published Aug 29 • 52

upvoted 5 papers 3 months ago

SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models

Paper • 2408.12114 • Published Aug 22 • 11

ShortCircuit: AlphaZero-Driven Circuit Design

Paper • 2408.09858 • Published Aug 19 • 16

Segment Anything with Multiple Modalities

Paper • 2408.09085 • Published Aug 17 • 21

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19 • 51

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Paper • 2408.08872 • Published Aug 16 • 97