PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models Paper • 2402.08714 • Published Feb 13, 2024 • 10
Data Engineering for Scaling Language Models to 128K Context Paper • 2402.10171 • Published Feb 15, 2024 • 21
RLVF: Learning from Verbal Feedback without Overgeneralization Paper • 2402.10893 • Published Feb 16, 2024 • 10
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement Paper • 2402.14658 • Published Feb 22, 2024 • 82
TinyLLaVA: A Framework of Small-scale Large Multimodal Models Paper • 2402.14289 • Published Feb 22, 2024 • 19
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper • 2402.13753 • Published Feb 21, 2024 • 111
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts Paper • 2402.16822 • Published Feb 26, 2024 • 15
Beyond Language Models: Byte Models are Digital World Simulators Paper • 2402.19155 • Published Feb 29, 2024 • 49
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters Paper • 2403.02677 • Published Mar 5, 2024 • 16
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs Paper • 2402.11753 • Published Feb 19, 2024 • 5
How Far Are We from Intelligent Visual Deductive Reasoning? Paper • 2403.04732 • Published Mar 7, 2024 • 18
Evaluating and Mitigating Discrimination in Language Model Decisions Paper • 2312.03689 • Published Dec 6, 2023 • 1
PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits Paper • 2305.02547 • Published May 4, 2023 • 7
LLM Task Interference: An Initial Study on the Impact of Task-Switch in Conversational History Paper • 2402.18216 • Published Feb 28, 2024 • 1
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation Paper • 2403.16990 • Published Mar 25, 2024 • 25
Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models Paper • 2403.20331 • Published Mar 29, 2024 • 14
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model Paper • 2404.01331 • Published Mar 29, 2024 • 25
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3, 2024 • 64
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding Paper • 2404.05726 • Published Apr 8, 2024 • 20
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Paper • 2404.07143 • Published Apr 10, 2024 • 103
Lost in Translation: Modern Neural Networks Still Struggle With Small Realistic Image Transformations Paper • 2404.07153 • Published Apr 10, 2024 • 1
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback Paper • 2404.07987 • Published Apr 11, 2024 • 47
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models Paper • 2404.07973 • Published Apr 11, 2024 • 30
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model Paper • 2404.09967 • Published Apr 15, 2024 • 20
TextSquare: Scaling up Text-Centric Visual Instruction Tuning Paper • 2404.12803 • Published Apr 19, 2024 • 29
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models Paper • 2404.13013 • Published Apr 19, 2024 • 30
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation Paper • 2404.14396 • Published Apr 22, 2024 • 18
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study Paper • 2404.14047 • Published Apr 22, 2024 • 44
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models Paper • 2404.14507 • Published Apr 22, 2024 • 21
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation Paper • 2404.19427 • Published Apr 30, 2024 • 71
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report Paper • 2405.00732 • Published Apr 29, 2024 • 118
Observational Scaling Laws and the Predictability of Language Model Performance Paper • 2405.10938 • Published May 17, 2024 • 11
Diffusion for World Modeling: Visual Details Matter in Atari Paper • 2405.12399 • Published May 20, 2024 • 27
LANISTR: Multimodal Learning from Structured and Unstructured Data Paper • 2305.16556 • Published May 26, 2023 • 2
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts Paper • 2405.11273 • Published May 18, 2024 • 17
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training Paper • 2405.15319 • Published May 24, 2024 • 25
MotionLLM: Understanding Human Behaviors from Human Motions and Videos Paper • 2405.20340 • Published May 30, 2024 • 19
LLMs achieve adult human performance on higher-order theory of mind tasks Paper • 2405.18870 • Published May 29, 2024 • 16
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback Paper • 2406.00888 • Published Jun 2, 2024 • 30
PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM Paper • 2406.02884 • Published Jun 5, 2024 • 14
Block Transformer: Global-to-Local Language Modeling for Fast Inference Paper • 2406.02657 • Published Jun 4, 2024 • 36
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs Paper • 2406.07476 • Published Jun 11, 2024 • 32
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos Paper • 2406.08407 • Published Jun 12, 2024 • 24
Large Language Model Unlearning via Embedding-Corrupted Prompts Paper • 2406.07933 • Published Jun 12, 2024 • 7
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels Paper • 2406.09415 • Published Jun 13, 2024 • 50
Make It Count: Text-to-Image Generation with an Accurate Number of Objects Paper • 2406.10210 • Published Jun 14, 2024 • 76
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning Paper • 2406.08973 • Published Jun 13, 2024 • 85
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs Paper • 2406.11833 • Published Jun 17, 2024 • 61
mDPO: Conditional Preference Optimization for Multimodal Large Language Models Paper • 2406.11839 • Published Jun 17, 2024 • 37
How Do Large Language Models Acquire Factual Knowledge During Pretraining? Paper • 2406.11813 • Published Jun 17, 2024 • 30
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published Jun 12, 2024 • 65
Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation Paper • 2406.12849 • Published Jun 18, 2024 • 49
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts Paper • 2406.12034 • Published Jun 17, 2024 • 13
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing Paper • 2406.10601 • Published Jun 15, 2024 • 65
A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems Paper • 2406.14972 • Published Jun 21, 2024 • 7
EvTexture: Event-driven Texture Enhancement for Video Super-Resolution Paper • 2406.13457 • Published Jun 19, 2024 • 16
PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers Paper • 2406.12430 • Published Jun 18, 2024 • 7
Weight subcloning: direct initialization of transformers using larger pretrained ones Paper • 2312.09299 • Published Dec 14, 2023 • 17
Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA Paper • 2406.17419 • Published Jun 25, 2024 • 16
Large Language Models Assume People are More Rational than We Really are Paper • 2406.17055 • Published Jun 24, 2024 • 4
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation Paper • 2406.16855 • Published Jun 24, 2024 • 54
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models Paper • 2406.16338 • Published Jun 24, 2024 • 25
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding Paper • 2406.19389 • Published Jun 27, 2024 • 51
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy Paper • 2406.20095 • Published Jun 28, 2024 • 17
MagMax: Leveraging Model Merging for Seamless Continual Learning Paper • 2407.06322 • Published Jul 8, 2024 • 1
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models Paper • 2407.09025 • Published Jul 12, 2024 • 128
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training Paper • 2407.09121 • Published Jul 12, 2024 • 5
Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data Paper • 2407.08726 • Published Jul 11, 2024 • 8
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception Paper • 2407.08303 • Published Jul 11, 2024 • 17
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? Paper • 2407.11963 • Published Jul 16, 2024 • 43
How do Large Language Models Navigate Conflicts between Honesty and Helpfulness? Paper • 2402.07282 • Published Feb 11, 2024 • 1
One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts Paper • 2407.00256 • Published Jun 28, 2024 • 1
Provably Robust DPO: Aligning Language Models with Noisy Feedback Paper • 2403.00409 • Published Mar 1, 2024 • 1
Text2SQL is Not Enough: Unifying AI and Databases with TAG Paper • 2408.14717 • Published Aug 27, 2024 • 23
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance Paper • 2409.04593 • Published Sep 6, 2024 • 22