Interesting Papers - a oaishi Collection

oaishi's Collections

Interesting Papers

updated 5 days ago

SLiMe: Segment Like Me

Paper • 2309.03179 • Published Sep 6, 2023 • 29
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning

Paper • 2309.02591 • Published Sep 5, 2023 • 14
Efficient Memory Management for Large Language Model Serving with PagedAttention

Paper • 2309.06180 • Published Sep 12, 2023 • 25
LEAP Hand: Low-Cost, Efficient, and Anthropomorphic Hand for Robot Learning

Paper • 2309.06440 • Published Sep 12, 2023 • 9
InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation

Paper • 2309.06380 • Published Sep 12, 2023 • 32
PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models

Paper • 2309.05793 • Published Sep 11, 2023 • 50
AstroLLaMA: Towards Specialized Foundation Models in Astronomy

Paper • 2309.06126 • Published Sep 12, 2023 • 16
Stabilizing RLHF through Advantage Model and Selective Rehearsal

Paper • 2309.10202 • Published Sep 18, 2023 • 9
Multimodal Foundation Models: From Specialists to General-Purpose Assistants

Paper • 2309.10020 • Published Sep 18, 2023 • 40
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions

Paper • 2309.10150 • Published Sep 18, 2023 • 24
A Large-scale Dataset for Audio-Language Representation Learning

Paper • 2309.11500 • Published Sep 20, 2023 • 9
LMDX: Language Model-based Document Information Extraction and Localization

Paper • 2309.10952 • Published Sep 19, 2023 • 63
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

Paper • 2309.11998 • Published Sep 21, 2023 • 24
Calibrating LLM-Based Evaluator

Paper • 2309.13308 • Published Sep 23, 2023 • 11
Exploring Large Language Models' Cognitive Moral Development through Defining Issues Test

Paper • 2309.13356 • Published Sep 23, 2023 • 36
PaLI-3 Vision Language Models: Smaller, Faster, Stronger

Paper • 2310.09199 • Published Oct 13, 2023 • 24
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Paper • 2310.11511 • Published Oct 17, 2023 • 74
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models

Paper • 2310.13671 • Published Oct 20, 2023 • 18
TiC-CLIP: Continual Training of CLIP Models

Paper • 2310.16226 • Published Oct 24, 2023 • 8
RoboVQA: Multimodal Long-Horizon Reasoning for Robotics

Paper • 2311.00899 • Published Nov 1, 2023 • 7
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation

Paper • 2311.01455 • Published Nov 2, 2023 • 28
FlashDecoding++: Faster Large Language Model Inference on GPUs

Paper • 2311.01282 • Published Nov 2, 2023 • 35
OtterHD: A High-Resolution Multi-modality Model

Paper • 2311.04219 • Published Nov 7, 2023 • 31
PEARL: Personalizing Large Language Model Writing Assistants with Generation-Calibrated Retrievers

Paper • 2311.09180 • Published Nov 15, 2023 • 7
Faithful Persona-based Conversational Dataset Generation with Large Language Models

Paper • 2312.10007 • Published Dec 15, 2023 • 6
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

Paper • 2312.09390 • Published Dec 14, 2023 • 32
StarVector: Generating Scalable Vector Graphics Code from Images

Paper • 2312.11556 • Published Dec 17, 2023 • 27
AppAgent: Multimodal Agents as Smartphone Users

Paper • 2312.13771 • Published Dec 21, 2023 • 51
Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support

Paper • 2401.14688 • Published Jan 26 • 13
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

Paper • 2401.15071 • Published Jan 26 • 34
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception

Paper • 2401.16158 • Published Jan 29 • 17
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding

Paper • 2402.16671 • Published Feb 26 • 26
MyVLM: Personalizing VLMs for User-Specific Queries

Paper • 2403.14599 • Published Mar 21 • 15
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Paper • 2404.07972 • Published Apr 11 • 43
The Rise and Potential of Large Language Model Based Agents: A Survey

Paper • 2309.07864 • Published Sep 14, 2023 • 5
The Prompt Report: A Systematic Survey of Prompting Techniques

Paper • 2406.06608 • Published Jun 6 • 52
INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding

Paper • 2409.06210 • Published 10 days ago • 24
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance

Paper • 2409.04593 • Published 13 days ago • 19
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

Paper • 2409.08264 • Published 7 days ago • 39