ChaangHaan's Collections
aMUSEd: An Open MUSE Reproduction • Paper • 2401.01808 • Published • 28
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations • Paper • 2401.01885 • Published • 27
SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity • Paper • 2401.00604 • Published • 4
LARP: Language-Agent Role Play for Open-World Games • Paper • 2312.17653 • Published • 30
Learning Vision from Models Rivals Learning Vision from Data • Paper • 2312.17742 • Published • 15
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones • Paper • 2312.16862 • Published • 30
City-on-Web: Real-time Neural Rendering of Large-scale Scenes on the Web • Paper • 2312.16457 • Published • 13
InsActor: Instruction-driven Physics-based Characters • Paper • 2312.17135 • Published • 9
PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion • Paper • 2312.16486 • Published • 6
SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation • Paper • 2312.16272 • Published • 6
Prompt Expansion for Adaptive Text-to-Image Generation • Paper • 2312.16720 • Published • 5
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling • Paper • 2312.15166 • Published • 56
Make-A-Character: High Quality Text-to-3D Character Generation within Minutes • Paper • 2312.15430 • Published • 28
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces • Paper • 2312.15715 • Published • 19
LangSplat: 3D Language Gaussian Splatting • Paper • 2312.16084 • Published • 14
One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications • Paper • 2312.16145 • Published • 8
Supervised Knowledge Makes Large Language Models Better In-context Learners • Paper • 2312.15918 • Published • 8
VCoder: Versatile Vision Encoders for Multimodal Large Language Models • Paper • 2312.14233 • Published • 15
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks • Paper • 2312.14238 • Published • 14
Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning • Paper • 2312.14878 • Published • 13
Generative AI Beyond LLMs: System Implications of Multi-Modal Generation • Paper • 2312.14385 • Published • 5
Shai: A large language model for asset management • Paper • 2312.14203 • Published • 4
LLM4VG: Large Language Models Evaluation for Video Grounding • Paper • 2312.14206 • Published • 2
DreamTuner: Single Image is Enough for Subject-Driven Generation • Paper • 2312.13691 • Published • 26
Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models • Paper • 2312.13913 • Published • 22
Time is Encoded in the Weights of Finetuned Language Models • Paper • 2312.13401 • Published • 19
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models • Paper • 2312.13964 • Published • 18
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models • Paper • 2312.14091 • Published • 15
TinySAM: Pushing the Envelope for Efficient Segment Anything Model • Paper • 2312.13789 • Published • 13
Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning • Paper • 2312.13980 • Published • 13
Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation • Paper • 2312.13469 • Published • 10
Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models • Paper • 2312.13763 • Published • 9
ShowRoom3D: Text to High-Quality 3D Room Generation Using 3D Priors • Paper • 2312.13324 • Published • 9
Unlocking Pre-trained Image Backbones for Semantic Image Synthesis • Paper • 2312.13314 • Published • 7
HeadCraft: Modeling High-Detail Shape Variations for Animated 3DMMs • Paper • 2312.14140 • Published • 6
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU • Paper • 2312.12456 • Published • 41
Generative Multimodal Models are In-Context Learners • Paper • 2312.13286 • Published • 34
Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model • Paper • 2312.13252 • Published • 27
InstructVideo: Instructing Video Diffusion Models with Human Feedback • Paper • 2312.12490 • Published • 17
Cached Transformers: Improving Transformers with Differentiable Memory Cache • Paper • 2312.12742 • Published • 12
Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting • Paper • 2312.13271 • Published • 4
LLM in a flash: Efficient Large Language Model Inference with Limited Memory • Paper • 2312.11514 • Published • 258
StarVector: Generating Scalable Vector Graphics Code from Images • Paper • 2312.11556 • Published • 27
3D-LFM: Lifting Foundation Model • Paper • 2312.11894 • Published • 13
HAAR: Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles • Paper • 2312.11666 • Published • 12
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model • Paper • 2312.12423 • Published • 12
MixRT: Mixed Neural Representations For Real-Time NeRF Rendering • Paper • 2312.11841 • Published • 10
Tracking Any Object Amodally • Paper • 2312.12433 • Published • 11
FastSR-NeRF: Improving NeRF Efficiency on Consumer Devices with A Simple Super-Resolution Pipeline • Paper • 2312.11537 • Published • 6
TIP: Text-Driven Image Processing with Semantic and Restoration Instructions • Paper • 2312.11595 • Published • 5
Text-Conditioned Resampler For Long Form Video Understanding • Paper • 2312.11897 • Published • 5
Topic-VQ-VAE: Leveraging Latent Codebooks for Flexible Topic-Guided Document Generation • Paper • 2312.11532 • Published • 5
Customize-It-3D: High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior • Paper • 2312.11535 • Published • 6
Towards Accurate Guided Diffusion Sampling through Symplectic Adjoint Method • Paper • 2312.12030 • Published • 4
VecFusion: Vector Font Generation with Diffusion • Paper • 2312.10540 • Published • 21
Rich Human Feedback for Text-to-Image Generation • Paper • 2312.10240 • Published • 19
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing • Paper • 2312.11392 • Published • 19
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model • Paper • 2312.11370 • Published • 20
M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts • Paper • 2312.10763 • Published • 18
GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning • Paper • 2312.11461 • Published • 18
MagicScroll: Nontypical Aspect-Ratio Image Generation for Visual Storytelling via Multi-Layered Semantic-Aware Denoising • Paper • 2312.10899 • Published • 14
MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance • Paper • 2312.11396 • Published • 10
Cascade Speculative Drafting for Even Faster LLM Inference • Paper • 2312.11462 • Published • 8
Silkie: Preference Distillation for Large Visual Language Models • Paper • 2312.10665 • Published • 11
VidToMe: Video Token Merging for Zero-Shot Video Editing • Paper • 2312.10656 • Published • 10
ProTIP: Progressive Tool Retrieval Improves Planning • Paper • 2312.10332 • Published • 7
Your Student is Better Than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models • Paper • 2312.10835 • Published • 6
VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder • Paper • 2312.11459 • Published • 5
GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel View Synthesis • Paper • 2312.11458 • Published • 4
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit • Paper • 2312.09911 • Published • 53
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent • Paper • 2312.10003 • Published • 35
DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models • Paper • 2312.09767 • Published • 25
MobileSAMv2: Faster Segment Anything to Everything • Paper • 2312.09579 • Published • 20
Point Transformer V3: Simpler, Faster, Stronger • Paper • 2312.10035 • Published • 17
Weight subcloning: direct initialization of transformers using larger pretrained ones • Paper • 2312.09299 • Published • 17
Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models • Paper • 2312.09608 • Published • 13
Self-Evaluation Improves Selective Generation in Large Language Models • Paper • 2312.09300 • Published • 14
Stable Score Distillation for High-Quality 3D Generation • Paper • 2312.09305 • Published • 7
Faithful Persona-based Conversational Dataset Generation with Large Language Models • Paper • 2312.10007 • Published • 6
StemGen: A music generation model that listens • Paper • 2312.08723 • Published • 47
TinyGSM: achieving >80% on GSM8k with small language models • Paper • 2312.09241 • Published • 37
CogAgent: A Visual Language Model for GUI Agents • Paper • 2312.08914 • Published • 29
VideoLCM: Video Latent Consistency Model • Paper • 2312.09109 • Published • 22
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions • Paper • 2312.08578 • Published • 16
Pixel Aligned Language Models • Paper • 2312.09237 • Published • 14
SEEAvatar: Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance • Paper • 2312.08889 • Published • 11
Vision-Language Models as a Source of Rewards • Paper • 2312.09187 • Published • 11
FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection • Paper • 2312.09252 • Published • 9
Holodeck: Language Guided Generation of 3D Embodied AI Environments • Paper • 2312.09067 • Published • 13
LIME: Localized Image Editing via Attention Regularization in Diffusion Models • Paper • 2312.09256 • Published • 8
General Object Foundation Model for Images and Videos at Scale • Paper • 2312.09158 • Published • 8
UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation • Paper • 2312.08754 • Published • 6
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation • Paper • 2312.09251 • Published • 6
SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds • Paper • 2312.09246 • Published • 5
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention • Paper • 2312.07987 • Published • 40
Distributed Inference and Fine-tuning of Large Language Models Over The Internet • Paper • 2312.08361 • Published • 25
CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor • Paper • 2312.07661 • Published • 16
Foundation Models in Robotics: Applications, Challenges, and the Future • Paper • 2312.07843 • Published • 14
Invariant Graph Transformer • Paper • 2312.07859 • Published • 6
FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects • Paper • 2312.08344 • Published • 9
ProNeRF: Learning Efficient Projection-Aware Ray Sampling for Fine-Grained Implicit Neural Radiance Fields • Paper • 2312.08136 • Published • 3
FreeInit: Bridging Initialization Gap in Video Diffusion Models • Paper • 2312.07537 • Published • 26
VILA: On Pre-training for Visual Language Models • Paper • 2312.07533 • Published • 20
FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition • Paper • 2312.07536 • Published • 16
Interfacing Foundation Models' Embeddings • Paper • 2312.07532 • Published • 10
CCM: Adding Conditional Controls to Text-to-Image Consistency Models • Paper • 2312.06971 • Published • 10
Steering Llama 2 via Contrastive Activation Addition • Paper • 2312.06681 • Published • 11
Honeybee: Locality-enhanced Projector for Multimodal LLM • Paper • 2312.06742 • Published • 9
Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation • Paper • 2312.07231 • Published • 6
PEEKABOO: Interactive Video Generation via Masked-Diffusion • Paper • 2312.07509 • Published • 7
"I Want It That Way": Enabling Interactive Decision Support Using Large Language Models and Constraint Programming • Paper • 2312.06908 • Published • 5
LLM360: Towards Fully Transparent Open-Source LLMs • Paper • 2312.06550 • Published • 56
Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior • Paper • 2312.06655 • Published • 23
Photorealistic Video Generation with Diffusion Models • Paper • 2312.06662 • Published • 23
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models • Paper • 2312.06109 • Published • 20
Context Tuning for Retrieval Augmented Generation • Paper • 2312.05708 • Published • 16
From Text to Motion: Grounding GPT-4 in a Humanoid Robot "Alter3" • Paper • 2312.06571 • Published • 12
Efficient Quantization Strategies for Latent Diffusion Models • Paper • 2312.05431 • Published • 11
Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes • Paper • 2312.06353 • Published • 5
Evaluation of Large Language Models for Decision Making in Autonomous Driving • Paper • 2312.06351 • Published • 5
Using Captum to Explain Generative Language Models • Paper • 2312.05491 • Published • 3
TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing • Paper • 2312.05605 • Published • 1
DreaMoving: A Human Dance Video Generation Framework based on Diffusion Models • Paper • 2312.05107 • Published • 38
ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations • Paper • 2312.04655 • Published • 20
Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors • Paper • 2312.04963 • Published • 16
Customizing Motion in Text-to-Video Diffusion Models • Paper • 2312.04966 • Published • 10
PathFinder: Guided Search over Multi-Step Reasoning Paths • Paper • 2312.05180 • Published • 9
MVDD: Multi-View Depth Diffusion Models • Paper • 2312.04875 • Published • 9
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism • Paper • 2312.04916 • Published • 6
Localized Symbolic Knowledge Distillation for Visual Commonsense Models • Paper • 2312.04837 • Published • 2
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want • Paper • 2312.03818 • Published • 32
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator • Paper • 2312.04474 • Published • 29
Controllable Human-Object Interaction Synthesis • Paper • 2312.03913 • Published • 22
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators • Paper • 2312.03793 • Published • 17
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding • Paper • 2312.04461 • Published • 57
Pearl: A Production-ready Reinforcement Learning Agent • Paper • 2312.03814 • Published • 14
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models • Paper • 2312.04410 • Published • 14
GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation • Paper • 2312.04557 • Published • 12
NeRFiller: Completing Scenes via Generative 3D Inpainting • Paper • 2312.04560 • Published • 11
Large Language Models for Mathematicians • Paper • 2312.04556 • Published • 11
Gen2Det: Generate to Detect • Paper • 2312.04566 • Published • 9
Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation • Paper • 2312.04483 • Published • 7
Efficient Monotonic Multihead Attention • Paper • 2312.04515 • Published • 6
Generating Illustrated Instructions • Paper • 2312.04552 • Published • 7
LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning • Paper • 2312.03849 • Published • 5
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis • Paper • 2312.03491 • Published • 34
Relightable Gaussian Codec Avatars • Paper • 2312.03704 • Published • 29
Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians • Paper • 2312.03029 • Published • 23
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation • Paper • 2312.03641 • Published • 20
Cache Me if You Can: Accelerating Diffusion Models through Block Caching • Paper • 2312.03209 • Published • 17
HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting • Paper • 2312.03461 • Published • 15
Context Diffusion: In-Context Aware Image Generation • Paper • 2312.03584 • Published • 14
LooseControl: Lifting ControlNet for Generalized Depth Conditioning • Paper • 2312.03079 • Published • 12
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions • Paper • 2312.03611 • Published • 7
MagicStick: Controllable Video Editing via Control Handle Transformations • Paper • 2312.03047 • Published • 9
Self-conditioned Image Generation via Generating Representations • Paper • 2312.03701 • Published • 7
Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia • Paper • 2312.03664 • Published • 8
Language-Informed Visual Concept Learning • Paper • 2312.03587 • Published • 5
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model • Paper • 2312.02238 • Published • 25
LivePhoto: Real Image Animation with Text-guided Motion Control • Paper • 2312.02928 • Published • 16
Describing Differences in Image Sets with Natural Language • Paper • 2312.02974 • Published • 13
Orthogonal Adaptation for Modular Customization of Diffusion Models • Paper • 2312.02432 • Published • 12
DragVideo: Interactive Drag-style Video Editing • Paper • 2312.02216 • Published • 10
MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures • Paper • 2312.02963 • Published • 9
Fine-grained Controllable Video Generation via Object Appearance and Context • Paper • 2312.02919 • Published • 10
ReconFusion: 3D Reconstruction with Diffusion Priors • Paper • 2312.02981 • Published • 8
Training Chain-of-Thought via Latent-Variable Inference • Paper • 2312.02179 • Published • 8
Alchemist: Parametric Control of Material Properties with Diffusion Models • Paper • 2312.02970 • Published • 7
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models • Paper • 2312.02949 • Published • 11
GPT4Point: A Unified Framework for Point-Language Understanding and Generation • Paper • 2312.02980 • Published • 7
Generating Fine-Grained Human Motions Using ChatGPT-Refined Descriptions • Paper • 2312.02772 • Published • 6
Magicoder: Source Code Is All You Need • Paper • 2312.02120 • Published • 79
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models • Paper • 2312.00845 • Published • 36
DeepCache: Accelerating Diffusion Models for Free • Paper • 2312.00858 • Published • 21
Nash Learning from Human Feedback • Paper • 2312.00886 • Published • 14
DiffiT: Diffusion Vision Transformers for Image Generation • Paper • 2312.02139 • Published • 13
GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis • Paper • 2312.02155 • Published • 12
Object Recognition as Next Token Prediction • Paper • 2312.02142 • Published • 11
GIVT: Generative Infinite-Vocabulary Transformers • Paper • 2312.02116 • Published • 10
Paper • 2312.00860 • Published • 8
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback • Paper • 2312.00849 • Published • 8
Style Aligned Image Generation via Shared Attention • Paper • 2312.02133 • Published • 8
Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models • Paper • 2312.01409 • Published • 8
VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams • Paper • 2312.01407 • Published • 6
Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training • Paper • 2312.01663 • Published • 3
Mamba: Linear-Time Sequence Modeling with Selective State Spaces • Paper • 2312.00752 • Published • 138
Merlin:Empowering Multimodal LLMs with Foresight Minds • Paper • 2312.00589 • Published • 24
VideoBooth: Diffusion-based Video Generation with Image Prompts • Paper • 2312.00777 • Published • 21
SeaLLMs -- Large Language Models for Southeast Asia • Paper • 2312.00738 • Published • 23
MoMask: Generative Masked Modeling of 3D Human Motions • Paper • 2312.00063 • Published • 15
GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs • Paper • 2312.00093 • Published • 14
HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models • Paper • 2312.00079 • Published • 14
Dolphins: Multimodal Language Model for Driving • Paper • 2312.00438 • Published • 12
Instruction-tuning Aligns LLMs to the Human Brain • Paper • 2312.00575 • Published • 11
StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter • Paper • 2312.00330 • Published • 10
Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering • Paper • 2312.00109 • Published • 9
PyNeRF: Pyramidal Neural Radiance Fields • Paper • 2312.00252 • Published • 8
Towards Accurate Differential Diagnosis with Large Language Models • Paper • 2312.00164 • Published • 8
FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting • Paper • 2312.00451 • Published • 9
Text-Guided 3D Face Synthesis -- From Generation to Editing • Paper • 2312.00375 • Published • 8
X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation • Paper • 2312.00085 • Published • 6
FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline • Paper • 2311.13073 • Published • 56
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes • Paper • 2311.13384 • Published • 50
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs • Paper • 2311.13600 • Published • 42
Diffusion Model Alignment Using Direct Preference Optimization • Paper • 2311.12908 • Published • 47
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model • Paper • 2311.13231 • Published • 26
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models • Paper • 2311.13435 • Published • 16
Visual In-Context Prompting • Paper • 2311.13601 • Published • 16
Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models • Paper • 2311.13141 • Published • 13
MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer • Paper • 2311.12052 • Published • 32
PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics • Paper • 2311.12198 • Published • 22
NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation • Paper • 2311.12229 • Published • 26
Exponentially Faster Language Modelling • Paper • 2311.10770 • Published • 118
Make Pixels Dance: High-Dynamic Video Generation • Paper • 2311.10982 • Published • 68
Orca 2: Teaching Small Language Models How to Reason • Paper • 2311.11045 • Published • 70
System 2 Attention (is something you might need too) • Paper • 2311.11829 • Published • 39
MultiLoRA: Democratizing LoRA for Better Multi-Task Learning • Paper • 2311.11501 • Published • 33
Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression • Paper • 2311.10794 • Published • 24
AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort • Paper • 2311.11243 • Published • 14
Drivable 3D Gaussian Avatars • Paper • 2311.08581 • Published • 46
GRIM: GRaph-based Interactive narrative visualization for gaMes • Paper • 2311.09213 • Published • 12
UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations • Paper • 2311.08469 • Published • 10
PEARL: Personalizing Large Language Model Writing Assistants with Generation-Calibrated Retrievers • Paper • 2311.09180 • Published • 7
Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster • Paper • 2311.08263 • Published • 15