SHIC: Shape-Image Correspondences with no Keypoint Supervision Paper • 2407.18907 • Published Jul 26 • 39
Wolf: Captioning Everything with a World Summarization Framework Paper • 2407.18908 • Published Jul 26 • 30
Floating No More: Object-Ground Reconstruction from a Single Image Paper • 2407.18914 • Published Jul 26 • 18
Harvesting Textual and Structured Data from the HAL Publication Repository Paper • 2407.20595 • Published Jul 30 • 21
Knesset-DictaBERT: A Hebrew Language Model for Parliamentary Proceedings Paper • 2407.20581 • Published Jul 30 • 23
WalkTheDog: Cross-Morphology Motion Alignment via Phase Manifolds Paper • 2407.18946 • Published Jul 11 • 12
TAPTRv2: Attention-based Position Update Improves Tracking Any Point Paper • 2407.16291 • Published Jul 23 • 10
Sentiment Analysis of Lithuanian Online Reviews Using Large Language Models Paper • 2407.19914 • Published Jul 29 • 12
VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks Paper • 2407.19795 • Published Jul 29 • 10
Integrating Large Language Models into a Tri-Modal Architecture for Automated Depression Classification Paper • 2407.19340 • Published Jul 27 • 56
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis Paper • 2310.00426 • Published Sep 30, 2023 • 61
ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning Paper • 2309.16650 • Published Sep 28, 2023 • 10
RealFill: Reference-Driven Generation for Authentic Image Completion Paper • 2309.16668 • Published Sep 28, 2023 • 14
Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models Paper • 2407.19474 • Published Jul 28 • 22
Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning Paper • 2407.18248 • Published Jul 25 • 30
Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers Paper • 2308.13494 • Published Aug 25, 2023 • 9