Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
Paper • 2412.04424 • Published • 59
Computer Vision in Focus: Advances in Computer Vision and Video Understanding