NVEagle
/

Eagle-X5-13B

Image-Text-to-Text

text-generation

Inference Endpoints

Model card Files Files and versions Community

Fuxiao commited on 26 days ago

Commit

10c0d3e

•

1 Parent(s): 594481b

Update README.md

Files changed (1) hide show

README.md +1 -0

README.md CHANGED Viewed

@@ -14,6 +14,7 @@ tags:
 **Model type:**
 Eagle is a family of Vision-Centric High-Resolution Multimodal LLMs. It presents a thorough exploration to strengthen multimodal LLM perception with a mixture of vision encoders and different input resolutions. The model contains a channel-concatenation-based "CLIP+X" fusion for vision experts with different architectures (ViT/ConvNets) and knowledge (detection/segmentation/OCR/SSL). The resulting family of Eagle models support up to over 1K input resolution and obtain strong results on multimodal LLM benchmarks, especially resolution-sensitive tasks such as optical character recognition and document understanding.
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64618b9496259bec21d44704/BdAIMvo--yG7SpG5xDeYN.png)

 **Model type:**
 Eagle is a family of Vision-Centric High-Resolution Multimodal LLMs. It presents a thorough exploration to strengthen multimodal LLM perception with a mixture of vision encoders and different input resolutions. The model contains a channel-concatenation-based "CLIP+X" fusion for vision experts with different architectures (ViT/ConvNets) and knowledge (detection/segmentation/OCR/SSL). The resulting family of Eagle models support up to over 1K input resolution and obtain strong results on multimodal LLM benchmarks, especially resolution-sensitive tasks such as optical character recognition and document understanding.
+[arXiv](TODO) / [Demo](https://huggingface.co/spaces/NVEagle/Eagle-X5-13B-Chat)
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64618b9496259bec21d44704/BdAIMvo--yG7SpG5xDeYN.png)