Update README.md
README.md
CHANGED
@@ -15,6 +15,8 @@ tags:
 Eagle is a family of Vision-Centric High-Resolution Multimodal LLMs. It presents a thorough exploration to strengthen multimodal LLM perception with a mixture of vision encoders and different input resolutions. The model contains a channel-concatenation-based "CLIP+X" fusion for vision experts with different architectures (ViT/ConvNets) and knowledge (detection/segmentation/OCR/SSL). The resulting family of Eagle models support up to over 1K input resolution and obtain strong results on multimodal LLM benchmarks, especially resolution-sensitive tasks such as optical character recognition and document understanding.
 
 
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/64618b9496259bec21d44704/BdAIMvo--yG7SpG5xDeYN.png)
+
 **Paper or resources for more information:**
 https://github.com/NVlabs/Eagle
 