h2oai/h2ovl-mississippi-2b

@@ -85,6 +85,25 @@ print(f'User: {question}\nAssistant: {response}')
 ```
+## Benchmarks
+### Performance Comparison of Similar-Sized Models Across Multiple Benchmarks (OpenVLM Leaderboard)
+| **Models**                 | **Params (B)** | **Avg. Score** | **MMBench** | **MMStar** | **MMMU<sub>VAL</sub>** | **MathVista**  | **HallusionBench** | **AI2D<sub>TEST</sub>** | **OCRBench** | **MMVet** |
+|----------------------------|----------------|----------------|-------------|------------|-----------------------|----------------|---------------|-------------------------|--------------|-----------|
+| Qwen2-VL-2B                | 2.1            | **57.2**       | **72.2**    | 47.5       | 42.2                  | 47.8           | **42.4**      | 74.7                    | **797**      | **51.5**  |
+| **H2OVL-Mississippi-2B**   | 2.1            | 54.4           | 64.8        | 49.6       | 35.2                  | **56.8**       | 36.4          | 69.9                    | 782          | 44.7      |
+| InternVL2-2B               | 2.1            | 53.9           | 69.6        | **49.8**   | 36.3                  | 46.0           | 38.0          | 74.1                    | 781          | 39.7      |
+| Phi-3-Vision               | 4.2            | 53.6           | 65.2        | 47.7       | **46.1**              | 44.6           | 39.0          | **78.4**                | 637          | 44.1      |
+| MiniMonkey                 | 2.2            | 52.7           | 68.9        | 48.1       | 35.7                  | 45.3           | 30.9          | 73.7                    | 794          | 39.8      |
+| InternVL2-1B               | 0.8            | 48.3           | 59.7        | 45.6       | 36.7                  | 39.4           | 34.3          | 63.8                    | 755          | 31.5      |
+| MiniCPM-V-2                | 2.8            | 47.9           | 65.8        | 39.1       | 38.2                  | 39.8           | 36.1          | 62.9                    | 605          | 41.0      |
+| PaliGemma-3B-mix-448       | 2.9            | 46.5           | 65.6        | 48.3       | 34.9                  | 28.7           | 32.2          | 68.3                    | 614          | 33.1      |
+| **H2OVL-Mississippi-0.8B** | 0.8            | 43.5           | 47.7        | 39.1       | 34.0                  | 39.0           | 29.6          | 53.6                    | 751          | 30.0      |
+| DeepSeek-VL-1.3B           | 2.0            | 39.6           | 63.8        | 39.9       | 33.8                  | 29.8           | 27.6          | 51.5                    | 413          | 29.2      |
 ## Acknowledgments
 We would like to express our gratitude to the [InternVL team at OpenGVLab](https://github.com/OpenGVLab/InternVL) for their research and codebases, upon which we have built and expanded. We also acknowledge the work of the [LLaVA team](https://github.com/haotian-liu/LLaVA) and the [Monkey team](https://github.com/Yuliang-Liu/Monkey/tree/main/project/mini_monkey) for their insights and techniques used in improving multimodal models.
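The Avg. Score column in the benchmark table appears consistent with a simple rule: the unweighted mean of the eight benchmark columns, with OCRBench (scored out of 1000) divided by 10 so that it sits on the same 0 to 100 scale as the others. The model card does not state this, so treat it as an assumption. The sketch below recomputes the column from the table values so the rule can be checked; `BENCHMARKS`, `ROWS`, and `average_score` are illustrative names, not part of any H2O or leaderboard API.

```python
# Hypothetical check (not part of the H2OVL model card): recompute the
# "Avg. Score" column under the ASSUMPTION that it is the unweighted mean of
# the eight benchmark columns, with OCRBench (scored out of 1000) divided by
# 10 so that every benchmark is on a 0-100 scale.

BENCHMARKS = [
    "MMBench", "MMStar", "MMMU_VAL", "MathVista",
    "HallusionBench", "AI2D_TEST", "OCRBench", "MMVet",
]

# (model, reported Avg. Score, scores in BENCHMARKS order), copied from the table.
ROWS = [
    ("Qwen2-VL-2B",            57.2, [72.2, 47.5, 42.2, 47.8, 42.4, 74.7, 797, 51.5]),
    ("H2OVL-Mississippi-2B",   54.4, [64.8, 49.6, 35.2, 56.8, 36.4, 69.9, 782, 44.7]),
    ("InternVL2-2B",           53.9, [69.6, 49.8, 36.3, 46.0, 38.0, 74.1, 781, 39.7]),
    ("Phi-3-Vision",           53.6, [65.2, 47.7, 46.1, 44.6, 39.0, 78.4, 637, 44.1]),
    ("MiniMonkey",             52.7, [68.9, 48.1, 35.7, 45.3, 30.9, 73.7, 794, 39.8]),
    ("MiniCPM-V-2",            47.9, [65.8, 39.1, 38.2, 39.8, 36.1, 62.9, 605, 41.0]),
    ("InternVL2-1B",           48.3, [59.7, 45.6, 36.7, 39.4, 34.3, 63.8, 755, 31.5]),
    ("PaliGemma-3B-mix-448",   46.5, [65.6, 48.3, 34.9, 28.7, 32.2, 68.3, 614, 33.1]),
    ("H2OVL-Mississippi-0.8B", 43.5, [47.7, 39.1, 34.0, 39.0, 29.6, 53.6, 751, 30.0]),
    ("DeepSeek-VL-1.3B",       39.6, [63.8, 39.9, 33.8, 29.8, 27.6, 51.5, 413, 29.2]),
]


def average_score(scores: list[float]) -> float:
    """Mean of the eight scores after rescaling OCRBench from /1000 to /100."""
    if len(scores) != len(BENCHMARKS):
        raise ValueError(f"expected {len(BENCHMARKS)} scores, got {len(scores)}")
    normalized = [
        score / 10 if name == "OCRBench" else score
        for name, score in zip(BENCHMARKS, scores)
    ]
    return sum(normalized) / len(normalized)


# Print reported vs. recomputed averages side by side.
for model, reported, scores in ROWS:
    recomputed = average_score(scores)
    print(f"{model:<24} reported={reported:5.1f} "
          f"recomputed={recomputed:7.4f} diff={recomputed - reported:+.4f}")
```

Under this assumption every recomputed average lands within 0.1 of the reported value (for example 54.45 for H2OVL-Mississippi-2B against the reported 54.4). The small residuals would be consistent with the leaderboard averaging unrounded per-benchmark scores, though that too is a guess rather than something the model card confirms.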