5CD-AI/Vintern-1B-v3_5

@@ -10,16 +10,19 @@ base_model:
 pipeline_tag: image-text-to-text
 ---
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/6336b5c831efcb5647f00170/-G297bBqMzYvTbD6_Bkd9.png)
 # Vintern-1B-v3.5 ❄️ (Viet-InternVL2-1B-v3.5) - The Ultimate Multimodal Solution 🌏
 We introduce **Vintern-1B-v3.5**, the latest version in the Vintern series, offering significant improvements over v2 across all evaluation benchmarks. This model has been fine-tuned from **InternVL-1B-2.5**, which already performs well on Vietnamese tasks because the InternVL 2.5 [1] team used [Viet-ShareGPT-4o-Text-VQA](https://huggingface.co/datasets/5CD-AI/Viet-ShareGPT-4o-Text-VQA) data during its fine-tuning.
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/6336b5c831efcb5647f00170/a1V1DA1o4Gf_MJblWTz-L.png)
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/6336b5c831efcb5647f00170/36jb5bgyYCoVKx3NE8Iuv.png)
 To further enhance its performance in Vietnamese while maintaining robust capabilities on existing English datasets, **Vintern-1B-v3.5** has been fine-tuned using a vast amount of Vietnamese-specific data. This results in a model that is exceptionally powerful in text recognition, OCR, and understanding Vietnam-specific documents.
@@ -41,7 +44,9 @@ The model can be customized for specific tasks with minimal effort.
 ## Benchmarks 📈
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/6336b5c831efcb5647f00170/DrUCZuXuMz47uVU4zqnJ4.png)
 | Benchmark     | InternVL2_5 1B | Vintern-1B-v2 | Vintern-1B-v3.5 |
 |:-------------:|:--------------:|:-------------:|:---------------:|
@@ -57,54 +62,26 @@ The model can be customized for specific tasks with minimal effort.
 ## Examples
 <div align="center">
-  <img src="ex_images/1.png" width="500"/>
 </div>
-```
-```
 <div align="center">
-  <img src="ex_images/4.jpg" width="500"/>
 </div>
-```
-```
 <div align="center">
-  <img src="ex_images/2.jpg" width="500"/>
 </div>
-```
-```
-<div align="center">
-  <img src="ex_images/3.png" width="400"/>
-</div>
-```
-```
 <div align="center">
-  <img src="ex_images/5.jpg" width="400"/>
 </div>
-```
-```
-<div align="center">
-  <img src="ex_images/6.png" width="400"/>
-</div>
-```
-```
 ## Quickstart
 Here is a code snippet showing how to load the tokenizer and model and how to generate content.

 pipeline_tag: image-text-to-text
 ---
+<div align="center">
+  <img src="https://cdn-uploads.huggingface.co/production/uploads/6336b5c831efcb5647f00170/-G297bBqMzYvTbD6_Bkd9.png" width="500"/>
+</div>
 # Vintern-1B-v3.5 ❄️ (Viet-InternVL2-1B-v3.5) - The Ultimate Multimodal Solution 🌏
 We introduce **Vintern-1B-v3.5**, the latest version in the Vintern series, offering significant improvements over v2 across all evaluation benchmarks. This model has been fine-tuned from **InternVL-1B-2.5**, which already performs well on Vietnamese tasks because the InternVL 2.5 [1] team used [Viet-ShareGPT-4o-Text-VQA](https://huggingface.co/datasets/5CD-AI/Viet-ShareGPT-4o-Text-VQA) data during its fine-tuning.
+<div align="center">
+  <img src="https://cdn-uploads.huggingface.co/production/uploads/6336b5c831efcb5647f00170/a1V1DA1o4Gf_MJblWTz-L.png" width="500"/>
+</div>
+<div align="center">
+  <img src="https://cdn-uploads.huggingface.co/production/uploads/6336b5c831efcb5647f00170/36jb5bgyYCoVKx3NE8Iuv.png" width="500"/>
+</div>
 To further enhance its performance in Vietnamese while maintaining robust capabilities on existing English datasets, **Vintern-1B-v3.5** has been fine-tuned using a vast amount of Vietnamese-specific data. This results in a model that is exceptionally powerful in text recognition, OCR, and understanding Vietnam-specific documents.
 ## Benchmarks 📈
+<div align="center">
+  <img src="https://cdn-uploads.huggingface.co/production/uploads/6336b5c831efcb5647f00170/DrUCZuXuMz47uVU4zqnJ4.png" width="500"/>
+</div>
 | Benchmark     | InternVL2_5 1B | Vintern-1B-v2 | Vintern-1B-v3.5 |
 |:-------------:|:--------------:|:-------------:|:---------------:|
 ## Examples
 <div align="center">
+  <img src="https://cdn-uploads.huggingface.co/production/uploads/6336b5c831efcb5647f00170/1yos0APs6laTCAGhUbN9n.png" width="300"/>
 </div>
 <div align="center">
+  <img src="https://cdn-uploads.huggingface.co/production/uploads/6336b5c831efcb5647f00170/L5n35_3sz_Wp9fo0C7snq.png" width="300"/>
 </div>
 <div align="center">
+  <img src="https://cdn-uploads.huggingface.co/production/uploads/6336b5c831efcb5647f00170/E6aqBwFqK38XE1LL9lF2W.png" width="500"/>
 </div>
 <div align="center">
+  <img src="https://cdn-uploads.huggingface.co/production/uploads/6336b5c831efcb5647f00170/Lkt8YLYlDP_VByFjFQX_t.png" width="500"/>
 </div>
 ## Quickstart
 Here is a code snippet showing how to load the tokenizer and model and how to generate content.