malaysia-ai
/

YOLOv8X-DocLayNet-Full-1024-42


Hiraishin committed on Apr 9

Commit

8c1a811


1 Parent(s): 7390b54

Update README.md


README.md +59 -0


@@ -1,3 +1,62 @@
 ---
 license: mit
 ---

+# YOLOv8X trained on the full DocLayNet dataset (1024x1024 images, batch size 42)
+This repository contains a YOLOv8X model trained on the entire DocLayNet dataset, roughly 41GB of annotated document layout images. Training ran on a single A100 GPU with 80GB of memory, using a batch size of 42 and images resized to 1024x1024 pixels. The image augmentation hyperparameters were left at their defaults.
+## Dataset Classes
+The model was trained on all 11 class labels in the DocLayNet dataset:
+- Caption
+- Footnote
+- Formula
+- List-item
+- Page-footer
+- Page-header
+- Picture
+- Section-header
+- Table
+- Text
+- Title
+## Benchmark Results
+The performance of the trained model was evaluated on the validation set, yielding the following metrics:
+| Class           | Images | Instances | Box(P) | Box(R) | mAP50 | mAP50-95 |
+|-----------------|--------|-----------|--------|--------|-------|----------|
+| all             | 6476   | 98604     | 0.905  | 0.866  | 0.925 | 0.759 |
+| Caption         | 6476   | 1763      | 0.921  | 0.868  | 0.949 | 0.878 |
+| Footnote        | 6476   | 312       | 0.888  | 0.779  | 0.839 | 0.637 |
+| Formula         | 6476   | 1894      | 0.893  | 0.839  | 0.914 | 0.748 |
+| List-item       | 6476   | 13320     | 0.905  | 0.915  | 0.94  | 0.807 |
+| Page-footer     | 6476   | 5571      | 0.94   | 0.941  | 0.974 | 0.651 |
+| Page-header     | 6476   | 6683      | 0.952  | 0.862  | 0.957 | 0.702 |
+| Picture         | 6476   | 1565      | 0.834  | 0.827  | 0.88  | 0.81  |
+| Section-header  | 6476   | 15744     | 0.919  | 0.902  | 0.962 | 0.635 |
+| Table           | 6476   | 2269      | 0.87   | 0.873  | 0.92  | 0.865 |
+| Text            | 6476   | 49185     | 0.937  | 0.923  | 0.967 | 0.833 |
+| Title           | 6476   | 298       | 0.898  | 0.792  | 0.873 | 0.779 |
+Across all classes, the model reaches 0.905 precision and 0.866 recall. Per-class mAP50 ranges from 0.839 (Footnote) to 0.974 (Page-footer).
+## Quick Guide to Running Inference
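+```python
+from ultralytics import YOLO
+from PIL import Image
+# Load the exported ONNX weights.
+onnx_model = YOLO("best.onnx")
+# Run detection on one image; replace the placeholder with a real path.
+results = onnx_model("<path_to_image>", imgsz=512)
+for i, r in enumerate(results):
+    im_bgr = r.plot()  # annotated image as a BGR numpy array
+    im_rgb = Image.fromarray(im_bgr[..., ::-1])  # reverse the channel order to get an RGB PIL image
+    r.show()  # display the annotated result
+    r.save(filename=f"results{i}.jpg")  # write the annotated result to disk
+```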
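The snippet above only draws and saves the annotated image. As a rough sketch of how the detections themselves might be summarized, the helper below counts detections per layout class. The function name `count_by_class` and the example values are hypothetical and not part of this repository; the sketch assumes the class ids come from `r.boxes.cls.tolist()` and the id-to-name mapping from `r.names` on an Ultralytics result.

```python
from collections import Counter


def count_by_class(class_ids, names):
    """Count detections per class name, most frequent first.

    class_ids: iterable of numeric class ids (floats are accepted).
    names: dict mapping integer class id to class name.
    """
    counts = Counter(names[int(c)] for c in class_ids)
    return counts.most_common()


# Hypothetical stand-in values for illustration only. With a real result `r`
# from the loop above, pass r.boxes.cls.tolist() and r.names instead.
example_names = {0: "Caption", 1: "Text", 2: "Table"}
example_ids = [1.0, 1.0, 2.0, 0.0, 1.0]
print(count_by_class(example_ids, example_names))
```

Keeping the helper independent of the model object means it can be checked without loading `best.onnx`. The real id-to-name mapping should always be read from `r.names` rather than hard-coded.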