---
license: mit
---

# YOLOv8X Trained on the Full DocLayNet Dataset (1024×1024, batch size 42)
This repository contains a YOLOv8X model trained on the entire DocLayNet dataset (~41 GB of annotated document-layout images). Training ran on a single A100 GPU with 80 GB of memory, using a batch size of 42 and images resized to 1024×1024 pixels; the default image-augmentation hyperparameters were retained.
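The training setup above maps directly onto the standard Ultralytics CLI. The sketch below is a hypothetical reproduction, not the exact command used for this checkpoint: the dataset YAML name `doclaynet.yaml` and the epoch count are assumptions; `imgsz` and `batch` match the values stated above.

```shell
# Hypothetical reproduction of the training run described above.
# doclaynet.yaml and epochs=100 are assumed; imgsz and batch match the card.
yolo detect train model=yolov8x.pt data=doclaynet.yaml imgsz=1024 batch=42 epochs=100 device=0
```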
## Dataset Classes
The model was trained on all 11 class labels available in the DocLayNet dataset:
- Caption
- Footnote
- Formula
- List-item
- Page-footer
- Page-header
- Picture
- Section-header
- Table
- Text
- Title
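For post-processing predictions it helps to have the index-to-name mapping at hand. The sketch below assumes the classes are indexed in the alphabetical order listed above (the usual DocLayNet export convention); verify against the `names` attribute of your loaded model before relying on it.

```python
# DocLayNet class names, assuming alphabetical index order as listed above.
# Check this against model.names on your checkpoint before relying on it.
DOCLAYNET_CLASSES = {
    0: "Caption",
    1: "Footnote",
    2: "Formula",
    3: "List-item",
    4: "Page-footer",
    5: "Page-header",
    6: "Picture",
    7: "Section-header",
    8: "Table",
    9: "Text",
    10: "Title",
}

def class_name(idx: int) -> str:
    """Return the human-readable label for a predicted class index."""
    return DOCLAYNET_CLASSES.get(idx, f"unknown-{idx}")
```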
## Benchmark Results
The performance of the trained model was evaluated on the validation set, yielding the following metrics:
| Class | Images | Instances | Box(P) | Box(R) | mAP50 | mAP50-95 |
|---|---|---|---|---|---|---|
| all | 6476 | 98604 | 0.905 | 0.866 | 0.925 | 0.759 |
| Caption | 6476 | 1763 | 0.921 | 0.868 | 0.949 | 0.878 |
| Footnote | 6476 | 312 | 0.888 | 0.779 | 0.839 | 0.637 |
| Formula | 6476 | 1894 | 0.893 | 0.839 | 0.914 | 0.748 |
| List-item | 6476 | 13320 | 0.905 | 0.915 | 0.94 | 0.807 |
| Page-footer | 6476 | 5571 | 0.94 | 0.941 | 0.974 | 0.651 |
| Page-header | 6476 | 6683 | 0.952 | 0.862 | 0.957 | 0.702 |
| Picture | 6476 | 1565 | 0.834 | 0.827 | 0.88 | 0.81 |
| Section-header | 6476 | 15744 | 0.919 | 0.902 | 0.962 | 0.635 |
| Table | 6476 | 2269 | 0.87 | 0.873 | 0.92 | 0.865 |
| Text | 6476 | 49185 | 0.937 | 0.923 | 0.967 | 0.833 |
| Title | 6476 | 298 | 0.898 | 0.792 | 0.873 | 0.779 |
These results show that the model detects the various elements of document layouts with high precision and recall across all classes.
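As a sanity check on how the "all" row relates to the per-class rows: Ultralytics reports the overall mAP as the unweighted mean over classes, which the numbers above reproduce.

```python
# Per-class mAP50 and mAP50-95 values copied from the benchmark table above.
map50 = [0.949, 0.839, 0.914, 0.94, 0.974, 0.957, 0.88, 0.962, 0.92, 0.967, 0.873]
map50_95 = [0.878, 0.637, 0.748, 0.807, 0.651, 0.702, 0.81, 0.635, 0.865, 0.833, 0.779]

# The "all" row is the unweighted mean across the 11 classes.
mean_map50 = sum(map50) / len(map50)
mean_map50_95 = sum(map50_95) / len(map50_95)

print(round(mean_map50, 3))     # 0.925, matching the "all" row
print(round(mean_map50_95, 3))  # 0.759, matching the "all" row
```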
## Quick Inference Guide
```python
from ultralytics import YOLO
from PIL import Image

# Load the exported ONNX model (Ultralytics runs ONNX weights directly).
onnx_model = YOLO("best.onnx")

# Run inference at the training resolution.
results = onnx_model("<path_to_image>", imgsz=1024)

for i, r in enumerate(results):
    im_bgr = r.plot()                            # annotated image as a BGR numpy array
    im_rgb = Image.fromarray(im_bgr[..., ::-1])  # convert BGR -> RGB for PIL
    r.show()                                     # display the annotated image
    r.save(filename=f"results{i}.jpg")           # save it to disk
```
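Beyond visualizing results, a common next step is cropping each detected region (a table or a picture, say) out of the page for downstream processing. The helper below is a sketch that works on plain (x1, y1, x2, y2) boxes so it can run without the model; with real output you would pass `r.boxes.xyxy.tolist()`.

```python
from PIL import Image

def crop_regions(image, boxes_xyxy):
    """Crop each detected region, given as (x1, y1, x2, y2), out of a PIL image."""
    return [image.crop(tuple(map(int, b))) for b in boxes_xyxy]

# Example with a blank synthetic page; with real predictions, use
# crop_regions(Image.open(path), r.boxes.xyxy.tolist()).
page = Image.new("RGB", (200, 100))
crops = crop_regions(page, [(10, 20, 110, 70)])
print(crops[0].size)  # (100, 50)
```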