---
license: mit
---
# YOLOv8X Trained on the Full DocLayNet Dataset (1024x1024 Images, Batch Size 42)

This repository contains a YOLOv8X model trained on the entire DocLayNet dataset, roughly 41 GB of annotated document layout images. Training ran on a single A100 GPU with 80 GB of memory, using a batch size of 42 and images resized to 1024x1024 pixels. The image augmentation hyperparameters were left at their defaults.
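The setup described above could be reproduced roughly as follows. This is a minimal sketch, not the original training script: `doclaynet.yaml` is a hypothetical dataset config file, starting from the pretrained `yolov8x.pt` checkpoint is an assumption, and the number of epochs is not stated here, so it is left at the library default.

```python
# Settings stated in this README; everything else stays at the Ultralytics defaults.
TRAIN_ARGS = {
    "data": "doclaynet.yaml",  # hypothetical path to a DocLayNet dataset config
    "imgsz": 1024,             # images resized to 1024x1024
    "batch": 42,               # batch size used on the 80 GB A100
}


def build_train_args(**overrides):
    """Return a copy of TRAIN_ARGS with optional overrides (e.g. a smaller batch)."""
    args = dict(TRAIN_ARGS)
    args.update(overrides)
    return args


def train(weights="yolov8x.pt", **overrides):
    """Start a training run; the starting weights are an assumption."""
    # Imported lazily so the settings can be inspected without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(**build_train_args(**overrides))
```

On a GPU with less memory, something like `train(batch=8)` may be a more practical starting point.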


## Dataset Classes
The model was trained on all 11 class labels in the DocLayNet dataset:
- Caption
- Footnote
- Formula
- List-item
- Page-footer
- Page-header
- Picture
- Section-header
- Table
- Text
- Title


## Benchmark Results
The trained model was evaluated on the validation set (6476 images, 98604 instances), with the following results:

|      Class      | Images | Instances | Box(P) | Box(R) | mAP50 | mAP50-95 |
|-----------------|--------|-----------|--------|--------|-------|-----|
| all             | 6476   | 98604     | 0.905  | 0.866  | 0.925 | 0.759 |
| Caption         | 6476   | 1763      | 0.921  | 0.868  | 0.949 | 0.878 |
| Footnote        | 6476   | 312       | 0.888  | 0.779  | 0.839 | 0.637 |
| Formula         | 6476   | 1894      | 0.893  | 0.839  | 0.914 | 0.748 |
| List-item       | 6476   | 13320     | 0.905  | 0.915  | 0.94  | 0.807 |
| Page-footer     | 6476   | 5571      | 0.94   | 0.941  | 0.974 | 0.651 |
| Page-header     | 6476   | 6683      | 0.952  | 0.862  | 0.957 | 0.702 |
| Picture         | 6476   | 1565      | 0.834  | 0.827  | 0.88  | 0.81  |
| Section-header  | 6476   | 15744     | 0.919  | 0.902  | 0.962 | 0.635 |
| Table           | 6476   | 2269      | 0.87   | 0.873  | 0.92  | 0.865 |
| Text            | 6476   | 49185     | 0.937  | 0.923  | 0.967 | 0.833 |
| Title           | 6476   | 298       | 0.898  | 0.792  | 0.873 | 0.779 |

Across all classes, precision is 0.905 and recall is 0.866. Recall is lowest for Footnote (0.779) and Title (0.792), which are also the two classes with the fewest validation instances (312 and 298).
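A validation run along these lines should yield comparable headline numbers. This is a sketch under assumptions: `doclaynet.yaml` is a hypothetical dataset config, `summarize` and `validate` are illustrative helper names, and the last table column is assumed to be mAP averaged over IoU thresholds 0.50 to 0.95, which is what the Ultralytics `map` attribute reports.

```python
def summarize(box):
    """Collect the headline numbers from an Ultralytics box-metrics object."""
    return {
        "precision": round(float(box.mp), 3),   # mean precision over classes
        "recall": round(float(box.mr), 3),      # mean recall over classes
        "mAP50": round(float(box.map50), 3),    # mAP at IoU 0.50
        "mAP50-95": round(float(box.map), 3),   # mAP averaged over IoU 0.50 to 0.95
    }


def validate(weights="best.onnx", data="doclaynet.yaml", imgsz=1024):
    """Run validation and return the summary; the data path is hypothetical."""
    # Imported lazily so summarize() can be used without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights, task="detect")
    metrics = model.val(data=data, imgsz=imgsz)
    return summarize(metrics.box)
```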


## Quick Guide to Running Inference

```python
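from ultralytics import YOLO
from PIL import Image

# Load the exported ONNX weights; task="detect" avoids relying on task auto-detection
onnx_model = YOLO("best.onnx", task="detect")

# Run inference at the resolution the model was trained with
results = onnx_model("<path_to_image>", imgsz=1024)

for i, r in enumerate(results):
    # plot() returns the annotated image as a BGR NumPy array
    im_bgr = r.plot()
    # Reverse the channel order to get an RGB PIL image for further processing
    im_rgb = Image.fromarray(im_bgr[..., ::-1])

    # Display the annotated result
    r.show()

    # Save the annotated result to disk
    r.save(filename=f"results{i}.jpg")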
```
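Beyond plotting, the detections can be turned into plain Python records for downstream processing. The following is a minimal sketch: `boxes_to_records` and `detect_layout` are illustrative helper names, the 0.25 confidence cutoff is an arbitrary choice, and sorting top-to-bottom then left-to-right is only a rough stand-in for reading order.

```python
def boxes_to_records(names, xyxy, cls, conf, min_conf=0.25):
    """Convert raw detections into dicts with a label, confidence, and bbox."""
    records = []
    for (x1, y1, x2, y2), c, p in zip(xyxy, cls, conf):
        if p < min_conf:
            continue  # drop low-confidence detections
        records.append({
            "label": names[int(c)],  # map class index to class name
            "confidence": float(p),
            "bbox": [float(x1), float(y1), float(x2), float(y2)],
        })
    # Sort by top edge, then left edge
    return sorted(records, key=lambda r: (r["bbox"][1], r["bbox"][0]))


def detect_layout(image_path, weights="best.onnx"):
    """Run the model on one image and return its layout records."""
    # Imported lazily so boxes_to_records() can be used without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights, task="detect")
    result = model(image_path, imgsz=1024)[0]
    b = result.boxes
    return boxes_to_records(result.names, b.xyxy.tolist(), b.cls.tolist(), b.conf.tolist())
```

Reading the class names from `result.names` rather than hard-coding them keeps the label mapping consistent with whatever index order the exported model uses.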