---
license: mit
---

YOLOv8X trained on the full DocLayNet dataset at a 1024x1024 image size with a batch size of 42.

This repository contains the YOLOv8X model trained on the entire DocLayNet dataset, comprising ~41GB of annotated document layout images. Training was conducted on a single A100 GPU with 80GB of memory. The batch size was set to 42, and images were resized to 1024x1024 pixels while retaining the default image-augmentation hyperparameters.
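As a note on the 1024x1024 resize: YOLO's default preprocessing fits each image into the target square while preserving its aspect ratio, padding the remainder (letterboxing). Ultralytics handles this internally; the arithmetic can be sketched as follows (an illustrative sketch, not the library's code):

```python
def letterbox_dims(width: int, height: int, size: int = 1024):
    """Compute the scaled dimensions and padding used to fit an image
    into a size x size square while preserving its aspect ratio."""
    scale = size / max(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_w, pad_h = size - new_w, size - new_h
    return (new_w, new_h), (pad_w, pad_h)

# A letter-size page scan at 200 DPI: 1700 x 2200 pixels
print(letterbox_dims(1700, 2200))  # → ((791, 1024), (233, 0))
```

The long side is scaled to 1024 and the short side is padded, so tall document pages keep their proportions instead of being squashed.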

Dataset Classes

The model was trained on all eleven class labels available in the DocLayNet dataset:

  • Caption
  • Footnote
  • Formula
  • List-item
  • Page-footer
  • Page-header
  • Picture
  • Section-header
  • Table
  • Text
  • Title

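When post-processing predictions, it helps to map numeric class ids back to these names. A minimal sketch, assuming the model's class indices follow the list order above (0-indexed) — verify against the loaded model's own `names` attribute before relying on it:

```python
# Hypothetical id-to-name mapping; the index order is an assumption
# based on the DocLayNet class list above, not read from the model.
DOCLAYNET_CLASSES = {
    0: "Caption",
    1: "Footnote",
    2: "Formula",
    3: "List-item",
    4: "Page-footer",
    5: "Page-header",
    6: "Picture",
    7: "Section-header",
    8: "Table",
    9: "Text",
    10: "Title",
}

def class_name(class_id: int) -> str:
    """Translate a predicted class id into a readable label."""
    return DOCLAYNET_CLASSES[class_id]
```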
Benchmark Results

The performance of the trained model was evaluated on the validation set, yielding the following metrics:

| Class          | Images | Instances | Box(P) | Box(R) | mAP50 | mAP50-95 |
|----------------|--------|-----------|--------|--------|-------|----------|
| all            | 6476   | 98604     | 0.905  | 0.866  | 0.925 | 0.759    |
| Caption        | 6476   | 1763      | 0.921  | 0.868  | 0.949 | 0.878    |
| Footnote       | 6476   | 312       | 0.888  | 0.779  | 0.839 | 0.637    |
| Formula        | 6476   | 1894      | 0.893  | 0.839  | 0.914 | 0.748    |
| List-item      | 6476   | 13320     | 0.905  | 0.915  | 0.94  | 0.807    |
| Page-footer    | 6476   | 5571      | 0.94   | 0.941  | 0.974 | 0.651    |
| Page-header    | 6476   | 6683      | 0.952  | 0.862  | 0.957 | 0.702    |
| Picture        | 6476   | 1565      | 0.834  | 0.827  | 0.88  | 0.81     |
| Section-header | 6476   | 15744     | 0.919  | 0.902  | 0.962 | 0.635    |
| Table          | 6476   | 2269      | 0.87   | 0.873  | 0.92  | 0.865    |
| Text           | 6476   | 49185     | 0.937  | 0.923  | 0.967 | 0.833    |
| Title          | 6476   | 298       | 0.898  | 0.792  | 0.873 | 0.779    |

These results demonstrate the model's ability to detect the various elements of document layouts with high precision and recall.
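Per-class precision and recall can be combined into an F1 score for a single-number comparison between classes. A quick sketch using the "all" row from the benchmark table above:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Overall row from the benchmark table: Box(P) = 0.905, Box(R) = 0.866
print(round(f1_score(0.905, 0.866), 3))  # → 0.885
```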

Quick Guide to Running Inference

```python
from ultralytics import YOLO
from PIL import Image

# Load the exported ONNX model
onnx_model = YOLO("best.onnx")

# Run inference at the training resolution
results = onnx_model("<path_to_image>", imgsz=1024)

for i, r in enumerate(results):
    im_bgr = r.plot()  # annotated predictions as a BGR numpy array
    im_rgb = Image.fromarray(im_bgr[..., ::-1])  # convert BGR to RGB for PIL

    r.show()  # display the annotated image

    r.save(filename=f"results{i}.jpg")  # save the annotated image to disk
```