---
license: mit
---

# YOLOv8X Trained on the Full DocLayNet Dataset with 1024x1024 Image Size and Batch Size 42

This repository contains a YOLOv8X model trained on the entire DocLayNet dataset, which comprises ~41GB of annotated document layout images. Training ran on a single A100 GPU with 80GB of memory. The batch size was set to 42, and images were resized to 1024x1024 pixels. The image augmentation hyperparameters were left at their defaults.

## Dataset Classes

The model was trained on all class labels available in the DocLayNet dataset:

- Caption
- Footnote
- Formula
- List-item
- Page-footer
- Page-header
- Picture
- Section-header
- Table
- Text
- Title

## Benchmark Results

The trained model was evaluated on the validation set, yielding the following metrics:

| Class          | Images | Instances | Box(P) | Box(R) | mAP50 | mAP50-95 |
|----------------|--------|-----------|--------|--------|-------|----------|
| all            | 6476   | 98604     | 0.905  | 0.866  | 0.925 | 0.759    |
| Caption        | 6476   | 1763      | 0.921  | 0.868  | 0.949 | 0.878    |
| Footnote       | 6476   | 312       | 0.888  | 0.779  | 0.839 | 0.637    |
| Formula        | 6476   | 1894      | 0.893  | 0.839  | 0.914 | 0.748    |
| List-item      | 6476   | 13320     | 0.905  | 0.915  | 0.94  | 0.807    |
| Page-footer    | 6476   | 5571      | 0.94   | 0.941  | 0.974 | 0.651    |
| Page-header    | 6476   | 6683      | 0.952  | 0.862  | 0.957 | 0.702    |
| Picture        | 6476   | 1565      | 0.834  | 0.827  | 0.88  | 0.81     |
| Section-header | 6476   | 15744     | 0.919  | 0.902  | 0.962 | 0.635    |
| Table          | 6476   | 2269      | 0.87   | 0.873  | 0.92  | 0.865    |
| Text           | 6476   | 49185     | 0.937  | 0.923  | 0.967 | 0.833    |
| Title          | 6476   | 298       | 0.898  | 0.792  | 0.873 | 0.779    |

Across all classes the model reaches a precision of 0.905 and a recall of 0.866. Recall is lowest for Footnote (0.779) and Title (0.792), the two classes with the fewest validation instances (312 and 298).
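Precision and recall from the benchmark table can be folded into a single F1 score per class, which is sometimes easier to compare at a glance. The snippet below is a minimal sketch and was not part of the original evaluation: `f1_score` is a hypothetical helper name, and the values are copied from three rows of the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the benchmark table
rows = {
    "all": (0.905, 0.866),
    "Footnote": (0.888, 0.779),
    "Text": (0.937, 0.923),
}

f1_by_class = {name: f1_score(p, r) for name, (p, r) in rows.items()}

for name, f1 in f1_by_class.items():
    print(f"{name}: F1 = {f1:.3f}")
```

Because F1 is a harmonic mean, a class with low recall such as Footnote is pulled down more than a simple average of the two columns would suggest.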
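## Quick Guide to Run Inference

```python
from ultralytics import YOLO
from PIL import Image

# Load the exported ONNX weights
onnx_model = YOLO("best.onnx")

# Run inference at the training resolution; pass the path to your image as the first argument
results = onnx_model("", imgsz=1024)

for i, r in enumerate(results):
    im_bgr = r.plot()  # annotated image as a BGR NumPy array
    im_rgb = Image.fromarray(im_bgr[..., ::-1])  # reverse channels to get an RGB PIL image
    r.show()  # display the annotated result
    r.save(filename=f'results{i}.jpg')  # write the annotated result to disk
```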
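Beyond plotting, the raw detections are often needed as plain data. The sketch below shows one way to pair each box with its class label and filter by confidence. It works on plain Python lists so it runs without a model. With Ultralytics, the inputs would be expected to come from `r.boxes.xyxy.tolist()`, `r.boxes.cls.tolist()`, `r.boxes.conf.tolist()` and `r.names`. The helper name `to_detections`, the `min_conf` threshold, and the example values (including the id-to-label mapping) are illustrative assumptions, not part of this repository.

```python
from typing import Dict, List, Sequence


def to_detections(
    boxes_xyxy: Sequence[Sequence[float]],
    class_ids: Sequence[float],
    confidences: Sequence[float],
    names: Dict[int, str],
    min_conf: float = 0.5,
) -> List[dict]:
    """Pair each box with its label and drop detections below min_conf."""
    detections = []
    for box, cls_id, conf in zip(boxes_xyxy, class_ids, confidences):
        if conf < min_conf:
            continue
        detections.append(
            {
                "label": names[int(cls_id)],
                "confidence": float(conf),
                "box": [float(v) for v in box],  # x1, y1, x2, y2 in pixels
            }
        )
    # Sort top-to-bottom, then left-to-right, as a rough reading order
    detections.sort(key=lambda d: (d["box"][1], d["box"][0]))
    return detections


# Illustrative values only, not real model output
example_names = {0: "Caption", 1: "Text"}  # hypothetical id-to-label mapping
example = to_detections(
    boxes_xyxy=[[50, 400, 900, 600], [50, 100, 900, 150], [10, 10, 20, 20]],
    class_ids=[1.0, 0.0, 1.0],
    confidences=[0.91, 0.84, 0.12],
    names=example_names,
)
for det in example:
    print(det["label"], round(det["confidence"], 2), det["box"])
```

Sorting by the top-left corner is only a rough approximation of reading order and will not handle multi-column pages correctly.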