---
license: mit
---

# YOLOv8X Trained on the Full DocLayNet Dataset with 1024x1024 Image Size and Batch Size 42

This repository contains a YOLOv8X model trained on the entire DocLayNet dataset, comprising ~41 GB of annotated document layout images. Training was conducted on a single A100 GPU with 80 GB of memory, using a batch size of 42 and images resized to 1024x1024 pixels, while retaining the default image-augmentation hyperparameters.
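
The setup above can be summarized as an ultralytics-style training configuration. This is a hypothetical sketch: `doclaynet.yaml` is an assumed name for a DocLayNet dataset config, which is not provided by this repository.

```yaml
# Hypothetical training settings reproducing the setup described above.
model: yolov8x.pt      # YOLOv8X pretrained weights as the starting point
data: doclaynet.yaml   # assumed name for the DocLayNet dataset config
imgsz: 1024            # training image size
batch: 42              # batch size used on the 80 GB A100
```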

## Dataset Classes

The model was trained on all eleven class labels available in the DocLayNet dataset:
- Caption
- Footnote
- Formula
- List-item
- Page-footer
- Page-header
- Picture
- Section-header
- Table
- Text
- Title
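
For downstream code it can help to have this label set as a Python mapping. The numeric ids below simply follow the list order above; that ordering is an assumption, so verify it against your checkpoint (e.g. via `model.names` in ultralytics) before relying on it.

```python
# Class-id -> name mapping for the eleven DocLayNet labels.
# NOTE: the ids assume the alphabetical order listed above; confirm
# against the checkpoint's own `model.names` before using in production.
DOCLAYNET_CLASSES = {
    0: "Caption",
    1: "Footnote",
    2: "Formula",
    3: "List-item",
    4: "Page-footer",
    5: "Page-header",
    6: "Picture",
    7: "Section-header",
    8: "Table",
    9: "Text",
    10: "Title",
}
```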
22 |
+
|
23 |
+
|
24 |
+
## Benchmark Results
|
25 |
+
The performance of the trained model was evaluated on the validation set, yielding the following metrics:

| Class          | Images | Instances | Box(P) | Box(R) | mAP50 | mAP50-95 |
|----------------|--------|-----------|--------|--------|-------|----------|
| all            | 6476   | 98604     | 0.905  | 0.866  | 0.925 | 0.759    |
| Caption        | 6476   | 1763      | 0.921  | 0.868  | 0.949 | 0.878    |
| Footnote       | 6476   | 312       | 0.888  | 0.779  | 0.839 | 0.637    |
| Formula        | 6476   | 1894      | 0.893  | 0.839  | 0.914 | 0.748    |
| List-item      | 6476   | 13320     | 0.905  | 0.915  | 0.940 | 0.807    |
| Page-footer    | 6476   | 5571      | 0.940  | 0.941  | 0.974 | 0.651    |
| Page-header    | 6476   | 6683      | 0.952  | 0.862  | 0.957 | 0.702    |
| Picture        | 6476   | 1565      | 0.834  | 0.827  | 0.880 | 0.810    |
| Section-header | 6476   | 15744     | 0.919  | 0.902  | 0.962 | 0.635    |
| Table          | 6476   | 2269      | 0.870  | 0.873  | 0.920 | 0.865    |
| Text           | 6476   | 49185     | 0.937  | 0.923  | 0.967 | 0.833    |
| Title          | 6476   | 298       | 0.898  | 0.792  | 0.873 | 0.779    |

These results demonstrate the model's ability to detect the various elements of document layouts with high precision and recall.
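
As a quick sanity check on the table, the `all` row is the unweighted (macro) mean of the eleven per-class values:

```python
# Per-class values copied from the benchmark table above.
map50 = [0.949, 0.839, 0.914, 0.940, 0.974, 0.957,
         0.880, 0.962, 0.920, 0.967, 0.873]
map50_95 = [0.878, 0.637, 0.748, 0.807, 0.651, 0.702,
            0.810, 0.635, 0.865, 0.833, 0.779]

# Macro averages match the "all" row.
print(round(sum(map50) / len(map50), 3))        # 0.925
print(round(sum(map50_95) / len(map50_95), 3))  # 0.759
```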

## Quick Guide to Running Inference

```python
from ultralytics import YOLO
from PIL import Image

# Load the exported ONNX weights.
onnx_model = YOLO("best.onnx")

# Run inference. The model was trained at 1024x1024, so imgsz=1024
# matches the training resolution.
results = onnx_model("<path_to_image>", imgsz=1024)

for i, r in enumerate(results):
    # plot() returns the annotated image as a BGR numpy array.
    im_bgr = r.plot()
    # Reverse the channel order for PIL (BGR -> RGB) and display.
    im_rgb = Image.fromarray(im_bgr[..., ::-1])
    im_rgb.show()

    # Save the annotated result to disk.
    r.save(filename=f"results{i}.jpg")
```
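
The saved images above are annotated renderings. If you instead need the raw detections (for example, to crop tables or export COCO-style annotations), ultralytics exposes corner coordinates via `r.boxes.xyxy`; a small model-free helper, shown on a hypothetical detected region, converts those to COCO-style `[x, y, width, height]`:

```python
def xyxy_to_xywh(box):
    """Convert [x1, y1, x2, y2] corner coordinates to
    COCO-style [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]

# Hypothetical table detection at corners (100, 200) and (400, 350):
print(xyxy_to_xywh([100.0, 200.0, 400.0, 350.0]))  # [100.0, 200.0, 300.0, 150.0]
```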