Hiraishin committed on
Commit 8c1a811
1 Parent(s): 7390b54

Update README.md

Files changed (1)
  1. README.md +59 -0
README.md CHANGED
@@ -1,3 +1,62 @@
---
license: mit
---
# YOLOv8X Trained on the Full DocLayNet Dataset (1024x1024 Image Size, Batch Size 42)

This repository contains a YOLOv8X model trained on the entire DocLayNet dataset, which comprises roughly 41 GB of annotated document-layout images. Training ran on a single NVIDIA A100 GPU with 80 GB of memory, using a batch size of 42 and an input image size of 1024x1024 pixels. The default image-augmentation hyperparameters were left unchanged.
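
In YOLO pipelines such as Ultralytics, an input size of 1024x1024 usually means the image is letterboxed rather than stretched. The longer side is scaled to the target size and the shorter side is padded. The sketch below computes that scale and padding for a page of any size. `letterbox_params` is a hypothetical helper written for illustration. It assumes the common letterbox scheme and is not taken from this model's training code, so its rounding and padding split may differ slightly from Ultralytics' own implementation.

```python
def letterbox_params(width: int, height: int, target: int = 1024):
    """Hypothetical helper: fit a width x height image into a target x target
    square without changing its aspect ratio.

    Returns (scale, new_w, new_h, pad_left, pad_top, pad_right, pad_bottom).
    """
    # Scale so the longer side exactly fills the target square
    scale = min(target / width, target / height)
    new_w = round(width * scale)
    new_h = round(height * scale)
    # Split the leftover space between the two sides of each axis
    pad_w = target - new_w
    pad_h = target - new_h
    pad_left, pad_top = pad_w // 2, pad_h // 2
    return scale, new_w, new_h, pad_left, pad_top, pad_w - pad_left, pad_h - pad_top


# A US Letter page scanned at 300 dpi (2550x3300) fills the height,
# and the remaining width is padded on the left and right
print(letterbox_params(2550, 3300))
# A landscape 800x600 image fills the width and is padded top and bottom
print(letterbox_params(800, 600))
```

Splitting the padding evenly keeps the page centred, so predicted boxes can be mapped back to the original image by subtracting the padding and dividing by `scale`.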

## Dataset Classes

The model was trained on all 11 class labels defined in the DocLayNet dataset:
- Caption
- Footnote
- Formula
- List-item
- Page-footer
- Page-header
- Picture
- Section-header
- Table
- Text
- Title
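
Post-processing code usually needs to map class indices to these names. The sketch below assumes the model's zero-based class indices follow the order listed above. That ordering is an assumption, not something this repository states, so check it against the loaded model's `names` attribute before relying on it. `name_mismatches` is a hypothetical helper for that check.

```python
# Assumed zero-based class order (the order listed above); verify it
# against the loaded model's `names` before relying on it.
DOCLAYNET_CLASSES = [
    "Caption", "Footnote", "Formula", "List-item", "Page-footer",
    "Page-header", "Picture", "Section-header", "Table", "Text", "Title",
]

# Index -> name, the same shape as an Ultralytics `names` dict
ID_TO_NAME = dict(enumerate(DOCLAYNET_CLASSES))
# Name -> index, handy for filtering predictions by class
NAME_TO_ID = {name: idx for idx, name in ID_TO_NAME.items()}


def name_mismatches(model_names):
    """Hypothetical helper: return (index, assumed_name, model_name) for each
    index where a model's names dict disagrees with the assumed order.
    An empty list means the two agree."""
    return [
        (idx, name, model_names.get(idx))
        for idx, name in ID_TO_NAME.items()
        if model_names.get(idx) != name
    ]


print(len(ID_TO_NAME), NAME_TO_ID["Table"])  # prints: 11 8
```

With the model loaded, `name_mismatches(YOLO("best.onnx").names)` should return an empty list if the assumed order holds.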
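
## Benchmark Results

The trained model was evaluated on the validation set (6,476 images with 98,604 annotated instances). In the table below, Box(P) and Box(R) are bounding-box precision and recall. mAP50 is the mean average precision at an IoU threshold of 0.50, and mAP50-95 averages it over IoU thresholds from 0.50 to 0.95.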

| Class          | Images | Instances | Box(P) | Box(R) | mAP50 | mAP50-95 |
|----------------|--------|-----------|--------|--------|-------|----------|
| all            | 6476   | 98604     | 0.905  | 0.866  | 0.925 | 0.759    |
| Caption        | 6476   | 1763      | 0.921  | 0.868  | 0.949 | 0.878    |
| Footnote       | 6476   | 312       | 0.888  | 0.779  | 0.839 | 0.637    |
| Formula        | 6476   | 1894      | 0.893  | 0.839  | 0.914 | 0.748    |
| List-item      | 6476   | 13320     | 0.905  | 0.915  | 0.94  | 0.807    |
| Page-footer    | 6476   | 5571      | 0.94   | 0.941  | 0.974 | 0.651    |
| Page-header    | 6476   | 6683      | 0.952  | 0.862  | 0.957 | 0.702    |
| Picture        | 6476   | 1565      | 0.834  | 0.827  | 0.88  | 0.81     |
| Section-header | 6476   | 15744     | 0.919  | 0.902  | 0.962 | 0.635    |
| Table          | 6476   | 2269      | 0.87   | 0.873  | 0.92  | 0.865    |
| Text           | 6476   | 49185     | 0.937  | 0.923  | 0.967 | 0.833    |
| Title          | 6476   | 298       | 0.898  | 0.792  | 0.873 | 0.779    |
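
These results show that the model detects the different document-layout elements with high precision and recall. Averaged over all classes, it reaches a precision of 0.905, a recall of 0.866, and an mAP50 of 0.925. Performance under the stricter mAP50-95 metric varies more by class: it ranges from 0.635 for Section-header and 0.637 for Footnote up to 0.878 for Caption. Recall is lowest for the two rarest classes, Footnote (0.779, 312 instances) and Title (0.792, 298 instances).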
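
The `all` row can be cross-checked against the per-class rows. The sketch below recomputes the following from the numbers in the table:

- the instance total;
- the unweighted (macro) mean of each metric over the 11 classes;
- a per-class F1 score, F1 = 2PR / (P + R).

F1 is derived here for illustration and is not part of the reported evaluation.

```python
# Per-class rows copied from the benchmark table:
# (class, instances, precision, recall, mAP50, mAP50-95)
ROWS = [
    ("Caption",         1763, 0.921, 0.868, 0.949, 0.878),
    ("Footnote",         312, 0.888, 0.779, 0.839, 0.637),
    ("Formula",         1894, 0.893, 0.839, 0.914, 0.748),
    ("List-item",      13320, 0.905, 0.915, 0.940, 0.807),
    ("Page-footer",     5571, 0.940, 0.941, 0.974, 0.651),
    ("Page-header",     6683, 0.952, 0.862, 0.957, 0.702),
    ("Picture",         1565, 0.834, 0.827, 0.880, 0.810),
    ("Section-header", 15744, 0.919, 0.902, 0.962, 0.635),
    ("Table",           2269, 0.870, 0.873, 0.920, 0.865),
    ("Text",           49185, 0.937, 0.923, 0.967, 0.833),
    ("Title",            298, 0.898, 0.792, 0.873, 0.779),
]
METRICS = ["P", "R", "mAP50", "mAP50-95"]


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Instance counts should add up to the "all" row
total_instances = sum(row[1] for row in ROWS)

# Unweighted (macro) mean over the 11 classes for each metric column
macro = {
    metric: sum(row[col] for row in ROWS) / len(ROWS)
    for col, metric in enumerate(METRICS, start=2)
}

# Derived per-class F1 and the class with the lowest mAP50-95
f1_scores = {row[0]: f1(row[2], row[3]) for row in ROWS}
lowest_map = min(ROWS, key=lambda row: row[5])[0]

print(total_instances)  # 98604
# Prints P: 0.905, R: 0.866, mAP50: 0.925, mAP50-95: 0.759
for metric, value in macro.items():
    print(f"{metric}: {value:.3f}")
print(f"F1 (all): {f1(0.905, 0.866):.3f}")  # F1 (all): 0.885
print("Lowest F1:", min(f1_scores, key=f1_scores.get))  # Lowest F1: Footnote
print("Lowest mAP50-95:", lowest_map)  # Lowest mAP50-95: Section-header
```

The macro means match the `all` row to three decimals, which indicates that the summary row averages the classes equally rather than weighting them by instance count.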

## Quick Guide to Running Inference
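
```python
from ultralytics import YOLO
from PIL import Image

# Load the exported ONNX model; task="detect" states the task explicitly
onnx_model = YOLO("best.onnx", task="detect")

# imgsz=512 is kept from the original snippet. The model was trained at 1024,
# so imgsz=1024 may be more accurate if the exported ONNX graph accepts it.
results = onnx_model("<path_to_image>", imgsz=512)

for i, r in enumerate(results):
    # Draw the detections onto the image (BGR-order NumPy array)
    im_bgr = r.plot()
    # Convert to an RGB-order PIL image, e.g. for further processing
    im_rgb = Image.fromarray(im_bgr[..., ::-1])

    # Show the annotated image (in supported environments)
    r.show()

    # Save the annotated image to disk
    r.save(filename=f"results{i}.jpg")
```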
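
Downstream code often needs the detections as data rather than as a plotted image. In Ultralytics, each result's `boxes` attribute exposes the following as tensors, and `r.names` maps class indices to names:

- class indices (`cls`);
- confidences (`conf`);
- corner coordinates (`xyxy`).

The sketch below groups such detections by class name and sorts each group top-to-bottom. `group_detections` is a hypothetical helper that takes plain Python lists, so it runs without the model. The `min_conf` cut-off is an illustrative extra filter on top of the predictor's own confidence threshold. The top-to-bottom sort is a simple heuristic, not reading-order detection.

```python
from collections import defaultdict


def group_detections(class_ids, confidences, boxes_xyxy, names, min_conf=0.5):
    """Group detections by class name (hypothetical helper).

    class_ids, confidences and boxes_xyxy are parallel sequences, and each box
    is (x1, y1, x2, y2) in pixels. Detections below min_conf are dropped, and
    each class's boxes are sorted top-to-bottom, then left-to-right, which is
    a simple heuristic rather than true reading order.
    """
    grouped = defaultdict(list)
    for cls_id, conf, box in zip(class_ids, confidences, boxes_xyxy):
        if conf < min_conf:
            continue
        grouped[names[int(cls_id)]].append(
            {"conf": float(conf), "box": [float(v) for v in box]}
        )
    for detections in grouped.values():
        # Sort by top edge (y1), then left edge (x1)
        detections.sort(key=lambda d: (d["box"][1], d["box"][0]))
    return dict(grouped)


# Synthetic input; with a real result `r`, pass r.boxes.cls.tolist(),
# r.boxes.conf.tolist(), r.boxes.xyxy.tolist() and r.names instead.
example = group_detections(
    class_ids=[9, 9, 8, 9],
    confidences=[0.91, 0.88, 0.95, 0.10],
    boxes_xyxy=[[50, 400, 900, 480], [50, 120, 900, 200],
                [60, 500, 880, 900], [0, 0, 10, 10]],
    names={8: "Table", 9: "Text"},
)
for name, detections in example.items():
    print(name, [d["box"] for d in detections])
```

Keeping the output as plain dictionaries makes it easy to serialize to JSON or to crop regions of a given class from the page.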