---
library_name: transformers
pipeline_tag: image-segmentation
tags:
- vision
- image-segmentation
- dit
datasets:
- ds4sd/DocLayNet-v1.1
widget:
- src: >-
    https://upload.wikimedia.org/wikipedia/commons/c/c3/LibreOffice_Writer_6.3.png
  example_title: Wiki
---
DiT fine-tuned with a `BeitForSemanticSegmentation` head on DocLayNet v1.1 for document layout segmentation. Trained for 4 epochs.
Usage:
```
from transformers import AutoImageProcessor, BeitForSemanticSegmentation
from PIL import Image
import matplotlib.pyplot as plt

image_processor = AutoImageProcessor.from_pretrained("microsoft/dit-large")
model = BeitForSemanticSegmentation.from_pretrained("jzju/dit-doclaynet")

image = Image.open("img.png").convert("RGB")
inputs = image_processor(images=image, return_tensors="pt")
outputs = model(**inputs)

# logits are of shape (batch_size, num_labels, height, width)
logits = outputs.logits
out = logits[0].detach()
print(out.size())

# Plot the predicted mask for each of the 11 DocLayNet classes
for i in range(11):
    plt.imshow(out[i])
    plt.show()
```
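To turn the per-class logits into a single segmentation map, one option is to upsample them to the input resolution and take the argmax over the label dimension. This is a minimal sketch, assuming bilinear upsampling and that every pixel is assigned to exactly one class (the model has no background channel), not something stated by this model card:
```
import torch

# Upsample to the original image size; PIL's image.size is (width, height)
upsampled = torch.nn.functional.interpolate(
    logits, size=image.size[::-1], mode="bilinear", align_corners=False
)
# (height, width) map with values 0..10; add 1 to get the DocLayNet category_id
seg_map = upsampled.argmax(dim=1)[0]
```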
Labels:
```
1: Caption
2: Footnote
3: Formula
4: List-item
5: Page-footer
6: Page-header
7: Picture
8: Section-header
9: Table
10: Text
11: Title
```
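Output channel i corresponds to category_id i + 1 in the list above (the training masks are built with `mask[c - 1]`, see below). A small lookup dict, introduced here purely as a convenience and not part of the checkpoint config:
```
# Output channel index -> DocLayNet label (category_id - 1)
ID2LABEL = {
    0: "Caption", 1: "Footnote", 2: "Formula", 3: "List-item",
    4: "Page-footer", 5: "Page-header", 6: "Picture",
    7: "Section-header", 8: "Table", 9: "Text", 10: "Title",
}
```
With the `seg_map` from the sketch above, `ID2LABEL[int(seg_map[y, x])]` gives the predicted class of a pixel.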
Converting the DocLayNet bounding-box annotations into the label masks used for training:
```
import numpy as np
import cv2
from datasets import load_dataset
from transformers import BeitForSemanticSegmentation

model = BeitForSemanticSegmentation.from_pretrained("microsoft/dit-base", num_labels=11)
ds = load_dataset("ds4sd/DocLayNet-v1.1")

# For a single example d from ds: rasterize the COCO-style boxes into an
# 11-channel binary mask (DocLayNet pages are 1025x1025 pixels)
mask = np.zeros([11, 1025, 1025])
for b, c in zip(d["bboxes"], d["category_id"]):
    b = [np.clip(int(bb), 0, 1025) for bb in b]
    mask[c - 1][b[1]:b[1]+b[3], b[0]:b[0]+b[2]] = 1
# Downscale each channel to the model's output resolution
mask = [cv2.resize(a, dsize=(56, 56), interpolation=cv2.INTER_AREA) for a in mask]
d["label"] = np.stack(mask)
```
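The snippet above converts a single example `d`; to apply it across the whole dataset it can be wrapped in a function and passed to `datasets`' `map`. A sketch under that assumption (the function name `add_label` is hypothetical):
```
def add_label(d):
    # Same per-example conversion as above
    mask = np.zeros([11, 1025, 1025])
    for b, c in zip(d["bboxes"], d["category_id"]):
        b = [np.clip(int(bb), 0, 1025) for bb in b]
        mask[c - 1][b[1]:b[1]+b[3], b[0]:b[0]+b[2]] = 1
    mask = [cv2.resize(a, dsize=(56, 56), interpolation=cv2.INTER_AREA) for a in mask]
    d["label"] = np.stack(mask)
    return d

ds = ds.map(add_label)
```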