🔥 2024.4.28: Good news! The code and pre-trained model of DocScanner are now released!
🔥 Good news! The online demo for DocScanner is now live, allowing easy image upload and rectification.
🔥 Good news! Our new work DocTr++: Deep Unrestricted Document Image Rectification is now available, capable of rectifying various distorted document images in the wild.
🔥 Good news! A comprehensive list of Awesome Document Image Rectification methods is available.
DocScanner
This is a PyTorch/GPU re-implementation of the paper DocScanner: Robust Document Image Rectification with Progressive Learning.
Demo (Link)
Note: The model version used in the demo corresponds to "DocScanner-L" as described in the paper.
- Upload the distorted document image to be rectified in the left box.
- Click the "Submit" button.
- The rectified image will be displayed in the right box.
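If you prefer to script these three steps instead of using the web UI, a minimal sketch with gradio_client is shown below. The Space identifier and endpoint name are assumptions, not values confirmed by this repository; check the Space's "Use via API" panel for the actual ones.

```python
# A minimal sketch of driving the demo programmatically with gradio_client.
# The Space id and api_name below are assumptions, not the actual values.
from gradio_client import Client, handle_file

client = Client("HaoFeng/DocScanner")            # hypothetical Space id
result = client.predict(
    handle_file("my_distorted_page.jpg"),        # local path to a distorted document photo
    api_name="/predict",                         # hypothetical endpoint name
)
print("Rectified image written to:", result)     # gradio_client downloads the output locally
```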
Examples
Training
- We train the Document Localization Module on the Doc3D dataset. In addition, the DTD dataset is used for background data enhancement, as sketched after this list.
- We train the Progressive Rectification Module on the Doc3D dataset, using the background-excluded document images for training.
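For background data enhancement, a rendered Doc3D document is composited onto a random DTD texture. The sketch below illustrates this step only; the file locations, mask convention, and output size are assumptions, not the released training code.

```python
# A minimal sketch of background augmentation: paste a Doc3D rendering onto a
# random DTD texture. Paths and the binary-mask convention are assumptions.
import random
from pathlib import Path

import numpy as np
from PIL import Image

def composite_on_dtd(doc_img_path, doc_mask_path, dtd_root, out_size=(448, 448)):
    """Keep the document foreground (mask == 1) and replace the background with a DTD texture."""
    doc = np.asarray(Image.open(doc_img_path).convert("RGB").resize(out_size), dtype=np.float32)
    mask = np.asarray(Image.open(doc_mask_path).convert("L").resize(out_size), dtype=np.float32) / 255.0
    mask = mask[..., None]                       # HxWx1, 1 = document, 0 = background

    texture_path = random.choice(list(Path(dtd_root).rglob("*.jpg")))
    bg = np.asarray(Image.open(texture_path).convert("RGB").resize(out_size), dtype=np.float32)

    out = doc * mask + bg * (1.0 - mask)         # alpha-composite document over texture
    return Image.fromarray(out.astype(np.uint8))
```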
Inference
- Put the pre-trained DocScanner-L model in $ROOT/model_pretrained/.
- Put the distorted images in $ROOT/distorted/.
- Run python inference.py. The rectified images are saved in $ROOT/rectified/ by default.
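The script roughly performs the loop sketched below: load each distorted image, predict a backward sampling map, warp the original image with it, and write the result. The checkpoint handling, the 288x288 input resolution, and the assumption that the network outputs a normalized sampling grid are illustrative guesses; consult inference.py for the exact interface.

```python
# A rough sketch of the inference loop; the input resolution and the assumption that
# the model outputs a normalized backward sampling grid are illustrative guesses --
# see inference.py for the actual implementation.
from pathlib import Path

import torch
import torch.nn.functional as F
from torchvision.io import read_image
from torchvision.utils import save_image

def rectify_folder(model, in_dir="distorted", out_dir="rectified", device="cuda"):
    Path(out_dir).mkdir(exist_ok=True)
    model.to(device).eval()
    for img_path in sorted(Path(in_dir).glob("*.*")):
        img = read_image(str(img_path)).float()[None, :3] / 255.0      # 1x3xHxW in [0, 1]
        inp = F.interpolate(img, size=(288, 288), mode="bilinear",
                            align_corners=False).to(device)
        with torch.no_grad():
            grid = model(inp)                                           # assumed: 1x2xHxW grid in [-1, 1]
            grid = grid.permute(0, 2, 3, 1)                             # -> 1xHxWx2 for grid_sample
            rectified = F.grid_sample(img.to(device), grid, align_corners=True)
        save_image(rectified, Path(out_dir) / img_path.name)
```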
Evaluation
- Important: In the DocUNet Benchmark, the distorted images '64_1.png' and '64_2.png' are rotated by 180 degrees and therefore do not match their ground-truth (GT) documents. This issue is overlooked by most existing works, so please check these two samples before evaluation (a small sketch of the fix follows this list). Note that the performance reported in most existing works is computed with these two mistaken samples.
- To reproduce the following quantitative performance on the corrected DocUNet Benchmark, please use the geometrically rectified images available from Google Drive. For the corrected performance of other methods, please refer to the DocScanner paper.
- Image Metrics: We use the same MS-SSIM and LD evaluation code as the DocUNet Benchmark dataset, based on Matlab 2019a. Scores can differ slightly with your Matlab version, so compare accordingly. Our Matlab interface file is provided at $ROOT/ssim_ld_eval.m.
- OCR Metrics: The index of the 30 documents (60 images) of the DocUNet Benchmark used for our OCR evaluation is given in $ROOT/ocr_img.txt (Setting 1). Please refer to DewarpNet for the index of the 25 documents (50 images) used for their OCR evaluation (Setting 2). The OCR evaluation code is provided at $ROOT/OCR_eval.py; a sketch of the ED/CER computation follows the table below. The version of pytesseract is 0.3.8, and the version of Tesseract on Windows is the recent 5.0.1.20220118. Note that the calculated performance differs slightly across operating systems.
- W_v and W_h Index: The layout results of the DocUNet Benchmark are available on Google Drive.
Method | MS-SSIM | LD | Li-D | ED (Setting 1) | CER (Setting 1) | ED (Setting 2) | CER (Setting 2) | Params (M) |
---|---|---|---|---|---|---|---|---|
DocScanner-T | 0.5123 | 7.92 | 2.04 | 501.82 | 0.1823 | 809.46 | 0.2068 | 2.6 |
DocScanner-B | 0.5134 | 7.62 | 1.88 | 434.11 | 0.1652 | 671.48 | 0.1789 | 5.2 |
DocScanner-L | 0.5178 | 7.45 | 1.86 | 390.43 | 0.1486 | 632.34 | 0.1648 | 8.5 |
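For reference, the ED/CER computation typically looks like the sketch below, using pytesseract for recognition and Levenshtein edit distance for scoring. The GT-text source and the Tesseract configuration are assumptions; $ROOT/OCR_eval.py is the reference implementation.

```python
# A minimal sketch of the ED/CER metrics with pytesseract; the GT text source and
# Tesseract configuration are assumptions -- $ROOT/OCR_eval.py is the reference code.
import Levenshtein               # pip install python-Levenshtein
import pytesseract
from PIL import Image

def ocr_metrics(rectified_path, gt_text):
    pred_text = pytesseract.image_to_string(Image.open(rectified_path))
    ed = Levenshtein.distance(pred_text, gt_text)   # edit distance (ED)
    cer = ed / max(len(gt_text), 1)                 # character error rate (CER)
    return ed, cer
```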
Citation
Please cite the related works in your publications if they help your research:
@inproceedings{feng2021doctr,
title={DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction},
author={Feng, Hao and Wang, Yuechen and Zhou, Wengang and Deng, Jiajun and Li, Houqiang},
booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
pages={273--281},
year={2021}
}
@inproceedings{feng2022docgeonet,
title={Geometric Representation Learning for Document Image Rectification},
author={Feng, Hao and Zhou, Wengang and Deng, Jiajun and Wang, Yuechen and Li, Houqiang},
booktitle={Proceedings of the European Conference on Computer Vision},
year={2022}
}
@article{feng2021docscanner,
title={DocScanner: robust document image rectification with progressive learning},
author={Feng, Hao and Zhou, Wengang and Deng, Jiajun and Tian, Qi and Li, Houqiang},
journal={arXiv preprint arXiv:2110.14968},
year={2021}
}
Acknowledgement
The code is largely based on DocUNet and DewarpNet. Thanks for their wonderful work.
Contact
For commercial usage, please contact Professor Wengang Zhou (zhwg@ustc.edu.cn) and Hao Feng (haof@mail.ustc.edu.cn).