Changelog

v0.4.1 (27/01/2022)

Highlights

Visualizing edge weights in OpenSet KIE is now supported! https://github.com/open-mmlab/mmocr/pull/677
Some configurations have been optimized to significantly speed up the training and testing processes! Don't worry - you can still tune these parameters in case these modifications do not work. https://github.com/open-mmlab/mmocr/pull/757
Now you can use CPU to train/debug your model! https://github.com/open-mmlab/mmocr/pull/752
We have fixed a severe bug that causes users unable to call mmocr.apis.test with our pre-built wheels. https://github.com/open-mmlab/mmocr/pull/667

New Features & Enhancements

Show edge score for openset kie by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/677
Download flake8 from github as pre-commit hooks by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/695
Deprecate the support for 'python setup.py test' by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/722
Disable multi-processing feature of cv2 to speed up data loading by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/721
Extend ctw1500 converter to support text fields by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/729
Extend totaltext converter to support text fields by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/728
Speed up training by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/739
Add setup multi-processing both in train and test.py by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/757
Support CPU training/testing by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/752
Support specify gpu for testing and training with gpu-id instead of gpu-ids and gpus by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/756
Remove unnecessary custom_import from test.py by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/758

Bug Fixes

Fix satrn onnxruntime test by @AllentDan in https://github.com/open-mmlab/mmocr/pull/679
Support both ConcatDataset and UniformConcatDataset by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/675
Fix bugs of show_results in single_gpu_test by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/667
Fix a bug for sar decoder when bi-rnn is used by @MhLiao in https://github.com/open-mmlab/mmocr/pull/690
Fix opencv version to avoid some bugs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/694
Fix py39 ci error by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/707
Update visualize.py by @TommyZihao in https://github.com/open-mmlab/mmocr/pull/715
Fix link of config by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/726
Use yaml.safe_load instead of load by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/753
Add necessary keys to test_pipelines to enable test-time visualization by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/754

Docs

Fix recog.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/674
Add config tutorial by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/683
Add MMSelfSup/MMRazor/MMDeploy in readme by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/692
Add recog & det model summary by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/693
Update docs link by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/710
add pull request template.md by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/711
Add website links to readme by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/731
update readme according to standard by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/742

New Contributors

@MhLiao made their first contribution in https://github.com/open-mmlab/mmocr/pull/690
@TommyZihao made their first contribution in https://github.com/open-mmlab/mmocr/pull/715

Full Changelog: https://github.com/open-mmlab/mmocr/compare/v0.4.0...v0.4.1

v0.4.0 (15/12/2021)

Highlights

We release a new text recognition model - ABINet (CVPR 2021, Oral). With it dedicated model design and useful data augmentation transforms, ABINet can achieve the best performance on irregular text recognition tasks. Check it out!
We are also working hard to fulfill the requests from our community. OpenSet KIE is one of the achievement, which extends the application of SDMGR from text node classification to node-pair relation extraction. We also provide a demo script to convert WildReceipt to open set domain, though it cannot take the full advantage of OpenSet format. For more information, please read our tutorial.
APIs of models can be exposed through TorchServe. Docs

Breaking Changes & Migration Guide

Postprocessor

Some refactoring processes are still going on. For all text detection models, we unified their decode implementations into a new module category, POSTPROCESSOR, which is responsible for decoding different raw outputs into boundary instances. In all text detection configs, the text_repr_type argument in bbox_head is deprecated and will be removed in the future release.

Migration Guide: Find a similar line from detection model's config:

text_repr_type=xxx,

And replace it with

postprocessor=dict(type='{MODEL_NAME}Postprocessor', text_repr_type=xxx)),

Take a snippet of PANet's config as an example. Before the change, its config for bbox_head looks like:

    bbox_head=dict(
        type='PANHead',
        text_repr_type='poly',
        in_channels=[128, 128, 128, 128],
        out_channels=6,
        loss=dict(type='PANLoss')),

Afterwards:

    bbox_head=dict(
    type='PANHead',
    in_channels=[128, 128, 128, 128],
    out_channels=6,
    loss=dict(type='PANLoss'),
    postprocessor=dict(type='PANPostprocessor', text_repr_type='poly')),

There are other postprocessors and each takes different arguments. Interested users can find their interfaces or implementations in mmocr/models/textdet/postprocess or through our api docs.

New Config Structure

We reorganized the configs/ directory by extracting reusable sections into configs/_base_. Now the directory tree of configs/_base_ is organized as follows:

_base_
├── det_datasets
├── det_models
├── det_pipelines
├── recog_datasets
├── recog_models
├── recog_pipelines
└── schedules

Most of model configs are making full use of base configs now, which makes the overall structural clearer and facilitates fair comparison across models. Despite the seemingly significant hierarchical difference, these changes would not break the backward compatibility as the names of model configs remain the same.

New Features

Support openset kie by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/498
Add converter for the Open Images v5 text annotations by Krylov et al. by @baudm in https://github.com/open-mmlab/mmocr/pull/497
Support Chinese for kie show result by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/464
Add TorchServe support for text detection and recognition by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/522
Save filename in text detection test results by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/570
Add codespell pre-commit hook and fix typos by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/520
Avoid duplicate placeholder docs in CN by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/582
Save results to json file for kie. by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/589
Add SAR_CN to ocr.py by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/579
mim extension for windows by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/641
Support muitiple pipelines for different datasets by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/657
ABINet Framework by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/651

Refactoring

Refactor textrecog config structure by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/617
Refactor text detection config by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/626
refactor transformer modules by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/618
refactor textdet postprocess by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/640

Docs

C++ example section by @apiaccess21 in https://github.com/open-mmlab/mmocr/pull/593
install.md Chinese section by @A465539338 in https://github.com/open-mmlab/mmocr/pull/364
Add Chinese Translation of deployment.md. by @fatfishZhao in https://github.com/open-mmlab/mmocr/pull/506
Fix a model link and add the metafile for SATRN by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/473
Improve docs style by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/474
Enhancement & sync Chinese docs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/492
TorchServe docs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/539
Update docs menu by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/564
Docs for KIE CloseSet & OpenSet by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/573
Fix broken links by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/576
Docstring for text recognition models by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/562
Add MMFlow & MIM by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/597
Add MMFewShot by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/621
Update model readme by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/604
Add input size check to model_inference by @mpena-vina in https://github.com/open-mmlab/mmocr/pull/633
Docstring for textdet models by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/561
Add MMHuman3D in readme by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/644
Use shared menu from theme instead by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/655
Refactor docs structure by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/662
Docs fix by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/664

Enhancements

Use bounding box around polygon instead of within polygon by @alexander-soare in https://github.com/open-mmlab/mmocr/pull/469
Add CITATION.cff by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/476
Add py3.9 CI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/475
update model-index.yml by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/484
Use container in CI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/502
CircleCI Setup by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/611
Remove unnecessary custom_import from train.py by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/603
Change the upper version of mmcv to 1.5.0 by @zhouzaida in https://github.com/open-mmlab/mmocr/pull/628
Update CircleCI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/631
Pass custom_hooks to MMCV by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/609
Skip CI when some specific files were changed by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/642
Add markdown linter in pre-commit hook by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/643
Use shape from loaded image by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/652
Cancel previous runs that are not completed by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/666

Bug Fixes

Modify algorithm "sar" weights path in metafile by @ShoupingShan in https://github.com/open-mmlab/mmocr/pull/581
Fix Cuda CI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/472
Fix image export in test.py for KIE models by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/486
Allow invalid polygons in intersection and union by default by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/471
Update checkpoints' links for SATRN by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/518
Fix converting to onnx bug because of changing key from img_shape to resize_shape by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/523
Fix PyTorch 1.6 incompatible checkpoints by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/540
Fix paper field in metafiles by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/550
Unify recognition task names in metafiles by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/548
Fix py3.9 CI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/563
Always map location to cpu when loading checkpoint by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/567
Fix wrong model builder in recog_test_imgs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/574
Improve dbnet r50 by fixing img std by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/578
Fix resource warning: unclosed file by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/577
Fix bug that same start_point for different texts in draw_texts_by_pil by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/587
Keep original texts for kie by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/588
Fix random seed by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/600
Fix DBNet_r50 config by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/625
Change SBC case to DBC case by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/632
Fix kie demo by @innerlee in https://github.com/open-mmlab/mmocr/pull/610
fix type check by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/650
Remove depreciated image validator in totaltext converter by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/661
Fix change locals() dict by @Fei-Wang in https://github.com/open-mmlab/mmocr/pull/663
fix #614: textsnake targets by @HolyCrap96 in https://github.com/open-mmlab/mmocr/pull/660

New Contributors

@alexander-soare made their first contribution in https://github.com/open-mmlab/mmocr/pull/469
@A465539338 made their first contribution in https://github.com/open-mmlab/mmocr/pull/364
@fatfishZhao made their first contribution in https://github.com/open-mmlab/mmocr/pull/506
@baudm made their first contribution in https://github.com/open-mmlab/mmocr/pull/497
@ShoupingShan made their first contribution in https://github.com/open-mmlab/mmocr/pull/581
@apiaccess21 made their first contribution in https://github.com/open-mmlab/mmocr/pull/593
@zhouzaida made their first contribution in https://github.com/open-mmlab/mmocr/pull/628
@mpena-vina made their first contribution in https://github.com/open-mmlab/mmocr/pull/633
@Fei-Wang made their first contribution in https://github.com/open-mmlab/mmocr/pull/663

Full Changelog: https://github.com/open-mmlab/mmocr/compare/v0.3.0...0.4.0

v0.3.0 (25/8/2021)

Highlights

We add a new text recognition model -- SATRN! Its pretrained checkpoint achieves the best performance over other provided text recognition models. A lighter version of SATRN is also released which can obtain ~98% of the performance of the original model with only 45 MB in size. (@2793145003) #405
Improve the demo script, ocr.py, which supports applying end-to-end text detection, text recognition and key information extraction models on images with easy-to-use commands. Users can find its full documentation in the demo section. (@samayala22, @manjrekarom) #371, #386, #400, #374, #428
Our documentation is reorganized into a clearer structure. More useful contents are on the way! #409, #454
The requirement of Polygon3 is removed since this project is no longer maintained or distributed. We unified all its references to equivalent substitutions in shapely instead. #448

Breaking Changes & Migration Guide

Upgrade version requirement of MMDetection to 2.14.0 to avoid bugs #382
MMOCR now has its own model and layer registries inherited from MMDetection's or MMCV's counterparts. (#436) The modified hierarchical structure of the model registries are now organized as follows.

mmcv.MODELS -> mmdet.BACKBONES -> BACKBONES
mmcv.MODELS -> mmdet.NECKS -> NECKS
mmcv.MODELS -> mmdet.ROI_EXTRACTORS -> ROI_EXTRACTORS
mmcv.MODELS -> mmdet.HEADS -> HEADS
mmcv.MODELS -> mmdet.LOSSES -> LOSSES
mmcv.MODELS -> mmdet.DETECTORS -> DETECTORS
mmcv.ACTIVATION_LAYERS -> ACTIVATION_LAYERS
mmcv.UPSAMPLE_LAYERS -> UPSAMPLE_LAYERS

To migrate your old implementation to our new backend, you need to change the import path of any registries and their corresponding builder functions (including build_detectors) from mmdet.models.builder to mmocr.models.builder. If you have referred to any model or layer of MMDetection or MMCV in your model config, you need to add mmdet. or mmcv. prefix to its name to inform the model builder of the right namespace to work on.

Interested users may check out MMCV's tutorial on Registry for in-depth explanations on its mechanism.

New Features

Automatically replace SyncBN with BN for inference #420, #453
Support batch inference for CRNN and SegOCR #407
Support exporting documentation in pdf or epub format #406
Support persistent_workers option in data loader #459

Bug Fixes

Remove depreciated key in kie_test_imgs.py #381
Fix dimension mismatch in batch testing/inference of DBNet #383
Fix the problem of dice loss which stays at 1 with an empty target given #408
Fix a wrong link in ocr.py (@naarkhoo) #417
Fix undesired assignment to "pretrained" in test.py #418
Fix a problem in polygon generation of DBNet #421, #443
Skip invalid annotations in totaltext_converter #438
Add zero division handler in poly utils, remove Polygon3 #448

Improvements

Replace lanms-proper with lanms-neo to support installation on Windows (with special thanks to @gen-ko who has re-distributed this package!)
Support MIM #394
Add tests for PyTorch 1.9 in CI #401
Enables fullscreen layout in readthedocs #413
General documentation enhancement #395
Update version checker #427
Add copyright info #439
Update citation information #440

Contributors

We thank @2793145003, @samayala22, @manjrekarom, @naarkhoo, @gen-ko, @duanjiaqi, @gaotongxiao, @cuhk-hbsun, @innerlee, @wdsd641417025 for their contribution to this release!

v0.2.1 (20/7/2021)

Highlights

Upgrade to use MMCV-full >= 1.3.8 and MMDetection >= 2.13.0 for latest features
Add ONNX and TensorRT export tool, supporting the deployment of DBNet, PSENet, PANet and CRNN (experimental) #278, #291, #300, #328
Unified parameter initialization method which uses init_cfg in config files #365

New Features

Support TextOCR dataset #293
Support Total-Text dataset #266, #273, #357
Support grouping text detection box into lines #290, #304
Add benchmark_processing script that benchmarks data loading process #261
Add SynthText preprocessor for text recognition models #351, #361
Support batch inference during testing #310
Add user-friendly OCR inference script #366

Bug Fixes

Fix improper class ignorance in SDMGR Loss #221
Fix potential numerical zero division error in DRRG #224
Fix installing requirements with pip and mim #242
Fix dynamic input error of DBNet #269
Fix space parsing error in LineStrParser #285
Fix textsnake decode error #264
Correct isort setup #288
Fix a bug in SDMGR config #316
Fix kie_test_img for KIE nonvisual #319
Fix metafiles #342
Fix different device problem in FCENet #334
Ignore improper tailing empty characters in annotation files #358
Docs fixes #247, #255, #265, #267, #268, #270, #276, #287, #330, #355, #367
Fix NRTR config #356, #370

Improvements

Add backend for resizeocr #244
Skip image processing pipelines in SDMGR novisual #260
Speedup DBNet #263
Update mmcv installation method in workflow #323
Add part of Chinese documentations #353, #362
Add support for ConcatDataset with two workflows #348
Add list_from_file and list_to_file utils #226
Speed up sort_vertex #239
Support distributed evaluation of KIE #234
Add pretrained FCENet on IC15 #258
Support CPU for OCR demo #227
Avoid extra image pre-processing steps #375

v0.2.0 (18/5/2021)

Highlights

Add the NER approach Bert-softmax (NAACL'2019)
Add the text detection method DRRG (CVPR'2020)
Add the text detection method FCENet (CVPR'2021)
Increase the ease of use via adding text detection and recognition end-to-end demo, and colab online demo.
Simplify the installation.

New Features

Add Bert-softmax for Ner task #148
Add DRRG #189
Add FCENet #133
Add end-to-end demo #105
Support batch inference #86 #87 #178
Add TPS preprocessor for text recognition #117 #135
Add demo documentation #151 #166 #168 #170 #171
Add checkpoint for Chinese recognition #156
Add metafile #175 #176 #177 #182 #183
Add support for numpy array inference #74

Bug Fixes

Fix the duplicated point bug due to transform for textsnake #130
Fix CTC loss NaN #159
Fix error raised if result is empty in demo #144
Fix results missing if one image has a large number of boxes #98
Fix package missing in dockerfile #109

Improvements

Simplify installation procedure via removing compiling #188
Speed up panet post processing so that it can detect dense texts #188
Add zh-CN README #70 #95
Support windows #89
Add Colab #147 #199
Add 1-step installation using conda environment #193 #194 #195

v0.1.0 (7/4/2021)

Highlights

MMOCR is released.

Main Features

Support text detection, text recognition and the corresponding downstream tasks such as key information extraction.
For text detection, support both single-step (PSENet, PANet, DBNet, TextSnake) and two-step (MaskRCNN) methods.
For text recognition, support CTC-loss based method CRNN; Encoder-decoder (with attention) based methods SAR, Robustscanner; Segmentation based method SegOCR; Transformer based method NRTR.
For key information extraction, support GCN based method SDMG-R.
Provide checkpoints and log files for all of the methods above.