
GaRNet

This is a text-removal model that was introduced in the paper below and first released on this page.
The Surprisingly Straightforward Scene Text Removal Method With Gated Attention and Region of Interest Generation: A Comprehensive Prominent Model Analysis.
Hyeonsu Lee, Chankyu Choi
Naver Corp.
In ECCV 2022.

Model description

GaRNet is a generator that creates a text-free image from a given image and its corresponding text-box mask. It consists of a convolutional encoder and decoder. The encoder is built from residual blocks with an attention module called Gated Attention.
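As a rough illustration of the two inputs described above, the sketch below builds a binary text-box mask from box coordinates and stacks it with the image. The helper names (`boxes_to_mask`, `build_generator_input`), the `(x0, y0, x1, y1)` box format, and the channel-wise concatenation into a 4-channel array are assumptions made for this example, not the interface of the released code.

```python
import numpy as np

def boxes_to_mask(height, width, boxes):
    """Rasterize axis-aligned boxes (x0, y0, x1, y1) into a binary mask (hypothetical helper)."""
    mask = np.zeros((height, width), dtype=np.float32)
    for x0, y0, x1, y1 in boxes:
        mask[y0:y1, x0:x1] = 1.0  # 1 inside a text box, 0 elsewhere
    return mask

def build_generator_input(image, mask):
    """Stack an RGB image (3, H, W) and a mask (H, W) into a (4, H, W) array.

    Assumption: the generator receives image and mask concatenated along
    the channel axis; check the released inference code for the real layout.
    """
    return np.concatenate([image, mask[None]], axis=0)

# Toy example: a flat gray 6x8 image with one text box.
image = np.full((3, 6, 8), 0.5, dtype=np.float32)
mask = boxes_to_mask(6, 8, [(1, 2, 5, 4)])
net_input = build_generator_input(image, mask)
print(net_input.shape)
```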

The Gated Attention module has two spatial attention branches. One branch attends to text strokes and the other to the regions surrounding them. The module balances the contribution of these two domains with trainable parameters.
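The idea of two spatial attention maps blended by trainable weights can be sketched as follows. This is a simplified NumPy illustration, not the module from the paper: the 1x1 channel projection, the sigmoid, and the scalar weights `alpha` and `beta` are assumptions chosen to keep the example small.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(features, weights):
    """Project (C, H, W) features to a single (H, W) map with values in (0, 1)."""
    logits = np.tensordot(weights, features, axes=([0], [0]))  # 1x1 projection over channels
    return sigmoid(logits)

def gated_attention(features, w_stroke, w_surround, alpha, beta):
    """Blend a stroke map and a surround map, then rescale the features.

    alpha and beta stand in for the trainable parameters that weight the
    two branches (hypothetical names).
    """
    a_stroke = spatial_attention(features, w_stroke)
    a_surround = spatial_attention(features, w_surround)
    gate = alpha * a_stroke + beta * a_surround
    return features * gate[None, :, :], gate

rng = np.random.default_rng(0)
features = rng.normal(size=(8, 4, 4))
w_stroke = rng.normal(size=8)
w_surround = rng.normal(size=8)
out, gate = gated_attention(features, w_stroke, w_surround, alpha=0.7, beta=0.3)
print(out.shape, gate.shape)
```

In a real network the two branches would be learned convolutions and `alpha` and `beta` would be updated by backpropagation; here they are fixed so the blending step is easy to inspect.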

The model was trained in a PatchGAN manner with Region-of-Interest Generation.
The discriminator consists of a convolutional encoder. Given an image, it determines whether each patch that falls in a text-box region is real or fake. All loss functions treat non-text-box regions as 'don't care'.
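One way to read the 'don't care' treatment is that every loss term is multiplied by the text-box mask before averaging, so pixels and patches outside the boxes contribute nothing. The sketch below shows that for an L1 reconstruction term and a patch-level adversarial term. The function names, the rule that a patch counts as in-region if any of its pixels lies in a text box, and the binary cross-entropy form are assumptions for illustration; the paper's exact losses may differ.

```python
import numpy as np

def masked_l1(pred, target, mask, eps=1e-8):
    """Mean absolute error over text-box pixels only; pred/target are (C, H, W)."""
    diff = np.abs(pred - target) * mask[None]
    return diff.sum() / (mask.sum() * pred.shape[0] + eps)

def patch_mask(mask, patch):
    """Downsample a pixel mask to patch resolution (H and W must be divisible by patch).

    Assumption: a patch is in the region of interest if any pixel in it is.
    """
    h, w = mask.shape
    blocks = mask.reshape(h // patch, patch, w // patch, patch)
    return blocks.max(axis=(1, 3))

def masked_patch_bce(scores, is_real, roi, eps=1e-8):
    """Binary cross-entropy over discriminator patch scores, restricted to ROI patches."""
    target = 1.0 if is_real else 0.0
    bce = -(target * np.log(scores + eps) + (1.0 - target) * np.log(1.0 - scores + eps))
    return (bce * roi).sum() / (roi.sum() + eps)

# Toy data: an 8x8 image with one text box covering rows 2-3, columns 2-5.
mask = np.zeros((8, 8))
mask[2:4, 2:6] = 1.0
target = np.zeros((3, 8, 8))
pred = target.copy()
pred[:, 0, 0] = 5.0  # error outside the text box: ignored by the masked loss
loss_outside_only = masked_l1(pred, target, mask)
roi = patch_mask(mask, patch=4)
print(loss_outside_only, roi.tolist())
```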

Intended uses & limitations

This model can be used in applications that require erasing text from an image, such as concealing private information or text editing.
You can use the raw model or the pre-trained model.
Note that the pre-trained model was trained on both the Synthetic and SCUT-EnsText datasets, and that the SCUT-EnsText dataset can only be used for non-commercial research purposes.

How to use

You can use the inference code on this page.

BibTeX entry and citation info

@inproceedings{lee2022surprisingly,
  title={The Surprisingly Straightforward Scene Text Removal Method with Gated Attention and Region of Interest Generation: A Comprehensive Prominent Model Analysis},
  author={Lee, Hyeonsu and Choi, Chankyu},
  booktitle={European Conference on Computer Vision},
  pages={457--472},
  year={2022},
  organization={Springer}
}