Spaces: breezedeus/Pix2Text-Demo


breezedeus committed on Feb 26

Commit d917a85 • 1 Parent(s): 5bedb5a

p2t v1.0

Files changed (11)

README.md +8 -9
app.py +168 -125
docs/examples/formula1.png +0 -0
docs/examples/formula2.jpg +0 -0
docs/examples/hw-formula.png +0 -0
docs/examples/mixed-ch_sim.jpg +0 -0
docs/examples/mixed-ch_tra.jpg +0 -0
docs/examples/mixed-en.jpg +0 -0
docs/examples/mixed-vietnamese.jpg +0 -0
docs/examples/pure-text.jpg +0 -0
requirements.txt +3 -1

README.md CHANGED

@@ -1,10 +1,10 @@
 ---
 title: Pix2Text
-emoji: 🅿❷🆃
 colorFrom: red
 colorTo: blue
 sdk: gradio
-sdk_version: 4.16.0
 app_file: app.py
 pinned: false
 license: mit
@@ -12,14 +12,13 @@ license: mit
 # Pix2Text (P2T)
-[**CnOCR**](https://github.com/breezedeus/cnocr)  is an **Optical Character Recognition (OCR)** toolkit for **Python 3**. It supports recognition of common characters in **English and numbers**, **Simplified Chinese**, **Traditional Chinese** (some models), and **vertical text** recognition. It comes with [**20+ well-trained models**](https://cnocr.readthedocs.io/zh/latest/models/) for different application scenarios and can be used directly after installation. Also, CnOCR provides simple training [commands](https://cnocr.readthedocs.io/zh/latest/train/) for users to train their own models. Welcome to join the WeChat contact group.
-<div align="center">
-  <img src="https://huggingface.co/datasets/breezedeus/cnocr-wx-qr-code/resolve/main/wx-qr-code.JPG" alt="WeChat Group" width="300px"/>
-</div>
-The author also maintains **Planet of Knowledge** [**CnOCR/CnSTD Private Group**](https://t.zsxq.com/FEYZRJQ), welcome to join. The **Planet of Knowledge Private Group** will release some CnOCR/CnSTD related private materials one after another, including [**more detailed training tutorials**](https://articles.zsxq.com/id_u6b4u0wrf46e.html), **non-public models**, answers to problems encountered during usage, etc. This group also releases the latest research materials related to OCR/STD. In addition, **the author in the private group provides free training services for unique data twice a month**.
-## Documentation
-See [CnOCR online documentation](https://cnocr.readthedocs.io/) , in Chinese.

 ---
 title: Pix2Text
+emoji: ♾️
 colorFrom: red
 colorTo: blue
 sdk: gradio
+sdk_version: 4.19.2
 app_file: app.py
 pinned: false
 license: mit
 # Pix2Text (P2T)
+**[Pix2Text (P2T)](https://github.com/breezedeus/pix2text)** aims to be a **free and open-source Python** alternative to **[Mathpix](https://mathpix.com/)**. It can already complete the core functionalities of **Mathpix**. Starting from **V0.2**, **Pix2Text (P2T)** supports recognizing **mixed images containing both text and formulas**, with output similar to **Mathpix**. The core principles of P2T are shown below (text recognition supports both **Chinese** and **English**):
+<div align="center"> <img src="https://www.notion.so/image/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F8afb65f8-fd1d-48b9-978a-688554cc759a%2FUntitled.jpeg?table=block&id=39580ae6-09e5-4631-a611-e80e720f3877" alt="Pix2Text workflow" width="600px"/> </div>
+**P2T** utilizes the open-source tool **[CnSTD](https://github.com/breezedeus/cnstd)** to detect the locations of **mathematical formulas** in images. These detected areas are then processed by **P2T**'s own **formula recognition engine (LatexOCR)** to recognize the LaTeX representation of each mathematical formula. The remaining parts of the image are processed by a **text recognition engine ([CnOCR](https://github.com/breezedeus/cnocr) or [EasyOCR](https://github.com/JaidedAI/EasyOCR))** for text detection and recognition. Finally, **P2T** merges all recognition results to obtain the final image recognition outcome. Thanks to these great open-source projects!
+For beginners who are not familiar with Python, we also provide the **free-to-use** [P2T Online Service](https://p2t.breezedeus.com/). Just upload your image and it will output the P2T parsing results. **The online service uses the latest models and works better than the open-source ones.**
+The author also maintains **Planet of Knowledge** [**P2T/CnOCR/CnSTD Private Group**](https://t.zsxq.com/FEYZRJQ), welcome to join. The **Planet of Knowledge Private Group** will release some P2T/CnOCR/CnSTD related private materials one after another, including **non-public models**, **discount for paid models**, answers to problems encountered during usage, etc. This group also releases the latest research materials related to VIE/OCR/STD.
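
The pipeline described in the new README (detect formula regions, recognize each formula as LaTeX, run text recognition on the rest, then merge) ends with a merge step that the README does not spell out. The following is only a toy sketch of what such a merge could look like, not P2T's actual logic: the `Region` type, the `merge_regions` function, and the `line_tol` parameter are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    """Hypothetical recognized region: its kind, top-left position, and content."""
    kind: str      # "formula" or "text"
    x: float
    y: float
    content: str


def merge_regions(regions: List[Region], line_tol: float = 10.0) -> str:
    """Group regions into lines by vertical position, then join left to right.

    Formulas are wrapped in $...$ so they stay distinguishable from plain text.
    This is an illustrative assumption, not the library's merge algorithm.
    """
    lines: List[List[Region]] = []
    for region in sorted(regions, key=lambda r: (r.y, r.x)):
        # A region joins the current line if it is close to the line's first region.
        if lines and abs(lines[-1][0].y - region.y) <= line_tol:
            lines[-1].append(region)
        else:
            lines.append([region])

    merged = []
    for line in lines:
        parts = [
            f"${r.content}$" if r.kind == "formula" else r.content
            for r in sorted(line, key=lambda r: r.x)
        ]
        merged.append(" ".join(parts))
    return "\n".join(merged)


sample = [
    Region("text", 0, 0, "Let"),
    Region("formula", 40, 2, "x^2"),
    Region("text", 0, 30, "be given."),
]
merged_text = merge_regions(sample)
```

In the demo itself this role is played by `merge_line_texts(outs, auto_line_break=True)` from `pix2text.utils`, as the `app.py` diff below shows.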

app.py CHANGED Viewed

@@ -1,34 +1,23 @@
 # coding: utf-8
-# Copyright (C) 2023, [Breezedeus](https://github.com/breezedeus).
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-# Ref: https://huggingface.co/spaces/hysts/Manga-OCR/blob/main/app.py
 import os
 import json
 import functools
 import random
 import string
 import time
 import yaml
 import gradio as gr
 import numpy as np
 # from cnstd.utils import pil_to_numpy, imsave
@@ -38,10 +27,47 @@ from pix2text.utils import set_logger, merge_line_texts
 logger = set_logger()
 LANGUAGES = yaml.safe_load(open('languages.yaml', 'r', encoding='utf-8'))['languages']
-def get_p2t_model(lan_list: list):
-    p2t = Pix2Text(languages=lan_list)
     return p2t
@@ -50,27 +76,36 @@ def latex_render(latex_str):
     # return latex_str
-def recognize(lang_list, rec_type, resized_shape, image_file):
     lang_list = [LANGUAGES[l] for l in lang_list]
-    p2t = get_p2t_model(lang_list)
-    if rec_type == 'Formula & Text':
         suffix = list(string.ascii_letters)
         random.shuffle(suffix)
         suffix = ''.join(suffix[:6])
         out_det_fp = f'out-det-{time.time()}-{suffix}.jpg'
-        outs = p2t(
-            image_file, resized_shape=resized_shape, save_analysis_res=out_det_fp
         )
         # To get just the text contents, use:
         only_text = merge_line_texts(outs, auto_line_break=True)
         # return only_text, latex_render(only_text)
-        return only_text, out_det_fp
-    elif rec_type == 'Only Formula':
         only_text = p2t.recognize_formula(image_file)
         return latex_render(only_text), None
-    elif rec_type == 'Only Text':
         only_text = p2t.recognize_text(image_file)
         return only_text, None
@@ -80,77 +115,71 @@ def main():
     langs.sort(key=lambda x: x.lower())
     title = 'Demo'
-    # example_func = functools.partial(
-    #     recognize,
-    #     new_size=768,
-    #     box_score_thresh=0.3,
-    #     min_box_size=10,
-    # )
-    # examples = [
-    #     [
-    #         'ch_PP-OCRv3_det::onnx',
-    #         True,
-    #         'number-densenet_lite_136-fc',
-    #         False,
-    #         'docs/examples/card1-s.jpg',
-    #     ],
-    #     [
-    #         'ch_PP-OCRv3_det::onnx',
-    #         True,
-    #         'number-densenet_lite_136-fc',
-    #         False,
-    #         'docs/examples/card2-s.jpg',
-    #     ],
-    #     [
-    #         'ch_PP-OCRv3_det::onnx',
-    #         True,
-    #         'number-densenet_lite_136-fc',
-    #         False,
-    #         'docs/examples/cy1-s.jpg',
-    #     ],
-    #     [
-    #         'ch_PP-OCRv3_det::onnx',
-    #         False,
-    #         'densenet_lite_136-gru',
-    #         False,
-    #         'docs/examples/huochepiao.jpeg',
-    #     ],
-    #     [
-    #         'ch_PP-OCRv3_det::onnx',
-    #         False,
-    #         'densenet_lite_136-gru',
-    #         False,
-    #         'docs/examples/1_res.jpg',
-    #     ],
-    #     [
-    #         'db_shufflenet_v2::pytorch',
-    #         False,
-    #         'en_number_mobile_v2.0',
-    #         False,
-    #         'docs/examples/en_book1.jpeg',
-    #     ],
-    #     [
-    #         'db_shufflenet_v2::pytorch',
-    #         False,
-    #         'densenet_lite_136-gru',
-    #         True,
-    #         'docs/examples/beauty0.jpg',
-    #     ],
-    # ]
     table_desc = """
 <div align="center">
-<img src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F9341931a-53f0-48e1-b026-0f1ad17b457c%2Fc41e0b1d-4869-4e39-93db-631569e6a38d%2FUntitled.png?table=block&id=3d0819ca-2e1a-46a7-b6f3-b4cf89cd045c" width="120px"/>
-[![Visitors](https://api.visitorbadge.io/api/visitors?path=https%3A%2F%2Fhuggingface.co%2Fspaces%2Fbreezedeus%2FCnOCR-Demo&labelColor=%23697689&countColor=%23f5c791&style=flat&labelStyle=upper)](https://visitorbadge.io/status?path=https%3A%2F%2Fhuggingface.co%2Fspaces%2Fbreezedeus%2FCnOCR-Demo)
 [![Discord](https://img.shields.io/discord/1200765964434821260?logo=discord&label=Discord)](https://discord.gg/H9FmDSMA)
 |                                 |                                         |
 | ------------------------------- | --------------------------------------- |
-| 🏄 **Free Web Service**             | [p2t.breezedeus.com](https://p2t.breezedeus.com) |
 | 📀 **Code**              | [Github](https://github.com/breezedeus/pix2text) |
-| 💬 **Discord**              | [P2T @ Discord](https://discord.gg/H9FmDSMA) |
 | 👨🏻‍💻 **Author**            | [Breezedeus](https://www.breezedeus.com) |
 If useful, please help to **star 🌟 [Pix2Text](https://github.com/breezedeus/pix2text)** 🙏
@@ -169,31 +198,38 @@ If useful, please help to **star 🌟 [Pix2Text](https://github.com/breezedeus/p
                     choices=langs,
                     value=['English', 'Chinese Simplified'],
                     multiselect=True,
-                    info='Which languages to be recognized as Texts.',
                 )
-                rec_type = gr.Radio(
-                    choices=['Formula & Text', 'Only Formula', 'Only Text'],
-                    label='Image Type',
-                    value='Formula & Text',
-                    info='Which type of image to be recognized.',
                 )
-                resized_shape = gr.Slider(
-                    label='resized_shape',
-                    minimum=512,
-                    maximum=2048,
-                    value=608,
-                    step=32,
                 )
-                # with gr.Accordion('Choose Text Languages', open=False):
-                #     lang_list = gr.Checkboxgroup(
-                #         label='Text Languages',
-                #         choices=langs,
-                #         value=['English', 'Chinese Simplified'],
-                #     )
             with gr.Column(scale=6, variant='compact'):
                 gr.Markdown('### Upload Image to be Recognized')
-                image_file = gr.Image(label='Image', type="pil", image_mode='RGB', show_label=False)
                 sub_btn = gr.Button("Submit", variant="primary")
             with gr.Column(scale=2, variant='compact'):
@@ -205,9 +241,11 @@ If useful, please help to **star 🌟 [Pix2Text](https://github.com/breezedeus/p
                     label='Detection Result', scale=1, show_label=False
                 )
             with gr.Column(scale=1, variant='compact'):
-                gr.Markdown('**Recognition Result**')
                 rec_result = gr.Textbox(
-                    label=f'Recognition Result',
                     lines=5,
                     value='',
                     scale=1,
@@ -218,24 +256,29 @@ If useful, please help to **star 🌟 [Pix2Text](https://github.com/breezedeus/p
             # rec_result.change(latex_render, rec_result, render_result)
         sub_btn.click(
             recognize,
-            inputs=[lang_list, rec_type, resized_shape, image_file,],
             outputs=[rec_result, det_result],
         )
-        # gr.Examples(
-        #     label='示例',
-        #     examples=examples,
-        #     inputs=[
-        #         det_model_name,
-        #         is_single_line,
-        #         rec_model_name,
-        #         use_angle_clf,
-        #         image_file,
-        #     ],
-        #     outputs=[out_image, naive_warn, out_texts],
-        #     fn=example_func,
-        #     cache_examples=os.getenv('CACHE_EXAMPLES') == '1',
-        # )
     demo.queue(max_size=10)
     demo.launch()

 # coding: utf-8
+# [Pix2Text](https://github.com/breezedeus/pix2text): an Open-Source Alternative to Mathpix.
+# Copyright (C) 2022-2024, [Breezedeus](https://www.breezedeus.com).
 import os
 import json
 import functools
 import random
+import shutil
 import string
+import tempfile
 import time
+import zipfile
+from pathlib import Path
 import yaml
 import gradio as gr
 import numpy as np
+from huggingface_hub import hf_hub_download
 # from cnstd.utils import pil_to_numpy, imsave
 logger = set_logger()
 LANGUAGES = yaml.safe_load(open('languages.yaml', 'r', encoding='utf-8'))['languages']
+OUTPUT_RESULT_DIR = Path('./output-results')
+OUTPUT_RESULT_DIR.mkdir(exist_ok=True)
+def prepare_mfd_model():
+    target_fp = './yolov7-model/mfd-yolov7-epoch224-20230613.pt'
+    if os.path.exists(target_fp):
+        return target_fp
+    HF_TOKEN = os.environ.get('HF_TOKEN')
+    local_path = hf_hub_download(
+        repo_id='breezedeus/paid-models',
+        subfolder='cnstd/1.2',
+        filename='yolov7-model-20230613.zip',
+        repo_type="model",
+        cache_dir='./',
+        token=HF_TOKEN,
+    )
+    with zipfile.ZipFile(local_path) as zf:
+        zf.extractall('./')
+    return target_fp
+def get_p2t_model(lan_list: list, mfd_model_name: str, mfr_model_name: str):
+    analyzer_config = {}
+    if 'yolov7_tiny' not in mfd_model_name:
+        mfd_fp = prepare_mfd_model()
+        analyzer_config = dict(  # initialization parameters for LayoutAnalyzer
+            model_name='mfd',
+            model_type='yolov7',  # use the YoloV7 model rather than the YoloV7_Tiny model
+            model_fp=mfd_fp,  # note: change this to the path where your model file is stored
+        )
+    formula_config = {}
+    if 'mfr-pro' in mfr_model_name:
+        formula_config = dict(  # initialization parameters for the formula recognition model
+            model_name='mfr-pro', model_backend='onnx',
+        )
+    p2t = Pix2Text(
+        languages=lan_list,
+        analyzer_config=analyzer_config,
+        formula_config=formula_config,
+    )
     return p2t
     # return latex_str
+def recognize(
+    lang_list, mfd_model_name, mfr_model_name, rec_type, resized_shape, image_file
+):
     lang_list = [LANGUAGES[l] for l in lang_list]
+    p2t = get_p2t_model(lang_list, mfd_model_name, mfr_model_name)
+    if rec_type == 'mixed':
         suffix = list(string.ascii_letters)
         random.shuffle(suffix)
         suffix = ''.join(suffix[:6])
         out_det_fp = f'out-det-{time.time()}-{suffix}.jpg'
+        # If OUTPUT_RESULT_DIR holds more than 1000 files, delete the 1000 oldest ones by time
+        if len(os.listdir(OUTPUT_RESULT_DIR)) > 1000:
+            for fp in sorted(os.listdir(OUTPUT_RESULT_DIR))[:1000]:
+                os.remove(OUTPUT_RESULT_DIR / fp)
+        outs = p2t.recognize(
+            image_file,
+            resized_shape=resized_shape,
+            save_analysis_res=OUTPUT_RESULT_DIR / out_det_fp,
         )
         # To get just the text contents, use:
         only_text = merge_line_texts(outs, auto_line_break=True)
         # return only_text, latex_render(only_text)
+        return only_text, str(OUTPUT_RESULT_DIR / out_det_fp)
+    elif rec_type == 'formula':
         only_text = p2t.recognize_formula(image_file)
         return latex_render(only_text), None
+    elif rec_type == 'text':
         only_text = p2t.recognize_text(image_file)
         return only_text, None
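
Review note on the cleanup in `recognize`: the new code picks the files to delete with `sorted(os.listdir(...))`, which orders by file name. Because the names start with `out-det-{time.time()}`, that order follows creation time only as long as the timestamp strings compare consistently. A minimal sketch of the same idea keyed on modification time instead, assuming that is the intended meaning of "oldest"; `prune_oldest` and its parameters are hypothetical names, not part of the commit:

```python
import os
import tempfile
from pathlib import Path


def prune_oldest(directory: Path, max_files: int = 1000, n_delete: int = 1000) -> int:
    """Delete the n_delete oldest files (by mtime) once the directory exceeds max_files.

    Returns the number of files removed.
    """
    files = [p for p in directory.iterdir() if p.is_file()]
    if len(files) <= max_files:
        return 0
    files.sort(key=lambda p: p.stat().st_mtime)  # oldest first
    removed = 0
    for path in files[:n_delete]:
        path.unlink()
        removed += 1
    return removed


if __name__ == "__main__":
    # Small demonstration with 5 files and a limit of 3.
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(5):
            fp = Path(tmp) / f"out-det-{i}.jpg"
            fp.write_text("x")
            os.utime(fp, (1000 + i, 1000 + i))  # force distinct, increasing mtimes
        print(prune_oldest(Path(tmp), max_files=3, n_delete=2))  # 2
```

Sorting on `st_mtime` costs one `stat` call per file, which should be negligible at this directory size.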
     langs.sort(key=lambda x: x.lower())
     title = 'Demo'
+    example_func = functools.partial(
+        recognize,
+        mfd_model_name='yolov7 (paid)',
+        mfr_model_name='mfr-pro',
+        rec_type='mixed',
+        resized_shape=768,
+    )
+    examples = [
+        [
+            ['English'],
+            'mixed',
+            'docs/examples/mixed-en.jpg',
+        ],
+        [
+            ['English', 'Chinese Simplified'],
+            'mixed',
+            'docs/examples/mixed-ch_sim.jpg',
+        ],
+        [
+            ['English', 'Chinese Traditional'],
+            'mixed',
+            'docs/examples/mixed-ch_tra.jpg',
+        ],
+        [
+            ['English', 'Vietnamese'],
+            'mixed',
+            'docs/examples/mixed-vietnamese.jpg',
+        ],
+        [
+            ['English'],
+            'formula',
+            'docs/examples/formula1.png'
+        ],
+        [
+            ['English'],
+            'formula',
+            'docs/examples/formula2.jpg'
+        ],
+        [
+            ['English'],
+            'formula',
+            'docs/examples/hw-formula.png'
+        ],
+        [
+            ['English', 'Chinese Simplified'],
+            'text',
+            'docs/examples/pure-text.jpg',
+        ],
+    ]
     table_desc = """
 <div align="center">
+<img src="https://www.notion.so/image/https%3A%2F%2Fprod-files-secure.s3.us-west-2.amazonaws.com%2F9341931a-53f0-48e1-b026-0f1ad17b457c%2Fd0e55da8-36a5-482c-bea6-c389e2fcacea%2FUntitled.png?table=block&id=caebb37a-e23f-49ab-9687-2cba3801992e" width="120px"/>
+[![Visitors](https://api.visitorbadge.io/api/visitors?path=https%3A%2F%2Fhuggingface.co%2Fspaces%2Fbreezedeus%2Fpix2text-demo&labelColor=%23697689&countColor=%23f5c791&style=flat&labelStyle=upper)](https://visitorbadge.io/status?path=https%3A%2F%2Fhuggingface.co%2Fspaces%2Fbreezedeus%2FCnOCR-Demo)
 [![Discord](https://img.shields.io/discord/1200765964434821260?logo=discord&label=Discord)](https://discord.gg/H9FmDSMA)
 |                                 |                                         |
 | ------------------------------- | --------------------------------------- |
+| 🏄 **Online Service**             | [p2t.breezedeus.com](https://p2t.breezedeus.com) |
+| 💬 **Discord**              | [Pix2Text @ Discord](https://discord.gg/tGuFEybd) |
 | 📀 **Code**              | [Github](https://github.com/breezedeus/pix2text) |
+| 🤗 **MFR Model**              | [breezedeus/pix2text-mfr](https://huggingface.co/breezedeus/pix2text-mfr) |
+| 📄 **More Infos**              | [breezedeus.com/pix2text](https://www.breezedeus.com/pix2text) |
 | 👨🏻‍💻 **Author**            | [Breezedeus](https://www.breezedeus.com) |
 If useful, please help to **star 🌟 [Pix2Text](https://github.com/breezedeus/pix2text)** 🙏
                     choices=langs,
                     value=['English', 'Chinese Simplified'],
                     multiselect=True,
+                    # info='Which languages to be recognized as Texts.',
                 )
+                mfd_model_name = gr.Dropdown(
+                    label='MFD Models',
+                    choices=['yolov7_tiny (free)', 'yolov7 (paid)'],
+                    value='yolov7 (paid)',
                 )
+                mfr_model_name = gr.Dropdown(
+                    label='MFR Models',
+                    choices=['mfr (free)', 'mfr-pro (paid)'],
+                    value='mfr-pro (paid)',
                 )
+                rec_type = gr.Dropdown(
+                    label='Image Type',
+                    choices=['mixed', 'formula', 'text'],
+                    value='mixed',
+                    # info='Which type of image to be recognized.',
+                )
+                with gr.Accordion('More Options', open=False):
+                    resized_shape = gr.Slider(
+                        label='resized_shape',
+                        minimum=512,
+                        maximum=2048,
+                        value=768,
+                        step=32,
+                    )
             with gr.Column(scale=6, variant='compact'):
                 gr.Markdown('### Upload Image to be Recognized')
+                image_file = gr.Image(
+                    label='Image', type="pil", image_mode='RGB', show_label=False
+                )
                 sub_btn = gr.Button("Submit", variant="primary")
             with gr.Column(scale=2, variant='compact'):
                     label='Detection Result', scale=1, show_label=False
                 )
             with gr.Column(scale=1, variant='compact'):
+                gr.Markdown(
+                    '**Recognition Results (Paste them into the [P2T Online Service](https://p2t.breezedeus.com) to view rendered outcomes)**'
+                )
                 rec_result = gr.Textbox(
+                    label=f'Recognition Result ',
                     lines=5,
                     value='',
                     scale=1,
             # rec_result.change(latex_render, rec_result, render_result)
         sub_btn.click(
             recognize,
+            inputs=[
+                lang_list,
+                mfd_model_name,
+                mfr_model_name,
+                rec_type,
+                resized_shape,
+                image_file,
+            ],
             outputs=[rec_result, det_result],
         )
+        gr.Examples(
+            label='Examples',
+            examples=examples,
+            inputs=[
+                lang_list,
+                rec_type,
+                image_file,
+            ],
+            outputs=[rec_result, det_result],
+            fn=example_func,
+            cache_examples=os.getenv('CACHE_EXAMPLES') == '1',
+        )
     demo.queue(max_size=10)
     demo.launch()
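
Review note on `example_func`: it presets `mfd_model_name`, `mfr_model_name`, `rec_type`, and `resized_shape` as keywords, while `gr.Examples` lists `[lang_list, rec_type, image_file]` as inputs. Assuming Gradio passes example values positionally in the order of `inputs`, the second value would bind to `mfd_model_name` and collide with the preset keyword. This would only surface when the function is actually invoked for an example, for instance with `CACHE_EXAMPLES=1`. The sketch below reproduces the collision with a stand-in `recognize` that has the same parameter order, and shows one possible keyword-based wrapper; it is an illustration, not the author's fix.

```python
import functools


def recognize(lang_list, mfd_model_name, mfr_model_name, rec_type, resized_shape, image_file):
    """Stand-in with the same parameter order as the demo's recognize()."""
    return (lang_list, mfd_model_name, mfr_model_name, rec_type, resized_shape, image_file)


# Same construction as in the diff: four parameters preset by keyword.
preset = functools.partial(
    recognize,
    mfd_model_name='yolov7 (paid)',
    mfr_model_name='mfr-pro',
    rec_type='mixed',
    resized_shape=768,
)


def positional_call_fails() -> bool:
    """Call the partial the way three positional example inputs would."""
    try:
        preset(['English'], 'formula', 'docs/examples/formula1.png')
    except TypeError:
        # The second positional value lands on mfd_model_name, which is already preset.
        return True
    return False


def example_func(lang_list, rec_type, image_file):
    """Possible wrapper: accept exactly the example inputs and forward by keyword."""
    return recognize(
        lang_list,
        mfd_model_name='yolov7 (paid)',
        mfr_model_name='mfr-pro',
        rec_type=rec_type,
        resized_shape=768,
        image_file=image_file,
    )
```

Forwarding by keyword also lets each example row keep its own `rec_type` instead of the preset `'mixed'`.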