yifeihu/TFT-ID-1.0

@@ -1,6 +1,6 @@
 ---
 license: mit
-license_link: https://huggingface.co/microsoft/Florence-2-base-ft/resolve/main/LICENSE
+license_link: https://huggingface.co/microsoft/Florence-2-large/resolve/main/LICENSE
 pipeline_tag: image-text-to-text
 tags:
 - vision
@@ -14,14 +14,14 @@ tags:
 TFT-ID (Table/Figure/Text IDentifier) is a family of object detection models created by [Yifei Hu](https://x.com/hu_yifei) and finetuned to extract tables, figures, and text sections from academic papers.
-TFT-ID is finetuned from [microsoft/Florence-2](https://huggingface.co/microsoft/Florence-2-large-ft) checkpoints.
+![image/png](https://huggingface.co/yifeihu/TF-ID-base/resolve/main/TFT-ID.png)
+TFT-ID is finetuned from [microsoft/Florence-2](https://huggingface.co/microsoft/Florence-2-large) checkpoints.
 - The models were finetuned on papers from Hugging Face Daily Papers. All bounding boxes were manually annotated and checked by humans.
 - TFT-ID models take an image of a single paper page as input and return bounding boxes for all tables, figures, and text sections on that page.
 - The text sections contain clean text content, well suited to downstream OCR workflows. However, TFT-ID is not an OCR model.
-![image/png](https://huggingface.co/yifeihu/TF-ID-base/resolve/main/td-id-caption.png)
 Object Detection results format:
 {'\<OD>': {'bboxes': [[x1, y1, x2, y2], ...],
 'labels': ['label1', 'label2', ...]} }
@@ -36,10 +36,17 @@ We tested the models on paper pages outside the training dataset. The papers are
 Correct output: the model draws correct bounding boxes for every table/figure/text section on the page without missing any content.
+Task 1: Table, Figure, and Text Section Identification
 | Model                                                         | Total Images | Correct Output | Success Rate |
 |---------------------------------------------------------------|--------------|----------------|--------------|
 | TFT-ID-1.0[[HF]](https://huggingface.co/yifeihu/TFT-ID-1.0)   | 373          | 361            | 96.78%       |
+Task 2: Table and Figure Identification
+| Model                                                         | Total Images | Correct Output | Success Rate |
+|---------------------------------------------------------------|--------------|----------------|--------------|
+| **TFT-ID-1.0**[[HF]](https://huggingface.co/yifeihu/TFT-ID-1.0)   | 258          | 255            | **98.84%**       |
+| TF-ID-large[[HF]](https://huggingface.co/yifeihu/TF-ID-large) | 258          | 253            | 98.06%       |
 Depending on the use case, some "incorrect" outputs can still be perfectly usable. For example, the model may draw two bounding boxes for a single figure that has two child components.
 ## How to Get Started with the Model
@@ -51,8 +58,8 @@ import requests
 from PIL import Image
 from transformers import AutoProcessor, AutoModelForCausalLM
-model = AutoModelForCausalLM.from_pretrained("yifeihu/TF-ID-base", trust_remote_code=True)
-processor = AutoProcessor.from_pretrained("yifeihu/TF-ID-base", trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained("yifeihu/TFT-ID-1.0", trust_remote_code=True)
+processor = AutoProcessor.from_pretrained("yifeihu/TFT-ID-1.0", trust_remote_code=True)
 prompt = "<OD>"
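
The card documents the `<OD>` result as a dict with parallel `bboxes` and `labels` lists. The diff does not show how to consume that structure. Below is a minimal sketch. It assumes a result dict of exactly that shape is already in hand; producing one requires running the model, which is not shown here.

- `group_detections` collects the boxes for each label.
- `reading_order` sorts detections top to bottom, so text sections could be handed to an OCR step in page order.

Both function names, the sample coordinates, and the label strings are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One box in the documented [x1, y1, x2, y2] order.
BBox = List[float]


def group_detections(result: dict, task: str = "<OD>") -> Dict[str, List[BBox]]:
    """Group the boxes of an <OD> result by label.

    Assumes the documented shape:
    {'<OD>': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': ['label1', ...]}}
    """
    detections = result[task]
    bboxes, labels = detections["bboxes"], detections["labels"]
    if len(bboxes) != len(labels):
        raise ValueError("'bboxes' and 'labels' must have the same length")

    grouped: Dict[str, List[BBox]] = defaultdict(list)
    for label, box in zip(labels, bboxes):
        grouped[label].append(box)
    return dict(grouped)


def reading_order(result: dict, task: str = "<OD>") -> List[Tuple[str, BBox]]:
    """Return (label, box) pairs sorted top-to-bottom, then left-to-right.

    Naive heuristic: it sorts by (y1, x1), so multi-column pages are not
    ordered correctly.
    """
    detections = result[task]
    pairs = list(zip(detections["labels"], detections["bboxes"]))
    return sorted(pairs, key=lambda pair: (pair[1][1], pair[1][0]))


# Hypothetical input shaped like the documented output. These are NOT real
# model predictions, and the label strings are assumed for illustration.
sample = {
    "<OD>": {
        "bboxes": [
            [60.0, 720.0, 550.0, 900.0],
            [50.0, 80.0, 560.0, 300.0],
            [50.0, 320.0, 560.0, 700.0],
        ],
        "labels": ["table", "figure", "text"],
    }
}

if __name__ == "__main__":
    for label, boxes in group_detections(sample).items():
        print(label, boxes)
    for label, box in reading_order(sample):
        print(label, box)
```

Sorting by `(y1, x1)` is the simplest possible ordering. Two-column papers would need a column-detection step first, which this sketch deliberately leaves out.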

TFT-ID.png ADDED
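
Each success rate in the README's two evaluation tables is computed the same way. It is the number of pages with correct output divided by the total number of test pages, expressed as a percentage with two decimals. The short check below recomputes the reported figures from the counts in those tables. The helper name `success_rate` is illustrative only.

```python
def success_rate(correct: int, total: int) -> float:
    """Share of pages with correct output, as a percentage rounded to 2 decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return round(100 * correct / total, 2)


# (correct, total) counts taken from the evaluation tables in the model card.
reported = {
    "Task 1 / TFT-ID-1.0": (361, 373),
    "Task 2 / TFT-ID-1.0": (255, 258),
    "Task 2 / TF-ID-large": (253, 258),
}

if __name__ == "__main__":
    for name, (correct, total) in reported.items():
        print(f"{name}: {success_rate(correct, total):.2f}%")
```

Both Task 2 models were scored on the same 258 pages. The gap between 98.84% and 98.06% therefore corresponds to two pages (255 vs. 253 correct).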