<h1>General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
</h1>
[GitHub](https://github.com/Ucas-HaoranWei/GOT-OCR2.0/tree/main)
## Usage
Run inference with Hugging Face Transformers on NVIDIA GPUs. The requirements below were tested on Python 3.10:
```
torch==2.0.1
torchvision==0.15.2
transformers==4.37.2
megfile==3.1.2
```
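Since the pins above are exact, it can help to confirm that the active environment matches them before loading the model. The following is a minimal sketch using only the standard library; the `check_pins` helper and the `PINNED` table are illustrative names, not part of GOT-OCR2.0, and the comparison assumes that a local build suffix such as `+cu118` can be ignored.

```python
from importlib import metadata

# Pins copied from the requirements list above.
PINNED = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "transformers": "4.37.2",
    "megfile": "3.1.2",
}


def _base_version(version):
    # Drop a local build suffix such as "+cu118" before comparing (assumption).
    return version.split("+", 1)[0]


def check_pins(pinned=PINNED, lookup=metadata.version):
    """Return {package: (wanted, found)} for each package that differs from its pin.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        if found is None or _base_version(found) != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in check_pins().items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

An empty result from `check_pins()` means every pinned package is installed at the tested version.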
```python
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('ucaslcl/GOT-OCR2_0', trust_remote_code=True)
model = AutoModel.from_pretrained('ucaslcl/GOT-OCR2_0', trust_remote_code=True, low_cpu_mem_usage=True, device_map='cuda', use_safetensors=True, pad_token_id=tokenizer.eos_token_id)
model = model.eval().cuda()
# Path to your test image
image_file = 'xxx.jpg'
# Plain-text OCR
model.chat(tokenizer, image_file, ocr_type='ocr')
# Formatted-text OCR
model.chat(tokenizer, image_file, ocr_type='format')
# Fine-grained OCR (the empty ocr_box / ocr_color strings are placeholders)
model.chat(tokenizer, image_file, ocr_type='ocr', ocr_box='')
model.chat(tokenizer, image_file, ocr_type='format', ocr_box='')
model.chat(tokenizer, image_file, ocr_type='ocr', ocr_color='')
model.chat(tokenizer, image_file, ocr_type='format', ocr_color='')
# Multi-crop OCR
res = model.chat_crop(tokenizer, image_file=image_file)
print(res)
# Render the formatted OCR results to an HTML file
model.chat(tokenizer, image_file, ocr_type='format', ocr_box='', ocr_color='', render=True, save_render_file='./demo.html')
```
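To process several images in one pass, the same `model.chat` call can be wrapped in a loop. The sketch below is only an illustration: `ocr_folder` and `IMAGE_SUFFIXES` are hypothetical names, and it assumes that `model.chat` returns the recognized text for one image path, as the usage above suggests.

```python
from pathlib import Path

# File extensions treated as images (an assumption; adjust as needed).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def ocr_folder(model, tokenizer, folder, ocr_type="ocr"):
    """Call model.chat on every image in `folder` and return {filename: result}."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in IMAGE_SUFFIXES:
            # Same call as in the usage example, applied to one file at a time.
            results[path.name] = model.chat(tokenizer, str(path), ocr_type=ocr_type)
    return results
```

With the model and tokenizer loaded as above, `ocr_folder(model, tokenizer, './images')` would return one entry per image file, where `./images` is a placeholder directory.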