# Model Card for ResNet-152 Text Detector
This model was trained to quickly classify whether an image contains legible text. It was trained as a binary classifier on the COCO-Text dataset together with some images from LLaVAR, for a total of ~140k images: 50% with legible text and 50% without.
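The card does not say how the 50/50 balance was achieved. One simple way to get such a split is to randomly downsample the larger class. The sketch below assumes that approach; `build_balanced_split` and the 1 = text / 0 = no text label convention are hypothetical and not the author's actual data pipeline.

```python
import random
from typing import List, Sequence, Tuple


def build_balanced_split(
    text_images: Sequence[str],
    no_text_images: Sequence[str],
    seed: int = 0,
) -> List[Tuple[str, int]]:
    """Return shuffled (image_id, label) pairs with equal counts per class.

    Hypothetical label convention: 1 = legible text, 0 = no legible text.
    Check model.config.id2label for the mapping used by the released model.
    """
    rng = random.Random(seed)
    n = min(len(text_images), len(no_text_images))
    # Randomly downsample each pool to n items so both classes are equal in size.
    positives = rng.sample(list(text_images), n)
    negatives = rng.sample(list(no_text_images), n)
    pairs = [(img, 1) for img in positives] + [(img, 0) for img in negatives]
    rng.shuffle(pairs)
    return pairs


split = build_balanced_split(["a.jpg", "b.jpg", "c.jpg"], ["d.jpg", "e.jpg"])
print(len(split))  # 4: two images per class
```

Downsampling discards some data from the larger pool; oversampling the smaller pool is an alternative when data is scarce.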

# Model Details
## How to Get Started with the Model
```python
from PIL import Image
import requests
import torch
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Load the fine-tuned binary (text / no text) classifier.
model = AutoModelForImageClassification.from_pretrained(
    "miguelcarv/resnet-152-text-detector",
)

# Reuse the ResNet-50 preprocessing config (rescaling and ImageNet
# normalization) but skip its resize step; the image is resized manually below.
processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50", do_resize=False)

url = "http://images.cocodataset.org/train2017/000000044520.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB").resize((300, 300))

inputs = processor(image, return_tensors="pt").pixel_values

# Inference only, so disable gradient tracking.
with torch.no_grad():
    outputs = model(inputs)

logits = outputs.logits
# Softmax over the two classes; model.config.id2label maps each index to its label.
probs = logits.softmax(dim=1)
print(probs)
# tensor([[0.0767, 0.9233]])
```
# Training Details
- Epochs: 3
- Resolution: 256x256
- Learning rate: 5e-5
- Optimizer: AdamW
- Batch size: 64
- Precision: FP32
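The hyperparameters above can be collected into one config object, which also makes it easy to work out how many optimizer updates the run involved. This is a minimal sketch: `TrainingConfig` and `total_optimizer_steps` are hypothetical names, and it assumes exactly 140,000 images (the card only says ~140k), one AdamW step per batch, and that each epoch's final partial batch is kept.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # Values copied from the "Training Details" list.
    epochs: int = 3
    resolution: int = 256        # square 256x256 inputs
    learning_rate: float = 5e-5  # used with AdamW
    batch_size: int = 64
    precision: str = "fp32"      # full precision, no mixed precision


def total_optimizer_steps(num_images: int, cfg: TrainingConfig) -> int:
    """Optimizer updates over the whole run, assuming one step per batch
    and that the last partial batch of each epoch is not dropped."""
    steps_per_epoch = math.ceil(num_images / cfg.batch_size)
    return steps_per_epoch * cfg.epochs


cfg = TrainingConfig()
# 140_000 / 64 = 2187.5 -> 2188 steps per epoch, x 3 epochs
print(total_optimizer_steps(140_000, cfg))  # 6564
```

If the final partial batch were dropped instead, each epoch would have 2187 steps (6561 in total).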