miguelcarv committed on
Commit
19a7cd4
1 Parent(s): d043739

Create README.md

  1. README.md +37 -0
README.md ADDED
@@ -0,0 +1,37 @@
+ # Model Card for ResNet-152 Text Detector
+ This model was trained to quickly classify whether an image contains legible text. It was trained as a binary classifier on the COCO-Text dataset together with some images from LLaVAR, for a total of ~70k images: 50% contain text and 50% contain no legible text.
+
+ # Model Details
+ ## How to Get Started with the Model
+ ```python
+ from PIL import Image
+ import requests
+ import torch
+ from transformers import AutoImageProcessor, AutoModelForImageClassification
+
+ model = AutoModelForImageClassification.from_pretrained(
+     "miguelcarv/resnet-152-text-detector",
+ )
+
+ # Resizing is disabled in the processor; the image is resized by hand below
+ processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50", do_resize=False)
+
+ url = "http://images.cocodataset.org/train2017/000000044520.jpg"
+ # Convert to RGB and resize to the 256x256 training resolution
+ image = Image.open(requests.get(url, stream=True).raw).convert('RGB').resize((256,256))
+
+ inputs = processor(image, return_tensors="pt").pixel_values
+
+ # Inference only, so gradient tracking is not needed
+ with torch.no_grad():
+     outputs = model(inputs)
+
+ logits_per_image = outputs.logits
+ # Softmax over the two classes gives per-class probabilities
+ probs = logits_per_image.softmax(dim=1)
+ print(probs)
+ # tensor([[0.0767, 0.9233]])
+ ```
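The snippet above prints raw class probabilities, and the card does not say which index stands for "contains text". As a minimal sketch, the helper below picks the most probable class and looks up its name. `top_label` and the `ID2LABEL` placeholder names are hypothetical; in practice the mapping would presumably be read from `model.config.id2label`.

```python
# Hypothetical placeholder mapping; the real class names are an unknown here
# and should be taken from model.config.id2label on the loaded model.
ID2LABEL = {0: "LABEL_0", 1: "LABEL_1"}


def top_label(probs, id2label):
    """Return (label, probability) for the highest-probability class."""
    best_index = max(range(len(probs)), key=lambda i: probs[i])
    return id2label[best_index], probs[best_index]


# Probabilities as printed by the getting-started snippet, e.g. probs[0].tolist()
label, confidence = top_label([0.0767, 0.9233], ID2LABEL)
print(f"{label}: {confidence:.2%}")
```

Reading the mapping from the model config rather than hard-coding an index keeps downstream code correct even if the label order differs from what one might guess.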
+ # Training Details
+ - Trained for three epochs
+ - Resolution: 256x256
+ - Learning rate: 5e-5
+ - Optimizer: AdamW
+ - Batch size: 64
+ - Trained with FP32
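To get a feel for the scale of the run, the hyperparameters above can be turned into a rough optimizer step count. This is only a back-of-the-envelope sketch: it assumes all ~70k images were used for training (the card does not mention a held-out split), that 64 is the global batch size with no gradient accumulation, and that the last partial batch of each epoch is kept. `optimizer_steps` is a hypothetical helper, not part of the training code.

```python
import math

NUM_IMAGES = 70_000  # approximate: the card says "~70k"; assumed to be all training data
BATCH_SIZE = 64      # from the card; assumed global, no gradient accumulation
EPOCHS = 3           # from the card


def optimizer_steps(num_images, batch_size, epochs):
    """Return (steps per epoch, total steps), keeping the last partial batch."""
    steps_per_epoch = math.ceil(num_images / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


steps_per_epoch, total_steps = optimizer_steps(NUM_IMAGES, BATCH_SIZE, EPOCHS)
print(steps_per_epoch, total_steps)
```

Under these assumptions the run comes to roughly 1,100 steps per epoch and about 3,300 AdamW updates in total.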