miguelcarv
/

resnet-152-text-detector

Image Classification


miguelcarv committed on Jan 20

Commit 19a7cd4 (1 parent: d043739)

Create README.md

Files changed (1)

README.md +37 -0

README.md ADDED

	@@ -0,0 +1,37 @@

+# Model Card for ResNet-152 Text Detector
+This model was trained to quickly classify whether an image contains legible text. It was trained as a binary classifier on the COCO-Text dataset together with some images from LLaVAR, for a total of ~70k images, 50% of which contained text and 50% of which contained no legible text.
+# Model Details
+## How to Get Started with the Model
+```python
+from PIL import Image
+import requests
+import torch
+from transformers import AutoImageProcessor, AutoModelForImageClassification
+model = AutoModelForImageClassification.from_pretrained(
+    "miguelcarv/resnet-152-text-detector",
+)
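+# Reuse the microsoft/resnet-50 preprocessing config; resizing is disabled because the image is resized manually below
+processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50", do_resize=False)
+url = "http://images.cocodataset.org/train2017/000000044520.jpg"
+# Download the image, convert it to RGB, and resize it to the 256x256 training resolution
+image = Image.open(requests.get(url, stream=True).raw).convert('RGB').resize((256,256))
+# Preprocess into a batched tensor of pixel values
+inputs = processor(image, return_tensors="pt").pixel_values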
+with torch.no_grad():
+    outputs = model(inputs)
+logits_per_image = outputs.logits
+probs = logits_per_image.softmax(dim=1)
+print(probs)
+# tensor([[0.0767, 0.9233]])
+```
+# Training Details
+- Trained for three epochs
+- Resolution: 256x256
+- Learning rate: 5e-5
+- Optimizer: AdamW
+- Batch size: 64
+- Trained with FP32
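
To give a rough sense of the training budget implied by the details above, the sketch below collects the listed hyperparameters in a small config object and estimates the number of optimizer steps. It is not the author's training script: `TrainingConfig` and `optimizer_steps` are hypothetical names, the dataset size of exactly 70,000 images is an assumed stand-in for the "~70k" figure, and the calculation assumes every image is used for training, one optimizer step per batch, and a final partial batch that is kept rather than dropped.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters copied from the training details list (hypothetical container)."""
    epochs: int = 3
    image_size: int = 256
    learning_rate: float = 5e-5
    optimizer: str = "AdamW"
    batch_size: int = 64
    precision: str = "fp32"


def optimizer_steps(num_images: int, cfg: TrainingConfig) -> int:
    """Estimate total optimizer steps, assuming one step per batch
    and that the last partial batch of each epoch is kept."""
    steps_per_epoch = math.ceil(num_images / cfg.batch_size)
    return steps_per_epoch * cfg.epochs


cfg = TrainingConfig()
# 70_000 is an assumption standing in for the "~70k" images in the card.
total_steps = optimizer_steps(70_000, cfg)
print(total_steps)
```

Under these assumptions, each epoch is about 1,094 batches, so the full run is on the order of 3,300 optimizer steps. The real figure depends on the exact dataset size and on any held-out validation split, neither of which the card states.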