nielsr HF staff committed on
Commit
d72f414
1 Parent(s): 13aaf62

Update README.md

Files changed (1)
  1. README.md +24 -6
README.md CHANGED
@@ -9,17 +9,35 @@ Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by
9
 
10
  Disclaimer: The team releasing ViLT did not write a model card for this model so this model card has been written by the Hugging Face team.
11
 
12
- ## Model description
13
-
14
- (to do)
15
-
16
  ## Intended uses & limitations
17
 
18
- You can use the raw model for natural language visual reasoning.
19
 
20
  ### How to use
21
 
22
- (to do)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
23
 
24
  ## Training data
25
 
 
9
 
10
  Disclaimer: The team releasing ViLT did not write a model card for this model so this model card has been written by the Hugging Face team.
11
 
 
 
 
 
12
  ## Intended uses & limitations
13
 
14
+ You can use the model to determine whether a sentence is true or false given 2 images.
15
 
16
  ### How to use
17
 
18
+ Here is how to use the model in PyTorch:
19
+
20
+ ```
21
+ from transformers import ViltProcessor, ViltForImagesAndTextClassification
22
+ import requests
23
+ from PIL import Image
24
+
25
+ image1 = Image.open(requests.get("https://lil.nlp.cornell.edu/nlvr/exs/ex0_0.jpg", stream=True).raw)
26
+ image2 = Image.open(requests.get("https://lil.nlp.cornell.edu/nlvr/exs/ex0_1.jpg", stream=True).raw)
27
+ text = "The left image contains twice the number of dogs as the right image."
28
+
29
+ processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-nlvr2")
30
+ model = ViltForImagesAndTextClassification.from_pretrained("dandelin/vilt-b32-finetuned-nlvr2")
31
+
32
+ # prepare inputs
33
+ encoding = processor([image1, image2], text, return_tensors="pt")
34
+
35
+ # forward pass
36
+ outputs = model(input_ids=encoding.input_ids, pixel_values=encoding.pixel_values.unsqueeze(0))
37
+ logits = outputs.logits
38
+ idx = logits.argmax(-1).item()
39
+ print("Predicted answer:", model.config.id2label[idx])
40
+ ```
41
 
42
  ## Training data
43