chrisc36 committed
Commit ff82110
1 Parent(s): df092d0

Update README.md

Files changed (1):
  1. README.md +27 -0
README.md CHANGED
@@ -94,6 +94,33 @@ print(generated_text)
  # perspective. The puppy is sitting on a wooden deck, which is composed ...
  ```

+ To make inference more efficient, run with autocast:
+
+ ```python
+ with torch.autocast(device_type="cuda", enabled=True, dtype=torch.bfloat16):
+     output = model.generate_from_batch(
+         inputs,
+         GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
+         tokenizer=processor.tokenizer,
+     )
+ ```
+
+ We did most of our evaluation in this setting (autocast on, but float32 weights).
+
+ To reduce the memory requirements further, the model can be run with bfloat16 weights:
+
+ ```python
+ model.to(dtype=torch.bfloat16)
+ inputs["images"] = inputs["images"].to(torch.bfloat16)
+ output = model.generate_from_batch(
+     inputs,
+     GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
+     tokenizer=processor.tokenizer,
+ )
+ ```
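+
+ To see what a given configuration actually costs, you can read PyTorch's peak-memory counter around the generation call. A minimal sketch, assuming a CUDA device and the `model` and `inputs` prepared earlier in this README:
+
+ ```python
+ import torch
+ from transformers import GenerationConfig
+
+ # Reset the allocator's high-water mark, then measure one generation.
+ torch.cuda.reset_peak_memory_stats()
+ output = model.generate_from_batch(
+     inputs,
+     GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
+     tokenizer=processor.tokenizer,
+ )
+ print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 2**30:.1f} GiB")
+ ```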
+
+ Note that we have observed that running with bfloat16 weights can (rarely) change the output of the model compared to running with float32 weights.
+
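+ If you want to check whether the bfloat16 weights change a particular output, one way is to decode a greedy generation in both settings and compare the strings. A minimal sketch, assuming the float32 `model`, `processor`, and `inputs` from the snippets above, and the prompt-token slicing convention used earlier in this README:
+
+ ```python
+ import torch
+ from transformers import GenerationConfig
+
+ config = GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>", do_sample=False)
+
+ def decode_one(output):
+     # Drop the prompt tokens and decode, following the earlier example.
+     tokens = output[0, inputs["input_ids"].size(1):]
+     return processor.tokenizer.decode(tokens, skip_special_tokens=True)
+
+ # Generate with float32 weights under autocast ...
+ with torch.autocast(device_type="cuda", enabled=True, dtype=torch.bfloat16):
+     text_fp32 = decode_one(model.generate_from_batch(inputs, config, tokenizer=processor.tokenizer))
+
+ # ... then again after casting the weights to bfloat16.
+ model.to(dtype=torch.bfloat16)
+ inputs["images"] = inputs["images"].to(torch.bfloat16)
+ text_bf16 = decode_one(model.generate_from_batch(inputs, config, tokenizer=processor.tokenizer))
+
+ print("outputs match:", text_fp32 == text_bf16)
+ ```
+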
  ## Evaluations

  | Model | Average Score on 11 Academic Benchmarks | Human Preference Elo Rating |