Spaces: Running on T4

Update app.py #3
by liuhaotian - opened

app.py CHANGED
```diff
@@ -342,12 +342,13 @@ title_markdown = """
 
 ONLY WORKS WITH GPU!
 
-You can load the model with
+You can load the model with 4-bit or 8-bit quantization to make it fit on smaller hardware. Set the environment variable `bits` to control the quantization.
+*Note: 8-bit currently seems slower than both 4-bit and 16-bit. Although the A10G has enough VRAM for 8-bit, we recommend 4-bit on it for the best efficiency until the inference-speed issue is resolved.*
 
 Recommended configurations:
-| Hardware |
-|
-| **Bits** |
+| Hardware | T4-Small (16G) | A10G-Small (24G) | A100-Large (40G) |
+|-------------------|-----------------|------------------|------------------|
+| **Bits** | 4 (default) | 4 | 16 |
 
 """
 
```
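The added text says the `bits` environment variable selects the quantization level. A minimal sketch of how `app.py` might translate that variable into loader flags — note that `quantization_kwargs`, `load_8bit`, and `load_4bit` are assumed names modeled on common LLaVA-style model loaders, not taken from this diff:

```python
import os


def quantization_kwargs(bits: int) -> dict:
    """Map a `bits` setting (16, 8, or 4) to quantization loader flags.

    `load_8bit`/`load_4bit` are hypothetical flag names; 16 means
    full half-precision with no quantization.
    """
    if bits not in (4, 8, 16):
        raise ValueError(f"unsupported bits value: {bits}")
    return {"load_8bit": bits == 8, "load_4bit": bits == 4}


# Read the environment variable the diff describes; 4-bit is the
# default per the recommended-configurations table.
bits = int(os.getenv("bits", "4"))
kwargs = quantization_kwargs(bits)
```

With this shape, the table's recommendations amount to launching the Space with `bits=4` on T4/A10G and `bits=16` on A100.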