Spaces: badayvedat/LLaVA (running on T4)

liuhaotian committed on Oct 11, 2023

Commit 3d4b5d4 • 1 Parent(s): d1e2541


Update app.py

Files changed (1)

app.py CHANGED

@@ -343,11 +343,12 @@ title_markdown = """
 ONLY WORKS WITH GPU!
 You can load the model with 4-bit or 8-bit quantization to make it fit in smaller hardwares. Setting the environment variable `bits` to control the quantization.
+*Note: 8-bit seems to be slower than both 4-bit/16-bit. Although it has enough VRAM to support 8-bit, until we figure out the inference speed issue, we recommend 4-bit for A10G for the best efficiency.*
 Recommended configurations:
-| Hardware          | T4-Small (16G)  | A10G-Medium (24G) | A100-Large (40G) |
-|-------------------|-----------------|-------------------|------------------|
-| **Bits**          | 4 (default)     | 8                 | 16               |
+| Hardware          | T4-Small (16G)  | A10G-Small (24G) | A100-Large (40G) |
+|-------------------|-----------------|------------------|------------------|
+| **Bits**          | 4 (default)     | 4                | 16               |
 """
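
For readers wondering how the `bits` environment variable mentioned in the markdown might be consumed, here is a minimal sketch. It is not taken from app.py: the helper name `quantization_flags` and the `load_4bit` / `load_8bit` keys are hypothetical, and the only details drawn from the diff are the variable name `bits`, the allowed values 4, 8, and 16, and the default of 4.

```python
import os


def quantization_flags(env=None):
    """Map the `bits` environment variable to loader flags.

    Hypothetical helper: the flag names below are illustrative and are
    not confirmed to match what app.py passes to the model loader.
    """
    env = os.environ if env is None else env
    # The table in the diff lists 4 as the default.
    bits = int(env.get("bits", "4"))
    if bits not in (4, 8, 16):
        raise ValueError(f"bits must be 4, 8, or 16, got {bits}")
    # 16 means no quantization, so both flags stay False.
    return {"load_4bit": bits == 4, "load_8bit": bits == 8}


if __name__ == "__main__":
    print(quantization_flags())
```

Under this sketch, leaving `bits` unset on a T4 or an A10G would select 4-bit loading, which matches the recommendation in the updated table, while an A100 would set `bits=16` to skip quantization.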