Update README.md #2
opened by ShleemLeemTeem
README.md CHANGED

```diff
@@ -125,7 +125,7 @@ It was created without group_size to lower VRAM requirements, and with --act-ord
 
 * `wizardlm-33b-v1.0-uncensored-GPTQ-4bit--1g.act.order.safetensors`
 * Works with AutoGPTQ in CUDA or Triton modes.
-* LLaMa models also work with [ExLlama](https://github.com/turboderp/exllama
+* LLaMa models also work with [ExLlama](https://github.com/turboderp/exllama), which usually provides much higher performance, and uses less VRAM, than AutoGPTQ.
 * Works with GPTQ-for-LLaMa in CUDA mode. May have issues with GPTQ-for-LLaMa Triton mode.
 * Works with text-generation-webui, including one-click-installers.
 * Parameters: Groupsize = -1. Act Order / desc_act = True.
```