load using gptq-for-llama?
#1 by iateadonut
When I try to run WizardCoder 4bit, I get this error message:
python server.py --listen --chat --model GodRain_WizardCoder-15B-V1.1-4bit --loader gptq-for-llama
2023-07-25 18:25:26 INFO:Loading GodRain_WizardCoder-15B-V1.1-4bit...
2023-07-25 18:25:26 ERROR:The model could not be loaded because its type could not be inferred from its name.
2023-07-25 18:25:26 ERROR:Please specify the type manually using the --model_type argument.
The oobabooga interface says:
On some systems, AutoGPTQ can be 2x slower than GPTQ-for-LLaMa. You can manually select the GPTQ-for-LLaMa loader above.
I'm only getting about 2 tokens/s on a 4090, so I'm trying to see how I can speed it up.
- Would GPTQ-for-LLaMa be a faster loader than AutoGPTQ for this model?
- If so, how do I get it to run, and what value should I pass to the --model_type argument? (My guess at the command is below.)
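Based on the error message, I assume the command just needs the extra flag appended, something like the line below. I'm only guessing "llama" as the value here; I don't actually know which model type is correct for WizardCoder, which is why I'm asking.

python server.py --listen --chat --model GodRain_WizardCoder-15B-V1.1-4bit --loader gptq-for-llama --model_type llama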