How to make it (Llama-2-13B-chat-GPTQ) work with Fastchat
#30
by Vishvendra - opened
This model is not loading in FastChat. Is there any GPTQ version built in a way that FastChat supports?
Vishvendra changed discussion title from "How to make it (Llama-2-13B-chat-GPTQ) wot with Fastchat" to "How to make it (Llama-2-13B-chat-GPTQ) work with Fastchat"
With this model, the quant in the main branch was built with a GPTQ-for-LLaMa branch, and the ones in the other branches were made with AutoGPTQ.
The one in main - made with a very old version of GPTQ-for-LLaMa - will not work with FastChat, but all the others should.
In my more recent repos, all GPTQs are made with AutoGPTQ and should be compatible with FastChat.
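To try one of the AutoGPTQ branches, you can download that branch specifically and point FastChat's CLI at the local directory. This is only a sketch: the branch name below is a guess at the naming scheme (check the repo's branch list for the real names), and the repo path assumes the model lives under a namespace you would need to fill in.

```shell
# Download a specific (non-main) branch of the GPTQ repo.
# "gptq-4bit-32g-actorder_True" is a hypothetical branch name -
# substitute one from the repo's actual branch list.
git clone --single-branch --branch gptq-4bit-32g-actorder_True \
  https://huggingface.co/<namespace>/Llama-2-13B-chat-GPTQ

# Point FastChat's command-line chat at the downloaded directory.
python3 -m fastchat.serve.cli --model-path ./Llama-2-13B-chat-GPTQ
```

If FastChat still fails, the error it prints when loading the model path is usually the quickest clue as to whether the quant format itself is the problem.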
Thanks for the quick response. Let me check the AutoGPTQ one.
I tried with AutoGPTQ and FastChat with no luck. Do you have any documentation/PR/README that describes the process?
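One way to narrow this down is to confirm the quant loads with AutoGPTQ on its own, outside FastChat. A minimal sketch, assuming the model has already been downloaded to a local directory (the path is hypothetical) and a CUDA GPU is available:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Hypothetical local path to the downloaded GPTQ branch.
model_dir = "./Llama-2-13B-chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)

# Load the quantized weights directly with AutoGPTQ.
model = AutoGPTQForCausalLM.from_quantized(
    model_dir,
    device="cuda:0",
    use_safetensors=True,
)

# A quick generation to prove the quant itself is fine.
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to("cuda:0")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```

If this works but FastChat still fails, the issue is in how FastChat is loading the model rather than in the quant files themselves.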