How can I load this GPTQ model using a pipeline? Is there any way to get it working for QA with a FAISS DB retriever? Here is my code:
import logging

from transformers import AutoTokenizer, pipeline
from auto_gptq import AutoGPTQForCausalLM
from langchain.llms import HuggingFacePipeline
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.prompts import PromptTemplate

use_triton = False
model_file = "TheBloke/Llama-2-7B-Chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_file, use_fast=True)
logging.info("Loaded tokenizer")

model = AutoGPTQForCausalLM.from_quantized(
    model_file,
    use_safetensors=True,
    trust_remote_code=True,
    device="cuda:0",
    use_triton=use_triton,
)
logging.info("Initialized model")

logging.info("*** Pipeline:")
pipe = pipeline(
    "question-answering",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=256,
    temperature=0.1,
    top_p=0.95,
    repetition_penalty=1.15,
    do_sample=True,
    device_map="auto",
)
llm = HuggingFacePipeline(pipeline=pipe)

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cpu"},
)
prompt = PromptTemplate(
    template=qa_template,
    input_variables=["context", "question"],
)
# get_answer builds the FAISS retriever and runs the chain (defined elsewhere)
answer = get_answer(question, llm, prompt, embeddings)

This fails with the following error:
The model 'LlamaGPTQForCausalLM' is not supported for question-answering. Supported models are ['AlbertForQuestionAnswering', 'BartForQuestionAnswering', 'BertForQuestionAnswering', 'BigBirdForQuestionAnswering', 'BigBirdPegasusForQuestionAnswering', 'BloomForQuestionAnswering', 'CamembertForQuestionAnswering', 'CanineForQuestionAnswering', 'ConvBertForQuestionAnswering', 'Data2VecTextForQuestionAnswering', 'DebertaForQuestionAnswering', 'DebertaV2ForQuestionAnswering', 'DistilBertForQuestionAnswering', 'ElectraForQuestionAnswering', 'ErnieForQuestionAnswering', 'ErnieMForQuestionAnswering', 'FalconForQuestionAnswering', 'FlaubertForQuestionAnsweringSimple', 'FNetForQuestionAnswering', 'FunnelForQuestionAnswering', 'GPT2ForQuestionAnswering', 'GPTNeoForQuestionAnswering', 'GPTNeoXForQuestionAnswering', 'GPTJForQuestionAnswering', 'IBertForQuestionAnswering', 'LayoutLMv2ForQuestionAnswering', 'LayoutLMv3ForQuestionAnswering', 'LEDForQuestionAnswering', 'LiltForQuestionAnswering', 'LongformerForQuestionAnswering', 'LukeForQuestionAnswering', 'LxmertForQuestionAnswering', 'MarkupLMForQuestionAnswering', 'MBartForQuestionAnswering', 'MegaForQuestionAnswering', 'MegatronBertForQuestionAnswering', 'MobileBertForQuestionAnswering', 'MPNetForQuestionAnswering', 'MptForQuestionAnswering', 'MraForQuestionAnswering', 'MT5ForQuestionAnswering', 'MvpForQuestionAnswering', 'NezhaForQuestionAnswering', 'NystromformerForQuestionAnswering', 'OPTForQuestionAnswering', 'QDQBertForQuestionAnswering', 'ReformerForQuestionAnswering', 'RemBertForQuestionAnswering', 'RobertaForQuestionAnswering', 'RobertaPreLayerNormForQuestionAnswering', 'RoCBertForQuestionAnswering', 'RoFormerForQuestionAnswering', 'SplinterForQuestionAnswering', 'SqueezeBertForQuestionAnswering', 'T5ForQuestionAnswering', 'UMT5ForQuestionAnswering', 'XLMForQuestionAnsweringSimple', 'XLMRobertaForQuestionAnswering', 'XLMRobertaXLForQuestionAnswering', 'XLNetForQuestionAnsweringSimple', 'XmodForQuestionAnswering', 'YosoForQuestionAnswering'].
Can you make sure to format your code when asking for help?
Instead of question-answering, use the text-generation pipeline. The question-answering task expects an extractive-QA head, which is why the error lists only *ForQuestionAnswering architectures; Llama is a causal LM, so it has to generate the answer from the retrieved context rather than extract a span from it.
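Here is a minimal sketch of the fix, assuming a LangChain 0.0.x-era install (langchain.llms, langchain.vectorstores) and a FAISS index previously saved to the hypothetical path "faiss_index". qa_template and question are the same variables from your snippet, and the k=3 retriever setting is just an example value:

from transformers import AutoTokenizer, pipeline
from auto_gptq import AutoGPTQForCausalLM
from langchain.llms import HuggingFacePipeline
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

model_file = "TheBloke/Llama-2-7B-Chat-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_file, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_file,
    use_safetensors=True,
    trust_remote_code=True,
    device="cuda:0",
    use_triton=False,
)

# text-generation is the task that causal LMs like Llama support.
# The model already sits on cuda:0, so no device_map="auto" here.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.1,
    top_p=0.95,
    repetition_penalty=1.15,
    return_full_text=False,  # return only the completion, not the echoed prompt
)
llm = HuggingFacePipeline(pipeline=pipe)

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cpu"},
)
# Assumption: the index was built and saved earlier, e.g. with
# FAISS.from_documents(docs, embeddings).save_local("faiss_index")
db = FAISS.load_local("faiss_index", embeddings)

prompt = PromptTemplate(
    template=qa_template,  # your template with {context} and {question}
    input_variables=["context", "question"],
)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # stuff retrieved chunks into the prompt's {context}
    retriever=db.as_retriever(search_kwargs={"k": 3}),
    chain_type_kwargs={"prompt": prompt},
)
answer = qa_chain.run(question)

The generation settings (temperature, top_p, repetition_penalty, max_new_tokens) are carried over unchanged from your snippet. You may still see a warning that LlamaGPTQForCausalLM is not in the pipeline's model registry, but text generation works regardless.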