Are TheBloke's models usually slow on Kaggle?
#4 opened by fahim9778
Hello, I have been testing this GPTQ model since this morning, and compared to the original model it's quite slow.
For example, for this code alone:
%%time
# Use the model
input_text = "Which entrepreneur wants to go to Mars?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.cuda()
outputs = model.generate(input_ids, max_new_tokens=500)
print(tokenizer.decode(outputs[0]))
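To put a number on the slowness, here is a quick tokens-per-second measurement (a minimal sketch, reusing the model, tokenizer, and input_ids defined above):

# Sketch: measure generation throughput, reusing `model`, `tokenizer`,
# and `input_ids` from the cell above.
import time

start = time.time()
outputs = model.generate(input_ids, max_new_tokens=500)
elapsed = time.time() - start

# Count only the newly generated tokens (generation may stop before 500).
new_tokens = outputs.shape[-1] - input_ids.shape[-1]
print(f"{new_tokens} new tokens in {elapsed:.1f}s -> {new_tokens / elapsed:.2f} tokens/s")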
And it took almost 6.19 minutes to answer! Is this normal, or am I missing something? I loaded the model with the following:
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    cache_dir="./cache",
    revision="main",
)
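In case it matters: with device_map="auto" on a Kaggle GPU, I'm not sure whether some layers ended up offloaded to CPU, which would explain the slowdown. A minimal check (a sketch, assuming the model object above; hf_device_map is attached by transformers/accelerate when device_map is used):

# Sketch: see where device_map="auto" actually placed the layers.
# Any 'cpu' (or 'disk') entries mean offloading, which slows generation badly.
placement = getattr(model, "hf_device_map", None)
if placement is not None:
    print(set(placement.values()))  # ideally a single GPU, e.g. {0}
else:
    print(model.device)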
Can anyone please help?