How do I stop generation once the model has produced a sufficient answer to the prompt?
max_length is set to 300, but the answer is usually finished by around 150 tokens, so how can I stop the model from generating anything further?
Any suggestion would help. Since I'm not sure what the maximum length should be for different prompts, setting it to a static value sometimes produces unwanted output after the actual answer is already complete.
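Side note for readers: max_length / max_new_tokens is only an upper bound; generate() stops earlier as soon as the token passed as eos_token_id is produced. A minimal sketch of that, assuming the chat-tuned HuggingFaceH4/starchat-alpha checkpoint (whose assistant turns end with <|end|>) rather than base StarCoder:

from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "HuggingFaceH4/starchat-alpha"  # assumption: chat-tuned variant that emits <|end|>
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Resolve the stop token from the tokenizer instead of hard-coding its id.
end_id = tokenizer.convert_tokens_to_ids("<|end|>")

prompt = "<|system|>\n<|end|>\n<|user|>\nWrite a hello-world in Python.<|end|>\n<|assistant|>"
inputs = tokenizer(prompt, return_tensors="pt")

# max_new_tokens=300 is just a cap; generation halts as soon as end_id is produced.
outputs = model.generate(**inputs, max_new_tokens=300, eos_token_id=end_id)
print(tokenizer.decode(outputs[0]))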
+1
use this:
import time
import torch
from transformers import pipeline

start = time.time()

# Load the local checkpoint. device_map="auto" decides where to put each layer,
# either on the GPU or the CPU.
pipe = pipeline("text-generation", model="/home/ec2-user/starCoderCheckpointLocal",
                torch_dtype=torch.bfloat16, device_map="auto", load_in_8bit=True)

text = input("Enter query >>")
prompt_template = "<|system|>\n<|end|>\n<|user|>\n{query}<|end|>\n<|assistant|>"
prompt = prompt_template.format(query=text)

outputs = pipe(prompt, max_new_tokens=512, stop_sequence='<|end|>', do_sample=True,
               temperature=0.2, top_k=50, top_p=0.95, eos_token_id=49155)
# print(outputs)
# print(outputs[0]['generated_text'])

# Keep only the assistant's part of the completion.
generated = outputs[0]['generated_text'].split('<|assistant|>')[-1]
print(generated)

end = time.time()
elapsed = end - start  # don't name this "time": that would shadow the time module
print("Time taken:", str(int(elapsed // 60)) + " minutes", str(round(elapsed % 60)) + " seconds")
Hey @doraexp,
I got a ValueError:
ValueError: The following model_kwargs are not used by the model: ['stop_sequence'] (note: typos in the generate arguments will also show up in this list)
output = model.generate(
    input_ids,
    do_sample=True,
    min_length=min_length,
    max_length=max_length,
    temperature=temperature,
    early_stopping=True,
    stop_sequence='<|end|>',  # this argument is what triggers the ValueError
    top_k=50,
    top_p=0.95,
    eos_token_id=49155,
)
I am using the StarCoder model. Any further suggestion to resolve this, or any alternative? Do suggest.
Thanks
@doraexp could you please help with this? I'm really looking forward to your help.
Hi @MukeshSharma,
Could you please provide the code snippet you are using and the checkpoint you are trying to load? The whole error message would be really helpful too. :)
I am loading the same checkpoint
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder")
model = AutoModelForCausalLM.from_pretrained("bigcode/starcoder")
No other changes.
For generation I am using:
output = model.generate(
    input_ids,
    do_sample=True,
    min_length=min_length,
    max_length=max_length,
    temperature=temperature,
    early_stopping=True,
    stop_sequence='<|end|>',
    top_k=50,
    top_p=0.95,
    eos_token_id=49155,
)
So nothing else; I am still getting the same ValueError:
ValueError: The following model_kwargs are not used by the model: ['stop_sequence'] (note: typos in the generate arguments will also show up in this list)
@doraexp
Any help on this? I still need it.
Thanks
Hi @MukeshSharma, sorry, I got a little busy with some other stuff and couldn't reply earlier. I'm not sure why you are getting this error.
However, what works for me is downloading the model to a local directory and then running it from there. Follow the steps below and see if they work for you.
First, run this Python program:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/starchat-alpha")
model = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/starchat-alpha")

# Mention the directory where you want to save the checkpoint.
tokenizer.save_pretrained("/home/ec2-user/starCoderCheckpointLocal")
model.save_pretrained("/home/ec2-user/starCoderCheckpointLocal")

# These lines check that the model works offline from the local directory.
tokenizer = AutoTokenizer.from_pretrained("/home/ec2-user/starCoderCheckpointLocal")
model = AutoModelForCausalLM.from_pretrained("/home/ec2-user/starCoderCheckpointLocal")
Now just run this:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
#checkpoint = "HuggingFaceH4/starchat-alpha"
checkpoint= "/home/ec2-user/starCoderCheckpointLocal"
device = "cuda" *# for GPU usage or "cpu" for CPU usage*
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# to save memory consider using fp16 or bf16 by specifying torch_dtype=torch.float16 for example
model = AutoModelForCausalLM.from_pretrained(checkpoint,torch_dtype=torch.float16).to(device)
inputs = tokenizer.encode("Create a typescript function that calculates factorial of a number.", return_tensors="pt").to(device)
outputs = model.generate(inputs,max_length=500)
print(tokenizer.decode(outputs[0]))
I hope this helps :)
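On the stop_sequence ValueError earlier in the thread: model.generate() does not accept a stop_sequence argument; only the text-generation pipeline does. With plain generate() you can either pass the stop token's id as eos_token_id, or use a stopping criterion. A minimal sketch of the latter, assuming the HuggingFaceH4/starchat-alpha checkpoint from above and a GPU (StopOnTokens is just an illustrative helper, not a library class):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, StoppingCriteria, StoppingCriteriaList

checkpoint = "HuggingFaceH4/starchat-alpha"
device = "cuda"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.float16).to(device)

class StopOnTokens(StoppingCriteria):
    """Stop generation as soon as the last generated token is one of the stop ids."""
    def __init__(self, stop_ids):
        self.stop_ids = set(stop_ids)
    def __call__(self, input_ids, scores, **kwargs):
        return input_ids[0, -1].item() in self.stop_ids

stop_ids = [tokenizer.convert_tokens_to_ids("<|end|>"), tokenizer.eos_token_id]
prompt = "<|system|>\n<|end|>\n<|user|>\nCreate a typescript function that calculates factorial of a number.<|end|>\n<|assistant|>"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.2,
    top_k=50,
    top_p=0.95,
    max_new_tokens=512,
    stopping_criteria=StoppingCriteriaList([StopOnTokens(stop_ids)]),
)
# Drop the prompt tokens and the special stop token from the printed answer.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))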