Transformers pipeline with Gemma 2b-it returns only the input text
Loading the Gemma 2b-it model with this code:
import transformers
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

# Paths to the Kaggle copy of Gemma 2b-it
model_version = 2
model_id = f"/kaggle/input/gemma/transformers/2b-it/{model_version}"
model_config = f"/kaggle/input/gemma/transformers/2b-it/{model_version}/config.json"
tokenizer_id = f"/kaggle/input/gemma/transformers/2b-it/{model_version}"
tokenizer_config = f"/kaggle/input/gemma/transformers/2b-it/{model_version}/tokenizer_config.json"

model_config = AutoConfig.from_pretrained(model_config)
model = AutoModelForCausalLM.from_pretrained(model_id, config=model_config, device_map="auto")
tokenizer_config = AutoConfig.from_pretrained(tokenizer_config)
tokenizer = AutoTokenizer.from_pretrained(tokenizer_id, config=tokenizer_config, device_map="auto", return_tensors="pt")
Executing the generation as follows:
input_text = "Write a python function to print all elements of a list."
input_ids = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**input_ids, max_new_tokens=16)
print(tokenizer.decode(outputs[0]))
Some text is generated. But when creating a transformers.pipeline as follows, the only text in the output is the input text.
query_pipeline = transformers.pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    device_map="auto",
    framework="pt",
)
input_text = "Write a python function to print all elements of a list."
result = query_pipeline(
    input_text,
    max_new_tokens=64,
    do_sample=True,
    num_return_sequences=1,
)
print(f"Result: {result}")
This is the output:
Result: [{'generated_text': 'Write a python function to print all elements of a list.'}]
Is this procedure correct, or are there some mistakes?
However, when the chat template is applied like this before calling the pipeline, some text is generated:
chat = [
    { "role": "user", "content": input_text },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
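For completeness, a minimal sketch of then passing the templated prompt to the pipeline (names taken from the snippets above):

result = query_pipeline(prompt, max_new_tokens=64, do_sample=True)
print(result[0]["generated_text"])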
Text is also generated when creating a pipeline of type "conversational" and passing a chat like this:
chat = [
    { "role": "user", "content": input_text },
]
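As a rough sketch, that conversational route looks like the following (hedged: it assumes a transformers version that still ships the "conversational" task and a Conversation class that accepts a list of messages; this pipeline has since been deprecated):

conversational_pipeline = transformers.pipeline(
    task="conversational",
    model=model,
    tokenizer=tokenizer,
)
conversation = transformers.Conversation(chat)
conversation = conversational_pipeline(conversation)
# The pipeline appends the model reply to the conversation's message list
print(conversation.messages[-1]["content"])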
Is there a problem with the TextGenerationPipeline?
I am also struggling with this.
Is this using the right chat template and control tokens under the hood?
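One way to check is to print what the chat template produces and compare it with what the pipeline actually feeds the model. A quick sketch, assuming the tokenizer loaded above:

chat = [{ "role": "user", "content": "Hi" }]
print(tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True))
# For Gemma this should start with <bos> and wrap the turn as
# <start_of_turn>user ... <end_of_turn>, followed by an opening <start_of_turn>model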
I had the same issue: the generated_text was the same as the input. I found a way to fix this.
Modify the code:
result = query_pipeline(
    input_text,
    max_new_tokens=64,
    do_sample=True,
    num_return_sequences=1,
)
to:
result = query_pipeline(
    input_text,
    max_new_tokens=64,
    do_sample=True,
    num_return_sequences=1,
    add_special_tokens=True,
)
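The difference shows up in the token ids: with add_special_tokens=True the Gemma tokenizer prepends its <bos> token, and without it the prompt has no <bos>, which is the likely reason the model stops generating immediately. A small check, assuming the tokenizer loaded above (the exact ids depend on the tokenizer files):

print(tokenizer("Hello", add_special_tokens=True).input_ids)   # should start with the <bos> id
print(tokenizer("Hello", add_special_tokens=False).input_ids)  # no leading <bos>
print(tokenizer.bos_token, tokenizer.bos_token_id)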
Ah, good to know! cc @osanseviero in case we should document this somewhere?
To use the pipeline, the chat template must be applied. Without the chat template, the pipeline does not generate any new tokens.
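In recent transformers versions the text-generation pipeline can also take the chat messages directly and apply the chat template itself. A hedged sketch, assuming a version with chat support in TextGenerationPipeline and the names defined above:

chat = [
    { "role": "user", "content": input_text },
]
result = query_pipeline(chat, max_new_tokens=64, do_sample=True)
# With chat input, generated_text holds the full message list; the last entry is the reply
print(result[0]["generated_text"][-1]["content"])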
Interesting, cc @ArthurZ @Rocketknight1. Do you think there is something we need to upstream in the transformers pipeline?
But shouldn't the text-generation pipeline produce new tokens, as it does for all the other models?
Also, with gemma-7b-it it sometimes generates tokens for me.