For example, you can use the [TextStreamer] class to stream the output of generate() to your screen, one word at a time:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

tok = AutoTokenizer.from_pretrained("openai-community/gpt2")
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
inputs = tok(["An increasing sequence: one,"], return_tensors="pt")
streamer = TextStreamer(tok)

# Despite returning the usual output, the streamer will also print the generated text to stdout.
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=20)
# Printed to stdout:
# An increasing sequence: one, two, three, four, five, six, seven, eight, nine, ten, eleven,
```

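To make the streaming mechanism more concrete, here is a minimal sketch. It assumes that a streamer is an object whose `put()` method receives token IDs as they are produced and whose `end()` method is called once generation finishes. `CollectingStreamer`, `fake_generate`, and the toy vocabulary are hypothetical stand-ins written for illustration. They are not part of the library, and a real streamer receives tensors from the model rather than plain lists.

```python
class CollectingStreamer:
    """Hypothetical streamer-like object (not a transformers class)."""

    def __init__(self, id_to_word):
        self.id_to_word = id_to_word  # toy vocabulary, assumed for this sketch
        self.words = []
        self.finished = False

    def put(self, token_ids):
        # Called every time new token IDs become available.
        for token_id in token_ids:
            word = self.id_to_word[token_id]
            self.words.append(word)
            print(word, end=" ", flush=True)

    def end(self):
        # Called once, after the last token has been pushed.
        self.finished = True
        print()


def fake_generate(token_ids, streamer):
    """Stand-in for generate(): pushes one token at a time into the streamer."""
    for token_id in token_ids:
        streamer.put([token_id])
    streamer.end()
    return token_ids  # the usual return value is still produced


toy_vocab = {0: "one,", 1: "two,", 2: "three,"}
streamer = CollectingStreamer(toy_vocab)
output = fake_generate([0, 1, 2], streamer)
```

The point of the design is that the caller still gets the full output at the end, while the streamer sees each piece as soon as it exists and decides what to do with it (print it, queue it, send it over a socket).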
Decoding strategies

Certain combinations of the generate() parameters, and ultimately generation_config, can be used to enable specific decoding strategies.
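As a rough illustration of how parameter combinations map to strategies, the sketch below looks at just two commonly used parameters, `num_beams` and `do_sample`. The helper `describe_strategy` is hypothetical and is not a library function. The real selection logic considers more parameters and may differ between library versions, so treat this as a simplified picture rather than the actual dispatch.

```python
def describe_strategy(num_beams=1, do_sample=False):
    """Hypothetical helper: name the decoding strategy implied by two parameters."""
    if num_beams < 1:
        raise ValueError("num_beams must be a positive integer")
    if num_beams == 1:
        # A single hypothesis is tracked: either pick the top token or sample one.
        return "multinomial sampling" if do_sample else "greedy search"
    # Several hypotheses (beams) are tracked in parallel.
    return "beam-search multinomial sampling" if do_sample else "beam search"


# Each tuple is (num_beams, do_sample), as it might be passed to generate().
for num_beams, do_sample in [(1, False), (1, True), (4, False), (4, True)]:
    print(f"num_beams={num_beams}, do_sample={do_sample}: "
          f"{describe_strategy(num_beams, do_sample)}")
```

Under these assumptions, a call such as `model.generate(**inputs, num_beams=4, do_sample=False)` would correspond to beam search, while leaving both parameters at the values used above for the first case corresponds to greedy search.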