Universal Assisted Generation: Faster Decoding with Any Assistant Model
ā¢
40
top_k
arbitrarily discarding high-quality continuations? Or top_p
forgetting to exclude low-probability tokens, derailing your generation? Try out the new min_p
flag in generate
, fresh from a PR merged today! š„¬min_p
flag) and multiplies it by the probability of the most likely token in the distribution for the next token. All tokens less likely than the resulting value are filtered. What happens with this strategy?min_p
to a low value, between 0.05 and 0.1. It behaves particularly well for creative text generation when paired up with temperature > 1.outlines
library.outlines
folks to stay on top of the constrained generation game š§