Text generation during training
#7
by TobiasGerwald - opened
I'm fine-tuning the model and printing the decoded predictions in compute_metrics during training. There I see nice convergence towards the desired output, but when I use the model afterwards with pipeline, or with model.generate + tokenizer.decode, the results look somewhat different: mostly shorter and cut off, even when I increase max_new_tokens. My assumption is that the evaluation step during training (i.e. when the EvalPrediction object is built) uses slightly different generation parameters than my later calls. So my question is: where can I look up those values? Or does somebody have another theory?
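
For reference, here is a rough sketch of the two paths I'm comparing (the checkpoint name, the example input and the seq2seq setup are placeholders, not my actual training script):

```python
# Minimal sketch of the two generation paths I compare.
# "google/flan-t5-small" and the example input are placeholders.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-small"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Path 1: during training (Seq2SeqTrainer with predict_with_generate=True),
# the generated token ids arrive in compute_metrics, where I decode and print them:
def compute_metrics(eval_pred):
    preds, labels = eval_pred
    decoded = tokenizer.batch_decode(preds, skip_special_tokens=True)
    print(decoded[:2])  # here the output looks like the desired target
    return {}

# Path 2: after training, I call generate + decode directly:
inputs = tokenizer("example input", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))  # shorter / cut off

# Where I currently look for the defaults that evaluation might be using:
print(model.generation_config)  # e.g. max_length, num_beams, early_stopping
```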
Thanks for your help :)