BLOOM API inference

#150
by Matuesz

Hi, I've been playing around with the API and discovered two interesting things that maybe someone could explain to me.
First: when I switch the model to greedy decoding in the Hugging Face API it works fine, but when I set "do_sample": false in my request (see the request sketch below), it returns different results each time. Is there any way to force BLOOM to be deterministic?
Second: when I get the response as JSON, it drops the space before punctuation, e.g. "bla , bla" -> "bla, bla". Is there a way to avoid this behavior?
Thanks for the answers!
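
For reference, here is the kind of request I'm sending for the first question (a minimal sketch; the token is a placeholder, and the prompt and max_new_tokens are just example values):

```python
import requests

# Hosted Inference API endpoint for BLOOM
API_URL = "https://api-inference.huggingface.co/models/bigscience/bloom"
headers = {"Authorization": "Bearer hf_xxx"}  # placeholder token

payload = {
    "inputs": "The capital of France is",
    "parameters": {
        "do_sample": False,    # greedy decoding, so I'd expect deterministic output
        "max_new_tokens": 20,  # example value
    },
}

response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())
```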

BigScience Workshop org

Regarding your first question: the parameter to set is do_sample, which might explain the non-determinism you're observing.

Oh yes, I did use do_sample; the typo was only here in the post, so I quickly edited the question.

I encountered the same problem described in your second question. Could this be related to the return_full_text parameter not working? It always seems to behave as if it were True:
https://huggingface.co/bigscience/bloom/discussions/153#6397907b71eb2455d898e0a4
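
For what it's worth, this is how I'm passing the parameter (a minimal sketch; the token is a placeholder and the prompt is just an example), yet the response still seems to include the full input text:

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/bigscience/bloom"
headers = {"Authorization": "Bearer hf_xxx"}  # placeholder token

payload = {
    "inputs": "Hello , world",
    "parameters": {
        "return_full_text": False,  # should return only the generated continuation
    },
}

response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())
```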
