How to enable streaming for the Phi-3 Vision model?
I have developed an interface to chat with this model and have been exploring how to stream the output.
https://lightning.ai/bhimrajyadav/studios/deploy-and-chat-with-phi-3-vision-128k-instruct
But I couldn't get it right.
What have you tried?
You can try this script: https://gist.github.com/dranger003/845739ac3a64f49d608e9bb39317dbf5
Thanks @dranger003 for the script.
I used the existing TextIteratorStreamer and got it working.
# Streaming the model output with TextIteratorStreamer
from threading import Thread
from transformers import TextIteratorStreamer

streamer = TextIteratorStreamer(
    processor.tokenizer,
    skip_prompt=True,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)

# Run generation in a separate thread so the generated text can be fetched in a non-blocking way.
generation_kwargs = dict(inputs, streamer=streamer, max_new_tokens=512, eos_token_id=processor.tokenizer.eos_token_id)
thread = Thread(target=model.generate, kwargs=generation_kwargs)
thread.start()

# The streamer yields decoded text chunks as soon as new tokens are generated.
for text in streamer:
    print(text, end="", flush=True)
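
For reference, the snippet above assumes that model, processor, and inputs were already defined earlier in the script. Here is a minimal setup sketch, following the loading code from the Phi-3-vision model card; the image URL is only a placeholder:

import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="cuda", torch_dtype="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Phi-3-vision expects <|image_1|>-style placeholders for image inputs in the prompt.
messages = [{"role": "user", "content": "<|image_1|>\nDescribe this image."}]
prompt = processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

image = Image.open(requests.get("https://example.com/image.jpg", stream=True).raw)  # placeholder URL
inputs = processor(prompt, [image], return_tensors="pt").to("cuda")

The background thread matters because model.generate blocks until the full completion is ready; running it in a separate thread lets the main thread iterate over the streamer and print chunks as they arrive.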
@sebbyjp, I was getting errors due to a parameter misconfiguration. It works now.
Awesome! Are you able to run batched inference with image inputs?
@bhimrazy Thanks, I didn't know about TextIteratorStreamer!
Thank you for the feedback! I haven't had the chance to check out batched inference with image inputs yet, but I'll definitely look into it. I appreciate you bringing it to my attention.
By the way, I have a studio deployed that you can try out. Feel free to explore it here: Deploy and Chat with Phi-3 Vision 128K Instruct (https://lightning.ai/bhimrajyadav/studios/deploy-and-chat-with-phi-3-vision-128k-instruct).