How to enable streaming for the Phi-3 Vision model?
I have developed an interface to chat with this model and have been exploring how to stream the output.
https://lightning.ai/bhimrajyadav/studios/deploy-and-chat-with-phi-3-vision-128k-instruct
But I couldn't get it right.
What have you tried?
You can try this script: https://gist.github.com/dranger003/845739ac3a64f49d608e9bb39317dbf5
Thanks @dranger003 for the script.
I used the existing TextIteratorStreamer and got it working.
# Streaming the model output with TextIteratorStreamer
from threading import Thread
from transformers import TextIteratorStreamer

streamer = TextIteratorStreamer(
    processor.tokenizer,
    skip_prompt=True,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)

# Run generation in a separate thread so the generated text can be fetched in a non-blocking way.
generation_kwargs = dict(inputs, streamer=streamer, max_new_tokens=512, eos_token_id=processor.tokenizer.eos_token_id)
thread = Thread(target=model.generate, kwargs=generation_kwargs)
thread.start()

# The streamer yields decoded text chunks as soon as new tokens are generated.
for text in streamer:
    print(text, end="", flush=True)
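
For reference, the snippet above assumes that model, processor, and inputs were already defined earlier in the script. Here is a minimal setup sketch, following the loading code from the Phi-3-vision model card; the image URL is only a placeholder:

import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="cuda", torch_dtype="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Phi-3-vision expects <|image_1|>-style placeholders for image inputs in the prompt.
messages = [{"role": "user", "content": "<|image_1|>\nDescribe this image."}]
prompt = processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

image = Image.open(requests.get("https://example.com/image.jpg", stream=True).raw)  # placeholder URL
inputs = processor(prompt, [image], return_tensors="pt").to("cuda")

The background thread matters because model.generate blocks until the full completion is ready; running it in a separate thread lets the main thread iterate over the streamer and print chunks as they arrive.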
@sebbyjp, I was getting errors due to a parameter misconfiguration. It works now.
Awesome! Are you able to run batched inference with image inputs?
@bhimrazy Thanks, I didn't know about TextIteratorStreamer!
Thank you for the feedback! I haven't had the chance to check out batched inference with image inputs yet, but I'll definitely look into it. I appreciate you bringing it to my attention.
By the way, I have a studio deployed that you can try out. Feel free to explore it here: Deploy and Chat with Phi-3 Vision 128K Instruct (https://lightning.ai/bhimrajyadav/studios/deploy-and-chat-with-phi-3-vision-128k-instruct).