How to parallelize starcoder inference?
#93
by Cubby9059 · opened
Hello,
I am trying to deploy StarCoder as an internal coding assistant for a team of 100 people. However, the model takes too long to produce predictions, especially when parallel requests are made. I am using an NVIDIA A100 40GB. Any suggestions on how to speed up inference?
Thank you.
You can try deploying the model with the Text Generation Inference (TGI) library, which we use for the Inference Endpoints. It batches in-flight requests together on the GPU, so it handles many parallel users much better than calling `model.generate` per request.
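For reference, here is a minimal sketch of what that could look like. The Docker launch command, port, and flag values are assumptions based on a typical TGI setup rather than a verified configuration for this exact deployment; check the text-generation-inference README for the current options.

```python
# Minimal sketch (assumptions noted): serve StarCoder with Text Generation
# Inference (TGI) and query it from Python with concurrent requests.
#
# 1) Launch the TGI server (run in a shell); exact flags/image tag are
#    assumptions for illustration:
#    docker run --gpus all --shm-size 1g -p 8080:80 \
#        -e HUGGING_FACE_HUB_TOKEN=<your_token> \
#        ghcr.io/huggingface/text-generation-inference:latest \
#        --model-id bigcode/starcoder
#
# 2) Send concurrent requests; TGI batches in-flight requests server-side.
from concurrent.futures import ThreadPoolExecutor

from text_generation import Client  # pip install text-generation

client = Client("http://127.0.0.1:8080", timeout=60)

prompts = [
    "def fibonacci(n):",
    "class LRUCache:",
    "def parse_csv(path):",
]

def complete(prompt: str) -> str:
    # Each call is a separate HTTP request; the server batches
    # concurrent requests together on the GPU.
    return client.generate(prompt, max_new_tokens=64).generated_text

with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
    for prompt, completion in zip(prompts, pool.map(complete, prompts)):
        print(prompt, "->", completion[:80])
```

Note that on a single 40GB A100 the ~15B-parameter model in fp16 leaves limited headroom for the KV cache under load, so if you hit out-of-memory errors you may also want to look at the launcher's quantization options.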