Batch inference slower than single inference
#6
by gauranshsoni12 - opened
Hi, I ran the model over a sequence of prompts with the batch_size parameter set to 32. GPU specs: 4x NVIDIA A10G (24 GB each). Even though nvidia-smi shows heavy GPU utilization, the batched results never come back; a plain loop over the sequence, generating one result at a time, finishes faster.
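For reference, a minimal sketch of the kind of setup being described, assuming the Hugging Face Transformers pipeline API (the post does not say which library or model is in use); the model name, prompt list, and generation length below are placeholders:

```python
from transformers import pipeline

# Placeholder model; the actual model from the post is unknown.
generator = pipeline(
    "text-generation",
    model="gpt2",
    device_map="auto",  # spread the model across the 4x A10G GPUs (needs accelerate)
)

# Batching requires a pad token; GPT-2-style tokenizers have none by default.
generator.tokenizer.pad_token = generator.tokenizer.eos_token

prompts = [f"prompt {i}" for i in range(128)]  # dummy prompt sequence

# Batched call: the pipeline groups the prompts into batches of 32 internally.
batched = generator(prompts, batch_size=32, max_new_tokens=64)

# Plain loop over single prompts, which the post reports as faster in practice.
looped = [generator(p, max_new_tokens=64) for p in prompts]
```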