Getting the error: "triton.runtime.autotuner.OutOfResources: out of resource: shared memory, Required: 180224, Hardware limit: 166912. Reducing block sizes or `num_stages` may help."

#27
by Pranav0511 - opened

Hi there. I am running this model for inference on a question-answering dataset, using a single 80 GB GPU on an 8-GPU NVIDIA DGX node. After the model is downloaded and the checkpoint shards are loaded, it raises the error in the title. Has anyone run into this before, or does anyone know how to fix it?

Same problem on an A10.
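For reference, the numbers in the error message can be put side by side with a few lines of Python. This is only a rough sketch: the two byte counts are copied from the title, but the helper names are made up for illustration, `current_stages=4` is a hypothetical value (the thread does not say which kernel or config is failing), and the idea that shared memory grows linearly with `num_stages` is a simplifying assumption, not Triton's actual accounting.

```python
# Byte counts copied from the error message in the title.
REQUIRED_BYTES = 180224
LIMIT_BYTES = 166912


def overshoot(required, limit):
    """Return (extra bytes needed, required/limit ratio)."""
    return required - limit, required / limit


def max_stages_that_fit(required, limit, current_stages):
    """Largest num_stages that stays under the limit.

    ASSUMPTION: shared memory scales linearly with num_stages.
    Returns 0 if even a single stage would not fit, in which case
    the block sizes would have to shrink as well.
    """
    per_stage = required / current_stages
    return int(limit // per_stage)


extra, ratio = overshoot(REQUIRED_BYTES, LIMIT_BYTES)
print(f"over by {extra} bytes ({extra / 1024:.0f} KiB), {ratio:.2f}x the limit")

# current_stages=4 is hypothetical; the failing config is not shown in the thread.
print(max_stages_that_fit(REQUIRED_BYTES, LIMIT_BYTES, current_stages=4))
```

So the kernel is asking for about 13 KiB (roughly 8%) more shared memory than the hardware allows, which is why the message suggests a smaller `num_stages` or smaller block sizes. On a GPU with a lower shared-memory limit the gap would be larger.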
