RuntimeError: cutlassF: no kernel found to launch!
I am facing the issue in the title when loading a model with torch.bfloat16. Can anyone suggest what's wrong?
Below is my dev environment:
- NVIDIA V100 GPU device
- Python 3.10.12
- CUDA 12.4 with Driver 550.54.14
- accelerate==0.28.0
- torch==2.1.2
- transformers==4.38.2
Also, the code below executes successfully, so my setup is able to use the V100:
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="auto", torch_dtype=torch.float16, token=access_token)
However, it fails for the code below, which uses torch.bfloat16:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it", token=access_token)
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="auto", torch_dtype=torch.bfloat16, token=access_token)
input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
Hi @kehkok, I reproduced the issue. It happens because the NVIDIA V100 GPU does not support bfloat16 operations, which are required when using torch.bfloat16. The V100 is a Volta GPU (compute capability 7.0); it supports float16 (FP16), but hardware bfloat16 support starts with Ampere (compute capability 8.0). This is why your model works fine with torch.float16 but fails with torch.bfloat16. Please use float16 instead of bfloat16, as in the code below. Thank you.
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="auto", torch_dtype=torch.float16, token=access_token)
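If you want the same script to run on both V100-class and newer GPUs, you can also pick the dtype at runtime instead of hard-coding it. A minimal sketch, assuming a CUDA build of PyTorch (torch.cuda.is_bf16_supported() is available in torch 2.1.2) and that access_token is defined as in your script:

import torch
from transformers import AutoModelForCausalLM

# Use bfloat16 only when the GPU supports it (Ampere, compute capability 8.0,
# or newer); otherwise fall back to float16, which the V100 handles natively.
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
print(f"Compute capability: {torch.cuda.get_device_capability()}, using dtype: {dtype}")

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b-it",
    device_map="auto",
    torch_dtype=dtype,
    token=access_token,  # access_token as defined earlier in your script
)

On a V100 this prints compute capability (7, 0) and falls back to torch.float16, so the same code keeps working when you move to a bfloat16-capable GPU.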