RuntimeError: cutlassF: no kernel found to launch!
I am facing the issue in the title when loading a model with torch.bfloat16. Can anyone suggest what's wrong?
Below is my dev environment:
- NVIDIA V100 GPU device
- Python 3.10.12
- CUDA 12.4 with Driver 550.54.14
- accelerate==0.28.0
- torch==2.1.2
- transformers==4.38.2
Also, the code below executes successfully, so my setup is able to use the V100:
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="auto", torch_dtype=torch.float16, token=access_token)
However, it fails for the code below, which uses torch.bfloat16:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it", token=access_token)
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="auto", torch_dtype=torch.bfloat16, token=access_token)
input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
Hi @kehkok, I reproduced the issue. It happens because the NVIDIA V100 GPU does not support bfloat16 operations, which are required when using torch.bfloat16. The V100 is a Volta GPU (compute capability 7.0); it supports float16 (FP16), but hardware bfloat16 support starts with Ampere (compute capability 8.0). This is why your model works fine with torch.float16 but fails with torch.bfloat16. Please use float16 instead of bfloat16, as in the code below. Thank you.
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="auto", torch_dtype=torch.float16, token=access_token)
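If you want the same script to run on both V100-class and newer GPUs, you can also pick the dtype at runtime instead of hard-coding it. A minimal sketch, assuming a CUDA build of PyTorch (torch.cuda.is_bf16_supported() is available in torch 2.1.2) and that access_token is defined as in your script:

import torch
from transformers import AutoModelForCausalLM

# Use bfloat16 only when the GPU supports it (Ampere, compute capability 8.0,
# or newer); otherwise fall back to float16, which the V100 handles natively.
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
print(f"Compute capability: {torch.cuda.get_device_capability()}, using dtype: {dtype}")

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b-it",
    device_map="auto",
    torch_dtype=dtype,
    token=access_token,  # access_token as defined earlier in your script
)

On a V100 this prints compute capability (7, 0) and falls back to torch.float16, so the same code keeps working when you move to a bfloat16-capable GPU.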