RuntimeError: cutlassF: no kernel found to launch

#3 opened by Manmax31

I attempted fine-tuning DeciLM-7B using PEFT and SFT.

After the PEFT model is created, inference fails with RuntimeError: cutlassF: no kernel found to launch!

I am using Tesla V100 GPUs. I have tried the same with Mistral-7B and don't see any issues.

Here is my code:

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    pipeline,
)
from peft import PeftModel

# 4-bit NF4 quantization with float16 compute
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base_model_id = "Deci/DeciLM-7B"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=quantization_config,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Load the LoRA adapter on top of the quantized base model
ft_model = PeftModel.from_pretrained(
    model, "finetuned_models/deci-7b-tuned-r32-a16"
)
pipe = pipeline("text-generation", model=ft_model, tokenizer=tokenizer)

outputs = pipe(
    prompt,
    max_new_tokens=800,
    do_sample=True,
    temperature=0.1,
    return_full_text=False,
    repetition_penalty=1.5,
    num_beams=1,
    no_repeat_ngram_size=3,
    early_stopping=True,
)
print("\nOutput:", outputs[0]["generated_text"])

Hello,

This happens because SDPA (PyTorch's scaled dot-product attention) is not supported in your environment. It will likely be fixed in a future Transformers version, as newer releases check whether SDPA is available before using it.

Hopefully that helps.
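For reference, here is a small diagnostic sketch (my own addition, not from the original thread) to inspect what your GPU reports. PyTorch's FlashAttention SDPA backend requires compute capability (8, 0) or newer, while a Tesla V100 reports (7, 0):

import torch

# Tesla V100 reports (7, 0); PyTorch's FlashAttention backend needs
# compute capability >= (8, 0), i.e. Ampere or newer.
print("Compute capability:", torch.cuda.get_device_capability())

# Which SDPA backends are currently enabled (enabled is not the same as
# supported, but this helps narrow down which kernel raised the error).
print("flash:", torch.backends.cuda.flash_sdp_enabled())
print("mem_efficient:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math:", torch.backends.cuda.math_sdp_enabled())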

Thank you.
Is there a workaround for this for now?

Try it on an A100 GPU; it worked for me.
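If switching hardware is not an option, one possible software workaround (an untested sketch; it assumes DeciLM's custom attention code routes through torch.nn.functional.scaled_dot_product_attention) is to force the pure-PyTorch math backend, which runs on any CUDA GPU:

import torch

# Disable the flash and memory-efficient (cutlass) kernels and fall back
# to the math implementation for the duration of the generation call.
with torch.backends.cuda.sdp_kernel(
    enable_flash=False, enable_math=True, enable_mem_efficient=False
):
    outputs = pipe(prompt, max_new_tokens=800, do_sample=True, temperature=0.1)

Alternatively, recent Transformers versions accept attn_implementation="eager" in from_pretrained, though whether DeciLM's trust_remote_code modeling honors that argument is not guaranteed.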
