QLoRA fine-tune
#7
by NickyNicky - opened
This model seems great to me, but I have a problem with QLoRA: when I fine-tune it, VRAM usage shoots up to more than 30 GB instead of the ~15 GB I normally use with the script you provided.
How are you doing the QLoRA fine-tuning? Would appreciate reproduction steps. :)
me too XD
Load the model in 4-bit and then apply the regular PEFT adapter on top. Needs the bitsandbytes package as well.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization with nested (double) quantization and bf16 compute
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,  # your model repo id
    trust_remote_code=True,
    revision=revision,  # the revision you are pinning to
    quantization_config=nf4_config,
)

# Target specific modules (look for Linear layers in the model);
# print(model) to see the architecture.
target_modules = ["mixer.Wqkv"]

config = LoraConfig(
    r=1,
    lora_alpha=1,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=target_modules,
)

model = get_peft_model(model, config)
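On the VRAM question: with a 4-bit base, the weights themselves are small, so a jump to 30 GB at train time is usually activation memory. A minimal sketch of the extra step commonly used here (prepare_model_for_kbit_training is PEFT's helper for quantized bases; it goes before get_peft_model), plus a sanity check that only the adapter weights are trainable:

from peft import prepare_model_for_kbit_training

# Run this on the quantized base model *before* get_peft_model:
# it enables gradient checkpointing and prepares the model for
# k-bit training, which usually brings activation memory down.
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)
model = get_peft_model(model, config)

# Sanity check: only the LoRA adapter parameters should be trainable.
model.print_trainable_parameters()

Worth trying if the extra memory only appears once training steps start; shorter sequence lengths and smaller batch sizes attack the same problem.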