QLoRA fine-tune
#7
by NickyNicky - opened
This model seems great to me, but I have a problem with QLoRA: when I fine-tune it, VRAM usage shoots up to more than 30 GB instead of the ~15 GB I normally use with the script you provided.
How are you doing the QLoRA fine-tuning? Would appreciate reproduction steps. :)
me too XD
Load the model in 4-bit and then apply the regular PEFT adapter on top. Needs the bitsandbytes package as well.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization with nested (double) quantization and bf16 compute
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,  # your model repo id
    trust_remote_code=True,
    revision=revision,  # the revision you are pinning to
    quantization_config=nf4_config,
)

# Target specific modules (look for Linear layers in the model);
# print(model) to see the architecture.
target_modules = ["mixer.Wqkv"]

config = LoraConfig(
    r=1,
    lora_alpha=1,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=target_modules,
)

model = get_peft_model(model, config)
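On the VRAM question: with a 4-bit base, the weights themselves are small, so a jump to 30 GB at train time is usually activation memory. A minimal sketch of the extra step commonly used here (prepare_model_for_kbit_training is PEFT's helper for quantized bases; it goes before get_peft_model), plus a sanity check that only the adapter weights are trainable:

from peft import prepare_model_for_kbit_training

# Run this on the quantized base model *before* get_peft_model:
# it enables gradient checkpointing and prepares the model for
# k-bit training, which usually brings activation memory down.
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)
model = get_peft_model(model, config)

# Sanity check: only the LoRA adapter parameters should be trainable.
model.print_trainable_parameters()

Worth trying if the extra memory only appears once training steps start; shorter sequence lengths and smaller batch sizes attack the same problem.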