load_state_dict fails

#7
by haydenhong - opened

Thanks for the great work!
I have just pulled your latest, pip-installed the requirements, and ran the following GPTQ commands:
python setup_cuda.py install
CUDA_VISIBLE_DEVICES=0 python llama_inference.py alpaca-30b-lora-int4 --wbits 4 --load alpaca-30b-lora-int4/alpaca-30b-4bit-128g.safetensors --text "count up to 10" --max_length 1500 --min_length 100
Where alpaca-30b-lora-int4 is the subfolder, a git clone of your repo.
It fails with the following shape mismatch; any suggestions on how to fix this? Thanks!

Loading model ...
Traceback (most recent call last):
File "llama_inference.py", line 112, in
model = load_quant(args.model, args.load, args.wbits, args.groupsize)
File "llama_inference.py", line 50, in load_quant
model.load_state_dict(safe_load(checkpoint))
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1672, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
size mismatch for model.layers.0.self_attn.k_proj.qzeros: copying a param with shape torch.Size([52, 832]) from checkpoint, the shape in current model is torch.Size([1, 832]).
size mismatch for model.layers.0.self_attn.k_proj.scales: copying a param with shape torch.Size([52, 6656]) from checkpoint, the shape in current model is torch.Size([1, 6656]).
size mismatch for model.layers.0.self_attn.o_proj.qzeros: copying a param with shape torch.Size([52, 832]) from checkpoint, the shape in current model is torch.Size([1, 832]).
size mismatch for model.layers.0.self_attn.o_proj.scales: copying a param with shape torch.Size([52, 6656]) from checkpoint, the shape in current model is torch.Size([1, 6656]).
size mismatch for model.layers.0.self_attn.q_proj.qzeros: copying a param with shape torch.Size([52, 832]) from checkpoint, the shape in current model is torch.Size([1, 832]).
size mismatch for model.layers.0.self_attn.q_proj.scales: copying a param with shape torch.Size([52, 6656]) from checkpoint, the shape in current model is torch.Size([1, 6656]).
size mismatch for model.layers.0.self_attn.v_proj.qzeros: copying a param with shape torch.Size([52, 832]) from checkpoint, the shape in current model is torch.Size([1, 832]).
size mismatch for model.layers.0.self_attn.v_proj.scales: copying a param with shape torch.Size([52, 6656]) from checkpoint, the shape in current model is torch.Size([1, 6656]).
....
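For context, a minimal sketch of the arithmetic behind the mismatch (assumed values, inferred from the 128g suffix in the checkpoint name and the shapes in the error):

hidden_size = 6656              # LLaMA-30B hidden dimension, per the error message
group_size = 128                # implied by the "-128g" checkpoint suffix
wbits = 4                       # 4-bit quantization
groups = hidden_size // group_size        # 6656 // 128 = 52
packed_cols = hidden_size * wbits // 32   # eight 4-bit zeros packed per int32 column = 832
print((groups, packed_cols))    # (52, 832)  -> qzeros shape in the checkpoint
print((groups, hidden_size))    # (52, 6656) -> scales shape in the checkpoint
# Loading without a group size builds the model with a single group per tensor:
print((1, packed_cols), (1, hidden_size))  # (1, 832) (1, 6656) -> shapes in the current model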

Using the latest weights, alpaca-30b-4bit.safetensors, fixes the problem. Thanks!
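(For anyone who wants to keep the grouped alpaca-30b-4bit-128g.safetensors checkpoint instead: since the traceback shows load_quant being called with args.groupsize, passing the matching group size on the command line should build the 52-row qzeros/scales buffers the checkpoint expects. Untested here, but roughly:

CUDA_VISIBLE_DEVICES=0 python llama_inference.py alpaca-30b-lora-int4 --wbits 4 --groupsize 128 --load alpaca-30b-lora-int4/alpaca-30b-4bit-128g.safetensors --text "count up to 10" --max_length 1500 --min_length 100
)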

haydenhong changed discussion status to closed
