Expected scalar type Float but found Half when using Text Gen WebUI with Vicuna & monkey patch
I am trying to fine-tune a Vicuna model using text-generation-webui.
I followed the installation steps from the documentation:
# Install miniconda
curl -sL "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh" > "Miniconda3.sh"
bash Miniconda3.sh
# Create conda env
conda create -n textgen python=3.10.9
conda activate textgen
# Install torch
pip3 install torch torchvision torchaudio
# Install text generation webui
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
# install nvcc
conda install -c conda-forge cudatoolkit-dev
# Install GPTQ for LLaMa
sudo apt install build-essential
mkdir repositories
cd repositories
git clone https://github.com/oobabooga/GPTQ-for-LLaMa.git -b cuda
cd GPTQ-for-LLaMa
python setup_cuda.py install
# Install monkey patch
cd ..
git clone https://github.com/johnsmith0031/alpaca_lora_4bit
pip install git+https://github.com/sterlind/GPTQ-for-LLaMa.git@eaa9955 # won't work if I don't revert to this specific commit
# Download model
cd ..
python download-model.py TheBloke/stable-vicuna-13B-GPTQ
# Run server with monkey patch
python server.py --model TheBloke_stable-vicuna-13B-GPTQ --wbits 4 --groupsize 128 --model_type Llama --share --api --listen --auto-devices --monkey-patch --no-stream
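For what it's worth, a quick sanity check along these lines (just a sketch, run inside the textgen env) confirms that the quant_cuda extension built and that the GPU is visible:

import torch
import quant_cuda  # built by "python setup_cuda.py install" above; the import fails if the build broke

print(torch.__version__)
print(torch.cuda.is_available())            # should be True
print(torch.cuda.get_device_name(0))        # should report the P100
print(torch.cuda.get_device_capability(0))  # P100 is compute capability (6, 0)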
When I try to generate from a prompt in the interface, I get the following error:
Traceback (most recent call last):
File "/home/jupyter/text-generation-webui/modules/callbacks.py", line 73, in gentask
ret = self.mfunc(callback=_callback, **self.kwargs)
File "/home/jupyter/text-generation-webui/modules/text_generation.py", line 277, in generate_with_callback
shared.model.generate(**kwargs)
File "/home/jupyter/text-generation-webui/repositories/alpaca_lora_4bit/amp_wrapper.py", line 18, in autocast_generate
return self.model.non_autocast_generate(*args, **kwargs)
File "/opt/conda/envs/textgen/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/envs/textgen/lib/python3.10/site-packages/transformers/generation/utils.py", line 1565, in generate
return self.sample(
File "/opt/conda/envs/textgen/lib/python3.10/site-packages/transformers/generation/utils.py", line 2612, in sample
outputs = self(
File "/opt/conda/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/textgen/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/opt/conda/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 688, in forward
outputs = self.model(
File "/opt/conda/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/textgen/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/opt/conda/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 578, in forward
layer_outputs = decoder_layer(
File "/opt/conda/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/textgen/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/opt/conda/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 293, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/opt/conda/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/textgen/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/opt/conda/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 197, in forward
query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
File "/opt/conda/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/textgen/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/jupyter/text-generation-webui/repositories/alpaca_lora_4bit/autograd_4bit.py", line 133, in forward
out = matmul4bit_with_backend(x, self.qweight, self.scales,
File "/home/jupyter/text-generation-webui/repositories/alpaca_lora_4bit/autograd_4bit.py", line 89, in matmul4bit_with_backend
return mm4b.matmul4bit(x, qweight, scales, qzeros, g_idx)
File "/home/jupyter/text-generation-webui/repositories/alpaca_lora_4bit/matmul_utils_4bit.py", line 131, in matmul4bit
output = _matmul4bit_v2(x, qweight, scales, zeros, g_idx)
File "/home/jupyter/text-generation-webui/repositories/alpaca_lora_4bit/matmul_utils_4bit.py", line 70, in _matmul4bit_v2
quant_cuda.vecquant4matmul_faster(x, qweight, y, scales, zeros, g_idx, x.shape[-1] // 2)
RuntimeError: expected scalar type Float but found Half
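For context, this error generally means a float16 tensor reached an op or kernel compiled for float32. A minimal illustration of the same error class, independent of the web UI (a sketch, not the actual code path):

import torch

a = torch.randn(2, 2)           # float32
b = torch.randn(2, 2).half()    # float16
try:
    torch.mm(a, b)              # mixed dtypes are rejected
except RuntimeError as e:
    print(e)                    # a dtype-mismatch error of the same form
print(torch.mm(a, b.float()).dtype)  # casting so the dtypes agree works: torch.float32

So it looks like, somewhere along the monkey-patched path, either the activations or one of the quantization tensors reaches quant_cuda.vecquant4matmul_faster in the wrong precision.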
Text generation works without the monkey patch, but then I cannot fine-tune the model on my dataset.
All my tests run on an NVIDIA P100 GPU.
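In case it helps narrow this down, a temporary print just before the failing call in matmul_utils_4bit.py would show which tensor has the wrong dtype (a hypothetical debugging patch; the variable names come from the traceback above):

# In _matmul4bit_v2 (matmul_utils_4bit.py), just before the failing call:
print('x:', x.dtype, '| y:', y.dtype, '| scales:', scales.dtype, '| zeros:', zeros.dtype)
quant_cuda.vecquant4matmul_faster(x, qweight, y, scales, zeros, g_idx, x.shape[-1] // 2)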
It would be a great help if you could help me fix this!