Have a problem with an RTX 4090
I have this error and don't quite understand what I did wrong. My settings:
type: llama
wbits: 4
groupsize: none
Traceback (most recent call last):
  File "G:\ai\oobabooga-windows\text-generation-webui\server.py", line 70, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "G:\ai\oobabooga-windows\text-generation-webui\modules\models.py", line 103, in load_model
    tokenizer = load_tokenizer(model_name, model)
  File "G:\ai\oobabooga-windows\text-generation-webui\modules\models.py", line 128, in load_tokenizer
    tokenizer = LlamaTokenizer.from_pretrained(Path(f"{shared.args.model_dir}/{model_name}/"), clean_up_tokenization_spaces=True)
  File "G:\ai\oobabooga-windows\installer_files\env\lib\site-packages\transformers\tokenization_utils_base.py", line 1811, in from_pretrained
    return cls._from_pretrained(
  File "G:\ai\oobabooga-windows\installer_files\env\lib\site-packages\transformers\tokenization_utils_base.py", line 1965, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "G:\ai\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\tokenization_llama.py", line 96, in __init__
    self.sp_model.Load(vocab_file)
  File "G:\ai\oobabooga-windows\installer_files\env\lib\site-packages\sentencepiece\__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "G:\ai\oobabooga-windows\installer_files\env\lib\site-packages\sentencepiece\__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string
Check that your files are downloaded correctly, especially tokenizer.model, tokenizer.json and tokenizer_config.json
If need be, download them again
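If it helps, here's a minimal sketch for re-fetching just the tokenizer files with huggingface_hub. The repo_id is only an example placeholder, use whichever repo you actually downloaded; the files land in the Hugging Face cache, so copy them into your text-generation-webui models/<model-name> folder afterwards:

```python
# Minimal sketch: re-download only the tokenizer files from the Hub.
# The repo_id is an example placeholder; substitute the repo you actually use.
from huggingface_hub import hf_hub_download

repo_id = "TheBloke/WizardLM-30B-Uncensored-GPTQ"  # example
for filename in ["tokenizer.model", "tokenizer.json", "tokenizer_config.json"]:
    try:
        path = hf_hub_download(repo_id=repo_id, filename=filename)
        print(filename, "->", path)
    except Exception as e:  # some repos don't ship all three files
        print(filename, "not fetched:", e)
```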
I will test it on a 4090 and see if I can reproduce the error. Off the top of my head, sentencepiece requires a particular version of Python.
I have the most recent version, Python 3.10.11.
It's not related to the GPU. I don't really know what's causing it if you definitely have tokenizer.model downloaded and its sha256sum is correct. Lack of tokenizer.model is the normal cause of this.
Try deleting/renaming tokenizer.model and see if it loads the Fast Tokenizer (tokenizer.json) instead.
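If you want to check the sha256sum, something like this works (the path is just an example based on the install location in your traceback, adjust it to your model folder; compare the output against the hash shown on the repo's file page):

```python
# Sketch: print the sha256 of tokenizer.model so it can be compared against
# the checksum listed on the model repo's file page.
import hashlib
from pathlib import Path

# Example path, adjust to your setup / model folder name.
path = Path(r"G:\ai\oobabooga-windows\text-generation-webui\models\your-model\tokenizer.model")

h = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)
print(h.hexdigest())
```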
I had this same error. When I downloaded the repo the command I used didn't grab LFS objects. I went in and manually downloaded the safetensors file but I didn't catch that tokenizer.model is flagged as LFS too and I neglected to download it. Once I downloaded the proper tokenizer.model, the model loaded right up on my 4090.
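In case it helps someone else catch this faster: a Git LFS pointer is just a tiny text file starting with a version line, while the real tokenizer.model is a few hundred KB of binary. A small sketch to tell them apart (the path is a placeholder):

```python
# Sketch: detect whether a file is just a Git LFS pointer instead of the real content.
from pathlib import Path

def looks_like_lfs_pointer(path):
    head = Path(path).read_bytes()[:100]
    return head.startswith(b"version https://git-lfs.github.com/spec/v1")

# Example path, adjust to your model folder.
print(looks_like_lfs_pointer(r"G:\ai\oobabooga-windows\text-generation-webui\models\your-model\tokenizer.model"))
```

If it is a pointer, running `git lfs install` and then `git lfs pull` inside the cloned repo should fetch the real objects.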
Hmm, interesting. I tried deleting tokenizer.model and reloading... same error. I also tried putting it back and reloading... nothing. And the same when I rename it while it's still in the folder.
He's right, it was my mistake that the LFS files weren't downloaded.
But now I can't load the model again; there's no localhost IP or anything.
This problem is caused by not having enough RAM to load the model. It uses a lot of RAM while loading the model onto VRAM.
To resolve it, increase your Windows pagefile size a lot, e.g. up to about 90 GB. That should allow the model to load onto the GPU OK.
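If you want to confirm that system RAM is really the bottleneck while the model loads, here's a quick sketch using psutil (not part of the webui, you may need to `pip install psutil`):

```python
# Sketch: print total/available system RAM so you can watch it while the model loads.
import psutil

mem = psutil.virtual_memory()
print(f"total: {mem.total / 2**30:.1f} GiB, available: {mem.available / 2**30:.1f} GiB")
```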
I'm on an RTX 4090 (24 GB VRAM) and got it working on the first try despite only having 16 GB RAM. I used Kobold's GPTQ branch: ran install_requirements.bat, opened Kobold, and once inside enabled the new UI mode. After putting the model into my model folder, I renamed the .safetensors file to 4bit.safetensors. I then loaded the model into Kobold via the AI button and assigned all the layers to the GPU (and it all fit). Done. It took a while to load it all in, though.
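For reference, the rename step above as a tiny sketch (the directory is a placeholder and it assumes a single .safetensors file):

```python
# Sketch: rename the downloaded .safetensors to 4bit.safetensors, as described above.
from pathlib import Path

model_dir = Path(r"C:\path\to\KoboldAI\models\your-model")  # placeholder path
src = next(model_dir.glob("*.safetensors"))  # assumes exactly one shard
src.rename(model_dir / "4bit.safetensors")
```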
This makes me glad I have 96 GB of RAM!
Sadly 96 GB of RAM didn't help... I got this error (it's much longer; I cut it off to share some of it):
Traceback (most recent call last):
  File "C:\Users\cleverest\oobabooga_windows\text-generation-webui\server.py", line 68, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\Users\cleverest\oobabooga_windows\text-generation-webui\modules\models.py", line 95, in load_model
    output = load_func(model_name)
  File "C:\Users\cleverest\oobabooga_windows\text-generation-webui\modules\models.py", line 275, in GPTQ_loader
    model = modules.GPTQ_loader.load_quantized(model_name)
  File "C:\Users\cleverest\oobabooga_windows\text-generation-webui\modules\GPTQ_loader.py", line 177, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
  File "C:\Users\cleverest\oobabooga_windows\text-generation-webui\modules\GPTQ_loader.py", line 84, in _load_quant
    model.load_state_dict(safe_load(checkpoint), strict=False)
  File "C:\Users\cleverest\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
  size mismatch for model.layers.0.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
  size mismatch for model.layers.0.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
  size mismatch for model.layers.0.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
  size mismatch for model.layers.0.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
  size mismatch for model.layers.0.self_attn.q_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
  size mismatch for model.layers.0.self_attn.q_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
  size mismatch for model.layers.0.self_attn.v_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
  size mismatch for model.layers.0.self_attn.v_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
  size mismatch for model.layers.0.mlp.down_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([140, 832]).
  size mismatch for model.layers.0.mlp.down_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
  size mismatch for model.layers.0.mlp.gate_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
  size mismatch for model.layers.0.mlp.gate_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
  size mismatch for model.layers.0.mlp.up_proj.qzeros: copying a param with shape torch.Size([1, 2240]) from checkpoint, the shape in current model is torch.Size([52, 2240]).
  size mismatch for model.layers.0.mlp.up_proj.scales: copying a param with shape torch.Size([1, 17920]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
  size mismatch for model.layers.1.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
  size mismatch for model.layers.1.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoin
This happens when GPTQ params are not set correctly
Please ensure you set and saved these params:
- bits = 4
- groupsize = None
- model type = llama
If you're still having the problem, please edit models/config-user.yaml and find the entry for this model, and make sure it matches this:
TheBloke_WizardLM-30B-Uncensored-GPTQ$:
auto_devices: false
bf16: false
cpu: false
cpu_memory: 0
disk: false
gpu_memory_0: 0
groupsize: None
load_in_8bit: false
mlock: false
model_type: llama
n_batch: 512
n_gpu_layers: 0
pre_layer: 0
threads: 0
wbits: '4'
In particular, ensure groupsize: None.
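For what it's worth, the shapes in that error line up with the group size: GPTQ stores one row of qzeros/scales per quantization group along the input dimension, so a loader configured for groupsize 128 expects in_features / 128 rows, while a checkpoint quantized without grouping has just one. Quick arithmetic using the sizes from the traceback above:

```python
# Rough check of why the state_dict shapes disagree (sizes taken from the error above).
hidden_size = 6656         # the [?, 6656] scales shapes (attention projections)
intermediate_size = 17920  # the [?, 17920] gate/up_proj scales shapes

groupsize = 128
print(hidden_size // groupsize)        # 52  -> the [52, ...] shapes the loader expected
print(intermediate_size // groupsize)  # 140 -> the [140, ...] down_proj shapes

# With groupsize None the whole input dimension is a single group,
# which is why the checkpoint's tensors are [1, ...].
```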
Thank you. Those parameters are already set, so I must need to edit the file... I'll try that later when I'm home.
So I confirmed that the default config settings for this model in OobaBooga load as:
auto_devices: false
bf16: false
cpu: false
cpu_memory: 0
disk: false
gpu_memory_0: 0
groupsize: 128
load_in_8bit: false
mlock: false
model_type: llama
n_batch: 512
n_gpu_layers: 0
no_mmap: false
pre_layer: 0
threads: 0
wbits: 4
I changed it to yours as recommended above, and I noticed two things changed: the groupsize went from 128 to 0 (yet it IS set to None in the GUI, BTW), and I removed the line 'no_mmap: false' entirely.
After doing this, AND RESTARTING it completely, it works! Thank you!