issues to convert to GGUF

#2
by robbiemu - opened

I've downloaded the model and am trying to quantize:

ls -la .
total 8812344
drwxr-xr-x  13 macdev  wheel   416B Oct  6 10:45 .
drwxr-xr-x   3 macdev  wheel    96B Oct  6 10:44 ..
drwxr-xr-x  13 macdev  wheel   416B Oct  6 10:45 .git
-rw-r--r--   1 macdev  wheel   1.5K Oct  6 10:44 .gitattributes
-rw-r--r--   1 macdev  wheel    65K Oct  6 10:44 README.md
-rwxr-xr-x   1 macdev  wheel   811B Oct  6 10:44 config.json
-rwxr-xr-x   1 macdev  wheel   200B Oct  6 10:44 generation_config.json
drwxr-xr-x   4 macdev  wheel   128B Oct  6 10:44 images
-rwxr-xr-x   1 macdev  wheel   4.2G Oct  6 10:45 model.safetensors
-rwxr-xr-x   1 macdev  wheel   513B Oct  6 10:44 special_tokens_map.json
-rwxr-xr-x   1 macdev  wheel   133B Oct  6 10:44 tokenizer.json
-rwxr-xr-x   1 macdev  wheel   4.6M Oct  6 10:44 tokenizer.model
-rwxr-xr-x   1 macdev  wheel   2.3K Oct  6 10:44 tokenizer_config.json

but I get this error:

/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py . --outfile ./salamandra-2b-instruct_fp16.gguf
INFO:hf-to-gguf:Loading model:
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
INFO:hf-to-gguf:output.weight,               torch.bfloat16 --> F16, shape = {2048, 256000}
INFO:hf-to-gguf:token_embd.weight,           torch.bfloat16 --> F16, shape = {2048, 256000}
INFO:hf-to-gguf:blk.0.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.0.ffn_down.weight,       torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.0.ffn_gate.weight,       torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.0.ffn_up.weight,         torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.0.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.0.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.0.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.0.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.0.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.1.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.1.ffn_down.weight,       torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.1.ffn_gate.weight,       torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.1.ffn_up.weight,         torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.1.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.1.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.1.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.1.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.1.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.10.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.10.ffn_down.weight,      torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.10.ffn_gate.weight,      torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.10.ffn_up.weight,        torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.10.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.10.attn_k.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.10.attn_output.weight,   torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.10.attn_q.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.10.attn_v.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.11.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.11.ffn_down.weight,      torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.11.ffn_gate.weight,      torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.11.ffn_up.weight,        torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.11.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.11.attn_k.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.11.attn_output.weight,   torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.11.attn_q.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.11.attn_v.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.12.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.12.ffn_down.weight,      torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.12.ffn_gate.weight,      torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.12.ffn_up.weight,        torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.12.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.12.attn_k.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.12.attn_output.weight,   torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.12.attn_q.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.12.attn_v.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.13.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.13.ffn_down.weight,      torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.13.ffn_gate.weight,      torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.13.ffn_up.weight,        torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.13.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.13.attn_k.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.13.attn_output.weight,   torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.13.attn_q.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.13.attn_v.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.14.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.14.ffn_down.weight,      torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.14.ffn_gate.weight,      torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.14.ffn_up.weight,        torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.14.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.14.attn_k.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.14.attn_output.weight,   torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.14.attn_q.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.14.attn_v.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.15.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.15.ffn_down.weight,      torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.15.ffn_gate.weight,      torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.15.ffn_up.weight,        torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.15.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.15.attn_k.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.15.attn_output.weight,   torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.15.attn_q.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.15.attn_v.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.16.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.16.ffn_down.weight,      torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.16.ffn_gate.weight,      torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.16.ffn_up.weight,        torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.16.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.16.attn_k.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.16.attn_output.weight,   torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.16.attn_q.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.16.attn_v.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.17.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.17.ffn_down.weight,      torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.17.ffn_gate.weight,      torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.17.ffn_up.weight,        torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.17.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.17.attn_k.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.17.attn_output.weight,   torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.17.attn_q.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.17.attn_v.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.18.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.18.ffn_down.weight,      torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.18.ffn_gate.weight,      torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.18.ffn_up.weight,        torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.18.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.18.attn_k.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.18.attn_output.weight,   torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.18.attn_q.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.18.attn_v.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.19.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.19.ffn_down.weight,      torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.19.ffn_gate.weight,      torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.19.ffn_up.weight,        torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.19.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.19.attn_k.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.19.attn_output.weight,   torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.19.attn_q.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.19.attn_v.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.2.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.2.ffn_down.weight,       torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.2.ffn_gate.weight,       torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.2.ffn_up.weight,         torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.2.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.2.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.2.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.2.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.2.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.20.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.20.ffn_down.weight,      torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.20.ffn_gate.weight,      torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.20.ffn_up.weight,        torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.20.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.20.attn_k.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.20.attn_output.weight,   torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.20.attn_q.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.20.attn_v.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.21.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.21.ffn_down.weight,      torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.21.ffn_gate.weight,      torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.21.ffn_up.weight,        torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.21.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.21.attn_k.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.21.attn_output.weight,   torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.21.attn_q.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.21.attn_v.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.22.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.22.ffn_down.weight,      torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.22.ffn_gate.weight,      torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.22.ffn_up.weight,        torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.22.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.22.attn_k.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.22.attn_output.weight,   torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.22.attn_q.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.22.attn_v.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.23.attn_norm.weight,     torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.23.ffn_down.weight,      torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.23.ffn_gate.weight,      torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.23.ffn_up.weight,        torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.23.ffn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.23.attn_k.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.23.attn_output.weight,   torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.23.attn_q.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.23.attn_v.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.3.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.3.ffn_down.weight,       torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.3.ffn_gate.weight,       torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.3.ffn_up.weight,         torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.3.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.3.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.3.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.3.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.3.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.4.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.4.ffn_down.weight,       torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.4.ffn_gate.weight,       torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.4.ffn_up.weight,         torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.4.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.4.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.4.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.4.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.4.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.5.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.5.ffn_down.weight,       torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.5.ffn_gate.weight,       torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.5.ffn_up.weight,         torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.5.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.5.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.5.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.5.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.5.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.6.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.6.ffn_down.weight,       torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.6.ffn_gate.weight,       torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.6.ffn_up.weight,         torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.6.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.6.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.6.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.6.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.6.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.7.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.7.ffn_down.weight,       torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.7.ffn_gate.weight,       torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.7.ffn_up.weight,         torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.7.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.7.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.7.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.7.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.7.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.8.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.8.ffn_down.weight,       torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.8.ffn_gate.weight,       torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.8.ffn_up.weight,         torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.8.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.8.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.8.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.8.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.8.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.9.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.9.ffn_down.weight,       torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.9.ffn_gate.weight,       torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.9.ffn_up.weight,         torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.9.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.9.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.9.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.9.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.9.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:output_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 8192
INFO:hf-to-gguf:gguf: embedding length = 2048
INFO:hf-to-gguf:gguf: feed forward length = 5440
INFO:hf-to-gguf:gguf: head count = 16
INFO:hf-to-gguf:gguf: key-value head count = 16
INFO:hf-to-gguf:gguf: rope theta = 10000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model tokenizer
Traceback (most recent call last):
  File "/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py", line 4430, in <module>
    main()
  File "/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py", line 4424, in main
    model_instance.write()
  File "/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py", line 434, in write
    self.prepare_metadata(vocab_only=False)
  File "/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py", line 427, in prepare_metadata
    self.set_vocab()
  File "/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py", line 1521, in set_vocab
    self._set_vocab_sentencepiece()
  File "/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py", line 752, in _set_vocab_sentencepiece
    special_vocab = gguf.SpecialVocab(self.dir_model, n_vocab=len(tokens))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Shared/Public/Github/llama.cpp/gguf-py/gguf/vocab.py", line 40, in __init__
    self._load(Path(path))
  File "/Users/Shared/Public/Github/llama.cpp/gguf-py/gguf/vocab.py", line 76, in _load
    self._try_load_from_tokenizer_json(path)
  File "/Users/Shared/Public/Github/llama.cpp/gguf-py/gguf/vocab.py", line 122, in _try_load_from_tokenizer_json
    tokenizer = json.load(f)
                ^^^^^^^^^^^^
  File "/Users/macdev/.pyenv/versions/3.12.0/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/__init__.py", line 293, in load
    return loads(fp.read(),
           ^^^^^^^^^^^^^^^^
  File "/Users/macdev/.pyenv/versions/3.12.0/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/macdev/.pyenv/versions/3.12.0/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/macdev/.pyenv/versions/3.12.0/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

this error, minus all the safetensors INFO lines, is common when the model download is broken somehow ... can you provide/verify MD5s please?

find . -type f -maxdepth 1 -exec md5 {} \;
MD5 (./model.safetensors) = 72d45e5e9c4a738380e198e1929a7dd4
MD5 (./tokenizer_config.json) = 705ddfcd92cdd94f3855796ec4b01e3b
MD5 (./special_tokens_map.json) = af0d0fcd1cc480681d633730ae47ad96
MD5 (./config.json) = 96ef31cfcf5d41fd1b19c1262d07d299
MD5 (./tokenizer.json) = 61f26337646dce58e6aa773483b967d2
MD5 (./generation_config.json) = 91b221a0fe4f974912a35a74119ed69f
MD5 (./tokenizer.model) = 2378d392cfb6f4b30aa9d38f54975b0a
MD5 (./README.md) = 7b9ca1d9157f3b80b83381011b5def44
MD5 (./.gitattributes) = 0249edc2dad72f81bb6c65142fbeee42
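
For reference, since the large files are tracked with Git LFS, one way to get authoritative hashes without waiting for the maintainers is to compare against the sha256 recorded in the LFS pointers (a minimal sketch; assumes git-lfs is installed and you are inside the clone):

git lfs ls-files --long             # full sha256 oid git expects for each LFS-tracked file
shasum -a 256 model.safetensors     # should match the oid printed above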

I see the error: the tokenizer.json did not download, it's just 133 bytes.
I downloaded that manually and now I have a different error :)

/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py . --outfile ./salamandra-2b-instruct_fp16.gguf
INFO:hf-to-gguf:Loading model:
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
INFO:hf-to-gguf:output.weight,               torch.bfloat16 --> F16, shape = {2048, 256000}
INFO:hf-to-gguf:token_embd.weight,           torch.bfloat16 --> F16, shape = {2048, 256000}
INFO:hf-to-gguf:blk.0.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
...
INFO:hf-to-gguf:blk.9.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:output_norm.weight,          torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 8192
INFO:hf-to-gguf:gguf: embedding length = 2048
INFO:hf-to-gguf:gguf: feed forward length = 5440
INFO:hf-to-gguf:gguf: head count = 16
INFO:hf-to-gguf:gguf: key-value head count = 16
INFO:hf-to-gguf:gguf: rope theta = 10000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Setting special token type bos to 1
INFO:gguf.vocab:Setting special token type eos to 2
INFO:gguf.vocab:Setting special token type unk to 0
INFO:gguf.vocab:Setting special token type pad to 0
INFO:gguf.vocab:Setting add_bos_token to True
INFO:gguf.vocab:Setting add_eos_token to False
INFO:gguf.vocab:Setting chat_template to {%- if not date_string is defined %}{%- set date_string = "2024-09-30" %}{%- endif %}{{ "<|im_start|>assistant
I am Salamandra, an AI language model developed at the Barcelona Supercomputing Centre (BSC) by the Language Technologies Unit. My knowledge base was last updated on August 2023. Today Date: "+ date_string +"
Soy Salamandra, un modelo lingüístico de IA desarrollado en el Barcelona Supercomputing Centre (BSC) por la Language Technologies Unit. Mi base de conocimientos se actualizó por última vez en agosto de 2023.
Soc Salamandra, un model de llenguatge d'IA desenvolupat al Barcelona Supercomputing Centre (BSC) per la Language Technologies Unit. La meva base de coneixement es va actualitzar per última vegada l'agost de 2023.<|im_end|>
" }}{% for message in messages %}{{'<|im_start|>' + message['role'] + '
' + message['content'] + '<|im_end|>' + '
'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% endif %}
INFO:hf-to-gguf:Set model quantization version
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:salamandra-2b-instruct_fp16.gguf: n_tensors = 219, total_size = 4.5G
Traceback (most recent call last):
  File "/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py", line 4430, in <module>
    main()
  File "/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py", line 4424, in main
    model_instance.write()
  File "/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py", line 436, in write
    self.gguf_writer.write_kv_data_to_file()
  File "/Users/Shared/Public/Github/llama.cpp/gguf-py/gguf/gguf_writer.py", line 240, in write_kv_data_to_file
    kv_bytes += self._pack_val(val.value, val.type, add_vtype=True)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Shared/Public/Github/llama.cpp/gguf-py/gguf/gguf_writer.py", line 893, in _pack_val
    raise ValueError("All items in a GGUF array should be of the same type")
ValueError: All items in a GGUF array should be of the same type
robbiemu changed discussion title from 'is "model_type": "llama" correct? (failing to convert to GGUF)' to 'issues to convert to GGUF'

could it be because of the inverted shape of the ffn_down weights?

INFO:hf-to-gguf:blk.9.attn_norm.weight,      torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.9.ffn_down.weight,       torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.9.ffn_gate.weight,       torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.9.ffn_up.weight,         torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.9.ffn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.9.attn_k.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.9.attn_output.weight,    torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.9.attn_q.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.9.attn_v.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}
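
As an aside, the flipped shape on ffn_down is expected rather than a sign of corruption: in a LLaMA-style MLP the down projection maps the intermediate size back to the hidden size, and the converter logs dimensions in GGML's ne order, which mirrors the PyTorch shape. A minimal sketch of just the shapes (not the actual model code):

python3 - <<'EOF'
import torch.nn as nn
hidden, intermediate = 2048, 5440
# gate/up project hidden -> intermediate; down projects intermediate -> hidden
gate = nn.Linear(hidden, intermediate, bias=False)  # weight shape (5440, 2048), logged as {2048, 5440}
down = nn.Linear(intermediate, hidden, bias=False)  # weight shape (2048, 5440), logged as {5440, 2048}
print(tuple(gate.weight.shape), tuple(down.weight.shape))
EOF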


Hey @robbiemu, I am also working on this. I discovered that tokenizer.json was missing in the base models, but they updated the repos a few minutes ago: https://huggingface.co/BSC-LT/salamandra-7b/discussions/1#6707a5f2f9b028c1857f2df0

I had pulled the instruct model ... it was a stub when I downloaded the repo initially, but I thought maybe it was an LFS error, so I downloaded it manually, as you can see:

ls -la
total 8849584
drwxr-xr-x  13 macdev  wheel   416B Oct  6 13:14 .
drwxr-xr-x   4 macdev  wheel   128B Oct  6 15:10 ..
drwxr-xr-x  13 macdev  wheel   416B Oct  6 10:45 .git
-rw-r--r--   1 macdev  wheel   1.5K Oct  6 10:44 .gitattributes
-rw-r--r--   1 macdev  wheel    65K Oct  6 10:44 README.md
-rwxr-xr-x   1 macdev  wheel   811B Oct  6 10:44 config.json
-rwxr-xr-x   1 macdev  wheel   200B Oct  6 10:44 generation_config.json
drwxr-xr-x   4 macdev  wheel   128B Oct  6 10:44 images
-rwxr-xr-x   1 macdev  wheel   4.2G Oct  6 10:45 model.safetensors
-rwxr-xr-x   1 macdev  wheel   513B Oct  6 10:44 special_tokens_map.json
-rw-r--r--@  1 macdev  staff    18M Oct  6 12:06 tokenizer.json
-rwxr-xr-x   1 macdev  wheel   4.6M Oct  6 10:44 tokenizer.model
-rwxr-xr-x   1 macdev  wheel   2.3K Oct  6 10:44 tokenizer_config.json

edit: huh, I see what you mean, the new one is a different size. The wires were crossed somewhere, I guess. I will retry.

edit 2: actually, I see it is the same size; my filesystem just rounds human-readable file sizes differently ... the 19 MB shown on HF is 18 MB on disk for me. About to try anyway. BTW, I had the same problem when I cloned the repo as last time; I get this file at first:

cat tokenizer.json
version https://git-lfs.github.com/spec/v1
oid sha256:139de51e6bbe12b772a255e157829f43bd67b63a4d55f1fe0e3abce37b2d8c9a
size 19066993
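
For the record, git-lfs can report which files are still pointers and fetch just the missing one, which may be easier than downloading by hand (a sketch; run inside the clone):

git lfs ls-files                          # files marked '-' are still pointers, '*' have real content
git lfs pull --include="tokenizer.json"   # fetch only the missing file
shasum -a 256 tokenizer.json              # should match the oid from the pointer above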

I have to download that one file manually. This happens during clone:

git clone https://huggingface.co/BSC-LT/salamandra-2b-instruct
Cloning into 'salamandra-2b-instruct'...
remote: Enumerating objects: 71, done.
remote: Counting objects: 100% (67/67), done.
remote: Compressing objects: 100% (66/66), done.
remote: Total 71 (delta 29), reused 0 (delta 0), pack-reused 4 (from 1)
Unpacking objects: 100% (71/71), 90.00 KiB | 1.70 MiB/s, done.
fatal: active `post-checkout` hook found during `git clone`:
    /Users/Shared/Public/huggingface/salamandra-2b-instruct/.git/hooks/post-checkout
For security reasons, this is disallowed by default.
If this is intentional and the hook should actually be run, please
run the command again with `GIT_CLONE_PROTECTION_ACTIVE=false`
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'

cd salamandra-2b-instruct

git status
On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean
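
As the clone error itself suggests, the hook protection can be bypassed for this one clone, or the checkout can be finished afterwards (a sketch that just follows the hints in the message):

GIT_CLONE_PROTECTION_ACTIVE=false git clone https://huggingface.co/BSC-LT/salamandra-2b-instruct
# or, inside the partially checked-out clone:
git restore --source=HEAD :/
git lfs pull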

edit 3: same results:

...
INFO:hf-to-gguf:Set model quantization version
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:salamandra-2b-instruct_fp16.gguf: n_tensors = 219, total_size = 4.5G
Traceback (most recent call last):
  File "/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py", line 4430, in <module>
    main()
  File "/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py", line 4424, in main
    model_instance.write()
  File "/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py", line 436, in write
    self.gguf_writer.write_kv_data_to_file()
  File "/Users/Shared/Public/Github/llama.cpp/gguf-py/gguf/gguf_writer.py", line 240, in write_kv_data_to_file
    kv_bytes += self._pack_val(val.value, val.type, add_vtype=True)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Shared/Public/Github/llama.cpp/gguf-py/gguf/gguf_writer.py", line 893, in _pack_val
    raise ValueError("All items in a GGUF array should be of the same type")
ValueError: All items in a GGUF array should be of the same type

we found the issue was just in the README -- the Norwegian language code appears in the YAML front matter as "- no" rather than in an escaped/quoted form, and because of how YAML works, that is read as False.
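
For anyone hitting the same thing: with a YAML 1.1 loader (e.g. PyYAML), an unquoted no in the languages list resolves to a boolean, so the language array built from the model card mixes strings with a False and the GGUF writer rejects it. A quick way to see the parsing behaviour (illustrative language codes, not the converter's actual code path):

python3 -c 'import yaml; print(yaml.safe_load("languages:\n- bg\n- ca\n- no\n- es")["languages"])'
# prints ['bg', 'ca', False, 'es'] -- quoting it as "no" keeps it a string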

robbiemu changed discussion status to closed
