
can't load model with llama.cpp commit 519c981f8b65ee6c87c2965539685ced0a17223b

#6
by md2 - opened

Hello TheBloke, thanks for the great effort in putting this together.

I understood that this version should be compatible with the latest llama.cpp. I updated llama.cpp a few hours ago (commit 519c981f8b65ee6c87c2965539685ced0a17223b) and am trying to use the new model as follows:

./main -m ./models/13B/vicuna-13b-v1.5-16k/vicuna-13b-v1.5-16k.ggmlv3.q4_1.bin -n 256 --repeat_penalty 1.0 -c 2048 --rope-freq-base 10000 --rope-freq-scale 0.25 --color -i -r "User:" -f prompts/chat-with-bob.txt

However, this is the output:

main: warning: scaling RoPE frequency by 0,25 (default 1.0)
main: build = 1022 (bac6699)
main: seed = 1692746800
gguf_init_from_file: invalid magic number 67676a74
error loading model: llama_model_loader: failed to load model from ./models/13B/vicuna-13b-v1.5-16k/vicuna-13b-v1.5-16k.ggmlv3.q4_1.bin

llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model './models/13B/vicuna-13b-v1.5-16k/vicuna-13b-v1.5-16k.ggmlv3.q4_1.bin'
main: error: unable to load model

What am I doing wrong? Any help would be much appreciated, thanks!

The new version supports the GGUF format. Try converting it using the Python conversion script (convert-llama-ggmlv3-to-gguf.py) in the llama.cpp repo.

Yeah, latest llama.cpp is no longer compatible with GGML models. The new model format, GGUF, was merged recently. As far as llama.cpp is concerned, GGML is now dead - though of course many third-party clients/libraries are likely to continue to support it for a lot longer. I need to update my GGML READMEs to mention this and will be doing this shortly.
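
Incidentally, the "invalid magic number 67676a74" in your log is just the ASCII for "ggjt", the old GGMLv3 file magic, which the new GGUF loader no longer accepts. You can check that yourself if you're curious (assuming Python 3 is available; this is only an illustration):

python3 -c "print(bytes.fromhex('67676a74'))"

This prints b'ggjt', whereas a GGUF file begins with the bytes "GGUF".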

I will be providing GGUF models for all my repos in the next 2-3 days. I'm waiting for another PR to merge, which will add improved k-quant quantisation formats.

For now, if you want to use llama.cpp you will need to downgrade it back to commit dadbed99e65252d79f81101a392d0d6497b86caa or earlier. Or use one of the llama.cpp binary releases from before GGUF was merged. Or use a third party client like KoboldCpp, LM Studio, text-generation-webui, etc.
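
If you go the downgrade route, something along these lines should work (a rough sketch, assuming you built llama.cpp from a git checkout with make rather than CMake):

cd llama.cpp
git checkout dadbed99e65252d79f81101a392d0d6497b86caa
make clean && make

After rebuilding, your original ./main command above should be able to load the GGMLv3 file again.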

Look out for new -GGUF repos from me in the coming days. Or yes, you can convert them yourself using the convert-llama-ggmlv3-to-gguf.py script now provided with llama.cpp.

Thank you so much TheBloke, that's a perfect summary of the current situation. I can confirm that using

python3 convert-llama-ggmlv3-to-gguf.py --input ... --output ...

does the job!
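
For anyone else landing here, a complete invocation might look like this (the paths are only examples based on the model discussed above, and the output filename is my own choice, so adjust both to your setup):

python3 convert-llama-ggmlv3-to-gguf.py --input ./models/13B/vicuna-13b-v1.5-16k/vicuna-13b-v1.5-16k.ggmlv3.q4_1.bin --output ./models/13B/vicuna-13b-v1.5-16k/vicuna-13b-v1.5-16k.q4_1.gguf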

@TheBloke I think it would be better to ship the old GGML files alongside the GGUF files in the same repo for some time, with a description of the two file types in the README.

Hello. Chiming in here.
Let's say I want to convert nous-hermes-llama2-13b.ggmlv3.q4_K_M.bin to GGUF.

Do I have to provide the metadata from the original model (https://huggingface.co/NousResearch/Nous-Hermes-Llama2-13b/tree/main)?
Perhaps config.json? If so, what's the correct flag?

If I understand correctly, GGUF is GGML at its core, but with some extra metadata. 🤔
I want to make sure the conversion is done 100% right.
