Does not work :/
Using it with the newest version of llama.cpp or LM Studio, the model fails to load.
We haven't merged it into the official llama.cpp yet.
It's available now at https://github.com/OpenBMB/llama.cpp
Not working with LM Studio Q4 version
```
{
  "cause": "(Exit code: -1073740791). Unknown error. Try a different model and/or config.",
  "suggestion": "",
  "data": {
    "memory": {
      "ram_capacity": "63.74 GB",
      "ram_unused": "48.25 GB"
    },
    "gpu": {
      "type": "Nvidia CUDA",
      "vram_recommended_capacity": "8.00 GB",
      "vram_unused": "6.93 GB"
    },
    "os": {
      "platform": "win32",
      "version": "10.0.22631",
      "supports_avx2": true
    },
    "app": {
      "version": "0.2.23",
      "downloadsDir": "C:\\Users\\username\\.cache\\lm-studio\\models"
    },
    "model": {}
  },
  "title": "Error loading model."
}
```
![image.png](https://cdn-uploads.huggingface.co/production/uploads/66188da9cfc431c5205269c9/Ys-fYa56EPi2dzGlBYWhg.png)
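As a side note, exit code -1073740791 is just the signed form of the Windows NTSTATUS code 0xC0000409 (STATUS_STACK_BUFFER_OVERRUN, i.e. a fail-fast abort), so the llama.cpp bundled in LM Studio is most likely aborting on a model format it doesn't support yet rather than running out of memory. You can check the conversion yourself:

```
# Convert LM Studio's signed exit code to the underlying 32-bit NTSTATUS value
printf '0x%08X\n' $(( -1073740791 & 0xFFFFFFFF ))   # prints 0xC0000409
```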
Our code has not been merged into the official repository yet.
Please use our fork (https://github.com/OpenBMB/llama.cpp) to run MiniCPM-V 2.5 for now.
Not working with LM Studio
Our code has not been merged into the official repository yet.
Please use our fork (https://github.com/OpenBMB/llama.cpp) to run MiniCPM-V 2.5 for now.
@joedong
Yes it does:
```
git clone -b minicpm-v2.5 https://github.com/OpenBMB/llama.cpp.git llama.cpp-minicpm
cd llama.cpp-minicpm/
export LLAMA_CUDA=1  # if you have an NVIDIA GPU
make minicpmv-cli -j$(nproc)
```
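If you don't already have the GGUF files locally, something along these lines should fetch them into the paths used in the commands below; the repo id and filenames are assumptions based on this model card, so double-check them against the Files tab:

```
# Assumed repo id and filenames -- verify against the model card's Files tab
pip install -U "huggingface_hub[cli]"
huggingface-cli download openbmb/MiniCPM-Llama3-V-2_5-gguf ggml-model-Q4_K_M.gguf \
  --local-dir ../MiniCPM-Llama3-V-2_5/model
huggingface-cli download openbmb/MiniCPM-Llama3-V-2_5-gguf mmproj-model-f16.gguf \
  --local-dir ../MiniCPM-Llama3-V-2_5
```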
If you read https://github.com/OpenBMB/llama.cpp/tree/minicpm-v2.5/examples/minicpmv you can, for example:

run the f16 version
```
./minicpmv-cli -m ../MiniCPM-Llama3-V-2_5/model/model-8B-F16.gguf --mmproj ../MiniCPM-Llama3-V-2_5/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -p "What is in the image?"
```
run the quantized int4 version
```
./minicpmv-cli -m ../MiniCPM-Llama3-V-2_5/model/ggml-model-Q4_K_M.gguf --mmproj ../MiniCPM-Llama3-V-2_5/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -p "What is in the image?"
```
or run in interactive mode
```
./minicpmv-cli -m ../MiniCPM-Llama3-V-2_5/model/ggml-model-Q4_K_M.gguf --mmproj ../MiniCPM-Llama3-V-2_5/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -i
```
I tried the Q4_K_M with temp set to 0.1 and it worked perfectly:
```
~/whatever/llama.cpp-minicpm/minicpmv-cli -m ~/whatever/ggml-model-Q4_K_M.gguf --mmproj ~/whatever/mmproj-model-f16.gguf -c 4096 --temp 0.1 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image /path/to/image.png -p "describe image"
```
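If anyone wants to run it over a whole folder of images, a minimal wrapper around that same command works; the sampling flags mirror the working command above, and the paths are placeholders you'd need to adjust:

```
#!/usr/bin/env bash
# Hypothetical batch wrapper: run minicpmv-cli on every image in a directory.
# Paths below are placeholders; flags are the ones from the command above.
MODEL=~/whatever/ggml-model-Q4_K_M.gguf
MMPROJ=~/whatever/mmproj-model-f16.gguf
CLI=~/whatever/llama.cpp-minicpm/minicpmv-cli

for img in /path/to/images/*.png; do
  echo "=== $img ==="
  "$CLI" -m "$MODEL" --mmproj "$MMPROJ" -c 4096 --temp 0.1 --top-p 0.8 --top-k 100 \
    --repeat-penalty 1.05 --image "$img" -p "describe image"
done
```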
Today the latest version of LM Studio, 0.2.24, was released. I'm still facing the same issue. I was hoping the new llama.cpp commit mentioned in the release notes would have resolved it. LM Studio is a very popular tool, please see if you can make it work ASAP. Thanks in advance.
```
{
  "cause": "(Exit code: 0). Some model operation failed. Try a different model and/or config.",
  "suggestion": "",
  "data": {
    "memory": {
      "ram_capacity": "63.74 GB",
      "ram_unused": "44.32 GB"
    },
    "gpu": {
      "gpu_names": [
        "NVIDIA GeForce RTX 3080 Laptop GPU"
      ],
      "vram_recommended_capacity": "8.00 GB",
      "vram_unused": "6.93 GB"
    },
    "os": {
      "platform": "win32",
      "version": "10.0.22631",
      "supports_avx2": true
    },
    "app": {
      "version": "0.2.24",
      "downloadsDir": "C:\\Users\\username\\.cache\\lm-studio\\models"
    },
    "model": {}
  },
  "title": "Error loading model."
}
```
The problem is that OpenBMB forked both llama.cpp and ollama, and in their forks it might work. However, the mainline versions of those programs have no support as of now. Maybe the authors should open pull requests against llama.cpp and ollama instead of maintaining their own forks, to make this more widely available.
Did you run the bf16.gguf? I got an error!
No, only the mmproj-model-f16.gguf