Bad results with GGML version of your model
Hello,
do you have any example prompts and responses for your model? With the GGML version from TheBloke, the model performs very badly. See here: https://huggingface.co/TheBloke/llama-2-13B-German-Assistant-v2-GGML/discussions
I am interested in what you think the cause of this is.
Is it the GGML conversion that degrades the quality, or does the model itself perform this poorly?
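For what it's worth, here is roughly how I have been testing the GGML file, in case my setup is the problem. This is just a sketch: the file name, the `### User:`/`### Assistant:` prompt template, and the sampling settings are my own guesses, and it needs an older llama-cpp-python release that still loads GGML files (newer releases only read GGUF):

```python
# Minimal sketch for querying the GGML file via llama-cpp-python.
# Assumptions: the quant file name and the prompt template below are
# guesses on my part, not taken from the model card.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-2-13b-german-assistant-v2.ggmlv3.q4_0.bin",
    n_ctx=2048,
)

prompt = "### User: Was ist die Hauptstadt von Deutschland?\n### Assistant:"
out = llm(prompt, max_tokens=128, temperature=0.2, stop=["### User:"])
print(out["choices"][0]["text"])
```

If the prompt template is wrong, that alone could explain a lot of the degradation, which is why I am asking for example prompts.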
TheBloke and I both know about the poor performance.
Since I have never worked with GGML, I have no experience with how to fix it.
OK, does that mean the results with the full model are better? Maybe you could provide some examples in the model card? That would be great! It would make it much easier to decide whether to use the model without first deploying it on a GPU server. Thanks in advance!
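In case it helps anyone else reading this, here is roughly how I would sanity-check the full-precision model before renting a GPU server. The repo id and prompt template are my assumptions, not taken from the model card, and a 13B model in float16 needs on the order of 26 GB of memory:

```python
# Sketch for sampling the full-precision model with transformers.
# Assumptions: the repo id below is the unquantized original, and the
# prompt template is a guess.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "flozi00/Llama-2-13B-german-assistant-v2"  # assumed repo id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "### User: Was ist die Hauptstadt von Deutschland?\n### Assistant:"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs, max_new_tokens=128, do_sample=True, temperature=0.2
)
print(tok.decode(out[0], skip_special_tokens=True))
```

Comparing this output against the GGML output for the same prompt should show whether the conversion or the model itself is the problem.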