Question - Llama-3 configs.
I wanted to ask: is it currently necessary to replace the files in the Llama 3 model, as described in your README?
After I replaced the files and created the models, every fourth or fifth answer started repeating the same word, e.g.: "she walked along the road and found red red red red red red red red."
Now I'll test models created without replacing the files.
In short, I looked at one repository on Hugging Face, and the issue may be in the model itself that I downloaded for quantization. I'm not sure yet.
I also found another author who makes models, and his models have the same problem.
It's still necessary to set the correct EOS tokens, just to make sure it doesn't generate forever. I make and use my own quants with these and don't run into issues. Make sure you're using the correct prompting format when using the model.
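As a quick sanity check before quantizing, you can read the `eos_token_id` out of the model's config files. This is just a sketch (the helper name `read_eos_ids` is mine, not from any library); the background fact is that Llama-3-Instruct ends assistant turns with `<|eot_id|>` (ID 128009), so if `config.json` / `generation_config.json` only list `<|end_of_text|>` (128001), generation may never stop:

```python
import json
from pathlib import Path

def read_eos_ids(model_dir: str) -> dict:
    """Return {filename: eos_token_id} for the config files present in model_dir."""
    found = {}
    for name in ("config.json", "generation_config.json"):
        path = Path(model_dir) / name
        if path.exists():
            # eos_token_id may be a single int or a list of ints
            found[name] = json.loads(path.read_text()).get("eos_token_id")
    return found

# For a Llama-3-Instruct model you'd want to see 128009 included,
# e.g. eos_token_id = [128001, 128009].
```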
For SillyTavern, which I use, we use Presets; you can get them here (simple) or here (Virt's). Use the latest version of KoboldCpp.
If it doesn't work under these circumstances, the model you're quanting might be unstable.
Related Discussion: LLM-Discussions #5
This model uses the same config and quantization process; you can use it for your testing, and it's considered a good performer:
OK, thanks for the answer.