Quantization made by Richard Erkhov.

Llama-3-8B-Stroganoff-2.0 - GGUF

Model creator: https://huggingface.co/HiroseKoichi/
Original model: https://huggingface.co/HiroseKoichi/Llama-3-8B-Stroganoff-2.0/

Name	Quant method	Size
Llama-3-8B-Stroganoff-2.0.Q2_K.gguf	Q2_K	2.96GB
Llama-3-8B-Stroganoff-2.0.IQ3_XS.gguf	IQ3_XS	3.28GB
Llama-3-8B-Stroganoff-2.0.IQ3_S.gguf	IQ3_S	3.43GB
Llama-3-8B-Stroganoff-2.0.Q3_K_S.gguf	Q3_K_S	3.41GB
Llama-3-8B-Stroganoff-2.0.IQ3_M.gguf	IQ3_M	3.52GB
Llama-3-8B-Stroganoff-2.0.Q3_K.gguf	Q3_K	3.74GB
Llama-3-8B-Stroganoff-2.0.Q3_K_M.gguf	Q3_K_M	3.74GB
Llama-3-8B-Stroganoff-2.0.Q3_K_L.gguf	Q3_K_L	4.03GB
Llama-3-8B-Stroganoff-2.0.IQ4_XS.gguf	IQ4_XS	4.18GB
Llama-3-8B-Stroganoff-2.0.Q4_0.gguf	Q4_0	4.34GB
Llama-3-8B-Stroganoff-2.0.IQ4_NL.gguf	IQ4_NL	4.38GB
Llama-3-8B-Stroganoff-2.0.Q4_K_S.gguf	Q4_K_S	4.37GB
Llama-3-8B-Stroganoff-2.0.Q4_K.gguf	Q4_K	4.58GB
Llama-3-8B-Stroganoff-2.0.Q4_K_M.gguf	Q4_K_M	4.58GB
Llama-3-8B-Stroganoff-2.0.Q4_1.gguf	Q4_1	4.78GB
Llama-3-8B-Stroganoff-2.0.Q5_0.gguf	Q5_0	5.21GB
Llama-3-8B-Stroganoff-2.0.Q5_K_S.gguf	Q5_K_S	5.21GB
Llama-3-8B-Stroganoff-2.0.Q5_K.gguf	Q5_K	5.34GB
Llama-3-8B-Stroganoff-2.0.Q5_K_M.gguf	Q5_K_M	5.34GB
Llama-3-8B-Stroganoff-2.0.Q5_1.gguf	Q5_1	5.65GB
Llama-3-8B-Stroganoff-2.0.Q6_K.gguf	Q6_K	6.14GB
Llama-3-8B-Stroganoff-2.0.Q8_0.gguf	Q8_0	7.95GB

Original model description:

license: llama3 library_name: transformers tags: - nsfw - not-for-all-audiences - llama-3 - text-generation-inference - mergekit - merge

Llama-3-8B-Stroganoff-2.0

I have made an incredible model. Stroganoff was not so substantially different from other roleplay models that I could confidently recommend it to other people; it felt more consistent and reduced repetition, but that's mostly it. Stroganoff-2.0, on the other hand, shows some emergent properties from the addition of MopeyMule, and Unaligned_Alpha amplifies its effect. The original intention was to obviously reduce positivity bias by introducing MopeyMule, but I started noticing that character reactions in different scenarios were more varied and realistic instead of just defaulting to an extremely nice and respectful personality.

In particular, current models feel like they're drawing an invisible line on what they're willing to generate. Sure, they can technically generate all kinds of content, but they will refuse to go into detail on anything that isn't positive, happy, and respectful. Stroganoff-2.0, on the other hand, has no issue delving into any topic in detail. To understand what I mean, use the prompt "Write a story about hardcore BDSM" and compare another roleplay model to Stroganoff-2.0; it can be absolutely brutal and humiliating when it needs to, and in great detail too. You don't have to worry about it being overly negative or horny all the time, though; it actually seems to understand the line between SFW and NSFW much better.

One of the main reasons I started model merging was to create a model that's good for story writing; it is so goddamn frustrating to see "a mysterious figure who trained their whole life for this oddly specific moment appears and solves the issue," "the resilience of humans and the power of friendship," and "the bad guys spontaneously feel immense regret and remorse and dedicate their whole lives to righting their wrongs" in every single fucking situation. Now, I'm not going to claim that this model is perfect; it absolutely can be improved in many areas, but it's the first model I've used that has met the bare minimum required to actually be usable for story writing, and not just erotic stories, but in general. Granted, 70B models are too slow on my hardware, and I refuse to use an API, so this opinion is on sub-70B local models.

Now that I think about it, is this really emergent behavior? It seems pretty obvious in hindsight that a model that's not trying to shove positivity up your ass at every turn would be more willing to generate "offensive" and realistic content.

Note: 2.0 seems to have more repitition than the first. I'll try to fix that in future versions.

Merging Tips

If I were to write a paper on model merging, it would be called "Model Stock Is All You Need" because it's seriously amazing. I've tried many different merge methods, and I could only get barely passable results after tweaking parameters all day, but Model Stock has consistently produced good models for me. I recently made a discovery, though in hindsight it's very obvious, but model order matters a lot when using Model Stock, and it can make or break a merge. I have found that models at the top of the list integrate more deeply into the model, and models at the bottom of the list keep more of their style in the final result. What this means is that you should put chaotic models and ones that add new capabilities at the top of the list and the more balanced and coherent ones at the bottom.

The secret to absolutely hammering out positivity bias is to use MopeyMule as the base model and put an uncensored model at the top of the list (my favorite is LLAMA-3_8B_Unaligned_Alpha). Of course, if you add models that have a strong bias towards positivity to the merge, then it will likely reduce or even nullify the effect.

Quantization Formats

GGUF

Details

License: llama3
Instruct Format: llama-3
Context Size: 8K

Models Used

Merge Config

models:
    - model: SicariusSicariiStuff/LLAMA-3_8B_Unaligned_Alpha
    - model: maldv/badger-writer-llama-3-8b
    - model: Sao10K/L3-8B-Niitama-v1
    - model: Nitral-AI/Hathor_Tahsin-L3-8B-v0.85
    - model: Sao10K/L3-8B-Stheno-v3.2
merge_method: model_stock
base_model: failspy/Llama-3-8B-Instruct-MopeyMule
dtype: bfloat16

RichardErkhov
/

HiroseKoichi_-_Llama-3-8B-Stroganoff-2.0-gguf