Feedback
Feedback appreciated, thank you!
Added to the UGI-Leaderboard. It's definitely one of the best Nemo fine-tunes, but it's almost but not quite number 1. Currently all Nemo fine-tunes I've tested are actually more censored than the original instruct (more likely to give ethical disclaimers, even when told not to). Nemomix-v4.0 was second to the original instruct on UGI, and 2nd to mini-magnum on writing.
Hey, thank you for adding it to the leaderboard and for the feedback! Just one more clarification — this is a merge of existing models, not a fine-tune.
I tried to combine „the best of two worlds”, so I assume it won’t excel at either, but I’ll try to bump up the weights of the Instruct to make it even smarter. I’ll possibly add different models to the merge too. Once again, thank you!
Here's my personal review of the model using standard Mistral instruct template, no system message and such:
Good
- very good spatial awareness, characters move around the scene in a natural manner
- consistent character states, such as clothes being on/off, things being visible or not
- good character card following, almost never straying off the original personality
Bad
- not very wordy, almost always responding in very short sentences, not 'talkative' enough
- doesn't like to change the 'status quo', making longer roleplays somewhat repetitive story-wise
ps. I grew used to midnight miqu's romantic drama, so I might be biased on this one.
Thank you for the review, @Olafangensan !
Mistral is very instruct-sensitive, so the style of the messages you receive will heavily depend on the example and first message of your character, plus the prompt itself. I have no issue with receiving longer replies (sometimes, they're even cut-off) on my current settings, which can be shamelessly stolen from here (custom): https://huggingface.co/MarinaraSpaghetti/SillyTavern-Settings/tree/main. Example below.
No issues with receiving long replies of 1000 tokens either in my test chats. You can also ban EOS to ensure the character ALWAYS writes long responses, though that might affect the quality and might result in the model talking for your character.
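For anyone curious why banning a token forces longer replies, here is a minimal sketch of the idea, not the actual backend code: the banned token's logit is set to negative infinity before sampling, so it can never be picked. When the banned token is the end-of-sequence one, generation only stops at the response length limit. The toy vocabulary, the logit values, and `EOS_ID` are made up for illustration.

```python
import math


def ban_token(logits, token_id):
    """Return a copy of the logits with token_id made impossible to sample."""
    banned = list(logits)
    banned[token_id] = float("-inf")
    return banned


def softmax(logits):
    """Turn logits into probabilities (numerically stable version)."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 4-token vocabulary; index 3 plays the role of the end-of-sequence token.
# Both the id and the logit values are hypothetical.
EOS_ID = 3
logits = [1.0, 0.5, 0.2, 2.0]

probs_before = softmax(logits)                     # EOS is the most likely token here
probs_after = softmax(ban_token(logits, EOS_ID))   # EOS can no longer be sampled

print("P(EOS) before ban:", probs_before[EOS_ID])
print("P(EOS) after ban: ", probs_after[EOS_ID])
```

Since the model never gets to end its turn on its own, it is easy to see why it may start writing the user's lines too.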
However, due to how good the Nemo Instruct is at following instructions, it will struggle to change the character unless you specify that you allow for dynamic character growth. Here's an example that should help.
Hope this helps with bringing the best out of the merge!
I was about to edit my message to ask for tips and tricks, but I see it was unnecessary.
Bless!
Honestly, love it - this feels like a slightly more toned down, slightly more intelligent version of mini-magnum (which was possibly getting a bit too unhinged for me). This would probably be my go-to from now on for the time being!
Edit: on further testing, mini-magnum maybe has a slightly more "fun" way of writing... perhaps it's the repetitiveness of content issue as someone else mentioned. Guess I'll play around with the rep/ presence penalty/ DRY or switch between the two depending on what I'm using the LLM for! I do still like its general consistency!
Thank you for the feedback, and super happy you like it!
@DontPlanToEnd Can I get your discord handle
It's da same as my username, dontplantoend. God I wish I didn't choose this username lol. It's so much of a statement. It's just a random song lyric.
I did an EQ-Bench test run of this model and got the following results:
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| eq_bench | 2.1 | none | 0 | eqbench | ↑ | 78.9709 | ± | 1.5866 |
| | | none | 0 | percent_parseable | ↑ | 100.0000 | ± | 0.0000 |
That's better than instruct! :D Pretty cool.
Been running it locally and it's an interesting critter: it can follow instructions quite well, but it's also quite happy to introduce itself as a software developer looking for a job, like a base model would. You can feel the base/instruct hybrid. I like it.
Oh wow, I honestly did not expect it to be so smart, haha. Thank you, super glad you like it!
I added 0.05 to rep penalty and turned DRY to 0.8/2 with 0 penalty range, works like a charm now! For me, it shows a nice mix of intelligence and creativity. Noticeably, this model seems to follow the system prompt quite well and I was able to be pretty particular about that, yielding some good results.
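As a rough illustration of what the rep penalty knob does, here is a sketch of the common multiplicative formulation: logits of tokens that already appeared in the context are pushed down, so they are less likely to be repeated. This is an assumption about the general technique, not the exact code of any particular backend, and it does not cover DRY. The penalty values below are hypothetical, since the post only says 0.05 was added to an unstated base value.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Push down the logits of tokens that already appeared in the context.

    Positive logits are divided by the penalty and negative ones are multiplied,
    so a penalty above 1.0 always makes a seen token less likely.
    """
    out = list(logits)
    for token_id in set(seen_token_ids):
        if out[token_id] > 0:
            out[token_id] = out[token_id] / penalty
        else:
            out[token_id] = out[token_id] * penalty
    return out


# Toy example: tokens 0 and 1 were already generated, token 2 was not.
logits = [2.0, -1.0, 0.5]
seen = [0, 1, 0]

base = apply_repetition_penalty(logits, seen, penalty=1.10)    # hypothetical base value
bumped = apply_repetition_penalty(logits, seen, penalty=1.15)  # base + 0.05

print(base)
print(bumped)
```

A small bump like this only nudges the seen tokens a little further down, which is why it tends to reduce loops without wrecking coherence.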
https://pastebin.com/SMyJ46Wt I use the instruct settings here - I know people say that telling an LLM "don't do these things" usually doesn't work, but it seems to work decently with this one for me.
Notably I also had a line in there for myself in the Don'ts section that I didn't include in the pastebin, because I'm not sure if this one works that well - "Answering or performing actions as {{user}} - only represent the other character(s) instead." It seems to work for me but I'm not sure if it's just my luck, so that can be tested too.
Edit: Honestly yeah the model does have its problems with the repetition issue and some recurrent GPT-isms, but its ability to keep on track and pull back little relevant details from a decently lengthy context is really impressive for a model of this size. Feels really smart for a 12b model.
Feels really nice for a 12b model, but I've got two questions.
I was testing your custom settings and Example Message appears twice in context, once after Example Response section of Story String and the second time immediately after the chat starts. Is this a command line visual bug, is it supposed to be like that or a mistake on my side?
Could you please take a screenshot of your GGUF model loading settings in ooba and share with me?
Thanks for the fun merge.
@KerDisren
you need to set Example Message Behavior to „Never Include Examples” on the User Settings tab to not send the example message twice in SillyTavern.
As for Ooba settings, I run the GGUF version so it’s just all default settings plus 64000 context, flash attention flag marked, and that’s it.
Also, thank you for the kind words!