VERY impressive!
Just going to drop this here, I'm so hyped about this. You guys have been doing such amazing work and it's been such a pleasure to join you, on this journey!
Thank you!
Mistral CEO confirmed that Miqu is a prototype Mistral 70b. So this is a finetune of the padded dequantized version? Interesting!
We finetuned on top of that: https://huggingface.co/152334H/miqu-1-70b-sf
Apparently it has fewer problems and lower perplexity too. Fixes a bunch of things.
Proof, just in case someone reads that and doubts it: https://twitter.com/arthurmensch/status/1752734898476007821
Ooooo nice!
It certainly seems to have its own style of writing, much nicer than a lot of other models. I like it a lot just playing with it on its own; I need to play with settings to dial it in for Silly Tavern.
Seems good at general knowledge, and its code generation beat GPT-3.5 (better quality answers, newer libraries).
On a MacBook M1 Max with 64GB, the Q3_K_M quant runs with ~16s time to first token, at ~5.25 t/s.
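If anyone wants to reproduce a rough local speed test like that, here's a minimal sketch using llama-cpp-python. The commenter didn't say what runtime they used, so this is just one way to do it; the GGUF filename and prompt are placeholders, and the timing is overall throughput rather than a separate TTFT measurement.

```python
import time
from llama_cpp import Llama

# Hypothetical local path to the Q3_K_M quant; adjust to wherever your GGUF lives.
llm = Llama(
    model_path="./miqu-1-70b.q3_K_M.gguf",
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers (Metal on Apple Silicon)
)

prompt = "Write a Python function that parses an ISO 8601 timestamp."

start = time.time()
out = llm(prompt, max_tokens=256)
elapsed = time.time() - start

text = out["choices"][0]["text"]
n_tokens = out["usage"]["completion_tokens"]
print(text)
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.2f} t/s")
```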
Code generation!?
(checks title, again)
Oooooooookayyyyyyy ...
Can't wait for that Noromaid-EveryoneCoder33b merge. :-)
I think it has my favorite writing style so far, very creative. I will edit this later and tell you how coherent it is, etc. If I am insane enough I might try to reinstall my system with a stupidly large swap file and make a frankenmerge to 120B with this model. It seems doable, since someone else did it with swap, so maybe? I have 64GB of RAM, so that should at least make it a lot better.
Edit: it's really dumb sometimes, especially at higher temperatures, but there's a balance you can find; it also needs a really high repeat penalty. Sometimes I think this is my favorite model so far, but other times it's really bad. Then again, my standards might be corrupted by 120B models. It needs a lot of tweaking to get good results and still acts weirdly sometimes, so I'll have to do more testing. I think it's a good base for further refinement and merging. Overall it's complicated: it's exceptionally good 50% of the time, then exceptionally terrible the other 50%. I'd maybe look into RLHF to keep the same feel while helping it with logic, but I don't know much about training.
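For anyone wanting to try the kind of tuning described above (lower temperature, higher-than-usual repeat penalty), here's a small llama-cpp-python sketch. These values are only starting points to experiment from, not settings from the post, and the model path and prompt are placeholders.

```python
from llama_cpp import Llama

llm = Llama(model_path="./miqu-1-70b.q3_K_M.gguf", n_ctx=4096, n_gpu_layers=-1)

# Conservative sampling: keep temperature modest to reduce incoherent output,
# and push repeat_penalty above the common ~1.1 default, as the comment suggests.
out = llm(
    "Continue the story: The lighthouse keeper had not spoken in years.",
    max_tokens=300,
    temperature=0.7,      # hypothetical starting point; raise for more creativity
    top_p=0.9,
    repeat_penalty=1.18,  # "really high" relative to the usual ~1.1
)
print(out["choices"][0]["text"])
```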