Llama 3 coping mechanisms - Part 4
The upcoming season. Now scattered across 4 different streaming services for your displeasure. Apologies if this tangents too hard.
This is a direct Part 4 continuation of Part 3 in this thread.
I was doing my "I use Arch BTW" thing, how could you.
BTW there is supposed to be a way to set up Arch in WSL: https://wsldl-pg.github.io/ArchW-docs/How-to-Setup/
But I've never tried this. You can also run Arch via Docker and enable all of the virtualization stuff there.
@ABX-AI - This is a bit finicky but it works, you can install WSL Arch like that.
I'm good on Windows 10 tbh, is there any real benefit to moving to Linux at the moment?
I barely handle Ubuntu; I'd go insane trying to set up a non-included distro.
For a normal user, not really. The Linux desktop is transitioning to Wayland, and some things just don't work perfectly yet.
For WSL2, this is pretty convenient:
https://github.com/bostrot/wsl2-distro-manager
You can install many distros from Docker images.
I'm preoccupied with XTTS-v2 training on the Baldur's Gate 3 narrator for memes.
Available here: https://huggingface.co/Nitral-AI/XTTS-V2-BG3NV-FT-ST
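Here's a minimal inference sketch for the fine-tune, assuming the coqui TTS package; the checkpoint/config paths, reference clip, and sample text below are placeholders, not the actual repo layout:

```python
# Minimal XTTS-v2 inference sketch (coqui TTS package).
# Download the checkpoint and config from the HF repo first; paths are placeholders.
from TTS.api import TTS

tts = TTS(
    model_path="XTTS-V2-BG3NV-FT-ST/",              # assumed local checkpoint folder
    config_path="XTTS-V2-BG3NV-FT-ST/config.json",  # assumed config filename
)

# XTTS clones the voice from a short reference clip.
tts.tts_to_file(
    text="You approach the chest with caution.",  # any sample text
    speaker_wav="narrator_reference.wav",         # placeholder reference audio
    language="en",
    file_path="narration.wav",
)
```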
Now I crawl back into bed and sleep.
@ABX-AI have to tag you, look how simple & good the reasoning is @_@ (for a little bit)
Dolphin Yi-9B
Then it goes insane :3
Edit - Base Yi-9B-Chat gets it right every time, suspiciously well, like 10 out of 10 times
I'm quite happy with the new Hermes Theta, actually. It runs giga-fast in LM Studio (50 t/s at Q5_K_M), and consistently answers this even on regeneration of the response.
Answered correctly 7/10 times, which is not bad.
GPT-3.5 gets this wrong all the time as well, and I've basically only seen models at the level of GPT-4 get it right every time; anything below is likely to fail at least a few times out of 10.
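For anyone who wants to reproduce this kind of tally, here's a rough sketch against LM Studio's OpenAI-compatible local server. The port, the model id, and the exact question wording are my assumptions, not from this thread:

```python
# Ask the trick question several times and read off the answers.
# Assumes LM Studio's local server is running (default: http://localhost:1234/v1).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Assumed wording of "the weight question" discussed in this thread.
QUESTION = "Which weighs more: a pound of steel or a kilogram of feathers?"

for i in range(10):
    resp = client.chat.completions.create(
        model="local-model",  # placeholder; LM Studio serves whatever model is loaded
        messages=[{"role": "user", "content": QUESTION}],
        temperature=0.8,  # non-zero so each run is a fresh "regeneration"
    )
    print(f"--- run {i + 1} ---")
    print(resp.choices[0].message.content.strip())
```

Tally the correct answers by hand; keyword-matching the output is unreliable since most answers mention both items.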
I've been messing around with Theta too; it's impressive for its size, and when I run it through koboldcpp I can see 100 t/s most of the time when context is below 4k.
I pair it with Maid because it has a native Android app with OpenAI API support.
An interesting new model that popped up while I was messing about on lmsys was GLM-4 (closed source).
It has flown under the radar (arXiv pages have been appearing since January), but it can answer the weight question right every time, and it has coding abilities similar to GPT-4.
Zhipu AI Unveils GLM-4: A Next-Generation Foundation Model on Par with GPT-4
I'm waiting to see how it scores on the leaderboard.
When it's closed source, "on par with GPT-4" is not that interesting at the end of the day, especially now that GPT-4o is out and free.
The failed reasoning in my tests with a 7B seems to revolve around determining that steel is denser than feathers, and then halting there rather than chaining in the conversions.
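For reference, here's the chained conversion those models stop short of, assuming the pound-of-steel vs. kilogram-of-feathers wording:

```python
# The density observation is a red herring; only the unit conversion matters.
KG_TO_LB = 2.20462

feathers_lb = 1.0 * KG_TO_LB  # 1 kg of feathers expressed in pounds
steel_lb = 1.0                # 1 lb of steel

print(feathers_lb > steel_lb)  # True: the kilogram of feathers weighs more
```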
I stumbled onto the fact that this model, which I released with little notice a couple of months back, recently got quanted by two of the current high-volume quanters. I have no idea how this happened, but it was a few days after someone came across my post about it and noted that it was a good model. This was a merge where I took a successful merge and then remerged it with a higher-benching model, so this appears to support the meta about merging in reasoning, which I will apply to some eventual L3 merges.
https://huggingface.co/grimjim/kunoichi-lemon-royale-v2-32K-7B
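For anyone curious what remerging with a higher-benching model looks like mechanically, here's a bare-bones linear-merge sketch using transformers. This is not the actual recipe behind that model; the model ids and blend weight below are placeholders, and both models must share an architecture and tokenizer:

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder model ids, not the actual merge inputs.
BASE = "org/successful-merge-7B"
DONOR = "org/higher-benching-7B"
T = 0.5  # blend weight toward the donor

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)
donor = AutoModelForCausalLM.from_pretrained(DONOR, torch_dtype=torch.float16)

donor_state = donor.state_dict()
merged = {
    name: (1 - T) * param + T * donor_state[name]
    for name, param in base.state_dict().items()
}

base.load_state_dict(merged)
base.save_pretrained("remerged-7B")
```

Tools like mergekit implement fancier methods (SLERP, task arithmetic, DARE), but the weighted average above is the core idea.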
I'd been sitting on another 7B merge, and finally got around to releasing it. Starling was never meant to be an RP model, but it seems to have helped in conjunction with Mistral v0.2.
https://huggingface.co/grimjim/cuckoo-starling-32k-7B
LLM coping mechanisms - Part 5
Looooong maaaaaan!