On some prompts, is medium worse than mini & small?
"Today I own 3 cars but last year I sold 2 cars. How many cars do I own today?"
How is it possible that the 'medium' version often fails at this question, while even the 'mini' version gets it right? (and 'small' too)
It almost always gives the wrong answer: 1, while the other two say: 3.
are you using quants for the others as well? does small have quant support?
That's strange either way
https://ai.azure.com/explore/models?selectedCollection=phi
Here are all the models; you can test each on the right, under "Try it out". I also tested Q4_K_M GGUFs for mini and medium locally, and get the same results.
If it's full weights for all of them and they still give different outputs, that's super strange!
Didn't mean to ignore this, just got lost lol
No, I'm just not sure myself anymore. At first I was sure about what's in my first message, but now it seems to answer the question correctly... most of the time.
And btw, sorry for another question, but I just can't figure out why phi models, like this one, only generate text up to around 2500/4096 context and then stop, or just generate nonsense (instruct mode). I think koboldcpp says something like "EOS token triggered!". Same in LM Studio or oobabooga.
That does seem curious... If you have a prompt that triggers it reliably, let me know, and I'll try to see if I can reproduce it too. If it's happening on multiple platforms, that does seem odd.
I assume this doesn't apply to any hosted full weight versions?
No special prompt. Just tell it to write some stories or something so it reaches ~2500 context length.
I've been experiencing this since the first day phi-3 came out, and I have no idea why, it seems like I'm the only one, because nobody talks about it. Only phi models do this.
@urtuuuu It is happening with me too. Messed-up garbage after 2500 tokens. Using Q5_K_M. Trying to change quants.
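One possible cause worth ruling out (this is an assumption, not a confirmed diagnosis of the reports above): Phi-3 instruct models use `<|end|>` as their turn terminator, while `<|endoftext|>` also exists in the vocabulary. If a frontend only watches for one of these markers, raw output can continue past the model's intended stop and look like nonsense. A minimal sketch of cutting generated text at the earliest stop marker:

```python
# Illustration only: if the runtime misses one of Phi-3's end markers,
# everything after the real end of the turn comes through as garbage.
# This helper truncates text at the earliest marker it finds.
STOP_MARKERS = ["<|end|>", "<|endoftext|>"]

def truncate_at_stop(text: str, stops=STOP_MARKERS) -> str:
    """Return text up to (not including) the earliest stop marker."""
    cut = len(text)
    for marker in stops:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

raw = "Once upon a time, there was a kingdom.<|end|> garbled continuation..."
print(truncate_at_stop(raw))  # → "Once upon a time, there was a kingdom."
```

If trimming at `<|end|>` cleans up the output, the problem is likely stop-token handling in the frontend (or the GGUF's EOS metadata) rather than the quant itself.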