(Request) Anybody able to make a 4-bit quantized version?

#3
by TheFairyMan - opened

Sorry for the request, but I don't know anybody with a good enough computer to ask. By default this model requires around 64GB of VRAM, and even if you offload whatever doesn't fit to the CPU, it runs very slowly in comparison and may generate less than half a token per second.
Even a 4090 will have trouble running it, as stated in the other discussion, which is why I'm asking whether somebody could make a 4-bit version, or whether one is already available that we didn't find.
