more models?
Can you release a Phind CodeLlama 34B or Falcon 180B model, please?
Falcon 180B is on the way. Please wait a few days.
We have released Falcon-180B; refer to https://github.com/OpenGVLab/OmniQuant for more details.
Thank you!
Can it work on a Mac Studio with an M1 Ultra and 128 GB using Metal?
The current implementation uses CUDA kernels for the quantized model, so it only runs on CUDA-capable (NVIDIA) GPUs.
In theory, Falcon-180B with w3a16g512 quantization can operate on any device with over 80 GB of free memory. However, adapting the weight-only quantization kernels to other devices requires additional effort.
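For a back-of-the-envelope sense of why 80 GB suffices, here is a rough estimate of the quantized weight footprint. The per-group metadata layout (one fp16 scale and one fp16 zero-point per group) is an assumption for illustration, not necessarily the exact OmniQuant storage format:

```python
# Rough memory estimate for Falcon-180B under w3a16g512
# (3-bit weights, 16-bit activations, quantization group size 512).
PARAMS = 180e9      # ~180B parameters
WEIGHT_BITS = 3     # w3: 3-bit quantized weights
GROUP_SIZE = 512    # g512: groups of 512 weights share scale/zero-point

weights_gb = PARAMS * WEIGHT_BITS / 8 / 1024**3
# Assumed metadata: one fp16 scale + one fp16 zero-point (4 bytes) per group.
metadata_gb = PARAMS / GROUP_SIZE * 4 / 1024**3

print(f"quantized weights: ~{weights_gb:.1f} GB")   # ~62.9 GB
print(f"group metadata:    ~{metadata_gb:.1f} GB")  # ~1.3 GB
# The remaining headroom under 80 GB goes to fp16 activations
# and the KV cache during inference.
```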
How do I run any of this?
Refer to https://github.com/OpenGVLab/OmniQuant/blob/main/runing_falcon180b_on_single_a100_80g.ipynb for more details.
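In case it helps, here is a minimal sketch of what the inference flow in that notebook amounts to, assuming a CUDA machine with the quantized checkpoint already on disk; the model path below is a placeholder, and the notebook linked above has the actual checkpoint and loading steps:

```python
# Sketch only: load and run a quantized Falcon checkpoint on a single GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/falcon-180b-w3a16g512"  # placeholder, see the notebook

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,   # a16: activations stay in fp16
    device_map="auto",           # place layers on the available GPU
    trust_remote_code=True,      # Falcon checkpoints may ship custom code
)

prompt = "The falcon flies over"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```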
Thanks, I will wait until I can find a version that embeds the Metal library, perhaps MLC.
Are those additional efforts being made or planned by your team?
No, we currently have no plans to do this ourselves.
MLC LLM is an excellent platform that supports various devices. We can wait until MLC LLM supports Falcon-180B.