more models?
Can you release a Phind CodeLlama 34B or Falcon 180B model, please?
Falcon 180B is on the way. Please wait a few days.
We have released Falcon-180B; refer to https://github.com/OpenGVLab/OmniQuant for more details.
Thank you!
Can it work on a Mac Studio with an M1 Ultra and 128 GB using Metal?
The current implementation uses CUDA kernels for the quantized model, so it only runs on CUDA-capable (NVIDIA) GPUs.
In theory, Falcon-180B with w3a16g512 quantization can operate on any device with over 80 GB of free memory. However, adapting the weight-only quantization kernels to other devices requires additional effort.
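For a back-of-the-envelope sense of why 80 GB suffices, here is a rough estimate of the quantized weight footprint. The per-group metadata layout (one fp16 scale and one fp16 zero-point per group) is an assumption for illustration, not necessarily the exact OmniQuant storage format:

```python
# Rough memory estimate for Falcon-180B under w3a16g512
# (3-bit weights, 16-bit activations, quantization group size 512).
PARAMS = 180e9      # ~180B parameters
WEIGHT_BITS = 3     # w3: 3-bit quantized weights
GROUP_SIZE = 512    # g512: groups of 512 weights share scale/zero-point

weights_gb = PARAMS * WEIGHT_BITS / 8 / 1024**3
# Assumed metadata: one fp16 scale + one fp16 zero-point (4 bytes) per group.
metadata_gb = PARAMS / GROUP_SIZE * 4 / 1024**3

print(f"quantized weights: ~{weights_gb:.1f} GB")   # ~62.9 GB
print(f"group metadata:    ~{metadata_gb:.1f} GB")  # ~1.3 GB
# The remaining headroom under 80 GB goes to fp16 activations
# and the KV cache during inference.
```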
How do I run any of this?
Refer to https://github.com/OpenGVLab/OmniQuant/blob/main/runing_falcon180b_on_single_a100_80g.ipynb for more details.
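In case it helps, here is a minimal sketch of what the inference flow in that notebook amounts to, assuming a CUDA machine with the quantized checkpoint already on disk; the model path below is a placeholder, and the notebook linked above has the actual checkpoint and loading steps:

```python
# Sketch only: load and run a quantized Falcon checkpoint on a single GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/falcon-180b-w3a16g512"  # placeholder, see the notebook

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,   # a16: activations stay in fp16
    device_map="auto",           # place layers on the available GPU
    trust_remote_code=True,      # Falcon checkpoints may ship custom code
)

prompt = "The falcon flies over"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```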
Thanks, I will wait until I can find a version that embeds the Metal library, perhaps MLC.
Are those additional efforts being made or planned by your team?
No, we currently have no plans to do this ourselves.
MLC LLM is an excellent platform that supports various devices. We can wait until MLC LLM supports Falcon-180B.