Code for producing 4bit model
#5 opened by Federic
Hi, could you provide the code you used to quantize this model? I'm particularly interested in the 'model.safetensors.index.json' file, because when I run quantization that file doesn't appear and I get some errors. Thanks
Hi @Federic
I used the code described in the mlx-lm repo:
https://github.com/ml-explore/mlx-examples/tree/main/llms
```bash
python -m mlx_lm.convert \
    --hf-path mistralai/Mistral-7B-v0.1 \
    -q \
    --upload-repo mlx-community/my-4bit-mistral
```
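Once the conversion finishes (and uploads, if you pass `--upload-repo`), you can sanity-check the quantized weights with the `mlx_lm` Python API. A minimal sketch, assuming the repo name matches the `--upload-repo` above and that the prompt is purely illustrative:

```python
from mlx_lm import load, generate

# Load the quantized model (a local path or a Hub repo produced by mlx_lm.convert)
model, tokenizer = load("mlx-community/my-4bit-mistral")

# Run a short generation to confirm the weights load and decode correctly
text = generate(model, tokenizer, prompt="Hello, my name is", max_tokens=50, verbose=True)
```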
Could you share those errors?
Additionally, why are you trying to requantize an existing model?
My main problem is that I can't load this model using the Hugging Face transformers API.
Got it. The model weights in this organisation (i.e., MLX) are exclusively for Apple silicon.
If you want the HF transformers weights, you can check these repos:
- Quantised: https://huggingface.co/prince-canuma/c4ai-command-r-v01-4bit
- Full precision: https://huggingface.co/CohereForAI/c4ai-command-r-v01
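For instance, the 4-bit transformers checkpoint above should load with the standard `from_pretrained` API. A hedged sketch, assuming the repo ships its quantization config and the usual dependencies (e.g. `accelerate`, and `bitsandbytes` if it is a bitsandbytes quant) are installed; the prompt is illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "prince-canuma/c4ai-command-r-v01-4bit"

# Tokenizer and model load directly from the Hub; device_map="auto"
# places the already-quantized weights on the available GPU/CPU.
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```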
Hope this helps!
prince-canuma changed discussion status to closed