Code for producing 4bit model
#5 opened by Federic
Hi, could you provide the code you used to quantize this model? I'm particularly interested in the 'model.safetensors.index.json' file, because when I run quantization that file doesn't appear and I get some errors. Thanks
Hi @Federic
I used the code described in the mlx-lm repo:
https://github.com/ml-explore/mlx-examples/tree/main/llms
```bash
python -m mlx_lm.convert \
    --hf-path mistralai/Mistral-7B-v0.1 \
    -q \
    --upload-repo mlx-community/my-4bit-mistral
```
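Once the conversion finishes (and uploads, if you pass `--upload-repo`), you can sanity-check the quantized weights with the `mlx_lm` Python API. A minimal sketch, assuming the repo name matches the `--upload-repo` above and that the prompt is purely illustrative:

```python
from mlx_lm import load, generate

# Load the quantized model (a local path or a Hub repo produced by mlx_lm.convert)
model, tokenizer = load("mlx-community/my-4bit-mistral")

# Run a short generation to confirm the weights load and decode correctly
text = generate(model, tokenizer, prompt="Hello, my name is", max_tokens=50, verbose=True)
```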
Could you share those errors?
Additionally, why are you trying to requantize an existing model?
My main problem is that I can't load this model using the Hugging Face transformers API.
Got it. The model weights in this organisation (i.e., MLX) are exclusively for Apple silicon.
If you want the HF transformers weights, you can check these repos:
- Quantised: https://huggingface.co/prince-canuma/c4ai-command-r-v01-4bit
- Full precision: https://huggingface.co/CohereForAI/c4ai-command-r-v01
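For instance, the 4-bit transformers checkpoint above should load with the standard `from_pretrained` API. A hedged sketch, assuming the repo ships its quantization config and the usual dependencies (e.g. `accelerate`, and `bitsandbytes` if it is a bitsandbytes quant) are installed; the prompt is illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "prince-canuma/c4ai-command-r-v01-4bit"

# Tokenizer and model load directly from the Hub; device_map="auto"
# places the already-quantized weights on the available GPU/CPU.
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```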
Hope this helps!
prince-canuma changed discussion status to closed