Amethyst 13B Mistral - EXL2 - 8bpw, hb8
- Model creator: Undi
- Original model: Amethyst 13B Mistral
Description
- 8 bits per weight.
- 8 bits for the lm_head (output) layer of the model, instead of the typical 6.
- Works fine on 24 GB of VRAM under Windows, without flash attention v2.
- On my machine it runs at about 64% of the speed of the 4-bit GPTQ version.
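As a rough sanity check on the 24 GB figure, the weight footprint can be estimated from the bits per weight. This is only a back-of-the-envelope sketch: the 13e9 parameter count is the nominal "13B", the head shape (vocabulary 32000, hidden size 5120) is an assumed Llama-2-13B-style layout that this card does not state, and the estimate ignores the KV cache, activations, and quantization metadata.

```python
def quantized_weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate storage for n_params weights at the given bits per weight."""
    return n_params * bits_per_weight / 8


GIB = 1024 ** 3

# Assumption: nominal parameter count for a "13B" model.
N_PARAMS = 13e9

est_8bpw = quantized_weight_bytes(N_PARAMS, 8.0)
est_4bpw = quantized_weight_bytes(N_PARAMS, 4.0)

# Assumption: Llama-2-13B-style output layer (vocab 32000 x hidden 5120).
HEAD_PARAMS = 32000 * 5120
head_extra = quantized_weight_bytes(HEAD_PARAMS, 8) - quantized_weight_bytes(HEAD_PARAMS, 6)

print(f"8 bpw weights: {est_8bpw / GIB:.1f} GiB")
print(f"4 bpw weights: {est_4bpw / GIB:.1f} GiB")
print(f"extra for 8-bit head vs 6-bit: {head_extra / 1e6:.1f} MB")
```

Under these assumptions the 8 bpw weights take roughly half of a 24 GB card, and the 8-bit head adds only a few tens of megabytes over a 6-bit one.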
I converted the model using the convert.py script from the exllamav2 repo:
https://github.com/turboderp/exllamav2
Its documentation:
https://github.com/turboderp/exllamav2/blob/master/doc/convert.md
Measuring the model took 51 minutes, and converting it took 18 minutes.
I used the WikiText-2-v1 dataset for calibration:
https://huggingface.co/datasets/wikitext/blob/refs%2Fconvert%2Fparquet/wikitext-2-v1/test/0000.parquet
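The conversion described above could be invoked roughly as follows. This is a sketch, not the exact command I ran: the flags (`-i`, `-o`, `-cf`, `-c`, `-b`, `-hb`) are the ones described in the convert.md linked above as I recall them, so check them against the current documentation, and every path below is a hypothetical placeholder.

```python
import shlex


def build_convert_command(model_dir, work_dir, out_dir, cal_parquet,
                          bits=8.0, head_bits=8):
    """Build an argument list for exllamav2's convert.py.

    All paths are supplied by the caller; the flag names are taken from
    convert.md and should be verified against the repo version in use.
    """
    return [
        "python", "convert.py",
        "-i", model_dir,       # input directory with the original HF model
        "-o", work_dir,        # working directory for intermediate files
        "-cf", out_dir,        # directory for the finished quantized model
        "-c", cal_parquet,     # calibration dataset in parquet format
        "-b", str(bits),       # target bits per weight
        "-hb", str(head_bits), # bits for the lm_head (output) layer
    ]


# Hypothetical paths, for illustration only.
cmd = build_convert_command(
    model_dir="models/Amethyst-13B-Mistral",
    work_dir="work/amethyst",
    out_dir="models/Amethyst-13B-Mistral-exl2-8bpw-h8",
    cal_parquet="data/wikitext-2-v1-test.parquet",
)

# Print a shell-ready command line; run it from the exllamav2 repo root.
print(shlex.join(cmd))
```

Building the command as a list keeps paths with spaces intact and makes it easy to pass directly to `subprocess.run(cmd, check=True)` instead of printing it.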