|
---
license: apache-2.0
language:
- fr
- it
- de
- es
- en
inference: false
---
|
|
|
# Mixtral-8x7B (gpt-fast edition) |
|
|
|
This repo holds quantized Mixtral-8x7B weights for use with [gpt-fast](https://github.com/pytorch-labs/gpt-fast/).
|
|
|
## Compatibility |
|
|
|
Conversion to int4 was broken at the time, so this repo only holds fp8 weights.

Practically speaking, this means your GPU(s) need to be Ada Lovelace or newer for native fp8 support, and need enough combined VRAM to hold the model weights, the KV cache, and activations.
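If you're not sure whether your cards qualify, here's a minimal check with plain PyTorch (this is not part of gpt-fast; native fp8 kernels require compute capability 8.9 or higher, i.e. Ada Lovelace / Hopper):

```python
import torch

# Native fp8 needs NVIDIA compute capability 8.9+ (Ada Lovelace / Hopper).
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    supported = (major, minor) >= (8, 9)
    print(f"cuda:{i} ({torch.cuda.get_device_name(i)}): "
          f"sm_{major}{minor} -> {'fp8 OK' if supported else 'no native fp8'}")
```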
|
|
|
I'm hoping it can work on a pair of 4090s, which combined have 48 GiB (~51.54 GB) of VRAM. At roughly one byte per parameter, the fp8 weights come to ~46.8 GB; ignoring all other overhead, this leaves ~4.74 GB for the KV cache and activations, which should be enough (?).
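For reference, here is the arithmetic behind that estimate as a runnable sketch (the 46.8 GB weight figure is an assumption of one byte per parameter for the fp8 checkpoint; the KV-cache constants are Mixtral-8x7B's published config, assuming a bf16 cache):

```python
# Back-of-envelope VRAM budget for 2x RTX 4090.
GiB = 1024**3

total_vram = 2 * 24 * GiB      # 51_539_607_552 bytes (~51.54 GB)
weights = 46.8e9               # assumed fp8 weights, ~1 byte/param
leftover = total_vram - weights
print(f"left for KV cache + activations: {leftover / 1e9:.2f} GB")  # ~4.74

# Per-token KV cache: layers * 2 (K and V) * kv_heads * head_dim * bytes.
# Mixtral-8x7B: 32 layers, 8 KV heads (GQA), head_dim 128; bf16 = 2 bytes.
per_token = 32 * 2 * 8 * 128 * 2   # 131_072 bytes = 128 KiB
print(f"KV cache per token: {per_token // 1024} KiB")
print(f"tokens that fit in the leftover: {leftover / per_token:,.0f}")  # ~36k
```

At 128 KiB per token, a full 32k context alone eats ~4.3 GB of that budget before counting activations, hence the question mark.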
|
|
|
- [ ] TODO: Test on 2x4090 with TP=2 |
|
|
|
## Notes |
|
|
|
Conversion was done with gpt-fast [commit 7510a9d](https://github.com/pytorch-labs/gpt-fast/commit/7510a9df7d23725ae46e9fca7d6ae8ee3a8f448e).