Min hardware requirements

#3
by narvind2003 - opened

Could you please add the minimum hardware requirements to run this Instruct model?

I have tried using a 4090 (24 GB), but it didn't work... 💔💔💔 We need more RAM.

">130 GB required"

I believe you can find this information here: https://docs.mistral.ai/models/
(min. 100GB GPU RAM)

I think we may not need such a large amount (100 GB) when using `load_in_4bit` or `load_in_8bit`. I will attempt it with an A100 80 GB. ^^

Mistral AI org

Load in 4 bits should work. Same as 8bit!

Quick math (approximate) for 45 billion parameters (see the sketch below):

  • In 4-bit -> 180 billion bits, that's 22.5 GB of VRAM required
  • In 8-bit -> 45 GB of VRAM
  • In half precision -> 90 GB of VRAM required
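
For reference, the same arithmetic in a few lines of Python (weights only, decimal GB; activations and the KV cache add overhead on top):

```python
# Weights-only VRAM estimate for ~45 billion parameters.
params = 45e9

for bits in (4, 8, 16):
    gb = params * bits / 8 / 1e9  # bits -> bytes -> decimal GB
    print(f"{bits:>2}-bit: ~{gb:.1f} GB")
# -> 22.5 GB, 45.0 GB, 90.0 GB
```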

Note that 4-bit quantization shows significant quality degradation. It might be interesting to explore quantizing only the experts. https://arxiv.org/abs/2310.16795, for example, introduces QMoE, which allows sub-1-bit quantization for MoEs. @timdettmers is also exploring this topic, so I'm waiting for exciting things in the coming days!

I have tested it on a VM with 2 A10 GPUs (23 + 23 GB); it works with load_in_4bit, not 8-bit. Performance in Italian is interesting.

RTX 4090, 24GB dedicated GPU memory + 32 GB shared GPU memory, Windows 11, WSL (Ubuntu):

  1. from_pretrained(load_in_8bit=True)
    45.8 GB
  2. from_pretrained(load_in_4bit=True)
    27.1 GB
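
For anyone who wants to reproduce numbers like these, a minimal sketch (assuming a single CUDA device, bitsandbytes and accelerate installed, and the Mixtral-8x7B-Instruct checkpoint this thread is about):

```python
# Rough sketch for measuring peak GPU memory of a quantized load.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",  # assumed checkpoint id
    load_in_4bit=True,   # swap for load_in_8bit=True to compare
    device_map="auto",   # let accelerate place layers on GPU/CPU
)

peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU memory on device 0: {peak_gib:.1f} GiB")
```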

@JayBZD how do you run it with shared memory? I have a 4090 and 64 GB of system memory available.

I run it (Q6_K) on CPU only; it's much faster than 70B models, but it consumes over 50 percent of my 64 GB of RAM.
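
If it helps anyone, a minimal CPU-only sketch using llama-cpp-python (the GGUF file name below is a placeholder for whichever Q6_K file you downloaded):

```python
# CPU-only inference with a Q6_K GGUF quantization via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./mixtral-8x7b-instruct.Q6_K.gguf",  # placeholder path
    n_ctx=4096,      # context window
    n_threads=16,    # set to your number of physical cores
)

out = llm("[INST] Summarize what a mixture-of-experts model is. [/INST]", max_tokens=128)
print(out["choices"][0]["text"])
```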

If we wanted to run at full precision, we would need 45B * 32 = 1440B bits -> 180 GB?
Are the parameters in float32 by default?

On an RTX A6000 (48 GB):
load_in_4bit = 27.2 GB
load_in_8bit = 45.4 GB

How do I load in 4bit when using the transformers library? Or do I load it another way?

Pass `load_in_4bit=True` to `from_pretrained()`.
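
Something like this should work (requires bitsandbytes and accelerate; the checkpoint id is assumed, and `BitsAndBytesConfig` is just the more explicit way to express the same flag):

```python
# 4-bit load with transformers; equivalent to passing load_in_4bit=True directly.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # assumed checkpoint id

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16, weights stay 4-bit
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
```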

What RAM/CPU/GPU is required to run this model and the Q4_K_M version of the model?

You can look at this: https://arxiv.org/abs/2312.17238

For fast inference with ~24 GB of RAM+VRAM in Colab, look at this: https://colab.research.google.com/github/dvmazur/mixtral-offloading
