Error in model size?
#1 — opened by angel-luis
It currently indicates:
Model size
6.48B params
I can't run inference on it using TGI because I don't have enough RAM.
@angel-luis
It's probably because this model is quantized; the safetensors metadata reports it as 6.48B params, but it's actually the 8x7B model, just quantized.
I would recommend using the GGUF format with llama.cpp or the EXL2 format with ExLlamaV2, since they are not only faster but should also use less RAM (see the sketch below).
TGI is better for batching, but you don't seem to have enough RAM to run the model with it.
llama.cpp with GGUF is better for CPU and Mac (and still second fastest on GPU for single-prompt use).
ExLlamaV2 with EXL2 is better for GPU.
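
For reference, a minimal sketch of loading a GGUF quant via llama-cpp-python (the Python bindings for llama.cpp); the model filename and prompt below are hypothetical placeholders, not files from this repo.

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# The model filename is a hypothetical placeholder; point it at whichever
# GGUF quant you actually downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-8x7b-instruct.Q4_K_M.gguf",  # hypothetical filename
    n_ctx=2048,      # context window; raise it if you have RAM to spare
    n_gpu_layers=0,  # 0 = pure CPU; increase to offload layers to a GPU
)

out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])
```

Lower-bit quants trade some quality for a smaller memory footprint, so if RAM is the constraint you can pick a smaller quant of the same model.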