It uses too much RAM for a 7B model

#1
by Kenshiro-28 - opened

Hi! I started to test this model with llama-cpp-python and noticed it's using 29 GB of RAM, which doesn't happen with other 7B models. Is there some error in the model config?

Looks like it's not a 7B model; it's an 8x7B Mixtral-like model.

NeverSleep org

It's a Mixtral model (8x7B)
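That explains the memory use: a Mixtral 8x7B model has roughly 46.7B total parameters (the experts share the attention layers, so it's less than 8 × 7B), and all experts must be resident in RAM even though only two are active per token. A rough back-of-the-envelope sketch (the overhead factor and bit widths here are illustrative assumptions, not exact llama.cpp numbers):

```python
def model_ram_gb(n_params_billion: float, bits_per_weight: float,
                 overhead: float = 1.1) -> float:
    """Rough RAM estimate for a quantized model's weights,
    ignoring KV cache and runtime buffers (hence the fudge factor)."""
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9 * overhead

# A plain 7B model at 4-bit quantization: only a few GB.
print(f"7B @ 4-bit:   ~{model_ram_gb(7, 4):.1f} GB")

# Mixtral 8x7B (~46.7B total params) at 4-bit: tens of GB,
# in the same ballpark as the 29 GB observed above.
print(f"8x7B @ 4-bit: ~{model_ram_gb(46.7, 4):.1f} GB")
```

At higher-bit quantizations the gap only widens, so the 29 GB figure is expected for this architecture rather than a config error.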

Ok, thank you! :)

Kenshiro-28 changed discussion status to closed
