jarrelscy/Mixtral-8x22B-Instruct-v0.1-GPTQ-4bit

Tags: Text Generation, text-generation-inference, Inference Endpoints, 4-bit precision

I am running vLLM 0.4.1 in eager mode on 4 x A10G GPUs (24 GB each, 96 GB total) and still run out of memory. How? The model should fit (it needs about 87 GB of VRAM).

#3 opened 8 months ago by

KeyError: 'model.layers.45.block_sparse_moe.gate.g_idx'

#2 opened 8 months ago by

No special_tokens_map.json, tokenizer_config.json, or tokenizer.json

#1 opened 8 months ago by
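
A rough back-of-the-envelope check for the out-of-memory question in discussion #3, offered as a sketch rather than a diagnosis. It assumes the poster's own figures (4 GPUs of 24 GB, about 87 GB of weights), assumes the weights shard evenly across the GPUs, and assumes vLLM is left at its default `gpu_memory_utilization` of 0.90. The helper name `usable_memory_gb` is hypothetical. KV cache, activations, and CUDA context overhead are ignored here, and each of them would only shrink the headroom further.

```python
def usable_memory_gb(num_gpus: int, gpu_mem_gb: float, utilization: float) -> float:
    """Total memory the engine may claim across all GPUs.

    `utilization` mirrors vLLM's `gpu_memory_utilization` setting: the
    fraction of each GPU's memory the engine is allowed to use.
    """
    return num_gpus * gpu_mem_gb * utilization


# Figures taken from the discussion title (the poster's estimates).
NUM_GPUS = 4
GPU_MEM_GB = 24.0
WEIGHTS_GB = 87.0

# Assumption: default utilization of 0.90 was not overridden.
usable_gb = usable_memory_gb(NUM_GPUS, GPU_MEM_GB, utilization=0.90)

# Negative headroom means the weights alone exceed the budget,
# before any KV cache or activation memory is reserved.
headroom_gb = usable_gb - WEIGHTS_GB

print(f"usable: {usable_gb:.1f} GB, weights: {WEIGHTS_GB:.1f} GB, "
      f"headroom: {headroom_gb:.1f} GB")
```

Under these assumptions the budget is smaller than the estimated weight footprint, so a nominal 96 GB total does not by itself guarantee that an 87 GB model loads. Raising `gpu_memory_utilization` or lowering `max_model_len` are the usual knobs to try, though whether they are enough for this checkpoint is not something the thread confirms.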