KeyError: 'model.layers.45.block_sparse_moe.gate.g_idx'
vLLM 0.4.0.post1 on 8x 2080 Ti 22 GB can run inference on mixtral-8x22b-instruct-v0.1-awq correctly.
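For reference, a minimal sketch of how such an AWQ setup could be loaded through the Python API. The repo id below is a placeholder (point it at whichever AWQ checkpoint you use), and quantization="awq" plus enforce_eager are my assumptions for fitting on 22 GB cards, not details taken from the comment above:

from vllm import LLM

llm = LLM(
    model="your-org/Mixtral-8x22B-Instruct-v0.1-AWQ",  # placeholder AWQ repo id
    quantization="awq",       # load the 4-bit AWQ weights
    tensor_parallel_size=8,   # shard across the 8 GPUs
    enforce_eager=True,       # skip CUDA graph capture to save some memory
)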
Try the latest version?
The latest vLLM 0.4.1 still reports this error.
Running with vLLM 0.3.0 works for me:
from vllm import LLM

llm = LLM(
    model="jarrelscy/Mixtral-8x22B-Instruct-v0.1-GPTQ-4bit",
    tokenizer="jarrelscy/Mixtral-8x22B-Instruct-v0.1-GPTQ-4bit",
    tensor_parallel_size=8,  # shard the model across 8 GPUs
)
I am running vLLM 0.4.1 with 4 x A10G 24 GB GPUs (96 GB total) in eager mode and I am still out of memory. How? It should fit (around 87 GB of VRAM). @jarrelscy
Set --max-model-len=10000 (or try somewhat larger values if that still fits); capping the context length shrinks the KV-cache reservation.
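A rough sketch of applying that through the Python API: max_model_len=10000 mirrors the flag above, while enforce_eager and gpu_memory_utilization=0.95 are assumptions on my part to squeeze the weights plus KV cache into 4 x 24 GB:

from vllm import LLM

llm = LLM(
    model="jarrelscy/Mixtral-8x22B-Instruct-v0.1-GPTQ-4bit",
    tensor_parallel_size=4,        # 4 x A10G 24 GB
    max_model_len=10000,           # cap context length so less KV-cache memory is reserved
    enforce_eager=True,            # eager mode, as in the setup above
    gpu_memory_utilization=0.95,   # give vLLM a bit more headroom than the 0.90 default
)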