KeyError: 'model.layers.45.block_sparse_moe.gate.g_idx'
vLLM 0.4.0.post1 on 8x 2080 Ti 22 GB can run inference on mixtral-8x22b-instruct-v0.1-awq correctly.
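For reference, a minimal sketch of how such an AWQ setup could be loaded through the Python API. The repo id below is a placeholder (point it at whichever AWQ checkpoint you use), and quantization="awq" plus enforce_eager are my assumptions for fitting on 22 GB cards, not details taken from the comment above:

from vllm import LLM

llm = LLM(
    model="your-org/Mixtral-8x22B-Instruct-v0.1-AWQ",  # placeholder AWQ repo id
    quantization="awq",       # load the 4-bit AWQ weights
    tensor_parallel_size=8,   # shard across the 8 GPUs
    enforce_eager=True,       # skip CUDA graph capture to save some memory
)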
Try the latest version?
The latest vLLM 0.4.1 still reports this error.
Running with vLLM 0.3.0 works for me:
from vllm import LLM

llm = LLM(
    model="jarrelscy/Mixtral-8x22B-Instruct-v0.1-GPTQ-4bit",
    tokenizer="jarrelscy/Mixtral-8x22B-Instruct-v0.1-GPTQ-4bit",
    tensor_parallel_size=8,  # shard the model across 8 GPUs
)
I am running vLLM 0.4.1 with 4 x A10G 24 GB GPUs (96 GB total) in eager mode and I am still out of memory. How? It should fit (around 87 GB of VRAM). @jarrelscy
Set --max-model-len=10000 (or try somewhat larger values if that still fits); capping the context length shrinks the KV-cache reservation.
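A rough sketch of applying that through the Python API: max_model_len=10000 mirrors the flag above, while enforce_eager and gpu_memory_utilization=0.95 are assumptions on my part to squeeze the weights plus KV cache into 4 x 24 GB:

from vllm import LLM

llm = LLM(
    model="jarrelscy/Mixtral-8x22B-Instruct-v0.1-GPTQ-4bit",
    tensor_parallel_size=4,        # 4 x A10G 24 GB
    max_model_len=10000,           # cap context length so less KV-cache memory is reserved
    enforce_eager=True,            # eager mode, as in the setup above
    gpu_memory_utilization=0.95,   # give vLLM a bit more headroom than the 0.90 default
)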