I am running vLLM 0.4.1 on 4 x A10G GPUs (24 GB each, 96 GB total) in eager mode and I still run out of memory. How? It should fit (about 87 GB of VRAM).
1
#3 opened 8 months ago by orel12
KeyError: 'model.layers.45.block_sparse_moe.gate.g_idx'
5
#2 opened 8 months ago by tutu329
Missing special_tokens_map.json, tokenizer_config.json, and tokenizer.json
#1 opened 8 months ago by tutu329