Can it run on A100/A800 with vLLM?

#1
by Parkerlambert123 - opened

Sorry to bother you. Can this model be run on A100 (SM80)? Some models, like Llama3.1, can be run on A100/A800 with fp8_marlin.

Neural Magic org

Currently, we don't support MoE FP8 models on Ampere. This is because vLLM uses Triton for its FusedMoE kernel, which doesn't support the FP8 Marlin mixed-precision GEMM.

mgoin changed discussion status to closed

Any update on this? Does vLLM support it now?

Neural Magic org

vLLM supports FP8 MoE models only on Ada Lovelace or Hopper GPUs (>= SM 89), which have hardware support for FP8.
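
For anyone who wants to check a GPU against that threshold, here is a minimal sketch in Python. It only encodes the ">= SM 89" rule stated in this thread. The names `FP8_MOE_MIN_CAPABILITY` and `supports_fp8_moe` are hypothetical helpers written for illustration and are not part of vLLM. The compute capabilities listed for the example GPUs (A100/A800 = 8.0, L4 = 8.9, H100 = 9.0) are the published NVIDIA values.

```python
# Hypothetical helper, not a vLLM API: encodes the ">= SM 89" rule from this thread.
FP8_MOE_MIN_CAPABILITY = (8, 9)  # Ada Lovelace (SM 89) and newer


def supports_fp8_moe(major: int, minor: int) -> bool:
    """Return True if compute capability (major, minor) is at least SM 89."""
    # Tuples compare element by element, so (9, 0) >= (8, 9) and (8, 0) < (8, 9).
    return (major, minor) >= FP8_MOE_MIN_CAPABILITY


if __name__ == "__main__":
    # Example GPUs and their compute capabilities.
    examples = {
        "A100 / A800 (Ampere)": (8, 0),
        "L4 (Ada Lovelace)": (8, 9),
        "H100 (Hopper)": (9, 0),
    }
    for name, (major, minor) in examples.items():
        verdict = "meets" if supports_fp8_moe(major, minor) else "is below"
        print(f"{name}: SM {major}{minor} {verdict} the SM 89 threshold")
```

On a machine with PyTorch installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple for the current device, so the result can be passed straight in as `supports_fp8_moe(*torch.cuda.get_device_capability())`.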
