Problems when using mistral-chat

#24
by jinglishi0206 - opened

HOW TO REPRODUCE

  1. Follow the steps to download & install mistral-inference.
  2. Download the models (see the sketch below this list).
  3. Run `mistral-chat $HOME/mistral_models/7B-Instruct-v0.3 --instruct --max_tokens 256`.
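
For reference, step 2 can be done with the `huggingface_hub` downloader. This is only a minimal sketch, assuming the repo id and file names listed in the mistral-inference README and the target directory used in step 3; adjust them if your setup differs:

```python
# Sketch of the model download (step 2).
# Assumption: repo id and file names follow the mistral-inference README.
from pathlib import Path
from huggingface_hub import snapshot_download

models_path = Path.home() / "mistral_models" / "7B-Instruct-v0.3"
models_path.mkdir(parents=True, exist_ok=True)

snapshot_download(
    repo_id="mistralai/Mistral-7B-Instruct-v0.3",
    allow_patterns=["params.json", "consolidated.safetensors", "tokenizer.model.v3"],
    local_dir=models_path,
)
```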

Output:

`decoderF` is not supported because:
    xFormers wasn't build with CUDA support
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalLocalAttentionMask'>
    operator wasn't built - see `python -m xformers.info` for more info
`flshattF@0.0.0` is not supported because:
    xFormers wasn't build with CUDA support
    operator wasn't built - see `python -m xformers.info` for more info
`cutlassF` is not supported because:
    xFormers wasn't build with CUDA support
    operator wasn't built - see `python -m xformers.info` for more info
`smallkF` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    xFormers wasn't build with CUDA support
    dtype=torch.bfloat16 (supported: {torch.float32})
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalLocalAttentionMask'>
    operator wasn't built - see `python -m xformers.info` for more info
    unsupported embed per head: 128

The package page, https://github.com/mistralai/mistral-inference, says that you need a GPU to install it, because it also installs the `xformers` package, which requires a GPU, most likely one with CUDA cores.
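
To confirm whether that is the cause, you can check whether PyTorch sees a CUDA device and which xFormers build is installed. A minimal sketch (the per-operator report is the `python -m xformers.info` command already suggested in the error output):

```python
# Quick environment check: the errors above appear when xFormers was
# installed without CUDA kernels or when no CUDA GPU is visible to PyTorch.
import torch
import xformers

print("torch:", torch.__version__, "| cuda available:", torch.cuda.is_available())
print("xformers:", xformers.__version__)

# For the full per-operator build report mentioned in the error output, run:
#   python -m xformers.info
```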
