Problems when using mistral-chat

#24
by jinglishi0206 - opened

HOW TO REPRODUCE

  1. Follow the steps to download & install mistral-inference.
  2. Download the models (see the sketch below this list).
  3. Run `mistral-chat $HOME/mistral_models/7B-Instruct-v0.3 --instruct --max_tokens 256`.
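
For reference, step 2 can be done with the `huggingface_hub` downloader. This is only a minimal sketch, assuming the repo id and file names listed in the mistral-inference README and the target directory used in step 3; adjust them if your setup differs:

```python
# Sketch of the model download (step 2).
# Assumption: repo id and file names follow the mistral-inference README.
from pathlib import Path
from huggingface_hub import snapshot_download

models_path = Path.home() / "mistral_models" / "7B-Instruct-v0.3"
models_path.mkdir(parents=True, exist_ok=True)

snapshot_download(
    repo_id="mistralai/Mistral-7B-Instruct-v0.3",
    allow_patterns=["params.json", "consolidated.safetensors", "tokenizer.model.v3"],
    local_dir=models_path,
)
```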

Output:

`decoderF` is not supported because:
    xFormers wasn't build with CUDA support
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalLocalAttentionMask'>
    operator wasn't built - see `python -m xformers.info` for more info
`flshattF@0.0.0` is not supported because:
    xFormers wasn't build with CUDA support
    operator wasn't built - see `python -m xformers.info` for more info
`cutlassF` is not supported because:
    xFormers wasn't build with CUDA support
    operator wasn't built - see `python -m xformers.info` for more info
`smallkF` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    xFormers wasn't build with CUDA support
    dtype=torch.bfloat16 (supported: {torch.float32})
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalLocalAttentionMask'>
    operator wasn't built - see `python -m xformers.info` for more info
    unsupported embed per head: 128

The package page, https://github.com/mistralai/mistral-inference, says that you need a GPU to install it, because it also installs the `xformers` package, which requires a GPU, most likely one with CUDA cores.
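
To confirm whether that is the cause, you can check whether PyTorch sees a CUDA device and which xFormers build is installed. A minimal sketch (the per-operator report is the `python -m xformers.info` command already suggested in the error output):

```python
# Quick environment check: the errors above appear when xFormers was
# installed without CUDA kernels or when no CUDA GPU is visible to PyTorch.
import torch
import xformers

print("torch:", torch.__version__, "| cuda available:", torch.cuda.is_available())
print("xformers:", xformers.__version__)

# For the full per-operator build report mentioned in the error output, run:
#   python -m xformers.info
```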
