TheBloke/alfred-40B-1023-AWQ · Tensor size mismatch

Hi, thanks for converting alfred to AWQ.
When running the model using vLLM 0.2.2 (either via the server command line or using the LLM constructor), I get the following error:

RuntimeError: The size of tensor a (16384) must match the size of tensor b (32768) at non-singleton dimension 2

There are also two warnings:

You are using a model of type RefinedWeb to instantiate a model of type falcon. This is not supported for all configurations of models and can yield errors.
WARNING 11-20 12:00:39 config.py:433] The model's config.json does not contain any of the following keys to determine the original maximum length of the model: ['max_position_embeddings', 'n_positions', 'max_seq_len', 'seq_length', 'max_sequence_length', 'max_seq_length', 'seq_len']. Assuming the model's maximum length is 2048.
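
To see which of those keys, if any, are present, a locally downloaded config.json can be checked with a small script. This is only a sketch: the key list is copied from the warning above, while `find_max_len_keys` and `load_config` are hypothetical helper names, not part of vLLM or transformers.

```python
import json
from pathlib import Path

# Keys copied verbatim from the vLLM warning above.
MAX_LEN_KEYS = [
    "max_position_embeddings",
    "n_positions",
    "max_seq_len",
    "seq_length",
    "max_sequence_length",
    "max_seq_length",
    "seq_len",
]


def find_max_len_keys(config: dict) -> dict:
    """Return the max-length keys (and their values) that the config defines."""
    return {key: config[key] for key in MAX_LEN_KEYS if key in config}


def load_config(path: str) -> dict:
    """Hypothetical helper: read a locally downloaded config.json."""
    return json.loads(Path(path).read_text())
```

An empty result from `find_max_len_keys(load_config("config.json"))` would be consistent with the warning, i.e. none of the listed keys is defined and the 2048 fallback applies.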

Here is an example command that fails:

python3 -m vllm.entrypoints.api_server --model TheBloke/alfred-40B-1023-AWQ --quantization awq --dtype auto --tensor-parallel-size 2 --trust-remote-code
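
For reference, the LLM constructor route looks roughly like the sketch below. It assumes the same options as the command above are passed as keyword arguments; `ENGINE_KWARGS` and `build_llm` are illustrative names only.

```python
# Same options as the api_server command above, expressed as keyword arguments.
ENGINE_KWARGS = {
    "model": "TheBloke/alfred-40B-1023-AWQ",
    "quantization": "awq",
    "dtype": "auto",
    "tensor_parallel_size": 2,
    "trust_remote_code": True,
}


def build_llm():
    """Construct the engine; needs vLLM installed and two GPUs available."""
    from vllm import LLM  # imported lazily so the kwargs can be inspected without vLLM

    return LLM(**ENGINE_KWARGS)
```

Calling `build_llm()` is the step that raises the RuntimeError for me.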