Error in Jamba slow_forward method
Hi,
I tried to run the model with DeepSpeed, using the example code from the model card.
I ran into the following error:
File ~/.cache/huggingface/modules/transformers_modules/ai21labs/Jamba-v0.1/ce13f3fe99555a2606d1892665bb67649032ff2d/modeling_jamba.py:986, in JambaMambaMixer.forward(self, hidden_states, cache_params)
982 raise ValueError(
983 "Fast Mamba kernels are not available. Make sure to they are installed and that the mamba module is on a CUDA device"
984 )
985 return self.cuda_kernels_forward(hidden_states, cache_params)
--> 986 return self.slow_forward(hidden_states, cache_params)
File ~/.cache/huggingface/modules/transformers_modules/ai21labs/Jamba-v0.1/ce13f3fe99555a2606d1892665bb67649032ff2d/modeling_jamba.py:923, in JambaMambaMixer.slow_forward(self, input_states, cache_params)
921 conv_state[:, :, -1] = hidden_states[:, :, 0]
922 cache_params.conv_states[self.layer_idx] = conv_state
--> 923 hidden_states = torch.sum(conv_state * self.conv1d.weight[:, 0, :], dim=-1)
924 if self.use_conv_bias:
925 hidden_states += self.conv1d.bias
IndexError: too many indices for tensor of dimension 1
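The failing expression, `self.conv1d.weight[:, 0, :]`, uses three indices, so it only works on a 3-D weight, while the error says the tensor has one dimension. A minimal sketch of a check that could confirm this before line 923 runs is below. The helper name `conv_weight_problem` is hypothetical and not part of `modeling_jamba.py`, and the idea that DeepSpeed leaves the weight as a 1-D placeholder at this point is an unverified assumption. The stand-in objects only mimic a `.shape` attribute so the sketch runs without loading the model.

```python
from types import SimpleNamespace


def conv_weight_problem(weight):
    """Return a message if `weight` cannot be indexed as [:, 0, :], else None.

    Hypothetical helper: it accepts any object with a `.shape` attribute,
    for example the `self.conv1d.weight` tensor from the traceback.
    """
    shape = tuple(weight.shape)
    if len(shape) != 3:
        return (
            f"expected a 3-D weight (channels, 1, kernel_size), "
            f"got {len(shape)}-D with shape {shape}"
        )
    return None


# Stand-ins for illustration only; the shapes are made up.
healthy = SimpleNamespace(shape=(8, 1, 4))   # what a depthwise Conv1d weight looks like
placeholder = SimpleNamespace(shape=(0,))    # ASSUMPTION: a 1-D placeholder weight

print(conv_weight_problem(healthy))
print(conv_weight_problem(placeholder))
```

Calling the helper on `self.conv1d.weight` just before the failing line would show whether the weight really is 1-D at that point.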
Any pointers on resolving this would be appreciated.
Thank you.