mamba bug

#4
by cloudyu

When running mistral-chat ./codestral-mamba-7B-v0.1 --instruct:

AttributeError: 'Mamba2' object has no attribute 'dconv'. Did you mean: 'd_conv'?

mistral_common 1.2.1
mistral_inference 1.2.0
mamba-ssm 2.2.0

Python 3.10.12 (main, Mar 22 2024, 16:50:05) [GCC 11.4.0] on linux

Full log:

Traceback (most recent call last):
File "/home/cloudyu/.local/bin/mistral-chat", line 8, in
sys.exit(mistral_chat())
File "/home/cloudyu/.local/lib/python3.10/site-packages/mistral_inference/main.py", line 201, in mistral_chat
fire.Fire(interactive)
File "/home/cloudyu/.local/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/cloudyu/.local/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/cloudyu/.local/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/cloudyu/.local/lib/python3.10/site-packages/mistral_inference/main.py", line 115, in interactive
generated_tokens, _ = generate_fn( # type: ignore[operator]
File "/home/cloudyu/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/cloudyu/.local/lib/python3.10/site-packages/mistral_inference/generate.py", line 21, in generate_mamba
output = model.model.generate(
File "/home/cloudyu/.local/lib/python3.10/site-packages/mamba_ssm/utils/generation.py", line 260, in generate
output = decode(
File "/home/cloudyu/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/cloudyu/.local/lib/python3.10/site-packages/mamba_ssm/utils/generation.py", line 221, in decode
scores.append(get_logits(sequences[-1], inference_params))
File "/home/cloudyu/.local/lib/python3.10/site-packages/mamba_ssm/utils/generation.py", line 184, in get_logits
logits = model(
File "/home/cloudyu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/cloudyu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/cloudyu/.local/lib/python3.10/site-packages/mamba_ssm/models/mixer_seq_simple.py", line 279, in forward
hidden_states = self.backbone(input_ids, inference_params=inference_params, **mixer_kwargs)
File "/home/cloudyu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/cloudyu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/cloudyu/.local/lib/python3.10/site-packages/mamba_ssm/models/mixer_seq_simple.py", line 194, in forward
hidden_states, residual = layer(
File "/home/cloudyu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/cloudyu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/cloudyu/.local/lib/python3.10/site-packages/mamba_ssm/modules/block.py", line 67, in forward
hidden_states = self.mixer(hidden_states, inference_params=inference_params, **mixer_kwargs)
File "/home/cloudyu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/cloudyu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/cloudyu/.local/lib/python3.10/site-packages/mamba_ssm/modules/mamba2.py", line 230, in forward
self.conv1d(xBC.transpose(1, 2)).transpose(1, 2)[:, -(self.dconv - 1):]
File "/home/cloudyu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1709, in getattr
raise AttributeError(f"'{type(self).name}' object has no attribute '{name}'")
AttributeError: 'Mamba2' object has no attribute 'dconv'. Did you mean: 'd_conv'?
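
For what it's worth, the error is a genuine attribute typo rather than an environment problem: Mamba2 sets d_conv in its constructor, but the line in the traceback reads self.dconv, which was never set, so torch.nn.Module's __getattr__ raises. A minimal sketch reproducing the failure mode (the Demo class is hypothetical, for illustration only):

import torch.nn as nn

class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        self.d_conv = 4  # the attribute that actually exists

# Accessing the misspelled name triggers nn.Module.__getattr__, which raises
# AttributeError; Python 3.10+ then appends the "Did you mean: 'd_conv'?" hint.
Demo().dconv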

Same or similar problem here.
After finally solving the previous mamba_ssm install problem (using pip install mamba-ssm --no-cache-dir), I now get this when I run the example:
mistral-chat $HOME/mistral_models/mamba-codestral-7B-v0.1 --instruct --max_tokens 256
Prompt: could you give the the rosenbrock function implementation in matlab?
Traceback (most recent call last):
File "/home/rswork/anaconda3/envs/ptllm/bin/mistral-chat", line 8, in
sys.exit(mistral_chat())
^^^^^^^^^^^^^^
....
File "/home/rswork/anaconda3/envs/ptllm/lib/python3.12/site-packages/mamba_ssm/modules/mamba2.py", line 231, in forward
self.conv1d(xBC.transpose(1, 2)).transpose(1, 2)[:, -(self.dconv - 1):]
^^^^^^^^^^
File "/home/rswork/anaconda3/envs/ptllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1709, in getattr
raise AttributeError(f"'{type(self).name}' object has no attribute '{name}'")
AttributeError: 'Mamba2' object has no attribute 'dconv'. Did you mean: 'd_conv'?

The same issue also arises when using mistral-demo.

Looks like there is a solution here:
https://github.com/mistralai/mistral-inference/issues/192#issuecomment-2234242452

Thanks for the link. Maybe I am missing something, but an example that happens to work with one particular combination of drivers and packages is not really a satisfying solution to this issue for me. Generalizing that approach would mean formatting and reinstalling the OS and all drivers to replicate exactly the Colab environment where the developer of a particular piece of software tested his work, and then doing that again for every other piece of software.
If I reinstall and change the CUDA drivers on the system, other things will break. If the official requirement (for mamba_ssm) is CUDA >11.6, then 11.8 should work. mamba_ssm is installed and working, and all requirements are met on my system.
The error itself, AttributeError: 'Mamba2' object has no attribute 'dconv'. Did you mean: 'd_conv'?, makes it seem as if there is some other issue here.
It seems to be an issue with causal-conv1d and/or mistral_inference.

Also, in the linked thread there is someone who got it to work with CUDA 11.8. So compatibility seems a bit of a lottery until someone finds out the real cause of the issue.

Hey @Ramzeee, you should install causal-conv1d too: pip install "causal-conv1d>=1.4.0" (the quotes keep the shell from treating >= as a redirection). With this you hopefully won't encounter this error.
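
As a side note, a quick way to confirm the fused kernel is actually picked up after installing: the fallback branch shown in the tracebacks only runs when causal_conv1d_fn cannot be imported. A minimal sketch, assuming mamba_ssm's 2.2.x import behavior:

# Minimal check that the fused causal-conv1d kernel is importable; when it
# is not, mamba_ssm's mamba2.py falls back to the plain nn.Conv1d path that
# contains the self.dconv typo seen in the tracebacks above.
try:
    from causal_conv1d import causal_conv1d_fn
    print("causal-conv1d available; fused path will be used")
except ImportError:
    print("causal-conv1d missing; mamba_ssm will take the buggy fallback path")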

The most recent implementation of mamba_ssm likely has a bug on the non-optimized path, introduced via a commit for var_len inference. If you don't want the extra dependency, you must wait until the repo is updated or fix the typo in your library sources yourself. At first glance, this comment provides the necessary change: replace that line with self.conv1d(xBC.transpose(1, 2)).transpose(1, 2)[:, (self.d_conv - 1):] (see the sketch below). It is not some weird system-dependency entanglement imo, just your average typo :D
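
For clarity, here is that one-line patch in context; a sketch against mamba_ssm 2.2.x (the tracebacks point at mamba_ssm/modules/mamba2.py around line 230; exact line numbers vary by version):

# Before -- the line shown in the tracebacks; Mamba2 has no attribute 'dconv':
self.conv1d(xBC.transpose(1, 2)).transpose(1, 2)[:, -(self.dconv - 1):]
# After -- the replacement quoted from the linked comment (dconv -> d_conv):
self.conv1d(xBC.transpose(1, 2)).transpose(1, 2)[:, (self.d_conv - 1):]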

Edit: Or, of course, downgrade versions.

And in general, the non-optimized path seems buggy for the convolutions, so you might encounter more errors along the way.

!mistral-chat /root/mistral_models/Mamba-Codestral-7B-v0.1 --instruct --max_tokens 256

/usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/selective_scan_interface.py:164: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
def forward(ctx, xz, conv1d_weight, conv1d_bias, x_proj_weight, delta_proj_weight,
/usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/selective_scan_interface.py:240: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
def backward(ctx, dout):
/usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/triton/layer_norm.py:986: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
def forward(
/usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/triton/layer_norm.py:1045: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
def backward(ctx, dout, *args):
/usr/local/lib/python3.10/dist-packages/mamba_ssm/distributed/tensor_parallel.py:26: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
def forward(ctx, x, weight, bias, process_group=None, sequence_parallel=True):
/usr/local/lib/python3.10/dist-packages/mamba_ssm/distributed/tensor_parallel.py:62: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
def backward(ctx, grad_output):
/usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/triton/ssd_combined.py:758: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
def forward(ctx, zxbcdt, conv1d_weight, conv1d_bias, dt_bias, A, D, chunk_size, initial_states=None, seq_idx=None, dt_limit=(0.0, float("inf")), return_final_states=False, activation="silu",
/usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/triton/ssd_combined.py:836: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
def backward(ctx, dout, *args):
Prompt: hi
Traceback (most recent call last):
File "/usr/local/bin/mistral-chat", line 8, in
sys.exit(mistral_chat())
File "/usr/local/lib/python3.10/dist-packages/mistral_inference/main.py", line 259, in mistral_chat
fire.Fire(interactive)
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/mistral_inference/main.py", line 172, in interactive
generated_tokens, _ = generate_fn( # type: ignore[operator]
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
TypeError: generate_mamba() takes 2 positional arguments but 3 positional arguments (and 3 keyword-only arguments) were given
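
Note that this last TypeError is a different failure from the dconv typo: the interactive() entry point passes more positional arguments than generate_mamba() accepts, which suggests mismatched package versions (or a broken mistral_inference release) rather than the mamba_ssm bug above. A minimal sketch for checking the installed versions, assuming the usual PyPI distribution names:

# Print installed versions of the packages involved, to spot the kind of
# version mismatch suggested by the generate_mamba() signature error.
import importlib.metadata as md

for pkg in ("mistral-inference", "mistral-common", "mamba-ssm", "causal-conv1d", "torch"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "not installed")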
