TypeError: AttentionMaskConverter._unmask_unattended() missing 1 required positional argument: 'unmasked_value'

#13
by BramVanroy - opened

Trying to use this model as part of a pipeline (4x A100 80GB):

import torch
from transformers import pipeline, AutoTokenizer, BitsAndBytesConfig


bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model_kwargs = {
    "quantization_config": bnb_config,
}

tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01")
pipe = pipeline(
    "conversational",
    model="CohereForAI/c4ai-command-r-v01",
    tokenizer=tokenizer,
    model_kwargs=model_kwargs,
    trust_remote_code=True,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

messages = [{"role": "user", "content": "Hello, how are you?"}]
response = pipe(messages)
print(response)

But getting this error in the forward pass:

/dodrio/scratch/projects/2023_071/cache/huggingface/modules/transformers_modules/CohereForAI/c4ai-command-r-v01/2a6d259c29bd319c3bdb8dd88b8d59b8c303c318/modeling_cohere.py:1060 in _update_causal_mask

  1057                 # Attend to all tokens in fully masked rows in the ca…
  1058                 # using left padding. This is required by F.scaled_do…
  1059                 # Details: https://github.com/pytorch/pytorch/issues/…
❱ 1060                 causal_mask = AttentionMaskConverter._unmask_unattend…
  1061
  1062         return causal_mask

TypeError: AttentionMaskConverter._unmask_unattended() missing 1 required positional argument: 'unmasked_value'

Any thoughts on this?

Cohere For AI org

Does this piece of code work with the Llama models?

I'm going to close this for now. Will have to do more testing as I am experiencing multiple issues:

  • flash attention does not work
  • using the model name alone in pipeline() does not work; the tokenizer has to be provided manually
  • CUDA errors

Will report back, but testing is difficult with models this size (loading onto the GPUs alone already takes 20 minutes or so).

BramVanroy changed discussion status to closed
Cohere For AI org

Hi @BramVanroy, there is a small bug that affects import checks for custom models (trust_remote_code=True). We removed the flash attention imports for now; can you try with torch.backends.cuda.enable_flash_sdp(True)?
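
For reference, a minimal sketch of that workaround (enable_flash_sdp is a standard PyTorch toggle; the pipeline setup is the one from the snippet above):

import torch

# Allow F.scaled_dot_product_attention to dispatch to the flash attention
# kernel where the hardware and dtype support it.
torch.backends.cuda.enable_flash_sdp(True)

# ...then construct and call the pipeline exactly as in the snippet above.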

Coming back to the original issue @sarahooker @ahmetustun : the error message was TypeError: AttentionMaskConverter._unmask_unattended() missing 1 required positional argument: 'unmasked_value'. It seems that the cohere modeling file in this repo relies on the current (unreleased) main-branch implementation of transformers. The required unmasked_value argument was removed in a commit two weeks ago (https://github.com/huggingface/transformers/commit/49204c1d37b807def930fe45f5f84abc370a7200), but that change is not part of any release yet. As such, the current model cannot be used with any official release of transformers. Perhaps that could be clarified in the README? It currently says pip install transformers, but that is not sufficient AFAICT: you have to install from source to get this to work.
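
For anyone running into the same thing: installing from source is a one-liner, e.g. pip install git+https://github.com/huggingface/transformers.git. A quick sanity check of what is installed (a sketch; the exact version string on your machine will differ):

import transformers

# A source install reports a ".dev0" suffix in its version string;
# a plain release from PyPI does not yet include the commit above.
print(transformers.__version__)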

BramVanroy changed discussion status to open
Cohere For AI org

This should have been resolved by the new release of transformers. Can you confirm, @BramVanroy?
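
For later readers, one way to confirm that the installed release includes the fix (a sketch; it inspects the private helper from the traceback above, so the module path may change in future versions):

import inspect
from transformers.modeling_attn_mask_utils import AttentionMaskConverter

# If 'unmasked_value' no longer appears in this signature, the installed
# transformers already contains the commit linked above and the TypeError
# should be gone.
print(inspect.signature(AttentionMaskConverter._unmask_unattended))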

I'm going to close this for now, but feel free to re-open if still an issue.

sarahooker changed discussion status to closed
