TypeError: AttentionMaskConverter._unmask_unattended() missing 1 required positional argument: 'unmasked_value'
Trying to use this model as part of a pipeline (on 4x A100 80GB):
```python
import torch
from transformers import pipeline, AutoTokenizer, BitsAndBytesConfig

# Quantize to 8-bit so the model fits across the available GPUs
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model_kwargs = {
    "quantization_config": bnb_config,
}

tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01")
pipe = pipeline(
    "conversational",
    model="CohereForAI/c4ai-command-r-v01",
    tokenizer=tokenizer,
    model_kwargs=model_kwargs,
    trust_remote_code=True,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

messages = [{"role": "user", "content": "Hello, how are you?"}]
response = pipe(messages)
print(response)
```
But getting this error in the forward pass:
```
/dodrio/scratch/projects/2023_071/cache/huggingface/modules/transformers_modules/CohereForAI/c4ai-command-r-v01/2a6d259c29bd319c3bdb8dd88b8d59b8c303c318/modeling_cohere.py:1060 in _update_causal_mask

  1057 │     # Attend to all tokens in fully masked rows in the ca…
  1058 │     # using left padding. This is required by F.scaled_do…
  1059 │     # Details: https://github.com/pytorch/pytorch/issues/…
❱ 1060 │     causal_mask = AttentionMaskConverter._unmask_unattend…
  1061 │
  1062 │     return causal_mask

TypeError: AttentionMaskConverter._unmask_unattended() missing 1 required positional argument: 'unmasked_value'
```
Any thoughts on this?
Does this piece of code work with the Llama models?
I'm going to close this for now. I will have to do more testing, as I am experiencing multiple issues:

- flash attention does not work
- using the model name without a tokenizer in `pipeline` does not work; the tokenizer has to be provided manually
- CUDA errors

I'll report back, but testing is difficult with models this size (loading onto the GPU alone already takes 20 minutes or so).
Hi @BramVanroy, there is a small bug that affects import checks for custom models (`trust_remote_code=True`). We removed the flash attention imports for now; can you try with `torch.backends.cuda.enable_flash_sdp(True)`?
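For anyone following along, a minimal sketch of where that call would sit, assuming it runs before the pipeline is built. The extra backend toggles are my own addition (standard PyTorch switches, not something suggested in this thread) and just make it easier to confirm the flash kernel is actually the one being used:

```python
import torch

# Prefer the flash kernel inside F.scaled_dot_product_attention
torch.backends.cuda.enable_flash_sdp(True)

# Assumption: optionally disable the other SDP backends so only the flash
# kernel is eligible; if it can't be used, SDPA will then raise instead of
# silently falling back, which makes debugging easier.
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_math_sdp(False)
```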
Coming back to the original issue @sarahooker @ahmetustun: the error message was `TypeError: AttentionMaskConverter._unmask_unattended() missing 1 required positional argument: 'unmasked_value'`. It seems that the Cohere modeling file in this repo relies on the current (unreleased) master implementation of `transformers`. The required `unmasked_value` argument was removed in a commit two weeks ago (https://github.com/huggingface/transformers/commit/49204c1d37b807def930fe45f5f84abc370a7200), but that change is not part of any release yet. As such, the current model cannot be used with any official release of `transformers`. Perhaps that could be clarified in the README? It currently says `pip install transformers`, but that is not sufficient AFAICT; you should install from source to get this to work.
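As a quick sanity check for which signature an installed release exposes, something along these lines should work (a diagnostic sketch; it assumes `AttentionMaskConverter` is importable from `transformers.modeling_attn_mask_utils`, which is where recent versions keep it):

```python
import inspect

import transformers
from transformers.modeling_attn_mask_utils import AttentionMaskConverter

print(transformers.__version__)
# Releases predating commit 49204c1d list 'unmasked_value' here; versions
# that include it no longer do, which is what the remote Cohere code expects.
print(inspect.signature(AttentionMaskConverter._unmask_unattended))
```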
This should have been resolved by the new release of `transformers`. Can you confirm, @BramVanroy?
I'm going to close this for now, but feel free to re-open if still an issue.