TypeError: AttentionMaskConverter._unmask_unattended() missing 1 required positional argument: 'unmasked_value'
Trying to use this model as part of a pipeline (on 4x A100 80GB):
```python
import torch
from transformers import pipeline, AutoTokenizer, BitsAndBytesConfig

# Quantize to 8-bit so the model fits across the available GPUs
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model_kwargs = {
    "quantization_config": bnb_config,
}

tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01")
pipe = pipeline(
    "conversational",
    model="CohereForAI/c4ai-command-r-v01",
    tokenizer=tokenizer,
    model_kwargs=model_kwargs,
    trust_remote_code=True,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

messages = [{"role": "user", "content": "Hello, how are you?"}]
response = pipe(messages)
print(response)
```
But getting this error in the forward pass:
```
/dodrio/scratch/projects/2023_071/cache/huggingface/modules/transformers_modules/CohereForAI/c4ai-command-r-v01/2a6d259c29bd319c3bdb8dd88b8d59b8c303c318/modeling_cohere.py:1060 in _update_causal_mask

  1057 │     # Attend to all tokens in fully masked rows in the ca…
  1058 │     # using left padding. This is required by F.scaled_do…
  1059 │     # Details: https://github.com/pytorch/pytorch/issues/…
❱ 1060 │     causal_mask = AttentionMaskConverter._unmask_unattend…
  1061 │
  1062 │     return causal_mask

TypeError: AttentionMaskConverter._unmask_unattended() missing 1 required positional argument: 'unmasked_value'
```
Any thoughts on this?
Does this piece of code work with the Llama models?
I'm going to close this for now. I will have to do more testing, as I am experiencing multiple issues:

- flash attention does not work
- using the model name without a tokenizer in `pipeline` does not work; the tokenizer has to be provided manually
- CUDA errors

I'll report back, but testing is difficult with models this size (loading onto the GPU alone already takes 20 minutes or so).
Hi @BramVanroy, there is a small bug that affects import checks for custom models (`trust_remote_code=True`). We removed the flash attention imports for now; can you try with `torch.backends.cuda.enable_flash_sdp(True)`?
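For anyone following along, a minimal sketch of where that call would sit, assuming it runs before the pipeline is built. The extra backend toggles are my own addition (standard PyTorch switches, not something suggested in this thread) and just make it easier to confirm the flash kernel is actually the one being used:

```python
import torch

# Prefer the flash kernel inside F.scaled_dot_product_attention
torch.backends.cuda.enable_flash_sdp(True)

# Assumption: optionally disable the other SDP backends so only the flash
# kernel is eligible; if it can't be used, SDPA will then raise instead of
# silently falling back, which makes debugging easier.
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_math_sdp(False)
```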
Coming back to the original issue @sarahooker @ahmetustun: the error message was `TypeError: AttentionMaskConverter._unmask_unattended() missing 1 required positional argument: 'unmasked_value'`. It seems that the Cohere modeling file in this repo relies on the current (unreleased) master implementation of `transformers`. The required `unmasked_value` argument was removed in a commit two weeks ago (https://github.com/huggingface/transformers/commit/49204c1d37b807def930fe45f5f84abc370a7200), but that change is not part of any release yet. As such, the current model cannot be used with any official release of `transformers`. Perhaps that could be clarified in the README? It currently says `pip install transformers`, but that is not sufficient AFAICT; you should install from source to get this to work.
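As a quick sanity check for which signature an installed release exposes, something along these lines should work (a diagnostic sketch; it assumes `AttentionMaskConverter` is importable from `transformers.modeling_attn_mask_utils`, which is where recent versions keep it):

```python
import inspect

import transformers
from transformers.modeling_attn_mask_utils import AttentionMaskConverter

print(transformers.__version__)
# Releases predating commit 49204c1d list 'unmasked_value' here; versions
# that include it no longer do, which is what the remote Cohere code expects.
print(inspect.signature(AttentionMaskConverter._unmask_unattended))
```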
This should have been resolved by the new release of `transformers`. Can you confirm, @BramVanroy?
I'm going to close this for now, but feel free to re-open if still an issue.