Not passing attention_mask in model.generate
Hi, I wonder why there's no need to pass the attention_mask (the commented-out line below) to model.generate during inference. Thanks!
outputs = self.model.generate(
    input_ids=model_inputs['input_ids'],
    pixel_values=model_inputs['pixel_values'],
    # attention_mask=model_inputs['attention_mask'],
    max_new_tokens=100,
    early_stopping=False,
    do_sample=False,
)
Hi, the Florence-2 language model is encoder-decoder, and the attention_mask for the inputs is all ones.
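That holds for a single, unpadded prompt. A minimal sketch, assuming the standard Florence-2-base processor usage from the model card (the example image URL and the <CAPTION> task prompt are just illustrative choices), showing that the processor's attention_mask is all ones in that case:

import requests
from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("microsoft/Florence-2-base", trust_remote_code=True)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Single prompt, no padding: every position is a real token, so the mask is all ones
inputs = processor(text="<CAPTION>", images=image, return_tensors="pt")
print(inputs["attention_mask"])  # e.g. tensor([[1, 1, 1, ..., 1]])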
But wouldn't we want an attention mask for padded tokens? We don't want to attend over padded tokens in the encoder, or am I misunderstanding?
I've also noticed an issue with padding-token attention: model accuracy differed between single-sample and batch inference, because the attention mask for pad tokens is 1 when doing batch inference. I made a small code change to the Florence-2-base code that allows passing the text attention mask to the model: https://huggingface.co/microsoft/Florence-2-base/discussions/17
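For reference, a hedged sketch of what batched inference could look like once the text attention mask is honored. This is not the exact patch from discussions/17; it assumes the patched microsoft/Florence-2-base code forwards attention_mask to the text encoder and that the processor supports padding=True for prompts of different lengths:

import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model = AutoModelForCausalLM.from_pretrained("microsoft/Florence-2-base", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("microsoft/Florence-2-base", trust_remote_code=True)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Batched prompts of different lengths: the shorter one gets padded, and the
# processor's attention_mask contains zeros at the padded positions.
batch = processor(
    text=["<CAPTION>", "<DETAILED_CAPTION>"],
    images=[image, image],
    return_tensors="pt",
    padding=True,
)

outputs = model.generate(
    input_ids=batch["input_ids"],
    pixel_values=batch["pixel_values"],
    attention_mask=batch["attention_mask"],  # zeros mark pad tokens the encoder should ignore
    max_new_tokens=100,
    do_sample=False,
)
print(processor.batch_decode(outputs, skip_special_tokens=True))

With the stock modeling code, per this thread, the pad positions are effectively treated as ones, which is why single-sample and batched results can diverge.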