Not passing attention_mask in model.generate

#25 · opened by hcy511

Hi, I wonder why there is no need to pass the attention_mask (the commented-out line below) to model.generate during inference. Thanks!

outputs = self.model.generate(
    input_ids=model_inputs['input_ids'],
    pixel_values=model_inputs['pixel_values'],
    # attention_mask=model_inputs['attention_mask'],
    max_new_tokens=100,
    early_stopping=False,
    do_sample=False,
)

Microsoft org

Hi, the Florence-2 language model is encoder-decoder, and the attention_mask for the inputs is all ones.
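For a single prompt this is easy to check with the processor (a minimal sketch; the image path and task prompt are placeholders):

from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("microsoft/Florence-2-base", trust_remote_code=True)

image = Image.open("example.jpg")          # placeholder image
model_inputs = processor(text="<CAPTION>", images=image, return_tensors="pt")

# With a single prompt there is no padding, so the text attention_mask is
# all ones and omitting it from generate() changes nothing.
print(model_inputs["attention_mask"])      # e.g. tensor([[1, 1, 1, 1, 1, 1, 1]])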

But wouldn't we want an attention mask for padded tokens? We don't want to attend over padded tokens in the encoder, or am I misunderstanding?

I've also noticed an issue with padding-token attention: model accuracy differed between single-sample and batched inference, because the attention mask for pad tokens is set to 1 when doing batch inference. I made a small change to the Florence-2-base code that allows passing the text attention mask to the model: https://huggingface.co/microsoft/Florence-2-base/discussions/17
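For reference, a minimal sketch of what batched inference with an explicit text attention mask could look like, assuming modeling code that accepts attention_mask in generate (as in the change linked above); the image paths and task prompts are illustrative:

from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-base"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

images = [Image.open("a.jpg"), Image.open("b.jpg")]   # placeholder images
prompts = ["<CAPTION>", "<DETAILED_CAPTION>"]          # prompts of different token lengths

# padding=True pads the shorter prompt, so its attention_mask row has trailing zeros.
model_inputs = processor(text=prompts, images=images, return_tensors="pt", padding=True)

outputs = model.generate(
    input_ids=model_inputs["input_ids"],
    pixel_values=model_inputs["pixel_values"],
    attention_mask=model_inputs["attention_mask"],  # mask out the pad tokens in the encoder
    max_new_tokens=100,
    do_sample=False,
)
print(processor.batch_decode(outputs, skip_special_tokens=False))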
