Take input attention masks to support left-padded sequences
#1
by hiyouga · opened
The previous implementation does not accept attention masks as inputs, which causes unexpected behaviour in batched inference (which commonly uses left padding). I have therefore reimplemented the ALiBi encodings to take the attention masks from the user inputs. Note that this implementation largely depends on [1].
Of course, the above implementation requires re-computing the ALiBi tensors at every inference step; we cannot reuse cached tensors once the input attention masks are taken into account. The inference efficiency will therefore be slightly worse than the original version.
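For context, here is a minimal sketch of the general technique: computing per-token positions from the cumulative sum of the attention mask (as in the BLOOM-style `build_alibi_tensor`), so that left padding does not shift the relative distances. This is an illustration of the approach, not the exact code in this PR; the function name and tensor shapes are assumptions.

```python
import math
import torch

def build_alibi_tensor(attention_mask: torch.Tensor, num_heads: int, dtype: torch.dtype) -> torch.Tensor:
    """attention_mask: (batch, seq_len) with 1 for real tokens and 0 for padding."""
    batch_size, seq_len = attention_mask.shape
    # Per-head slopes: a geometric sequence, with extra slopes interleaved
    # when num_heads is not a power of two.
    closest_power_of_2 = 2 ** math.floor(math.log2(num_heads))
    base = torch.tensor(2 ** (-(2 ** -(math.log2(closest_power_of_2) - 3))), dtype=torch.float32)
    powers = torch.arange(1, 1 + closest_power_of_2, dtype=torch.int32)
    slopes = torch.pow(base, powers)
    if closest_power_of_2 != num_heads:
        extra_base = torch.tensor(2 ** (-(2 ** -(math.log2(2 * closest_power_of_2) - 3))), dtype=torch.float32)
        num_remaining = min(closest_power_of_2, num_heads - closest_power_of_2)
        extra_powers = torch.arange(1, 1 + 2 * num_remaining, 2, dtype=torch.int32)
        slopes = torch.cat([slopes, torch.pow(extra_base, extra_powers)], dim=0)
    # Positions are counted only over non-padded tokens, so left padding
    # does not change the relative distances between real tokens.
    arange_tensor = ((attention_mask.cumsum(dim=-1) - 1) * attention_mask)[:, None, :]
    alibi = slopes[None, :, None] * arange_tensor  # (batch, num_heads, seq_len)
    return alibi.reshape(batch_size * num_heads, 1, seq_len).to(dtype)
```

Because the result depends on the attention mask of the current batch, it has to be rebuilt per forward pass, which is the efficiency cost mentioned above.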
Could the ALiBi bias be fused with the expanded mask, so that the causal mask no longer needs to be handled separately? After all, the ALiBi mask is lower-triangular, just like the causal mask.
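As a rough illustration of the fusion being asked about (not code from this PR): since both the ALiBi bias and the expanded causal/padding mask are additive terms on the attention scores, they could in principle be folded into one bias tensor. The helper name `fuse_alibi_and_mask` and the shapes below are assumptions.

```python
import torch

def fuse_alibi_and_mask(alibi: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """alibi: (batch * num_heads, 1, seq_len); attention_mask: (batch, seq_len)."""
    batch_size, seq_len = attention_mask.shape
    num_heads = alibi.shape[0] // batch_size
    # Causal mask: True where a query position may not attend to a key position.
    causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    # Padding mask: True at padded key positions.
    padding = attention_mask[:, None, None, :] == 0
    disallowed = causal[None, None, :, :] | padding  # (batch, 1, seq_len, seq_len)
    # Broadcast the ALiBi bias over query positions, then blank out disallowed pairs.
    bias = alibi.view(batch_size, num_heads, 1, seq_len).expand(-1, -1, seq_len, -1).clone()
    bias = bias.masked_fill(disallowed, torch.finfo(bias.dtype).min)
    return bias  # add directly to the attention scores before softmax
```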
hiyouga changed pull request status to closed