The attention mask is
modified to mask the current token (except at the first position), because its query and key would be equal (and so
very similar to each other).
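
As a rough illustration of the masking rule described above, the sketch below builds a boolean mask in plain Python. It assumes a causal (left-to-right) setting, which the text does not state explicitly, and the function name `self_masked_causal_mask` is hypothetical, not part of any library.

```python
def self_masked_causal_mask(seq_len):
    """Return a seq_len x seq_len mask; True means position i may attend to j.

    Assumption (not stated in the source): attention is causal, so a
    position never attends to later positions.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            allowed = j <= i  # causal: no looking ahead
            if i == j and i != 0:
                # Hide the current token from itself; its query and key are equal.
                allowed = False
            # Position 0 keeps itself, since it has nothing else to attend to.
            row.append(allowed)
        mask.append(row)
    return mask


if __name__ == "__main__":
    for row in self_masked_causal_mask(4):
        print("".join("1" if allowed else "0" for allowed in row))
    # prints:
    # 1000
    # 1000
    # 1100
    # 1110
```

Only the first position keeps its diagonal entry; every other position attends to strictly earlier tokens.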