position IDs
Unlike RNNs, which process tokens one at a time and therefore capture each token's position implicitly, transformers
have no built-in notion of token order.
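
To make this concrete, here is a minimal NumPy sketch of a single self-attention layer with no positional signal. It is an illustration under stated assumptions, not the code of any particular model: the function `self_attention`, the random weights `w_q`, `w_k`, `w_v`, and the toy sizes are all hypothetical. It checks that shuffling the input tokens only shuffles the output rows in the same way, which suggests the layer by itself cannot tell which token came first.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention without any positional input."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v

# Toy inputs: 5 "tokens" with 8-dimensional embeddings (assumed sizes).
rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

# Reorder the tokens and run the same layer again.
perm = np.array([3, 0, 4, 1, 2])
out = self_attention(x, w_q, w_k, w_v)
out_shuffled = self_attention(x[perm], w_q, w_k, w_v)

# Each token gets the same output vector regardless of where it sits in the sequence.
order_invisible = np.allclose(out[perm], out_shuffled)
print(order_invisible)
```

Because each token's output does not depend on its index in the sequence, any sensitivity to word order has to come from position information supplied to the model separately.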