Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
raw
history blame contribute delete
508 Bytes
This attention mask is in the dictionary returned
by the tokenizer under the key "attention_mask":
thon
padded_sequences["attention_mask"]
[[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
autoencoding models
See encoder models and masked language modeling
autoregressive models
See causal language modeling and decoder models
B
backbone
The backbone is the network (embeddings and layers) that outputs the raw hidden states or features.