This attention mask is in the dictionary returned by the tokenizer under the key "attention_mask":

```python
padded_sequences["attention_mask"]
[[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
```
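For context, here is a minimal sketch of how such a padded batch and its attention mask might be produced, assuming a bert-base-cased tokenizer and two illustrative sentences (the checkpoint and sentences are examples, not taken from the original):

```python
from transformers import AutoTokenizer

# Checkpoint and sentences are illustrative assumptions.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

sequence_a = "This is a short sequence."
sequence_b = "This is a rather long sequence. It is at least longer than sequence A."

# Padding the batch to the longest sequence also produces the attention mask,
# with 1s for real tokens and 0s for padding tokens.
padded_sequences = tokenizer([sequence_a, sequence_b], padding=True)

print(padded_sequences["attention_mask"])
```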
autoencoding models

See encoder models and masked language modeling

autoregressive models

See causal language modeling and decoder models

B

backbone

The backbone is the network (embeddings and layers) that outputs the raw hidden states or features.
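As an illustration, a minimal sketch of loading a bare backbone and inspecting its raw hidden states, assuming the bert-base-cased checkpoint (chosen only for this example):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# AutoModel loads the backbone without any task-specific head,
# so its output is the raw hidden states. Checkpoint is illustrative.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
backbone = AutoModel.from_pretrained("bert-base-cased")

inputs = tokenizer("Hello world", return_tensors="pt")
with torch.no_grad():
    outputs = backbone(**inputs)

# One hidden state vector per token: (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```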