Add XLM-R tokenizer files #2
opened by jvamvas

No description provided.
jvamvas changed pull request status to open
Perfect! Verified this works:
In [1]: from transformers import AutoTokenizer
In [2]: tokenizer = AutoTokenizer.from_pretrained("facebook/xmod-base", revision="refs/pr/2")
Downloading (…)okenizer_config.json: 100%|██████████| 48.0/48.0 [00:00<00:00, 10.9kB/s]
Downloading (…)r%2F2/tokenizer.json: 100%|██████████| 9.10M/9.10M [00:00<00:00, 14.1MB/s]
In [3]: tokenizer
Out[3]:
XLMRobertaTokenizerFast(name_or_path='facebook/xmod-base', vocab_size=250002, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'sep_token': '</s>', 'pad_token': '<pad>', 'cls_token': '<s>', 'mask_token': '<mask>'}, clean_up_tokenization_spaces=True), added_tokens_decoder={
0: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
1: AddedToken("<pad>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
3: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
250001: AddedToken("<mask>", rstrip=False, lstrip=True, single_word=False, normalized=False, special=True),
}
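For context, a minimal usage sketch of the tokenizer loaded above; the sample sentence and the printed token list are illustrative assumptions, not output from this PR:

from transformers import AutoTokenizer

# Load the tokenizer from the PR revision; once merged, the revision
# argument can be dropped.
tokenizer = AutoTokenizer.from_pretrained("facebook/xmod-base", revision="refs/pr/2")

# Encode an illustrative sentence. The special tokens listed above
# (<s> and </s>) are added around the SentencePiece subwords automatically.
ids = tokenizer("Hello world!")["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))  # e.g. ['<s>', '▁Hello', '▁world', '!', '</s>']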
lysandre changed pull request status to merged