The do_lower_case should be 'true'
#17, opened by robin0307
In tokenizer_config.json, "do_lower_case" is set to false, but it should be true. With the current setting, English words that contain capital letters are tokenized as [UNK]:
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained('google-bert/bert-base-chinese')
>>> tokenizer.do_lower_case
False
>>> tokenizer.decode(tokenizer('My name is Robin')['input_ids'])
'[CLS] [UNK] name is [UNK] [SEP]'
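The output above suggests that the vocabulary holds the lowercase forms ("name" and "is" survive) but not the capitalized ones. Here is a minimal sketch of that mechanism in plain Python. `TOY_VOCAB` and `toy_tokenize` are hypothetical names invented for illustration; this is a whole-word lookup, not the real bert-base-chinese vocabulary or the WordPiece algorithm.

```python
# Toy illustration only: TOY_VOCAB and toy_tokenize are hypothetical and do
# not reproduce the real bert-base-chinese vocabulary or WordPiece splitting.
TOY_VOCAB = {"[CLS]", "[SEP]", "[UNK]", "my", "name", "is", "robin"}


def toy_tokenize(text, do_lower_case):
    """Look up each whitespace-separated word in a lowercase-only vocabulary."""
    if do_lower_case:
        # Lowercasing first lets capitalized words match lowercase entries.
        text = text.lower()
    # Any word missing from the vocabulary is replaced by the unknown token.
    words = [word if word in TOY_VOCAB else "[UNK]" for word in text.split()]
    return " ".join(["[CLS]", *words, "[SEP]"])


print(toy_tokenize("My name is Robin", do_lower_case=False))
# [CLS] [UNK] name is [UNK] [SEP]
print(toy_tokenize("My name is Robin", do_lower_case=True))
# [CLS] my name is robin [SEP]
```

With lowercasing off, the toy lookup reproduces the [UNK] pattern shown in the session above. For the real tokenizer, passing `do_lower_case=True` as a keyword argument to `AutoTokenizer.from_pretrained` may work as a local override until the config is changed, though this is an assumption that has not been checked against every transformers version.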