do_lower_case should be true

#17
by robin0307 - opened

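In tokenizer_config.json, "do_lower_case" is set to false, but it should be true.
With the current setting, words that contain uppercase letters ("My", "Robin") are tokenized as [UNK], while the lowercase words come through intact: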

>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained('google-bert/bert-base-chinese')
>>> tokenizer.do_lower_case
False
>>> tokenizer.decode(tokenizer('My name is Robin')['input_ids'])
'[CLS] [UNK] name is [UNK] [SEP]'
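
The output above is what you would expect if the vocabulary holds only lowercase Latin-script entries and the input is not lowercased before lookup. Here is a toy sketch of that mechanism. It uses plain whole-word lookup against a made-up vocabulary (`TOY_VOCAB` and `toy_tokenize` are hypothetical names, not part of transformers), so it is neither the real WordPiece algorithm nor the real bert-base-chinese vocabulary:

```python
# Toy illustration only: whole-word lookup against a tiny, made-up
# lowercase vocabulary. Not the real WordPiece algorithm and not the
# real bert-base-chinese vocabulary.
TOY_VOCAB = {"[UNK]", "my", "name", "is", "robin"}


def toy_tokenize(text, do_lower_case):
    """Split on whitespace and map out-of-vocabulary words to [UNK]."""
    tokens = []
    for word in text.split():
        if do_lower_case:
            word = word.lower()
        tokens.append(word if word in TOY_VOCAB else "[UNK]")
    return tokens


cased = toy_tokenize("My name is Robin", do_lower_case=False)
lowered = toy_tokenize("My name is Robin", do_lower_case=True)
print(cased)    # ['[UNK]', 'name', 'is', '[UNK]']
print(lowered)  # ['my', 'name', 'is', 'robin']
```

Until the config is changed, passing `do_lower_case=True` as a keyword argument to `AutoTokenizer.from_pretrained` may work as a local override. That is an assumption and has not been verified against this checkpoint.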
