Tokenizer class issue in the config file
#4, opened by ydshieh (HF staff)
Hi @ankrgyl. Do you know why we have "tokenizer_class": "RobertaTokenizer" in the config file instead of LayoutLMTokenizer? Is RobertaTokenizer used in fine-tuning this downstream QA task?
Yes! It's forked from here: https://huggingface.co/microsoft/layoutlm-base-cased/blob/main/config.json
Thanks! There might be some reason why layoutlm-base-cased uses RobertaTokenizer but layoutlm-base-uncased doesn't specify a class (so it will use LayoutLMTokenizer). However, this question should be posted on those repos.
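For anyone reading later, here is a rough sketch of the behaviour described above: an explicit "tokenizer_class" entry in the config takes priority, and a default tied to the model type is used when the entry is missing. This is a simplified illustration, not the actual transformers lookup code. The helper resolve_tokenizer_class, the DEFAULT_TOKENIZER_BY_MODEL_TYPE table, and the two abridged config fragments are assumptions made up for the example; the real config files contain many more fields.

```python
import json

# Hypothetical default table for illustration only; the real library
# keeps its own, much larger mapping.
DEFAULT_TOKENIZER_BY_MODEL_TYPE = {
    "layoutlm": "LayoutLMTokenizer",
    "roberta": "RobertaTokenizer",
}


def resolve_tokenizer_class(config: dict) -> str:
    """Return the tokenizer class name a config would select.

    An explicit "tokenizer_class" entry wins; otherwise fall back to
    the default associated with the config's "model_type".
    """
    explicit = config.get("tokenizer_class")
    if explicit:
        return explicit
    return DEFAULT_TOKENIZER_BY_MODEL_TYPE[config["model_type"]]


# Abridged stand-ins for the two config.json files discussed in this thread.
cased_config = json.loads(
    '{"model_type": "layoutlm", "tokenizer_class": "RobertaTokenizer"}'
)
uncased_config = json.loads('{"model_type": "layoutlm"}')

print(resolve_tokenizer_class(cased_config))
print(resolve_tokenizer_class(uncased_config))
```

With these stand-in configs, the first call picks up the explicit RobertaTokenizer entry and the second falls back to the LayoutLM default, which matches the difference between the cased and uncased repos noted above.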
Okay great, sounds good to me. If you discover anything super interesting, please update here :). I'll close this out for now.
ankrgyl changed discussion status to closed