Tokenizer class issue in the config file

by ydshieh HF staff - opened Sep 26, 2022

Sep 26, 2022

edited Sep 26, 2022

Hi @ankrgyl. Do you know why we have
"tokenizer_class": "RobertaTokenizer",
in the config file instead of LayoutLMTokenizer? Was RobertaTokenizer used when fine-tuning for this downstream QA task?

ankrgyl

Impira org Sep 26, 2022

Yes! It's forked from here: https://huggingface.co/microsoft/layoutlm-base-cased/blob/main/config.json

ydshieh

Sep 26, 2022

Thanks! There might be some reason why layoutlm-base-cased uses RobertaTokenizer while layoutlm-base-uncased doesn't specify a class (so it will use LayoutLMTokenizer). However, this question should be posted on those repos.
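
For anyone who wants to check this difference themselves, here is a minimal sketch of reading the declared tokenizer class out of a config. The two JSON strings are trimmed, illustrative stand-ins for the real config.json files (not copies of them), and `declared_tokenizer_class`, `effective_tokenizer_name`, and the `DEFAULT_TOKENIZER` table are hypothetical helpers that only mirror the fallback described in this thread, not the actual transformers lookup code.

```python
import json
from typing import Optional

# Assumption taken from this thread: a LayoutLM config with no explicit
# "tokenizer_class" ends up with LayoutLMTokenizer.
DEFAULT_TOKENIZER = {"layoutlm": "LayoutLMTokenizer"}


def declared_tokenizer_class(config_text: str) -> Optional[str]:
    """Return the "tokenizer_class" value from a config.json string, or None if absent."""
    return json.loads(config_text).get("tokenizer_class")


def effective_tokenizer_name(config_text: str) -> Optional[str]:
    """Return the declared class if present, else the assumed default for the model type."""
    config = json.loads(config_text)
    declared = config.get("tokenizer_class")
    if declared is not None:
        return declared
    return DEFAULT_TOKENIZER.get(config.get("model_type"))


# Illustrative, trimmed configs (hypothetical contents, not the real files).
cased_like = '{"model_type": "layoutlm", "tokenizer_class": "RobertaTokenizer"}'
uncased_like = '{"model_type": "layoutlm"}'

print(declared_tokenizer_class(cased_like))
print(declared_tokenizer_class(uncased_like))
print(effective_tokenizer_name(uncased_like))
```

To run the same check against a real checkpoint, download its config.json from the repo and pass the file contents to these helpers.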

ankrgyl

Impira org Sep 26, 2022

Okay great, sounds good to me. If you discover anything super interesting, please update here :). I'll close this out for now.

ankrgyl changed discussion status to closed Sep 26, 2022
