vocab.txt is missing

#18
by trinisim - opened

Even though vocab.txt is missing, I can still load the fast tokenizer:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

But if I try to load the non-fast (slow) tokenizer:

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2", use_fast=False)

I see the following error:

File "/sgreene/python/transformers/tokenization_utils_base.py", line 2048, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sgreene/python/transformers/transformers/tokenization_utils_base.py", line 2287, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgreene/python/transformers/models/bert/tokenization_bert.py", line 199, in __init__
    if not os.path.isfile(vocab_file):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen genericpath>", line 30, in isfile
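My reading of the truncated traceback, as a minimal stdlib sketch (this is a guess: I'm assuming the resolved `vocab_file` ends up as `None` because vocab.txt is absent from the repo, which would make `os.path.isfile` raise a `TypeError` from `os.stat` inside `<frozen genericpath>`, matching the last frame above):

```python
import os

# Hypothetical reproduction of the failing check in tokenization_bert.py:
# if vocab.txt cannot be resolved, vocab_file may be None, and
# os.path.isfile(None) raises TypeError (os.stat rejects NoneType),
# which is not caught by isfile's except (OSError, ValueError) clause.
vocab_file = None  # assumed value when vocab.txt is missing from the repo

try:
    os.path.isfile(vocab_file)
except TypeError as exc:
    print(f"TypeError: {exc}")
```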

I am unable to use the fast tokenizer in my project, so I need the slow tokenizer to work.
