Added truncation for long sequences
Browse files- BertForPrefixMarking.py +1 -1
BertForPrefixMarking.py
CHANGED
@@ -159,7 +159,7 @@ class BertForPrefixMarking(BertPreTrainedModel):
|
|
159 |
|
160 |
|
161 |
def encode_sentences_for_bert_for_prefix_marking(tokenizer: BertTokenizerFast, sentences: List[str], padding='longest'):
|
162 |
-
inputs = tokenizer(sentences, padding=padding, return_tensors='pt')
|
163 |
|
164 |
# create our prefix_id_options array which will be like the input ids shape but with an additional
|
165 |
# dimension containing for each prefix whether it can be for that word
|
|
|
159 |
|
160 |
|
161 |
def encode_sentences_for_bert_for_prefix_marking(tokenizer: BertTokenizerFast, sentences: List[str], padding='longest'):
|
162 |
+
inputs = tokenizer(sentences, padding=padding, truncation=True, return_tensors='pt')
|
163 |
|
164 |
# create our prefix_id_options array which will be like the input ids shape but with an additional
|
165 |
# dimension containing for each prefix whether it can be for that word
|