---
license: gpl-2.0
language:
- en
- ja
tags:
- tokenizer
- novelai
- sentencepiece
---

# NovelAI Tokenizer v1

This repository is exactly the same as [NovelAI/nerdstash-tokenizer-v1](https://huggingface.co/NovelAI/nerdstash-tokenizer-v1), but the config has been changed to address the following points (the SentencePiece model itself is unchanged):

- Load as `T5Tokenizer`
- Enable decoding of digits (in the original, digits are registered as `additional_special_tokens`, so decoding with `skip_special_tokens=True` skips the digits as well)

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mkshing/novelai-tokenizer-v1", use_fast=False)

text = "1+1=3"
tokenizer.decode(tokenizer.encode(text), skip_special_tokens=True)
# '1+1=3'
```