|
--- |
|
license: gpl-2.0 |
|
language: |
|
- en |
|
- ja |
|
tags: |
|
- tokenizer |
|
- novelai |
|
- sentencepiece |
|
--- |
|
|
|
# NovelAI Tokenizer v1 |
|
This repository is exactly the same as [NovelAI/nerdstash-tokenizer-v1](https://huggingface.co/NovelAI/nerdstash-tokenizer-v1), |
|
but its config has been changed to address the following points (the SentencePiece model itself is unchanged). |
|
|
|
- Load as `T5Tokenizer` |

- Allow digits to be decoded (in the original, digits are registered as `additional_special_tokens`, so they are also skipped when decoding with `skip_special_tokens=True`) |
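
The digit-skipping issue can be illustrated with a minimal sketch. This is not the actual `transformers` decode implementation, just a toy filter showing why registering digits as special tokens makes them disappear from the decoded string:

```python
# Toy illustration (not the real transformers code path):
# if digits are registered as special tokens, skip_special_tokens=True
# filters them out of the decoded output along with <pad>, </s>, etc.
special_tokens = {"<pad>", "</s>", "1", "2", "3"}  # digits as specials, as in the original config
tokens = ["1", "+", "1", "=", "3"]                 # decoded pieces for "1+1=3"
decoded = "".join(t for t in tokens if t not in special_tokens)
print(decoded)  # '+=' -- the digits are gone
```

With the updated config, digits are ordinary vocabulary tokens, so they survive decoding.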
|
|
|
```python |
from transformers import AutoTokenizer |

tokenizer = AutoTokenizer.from_pretrained("mkshing/novelai-tokenizer-v1", use_fast=False) |

text = "1+1=3" |
tokenizer.decode(tokenizer.encode(text), skip_special_tokens=True) |
# '1+1=3' |
``` |
|
|