finetune committed
Commit e5beb19 • 1 Parent(s): 75d4e5d

Add meta data

Files changed (1): README.md (+9 −0)

README.md CHANGED
@@ -1,3 +1,12 @@
+ ---
+ license: gpl-2.0
+ language:
+ - en
+ - ja
+ tags:
+ - tokenizer
+ - novelai
+ ---
  # Tokenizer

  Finetune here to talk a bit about [NovelAI](https://novelai.net/)'s new tokenizer that I worked on. First, a quick reminder. In most cases, our models don't see words as individual letters. Instead, text is broken down into tokens, which are words or word fragments. For example, the sentence “`The quick brown fox jumps over the goblin.`” would tokenize as “`The| quick| brown| fox| jumps| over| the| go|bl|in.`” in the Pile tokenizer used by GPT-NeoX 20B and Krake, with each | signifying a boundary between tokens.
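The fragmentation above can be sketched in miniature. The following is a simplified illustration, not the actual Pile tokenizer (which is a learned BPE model): it greedily matches the longest piece found in a hypothetical toy vocabulary, so a word absent from the vocabulary, like "goblin", splits into fragments just as in the example sentence.

```python
def greedy_tokenize(text, vocab):
    """Greedy longest-match subword tokenization over a fixed vocabulary.

    Real BPE tokenizers instead apply merge rules learned from corpus
    statistics, but the end result is similar: in-vocabulary words stay
    whole, rare words break into smaller pieces.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate piece first, shrinking until one matches.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # No piece matched: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens

# Toy vocabulary (hypothetical): "goblin" is missing, so it fragments
# into " go", "bl", "in" just like in the Pile tokenizer example.
vocab = {"The", " quick", " brown", " fox", " jumps", " over", " the",
         " go", "bl", "in", "."}
print(greedy_tokenize("The quick brown fox jumps over the goblin.", vocab))
# -> ['The', ' quick', ' brown', ' fox', ' jumps', ' over', ' the',
#     ' go', 'bl', 'in', '.']
```

Note how the leading space is part of each mid-sentence token; most GPT-style tokenizers encode whitespace this way rather than as separate tokens.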