Add metadata
README.md
CHANGED
@@ -1,3 +1,12 @@
+---
+license: gpl-2.0
+language:
+- en
+- ja
+tags:
+- tokenizer
+- novelai
+---
 # Tokenizer
 
 Finetune here to talk a bit about [NovelAI](https://novelai.net/)'s new tokenizer that I worked on. First, a quick reminder: in most cases, our models don't see words as individual letters. Instead, text is broken down into tokens, which are words or word fragments. For example, the sentence `The quick brown fox jumps over the goblin.` would tokenize as `The| quick| brown| fox| jumps| over| the| go|bl|in.` in the Pile tokenizer used by GPT-NeoX 20B and Krake, with each | signifying a boundary between tokens.
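The token-boundary example in the README can be reproduced programmatically. Below is a minimal sketch, assuming the Hugging Face `transformers` library is installed and that the Pile tokenizer used by GPT-NeoX 20B is loaded from the `EleutherAI/gpt-neox-20b` checkpoint (that checkpoint id is an assumption on my part; the README does not name one). It decodes each token id separately to show where the boundaries fall.

```python
# Minimal sketch: inspect token boundaries with the Hugging Face
# "transformers" library. The checkpoint id below is an assumption;
# the README only refers to "the Pile tokenizer used by GPT-NeoX 20B".
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

text = "The quick brown fox jumps over the goblin."
token_ids = tokenizer.encode(text)

# Decode each id on its own so the per-token pieces are visible,
# then join them with "|" to mark the boundaries.
pieces = [tokenizer.decode([token_id]) for token_id in token_ids]
print("|".join(pieces))
# Per the README, this should read roughly:
# The| quick| brown| fox| jumps| over| the| go|bl|in.
```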