sho-takase committed
Commit 4327e37 · 1 Parent(s): ab033e8

Fix readme
README.md CHANGED

@@ -52,7 +52,7 @@ for t in text:
 Our training corpus consists of the Japanese portions of publicly available corpus such as C4, CC-100, and Oscar.
 We also incorporated the Web texts crawled by in-house system.
 The total size of our training corpus is about 650 GB.
-The trained model achieves 8.57 perplexity on the internal validation sets of Japanese C4
+The trained model achieves 8.57 perplexity on the internal validation sets of Japanese C4.
 
 ## Tokenization
 We use a sentencepiece tokenizer with a unigram language model and byte-fallback.
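The README passage in this hunk describes a sentencepiece tokenizer with a unigram language model and byte-fallback. As a minimal illustrative sketch (not part of this commit or repo), the snippet below shows how a tokenizer with those two properties could be trained and applied using the `sentencepiece` library; the corpus path, model prefix, and vocabulary size are placeholder assumptions, not values from the README.

```python
# Sketch only: trains a unigram sentencepiece model with byte-fallback,
# matching the properties the README describes. Paths and vocab_size are
# placeholders, not taken from this repository.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="corpus.txt",        # placeholder path to plain-text training data
    model_prefix="tokenizer",  # writes tokenizer.model and tokenizer.vocab
    model_type="unigram",      # unigram language model segmentation
    byte_fallback=True,        # unseen characters decompose into byte tokens
    vocab_size=8000,           # placeholder; the actual size is not stated here
)

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
# With byte-fallback, characters outside the vocabulary are encoded as raw
# byte pieces rather than collapsing to a single <unk> token.
print(sp.encode("こんにちは、世界！", out_type=str))
```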