update readme
README.md
CHANGED
@@ -20,15 +20,25 @@ inference: false
![rinna-icon](./rinna.png)

# Overview
-This repository provides a Japanese GPT-NeoX model of 3.6 billion parameters.

-
-

-
-The model was trained on around **312.5B** tokens from [Japanese CC-100](http://data.statmt.org/cc-100/ja.txt.xz), [Japanese C4](https://huggingface.co/datasets/mc4), and [Japanese Wikipedia](https://dumps.wikimedia.org/other/cirrussearch) to optimize a traditional language modelling objective.

-A

# How to use the model

@@ -89,9 +99,5 @@ The model uses a [sentencepiece](https://github.com/google/sentencepiece)-based
# 'გამარ[UNK]ობა 吾輩は 猫である </s>'
~~~

-# Authors
-* [Tianyu Zhao](https://huggingface.co/tianyuz)
-* [Kei Sawada](https://huggingface.co/keisawada)
-
# License
[The MIT license](https://opensource.org/licenses/MIT)

![rinna-icon](./rinna.png)

# Overview
+This repository provides a Japanese GPT-NeoX model of 3.6 billion parameters.

+* **Library**
+
+The model was trained using code based on [EleutherAI/gpt-neox](https://github.com/EleutherAI/gpt-neox).

+* **Model architecture**

+A 36-layer, 2816-hidden-size transformer-based language model.
+
+* **Pre-training**
+
+The model was trained on around **312.5B** tokens from [Japanese CC-100](http://data.statmt.org/cc-100/ja.txt.xz), [Japanese C4](https://huggingface.co/datasets/mc4), and [Japanese Wikipedia](https://dumps.wikimedia.org/other/cirrussearch) to optimize a traditional language modelling objective.
+
+A final validation perplexity of **8.68** has been reached.
+
+* **Authors**
+
+[Tianyu Zhao](https://huggingface.co/tianyuz) and [Kei Sawada](https://huggingface.co/keisawada)

# How to use the model

# 'გამარ[UNK]ობა 吾輩は 猫である </s>'
~~~

# License
[The MIT license](https://opensource.org/licenses/MIT)
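
The added Pre-training text quotes a final validation perplexity of 8.68. As a rough aid for comparing that figure with training logs, the sketch below converts it to a mean per-token cross-entropy loss. It assumes the usual convention that perplexity is the exponential of the mean natural-log loss per token; the README does not state the convention, and the helper names `perplexity_to_loss` and `loss_to_perplexity` are hypothetical, not part of any library.

```python
import math


def perplexity_to_loss(perplexity: float) -> float:
    """Mean per-token cross-entropy in nats implied by a perplexity.

    Assumes perplexity = exp(mean negative log-likelihood per token).
    """
    if perplexity <= 0:
        raise ValueError("perplexity must be positive")
    return math.log(perplexity)


def loss_to_perplexity(loss_nats: float) -> float:
    """Inverse of perplexity_to_loss under the same assumption."""
    return math.exp(loss_nats)


# Figure quoted in the README text added by this commit.
VALIDATION_PERPLEXITY = 8.68

loss_nats = perplexity_to_loss(VALIDATION_PERPLEXITY)
loss_bits = loss_nats / math.log(2)  # same quantity expressed in bits

print(f"loss: {loss_nats:.3f} nats/token ({loss_bits:.3f} bits/token)")
```

Under that assumption, a perplexity of 8.68 corresponds to a loss of roughly 2.16 nats per token.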