Update README.md
README.md
CHANGED
@@ -9,6 +9,7 @@ datasets:
 - allenai/c4
 language:
 - ja
+library_name: transformers
 ---
 
 # What’s this?
@@ -51,6 +52,10 @@ model = AutoModelForTokenClassification.from_pretrained(model_name)
 
 本家の DeBERTa V3 は大きな語彙数で学習されていることに特徴がありますが、反面埋め込み層のパラメータ数が大きくなりすぎる ([microsoft/deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base) モデルの場合で埋め込み層が全体の 54%) ことから、本モデルでは小さめの語彙数を採用しています。
 
+注意点として、 `xsmall` 、 `base` 、 `large` の 3 つのモデルのうち、前者二つは unigram アルゴリズムで学習しているが、 `large` モデルのみ BPE アルゴリズムで学習している。
+深い理由はなく、 `large` モデルのみ語彙サイズを増やすために独立して学習を行ったが、なぜか unigram アルゴリズムでの学習がうまくいかなかったことが原因である。
+原因の探究よりモデルの完成を優先して、 BPE アルゴリズムに切り替えた。
+
 ---
 The tokenizer is trained using [the method introduced by Kudo](https://qiita.com/taku910/items/fbaeab4684665952d5a9).
 
@@ -62,6 +67,10 @@ Key points include:
 
 Although the original DeBERTa V3 is characterized by a large vocabulary size, which can result in a significant increase in the number of parameters in the embedding layer (for the [microsoft/deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base) model, the embedding layer accounts for 54% of the total), this model adopts a smaller vocabulary size to address this.
 
+Note that, among the three models: xsmall, base, and large, the first two were trained using the unigram algorithm, while only the large model was trained using the BPE algorithm.
+The reason for this is simple: while the large model was independently trained to increase its vocabulary size, for some reason, training with the unigram algorithm was not successful.
+Thus, prioritizing the completion of the model over investigating the cause, we switched to the BPE algorithm.
+
 # Data
 | Dataset Name | Notes | File Size (with metadata) | Factor |
 | ------------- | ----- | ------------------------- | ---------- |
@@ -83,6 +92,7 @@ Although the original DeBERTa V3 is characterized by a large vocabulary size, wh
 - Training steps: 1,000,000
 - Warmup steps: 100,000
 - Precision: Mixed (fp16)
+- Vocabulary size: 32,000
 
 # Evaluation
 | Model | #params | JSTS | JNLI | JSQuAD | JCQA |
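The README text in this diff argues that a large vocabulary inflates the embedding layer and that a smaller vocabulary avoids this. The arithmetic behind that claim can be sketched as follows. This is a rough illustration, not the author's calculation: the hidden size (768), the roughly 128K vocabulary, and the roughly 86M backbone parameters are assumed figures for a base-sized model and do not appear in the diff. Only the 32,000 vocabulary size comes from the README. The sketch counts the token-embedding matrix alone, so it will not reproduce the quoted 54% exactly.

```python
def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Number of weights in a token-embedding matrix (vocab_size x hidden_size)."""
    return vocab_size * hidden_size


def embedding_share(vocab_size: int, hidden_size: int, backbone_params: int) -> float:
    """Fraction of all parameters that sit in the token-embedding matrix."""
    emb = embedding_params(vocab_size, hidden_size)
    return emb / (emb + backbone_params)


# Assumed values for illustration only (not stated in the diff).
HIDDEN_SIZE = 768          # assumed hidden size of a base-sized model
LARGE_VOCAB = 128_000      # assumed vocabulary size of the original DeBERTa V3
BACKBONE = 86_000_000      # assumed non-embedding parameter count

# Taken from the README line added in this commit.
SMALL_VOCAB = 32_000

large_share = embedding_share(LARGE_VOCAB, HIDDEN_SIZE, BACKBONE)
small_share = embedding_share(SMALL_VOCAB, HIDDEN_SIZE, BACKBONE)

print(f"embedding share with large vocabulary: {large_share:.1%}")
print(f"embedding share with small vocabulary: {small_share:.1%}")
```

Under these assumptions the embedding matrix takes up a little over half of the parameters with the large vocabulary and well under a third with 32,000 tokens, which is the trade-off the README describes.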