ilos-vigil committed on
Commit e159f47 · 1 Parent(s): b93f6a6
Add README.md, model weight and Tensorboard log
- README.md +156 -1
- pytorch_model.bin +1 -1
- runs/sanitzed_log/events.out.tfevents.0 +3 -0
README.md
CHANGED
@@ -21,4 +21,159 @@ widget:
# Indonesian small BigBird model NLI

## Source Code

The source code used to create this model and run the benchmarks is available at [https://github.com/ilos-vigil/bigbird-small-indonesian](https://github.com/ilos-vigil/bigbird-small-indonesian).

## Model Description

This model is based on [bigbird-small-indonesian](https://huggingface.co/ilos-vigil/bigbird-small-indonesian) and was fine-tuned on 2 datasets. It is intended to be used for zero-shot text classification.

## How to use

> Inference for ZSC (Zero Shot Classification) task

```py
>>> from transformers import pipeline
>>> pipe = pipeline(
...     task='zero-shot-classification',
...     model='./tmp/checkpoint-28832'  # local fine-tuned checkpoint
... )
>>> # Single-label: candidate labels compete, so the scores sum to 1
>>> pipe(
...     sequences='Fakta nomor 7 akan membuat ada terkejut',  # "Fact number 7 will surprise you"
...     candidate_labels=['clickbait', 'bukan clickbait'],  # 'bukan clickbait' = "not clickbait"
...     hypothesis_template='Judul video ini {}.',  # "This video title is {}."
...     multi_label=False
... )
{
    'sequence': 'Fakta nomor 7 akan membuat ada terkejut',
    'labels': ['clickbait', 'bukan clickbait'],
    'scores': [0.6102734804153442, 0.38972654938697815]
}
>>> # Multi-label: each label is scored independently, so the scores need not sum to 1
>>> pipe(
...     sequences='Samsung tuntut balik Apple dengan alasan hak paten teknologi.',  # "Samsung countersues Apple over technology patents."
...     candidate_labels=['teknologi', 'olahraga', 'bisnis', 'politik', 'kesehatan', 'kuliner'],
...     hypothesis_template='Kategori berita ini adalah {}.',  # "The category of this news is {}."
...     multi_label=True
... )
{
    'sequence': 'Samsung tuntut balik Apple dengan alasan hak paten teknologi.',
    'labels': ['politik', 'teknologi', 'kesehatan', 'bisnis', 'olahraga', 'kuliner'],
    'scores': [0.7390161752700806, 0.6657379269599915, 0.4459509551525116, 0.38407933712005615, 0.3679264783859253, 0.14181996881961823]
}
```
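
The zero-shot pipeline turns each candidate label into an NLI hypothesis and reuses the model's entailment output as the label score. The sketch below illustrates that scoring step only, as I understand the `transformers` zero-shot pipeline; it does not call the model. The function name `zero_shot_scores` and the logit values are hypothetical, and the logit order (entailment, neutral, contradiction) is assumed from the NLI output shown in this README.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_scores(nli_logits, multi_label):
    """Rank candidate labels from per-label NLI logits.

    nli_logits maps each label to hypothetical
    (entailment, neutral, contradiction) logits for its hypothesis.
    """
    labels = list(nli_logits)
    if multi_label:
        # Each label is judged on its own: entailment vs. contradiction.
        scores = [
            softmax([nli_logits[label][0], nli_logits[label][2]])[0]
            for label in labels
        ]
    else:
        # Labels compete: softmax over the entailment logits of all labels.
        scores = softmax([nli_logits[label][0] for label in labels])
    return sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)


# Made-up logits for two labels, for illustration only.
example_logits = {
    'teknologi': (2.0, 0.0, -1.0),
    'olahraga': (0.5, 0.0, 0.5),
}
print(zero_shot_scores(example_logits, multi_label=False))
print(zero_shot_scores(example_logits, multi_label=True))
```

This is why the `multi_label=True` scores in the example above add up to more than 1, while the `multi_label=False` scores form a probability distribution over the labels.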

> Inference for NLI (Natural Language Inference) task

```py
>>> from transformers import pipeline
>>> pipe = pipeline(
...     task='text-classification',
...     model='./tmp/checkpoint-28832',
...     return_all_scores=True  # newer transformers releases deprecate this in favor of top_k=None
... )
>>> pipe({
...     'text': 'Nasi adalah makanan pokok.',  # Premise: "Rice is a staple food."
...     'text_pair': 'Saya mau makan nasi goreng.'  # Hypothesis: "I want to eat fried rice."
... })
[
    {'label': 'entailment', 'score': 0.25495028495788574},
    {'label': 'neutral', 'score': 0.40920916199684143},
    {'label': 'contradiction', 'score': 0.33584052324295044}
]
>>> pipe({
...     # Premise: "Python is often used for web development and AI research."
...     'text': 'Python sering digunakan untuk web development dan AI research.',
...     # Hypothesis: "AI research usually does not use the Python programming language."
...     'text_pair': 'AI research biasanya tidak menggunakan bahasa pemrograman Python.'
... })
[
    {'label': 'entailment', 'score': 0.12508109211921692},
    {'label': 'neutral', 'score': 0.22146646678447723},
    {'label': 'contradiction', 'score': 0.653452455997467}
]
```

## Limitation and bias

This model inherits limitations and biases from its parent model and from the 2 datasets used for fine-tuning. Like most language models, it is also sensitive to changes in the input. In the example below, only the hypothesis template changes, yet the scores shift noticeably.

```py
>>> from transformers import pipeline
>>> pipe = pipeline(
...     task='zero-shot-classification',
...     model='./tmp/checkpoint-28832'
... )
>>> text = 'Resep sate ayam enak dan mudah.'  # "Tasty and easy chicken satay recipe."
>>> candidate_labels = ['kuliner', 'olahraga']  # "culinary", "sports"
>>> pipe(
...     sequences=text,
...     candidate_labels=candidate_labels,
...     hypothesis_template='Kategori judul artikel ini adalah {}.',  # "The category of this article title is {}."
...     multi_label=False
... )
{
    'sequence': 'Resep sate ayam enak dan mudah.',
    'labels': ['kuliner', 'olahraga'],
    'scores': [0.7711364030838013, 0.22886358201503754]
}
>>> pipe(
...     sequences=text,
...     candidate_labels=candidate_labels,
...     hypothesis_template='Kelas kalimat ini {}.',  # "The class of this sentence is {}."
...     multi_label=False
... )
{
    'sequence': 'Resep sate ayam enak dan mudah.',
    'labels': ['kuliner', 'olahraga'],
    'scores': [0.7043636441230774, 0.295636385679245]
}
>>> pipe(
...     sequences=text,
...     candidate_labels=candidate_labels,
...     hypothesis_template='{}.',  # the bare label, with no surrounding sentence
...     multi_label=False
... )
{
    'sequence': 'Resep sate ayam enak dan mudah.',
    'labels': ['kuliner', 'olahraga'],
    'scores': [0.5986711382865906, 0.4013288915157318]
}
```

## Training, evaluation and testing data

This model was fine-tuned on [IndoNLI](https://huggingface.co/datasets/indonli) and [multilingual-NLI-26lang-2mil7](https://huggingface.co/datasets/MoritzLaurer/multilingual-NLI-26lang-2mil7). Although the `multilingual-NLI-26lang-2mil7` dataset is machine-translated, it slightly improves the NLI benchmark results and substantially improves the ZSC benchmark results. Both the evaluation and testing data come only from the IndoNLI dataset.

## Training Procedure

The model was fine-tuned on a single RTX 3060 for 16 epochs (28,832 steps) with an accumulated batch size of 64. The AdamW optimizer was used with a learning rate of 1e-4, a weight decay of 0.05, learning rate warmup over the first 6% of steps (1,730 steps) and linear decay of the learning rate afterwards. Note that although the model weights at epoch 9 have the lowest loss and highest accuracy, they perform slightly worse on the ZSC benchmark. Additional information is available in the TensorBoard training logs.
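
To make the learning rate schedule concrete, here is a minimal sketch of linear warmup followed by linear decay, using the step counts and peak learning rate stated above. The function name `lr_at` is hypothetical, and the exact formula is an assumption modelled on the common linear schedule with warmup; the repository's training script may differ in details.

```python
TOTAL_STEPS = 28832                        # 16 epochs, from the text above
WARMUP_STEPS = round(0.06 * TOTAL_STEPS)   # first 6% of steps, i.e. 1730
PEAK_LR = 1e-4


def lr_at(step):
    """Learning rate at a given optimizer step (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Linear decay from the peak down to 0 at the final step.
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 865, 1730, 15000, 28832):
    print(step, lr_at(step))
```

The learning rate peaks at step 1730 and reaches zero at step 28832, so the epoch 9 checkpoint mentioned above was saved while the rate was still decaying.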

## Benchmark as NLI model

Both benchmarks include results from 2 other models for comparison. Additional benchmarks on the IndoNLI dataset are available in its paper, [IndoNLI: A Natural Language Inference Dataset for Indonesian](https://aclanthology.org/2021.emnlp-main.821/).

| Model                                      | bigbird-small-indonesian-nli | xlm-roberta-large-xnli | mDeBERTa-v3-base-xnli-multilingual-nli-2mil7 |
| ------------------------------------------ | ---------------------------- | ---------------------- | -------------------------------------------- |
| Parameter count                            | 30.6M                        | 559.9M                 | 278.8M                                       |
| Multilingual                               |                              | V                      | V                                            |
| Finetuned on IndoNLI                       | V                            |                        | V                                            |
| Finetuned on multilingual-NLI-26lang-2mil7 | V                            |                        |                                              |
| Test (Lay)                                 | 0.6888                       | 0.2226                 | 0.8151                                       |
| Test (Expert)                              | 0.5734                       | 0.3505                 | 0.7775                                       |

## Benchmark as ZSC model

[Indonesian-Twitter-Emotion-Dataset](https://github.com/meisaputri21/Indonesian-Twitter-Emotion-Dataset/) is used for the ZSC benchmark. The benchmark covers 4 combinations of two parameters (multi-label and hypothesis template), which affect the performance of each model differently. The hypothesis template is either `Kalimat ini mengekspresikan perasaan {}.` ("This sentence expresses the feeling of {}.") or `{}.`. Note that the F1 score only takes the label with the highest probability into account.

| Model                                        | Multi-label | Use template | F1 Score     |
| -------------------------------------------- | ----------- | ------------ | ------------ |
| mDeBERTa-v3-base-xnli-multilingual-nli-2mil7 | V           | V            | 0.3574       |
|                                              | V           |              | 0.3654       |
|                                              |             | V            | 0.3985       |
|                                              |             |              | _0.4160_     |
| xlm-roberta-large-xnli                       | V           | V            | _**0.6292**_ |
|                                              | V           |              | 0.5596       |
|                                              |             | V            | 0.5737       |
|                                              |             |              | 0.5433       |
| bigbird-small-indonesian-nli                 | V           | V            | 0.5324       |
|                                              | V           |              | _0.5499_     |
|                                              |             | V            | 0.5269       |
|                                              |             |              | 0.5228       |
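
The top-label F1 measurement described above can be sketched as follows. This is an illustration under stated assumptions, not the benchmark script from the repository: the helper names `top1_predictions` and `macro_f1` are hypothetical, the example labels and outputs are made up, and macro averaging is assumed because the README does not say which F1 average was used.

```python
def top1_predictions(pipeline_outputs):
    """Keep only the highest-scoring label of each zero-shot output.

    The pipeline returns labels sorted by descending score,
    so the first label is the top prediction.
    """
    return [output['labels'][0] for output in pipeline_outputs]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (averaging mode is an assumption)."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_per_class = []
    for cls in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1_per_class.append(2 * tp / denom if denom else 0.0)
    return sum(f1_per_class) / len(f1_per_class)


# Made-up gold labels and pipeline-style outputs, for illustration only.
gold = ['senang', 'sedih', 'marah', 'senang']
outputs = [
    {'labels': ['senang', 'sedih', 'marah'], 'scores': [0.7, 0.2, 0.1]},
    {'labels': ['sedih', 'marah', 'senang'], 'scores': [0.6, 0.3, 0.1]},
    {'labels': ['senang', 'marah', 'sedih'], 'scores': [0.5, 0.4, 0.1]},
    {'labels': ['senang', 'sedih', 'marah'], 'scores': [0.8, 0.1, 0.1]},
]
print(macro_f1(gold, top1_predictions(outputs)))
```

Because only the first label is kept, `multi_label` changes the result only when it changes which label ranks highest.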
pytorch_model.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:7dd660ec1ad44f03e6b89f7601c445e24b9a8905863185b183591c21c3773412
 size 122439617
runs/sanitzed_log/events.out.tfevents.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:51a519f546ec054b68522c20514f856fd3d560d7330699c5de4e1ade098eb864
+size 93238