ilos-vigil/bigbird-small-indonesian-nli

@@ -21,4 +21,159 @@ widget:
 # Indonesian small BigBird model NLI
-This commit contains model weights from epoch 6, which has the lowest loss/highest accuracy.

+## Source Code
+The source code used to create this model and run the benchmarks is available at [https://github.com/ilos-vigil/bigbird-small-indonesian](https://github.com/ilos-vigil/bigbird-small-indonesian).
+## Model Description
+This model is based on [bigbird-small-indonesian](https://huggingface.co/ilos-vigil/bigbird-small-indonesian) and was fine-tuned on two NLI datasets. It is intended for zero-shot text classification.
+## How to use
+> Inference for the ZSC (zero-shot classification) task
+```py
+>>> from transformers import pipeline
+>>> pipe = pipeline(
+...     task='zero-shot-classification',
+...     model='ilos-vigil/bigbird-small-indonesian-nli'
+... )
+>>> pipe(
+...     sequences='Fakta nomor 7 akan membuat ada terkejut',
+...     candidate_labels=['clickbait', 'bukan clickbait'],
+...     hypothesis_template='Judul video ini {}.',
+...     multi_label=False
+... )
+{
+ 'sequence': 'Fakta nomor 7 akan membuat ada terkejut',
+ 'labels': ['clickbait', 'bukan clickbait'],
+ 'scores': [0.6102734804153442, 0.38972654938697815]
+}
+>>> pipe(
+...     sequences='Samsung tuntut balik Apple dengan alasan hak paten teknologi.',
+...     candidate_labels=['teknologi', 'olahraga', 'bisnis', 'politik', 'kesehatan', 'kuliner'],
+...     hypothesis_template='Kategori berita ini adalah {}.',
+...     multi_label=True
+... )
+{
+ 'sequence': 'Samsung tuntut balik Apple dengan alasan hak paten teknologi.',
+ 'labels': ['politik', 'teknologi', 'kesehatan', 'bisnis', 'olahraga', 'kuliner'],
+ 'scores': [0.7390161752700806, 0.6657379269599915, 0.4459509551525116, 0.38407933712005615, 0.3679264783859253, 0.14181996881961823]
+}
+```
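The two examples above behave differently because of `multi_label`: with `multi_label=False` the scores sum to 1, while with `multi_label=True` they do not. As far as we understand the `transformers` zero-shot pipeline, it pairs the input with one hypothesis per label (`hypothesis_template.format(label)`), runs the NLI model on every pair, and then converts the logits roughly as sketched below. This is a pure-Python illustration, not the library's code; the helper names (`build_pairs`, `zsc_scores`), the logits, and the class order are assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def build_pairs(sequence, candidate_labels, hypothesis_template):
    """Build one (premise, hypothesis) pair per candidate label."""
    return [(sequence, hypothesis_template.format(label)) for label in candidate_labels]


# Assumed class order [entailment, neutral, contradiction], taken from the
# order of the NLI example output in this README.
ENTAILMENT, CONTRADICTION = 0, 2


def zsc_scores(nli_logits, multi_label):
    """Turn one row of NLI logits per candidate label into one score per label."""
    if multi_label:
        # Each label is scored on its own: softmax over (contradiction, entailment),
        # keeping the entailment probability. Scores need not sum to 1.
        return [softmax([row[CONTRADICTION], row[ENTAILMENT]])[1] for row in nli_logits]
    # Single-label: softmax of the entailment logits across all labels, so scores sum to 1.
    return softmax([row[ENTAILMENT] for row in nli_logits])


pairs = build_pairs(
    'Fakta nomor 7 akan membuat ada terkejut',
    ['clickbait', 'bukan clickbait'],
    'Judul video ini {}.',
)
print(pairs[1][1])  # Judul video ini bukan clickbait.

# Hypothetical logits for illustration only; NOT produced by this model.
logits = [[1.2, 0.1, -0.8], [0.5, 0.3, 0.2]]
print(zsc_scores(logits, multi_label=False))
print(zsc_scores(logits, multi_label=True))
```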
+> Inference for NLI (Natural Language Inference) task
+```py
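+>>> from transformers import pipeline
+>>> pipe = pipeline(
+...     task='text-classification',
+...     model='ilos-vigil/bigbird-small-indonesian-nli',
+...     return_all_scores=True  # deprecated in newer transformers releases; top_k=None replaces it
+... )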
+>>> pipe({
+...     'text': 'Nasi adalah makanan pokok.',  # Premise
+...     'text_pair': 'Saya mau makan nasi goreng.'  # Hypothesis
+... })
+[
+ {'label': 'entailment', 'score': 0.25495028495788574},
+ {'label': 'neutral', 'score': 0.40920916199684143},
+ {'label': 'contradiction', 'score': 0.33584052324295044}
+]
+>>> pipe({
+...     'text': 'Python sering digunakan untuk web development dan AI research.',
+...     'text_pair': 'AI research biasanya tidak menggunakan bahasa pemrograman Python.'
+... })
+[
+ {'label': 'entailment', 'score': 0.12508109211921692},
+ {'label': 'neutral', 'score': 0.22146646678447723},
+ {'label': 'contradiction', 'score': 0.653452455997467}
+]
+```
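With `return_all_scores=True`, the pipeline returns one probability per NLI class. A small helper to pick the predicted class could look like the sketch below; `top_label` is hypothetical (not part of `transformers`), and the input is the first output above, copied verbatim.

```python
# First NLI output above, copied verbatim
result = [
    {'label': 'entailment', 'score': 0.25495028495788574},
    {'label': 'neutral', 'score': 0.40920916199684143},
    {'label': 'contradiction', 'score': 0.33584052324295044},
]


def top_label(all_scores):
    """Return the label of the highest-scoring class."""
    return max(all_scores, key=lambda item: item['score'])['label']


print(top_label(result))  # neutral
# The probabilities come from a softmax over the three classes, so they sum to ~1.
print(sum(item['score'] for item in result))
```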
+## Limitation and bias
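+This model inherits the limitations and biases of its parent model and of the two datasets used for fine-tuning. Like most language models, it is also sensitive to changes in the input. In the example below, only the hypothesis template changes, yet the score for `kuliner` drops from 0.77 to 0.60.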
+```py
+>>> from transformers import pipeline
+>>> pipe = pipeline(
+...     task='zero-shot-classification',
+...     model='ilos-vigil/bigbird-small-indonesian-nli'
+... )
+>>> text = 'Resep sate ayam enak dan mudah.'
+>>> candidate_labels = ['kuliner', 'olahraga']
+>>> pipe(
+...     sequences=text,
+...     candidate_labels=candidate_labels,
+...     hypothesis_template='Kategori judul artikel ini adalah {}.',
+...     multi_label=False
+... )
+{
+ 'sequence': 'Resep sate ayam enak dan mudah.',
+ 'labels': ['kuliner', 'olahraga'],
+ 'scores': [0.7711364030838013, 0.22886358201503754]
+}
+>>> pipe(
+...     sequences=text,
+...     candidate_labels=candidate_labels,
+...     hypothesis_template='Kelas kalimat ini {}.',
+...     multi_label=False
+... )
+{
+ 'sequence': 'Resep sate ayam enak dan mudah.',
+ 'labels': ['kuliner', 'olahraga'],
+ 'scores': [0.7043636441230774, 0.295636385679245]
+}
+>>> pipe(
+...     sequences=text,
+...     candidate_labels=candidate_labels,
+...     hypothesis_template='{}.',
+...     multi_label=False
+... )
+{
+ 'sequence': 'Resep sate ayam enak dan mudah.',
+ 'labels': ['kuliner', 'olahraga'],
+ 'scores': [0.5986711382865906, 0.4013288915157318]
+}
+```
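Each template changes the hypothesis sentence the model is actually asked to verify. The sketch below assumes the pipeline fills the template with `str.format` for each candidate label; it lists the resulting hypotheses and measures how far the `kuliner` score moved, using scores copied from the three outputs above.

```python
text = 'Resep sate ayam enak dan mudah.'
candidate_labels = ['kuliner', 'olahraga']
templates = ['Kategori judul artikel ini adalah {}.', 'Kelas kalimat ini {}.', '{}.']

# 'kuliner' scores copied from the three outputs above, in the same order
kuliner_scores = [0.7711364030838013, 0.7043636441230774, 0.5986711382865906]

for template, score in zip(templates, kuliner_scores):
    # Hypotheses paired with `text` as the premise, one per candidate label
    hypotheses = [template.format(label) for label in candidate_labels]
    print(hypotheses, round(score, 4))

spread = max(kuliner_scores) - min(kuliner_scores)
print(round(spread, 2))  # 0.17
```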
+## Training, evaluation and testing data
+This model was fine-tuned on [IndoNLI](https://huggingface.co/datasets/indonli) and [multilingual-NLI-26lang-2mil7](https://huggingface.co/datasets/MoritzLaurer/multilingual-NLI-26lang-2mil7). Although the `multilingual-NLI-26lang-2mil7` dataset is machine-translated, adding it slightly improves results on the NLI benchmark and substantially improves results on the ZSC benchmark. Both the evaluation and test data come solely from the IndoNLI dataset.
+## Training Procedure
+The model was fine-tuned on a single RTX 3060 for 16 epochs (28,832 steps) with an accumulated batch size of 64. Training used the AdamW optimizer with a learning rate of 1e-4, weight decay of 0.05, learning-rate warmup over the first 6% of steps (1,730 steps), and linear decay of the learning rate afterwards. Note that although the weights from epoch 9 have the lowest loss/highest accuracy, they perform slightly worse on the ZSC benchmark. More details are available in the TensorBoard training logs.
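
The schedule described above can be sketched in plain Python. This assumes the warmup is linear (the text only says "warmup") and that the decay reaches zero at the final step, in the style of `transformers`' `get_linear_schedule_with_warmup`; `lr_at` is a hypothetical helper, not part of the actual training code.

```python
TOTAL_STEPS = 28832
EPOCHS = 16
PEAK_LR = 1e-4
WARMUP_STEPS = round(0.06 * TOTAL_STEPS)  # 6% of 28,832 steps -> 1,730 steps
STEPS_PER_EPOCH = TOTAL_STEPS // EPOCHS   # 1,802 optimizer steps per epoch


def lr_at(step):
    """Learning rate at an optimizer step (assumed linear warmup, then linear decay to 0)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, remaining)


for step in (0, 865, WARMUP_STEPS, 15000, TOTAL_STEPS):
    print(step, f'{lr_at(step):.2e}')
```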
+## Benchmark as NLI model
+Both benchmarks also report results from two other models for comparison. Additional benchmarks on the IndoNLI dataset are available in its paper, [IndoNLI: A Natural Language Inference Dataset for Indonesian](https://aclanthology.org/2021.emnlp-main.821/).
+| Model                                      | bigbird-small-indonesian-nli | xlm-roberta-large-xnli | mDeBERTa-v3-base-xnli-multilingual-nli-2mil7 |
+| ------------------------------------------ | ---------------------------- | ---------------------- | -------------------------------------------- |
+| Parameter count                            | 30.6M                        | 559.9M                 | 278.8M                                       |
+| Multilingual                               |                              | V                      | V                                            |
+| Finetuned on IndoNLI                       | V                            |                        | V                                            |
+| Finetuned on multilingual-NLI-26lang-2mil7 | V                            |                        |                                              |
+| Test (Lay)                                 | 0.6888                       | 0.2226                 | 0.8151                                       |
+| Test (Expert)                              | 0.5734                       | 0.3505                 | 0.7775                                       |
+## Benchmark as ZSC model
+The [Indonesian-Twitter-Emotion-Dataset](https://github.com/meisaputri21/Indonesian-Twitter-Emotion-Dataset/) is used for the ZSC benchmark. The benchmark covers four configurations, combining two settings (multi-label on/off and hypothesis template on/off), and each setting affects each model's performance differently. The hypothesis templates are `Kalimat ini mengekspresikan perasaan {}.` (with template) and `{}.` (without template). Note that the F1 score only considers the label with the highest probability. Italics mark each model's best configuration, and bold italics mark the best result overall.
+| Model                                        | Multi-label | Use template | F1 Score     |
+| -------------------------------------------- | ----------- | ------------ | ------------ |
+| mDeBERTa-v3-base-xnli-multilingual-nli-2mil7 | V           | V            | 0.3574       |
+|                                              | V           |              | 0.3654       |
+|                                              |             | V            | 0.3985       |
+|                                              |             |              | _0.4160_     |
+| xlm-roberta-large-xnli                       | V           | V            | _**0.6292**_ |
+|                                              | V           |              | 0.5596       |
+|                                              |             | V            | 0.5737       |
+|                                              |             |              | 0.5433       |
+| bigbird-small-indonesian-nli                 | V           | V            | 0.5324       |
+|                                              | V           |              | _0.5499_     |
+|                                              |             | V            | 0.5269       |
+|                                              |             |              | 0.5228       |
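
The F1 measurement above uses only the top-1 label per example. A minimal sketch of that idea follows; it assumes macro averaging over labels (the README does not state the averaging method), the helper names `top1` and `macro_f1` are hypothetical, and the toy Indonesian emotion labels and predictions are made up for illustration, not taken from the dataset.

```python
def top1(result):
    """Label with the highest probability in a zero-shot pipeline result dict."""
    _, best_label = max(zip(result['scores'], result['labels']))
    return best_label


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


# Toy data for illustration only (hypothetical labels and predictions)
results = [
    {'labels': ['senang', 'sedih', 'marah'], 'scores': [0.6, 0.3, 0.1]},
    {'labels': ['senang', 'sedih', 'marah'], 'scores': [0.5, 0.4, 0.1]},
    {'labels': ['marah', 'senang', 'sedih'], 'scores': [0.7, 0.2, 0.1]},
    {'labels': ['sedih', 'senang', 'marah'], 'scores': [0.5, 0.3, 0.2]},
]
y_true = ['senang', 'sedih', 'marah', 'senang']
y_pred = [top1(r) for r in results]
print(y_pred)
print(macro_f1(y_true, y_pred))  # 0.5
```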

pytorch_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:99d7283876c3bfeeb0248e9b29019683c61e8852e325ded3937f8cba2c4d115c
+oid sha256:7dd660ec1ad44f03e6b89f7601c445e24b9a8905863185b183591c21c3773412
 size 122439617

runs/sanitzed_log/events.out.tfevents.0 ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:51a519f546ec054b68522c20514f856fd3d560d7330699c5de4e1ade098eb864
+size 93238