+---
+language:
+- en
+tags:
+- generated_from_trainer
+datasets:
+- glue
+metrics:
+- accuracy
+model-index:
+- name: yujiepan/bert-base-uncased-sst2-int8-unstructured80-30epoch
+  results:
+  - task:
+      name: Text Classification
+      type: text-classification
+    dataset:
+      name: GLUE SST2
+      type: glue
+      config: sst2
+      split: validation
+      args: sst2
+    metrics:
+    - name: Accuracy
+      type: accuracy
+      value: 0.9139908256880734
+---
+# Joint magnitude pruning, quantization and distillation on BERT-base/SST-2
+This model was fine-tuned on the GLUE SST-2 dataset while jointly applying unstructured magnitude pruning (80% target sparsity), INT8 quantization, and knowledge distillation from a BERT-large teacher.
+It achieves the following results on the SST-2 validation set:
+- Loss: 0.41159623861312866
+- Accuracy: 0.9139908256880734
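The reported accuracy can be cross-checked against the size of the evaluation split. Assuming the standard GLUE SST-2 validation split of 872 sentences, the value above corresponds to 797 correctly classified examples. The arithmetic is sketched below; the constant names are illustrative only.

```python
# Assumption: the GLUE SST-2 validation split contains 872 labeled sentences.
NUM_VALIDATION_EXAMPLES = 872
REPORTED_ACCURACY = 0.9139908256880734

# Recover the integer number of correct predictions implied by the accuracy.
num_correct = round(REPORTED_ACCURACY * NUM_VALIDATION_EXAMPLES)
print(num_correct, num_correct / NUM_VALIDATION_EXAMPLES)
```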
+## Setup
+```bash
+conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
+git clone https://github.com/yujiepan-work/optimum-intel.git
+cd optimum-intel
+git checkout -b "magnitude-pruning" 01927af543eaea8678671bf8f4eb78fdb29f8930
+pip install -e ".[openvino,nncf]"
+cd examples/openvino/text-classification/
+pip install -r requirements.txt
+pip install wandb  # optional
+```
+## NNCF config
+Create a JSON file with the following NNCF compression configuration:
+```json
+[
+    {
+        "algorithm": "quantization",
+        "preset": "mixed",
+        "overflow_fix": "disable",
+        "initializer": {
+            "range": {
+                "num_init_samples": 300,
+                "type": "mean_min_max"
+            },
+            "batchnorm_adaptation": {
+                "num_bn_adaptation_samples": 0
+            }
+        },
+        "scope_overrides": {
+            "activations": {
+                "{re}.*matmul_0": {
+                    "mode": "symmetric"
+                }
+            }
+        },
+        "ignored_scopes": [
+            "{re}.*Embeddings.*",
+            "{re}.*__add___[0-1]",
+            "{re}.*layer_norm_0",
+            "{re}.*matmul_1",
+            "{re}.*__truediv__*"
+        ]
+    },
+    {
+        "algorithm": "magnitude_sparsity",
+        "ignored_scopes": [
+            "{re}.*NNCFEmbedding.*",
+            "{re}.*LayerNorm.*",
+            "{re}.*pooler.*",
+            "{re}.*classifier.*"
+        ],
+        "sparsity_init": 0.0,
+        "params": {
+            "power": 3,
+            "schedule": "polynomial",
+            "sparsity_freeze_epoch": 10,
+            "sparsity_target": 0.8,
+            "sparsity_target_epoch": 9,
+            "steps_per_epoch": 2105,
+            "update_per_optimizer_step": true
+        }
+    }
+]
+```
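The `magnitude_sparsity` block ramps sparsity from `sparsity_init` (0.0) to `sparsity_target` (0.8) over `sparsity_target_epoch` (9) epochs with a polynomial schedule of `power` 3, updating on every optimizer step, and `sparsity_freeze_epoch` stops mask updates from epoch 10. The sketch below illustrates such a ramp under stated assumptions: it uses the form `target - (target - init) * (1 - progress) ** power`, which is one common polynomial schedule and not necessarily NNCF's exact formula, and it assumes `steps_per_epoch` = 2105 comes from the 67,349 SST-2 training examples at batch size 32 on one GPU. The helper `polynomial_sparsity` is hypothetical.

```python
import math

# Values copied from the NNCF config above.
SPARSITY_INIT = 0.0
SPARSITY_TARGET = 0.8
SPARSITY_TARGET_EPOCH = 9
POWER = 3

# Assumption: 67,349 SST-2 training examples, batch size 32, one GPU,
# no gradient accumulation -> ceil(67349 / 32) optimizer steps per epoch.
STEPS_PER_EPOCH = math.ceil(67_349 / 32)


def polynomial_sparsity(step: int) -> float:
    """Hypothetical helper: sparsity level after `step` optimizer steps.

    Uses target - (target - init) * (1 - progress) ** power, which rises
    quickly at first and flattens out as it approaches the target.
    """
    total_steps = SPARSITY_TARGET_EPOCH * STEPS_PER_EPOCH
    # Clamp progress to [0, 1] so sparsity stays at the target after epoch 9.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return SPARSITY_TARGET - (SPARSITY_TARGET - SPARSITY_INIT) * (1.0 - progress) ** POWER


for epoch in (0, 1, 3, 6, 9, 10):
    print(f"epoch {epoch:2d}: sparsity {polynomial_sparsity(epoch * STEPS_PER_EPOCH):.3f}")
```

With `power` 3, most of the pruning happens in the first few epochs, leaving the later epochs of the ramp for the network to recover accuracy at nearly final sparsity.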
+## Run
+Training runs on a single GPU.
+```bash
+NNCFCFG=/path/to/nncf/config
+python run_glue.py \
+--lr_scheduler_type cosine_with_restarts \
+--cosine_cycle_ratios 8,6,4,4,4,4 \
+--cosine_cycle_decays 1,1,1,1,1,1 \
+--save_best_model_after_epoch -1 \
+--save_best_model_after_sparsity 0.7999 \
+--model_name_or_path textattack/bert-base-uncased-SST-2 \
+--teacher_model_or_path yoshitomo-matsubara/bert-large-uncased-sst2 \
+--distillation_temperature 2 \
+--task_name sst2 \
+--nncf_compression_config $NNCFCFG \
+--distillation_weight 0.95 \
+--output_dir /tmp/bert-base-uncased-sst2-int8-unstructured80-30epoch \
+--run_name bert-base-uncased-sst2-int8-unstructured80-30epoch \
+--overwrite_output_dir \
+--do_train \
+--do_eval \
+--max_seq_length 128 \
+--per_device_train_batch_size 32 \
+--per_device_eval_batch_size 32 \
+--learning_rate 5e-05 \
+--optim adamw_torch \
+--num_train_epochs 30 \
+--logging_steps 1 \
+--evaluation_strategy steps \
+--eval_steps 250 \
+--save_strategy steps \
+--save_steps 250 \
+--save_total_limit 1 \
+--fp16 \
+--seed 1
+```
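The command distills from `yoshitomo-matsubara/bert-large-uncased-sst2` with `--distillation_temperature 2` and `--distillation_weight 0.95`. The exact loss used by the forked `run_glue.py` is not shown here, so the sketch below assumes the common Hinton-style formulation: the weight mixes a temperature-scaled KL divergence to the teacher's softened predictions with the ordinary cross-entropy on the gold label. The function `distillation_loss` is hypothetical and works on plain Python lists of logits so it runs without PyTorch.

```python
import math


def softmax(logits, temperature=1.0):
    # Subtract the max before exponentiating for numerical stability.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, weight=0.95):
    """Hypothetical Hinton-style KD loss for a single example.

    weight * T^2 * KL(teacher_T || student_T) + (1 - weight) * CE(student, label)
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # The T^2 factor keeps soft-target gradients on a similar scale as T changes.
    soft_loss = temperature ** 2 * kl
    hard_loss = -math.log(softmax(student_logits)[label])
    return weight * soft_loss + (1.0 - weight) * hard_loss


# Two-class SST-2 example (0 = negative, 1 = positive) with made-up logits.
print(distillation_loss(student_logits=[-1.0, 2.0], teacher_logits=[-2.5, 3.0], label=1))
```

With a weight of 0.95, the student is driven almost entirely by the teacher's soft predictions, and the gold labels contribute only 5% of the loss.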
+The best model is stored in the `best_model` folder.
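`--cosine_cycle_ratios` and `--cosine_cycle_decays` are options of the forked `run_glue.py`, not standard `transformers` arguments. The ratios 8,6,4,4,4,4 sum to 30, matching `--num_train_epochs 30`. The sketch below shows one plausible reading, which is an assumption: training is split into cycles proportional to the ratios, and within each cycle the learning rate follows a half-cosine from the peak (scaled by that cycle's decay) down to zero before restarting. With all decays equal to 1, every cycle restarts at the full 5e-05. The function `cyclic_cosine_lr` is hypothetical, and no warmup is modeled.

```python
import math

BASE_LR = 5e-05
CYCLE_RATIOS = [8, 6, 4, 4, 4, 4]
CYCLE_DECAYS = [1, 1, 1, 1, 1, 1]
NUM_EPOCHS = 30
STEPS_PER_EPOCH = 2105  # steps_per_epoch from the NNCF config


def cyclic_cosine_lr(step: int, total_steps: int) -> float:
    """Hypothetical sketch of a cosine schedule with restarts of unequal length."""
    total_ratio = sum(CYCLE_RATIOS)
    cycle_start = 0.0
    for ratio, decay in zip(CYCLE_RATIOS, CYCLE_DECAYS):
        # Each cycle gets a share of the total steps proportional to its ratio.
        cycle_len = total_steps * ratio / total_ratio
        if step < cycle_start + cycle_len:
            progress = (step - cycle_start) / cycle_len
            # Half-cosine from BASE_LR * decay down to 0 within the cycle.
            return BASE_LR * decay * 0.5 * (1.0 + math.cos(math.pi * progress))
        cycle_start += cycle_len
    return 0.0  # past the last cycle


total = NUM_EPOCHS * STEPS_PER_EPOCH
for epoch in (0, 4, 8, 11, 14, 29):
    print(f"epoch {epoch:2d}: lr {cyclic_cosine_lr(epoch * STEPS_PER_EPOCH, total):.2e}")
```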
+## Framework versions
+- Transformers 4.26.0
+- Pytorch 1.13.1+cu116
+- Datasets 2.8.0
+- Tokenizers 0.13.2
+
+For a full description of the environment, see `pip-requirements.txt` and `conda-requirements.txt`.
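To compare a local environment with the versions listed above, a small check using the standard-library `importlib.metadata` module can help. This is a sketch: the mapping from the names above to the distribution names `transformers`, `torch`, `datasets`, and `tokenizers` is an assumption, and only exact version matches count as correct.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card, keyed by (assumed) distribution name.
EXPECTED = {
    "transformers": "4.26.0",
    "torch": "1.13.1+cu116",
    "datasets": "2.8.0",
    "tokenizers": "0.13.2",
}


def version_mismatches(expected):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


for package, (wanted, installed) in version_mismatches(EXPECTED).items():
    print(f"{package}: expected {wanted}, found {installed or 'not installed'}")
```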