metadata

license: apache-2.0
language:
  - en
tags:
  - generated_from_trainer
datasets:
  - glue
metrics:
  - accuracy
model-index:
  - name: yujiepan/bert-base-uncased-sst2-int8-unstructured80
    results:
      - task:
          name: Text Classification
          type: text-classification
        dataset:
          name: GLUE SST2
          type: glue
          config: sst2
          split: validation
          args: sst2
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.91284
base_model:
  - google-bert/bert-base-uncased
base_model_relation: quantized

bert-base-uncased-sst2-unstructured80-int8-ov

Model creator: Google
Original model: google-bert/bert-base-uncased

Description

This model was produced by applying unstructured magnitude pruning, quantization, and distillation jointly to google-bert/bert-base-uncased while fine-tuning on the GLUE SST2 dataset. It achieves the following results on the evaluation set:

Torch accuracy: 0.9128
OpenVINO IR accuracy: 0.9128
Sparsity in transformer block linear layers: 0.80

The model was converted to the OpenVINO™ IR (Intermediate Representation) format with weights compressed to INT8 by NNCF.
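
To illustrate what unstructured magnitude pruning means for a single weight matrix, here is a minimal sketch in plain Python. The function names and the toy matrix are hypothetical and are not part of NNCF; NNCF applies the same idea to the linear layers during training rather than in one shot.

```python
def magnitude_prune(weights, sparsity):
    """Zero the `sparsity` fraction of entries with the smallest absolute value.

    Illustrative only: ties at the threshold are all pruned, so the resulting
    sparsity can slightly exceed the requested value.
    """
    flat = [abs(w) for row in weights for w in row]
    num_to_prune = int(len(flat) * sparsity)
    if num_to_prune == 0:
        return [list(row) for row in weights]
    # The threshold is the largest magnitude among the entries to be removed.
    threshold = sorted(flat)[num_to_prune - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


def sparsity_of(weights):
    """Fraction of entries that are exactly zero."""
    flat = [w for row in weights for w in row]
    return sum(1 for w in flat if w == 0.0) / len(flat)


# Hypothetical 2x5 "layer"; only the two largest-magnitude weights survive.
layer = [
    [0.9, -0.05, 0.3, -0.7, 0.02],
    [-0.1, 0.6, -0.01, 0.2, -0.4],
]
pruned = magnitude_prune(layer, sparsity=0.8)
print(pruned)
print(sparsity_of(pruned))
```

The pruning is "unstructured" because individual weights are removed wherever they fall, with no constraint on rows, columns, or blocks.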

Compatibility

The provided OpenVINO™ IR model is compatible with:

OpenVINO version 2024.3.0 and higher
Optimum Intel 1.19.0 and higher

Optimization Parameters

Optimization was performed using NNCF with the following nncf_config.json file:

[
    {
        "algorithm": "quantization",
        "preset": "mixed",
        "overflow_fix": "disable",
        "initializer": {
            "range": {
                "num_init_samples": 300,
                "type": "mean_min_max"
            },
            "batchnorm_adaptation": {
                "num_bn_adaptation_samples": 0
            }
        },
        "scope_overrides": {
            "activations": {
                "{re}.*matmul_0": {
                    "mode": "symmetric"
                }
            }
        },
        "ignored_scopes": [
            "{re}.*Embeddings.*",
            "{re}.*__add___[0-1]",
            "{re}.*layer_norm_0",
            "{re}.*matmul_1",
            "{re}.*__truediv__*"
        ]
    },
    {
        "algorithm": "magnitude_sparsity",
        "ignored_scopes": [
            "{re}.*NNCFEmbedding.*",
            "{re}.*LayerNorm.*",
            "{re}.*pooler.*",
            "{re}.*classifier.*"
        ],
        "sparsity_init": 0.0,
        "params": {
            "power": 3,
            "schedule": "polynomial",
            "sparsity_freeze_epoch": 10,
            "sparsity_target": 0.8,
            "sparsity_target_epoch": 9,
            "steps_per_epoch": 2105,
            "update_per_optimizer_step": true
        }
    }
]
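
The magnitude_sparsity parameters above describe how the sparsity level ramps from 0.0 to 0.8 over the first 9 epochs. The sketch below assumes the concave polynomial form, s = target + (init - target) * (1 - progress)^power, which is an assumption about the schedule shape; NNCF's exact implementation may differ in detail. The function name is hypothetical.

```python
def polynomial_sparsity(epoch, step=0, *, init=0.0, target=0.8,
                        target_epoch=9, power=3, steps_per_epoch=2105):
    """Sparsity level at a given epoch and optimizer step within that epoch.

    Assumes a concave polynomial ramp that reaches `target` at `target_epoch`
    and stays there afterwards.
    """
    progress = min((epoch + step / steps_per_epoch) / target_epoch, 1.0)
    return target + (init - target) * (1.0 - progress) ** power


# Sparsity at the start of selected epochs, using the values from the config.
for epoch in (0, 3, 6, 9, 12):
    print(epoch, round(polynomial_sparsity(epoch), 4))
```

With power 3 the ramp is steep early and flattens as it approaches the target, so most weights are removed in the first few epochs and the remaining epochs give the model time to recover accuracy.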

For more information on optimization, check the OpenVINO model optimization guide.
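
As a sanity check, the steps_per_epoch value of 2105 in the configuration is consistent with the per-device train batch size of 32 and the 17 epochs used in the training command. The arithmetic below assumes the GLUE SST2 training split has 67,349 examples and that training runs on a single device without gradient accumulation; neither assumption is stated in this card.

```python
import math

SST2_TRAIN_EXAMPLES = 67_349  # assumption: size of the GLUE SST2 train split
PER_DEVICE_BATCH = 32         # from --per_device_train_batch_size
NUM_DEVICES = 1               # assumption: single-device training
NUM_EPOCHS = 17               # from --num_train_epochs

# The last, partially filled batch still counts as one optimizer step.
steps_per_epoch = math.ceil(SST2_TRAIN_EXAMPLES / (PER_DEVICE_BATCH * NUM_DEVICES))
total_steps = steps_per_epoch * NUM_EPOCHS

print(steps_per_epoch)
print(total_steps)
```

If the batch size or device count changes, steps_per_epoch in nncf_config.json would likely need to be updated to match, since the sparsity schedule is advanced per optimizer step.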

Running Model Training

Install required packages:

conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
pip install "optimum[openvino,nncf]"
pip install datasets sentencepiece scipy scikit-learn protobuf evaluate
pip install wandb # optional

Run model training:

NNCFCFG=/path/to/nncf_config.json
python run_glue.py \
  --lr_scheduler_type cosine_with_restarts \
  --cosine_lr_scheduler_cycles 11 6 \
  --record_best_model_after_epoch 9 \
  --load_best_model_at_end True \
  --metric_for_best_model accuracy \
  --model_name_or_path textattack/bert-base-uncased-SST-2 \
  --teacher_model_or_path yoshitomo-matsubara/bert-large-uncased-sst2 \
  --distillation_temperature 2 \
  --task_name sst2 \
  --nncf_compression_config $NNCFCFG \
  --distillation_weight 0.95 \
  --output_dir /tmp/bert-base-uncased-sst2-int8-unstructured80 \
  --overwrite_output_dir \
  --run_name bert-base-uncased-sst2-int8-unstructured80 \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --per_device_eval_batch_size 32 \
  --learning_rate 5e-05 \
  --optim adamw_torch \
  --num_train_epochs 17 \
  --logging_steps 1 \
  --evaluation_strategy steps \
  --eval_steps 250 \
  --save_strategy steps \
  --save_steps 250 \
  --save_total_limit 1 \
  --fp16 \
  --seed 1

For more details, refer to the training configuration and script.
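
The --distillation_temperature 2 and --distillation_weight 0.95 flags control how the teacher's predictions are mixed into the training loss. The following is a minimal sketch of one common formulation (temperature-scaled KL divergence blended with cross-entropy); it is an assumption about the loss shape, not a copy of the training script, and all function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, label):
    """Task loss: negative log-probability of the true label."""
    return -math.log(softmax(student_logits)[label])


def distillation_kl(student_logits, teacher_logits, temperature):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    return kl * temperature ** 2


def combined_loss(student_logits, teacher_logits, label,
                  temperature=2.0, weight=0.95):
    """Blend task loss and distillation loss with the given weight."""
    task = cross_entropy(student_logits, label)
    distill = distillation_kl(student_logits, teacher_logits, temperature)
    return (1.0 - weight) * task + weight * distill


# Hypothetical logits for one SST2 example (classes: negative, positive).
student = [0.2, 1.1]
teacher = [-1.5, 2.4]
print(combined_loss(student, teacher, label=1))
```

With a weight of 0.95, the student is driven almost entirely by the teacher's softened output distribution, and the hard labels contribute only 5% of the loss.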

Usage examples

OpenVINO notebooks:
- Accelerate Inference of Sparse Transformer Models with OpenVINO™ and 4th Gen Intel® Xeon® Scalable Processors

Limitations

Check the original model card for limitations.

Legal information

The original model is distributed under the apache-2.0 license. More details can be found in the google-bert/bert-base-uncased model card.

Disclaimer

Intel is committed to respecting human rights and avoiding causing or contributing to adverse impacts on human rights. See Intel’s Global Human Rights Principles. Intel’s products and software are intended only to be used in applications that do not cause or contribute to adverse impacts on human rights.