metadata

language:
  - en
license: mit
library_name: transformers
tags:
  - deberta
  - deberta-v3
  - question-answering
  - squad
  - squad_v2
  - lora
  - peft
datasets:
  - squad_v2
  - squad
base_model: microsoft/deberta-v3-large
model-index:
  - name: sjrhuschlee/deberta-v3-large-squad2
    results:
      - task:
          type: question-answering
          name: Question Answering
        dataset:
          name: squad_v2
          type: squad_v2
          config: squad_v2
          split: validation
        metrics:
          - type: exact_match
            value: 87.956
            name: Exact Match
          - type: f1
            value: 90.781
            name: F1
      - task:
          type: question-answering
          name: Question Answering
        dataset:
          name: squad
          type: squad
          config: plain_text
          split: validation
        metrics:
          - type: exact_match
            value: 89.29
            name: Exact Match
          - type: f1
            value: 95.008
            name: F1
      - task:
          type: question-answering
          name: Question Answering
        dataset:
          name: adversarial_qa
          type: adversarial_qa
          config: adversarialQA
          split: validation
        metrics:
          - type: exact_match
            value: 41.4
            name: Exact Match
          - type: f1
            value: 55.676
            name: F1
      - task:
          type: question-answering
          name: Question Answering
        dataset:
          name: squad_adversarial
          type: squad_adversarial
          config: AddOneSent
          split: validation
        metrics:
          - type: exact_match
            value: 83.66
            name: Exact Match
          - type: f1
            value: 89.451
            name: F1
      - task:
          type: question-answering
          name: Question Answering
        dataset:
          name: squadshifts amazon
          type: squadshifts
          config: amazon
          split: test
        metrics:
          - type: exact_match
            value: 74.487
            name: Exact Match
          - type: f1
            value: 87.745
            name: F1
      - task:
          type: question-answering
          name: Question Answering
        dataset:
          name: squadshifts new_wiki
          type: squadshifts
          config: new_wiki
          split: test
        metrics:
          - type: exact_match
            value: 84.782
            name: Exact Match
          - type: f1
            value: 93.114
            name: F1
      - task:
          type: question-answering
          name: Question Answering
        dataset:
          name: squadshifts nyt
          type: squadshifts
          config: nyt
          split: test
        metrics:
          - type: exact_match
            value: 85.643
            name: Exact Match
          - type: f1
            value: 93.258
            name: F1
      - task:
          type: question-answering
          name: Question Answering
        dataset:
          name: squadshifts reddit
          type: squadshifts
          config: reddit
          split: test
        metrics:
          - type: exact_match
            value: 74.702
            name: Exact Match
          - type: f1
            value: 85.861
            name: F1

deberta-v3-large for Extractive QA

This is the deberta-v3-large model, fine-tuned using the SQuAD2.0 dataset. It's been trained on question-answer pairs, including unanswerable questions, for the task of Extractive Question Answering.

This model was trained with LoRA, as implemented in the PEFT library.

Overview

Language model: deberta-v3-large
Language: English
Downstream-task: Extractive QA
Training data: SQuAD 2.0
Eval data: SQuAD 2.0
Infrastructure: 1x NVIDIA 3070

Model Usage

Using Transformers

The examples below use the merged weights (base model weights + LoRA weights), so the model works directly in Transformers pipelines. Performance is the same as loading the base and LoRA weights separately through the PEFT library.

import torch
from transformers import (
    AutoModelForQuestionAnswering,
    AutoTokenizer,
    pipeline,
)
model_name = "sjrhuschlee/deberta-v3-large-squad2"

# a) Using pipelines
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
qa_input = {
    'question': 'Where do I live?',
    'context': 'My name is Sarah and I live in London'
}
res = nlp(qa_input)
# {'score': 0.984, 'start': 30, 'end': 37, 'answer': ' London'}

# b) Load model & tokenizer
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

question = 'Where do I live?'
context = 'My name is Sarah and I live in London'
encoding = tokenizer(question, context, return_tensors="pt")
start_scores, end_scores = model(
  encoding["input_ids"],
  attention_mask=encoding["attention_mask"],
  return_dict=False
)

# Map the input ids back to tokens, then slice out the highest-scoring span
all_tokens = tokenizer.convert_ids_to_tokens(encoding["input_ids"][0].tolist())
answer_tokens = all_tokens[torch.argmax(start_scores):torch.argmax(end_scores) + 1]
answer = tokenizer.decode(tokenizer.convert_tokens_to_ids(answer_tokens))
# 'London'
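
Taking the two argmax values independently, as in option b) above, can yield an end index that precedes the start index, and it never returns "no answer" even though the model was trained on unanswerable questions. The following is a minimal sketch of a stricter selection step. It assumes position 0 is the `[CLS]` token and that its start and end logits act as the no-answer score, which is the usual convention for SQuAD 2.0 models. The function name `best_span` and the parameters `max_answer_len` and `n_best` are illustrative, and this is not the exact post-processing used by the Transformers pipeline.

```python
def best_span(start_logits, end_logits, max_answer_len=30, n_best=20):
    """Return (start, end, score) for the best valid span, or None for "no answer".

    start_logits / end_logits are plain lists of floats, one per token.
    Simplification: a full implementation would also restrict candidates
    to tokens that belong to the context rather than the question.
    """
    # Score of the "no answer" prediction (both pointers on [CLS] at index 0).
    null_score = start_logits[0] + end_logits[0]

    # Only consider the n_best highest-scoring start and end positions.
    order = range(len(start_logits))
    top_starts = sorted(order, key=lambda i: start_logits[i], reverse=True)[:n_best]
    top_ends = sorted(order, key=lambda i: end_logits[i], reverse=True)[:n_best]

    best = None
    for s in top_starts:
        for e in top_ends:
            if s == 0 or e == 0:
                continue  # [CLS] is reserved for the no-answer case
            if e < s or e - s + 1 > max_answer_len:
                continue  # reject inverted or overly long spans
            score = start_logits[s] + end_logits[e]
            if best is None or score > best[2]:
                best = (s, e, score)

    if best is None or null_score >= best[2]:
        return None
    return best


# Independent argmax would pick start=2, end=1 here, which is not a valid span.
span = best_span([0.1, 0.2, 5.0, 0.3], [0.1, 4.0, 0.2, 3.0])
no_answer = best_span([6.0, 0.0, 0.0], [6.0, 0.0, 0.0])
print(span, no_answer)
```

With the variables from option b), the inputs would be `start_scores[0].tolist()` and `end_scores[0].tolist()`.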

Metrics

# Squad v2
{
    "eval_HasAns_exact": 84.83468286099865,
    "eval_HasAns_f1": 90.48374860633226,
    "eval_HasAns_total": 5928,
    "eval_NoAns_exact": 91.0681244743482,
    "eval_NoAns_f1": 91.0681244743482,
    "eval_NoAns_total": 5945,
    "eval_best_exact": 87.95586625115808,
    "eval_best_exact_thresh": 0.0,
    "eval_best_f1": 90.77635490089573,
    "eval_best_f1_thresh": 0.0,
    "eval_exact": 87.95586625115808,
    "eval_f1": 90.77635490089592,
    "eval_runtime": 623.1333,
    "eval_samples": 11951,
    "eval_samples_per_second": 19.179,
    "eval_steps_per_second": 0.799,
    "eval_total": 11873
}
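
The overall `eval_exact` and `eval_f1` figures above are averages of the HasAns and NoAns subsets, weighted by the number of questions in each. The short check below recomputes them from the reported subset values; the helper name `overall` is illustrative.

```python
# Values copied from the SQuAD v2 metrics above.
has_ans = {"exact": 84.83468286099865, "f1": 90.48374860633226, "total": 5928}
no_ans = {"exact": 91.0681244743482, "f1": 91.0681244743482, "total": 5945}


def overall(metric):
    """Average of a metric over both subsets, weighted by question count."""
    total = has_ans["total"] + no_ans["total"]
    weighted_sum = (
        has_ans[metric] * has_ans["total"] + no_ans[metric] * no_ans["total"]
    )
    return weighted_sum / total


overall_exact = overall("exact")
overall_f1 = overall("f1")
print(f"exact={overall_exact:.3f}  f1={overall_f1:.3f}")
```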

# Squad
{
    "eval_exact_match": 89.29044465468307,
    "eval_f1": 94.9846365606959,
    "eval_runtime": 553.7132,
    "eval_samples": 10618,
    "eval_samples_per_second": 19.176,
    "eval_steps_per_second": 0.8
}
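
For readers unfamiliar with the two metrics, here is a minimal sketch of how SQuAD-style exact match and token-level F1 are commonly computed for a single prediction. It follows the normalization of the standard SQuAD evaluation (lowercasing, stripping punctuation and English articles) but is a simplified re-implementation, not the evaluation script that produced the numbers above. It also shows why `eval_NoAns_exact` and `eval_NoAns_f1` are identical: when the reference answer is empty, F1 collapses to 0 or 1.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction, reference):
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # Unanswerable questions: score 1 only if both sides are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The London.", "london"), f1_score("in London", "London"))
```

When a question has several reference answers, the usual practice is to take the maximum score over the references and then average over all questions.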

Using with PEFT

NOTE: This requires code in the PR https://github.com/huggingface/peft/pull/473 for the PEFT library.

#!pip install peft

from peft import LoraConfig, PeftModelForQuestionAnswering
from transformers import AutoModelForQuestionAnswering, AutoTokenizer
model_name = "sjrhuschlee/deberta-v3-large-squad2"

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 24
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 1
total_train_batch_size: 24
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 4.0
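
To make the scheduler settings concrete, the sketch below computes the learning rate at a given step for a linear schedule with warmup: the rate rises linearly from 0 to the peak `learning_rate` over the first 10% of steps, then decays linearly to 0. The total number of optimizer steps is not stated in this card, so `total_steps=1000` is a placeholder, and `lr_at` is an illustrative helper rather than a library function.

```python
import math

PEAK_LR = 5e-05       # learning_rate from the list above
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the list above


def lr_at(step, total_steps):
    """Learning rate at `step` under linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to the peak rate.
        return PEAK_LR * step / max(1, warmup_steps)
    # Decay phase: fall linearly from the peak rate to 0 at the last step.
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / max(1, total_steps - warmup_steps)


total_steps = 1000  # placeholder; the real value depends on dataset size and epochs
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step, total_steps))
```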

LoRA Config

{
  "base_model_name_or_path": "microsoft/deberta-v3-large",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "lora_alpha": 32,
  "lora_dropout": 0.1,
  "modules_to_save": ["qa_outputs"],
  "peft_type": "LORA",
  "r": 8,
  "target_modules": [
    "query_proj",
    "key_proj",
    "value_proj",
    "dense"
  ],
  "task_type": "QUESTION_ANS"
}
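
The `r` and `lora_alpha` values above determine how the adapter is folded into the base weights: LoRA adds a low-rank update scaled by `lora_alpha / r` (here 32 / 8 = 4) to each targeted weight matrix. The toy sketch below, in plain Python with made-up dimensions and random matrices standing in for trained weights, illustrates why merged weights give the same output as keeping the base and LoRA weights separate at inference time, when `lora_dropout` is inactive. The helper names are illustrative, and this is not the PEFT implementation.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_scaled(a, b, scale):
    """Return a + scale * b, element-wise."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(42)
r, lora_alpha = 8, 32     # from the LoRA config above
d_in, d_out = 16, 12      # toy sizes, not the real model dimensions
scaling = lora_alpha / r

W = rand_matrix(d_out, d_in)   # frozen base weight
A = rand_matrix(r, d_in)       # LoRA down-projection
B = rand_matrix(d_out, r)      # LoRA up-projection (zero at init, random here)
x = rand_matrix(d_in, 1)       # one input column vector

# Separate path: base output plus the scaled low-rank correction.
separate = add_scaled(matmul(W, x), matmul(B, matmul(A, x)), scaling)

# Merged path: fold the update into the weight once, then do a single matmul.
W_merged = add_scaled(W, matmul(B, A), scaling)
merged = matmul(W_merged, x)

max_diff = max(abs(s[0] - m[0]) for s, m in zip(separate, merged))
print("max difference:", max_diff)
```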

Framework versions

Transformers 4.30.0.dev0
Pytorch 2.0.1+cu117
Datasets 2.12.0
Tokenizers 0.13.3