language:
- en
license: mit
library_name: transformers
tags:
- deberta
- deberta-v3
- question-answering
- squad
- squad_v2
- lora
- peft
datasets:
- squad_v2
- squad
base_model: microsoft/deberta-v3-large
model-index:
- name: sjrhuschlee/deberta-v3-large-squad2
  results:
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squad_v2
      type: squad_v2
      config: squad_v2
      split: validation
    metrics:
    - type: exact_match
      value: 87.956
      name: Exact Match
    - type: f1
      value: 90.781
      name: F1
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squad
      type: squad
      config: plain_text
      split: validation
    metrics:
    - type: exact_match
      value: 89.29
      name: Exact Match
    - type: f1
      value: 95.008
      name: F1
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: adversarial_qa
      type: adversarial_qa
      config: adversarialQA
      split: validation
    metrics:
    - type: exact_match
      value: 41.4
      name: Exact Match
    - type: f1
      value: 55.676
      name: F1
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squad_adversarial
      type: squad_adversarial
      config: AddOneSent
      split: validation
    metrics:
    - type: exact_match
      value: 83.66
      name: Exact Match
    - type: f1
      value: 89.451
      name: F1
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squadshifts amazon
      type: squadshifts
      config: amazon
      split: test
    metrics:
    - type: exact_match
      value: 74.487
      name: Exact Match
    - type: f1
      value: 87.745
      name: F1
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squadshifts new_wiki
      type: squadshifts
      config: new_wiki
      split: test
    metrics:
    - type: exact_match
      value: 84.782
      name: Exact Match
    - type: f1
      value: 93.114
      name: F1
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squadshifts nyt
      type: squadshifts
      config: nyt
      split: test
    metrics:
    - type: exact_match
      value: 85.643
      name: Exact Match
    - type: f1
      value: 93.258
      name: F1
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squadshifts reddit
      type: squadshifts
      config: reddit
      split: test
    metrics:
    - type: exact_match
      value: 74.702
      name: Exact Match
    - type: f1
      value: 85.861
      name: F1
deberta-v3-large for Extractive QA
This is the deberta-v3-large model, fine-tuned on the SQuAD 2.0 dataset for extractive question answering. The training data consists of question-answer pairs and includes unanswerable questions.
The model was trained with LoRA, as implemented in the PEFT library.
Overview
Language model: deberta-v3-large
Language: English
Downstream-task: Extractive QA
Training data: SQuAD 2.0
Eval data: SQuAD 2.0
Infrastructure: 1x NVIDIA 3070
Model Usage
Using Transformers
This repository holds the merged weights (base model weights + LoRA weights), so the model can be used directly in Transformers pipelines. The merged model performs the same as loading the base and LoRA weights separately through the PEFT library.
import torch
from transformers import (
    AutoModelForQuestionAnswering,
    AutoTokenizer,
    pipeline,
)

model_name = "sjrhuschlee/deberta-v3-large-squad2"

# a) Using pipelines
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
qa_input = {
    'question': 'Where do I live?',
    'context': 'My name is Sarah and I live in London'
}
res = nlp(qa_input)
# {'score': 0.984, 'start': 30, 'end': 37, 'answer': ' London'}

# b) Load model & tokenizer
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

question = 'Where do I live?'
context = 'My name is Sarah and I live in London'
encoding = tokenizer(question, context, return_tensors="pt")
# With return_dict=False the model returns a (start_logits, end_logits) tuple
start_scores, end_scores = model(
    encoding["input_ids"],
    attention_mask=encoding["attention_mask"],
    return_dict=False
)

# Map the token ids back to tokens, then slice out the highest-scoring span
all_tokens = tokenizer.convert_ids_to_tokens(encoding["input_ids"][0].tolist())
answer_tokens = all_tokens[torch.argmax(start_scores):torch.argmax(end_scores) + 1]
answer = tokenizer.decode(tokenizer.convert_tokens_to_ids(answer_tokens))
# 'London'
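The argmax decoding in option (b) picks the start and end positions independently, so it can return an end that precedes the start, and it never abstains on unanswerable questions. The following is a minimal sketch of a joint span search, not the post-processing used to produce the reported metrics. `best_span` is a hypothetical helper, the `max_answer_len` default is an assumed value, and treating index 0 (the `[CLS]` position) as the no-answer slot is an assumption based on the usual SQuAD v2 convention. For brevity it does not mask out question tokens.

```python
from typing import List, Optional, Tuple


def best_span(
    start_logits: List[float],
    end_logits: List[float],
    max_answer_len: int = 30,  # assumed cap, not taken from the model card
    null_index: int = 0,       # assumed position of the [CLS] / no-answer token
) -> Optional[Tuple[int, int]]:
    """Return (start, end) token indices of the best valid span, or None to abstain."""
    null_score = start_logits[null_index] + end_logits[null_index]
    best_score = float("-inf")
    best: Optional[Tuple[int, int]] = None
    for start, s_logit in enumerate(start_logits):
        if start == null_index:
            continue
        # Only consider ends at or after the start, within the length cap.
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            if end == null_index:
                continue
            score = s_logit + end_logits[end]
            if score > best_score:
                best_score, best = score, (start, end)
    # Abstain when the no-answer score beats every real span.
    if best is None or null_score > best_score:
        return None
    return best
```

With the tensors from option (b), this could be called as `best_span(start_scores[0].tolist(), end_scores[0].tolist())`; a `None` result would be read as "no answer in the context".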
Metrics
# Squad v2
{
    "eval_HasAns_exact": 84.83468286099865,
    "eval_HasAns_f1": 90.48374860633226,
    "eval_HasAns_total": 5928,
    "eval_NoAns_exact": 91.0681244743482,
    "eval_NoAns_f1": 91.0681244743482,
    "eval_NoAns_total": 5945,
    "eval_best_exact": 87.95586625115808,
    "eval_best_exact_thresh": 0.0,
    "eval_best_f1": 90.77635490089573,
    "eval_best_f1_thresh": 0.0,
    "eval_exact": 87.95586625115808,
    "eval_f1": 90.77635490089592,
    "eval_runtime": 623.1333,
    "eval_samples": 11951,
    "eval_samples_per_second": 19.179,
    "eval_steps_per_second": 0.799,
    "eval_total": 11873
}

# Squad
{
    "eval_exact_match": 89.29044465468307,
    "eval_f1": 94.9846365606959,
    "eval_runtime": 553.7132,
    "eval_samples": 10618,
    "eval_samples_per_second": 19.176,
    "eval_steps_per_second": 0.8
}
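The overall SQuAD v2 scores are the question-count-weighted averages of the HasAns and NoAns subsets. The short check below recomputes them from the subset values reported above; the variable and function names are illustrative only. Note that NoAns exact and F1 coincide because an unanswerable question scores 1 on both only when the model abstains.

```python
# Values copied from the SQuAD v2 evaluation output above.
has_ans = {"exact": 84.83468286099865, "f1": 90.48374860633226, "total": 5928}
no_ans = {"exact": 91.0681244743482, "f1": 91.0681244743482, "total": 5945}

total = has_ans["total"] + no_ans["total"]  # should equal eval_total


def weighted(metric: str) -> float:
    """Average a metric over both subsets, weighted by number of questions."""
    return (
        has_ans[metric] * has_ans["total"] + no_ans[metric] * no_ans["total"]
    ) / total


overall_exact = weighted("exact")
overall_f1 = weighted("f1")
print(total, overall_exact, overall_f1)
```

The results should agree with `eval_total`, `eval_exact`, and `eval_f1` up to floating-point rounding.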
Using with PEFT
NOTE: This requires code in the PR https://github.com/huggingface/peft/pull/473 for the PEFT library.
#!pip install peft
from peft import LoraConfig, PeftModelForQuestionAnswering
from transformers import AutoModelForQuestionAnswering, AutoTokenizer
model_name = "sjrhuschlee/deberta-v3-large-squad2"
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 24
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 1
- total_train_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 4.0
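To make the schedule settings concrete, here is a minimal sketch of a linear schedule with warmup, written to follow the shape of Transformers' `get_linear_schedule_with_warmup`. The function `linear_lr` and the `total_steps` value are hypothetical; the actual number of optimizer steps is not stated in this card. It also shows how the total batch size follows from the per-device batch size, the accumulation steps, and the single GPU listed under Infrastructure.

```python
import math


def linear_lr(
    step: int,
    total_steps: int,
    base_lr: float = 5e-05,      # learning_rate from the list above
    warmup_ratio: float = 0.1,   # lr_scheduler_warmup_ratio from the list above
) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return base_lr * remaining


# total_train_batch_size = per-device batch size x accumulation steps x devices
train_batch_size = 24
gradient_accumulation_steps = 1
num_devices = 1  # 1x NVIDIA 3070
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

total_steps = 1000  # hypothetical; for illustration only
for step in (0, 50, 100, 550, 1000):
    print(step, linear_lr(step, total_steps))
```

Under these assumptions the rate climbs from 0 to 5e-05 over the first 10% of steps and then decays linearly to 0 at the final step.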
LoRA Config
{
    "base_model_name_or_path": "microsoft/deberta-v3-large",
    "bias": "none",
    "fan_in_fan_out": false,
    "inference_mode": true,
    "init_lora_weights": true,
    "lora_alpha": 32,
    "lora_dropout": 0.1,
    "modules_to_save": ["qa_outputs"],
    "peft_type": "LORA",
    "r": 8,
    "target_modules": [
        "query_proj",
        "key_proj",
        "value_proj",
        "dense"
    ],
    "task_type": "QUESTION_ANS"
}
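The config sets `r` to 8 and `lora_alpha` to 32, which under the standard LoRA formulation scales the low-rank update by `lora_alpha / r = 4`. The sketch below illustrates that formulation with plain Python lists; it is not PEFT's implementation. `lora_param_count` and `lora_forward` are hypothetical helpers, the 1024 hidden size used in the example is an assumption about deberta-v3-large, and dropout is omitted because it only applies during training.

```python
from typing import List

Matrix = List[List[float]]

r, lora_alpha = 8, 32      # from the LoRA config above
scaling = lora_alpha / r   # factor applied to the low-rank update


def lora_param_count(d_in: int, d_out: int, rank: int = r) -> int:
    """Trainable parameters added to one linear layer: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(x: List[float], w: Matrix, a: Matrix, b: Matrix,
                 scale: float = scaling) -> List[float]:
    """Compute y = W x + scale * B (A x) for a single input vector."""
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    return [y + scale * u for y, u in zip(base, update)]


# Assumed hidden size of 1024 for a square projection such as query_proj.
print(lora_param_count(1024, 1024))
```

When `B` is all zeros, which is how LoRA adapters are commonly initialized, `lora_forward` returns the same output as the frozen base layer, so training starts from the pretrained behavior.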
Framework versions
- Transformers 4.30.0.dev0
- PyTorch 2.0.1+cu117
- Datasets 2.12.0
- Tokenizers 0.13.3