Fine-Tuned Question Answering Model - SK_Morph_BLM (SK-QuAD Dataset)

Model Overview

This model is a fine-tuned version of the SK_Morph_BLM model for extractive question answering. It was fine-tuned on the SK-QuAD dataset, the first manually annotated question-answering dataset for Slovak, which contains over 91,000 questions and answers. The dataset includes clearly answerable questions, unanswerable questions, and plausible but likely incorrect answers.

Dataset Details

For fine-tuning, we used only the records with clearly answerable questions. The original dataset is divided into training and test sets; we merged them into a single dataset for our experiments. Some records have long contexts that, together with the question, exceed the context window of our model. We therefore excluded all records whose combined context and question length exceeded 1,300 characters, which corresponds to approximately 256 tokens. After this filtering, the final dataset contains 54,319 question-answer pairs.

For evaluation, we applied stratified 10-fold cross-validation over the whole dataset. Compared with a single train/test split, this gives a more reliable estimate of the model's performance and of how well it generalizes across different subsets of the data.
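A minimal sketch of the filtering and fold assignment described above, in plain Python. The record schema (SQuAD 2.0-style dicts with `context`, `question`, and an `is_impossible` flag) and the helper names `filter_records` and `assign_folds` are hypothetical. The card does not say which attribute the folds are stratified on, so this sketch uses a plain shuffled 10-fold split rather than a stratified one.

```python
import random
from typing import Dict, List

MAX_CHARS = 1300  # combined context + question budget (~256 tokens, per this card)


def filter_records(records: List[Dict]) -> List[Dict]:
    """Keep answerable records whose context + question fit in MAX_CHARS.

    Hypothetical schema: SQuAD 2.0-style dicts with "context", "question"
    and an "is_impossible" flag.
    """
    kept = []
    for rec in records:
        if rec.get("is_impossible", False):
            continue  # drop unanswerable questions
        if len(rec["context"]) + len(rec["question"]) > MAX_CHARS:
            continue  # too long for the model's context window
        kept.append(rec)
    return kept


def assign_folds(n_items: int, k: int = 10, seed: int = 0) -> List[int]:
    """Return a fold id in [0, k) for each item, with near-equal fold sizes.

    Plain shuffled k-fold: the stratification used in the card is not
    reproduced because its criterion is not stated.
    """
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    folds = [0] * n_items
    for rank, idx in enumerate(order):
        folds[idx] = rank % k  # round-robin over the shuffled order
    return folds
```

Each fold id then marks the test split for one run, while the remaining nine folds form the training split.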

Fine-Tuning Hyperparameters

The following hyperparameters were used during the fine-tuning process:

Learning Rate: 5e-05
Training Batch Size: 64 sequences
Evaluation Batch Size: 64 sequences
Seed: 42
Optimizer: Adam (default)
Number of Epochs: 5
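The hyperparameters above can be collected into a small config object. The sketch below (hypothetical names, plain Python) also works out how many optimizer steps one cross-validation run takes, assuming a single device, no gradient accumulation, and a final partial batch that is kept; each training split holds 9 of the 10 folds of the 54,319 pairs.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FineTuneConfig:
    # Values copied from the hyperparameter list above.
    learning_rate: float = 5e-5
    train_batch_size: int = 64
    eval_batch_size: int = 64
    seed: int = 42
    num_epochs: int = 5


def fold_sizes(n_items: int, k: int) -> List[int]:
    """Sizes of k near-equal folds (the first n_items % k folds get one extra item)."""
    base, extra = divmod(n_items, k)
    return [base + 1 if i < extra else base for i in range(k)]


def training_steps(n_train: int, cfg: FineTuneConfig) -> Tuple[int, int]:
    """Return (steps per epoch, total steps), keeping the last partial batch."""
    steps_per_epoch = math.ceil(n_train / cfg.train_batch_size)
    return steps_per_epoch, steps_per_epoch * cfg.num_epochs


cfg = FineTuneConfig()
N_PAIRS = 54_319  # final dataset size reported in this card
for held_out in sorted(set(fold_sizes(N_PAIRS, 10))):
    n_train = N_PAIRS - held_out
    per_epoch, total = training_steps(n_train, cfg)
    print(f"train={n_train} steps/epoch={per_epoch} total={total}")
# train=48888 steps/epoch=764 total=3820
# train=48887 steps/epoch=764 total=3820
```

Under these assumptions, each of the ten runs performs 764 optimizer steps per epoch and 3,820 steps over five epochs. If you reproduce the setup with the Hugging Face `Trainer`, these values map onto the `TrainingArguments` fields `learning_rate`, `per_device_train_batch_size`, `per_device_eval_batch_size`, `seed`, and `num_train_epochs`.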

Evaluation Metrics

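The model's performance was evaluated using both token-level and text-level metrics:

Token-Level Metrics:
- Precision: The fraction of tokens in the predicted answer that also appear in the correct answer.
- Recall: The fraction of tokens in the correct answer that are covered by the predicted answer.
- F1-Score: The harmonic mean of precision and recall; measures how accurately the model identified the correct answer tokens within the context.
Text-Level Metrics:
- Levenshtein Distance: Evaluates the similarity between the predicted and correct answers based on edit distance, reported on a 0 to 1 scale.
- Exact Match: The proportion of answers where the predicted answer exactly matches the correct one.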
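The metrics above can be sketched in plain Python as follows. This is an illustration, not the evaluation code used for this card: the text normalization (lowercasing and whitespace splitting), the SQuAD-style bag-of-tokens overlap, and the normalization of the Levenshtein score as `1 - distance / max(len)` are all assumptions.

```python
from collections import Counter
from typing import List, Tuple


def _tokens(text: str) -> List[str]:
    # Assumed normalization: lowercase, then split on whitespace.
    return text.lower().split()


def token_prf(pred: str, gold: str) -> Tuple[float, float, float]:
    """Token-overlap precision, recall and F1 between two answer strings."""
    p, g = _tokens(pred), _tokens(gold)
    overlap = sum((Counter(p) & Counter(g)).values())  # shared tokens, with multiplicity
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return precision, recall, 2 * precision * recall / (precision + recall)


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]


def levenshtein_similarity(pred: str, gold: str) -> float:
    """Edit distance rescaled to [0, 1], where 1.0 means identical strings."""
    if not pred and not gold:
        return 1.0
    return 1.0 - levenshtein(pred, gold) / max(len(pred), len(gold))


def exact_match(pred: str, gold: str) -> float:
    """1.0 if the stripped strings are identical, else 0.0."""
    return float(pred.strip() == gold.strip())
```

For instance, if the gold answer were just "1921" (a hypothetical label), the prediction "v roku 1921" would get precision 1/3, recall 1.0, F1 0.5, and a similarity of 4/11 (about 0.36), but an Exact Match of 0. This shows one way a prediction can earn partial credit on F1 and Levenshtein while scoring zero on Exact Match.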

Model Performance

The model achieved the following median scores across the 10 cross-validation folds:

F1-Score: 0.6768
Levenshtein Distance: 0.6535
Exact Match: 0.3791
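The sketch below makes the aggregation step explicit by reducing per-fold results to one median per metric. It assumes the medians are taken over the 10 cross-validation folds. The helper name and the input format (one dict of scores per fold) are hypothetical, and the per-fold scores themselves are not published in this card.

```python
from statistics import median
from typing import Dict, List


def median_over_folds(fold_scores: List[Dict[str, float]]) -> Dict[str, float]:
    """Median of each metric over a list of per-fold score dicts."""
    metrics = fold_scores[0].keys()
    return {m: median(scores[m] for scores in fold_scores) for m in metrics}
```

With an even number of folds such as 10, `statistics.median` returns the average of the 5th and 6th sorted values. A single unusually good or bad fold therefore shifts the reported number less than it would shift a mean.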

Model Usage

This model is suitable for extractive question answering on Slovak text, particularly for applications that need to locate a precise answer span within a given context.

Example Usage

Below is an example of how to use the fine-tuned SK_Morph_BLM-qa model in a Python script:

import sys

import torch
from torch.nn.functional import softmax
from transformers import RobertaForQuestionAnswering
from huggingface_hub import snapshot_download


class QuestionAnsweringModel:
    def __init__(self, model, tokenizer):
        # `model` and `tokenizer` are Hugging Face Hub repository IDs.
        # from_pretrained() returns the model already in evaluation mode.
        self.model = RobertaForQuestionAnswering.from_pretrained(model)

        # The morphological tokenizer is shipped as Python code inside the
        # SK_Morph_BLM repository, so download it and make it importable.
        repo_path = snapshot_download(repo_id=tokenizer)
        sys.path.append(repo_path)

        from SKMT_lib_v2.SKMT_BPE import SKMorfoTokenizer
        self.tokenizer = SKMorfoTokenizer()

    def decode(self, tensor):
        # Join the subword tokens and turn the BPE space marker "Ġ" back into spaces.
        result = "".join(self.tokenizer.convert_list_ids_to_tokens(tensor.tolist()))
        result = result.replace("Ġ", " ").strip()
        return result

    def predict(self, context, question):
        # Encode the context-question pair (max_length matches the 256-token window).
        inputs = self.tokenizer.tokenizeQA(context, question, max_length=256, return_tensors="pt", return_subword=False)
        input_ids = inputs["input_ids"][0]

        # Inference only: no gradients are needed.
        with torch.no_grad():
            outputs = self.model(**inputs)

        # Turn the start/end logits into probability distributions over token positions.
        start_probs = softmax(outputs.start_logits, dim=1)
        end_probs = softmax(outputs.end_logits, dim=1)

        # Most likely start token and most likely end token.
        answer_start = torch.argmax(start_probs[0])
        answer_end = torch.argmax(end_probs[0])
        # If the end lands before the start, search for the end only from the start onward.
        if answer_end < answer_start:
            answer_end = answer_start + torch.argmax(end_probs[0, answer_start:])
        answer_end = answer_end + 1  # exclusive slice bound

        answer = self.decode(input_ids[answer_start:answer_end])

        start_prob = start_probs[0, answer_start].item()
        end_prob = end_probs[0, answer_end - 1].item()

        return answer, start_prob, end_prob


# Instantiate the QA model: the tokenizer comes from the base SK_Morph_BLM repository,
# the fine-tuned weights from SK_Morph_BLM-qa.
qa_model = QuestionAnsweringModel(tokenizer="daviddrzik/SK_Morph_BLM", model="daviddrzik/SK_Morph_BLM-qa")

context = "Albert Einstein, narodený v roku 1879, je jedným z najvplyvnejších fyzikov všetkých čias. Vyvinul teóriu relativity, ktorá zmenila naše chápanie priestoru, času a gravitácie. Jeho slávna rovnica E = mc², ktorá vyjadruje vzťah medzi energiou a hmotou, je považovaná za jednu z najvýznamnejších rovníc vo fyzike. Einstein získal Nobelovu cenu za fyziku v roku 1921 za jeho prácu na fotoelektrickom jave, ktorý bol kľúčový pre rozvoj kvantovej mechaniky."
# "In which year did Albert Einstein receive the Nobel Prize in Physics?"
question = "V ktorom roku získal Albert Einstein Nobelovu cenu za fyziku?"

print("\nContext: " + context + "\n")
print("Question: " + question + "\n")

# Predict the answer; predict() returns (answer, start probability, end probability).
prediction = qa_model.predict(context, question)
print(f"Predicted answer: {prediction}")

Example Output

Here is the output when running the above example:

Context: Albert Einstein, narodený v roku 1879, je jedným z najvplyvnejších fyzikov všetkých čias.
Vyvinul teóriu relativity, ktorá zmenila naše chápanie priestoru, času a gravitácie.
Jeho slávna rovnica E = mc², ktorá vyjadruje vzťah medzi energiou a hmotou,
je považovaná za jednu z najvýznamnejších rovníc vo fyzike.
Einstein získal Nobelovu cenu za fyziku v roku 1921 za jeho prácu na fotoelektrickom jave,
ktorý bol kľúčový pre rozvoj kvantovej mechaniky.

Question: V ktorom roku získal Albert Einstein Nobelovu cenu za fyziku?

Predicted answer: ('v roku 1921', 0.7977392673492432, 0.9985119700431824)
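The example chooses the start and end positions separately. A common alternative in extractive QA post-processing scores every candidate span as start logit plus end logit, then keeps the best span whose end is not before its start and whose length is bounded. The sketch below shows this on plain Python lists. `best_span` and its `max_answer_len` default of 30 are assumptions for illustration and are not part of this model's released code.

```python
from typing import Sequence, Tuple


def best_span(start_logits: Sequence[float],
              end_logits: Sequence[float],
              max_answer_len: int = 30) -> Tuple[int, int, float]:
    """Return (start, end, score) of the highest-scoring valid span.

    `end` is inclusive; a span is valid if start <= end < start + max_answer_len.
    The score is start_logits[start] + end_logits[end].
    """
    best = (0, 0, float("-inf"))
    for s, s_score in enumerate(start_logits):
        # Only consider end positions at or after s, within the length limit.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best
```

Inside `predict`, this could be applied to `outputs.start_logits[0].tolist()` and `outputs.end_logits[0].tolist()`, with the answer decoded from `input_ids[start:end + 1]`. A fuller version would also exclude question and special-token positions from the candidate spans.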
