roberta_qa_japanese

(Japanese caption: 日本語の (抽出型) 質問応答のモデル, "a Japanese (extractive) question-answering model")

This model is a version of rinna/japanese-roberta-base (a pre-trained RoBERTa model provided by rinna Co., Ltd.) fine-tuned for extractive question answering.

It is fine-tuned on the JaQuAD dataset provided by Skelter Labs, whose data is collected from Japanese Wikipedia articles and manually annotated.

Intended uses

To run the model with the dedicated question-answering pipeline:

from transformers import pipeline

model_name = "tsmatz/roberta_qa_japanese"
qa_pipeline = pipeline(
    "question-answering",
    model=model_name,
    tokenizer=model_name)
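result = qa_pipeline(
    # "Which team beat Japan in the knockout stage?"
    question="決勝トーナメントで日本に勝ったのはどこでしたか。",
    # "Japan beat the powerhouses Germany and Spain in the group stage and
    #  advanced to the knockout stage, but lost to Croatia."
    context="日本は予選リーグで強豪のドイツとスペインに勝って決勝トーナメントに進んだが、クロアチアと対戦して敗れた。",
    # Don't snap the answer to whitespace-delimited words (Japanese has no spaces)
    align_to_words=False,
)
# A dict with "score", "start", "end" and "answer" keys
print(result)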

To run the forward pass manually:

import torch
import numpy as np
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_name = "tsmatz/roberta_qa_japanese"
model = (AutoModelForQuestionAnswering
         .from_pretrained(model_name))
tokenizer = AutoTokenizer.from_pretrained(model_name)

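def inference_answer(question, context):
    # Encode the (question, context) pair; if too long, truncate only the context
    test_feature = tokenizer(
        question,
        context,
        max_length=318,
        truncation="only_second",
    )
    with torch.no_grad():
        outputs = model(torch.tensor([test_feature["input_ids"]]))
    # One logit per token for "answer starts here" and "answer ends here"
    start_logits = outputs.start_logits.cpu().numpy()
    end_logits = outputs.end_logits.cpu().numpy()
    # Greedy choice: best start and best end, picked independently
    # (no check that end >= start, so the result can be empty)
    answer_ids = test_feature["input_ids"][np.argmax(start_logits):np.argmax(end_logits)+1]
    # Decode each token id separately and concatenate without spaces
    return "".join(tokenizer.batch_decode(answer_ids))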

question = "決勝トーナメントで日本に勝ったのはどこでしたか。"
context = "日本は予選リーグで強豪のドイツとスペインに勝って決勝トーナメントに進んだが、クロアチアと対戦して敗れた。"
answer_pred = inference_answer(question, context)
print(answer_pred)
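The forward-pass example picks the argmax of the start logits and the argmax of the end logits independently. That can produce an empty answer when the predicted end comes before the start, or a span that falls inside the question. A common remedy, similar in spirit to the post-processing in the transformers question-answering pipeline, is to score every candidate (start, end) pair by start_logit + end_logit. Only pairs with end >= start, a bounded length, and both ends inside the context are kept.

The sketch below implements that search with NumPy only. The function name best_span, its context_mask argument and the max_answer_len default of 30 are illustrative assumptions, not part of this model's code.

```python
import numpy as np


def best_span(start_logits, end_logits, context_mask, max_answer_len=30):
    """Return (start, end) token indices maximizing start_logit + end_logit.

    Only spans with start <= end, length <= max_answer_len and both ends
    inside the context (context_mask True) are considered.
    Returns None if no valid span exists.
    """
    # Flatten so that (1, seq_len) model outputs can be passed directly
    start_logits = np.asarray(start_logits, dtype=float).ravel()
    end_logits = np.asarray(end_logits, dtype=float).ravel()
    context_mask = np.asarray(context_mask, dtype=bool).ravel()

    n = start_logits.shape[0]
    # scores[i, j] = start_logits[i] + end_logits[j]
    scores = start_logits[:, None] + end_logits[None, :]
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    # Keep well-ordered, length-bounded spans lying entirely in the context
    valid = (j >= i) & (j - i + 1 <= max_answer_len)
    valid &= context_mask[:, None] & context_mask[None, :]
    if not valid.any():
        return None
    scores = np.where(valid, scores, -np.inf)
    start, end = divmod(int(np.argmax(scores)), n)
    return start, end


# Toy logits for 6 tokens: positions 0-2 stand for [CLS] + question,
# positions 3-5 for the context.
start = [5.0, 0.1, 0.2, 1.0, 3.0, 0.5]
end = [0.0, 4.0, 0.3, 2.0, 0.1, 2.5]
mask = [False, False, False, True, True, True]
print(best_span(start, end, mask))  # (4, 5)
```

On these toy logits, independent argmaxes would give start 0 and end 1, a span inside the question. The constrained search returns (4, 5) instead.

To use this inside inference_answer, slice test_feature["input_ids"][start:end + 1] as before. How to build the context mask is also an assumption. With a fast tokenizer, [s == 1 for s in test_feature.sequence_ids()] may serve, but check that it marks the context tokens correctly for this tokenizer.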

Training procedure

You can download the source code for fine-tuning from here.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 7e-05
train_batch_size: 2
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 16
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
num_epochs: 3
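Several of these entries fit together. With gradient accumulation, each optimizer step sees train_batch_size × gradient_accumulation_steps = 2 × 16 = 32 examples, which matches total_train_batch_size if training ran on a single device. The linear scheduler with 100 warmup steps ramps the learning rate from 0 up to 7e-05, then decays it linearly to 0 at the final step.

The sketch below reproduces that shape in plain Python, following the formula of transformers' get_linear_schedule_with_warmup. The function names and the total_steps argument are hypothetical; the card does not state the total number of optimizer steps.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 7e-05
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 16
WARMUP_STEPS = 100


def effective_batch_size(per_step=TRAIN_BATCH_SIZE,
                         accumulation=GRADIENT_ACCUMULATION_STEPS,
                         num_devices=1):
    """Examples contributing to one optimizer update (single device assumed)."""
    return per_step * accumulation * num_devices


def linear_warmup_lr(step, total_steps, base_lr=LEARNING_RATE,
                     warmup_steps=WARMUP_STEPS):
    """Learning rate at an optimizer step for linear warmup + linear decay."""
    if step < warmup_steps:
        # Warmup: grow linearly from 0 to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Decay: shrink linearly from base_lr to 0 at total_steps
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return base_lr * remaining


print(effective_batch_size())  # 32
total = 1000  # hypothetical total number of optimizer steps
for s in (0, 50, 100, 550, 1000):
    print(s, linear_warmup_lr(s, total))
```

With the hypothetical total of 1000 steps, the rate is half of 7e-05 at step 50 (mid-warmup), peaks at step 100, is half again at step 550 (mid-decay), and reaches 0 at step 1000.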

Training results

Training Loss	Epoch	Step	Validation Loss
2.1293	0.13	150	1.0311
1.1965	0.26	300	0.6723
1.022	0.39	450	0.4838
0.9594	0.53	600	0.5174
0.9187	0.66	750	0.4671
0.8229	0.79	900	0.4650
0.71	0.92	1050	0.2648
0.5436	1.05	1200	0.2665
0.5045	1.19	1350	0.2686
0.5025	1.32	1500	0.2082
0.5213	1.45	1650	0.1715
0.4648	1.58	1800	0.1563
0.4698	1.71	1950	0.1488
0.4823	1.84	2100	0.1050
0.4482	1.97	2250	0.0821
0.2755	2.11	2400	0.0898
0.2834	2.24	2550	0.0964
0.2525	2.37	2700	0.0533
0.2606	2.5	2850	0.0561
0.2467	2.63	3000	0.0601
0.2799	2.77	3150	0.0562
0.2497	2.9	3300	0.0516

Framework versions

Transformers 4.23.1
PyTorch 1.12.1+cu102
Datasets 2.6.1
Tokenizers 0.13.1
