Model Card: Context-to-QA-Generation using GPT-2

Description

This model generates questions, answers, hints, and multiple-choice options from a given input context. It is a fine-tuned GPT-2 model trained to follow a fixed output structure: for each context it produces a question, a set of answer choices, the correct answer, and a hint.

Intended Use

This model is intended for generating questions, answers, hints, and multiple-choice options from a given context. It can support educational use, exam preparation, content creation, and other applications that need automatic question generation.

Limitations

The quality of generated questions, answers, and hints depends on the quality and complexity of the input context; simpler contexts are more likely to yield accurate and coherent outputs. The model may generate incorrect or nonsensical content, especially when the context is complex or ambiguous, and its outputs may reflect biases present in the training data, which can lead to inappropriate content.


#!pip install transformers
from transformers import AutoTokenizer, GPT2LMHeadModel
checkpoint = "AbdelrahmanFakhry/finetuned-gpt2-multi-QA-Generation"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = GPT2LMHeadModel.from_pretrained(checkpoint)
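
# Optional: move the model to a GPU when one is available and switch it to
# inference mode (transformers loads GPT-2 as a PyTorch model, so torch is
# already installed as a dependency).
import torch

model.to("cuda" if torch.cuda.is_available() else "cpu")
model.eval()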
# Retrieve a test question from the test dataset
#test_text = test_dataset.to_dict()['question'][3]


# test_text should look like this: a fixed instruction template wrapping
# the context, with <hl> markers around one highlighted sentence.
test_text = (
    "Below is input text, the task is to generate questions from input text and "
    "multiple answers for each question and provide hint and correct answer for "
    "each question.\n\n### Input:\n"
    "<hl> Local intercellular communication is the province of the paracrine , also "
    "called a paracrine factor , which is a chemical that induces a response in "
    "neighboring cells . <hl> Although paracrines may enter the bloodstream , their "
    "concentration is generally too low to elicit a response from distant tissues . "
    "A familiar example to those with asthma is histamine , a paracrine that is "
    "released by immune cells in the bronchial tree . Histamine causes the smooth "
    "muscle cells of the bronchi to constrict , narrowing the airways . Another "
    "example is the neurotransmitters of the nervous system , which act only "
    "locally within the synaptic cleft .\n\n### Response: "
)
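
# A small helper (hypothetical, not part of the original card) that wraps an
# arbitrary context in the same prompt template shown above. The training
# prompts appear to mark one sentence of the context with <hl> tokens, so
# callers should place those markers in `context` themselves.
def make_prompt(context):
    instruction = (
        "Below is input text, the task is to generate questions from input text "
        "and multiple answers for each question and provide hint and correct "
        "answer for each question."
    )
    return instruction + "\n\n### Input:\n" + context + "\n\n### Response: "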



def inference(text, model, tokenizer, max_input_tokens=1024, max_output_tokens=500):
    """
    Generate text continuation based on the given input text using a pretrained model.

    Args:
        text (str): The input text for which to generate a continuation.
        model (PreTrainedModel): The pretrained model to use for text generation.
        tokenizer (PreTrainedTokenizer): The tokenizer used to preprocess the input and decode the output.
        max_input_tokens (int): Maximum number of prompt tokens to keep. GPT-2's
            context window is 1024 tokens, and the prompt plus the newly
            generated tokens must fit within it.
        max_output_tokens (int): Maximum number of tokens in the generated output.

    Returns:
        generated_text_answer (str): The generated text continuation.
    """
    # Tokenize the input text
    input_ids = tokenizer.encode(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_tokens
    )

    # Generate the continuation. GPT-2's context window is 1024 positions,
    # so cap the number of new tokens to the room the prompt leaves.
    device = model.device
    room = model.config.n_positions - input_ids.shape[1]
    generated_tokens_with_prompt = model.generate(
        input_ids=input_ids.to(device),
        max_new_tokens=min(max_output_tokens, room),
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
    )

    # Decode the generated tokens into text
    generated_text_with_prompt = tokenizer.batch_decode(generated_tokens_with_prompt, skip_special_tokens=True)

    # Drop the prompt from the decoded text, then strip the stray quote and
    # bracket characters the model tends to emit around the answer.
    generated_text_answer = generated_text_with_prompt[0][len(text):]
    generated_text_answer = generated_text_answer.lstrip(" '][{").rstrip(" '][{}")
    return generated_text_answer

generated_answer = inference(test_text, model, tokenizer)
# The generated answer should look like this:
'''
'Choices': ['paracrine factor', 'paracrine factor', 'paracrine factor II', 'paracrine factor III'],
'Question': 'Which of the following is not a paracrine factor?',
'answer': 'paracrine factor II',
'hint': 'Local intercellular communication is the province of the paracrine, also called a paracrine factor, which is a chemical that induces a response in neighboring cells.'
'''
print('Generated Answer:')
print(generated_answer)
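
The continuation is dict-like text, not guaranteed to be valid JSON or a
Python literal (note the stray quotes in the sample above), so downstream code
should parse it defensively. A minimal sketch using the generated_answer
produced above; extract_field is a hypothetical helper, not part of the
original card:

import re

def extract_field(text, field):
    """Return the quoted value of `field` (e.g. 'Question'), or None if absent."""
    match = re.search(re.escape(field) + r"'?\s*:\s*'([^']*)'", text)
    return match.group(1) if match else None

print(extract_field(generated_answer, 'Question'))  # the generated question
print(extract_field(generated_answer, 'answer'))    # the correct answer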

Acknowledgments

This model builds on the GPT-2 architecture and was fine-tuned on a custom dataset for the specific task of generating questions, answers, hints, and multiple-choice options.

Disclaimer

This model's performance may vary depending on the input context and task requirements. It is recommended to review and edit the generated content before using it in critical applications. The model's limitations and biases should also be considered when interpreting its outputs.