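---
language: ar
widget:
  - text: ما هو سبب غرق ضريح الشيخ محمد متولى الشعراوي ؟
    # EN question: "What caused the flooding of Sheikh Mohamed Metwally El-Shaarawy's tomb?"
    # EN context: "Anwar Othman, head of Mit Ghamr Center and City, said the tomb of Sheikh Mohamed Metwally El-Shaarawy and the Mit Ghamr cemeteries flooded because of a sudden breakdown of the sewage lift pump, which flooded the cemeteries."
    context: >-
      قال أنور عثمان  رئيس مركز ومدينة ميت غمر  إن سبب غرق ضريح الشيخ محمد متولى
      الشعراوى ومقابر ميت غمر عطل مفاجئ في ماكينة رفع الصرف الصحي ما أدى إلى غرق
      المقابر.
  - text: ما العدد الذري للهيدروجين ؟
    # EN question: "What is the atomic number of hydrogen?"
    # EN context: "Hydrogen is a chemical element with atomic number 1; it is an odorless, colorless, highly flammable gas."
    context: >-
      الهيدروجين هو عنصر كيميائي عدده الذري 1 ، وهو غاز عديم الرائحة واللون وهو
      سريع الاشتعال
  - text: ما خواص الهيدروجين ؟
    # EN question: "What are the properties of hydrogen?" (same context as above)
    context: >-
      الهيدروجين هو عنصر كيميائي عدده الذري 1 ، وهو غاز عديم الرائحة واللون وهو
      سريع الاشتعال
---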


Model Card for Arabic Question Answering with MARBERTv2

Model Details

Model Name: MARBERTv2-finetuned-ar-tydiqa

Model Type: MARBERTv2 (pre-trained on Arabic text, then fine-tuned for Arabic extractive question answering)

Language: Arabic

Model Creator: Mostafa Ahmed

Contact Information: mostafa.ahmed00976@gmail.com

Model Version: 1.0

Overview

MARBERTv2-finetuned-ar-tydiqa is a version of MARBERTv2 fine-tuned for Arabic extractive question answering. Given an Arabic question and an Arabic context passage, the model predicts the span of the passage that answers the question. It selects the answer from the supplied text rather than generating free-form responses. This makes it suitable for applications such as chatbots, virtual assistants, and customer support in Arabic-speaking regions, provided a relevant context passage is available.

Intended Use

The model is intended for use in:

  • Arabic question answering systems
  • Chatbots and virtual assistants
  • Educational tools and platforms
  • Customer support systems

Training Data

The model was fine-tuned on the Arabic portion of TyDi QA, a multilingual question answering benchmark. The dataset was filtered to keep only Arabic examples, so fine-tuning focused entirely on Arabic QA.

Data Sources:

  • TyDi QA: A multilingual question answering dataset.
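The card says TyDi QA was filtered down to its Arabic examples but does not show how. The sketch below is one minimal way to do this. It assumes the records follow the TyDi QA GoldP (secondary task) layout on the Hugging Face Hub, where each `id` begins with the language name (e.g. `arabic-...`). The helper names `is_arabic_example` and `filter_arabic` are hypothetical. The toy records are illustrative, not real dataset rows.

```python
from typing import Iterable


def is_arabic_example(example: dict) -> bool:
    """Return True if a TyDi QA GoldP-style record belongs to the Arabic subset.

    Assumption: record IDs are prefixed with the language name,
    e.g. "arabic-<hash>-<n>".
    """
    return example["id"].startswith("arabic-")


def filter_arabic(examples: Iterable[dict]) -> list[dict]:
    """Keep only the Arabic question-answer records."""
    return [ex for ex in examples if is_arabic_example(ex)]


if __name__ == "__main__":
    # Toy records in a GoldP-like shape (id, question, context); not real data.
    records = [
        {"id": "arabic-001-0", "question": "ما العدد الذري للهيدروجين ؟",
         "context": "الهيدروجين هو عنصر كيميائي عدده الذري 1"},
        {"id": "english-002-0", "question": "What is H?",
         "context": "H is hydrogen."},
    ]
    arabic = filter_arabic(records)
    print(len(arabic), arabic[0]["id"])  # 1 arabic-001-0
```

With the `datasets` library, the same predicate could be passed to `Dataset.filter`, e.g. `train.filter(is_arabic_example)`. This assumes the split exposes the same `id` field.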

Training Procedure

The model was trained using the Hugging Face transformers library. The training process involved:

  • Preprocessing the TyDi QA dataset to filter and format Arabic question-answer pairs.
  • Fine-tuning the pre-trained MARBERTv2 model on the Arabic QA dataset.
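The preprocessing code is not included in this card. For an extractive model like this one, "formatting" QA pairs usually includes converting each answer's character offset into start and end token positions. The sketch below shows that step and assumes inputs shaped like a fast tokenizer's output. `offsets` would come from `tokenizer(question, context, return_offsets_mapping=True, truncation="only_second")["offset_mapping"]`. `sequence_ids` would come from the encoding's `sequence_ids()` method, where 1 marks context tokens. The function name `char_span_to_token_span` is hypothetical, and the offsets in the demo are hand-written rather than produced by MARBERTv2's tokenizer.

```python
from typing import Optional, Sequence, Tuple


def char_span_to_token_span(
    offsets: Sequence[Tuple[int, int]],
    sequence_ids: Sequence[Optional[int]],
    answer_start: int,
    answer_text: str,
) -> Tuple[int, int]:
    """Map a character-level answer span to inclusive (start_token, end_token).

    offsets[i] is the (char_start, char_end) of token i in its source text;
    sequence_ids[i] is 1 for context tokens, 0 for question tokens and None
    for special tokens. Returns (0, 0), the [CLS] position, when the answer
    is not fully inside the context part of this window.
    """
    answer_end = answer_start + len(answer_text)  # exclusive character end
    context_idx = [i for i, s in enumerate(sequence_ids) if s == 1]
    if not context_idx:
        return 0, 0
    ctx_first, ctx_last = context_idx[0], context_idx[-1]
    # The answer must lie entirely within the context tokens of this window.
    if offsets[ctx_first][0] > answer_start or offsets[ctx_last][1] < answer_end:
        return 0, 0
    # Walk forward past every token that starts at or before the answer start.
    start_tok = ctx_first
    while start_tok <= ctx_last and offsets[start_tok][0] <= answer_start:
        start_tok += 1
    # Walk backward past every token that ends at or after the answer end.
    end_tok = ctx_last
    while end_tok >= ctx_first and offsets[end_tok][1] >= answer_end:
        end_tok -= 1
    return start_tok - 1, end_tok + 1


if __name__ == "__main__":
    # Toy window: [CLS] question [SEP] context [SEP], with hand-written offsets.
    context = "The capital is Cairo."
    offsets = [(0, 0), (0, 8), (0, 0), (0, 3), (4, 11), (12, 14), (15, 20), (20, 21), (0, 0)]
    seq_ids = [None, 0, None, 1, 1, 1, 1, 1, None]
    print(char_span_to_token_span(offsets, seq_ids, context.index("Cairo"), "Cairo"))  # (6, 6)
```

Mapping unreachable answers to the [CLS] position follows a common convention in extractive QA fine-tuning for windows that do not contain the answer. Whether this model was trained that way is not stated in the card.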

How to Use

You can load and use the model with the Hugging Face transformers library as follows:

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_id = "MostafaAhmed98/MARBERTv2-finetuned-ar-tydiqa"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForQuestionAnswering.from_pretrained(model_id)
model.eval()  # inference mode (disables dropout)

# Example usage
question = "ما هي عاصمة مصر؟"  # "What is the capital of Egypt?"
context = "عاصمة مصر هي القاهرة."  # "The capital of Egypt is Cairo."

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():  # no gradients needed for prediction
    outputs = model(**inputs)

# Extract answer: take the most likely start and end token positions
answer_start = int(outputs.start_logits[0].argmax())
answer_end = int(outputs.end_logits[0].argmax()) + 1  # +1 because the slice end is exclusive
answer = tokenizer.decode(inputs["input_ids"][0][answer_start:answer_end], skip_special_tokens=True)

print(answer)  # Expected output: "القاهرة" ("Cairo")
```
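The snippet above picks the start and end positions independently with `argmax`. This can produce an end before the start, or a span that falls inside the question. A common remedy is to score candidate pairs and keep the best valid one. The sketch below shows that search over plain Python lists. It assumes the caller supplies a boolean mask marking context tokens. The function `best_span` and its `max_answer_len` and `n_best` defaults are hypothetical choices, not settings taken from this model's training.

```python
from typing import Sequence, Tuple


def best_span(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    context_mask: Sequence[bool],
    max_answer_len: int = 30,
    n_best: int = 20,
) -> Tuple[int, int]:
    """Return the inclusive (start, end) token pair with the highest
    start_logit + end_logit such that start <= end, both tokens are context
    tokens, and the span is at most max_answer_len tokens long.
    Falls back to (0, 0) when no valid pair exists."""

    def top_k(logits: Sequence[float]) -> list:
        # Indices of the n_best highest logits, best first.
        return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:n_best]

    best, best_score = (0, 0), float("-inf")
    for s in top_k(start_logits):
        for e in top_k(end_logits):
            if not (context_mask[s] and context_mask[e]):
                continue  # skip question and special tokens
            if e < s or e - s + 1 > max_answer_len:
                continue  # reversed or overly long span
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


if __name__ == "__main__":
    # Toy logits for 6 tokens; tokens 0-1 stand in for [CLS] and the question.
    start_logits = [0.0, 0.1, 0.2, 5.0, 1.0, 0.3]
    end_logits = [0.0, 4.0, 0.2, 0.5, 3.0, 0.1]
    context_mask = [False, False, True, True, True, True]
    # Independent argmax would pick start=3, end=1, an invalid span.
    print(best_span(start_logits, end_logits, context_mask))  # (3, 4)
```

To use it with the snippet above, one could call `best_span(outputs.start_logits[0].tolist(), outputs.end_logits[0].tolist(), [s == 1 for s in inputs.sequence_ids(0)])`. Then decode `inputs["input_ids"][0][start:end + 1]`. This assumes a fast tokenizer, which provides `sequence_ids`.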