
	---
	language: ar
	widget:
	- text: كم انتهت المباراة؟
	  context: >-
	    تسبب خطآن دفاعيان في هدفين، بمباراة ساخنة بين إنجلترا والدنمارك ضمن بطولة أوروبا، الخميس، انتهت بالتعادل 1-1.
	- text: ما العدد الذري للهيدروجين ؟
	  context: >-
	    الهيدروجين هو عنصر كيميائي عدده الذري 1 ، وهو غاز عديم الرائحة واللون وهو
	    سريع الاشتعال
	- text: من هو أحمد مازن أحمد أسعد الشقيري؟
	  context: >-
	    أحمد مازن أحمد أسعد الشقيري (ولد في 6 يونيو 1973) إعلامي وكاتب سعودي ومقدم برامج تلفزيونية
	datasets:
	- google-research-datasets/tydiqa
	license: mit
	pipeline_tag: question-answering
	---

	---


	## Model Details

	Model Name: MARBERTv2-finetuned-ar-tydiqa

	Model Type: MARBERTv2 (pre-trained on Arabic text, fine-tuned for Arabic extractive question answering)

	Language: Arabic

	Model Creator: Mostafa Ahmed

	Contact Information: mostafa.ahmed00976@gmail.com

	Model Version: 1.0

	## Overview

	MARBERTv2-finetuned-ar-tydiqa is a version of MARBERTv2 fine-tuned for Arabic question answering. Given a question and a context passage in Arabic, the model extracts the span of the passage that answers the question. This makes it suitable as a component in applications such as chatbots, virtual assistants, and customer support in Arabic-speaking regions.

	## Intended Use

	The model is intended for use in:

	- Arabic question answering systems
	- Chatbots and virtual assistants
	- Educational tools and platforms
	- Customer support systems

	## Training Data

	The model was fine-tuned on the Arabic portion of TyDi QA, a multilingual benchmark dataset for question answering. The dataset was filtered to include only Arabic examples, so the model is specialized for Arabic QA.

	Data Sources:

	- [TyDi QA](https://huggingface.co/datasets/google-research-datasets/tydiqa): A multilingual question answering dataset.
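
The Arabic filtering described above might look roughly like the following sketch. It assumes the `secondary_task` (Gold-P) configuration of the dataset and that example `id` values begin with the language name (for example `arabic-...`). The configuration choice and the helper names `is_arabic` and `load_arabic_tydiqa` are illustrative assumptions, not taken from the original training script.

```python
def is_arabic(example: dict) -> bool:
    """Return True if a TyDi QA example id is prefixed with 'arabic'."""
    # Assumption: ids look like "<language>-<number>-<index>"
    return example["id"].lower().startswith("arabic")


def load_arabic_tydiqa(config: str = "secondary_task"):
    """Load TyDi QA and keep only the Arabic examples (downloads the dataset)."""
    # Imported lazily so the filter above can be used without `datasets` installed
    from datasets import load_dataset

    dataset = load_dataset("google-research-datasets/tydiqa", config)
    return dataset.filter(is_arabic)
```

Calling `load_arabic_tydiqa()` would return the dataset splits restricted to Arabic examples; it is worth inspecting a few `id` values first to confirm the prefix assumption holds for the configuration in use.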

	## Training Procedure

	The model was trained using the Hugging Face `transformers` library. The training process involved:

	- Preprocessing the TyDi QA dataset to filter and format Arabic question-answer pairs.
	- Fine-tuning the pre-trained MARBERTv2 model on the Arabic QA dataset.
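
The exact preprocessing code is not given here, but formatting question-answer pairs for extractive QA usually means converting each answer's character offsets in the context into start and end token positions. The sketch below shows that mapping in plain Python. The function name `char_span_to_token_span` is hypothetical. With a fast tokenizer, `offsets` would typically come from `return_offsets_mapping=True` and `sequence_ids` from the encoding's `sequence_ids()` method.

```python
def char_span_to_token_span(offsets, sequence_ids, start_char, end_char):
    """Map a character-level answer span in the context to token indices.

    offsets: list of (start, end) character offsets, one per token
    sequence_ids: per-token sequence id (0 = question, 1 = context, None = special)
    Returns (start_token, end_token), both inclusive, or (0, 0) if the answer
    is not fully inside the tokenized context (for example after truncation).
    """
    context_tokens = [i for i, s in enumerate(sequence_ids) if s == 1]
    if not context_tokens:
        return 0, 0
    first, last = context_tokens[0], context_tokens[-1]

    # Answer is not fully contained in this context window
    if offsets[first][0] > start_char or offsets[last][1] < end_char:
        return 0, 0

    # First context token whose span ends after the answer starts
    start_token = first
    while start_token <= last and offsets[start_token][1] <= start_char:
        start_token += 1

    # Last context token whose span starts before the answer ends
    end_token = last
    while end_token >= first and offsets[end_token][0] >= end_char:
        end_token -= 1

    return start_token, end_token
```

The resulting token indices serve as the `start_positions` and `end_positions` labels when fine-tuning a question answering head. Mapping unanswerable or truncated cases to `(0, 0)` is a common convention rather than a requirement.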

	## How to Use

	You can load and use the model with the Hugging Face `transformers` library as follows:

	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForQuestionAnswering

	model_id = "MostafaAhmed98/MARBERTv2-finetuned-ar-tydiqa"
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForQuestionAnswering.from_pretrained(model_id)

	# Example usage
	question = "ما هي عاصمة مصر؟"  # "What is the capital of Egypt?"
	context = "عاصمة مصر هي القاهرة."  # "The capital of Egypt is Cairo."

	inputs = tokenizer(question, context, return_tensors="pt")
	with torch.no_grad():  # inference only, so gradients are not needed
	    outputs = model(**inputs)

	# Extract answer: most likely start and end token positions
	answer_start = outputs.start_logits.argmax()
	answer_end = outputs.end_logits.argmax() + 1  # +1 because slicing excludes the end index
	answer = tokenizer.decode(inputs.input_ids[0][answer_start:answer_end], skip_special_tokens=True)

	print(answer)  # Expected output: "القاهرة"
	```
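
The example above takes the argmax of the start and end logits independently, which can produce an end position before the start position on harder inputs. A minimal sketch of a more careful span selection follows. The function name `best_answer_span` and the defaults `max_answer_len=30` and `top_k=20` are illustrative assumptions, not part of this model or of the `transformers` API.

```python
def best_answer_span(start_logits, end_logits, max_answer_len=30, top_k=20):
    """Pick the (start, end) token pair with the highest combined score.

    Only pairs with start <= end and at most max_answer_len tokens are
    considered. Returns (start, end, score) with end inclusive, or None
    if no valid pair exists among the top_k candidates.
    """
    # Keep only the top_k highest-scoring start and end positions
    top_starts = sorted(range(len(start_logits)),
                        key=lambda i: start_logits[i], reverse=True)[:top_k]
    top_ends = sorted(range(len(end_logits)),
                      key=lambda i: end_logits[i], reverse=True)[:top_k]

    best = None
    for s in top_starts:
        for e in top_ends:
            # Skip reversed or overly long spans
            if e < s or e - s + 1 > max_answer_len:
                continue
            score = start_logits[s] + end_logits[e]
            if best is None or score > best[2]:
                best = (s, e, score)
    return best
```

With the model output from the example, the inputs would be `outputs.start_logits[0].tolist()` and `outputs.end_logits[0].tolist()`. This sketch does not exclude question or special tokens from the candidates, which a production implementation would normally do.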

	---