llama3.2-1B-persianQAV2.0

Model description

Persian-QA-LLaMA is a fine-tuned version of LLaMA (1B parameters) optimized for Persian language question answering tasks. The model uses Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning, making it a lightweight adaptation that maintains the base model's capabilities while adding Persian language understanding. This adaptation specifically targets question-answering capabilities, allowing the model to understand and respond to questions in Persian while maintaining computational efficiency. This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on azizmatin/question_answering dataset.

Intended uses & limitations

Intended Uses

Persian language question answering Information extraction from Persian texts Educational support and tutoring systems Research and academic applications

Limitations

Performance may vary on domain-specific questions Not optimized for other NLP tasks beyond question answering May struggle with highly colloquial or dialectal Persian Limited by the training dataset's coverage and diversity Should not be used for generating factual content without verificatio

Training and evaluation data

The model was trained on a modified version of a Persian question-answering dataset structured similarly to SQuAD v2. The dataset includes:

Question-answer pairs in Persian
Contextual passages for answer extraction
Various question types and difficulty levels
Coverage across different topics and domains

The dataset was split into training, validation, and test sets to ensure robust evaluation of model performance.

Training procedure

Training Details

Base Model: LLaMA 1B
Fine-tuning Method: LoRA (Low-Rank Adaptation)
Training Framework: Hugging Face Transformers with PEFT
Optimization: AdamW optimizer with learning rate scheduling
Training was conducted with parameter-efficient fine-tuning techniques to maintain efficiency

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 2
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 16
optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.15
num_epochs: 3
mixed_precision_training: Native AMP

Training results

Training Loss	Epoch	Step	Validation Loss	Model Preparation Time
3.7277	0.1973	100	3.5422	0.003
3.1974	0.3947	200	2.8022	0.003
2.5995	0.5920	300	2.5565	0.003
2.5075	0.7893	400	2.5167	0.003
2.4734	0.9867	500	2.4958	0.003
2.4547	1.1845	600	2.4823	0.003
2.4308	1.3818	700	2.4721	0.003
2.4191	1.5792	800	2.4649	0.003
2.4162	1.7765	900	2.4593	0.003
2.4033	1.9739	1000	2.4559	0.003
2.4093	2.1717	1100	2.4534	0.003
2.3859	2.3690	1200	2.4518	0.003
2.3967	2.5664	1300	2.4510	0.003
2.3894	2.7637	1400	2.4506	0.003
2.3963	2.9610	1500	2.4505	0.003

Performance Metrics

The model was evaluated on the validation set, with the following results:

Metric	Value
Precision	0.6608
Recall	0.6511
F1 Score	0.6455
Exact Match	0.2484

Framework versions

PEFT 0.13.2
Transformers 4.46.1
Pytorch 2.5.0+cu121
Datasets 3.1.0
Tokenizers 0.20.1

azizmatin
/

llama3.2-1B-persianQAV2.0