---
language:
- ur
license: mit
tags:
- generated_from_trainer
datasets:
- imdb_urdu_reviews
widget:
- text: میں نے یہ فلم دیکھنے کے لئے بہت احتیاط کی تھی، لیکن اس کی کہانی اور اداکاری
    نے میری توقعات کو پورا کیا۔ بالکل شاندار فلم!
  example_title: Positive Example 1
- text: اس فلم کی کہانی بہت بے معنی اور بے چارہ ہے۔ میں نے اپنا وقت اور پیسہ برباد
    کر دیا۔ براہ کرم اس سے بچیں!
  example_title: Negative Example 1
- text: یہ ناقابل فہم فلم ہے۔ کوئی بھی اسے دیکھ کر توڑ دل ہو جائے گا۔ بلکل بے فائدہ!
  example_title: Negative Example 2
- text: میں نے ہمیشہ کی طرح اس فلم کو بھی بہت مزہ دیا۔ اداکاری، کہانی، اور ڈائریکشن
    سب بہترین تھی۔ دل کھول کر تصویر دیکھنے کا موقع!
  example_title: Positive Example 2
- text: اس فلم میں اتنی بے وقوفی دکھائی گئی ہے کہ آپ بھی اپنے دماغ کو چیک کریں گے۔
    بلکل بکواس!
  example_title: Negative Example 3
base_model: urduhack/roberta-urdu-small
model-index:
- name: UrduClassification
  results: []
---
# UrduClassification
This model is a fine-tuned version of [urduhack/roberta-urdu-small](https://huggingface.co/urduhack/roberta-urdu-small) on the imdb_urdu_reviews dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4703
## Model Details
- Model Name: Urdu Sentiment Classification
- Model Architecture: RobertaForSequenceClassification
- Base Model: urduhack/roberta-urdu-small
- Dataset: IMDB Urdu Reviews
- Task: Sentiment Classification (Positive/Negative)
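
For reference, inference with a checkpoint like this one might look as follows. This is a minimal sketch, not an official usage snippet: `MODEL_ID` is a hypothetical placeholder (this card does not state the Hub repository id), `classify_reviews` and `top_labels` are illustrative helper names, and the 256-token limit mirrors the maximum length described under Training Procedure.

```python
# Hypothetical placeholder: replace with the actual Hub repository id of this model.
MODEL_ID = "your-username/UrduClassification"


def classify_reviews(texts, model_id=MODEL_ID):
    """Run sentiment classification on a list of Urdu review strings."""
    # Imported lazily so the helpers below can be used without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    # Truncate long reviews to the same length that was used for fine-tuning.
    return classifier(texts, truncation=True, max_length=256)


def top_labels(predictions):
    """Reduce pipeline output dicts to (label, rounded score) pairs."""
    return [(p["label"], round(p["score"], 3)) for p in predictions]


# Example (downloads the model, so it is left commented out):
# print(top_labels(classify_reviews(["بالکل شاندار فلم!"])))
```

Unless the checkpoint's config defines `id2label`, the pipeline returns generic `LABEL_0`/`LABEL_1` names. Which index means positive or negative is not stated in this card, so it is worth checking against a few reviews with known sentiment.
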
## Training Procedure
The model was fine-tuned with the Hugging Face transformers library using its Trainer class. Training involved the following steps:
1. Tokenization: The input Urdu text was tokenized with RobertaTokenizerFast loaded from the "urduhack/roberta-urdu-small" pre-trained model. Texts were padded and truncated to a maximum length of 256 tokens.
2. Model Architecture: The "urduhack/roberta-urdu-small" pre-trained model was loaded as the base model for sequence classification using the RobertaForSequenceClassification class.
3. Training Arguments: The training arguments were set, including the number of training epochs, batch size, learning rate, evaluation strategy, and logging strategy. The main values are listed under Training hyperparameters.
4. Training: The model was trained on the training dataset using the Trainer class, with gradient-based optimization minimizing the cross-entropy loss between predicted and actual sentiment labels.
5. Evaluation: After each epoch, the model was evaluated on the validation dataset to monitor its performance. The training loss and validation loss were logged for analysis.
6. Fine-Tuning: Through these steps, the pre-trained model's parameters were adapted to the IMDb Urdu movie review sentiment analysis task.
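
The steps above can be sketched roughly as follows. This is not the author's training script: the dataset column names (`sentence`, `sentiment`), the 80/20 split, the `output_dir` value, and the helper names `tokenize_batch` and `build_trainer` are assumptions made for illustration. The hyperparameter values are the ones listed in this card. `evaluation_strategy` is the argument name in Transformers 4.30; newer releases rename it to `eval_strategy`.

```python
BASE_MODEL = "urduhack/roberta-urdu-small"
MAX_LENGTH = 256  # maximum sequence length stated in this card


def tokenize_batch(tokenizer, batch, text_column="sentence"):
    """Pad and truncate a batch of texts to MAX_LENGTH tokens."""
    return tokenizer(
        batch[text_column],
        padding="max_length",
        truncation=True,
        max_length=MAX_LENGTH,
    )


def build_trainer(output_dir="urdu-classification"):
    """Assemble a Trainer for binary sentiment classification (sketch)."""
    # Heavy imports are kept local so the module can be imported cheaply.
    from datasets import load_dataset
    from transformers import (
        RobertaForSequenceClassification,
        RobertaTokenizerFast,
        Trainer,
        TrainingArguments,
    )

    # Assumption: the dataset has one "train" split with "sentence"/"sentiment" columns.
    raw = load_dataset("imdb_urdu_reviews", split="train")
    raw = raw.rename_column("sentiment", "labels")
    # Assumption: the split used by the author is not documented in the card.
    splits = raw.train_test_split(test_size=0.2, seed=42)

    tokenizer = RobertaTokenizerFast.from_pretrained(BASE_MODEL)
    encoded = splits.map(lambda b: tokenize_batch(tokenizer, b), batched=True)

    model = RobertaForSequenceClassification.from_pretrained(BASE_MODEL, num_labels=2)

    args = TrainingArguments(
        output_dir=output_dir,
        learning_rate=5e-5,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        num_train_epochs=3,
        warmup_steps=500,
        lr_scheduler_type="linear",
        seed=42,
        evaluation_strategy="epoch",  # evaluate after every epoch
        logging_strategy="epoch",
    )
    return Trainer(
        model=model,
        args=args,
        train_dataset=encoded["train"],
        eval_dataset=encoded["test"],
    )


# trainer = build_trainer()
# trainer.train()
```
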
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 3
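
To make the linear schedule with 500 warmup steps concrete, the learning rate at a given optimizer step can be computed in plain Python. The sketch below assumes the usual warmup-then-linear-decay rule and a total of 7,500 optimizer steps (3 epochs of 2,500 steps, as reported in the training results table). `linear_schedule_lr` is an illustrative name, not a library function.

```python
def linear_schedule_lr(step, peak_lr=5e-5, warmup_steps=500, total_steps=7500):
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: ramp linearly from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 250, 500, 2500, 5000, 7500):
    print(f"step {step:>5}: lr = {linear_schedule_lr(step):.2e}")
```

Under these assumptions the learning rate peaks at 5e-05 at step 500, about 7% of the way through the first epoch, and reaches zero at the final step.
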
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.4078 | 1.0 | 2500 | 0.3954 |
| 0.2633 | 2.0 | 5000 | 0.4007 |
| 0.1205 | 3.0 | 7500 | 0.4703 |
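
The table can also be read programmatically, for example to find the epoch with the lowest validation loss and to see how the gap between training and validation loss grows. The sketch below only uses numbers from this card. The per-epoch sample count assumes a single device and no gradient accumulation, which the card does not confirm.

```python
from typing import NamedTuple


class EpochLog(NamedTuple):
    epoch: float
    step: int
    train_loss: float
    val_loss: float


# Values copied from the training results table.
LOGS = [
    EpochLog(1.0, 2500, 0.4078, 0.3954),
    EpochLog(2.0, 5000, 0.2633, 0.4007),
    EpochLog(3.0, 7500, 0.1205, 0.4703),
]
TRAIN_BATCH_SIZE = 16  # from the hyperparameter list

# Epoch with the lowest validation loss.
best = min(LOGS, key=lambda log: log.val_loss)

# Validation loss minus training loss; a growing gap suggests overfitting.
gaps = [round(log.val_loss - log.train_loss, 4) for log in LOGS]

# Assumption: one device and no gradient accumulation.
samples_per_epoch = LOGS[0].step * TRAIN_BATCH_SIZE

print("best epoch:", best.epoch)
print("val - train gaps:", gaps)
print("approx. training samples per epoch:", samples_per_epoch)
```

Validation loss is lowest after epoch 1 and rises afterwards while training loss keeps falling, which suggests the later epochs overfit the training set.
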
## Evaluation Results
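The model was evaluated on the sentiment classification task; the card does not specify which evaluation split was used. The results reported after 3 epochs of fine-tuning are listed below. The evaluation loss of 0.3954 equals the validation loss after epoch 1 in the training results table, whereas the validation loss after epoch 3 is 0.4703.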
- Evaluation Loss: 0.3954
- Evaluation Runtime: 51.60 seconds
- Average Samples per Second: 96.89
- Average Steps per Second: 6.06
- Epoch: 3.0
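
The throughput figures allow a rough estimate of the evaluation set size, which the card does not state directly. The following is a back-of-the-envelope sketch using only the numbers above and the eval batch size of 16. The results are approximate because the reported values are rounded.

```python
import math

EVAL_RUNTIME_S = 51.60      # evaluation runtime in seconds
SAMPLES_PER_SECOND = 96.89  # average samples per second
STEPS_PER_SECOND = 6.06     # average steps per second
EVAL_BATCH_SIZE = 16        # from the hyperparameter list

# samples = runtime * samples/second; steps = runtime * steps/second
approx_samples = round(EVAL_RUNTIME_S * SAMPLES_PER_SECOND)
approx_steps = round(EVAL_RUNTIME_S * STEPS_PER_SECOND)

# Cross-check: number of batches needed for that many samples.
expected_steps = math.ceil(approx_samples / EVAL_BATCH_SIZE)

print("approx. evaluation samples:", approx_samples)
print("approx. evaluation steps:", approx_steps)
print("steps implied by batch size:", expected_steps)
```

With these numbers the estimate comes to about 5,000 evaluation samples processed in 313 batches, which is consistent with a batch size of 16.
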
### Framework versions
- Transformers 4.30.2
- PyTorch 2.0.0
- Datasets 2.1.0
- Tokenizers 0.13.3