text2gloss_ar

This model is a fine-tuned version of Helsinki-NLP/opus-mt-en-ar on a custom dataset of Arabic text paired with ArSL glosses (Friday sermon domain; see "Training and evaluation data" below). It achieves the following results on the evaluation set:

  • Loss: 0.0306
  • Word BLEU: 97.0831
  • Char BLEU: 98.9391

Model description

  • Source: Text (spoken text)
  • Target: gloss (ArSL gloss)
  • Domain: ArSL Friday sermon translation from text to gloss. We used the pre-trained opus-mt model as the starting point for domain adaptation.
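
As a usage sketch, the model can be loaded like any MarianMT checkpoint with transformers (the checkpoint ID sabbas/Text2Gloss_ar is taken from this card, and the input sentence is an illustrative placeholder):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Checkpoint ID assumed from this card; adjust if the model is hosted elsewhere.
model_id = "sabbas/Text2Gloss_ar"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Illustrative Arabic input from the sermon domain (placeholder sentence).
text = "الحمد لله رب العالمين"
inputs = tokenizer(text, return_tensors="pt")
gloss_ids = model.generate(**inputs, num_beams=4, max_length=128)
print(tokenizer.decode(gloss_ids[0], skip_special_tokens=True))  # predicted ArSL gloss
```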

Intended uses & limitations

  • Data Specificity: The model is trained specifically on Arabic text and ArSL glosses. It may not perform well when applied to other languages or sign languages.

  • Contextual Accuracy: While the model handles straightforward translations effectively, it might struggle with complex sentences or phrases that require a deep understanding of context, especially when combining or shuffling sentences.

  • Generalization to Unseen Data: The model’s performance may degrade when exposed to text that significantly differs in style or content from the training data, such as highly specialized jargon or informal language.

  • Gloss Representation: The model translates text into glosses, which are a written representation of sign language but do not capture the full complexity of sign language grammar and non-manual signals (facial expressions, body language).

  • Test Dataset Limitations: The test dataset used is a shortened version of a sermon that does not cover all possible sentence structures and contexts, which may limit the model’s ability to generalize to other domains.

  • Ethical Considerations: Care must be taken when deploying this model in real-world applications, as misinterpretations or inaccuracies in translation can lead to misunderstandings, especially in sensitive communications.

Training and evaluation data

  • Dataset size before augmentation: 131 examples
  • Dataset size after augmentation: 8646 examples (see the sketch after this list)
  • Split of the augmented dataset (for training and validation):
  • train: 7349
  • validation: 1297
  • For testing: we used a dataset of actual Friday sermon phrases, assembled into a short Friday sermon.
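
The card does not document the augmentation procedure itself; since the limitations above mention combining and shuffling sentences, a hypothetical sketch of pair-concatenation augmentation might look like this (all names and the growth target are illustrative, not the actual method):

```python
import random

def augment_by_concatenation(pairs, target_size, seed=42):
    """Hypothetical augmentation: concatenate two random (text, gloss) pairs
    to grow a small parallel corpus. Illustrative only; the actual method
    used for this model is not documented on the card."""
    rng = random.Random(seed)
    augmented = list(pairs)
    while len(augmented) < target_size:
        (t1, g1), (t2, g2) = rng.sample(pairs, 2)
        augmented.append((f"{t1} {t2}", f"{g1} {g2}"))
    return augmented

# e.g. grow the 131 original pairs toward the reported 8646 examples:
# full_dataset = augment_by_concatenation(original_pairs, 8646)
```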

Training procedure

1- Train and Evaluation Results:

  • Loss: 0.464023
  • Word BLEU Score: 97.08
  • Char BLEU Score: 98.94
  • Runtime (seconds): 562.8277
  • Samples per Second: 391.718
  • Steps per Second: 12.26

2- Test Results:

  • Loss: 0.289312
  • Word BLEU Score: 76.92
  • Char BLEU Score: 86.30
  • Runtime (seconds): 1.1038
  • Samples per Second: 41.67
  • Steps per Second: 0.91
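
The card does not state which BLEU implementation produced these scores; a minimal sketch with sacrebleu, which supports both word-level and character-level tokenization, could look like this (hypotheses and references are placeholders):

```python
import sacrebleu

hyps = ["ص ل ا ة ج م ع ة"]    # model outputs (predicted glosses); placeholder
refs = [["ص ل ا ة ج م ع ة"]]  # reference glosses, one list per reference set

word_bleu = sacrebleu.corpus_bleu(hyps, refs)                   # default (word-level) tokenization
char_bleu = sacrebleu.corpus_bleu(hyps, refs, tokenize="char")  # character-level BLEU
print(f"Word BLEU: {word_bleu.score:.2f}, Char BLEU: {char_bleu.score:.2f}")
```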

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 32
  • eval_batch_size: 64
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 30
  • mixed_precision_training: Native AMP
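
These settings map directly onto Seq2SeqTrainingArguments in transformers; a hedged sketch of the equivalent configuration (the output directory and anything not listed above are placeholders):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="text2gloss_ar",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=30,
    fp16=True,  # Native AMP mixed precision
    # The optimizer defaults to AdamW with betas=(0.9, 0.999) and eps=1e-8,
    # matching the Adam settings listed above.
)
```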

Training results

| Training Loss | Epoch | Step | Validation Loss | Word BLEU | Char BLEU |
|---------------|-------|------|-----------------|-----------|-----------|
| 2.726         | 1.0   | 230  | 0.8206          | 24.8561   | 42.0470   |
| 0.6983        | 2.0   | 460  | 0.3166          | 61.8643   | 74.7375   |
| 0.3167        | 3.0   | 690  | 0.1288          | 85.4787   | 92.1539   |
| 0.1599        | 4.0   | 920  | 0.0699          | 92.9287   | 97.2020   |
| 0.0971        | 5.0   | 1150 | 0.0504          | 94.6364   | 97.6967   |
| 0.0626        | 6.0   | 1380 | 0.0383          | 96.3441   | 98.6000   |
| 0.0507        | 7.0   | 1610 | 0.0396          | 95.9440   | 98.5028   |
| 0.036         | 8.0   | 1840 | 0.0364          | 96.0036   | 98.3957   |
| 0.0289        | 9.0   | 2070 | 0.0306          | 97.0831   | 98.9391   |

Framework versions

  • Transformers 4.42.4
  • Pytorch 1.12.0+cu102
  • Datasets 2.21.0
  • Tokenizers 0.19.1

Model size: 76.4M params (Safetensors, F32)