datasets:
- SLPG/Punjabi_Transliteration_Corpus
language:
- pa
metrics:
- bleu
- cer
library_name: fairseq
pipeline_tag: translation
tags:
- punjabi shahmukhi
- punjabi gurmukhi
- transliteration
- punjabi transliteration
- punjabi gur to shahmukhi
- transliteration system
- punjabi transliteration system
Punjabi Gurmukhi to Shahmukhi Transliteration System
Our Punjabi transliteration systems are supervised, bidirectional neural machine translation (NMT) models, trained on a corpus constructed in an unsupervised manner, that convert text between the Gurmukhi and Shahmukhi scripts. The Gurmukhi-to-Shahmukhi model achieves a BLEU score of 98.1 and 99.5% word-level accuracy, while the Shahmukhi-to-Gurmukhi model achieves a BLEU score of 87.7.
Corpus Details
- Total Sentences: 6.3 million
- Domains Covered: Text drawn from multiple sources, including CCAligned, CCMatrix, TED, QED, OPUS, TICO, Wikimedia, MultiCCAligned, EMILLE, IJCNLP, XLEnt, and ParaCrawl.
- Test Corpus: FLORES-101
Model Details
- **BLEU Score:** 98.1
- **Word-level Accuracy:** 99.5%
- **Character-level Accuracy (1 − CER):** 99.1%
A companion Shahmukhi-to-Gurmukhi model, with a BLEU score of 87.7, is also available.
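The word-level and character-level figures above are usually derived from edit distance between a reference transliteration and the system output. The exact evaluation script is not given in this card, so the following is only a minimal sketch of how such metrics could be computed. The function names are hypothetical, and the position-wise word comparison assumes that reference and hypothesis contain the same number of whitespace-separated tokens, which may not match the authors' setup.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """CER = character edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def word_accuracy(ref, hyp):
    """Fraction of reference words reproduced exactly at the same position.

    Assumption: tokens are aligned one-to-one by position, which is a
    simplification chosen for this sketch.
    """
    ref_words = ref.split()
    hyp_words = hyp.split()
    if not ref_words:
        raise ValueError("reference must not be empty")
    matches = sum(r == h for r, h in zip(ref_words, hyp_words))
    return matches / len(ref_words)


if __name__ == "__main__":
    print(character_error_rate("abcd", "abed"))  # 0.25
    print(word_accuracy("a b c d", "a b x d"))   # 0.75
```

A lower CER is better, so a character-level accuracy can be read as one minus the CER under this definition.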
Usage
These resources are intended to facilitate research and development in the field of Punjabi transliteration. They can be used to train new models or improve existing ones, enabling high-quality transliteration between Gurmukhi and Shahmukhi scripts.
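Because the systems are bidirectional, a pipeline built on them needs to know which script an input is written in before choosing a direction. The sketch below illustrates one possible way to do that by counting characters in the relevant Unicode blocks. It is not part of the released models: the function name and return labels are hypothetical, and the choice of Arabic-script ranges for Shahmukhi is an assumption made for illustration.

```python
# Unicode block for Gurmukhi.
GURMUKHI_RANGE = (0x0A00, 0x0A7F)

# Arabic-script blocks assumed here to cover Shahmukhi text:
# Arabic, Arabic Supplement, Presentation Forms-A and Forms-B.
ARABIC_RANGES = [
    (0x0600, 0x06FF),
    (0x0750, 0x077F),
    (0xFB50, 0xFDFF),
    (0xFE70, 0xFEFF),
]


def detect_script(text):
    """Return 'gurmukhi', 'shahmukhi', or 'unknown' for the given text.

    The decision is a simple majority count of characters per script;
    ties and text with no characters from either script yield 'unknown'.
    """
    gurmukhi = 0
    shahmukhi = 0
    for ch in text:
        code = ord(ch)
        if GURMUKHI_RANGE[0] <= code <= GURMUKHI_RANGE[1]:
            gurmukhi += 1
        elif any(lo <= code <= hi for lo, hi in ARABIC_RANGES):
            shahmukhi += 1
    if gurmukhi > shahmukhi:
        return "gurmukhi"
    if shahmukhi > gurmukhi:
        return "shahmukhi"
    return "unknown"


if __name__ == "__main__":
    print(detect_script("ਪੰਜਾਬੀ"))  # gurmukhi
    print(detect_script("پنجابی"))  # shahmukhi
    print(detect_script("hello"))   # unknown
```

Input detected as `gurmukhi` would go to the Gurmukhi-to-Shahmukhi model, and input detected as `shahmukhi` to the companion model.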
Citation
If you use our model, kindly cite our paper:
@article{Shehzadi2024,
title={Unsupervised Punjabi Corpus and Neural Machine Transliteration System},
author={Shehzadi Ambreen and Sadaf Abdul Rauf and MG Abbas Malik and Muhammad Imran},
journal={Heliyon},
year={2024},
note={Under review}
}