partex-nv/Llama-3.1-8B-VaaniSetu-EN2PA

text-2-text translation

English2Punjabi


partex-nv committed on Sep 25

Commit 4a7b1b4 • 1 Parent(s): 96f521d

Update README.md

Files changed (1)

README.md +1 -2

README.md CHANGED

@@ -25,7 +25,7 @@ This model aims to bridge the gap in **open-source English to Punjabi translatio
 - **Training Data**: 10 million English<>Punjabi parallel sentences from [AI4Bharat's Bharat Parallel Corpus Collection (BPCC)](https://github.com/AI4Bharat/IndicTrans2).
 - **Evaluation Data**: The model has been evaluated on **1503 samples** from the **IN22-Conv dataset**, which is also available via [IndicTrans2](https://github.com/AI4Bharat/IndicTrans2).
 - **Model Architecture**: Based on **LLaMA 3.1 8B** with BF16 precision.
-- **Score (chrF++)**: Achieved a **chrF++ score of 28.1** on the IN22-Conv dataset, which is an excellent score for an open-source model. The benchmark chrF++ score for Google Translate is 61.1 (as noted in [this paper](https://arxiv.org/pdf/2305.16307)).
+- **Score (chrF++)**: Achieved a **chrF++ score of 28.1** on the IN22-Conv dataset, which is an excellent score for an open-source model.
 This is the **first release** of the model, and future updates aim to improve the chrF++ score for enhanced translation quality.
@@ -133,7 +133,6 @@ Stay tuned for updates, and feel free to contribute or raise issues on Hugging F
 - **Training Data**: [Bharat Parallel Corpus Collection (BPCC)](https://github.com/AI4Bharat/IndicTrans2) by AI4Bharat.
 - **Evaluation Data**: [IN22-Conv dataset](https://github.com/AI4Bharat/IndicTrans2).
-- **Benchmarks**: [Translation Benchmarks Paper](https://arxiv.org/pdf/2305.16307).
 ## Contributors