language:
- fr
- en
metrics:
- bleu
pipeline_tag: translation
model-index:
- name: NMT-EN-FR
  results:
  - task:
      type: translation
    dataset:
      name: UN Corpus
      type: bilingual
    metrics:
    - name: BLEU
      type: BLEU
      value: 49
library_name: ctranslate2
license: cc-by-sa-4.0
Model Details
French-to-English machine translation model trained by Yasmin Moslem. The model is based on the Transformer (base) architecture. It was originally trained with OpenNMT-py and then converted to the CTranslate2 format for efficient inference.
Tools
- OpenNMT-py
- CTranslate2
Data
This model was trained on the French-to-English portion of the UN Corpus, which consists of approximately 20 million segments.
Tokenizer
The tokenizer was trained with SentencePiece using a shared vocabulary. Hence, a single SentencePiece model is used to tokenize both the source and target texts.
Demo
A demo of this model is available at: https://www.machinetranslation.io/
The demo also illustrates word-level auto-suggestions with teacher forcing.
Inference
If you want to run this model locally, you can use the CTranslate2 library.
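As a minimal sketch of local inference, the snippet below tokenizes French input with the shared SentencePiece model, translates it with CTranslate2, and detokenizes the output with the same SentencePiece model. The function names `load_models` and `translate`, the arguments `model_dir` and `sp_model_path`, the CPU device, and the beam size of 5 are illustrative assumptions, not settings documented for this model.

```python
from typing import List


def load_models(model_dir: str, sp_model_path: str):
    """Load the CTranslate2 translator and the shared SentencePiece model.

    Both paths are placeholders: point them at the converted model
    directory and the SentencePiece model file you downloaded.
    """
    # Imported here so that translate() can be reused or tested on its own.
    import ctranslate2
    import sentencepiece as spm

    translator = ctranslate2.Translator(model_dir, device="cpu")  # device is an assumption
    sp = spm.SentencePieceProcessor(model_file=sp_model_path)
    return translator, sp


def translate(sentences: List[str], translator, sp, beam_size: int = 5) -> List[str]:
    """Translate a batch of French sentences into English."""
    # Split each source sentence into subword pieces.
    source_tokens = sp.encode(sentences, out_type=str)

    # Translate the whole batch; each result holds a list of hypotheses.
    results = translator.translate_batch(source_tokens, beam_size=beam_size)

    # Keep the first (best-scoring) hypothesis for each sentence.
    target_tokens = [result.hypotheses[0] for result in results]

    # The vocabulary is shared, so the same model detokenizes the output.
    return sp.decode(target_tokens)
```

To use it, call `load_models` once with your own paths, then pass the returned objects to `translate` together with a list of French sentences. Loading is kept separate from translation so that the models are read from disk only once across many calls.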
Citation
@inproceedings{moslem-etal-2022-translation,
    title = "Translation Word-Level Auto-Completion: What Can We Achieve Out of the Box?",
    author = "Moslem, Yasmin and
      Haque, Rejwanul and
      Way, Andy",
    booktitle = "Proceedings of the Seventh Conference on Machine Translation (WMT)",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.wmt-1.119",
    pages = "1176--1181",
}