t2p-nmt-orfeo

t2p-nmt-orfeo is a text-to-pictograms translation model built by training an NMT model from scratch on a dataset of paired transcriptions and pictogram token sequences (each token is linked to a pictogram image from ARASAAC). The model is used only for inference.

Training details

The model was trained with Fairseq.

Datasets

The model was trained on the Propicto-orféo dataset, which was created from the CEFC-Orféo corpus. The dataset was presented in the research paper "A Multimodal French Corpus of Aligned Speech, Text, and Pictogram Sequences for Speech-to-Pictogram Machine Translation" at LREC-Coling 2024. It was split into training, validation, and test sets.

Split	Number of utterances
train	231,374
valid	28,796
test	29,009
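
As a quick check on the split sizes, the short sketch below computes each split's share of the total from the counts in the table. The variable names are illustrative only.

```python
# Utterance counts copied from the table above.
splits = {"train": 231_374, "valid": 28_796, "test": 29_009}

total = sum(splits.values())

# Share of each split as a fraction of all utterances.
shares = {name: count / total for name, count in splits.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"total: {total}")
```

The counts correspond to a split of roughly 80/10/10.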

Parameters

These are the arguments used in the training pipeline:

fairseq-train \
    data-bin/orfeo.tokenized.fr-frp \
    --arch transformer_iwslt_de_en --share-decoder-input-output-embed \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --dropout 0.3 --weight-decay 0.0001 \
    --save-dir exp_orfeo/checkpoints/nmt_fr_frp_orfeo \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 \
    --eval-bleu \
    --eval-bleu-args '{"beam": 5, "max_len_a": 1.2, "max_len_b": 10}' \
    --eval-bleu-detok moses \
    --eval-bleu-remove-bpe \
    --eval-bleu-print-samples \
    --best-checkpoint-metric bleu --maximize-best-checkpoint-metric \
    --max-epoch 40 \
    --keep-best-checkpoints 5 \
    --keep-last-epochs 5
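
To make the learning-rate options concrete, here is a minimal sketch of an inverse square root schedule with linear warmup, using the `--lr 5e-4` and `--warmup-updates 4000` values from the command above. It assumes the warmup starts from a learning rate of 0 (the `warmup_init_lr` parameter below is an assumption, since the command does not set it). It illustrates the shape of the schedule and is not Fairseq's own code.

```python
import math

def inverse_sqrt_lr(step, peak_lr=5e-4, warmup_updates=4000, warmup_init_lr=0.0):
    """Learning rate at a given update: linear warmup, then 1/sqrt(step) decay."""
    if step < warmup_updates:
        # Linear warmup from warmup_init_lr (assumed 0) up to peak_lr.
        return warmup_init_lr + step * (peak_lr - warmup_init_lr) / warmup_updates
    # After warmup, the rate decays proportionally to 1/sqrt(step).
    return peak_lr * math.sqrt(warmup_updates / step)

for step in (1000, 4000, 16000):
    print(step, inverse_sqrt_lr(step))
```

Under these assumptions the rate peaks at 5e-4 at update 4000 and has halved to 2.5e-4 by update 16000.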

Evaluation

The model was evaluated with sacreBLEU by comparing the model hypotheses against the reference pictogram translations.

fairseq-generate exp_orfeo/data-bin/orfeo.tokenized.fr-frp \
    --path exp_orfeo/checkpoints/nmt_fr_frp_orfeo/checkpoint.best_bleu_87.2803.pt \
    --batch-size 128 --beam 5 --remove-bpe > gen_orfeo.out

The output file contains the following information:

S-16709	peut-être vous pouvez vous exprimer
T-16709	vous pouvoir exprimer
H-16709	-0.0769597738981247	vous pouvoir exprimer
D-16709	-0.0769597738981247	vous pouvoir exprimer
P-16709	-0.0936 -0.0924 -0.0065 -0.1154
Generate test with beam=5: BLEU4 = 87.43, 95.2/89.8/85.0/80.4 (BP=1.000, ratio=1.006, syslen=250949, reflen=249520)
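
A minimal sketch of how this output could be parsed, assuming the tab-separated layout shown above: `S` is the source sentence, `T` the reference, `H` the hypothesis with its score, `D` the detokenized hypothesis, and `P` the per-token scores. The function name `parse_generate_output` and the dictionary layout it returns are hypothetical and not part of Fairseq.

```python
from collections import defaultdict

def parse_generate_output(lines):
    """Group S/T/H/D/P lines by sentence id (illustrative helper)."""
    records = defaultdict(dict)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        tag, _, sent_id = fields[0].partition("-")
        if tag not in {"S", "T", "H", "D", "P"} or not sent_id.isdigit():
            continue  # skip log lines and the final BLEU summary
        rec = records[int(sent_id)]
        if tag in ("S", "T"):
            rec[tag] = fields[1]
        elif tag in ("H", "D"):
            # Second field is the score, third field is the text.
            rec[tag + "_score"] = float(fields[1])
            rec[tag] = fields[2]
        else:
            # "P": space-separated per-token scores.
            rec[tag] = [float(x) for x in fields[1].split()]
    return dict(records)

# The example record from the output above.
sample = [
    "S-16709\tpeut-être vous pouvez vous exprimer",
    "T-16709\tvous pouvoir exprimer",
    "H-16709\t-0.0769597738981247\tvous pouvoir exprimer",
    "D-16709\t-0.0769597738981247\tvous pouvoir exprimer",
    "P-16709\t-0.0936 -0.0924 -0.0065 -0.1154",
    "Generate test with beam=5: BLEU4 = 87.43, 95.2/89.8/85.0/80.4 (BP=1.000, ratio=1.006, syslen=250949, reflen=249520)",
]

result = parse_generate_output(sample)
record = result[16709]
mean_p = sum(record["P"]) / len(record["P"])
print(record["H"], record["H_score"], mean_p)
```

In this example the `H` score matches the mean of the four `P` values up to rounding. There are four values for a three-word hypothesis, presumably one per output token plus the end-of-sentence symbol.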

Results

Comparison with other translation models (BLEU scores):

Model	validation	test
t2p-t5-large-orfeo	85.2	85.8
t2p-nmt-orfeo	87.2	87.4
t2p-mbart-large-cc25-orfeo	75.2	75.6
t2p-nllb-200-distilled-600M-orfeo	86.3	86.9
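
As a sanity check on how such a BLEU score is composed, the sketch below recombines the n-gram precisions and length statistics from the `fairseq-generate` summary line (95.2/89.8/85.0/80.4, syslen=250949, reflen=249520) using the standard BLEU definition: a brevity penalty multiplied by the geometric mean of the 1- to 4-gram precisions. Because the precisions in the log are rounded to one decimal, the result only approximates the reported 87.43. The function name `bleu_from_stats` is illustrative.

```python
import math

def bleu_from_stats(precisions, sys_len, ref_len):
    """BLEU on a 0-100 scale from n-gram precisions given in percent."""
    # Brevity penalty: 1 when the system output is at least as long as the reference.
    bp = 1.0 if sys_len >= ref_len else math.exp(1 - ref_len / sys_len)
    # Geometric mean of the precisions, computed in log space.
    log_mean = sum(math.log(p / 100) for p in precisions) / len(precisions)
    return 100 * bp * math.exp(log_mean)

score = bleu_from_stats([95.2, 89.8, 85.0, 80.4], sys_len=250949, ref_len=249520)
print(f"{score:.2f}")
```

Since the system output is slightly longer than the reference (ratio 1.006), the brevity penalty is 1 and the score is determined by the precisions alone.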

Environmental Impact

Training was performed on a single Nvidia V100 GPU with 32 GB of memory and took around 2 hours in total.

Using the t2p-nmt-orfeo model

The scripts to use the t2p-nmt-orfeo model are located in the speech-to-pictograms GitHub repository.

Information

Language(s): French
License: Apache-2.0
Developed by: Cécile Macaire
Funded by
- GENCI-IDRIS (Grant 2023-AD011013625R1)
- PROPICTO ANR-20-CE93-0005
Authors
- Cécile Macaire
- Chloé Dion
- Emmanuelle Esperança-Rodier
- Benjamin Lecouteux
- Didier Schwab

Citation

If you use this model for your own research work, please cite as follows:

@inproceedings{macaire_jeptaln2024,
  title = {{Approches cascade et de bout-en-bout pour la traduction automatique de la parole en pictogrammes}},
  author = {Macaire, C{\'e}cile and Dion, Chlo{\'e} and Schwab, Didier and Lecouteux, Benjamin and Esperan{\c c}a-Rodier, Emmanuelle},
  url = {https://inria.hal.science/hal-04623007},
  booktitle = {{35{\`e}mes Journ{\'e}es d'{\'E}tudes sur la Parole (JEP 2024) 31{\`e}me Conf{\'e}rence sur le Traitement Automatique des Langues Naturelles (TALN 2024) 26{\`e}me Rencontre des {\'E}tudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL 2024)}},
  address = {Toulouse, France},
  publisher = {{ATALA \& AFPC}},
  volume = {1 : articles longs et prises de position},
  pages = {22-35},
  year = {2024}
}