metadata
tags:
  - text2text-generation
  - definition-modeling
metrics:
  - rouge
model-index:
  - name: mt0-definition-ru-xl
    results: []
language:
  - ru
widget:
  - text: Мы сели в тачку и поехали по ресторанам. Что такое тачка?
    example_title: Definition generation
license: cc-by-sa-4.0

mT0-Definition-Ru XL

This model is a version of mT0 XL fine-tuned on the Russian part of CodWoE, a dataset of definitions and usage examples.

It generates definitions of Russian words in context. The input is a usage example followed by the instruction question "Что такое TARGET_WORD?" ("What is TARGET_WORD?"), where TARGET_WORD is the word to be defined.
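The input format described above can be sketched as follows. This is a minimal illustration, not the authors' reference code: the helper names `build_prompt` and `generate_definition`, the repository id `ltg/mt0-definition-ru-xl`, and the `max_new_tokens` value are assumptions that do not come from this card. Only the standard `transformers` calls `AutoTokenizer.from_pretrained`, `AutoModelForSeq2SeqLM.from_pretrained`, `generate`, and `decode` are used.

```python
def build_prompt(usage_example: str, target_word: str) -> str:
    """Append the instruction question to the usage example."""
    return f"{usage_example.strip()} Что такое {target_word}?"


def generate_definition(
    prompt: str,
    model_id: str = "ltg/mt0-definition-ru-xl",  # assumed repository id, adjust as needed
    max_new_tokens: int = 60,  # assumed generation budget, not taken from the card
) -> str:
    """Load the model and generate a definition for an already built prompt."""
    # Imported lazily so that build_prompt works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Reproduces the widget example from the metadata block.
prompt = build_prompt("Мы сели в тачку и поехали по ресторанам.", "тачка")
print(prompt)
```

Calling `generate_definition(prompt)` downloads the XL checkpoint, so it is kept separate from prompt construction, which can be checked without any model weights.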

Model description

See details in the paper Enriching Word Usage Graphs with Cluster Definitions (LREC-COLING'2024) by Mariia Fedorova, Andrey Kutuzov, Nikolay Arefyev and Dominik Schlechtweg.

Intended uses & limitations

The model is intended for research purposes, as a source of contextualized dictionary-like lexical definitions. Generated definitions may contain biases and stereotypes inherited from the underlying language model.

Training and evaluation data

Russian subset of CodWoE (Mickus et al., SemEval 2022).

Training results

mT0-Definition-Ru XL achieves the following results on the CodWoE evaluation set:

  • Loss: 1.7996
  • Rouge1: 17.5576
  • Rouge2: 8.7614
  • RougeL: 17.2533
  • RougeLsum: 17.3204
  • Gen Len: 21.6774
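To give a rough sense of what the ROUGE figures above measure, here is a simplified ROUGE-1 F1 computed over whitespace-separated tokens. It is only a sketch: the function name `rouge1_f1` is hypothetical, the exact scorer and tokenization behind the reported numbers are not stated in this card, and the reference and candidate strings below are made up for illustration rather than taken from the model or the dataset. The reported values appear to be on a 0 to 100 scale, so the result is multiplied by 100 when printed.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap on lowercased whitespace tokens."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Made-up strings for illustration only (not actual model output).
reference = "легковой автомобиль"
candidate = "автомобиль"
score = rouge1_f1(reference, candidate)
print(round(100 * score, 2))  # 66.67
```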

Training procedure

mT0-Definition-Ru XL was fine-tuned in sequence-to-sequence mode on examples of contextualized dictionary definitions.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 20.0
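The two total batch sizes in the list above follow from the per-device values: the effective training batch is the per-device batch size times the number of devices times the gradient accumulation steps, while evaluation uses no accumulation. A short check of that arithmetic, with variable names chosen here for illustration:

```python
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
num_devices = 8
gradient_accumulation_steps = 4

# Gradients are accumulated over several steps on every device before each update.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
# Evaluation processes one batch per device with no accumulation.
total_eval_batch_size = per_device_eval_batch_size * num_devices

print(total_train_batch_size)  # 128
print(total_eval_batch_size)   # 32
```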

Framework versions

  • Transformers 4.37.1
  • PyTorch 1.13.1+rocm5.2
  • Datasets 2.16.1
  • Tokenizers 0.15.1

Citation