license: apache-2.0
datasets:
  - HiTZ/CONAN-EUS
language:
  - es
metrics:
  - bleu
library_name: transformers
pipeline_tag: text2text-generation
tags:
  - counternarrative
  - hate speech
  - text generation

Content Warning: This card may contain examples of offensive language that do not reflect the authors’ views.

Model Card for mT5-counternarrative-es

This is an mT5-base model fine-tuned for text-to-text generation of counternarratives against hate speech in Spanish. It was fine-tuned on the Spanish splits of the CONAN-EUS dataset.
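As a rough illustration of how a model of this kind is typically queried, the sketch below wraps loading and generation with the `transformers` library in one function. Several things here are assumptions rather than facts from this card: the repository id `HiTZ/mT5-counternarrative-es` is a hypothetical placeholder (use the id shown on the model page), the input is assumed to be the raw hate speech text with no task prefix, and the decoding settings (`num_beams=4`, `max_new_tokens=64`) are arbitrary starting points, not the settings used by the authors.

```python
# Hypothetical repository id (an assumption); replace it with the id on the model page.
MODEL_ID = "HiTZ/mT5-counternarrative-es"


def generate_counternarrative(hate_speech, model_id=MODEL_ID, max_new_tokens=64):
    """Return one generated counternarrative for a Spanish hate speech input."""
    # Imported here so that defining the function does not require transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Assumption: the model takes the hate speech text as-is, without a prefix.
    inputs = tokenizer(hate_speech, return_tensors="pt", truncation=True)

    # Beam search with illustrative settings; tune these for your use case.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        num_beams=4,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_counternarrative("Los musulmanes no tienen nada útil que pueda enriquecer nuestra cultura.")` downloads the weights on first use and returns a single Spanish string. For repeated calls, it would be better to load the tokenizer and model once and reuse them.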

CONAN-EUS was created by professionally translating all 6654 English hate speech (HS) and counternarrative (CN) pairs of the original CONAN dataset into Basque and Spanish. For experimentation, we generated train, validation and test splits such that no HS-CN pair appears in more than one split.

| CONAN-EUS split | Total HS-CN count |
|---|---|
| train | 4833 |
| validation | 537 |
| test | 1278 |
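The split sizes and the no-overlap property can be checked with a few lines of code. The sketch below is only an illustration: the helper names `split_shares` and `overlapping_pairs` are hypothetical, and the toy pairs stand in for real data, since the column names of the dataset are not given in this card. The counts themselves are taken from the table above; they sum to 6648.

```python
# Split sizes as reported in the table above.
SPLIT_SIZES = {"train": 4833, "validation": 537, "test": 1278}


def split_shares(sizes):
    """Return each split's fraction of the total number of pairs."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def overlapping_pairs(splits):
    """Return the set of (HS, CN) pairs that occur in more than one split.

    `splits` maps a split name to a list of (hate_speech, counternarrative)
    tuples. Duplicates inside a single split are not counted as overlap.
    """
    first_seen = {}
    leaked = set()
    for name, pairs in splits.items():
        for pair in set(pairs):
            if pair in first_seen:
                leaked.add(pair)  # already seen in an earlier split
            else:
                first_seen[pair] = name
    return leaked


# Toy data (not from the dataset) showing a clean and a leaky partition.
clean = {"train": [("hs1", "cn1"), ("hs2", "cn2")], "test": [("hs3", "cn3")]}
leaky = {"train": [("hs1", "cn1")], "test": [("hs1", "cn1"), ("hs3", "cn3")]}

shares = split_shares(SPLIT_SIZES)
print({name: round(share, 3) for name, share in shares.items()})
print(overlapping_pairs(clean), overlapping_pairs(leaky))
```

With the reported counts, the shares come out at roughly 73% train, 8% validation and 19% test.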

HS-CN example

The CONAN dataset contains HS-CN pairs on the topic of Muslims and Islam. An example of an HS-CN pair in Basque, Spanish and English is shown below:

| HS | CN |
|---|---|
| Musulmanok ez dugu gure kultura aberastu dezakeen ezer erabilgarririk. | Zer egin dute guretzat musulmanek? Beno, kafea, erlojuak, kamerak, higienea, unibertsitateak, tresna kirurgikoak, mapak, musika, aljebra. |
| Los musulmanes no tienen nada útil que pueda enriquecer nuestra cultura. | ¿Qué han hecho los musulmanes por nosotros? Bueno, café, relojes, cámaras, higiene, universidades, instrumentos quirúrgicos, mapas, música, álgebra. |
| Muslims do not have anything useful that can enrich our culture. | What have Muslims ever done for us? Well, Coffee, Clocks, Cameras, Hygiene, Universities, Surgical Instruments, Maps, Music, Algebra. |

If you use the model, please cite the following paper:

Citation

@inproceedings{bengoetxea-et-al-2024,
      title={{B}asque and {S}panish {C}ounter {N}arrative {G}eneration: {D}ata {C}reation and {E}valuation},
      author={Jaione Bengoetxea and Yi-Ling Chung and Marco Guerini and Rodrigo Agerri},
      year={2024},
      publisher = "Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING)",
}

Contact: Rodrigo Agerri, HiTZ Center - Ixa, University of the Basque Country UPV/EHU