license: apache-2.0
datasets:
- HiTZ/CONAN-EUS
language:
- es
metrics:
- bleu
library_name: transformers
pipeline_tag: text2text-generation
tags:
- counternarrative
- hate speech
- text generation
Content Warning: This card may contain examples of offensive language that do not reflect the authors’ views.
Model Card for mT5-counternarrative-es
This is an mT5-base text-to-text model fine-tuned to generate counternarratives against hate speech in Spanish. It was fine-tuned on the Spanish splits of the CONAN-EUS dataset.
CONAN-EUS was created by professionally translating all 6654 English hate speech and counternarrative (HS-CN) pairs of the original CONAN dataset into Basque and Spanish. For our experiments, we generated train, validation and test splits such that no HS-CN pair appears in more than one split.
CONAN-EUS Splits | Total HS-CN Count |
---|---|
train | 4833 |
validation | 537 |
test | 1278 |
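The disjointness property of the splits can be checked with a few lines of Python. The following is a minimal sketch, not the authors' procedure: the helper `split_overlaps` and the toy pairs are hypothetical, and it assumes each split is available as a list of `(hs, cn)` string tuples.

```python
from itertools import combinations


def split_overlaps(splits):
    """Return, for every pair of splits, the set of HS-CN pairs they share."""
    overlaps = {}
    for (name_a, pairs_a), (name_b, pairs_b) in combinations(splits.items(), 2):
        overlaps[(name_a, name_b)] = set(pairs_a) & set(pairs_b)
    return overlaps


# Toy data (hypothetical placeholders, not taken from CONAN-EUS).
toy_splits = {
    "train": [("hs 1", "cn 1"), ("hs 2", "cn 2")],
    "validation": [("hs 3", "cn 3")],
    "test": [("hs 4", "cn 4")],
}
# Every overlap set is empty when the splits are disjoint.
assert all(not shared for shared in split_overlaps(toy_splits).values())

# Split sizes as reported in the table above.
sizes = {"train": 4833, "validation": 537, "test": 1278}
total = sum(sizes.values())
# Share of each split in percent, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in sizes.items()}
```

With the reported sizes, the three splits hold 6648 pairs in total, about 72.7% in train, 8.1% in validation and 19.2% in test.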
- 📖 Paper: Basque and Spanish Counter Narrative Generation: Data Creation and Evaluation. In LREC-COLING 2024.
- 💻 Github Repo (Data and Code): https://github.com/ixa-ehu/conan-e/
HS-CN example
The CONAN dataset contains HS-CN pairs on the topic of Muslims and Islam. The table below shows one HS-CN pair in Basque, Spanish and English:
HS | CN |
---|---|
Musulmanok ez dugu gure kultura aberastu dezakeen ezer erabilgarririk. | Zer egin dute guretzat musulmanek? Beno, kafea, erlojuak, kamerak, higienea, unibertsitateak, tresna kirurgikoak, mapak, musika, aljebra. |
Los musulmanes no tienen nada útil que pueda enriquecer nuestra cultura. | ¿Qué han hecho los musulmanes por nosotros? Bueno, café, relojes, cámaras, higiene, universidades, instrumentos quirúrgicos, mapas, música, álgebra. |
Muslims do not have anything useful that can enrich our culture. | What have Muslims ever done for us? Well, Coffee, Clocks, Cameras, Hygiene, Universities, Surgical Instruments, Maps, Music, Algebra. |
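Since the card lists `transformers` as the library and `text2text-generation` as the pipeline, generating a counternarrative for a Spanish HS input might look like the sketch below. Several things here are assumptions rather than facts from the card: the repository id in `MODEL_ID` is a guess and should be replaced with the model's actual Hub id, the model is assumed to take the raw hate speech sentence with no task prefix, and the decoding settings (`num_beams`, `max_new_tokens`) are illustrative defaults, not the values used in the paper.

```python
# Assumed repository id (hypothetical); replace with the model's actual Hub id.
MODEL_ID = "HiTZ/mt5-counternarrative-es"


def load_model(model_id=MODEL_ID):
    """Load the tokenizer and seq2seq model from the Hugging Face Hub."""
    # Imported lazily so that defining these helpers does not require transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def generate_counter_narrative(hate_speech, tokenizer, model,
                               max_new_tokens=128, num_beams=4):
    """Generate one counternarrative for a single hate speech sentence."""
    # Assumption: the model expects the raw HS text, without a task prefix.
    inputs = tokenizer(hate_speech, return_tensors="pt", truncation=True)
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, num_beams=num_beams
    )
    # Decode the first (and only) generated sequence.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Usage (downloads the model weights, so it is left commented out):
# tokenizer, model = load_model()
# hs = "Los musulmanes no tienen nada útil que pueda enriquecer nuestra cultura."
# print(generate_counter_narrative(hs, tokenizer, model))
```

Passing the tokenizer and model in as arguments keeps loading separate from generation, so the same loaded model can be reused across many inputs.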
If you use the model, please cite the following paper:
Citation
@inproceedings{bengoetxea-et-al-2024,
title={{B}asque and {S}panish {C}ounter {N}arrative {G}eneration: {D}ata {C}reation and {E}valuation},
author={Jaione Bengoetxea and Yi-Ling Chung and Marco Guerini and Rodrigo Agerri},
year={2024},
publisher={Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING)},
}
Contact: Rodrigo Agerri, HiTZ Center - Ixa, University of the Basque Country UPV/EHU