Model
Fine-tuned mt5-base model for resolving elliptical coordinated compound noun phrases (ECCNPs) in German text. ECCNPs are a special type of coordination ellipsis in which part of a compound noun is omitted because it is shared across a coordination (e.g., with "and", "or", or "/").
For instance, Chemo- und Strahlentherapie (chemo- and radiotherapy) is the elliptical form of Chemotherapie und Strahlentherapie (chemotherapy and radiotherapy).
Dataset
The model has been fine-tuned on a subset of sentences from GGPONC 2.0 that contain manually annotated ECCNPs and their resolutions. The annotated dataset is available on Zenodo: https://zenodo.org/records/12529883
Usage
The model can be loaded as a Text2TextGenerationPipeline:
from transformers import pipeline
pipe = pipeline(model="phlobo/german-ellipses-resolver-mt5-base")
pipe("Chemo- und Strahlentherapie")
# [{'generated_text': 'Chemotherapie und Strahlentherapie'}]
pipe("Vitamin C, E und A")
# [{'generated_text': 'Vitamin C, Vitamin E und Vitamin A'}]
We recommend setting max_length to control the maximum output length. For most German sentences, a value of 256 should be enough:
pipe = pipeline(model="phlobo/german-ellipses-resolver-mt5-base", max_length=256)
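To resolve several phrases or sentences in one go, the single-input calls shown above can be wrapped in a small helper. The following is only a minimal sketch: the names `resolve_ellipses` and `load_resolver` are hypothetical and not part of the model or of transformers, and the helper assumes the pipeline returns a list containing one dict with a `generated_text` key, as in the examples above.

```python
from typing import Callable, Iterable, List


def resolve_ellipses(pipe: Callable, sentences: Iterable[str]) -> List[str]:
    """Run each sentence through the pipeline and collect the generated text.

    `pipe` is assumed to behave like the pipeline in the examples above:
    called with one string, it returns [{'generated_text': ...}].
    """
    resolved = []
    for sentence in sentences:
        output = pipe(sentence)
        # Take the single generated candidate for this input.
        resolved.append(output[0]["generated_text"])
    return resolved


def load_resolver(max_length: int = 256):
    """Load the fine-tuned model (hypothetical helper; downloads weights on first use)."""
    # Imported lazily so resolve_ellipses can be used and tested without transformers.
    from transformers import pipeline

    return pipeline(
        model="phlobo/german-ellipses-resolver-mt5-base",
        max_length=max_length,
    )
```

With the real model, `resolve_ellipses(load_resolver(), ["Chemo- und Strahlentherapie", "Vitamin C, E und A"])` should return the two resolved phrases from the examples above as plain strings. Passing the pipeline in as an argument keeps the helper independent of how the model is loaded.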
Paper
Our approach and its evaluation have been published at the ACL BioNLP'23 workshop.
Please cite the following paper if you find our model useful:
@inproceedings{kammer-etal-2023-resolving,
title = "Resolving Elliptical Compounds in {G}erman Medical Text",
author = "Kammer, Niklas and
Borchert, Florian and
Winkler, Silvia and
de Melo, Gerard and
Schapranow, Matthieu-P.",
editor = "Demner-Fushman, Dina and
Ananiadou, Sophia and
Cohen, Kevin",
booktitle = "The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.bionlp-1.26",
doi = "10.18653/v1/2023.bionlp-1.26",
pages = "292--305"
}