Impression Section Generator for Radiology Reports 🏥

This model is the result of the SINAI team's participation in Task 1B: Radiology Report Summarization at the BioNLP workshop held at ACL 2023. The goal of this task is to foster the development of automatic radiology report summarization systems and to expand their applicability by incorporating seven different modalities and anatomies in the provided data. We propose to automate the generation of radiology impressions with sequence-to-sequence learning that leverages publicly available pre-trained models, both general-domain and biomedical domain-specific. This repository provides access to our best-performing system, obtained by fine-tuning Sci-Five base, a T5 model trained for an extra 200k steps to adapt it to biomedical literature.
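As a rough illustration of how a fine-tuned sequence-to-sequence checkpoint like this one could be used for inference, here is a minimal sketch based on the Hugging Face `transformers` library. Several things in it are assumptions rather than details from this card: `MODEL_ID` is a placeholder to be replaced with this repository's actual identifier, `prepare_findings` is a hypothetical helper, and the word limit, beam size, and output length are illustrative values, not the settings used for the shared task. The input format the model expects (for example, whether findings are prefixed or combined with other report sections) is not stated here, so the sketch passes the findings text as-is.

```python
# Placeholder only: replace with the actual Hub identifier of this repository.
MODEL_ID = "<this-repository-id>"


def prepare_findings(findings: str, max_words: int = 400) -> str:
    """Collapse whitespace and cap the length of the findings text.

    Hypothetical preprocessing for illustration; not taken from the paper.
    """
    words = findings.split()
    return " ".join(words[:max_words])


def generate_impression(
    findings: str,
    model_id: str = MODEL_ID,
    max_new_tokens: int = 64,
) -> str:
    """Generate an impression from findings with a seq2seq checkpoint."""
    # Imported lazily so prepare_findings works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    inputs = tokenizer(
        prepare_findings(findings),
        return_tensors="pt",
        truncation=True,
        max_length=512,  # assumed input limit, typical for T5-style models
    )
    # Beam search with assumed settings; tune for your own use case.
    output_ids = model.generate(
        **inputs, num_beams=4, max_new_tokens=max_new_tokens
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_impression("Lungs are clear. No pleural effusion.")` after setting `MODEL_ID` downloads the checkpoint on first use and returns the generated impression as a string.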

Results

The official evaluation results suggest that adapting a general-domain system to biomedical literature benefits the subsequent fine-tuning for the radiology report summarization task. The table below summarizes the scores obtained by this model in the official evaluation. Team standings are available here.

BLEU4	ROUGE-L	BERTScore	F1-RadGraph
17.38	32.32	55.04	33.96
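To give a sense of what one of these metrics measures, the following is a simplified sketch of ROUGE-L as an F1 score over the longest common subsequence (LCS) of tokens. It assumes lowercased whitespace tokenization and equal weighting of precision and recall; the official evaluation presumably relied on an established implementation, so this sketch is not expected to reproduce the reported numbers exactly. The function names are illustrative.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 between a reference and a generated impression."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of generated tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)
```

For the reference "no acute cardiopulmonary process" and the candidate "no acute process", the LCS has three tokens, giving precision 1.0, recall 0.75, and an F1 of about 0.857. The scores in the table appear to be on a 0 to 100 scale, which would correspond to multiplying such a value by 100 and averaging over the test set.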

System description paper and citation

The paper describing the system in detail is published in the Proceedings of the 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks.

BibTeX citation:

@inproceedings{chizhikova-etal-2023-sinai,
    title = "{SINAI} at {R}ad{S}um23: Radiology Report Summarization Based on Domain-Specific Sequence-To-Sequence Transformer Model",
    author = "Chizhikova, Mariia  and
      Diaz-Galiano, Manuel  and
      Urena-Lopez, L. Alfonso  and
      Martin-Valdivia, M. Teresa",
    booktitle = "The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.bionlp-1.53",
    pages = "530--534",
    abstract = "This paper covers participation of the SINAI team in the shared task 1B: Radiology Report Summarization at the BioNLP workshop held on ACL 2023. Our proposal follows a sequence-to-sequence approach which leverages pre-trained multilingual general domain and monolingual biomedical domain pre-trained language models. The best performing system based on domain-specific model reached 33.96 F1RadGraph score which is the fourth best result among the challenge participants. This model was made publicly available on HuggingFace. We also describe an attempt of Proximal Policy Optimization Reinforcement Learning that was made in order to improve the factual correctness measured with F1RadGraph but did not lead to satisfactory results.",
}