RadReportX

Model description

RadReportX is a Llama3.1-8B-instruct model fine-tuned on synthetic data. It supports two tasks. The first is open-ended: detect the phrases in a radiology report that represent an ICD-10 code, with no restriction on the underlying disease. The second is closed-set: detect which of 13 candidate diseases are present in a radiology report. The candidate diseases are Atelectasis, Cardiomegaly, Consolidation, Edema, Enlarged Cardiomediastinum, Fracture, Lung Lesion, Lung Opacity, Pleural Effusion, Pleural Other, Pneumonia, Pneumothorax, and Support Devices. When none of the candidates is present, the model outputs 'Normal'.
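For the second task, downstream code usually needs the model's text output mapped back onto the fixed label set. The following is a minimal sketch of that step, not part of the released code: it assumes the model returns the detected diseases as a comma- or newline-separated string, which this card does not specify. The names `CANDIDATE_LABELS` and `parse_labels` are hypothetical; see the GitHub repository for the actual prompt and output format.

```python
from typing import List

# The 13 candidate diseases listed in this model card.
CANDIDATE_LABELS = [
    "Atelectasis", "Cardiomegaly", "Consolidation", "Edema",
    "Enlarged Cardiomediastinum", "Fracture", "Lung Lesion",
    "Lung Opacity", "Pleural Effusion", "Pleural Other",
    "Pneumonia", "Pneumothorax", "Support Devices",
]
NORMAL = "Normal"  # emitted when no candidate disease is present


def parse_labels(model_output: str) -> List[str]:
    """Map raw model text to canonical labels.

    ASSUMPTION: labels are separated by commas or newlines.
    Raises ValueError on empty output or on any unrecognized label.
    """
    lookup = {label.lower(): label for label in CANDIDATE_LABELS}
    lookup[NORMAL.lower()] = NORMAL

    found: List[str] = []
    for piece in model_output.replace("\n", ",").split(","):
        # Drop surrounding whitespace, quotes, and trailing periods.
        key = piece.strip().strip(".'\"").strip().lower()
        if not key:
            continue
        if key not in lookup:
            raise ValueError(f"Unrecognized label: {piece.strip()!r}")
        if lookup[key] not in found:
            found.append(lookup[key])

    if not found:
        raise ValueError("Model output contained no labels")

    # 'Normal' only stands on its own; any detected disease overrides it.
    diseases = [label for label in found if label != NORMAL]
    return diseases or [NORMAL]
```

Raising on unrecognized text, rather than silently falling back to 'Normal', keeps a malformed generation from being counted as a negative finding.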

Training set and training process

The training data comes from two sources. The first is synthetic data generated by GPT-4o. The second is the MIMIC-CXR dataset (https://arxiv.org/pdf/1901.07042), with labels extracted by NegBio. Training was conducted with the torchtune framework (https://github.com/pytorch/torchtune). For details, please refer to our paper listed below.

How to use

Please refer to https://github.com/bionlplab/RadReportX

Paper

https://arxiv.org/pdf/2409.16563

Citation

@article{wei2024enhancing,
  title={Enhancing disease detection in radiology reports through fine-tuning lightweight LLM on weak labels},
  author={Wei, Yishu and Wang, Xindi and Ong, Hanley and Zhou, Yiliang and Flanders, Adam and Shih, George and Peng, Yifan},
  journal={arXiv preprint arXiv:2409.16563},
  year={2024}
}

Disclaimer

The information produced on this website is not intended for direct diagnostic use or medical decision-making without review and oversight by a clinical professional. Individuals should not change their health behavior solely on the basis of information produced on this website. We do not independently verify the validity or utility of the information produced by this tool. If you have questions about the information produced on this website, please see a health care professional.

Acknowledgment

This work was supported by the National Science Foundation Faculty Early Career Development (CAREER) award number 2145640, the Intramural Research Program of the National Institutes of Health, and the Amazon Research Award. The Medical Imaging and Data Resource Center (MIDRC) is funded by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) of the National Institutes of Health under contract 75N92020D00021 and through The Advanced Research Projects Agency for Health (ARPA-H).
