Biomedical Informatics Lab "Mario Stefanelli"

AI & ML interests

Artificial Intelligence in Medicine, Biomedical, NLP, Decision Support, Temporal Data Mining, E-Health, Healthcare Risk Analysis, IT Infrastructure for Biomedical Research

BMI - Biomedical Informatics Lab "Mario Stefanelli"

About Us

BMI belongs to the Department of Electrical, Computer, and Biomedical Engineering (Faculty of Engineering) of the University of Pavia, Italy. Established in 1982, it is a leading center for education, research, and innovative IT solutions in healthcare. About 30 people currently work at BMI, focusing their research on:

  • Biomedical NLP
  • Medical Imaging
  • Clinical Data Mining
  • Biomedical Knowledge Management
  • Decision Support Systems
  • Telemedicine
  • E-learning

NLP Models

Our research frequently involves Natural Language Processing, including Transformer models. Here we host public weights for our biomedical language models. Several options are available; see the details below.

| Model | Domain | Type | Details |
|---|---|---|---|
| Igea | Biomedical | CausalLM Pretrain | Small language model trained after sapienzanlp/Minerva with more than 5 billion biomedical words in Italian. Three versions available: 350M params, 1B params, and 3B params. Use the quantized GGUF version for CPU-only, limited-hardware machines. |
| BioBIT * | Biomedical | MaskedLM Pretrain | BERT model trained after dbmdz/bert-base-italian-xxl-cased with 28GB PubMed abstracts (as in BioBERT) that have been translated from English into Italian using Neural Machine Translation (GNMT). |
| MedBIT * | Medical | MaskedLM Pretrain | BERT model trained after BioBIT with additional 100MB of medical textbook data without any regularization. |
| MedBIT-R3+ (recommended) * | Medical | MaskedLM Pretrain | BERT model trained after BioBIT with additional 200MB of medical textbook data and web-crawled medical resources in Italian. Regularized with LLRD (.95), Mixout (.9), and Warmup (.02). |

* model developed for the Italian Neuroscience and Rehabilitation Network in partnership with the Neuroinformatics Lab of IRCCS Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy
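
As a rough illustration of two of the regularization settings listed for MedBIT-R3+, the sketch below shows how a layer-wise learning rate decay (LLRD) of .95 and a warmup of .02 could translate into concrete numbers. It assumes that .95 is a multiplicative per-layer decay factor and that .02 is the fraction of training steps used for linear warmup; the base learning rate, the step count, and the function names are hypothetical and are not taken from our training setup. Mixout is not covered here.

```python
from typing import List


def llrd_learning_rates(base_lr: float, num_layers: int, decay: float) -> List[float]:
    """Return one learning rate per encoder layer (index 0 = layer closest to the embeddings).

    The top layer keeps base_lr; every layer below it is scaled by `decay` once more,
    so lower layers (more general features) are updated more gently.
    """
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


def warmup_factor(step: int, total_steps: int, warmup_ratio: float) -> float:
    """Linear warmup multiplier: ramps from 0 to 1 over the warmup steps, then stays at 1."""
    warmup_steps = max(1, round(total_steps * warmup_ratio))
    return min(1.0, step / warmup_steps)


# Hypothetical values for illustration only (not the actual training configuration).
BASE_LR = 5e-5        # assumed peak learning rate
NUM_LAYERS = 12       # a BERT-base encoder has 12 Transformer layers
TOTAL_STEPS = 10_000  # assumed number of optimizer steps

rates = llrd_learning_rates(BASE_LR, NUM_LAYERS, decay=0.95)
for layer, lr in enumerate(rates):
    print(f"layer {layer:2d}: lr = {lr:.3e}")

# Multiplier applied to every layer's rate halfway through the warmup phase.
print("warmup factor at step 100:", warmup_factor(100, TOTAL_STEPS, warmup_ratio=0.02))
```

Under these assumptions, the bottom encoder layer trains at roughly 57% of the top layer's rate (0.95 raised to the 11th power), and the warmup phase covers the first 200 of 10,000 steps.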

Related Research Papers

  • Buonocore T. M., Rancati S., and Parimbelli E. (2024). Igea: a Decoder-Only Language Model for Biomedical Text Generation in Italian, arXiv. https://arxiv.org/abs/2407.06011
  • Buonocore T. M., Parimbelli E., Tibollo V., Napolitano C., Priori S., and Bellazzi R. (2023). A Rule-Free Approach for Cardiological Registry Filling from Italian Clinical Notes with Question Answering Transformers, Artificial Intelligence in Medicine: 21st International Conference on Artificial Intelligence in Medicine, AIME 2023. https://doi.org/10.1007/978-3-031-34344-5_19
  • Crema C., Buonocore T.M., Fostinelli S., Parimbelli E., Verde F., Fundarò C., Manera M., Ramusino M.C., Capelli M., Costa A., Binetti G., Bellazzi R., Redolfi A. (2023). Advancing Italian biomedical information extraction with transformers-based models: Methodological insights and multicenter practical application, Journal of Biomedical Informatics. https://doi.org/10.1016/j.jbi.2023.104557
  • Buonocore T. M., Crema C., Redolfi A., Bellazzi R., Parimbelli E. (2023). Localising In-Domain Adaptation of Transformer-Based Biomedical Language Models, Journal of Biomedical Informatics. https://doi.org/10.1016/j.jbi.2023.104431
  • Buonocore T. M., Parimbelli E., Sacchi L., Bellazzi R., Del Campo L., and Quaglini S. (2022). Improving Keyword-Based Topic Classification in Cancer Patient Forums with Multilingual Transformers, Studies in Health Technology and Informatics, 290, 597–601. https://doi.org/10.3233/SHTI220147
