CXR Foundation model card
Model documentation: CXR Foundation
Resources:
- Model on Google Cloud Model Garden: CXR Foundation
- Model on Hugging Face: google/cxr-foundation
- GitHub repository (supporting code, Colab notebooks, discussions, and issues): cxr-foundation
- Quick start notebook: notebooks/quick_start
- Support: See Contact.
Terms of use: Health AI Developer Foundations terms of use
Author: Google
Model information
This section describes the CXR Foundation model and how to use it.
Description
CXR Foundation is a machine learning model designed to accelerate AI development for chest X-ray image analysis. It is pre-trained on a large volume of chest X-rays to produce embeddings that capture dense features relevant to analyzing these images. As a result, these embeddings enable the efficient training of AI models with significantly less data and compute than traditional methods. CXR Foundation offers two types of embeddings:
- ELIXR v2.0: Produces 32x768 dimensional vectors, capturing detailed image features relevant to X-ray analysis.
- ELIXR-contrastive / v2.0 text: Generates 32x128 dimensional vectors and allows for projecting chest X-ray images and textual prompts into a shared embedding space. This enables powerful applications like semantic image retrieval and zero-shot classification.
You can read more about the research behind CXR Foundation in our manuscript: ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders.
How to use
To get started quickly with Hugging Face, refer to the Quick start notebook listed in the next section.
If you want to use the model at scale, we recommend that you create a production version using Model Garden.
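For local experimentation, the weights can be fetched from the Hugging Face Hub. The snippet below is a minimal sketch only: it assumes the repository ships TensorFlow SavedModel weights, and the loading step shown is an assumption; the quick start notebook is the authoritative reference.

```python
# Minimal sketch of fetching the model from Hugging Face (assumption: the
# repository ships TensorFlow SavedModel weights; see the quick start
# notebook for the authoritative loading code).
from huggingface_hub import snapshot_download
import tensorflow as tf

# Requires accepting the terms of use and logging in (`huggingface-cli login`).
local_dir = snapshot_download(repo_id="google/cxr-foundation")

# Hypothetical loading step: the actual SavedModel location within the
# repository may differ.
model = tf.saved_model.load(local_dir)
```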
Examples
See the following Colab notebooks for examples of how to use CXR Foundation:
- To give the model a quick try, running it locally with weights from Hugging Face, see the Quick start notebook in Colab.
- For an example of how to use the model to train a linear classifier, see the Linear classifier notebook in Colab.
- For an example of how to retrieve images from a database using text-image similarity, see the Text retrieval notebook in Colab.
- For an example of how to use the text embeddings to perform zero-shot inference, see the Zero-shot inference notebook in Colab.
Model architecture overview
The model uses the EfficientNet-L2 architecture for images and a BERT architecture for text. It was trained on 821,544 CXRs from India and the US. Training used abnormal vs. normal labels (i.e., whether the image contained any kind of abnormality) with a supervised contrastive loss, as well as accompanying radiology reports with the CLIP and BLIP-2 losses. The abnormal vs. normal labels were obtained from more granular labels (e.g., pneumothorax, fracture) as well as regular expressions on radiology reports.
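As a rough illustration of the CLIP-style objective (a sketch only, not the actual training code; ELIXR additionally combines it with supervised contrastive and BLIP-2 losses), a symmetric contrastive loss over a batch of matched image and text embeddings looks like this:

```python
import numpy as np

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive (CLIP-style) loss over a batch.

    img_emb, txt_emb: (batch, dim) arrays; row i of each is a matched pair.
    Illustrative sketch only, not the ELIXR training implementation.
    """
    # L2-normalize so dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (batch, batch) similarity matrix
    n = logits.shape[0]

    def xent(l):
        # Softmax cross-entropy with the matched pairs on the diagonal.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing this loss pulls matched image-report pairs together in the shared embedding space while pushing mismatched pairs apart.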
You can read more about the research behind CXR Foundation in our recent publication: Simplified Transfer Learning for Chest Radiography Models Using Less Data.
Technical specifications
- Model type: Convolutional neural network that produces embeddings
- Key publications:
  - ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders
  - Simplified Transfer Learning for Chest Radiography Models Using Less Data
- Model created: August 2, 2024
- Model version: 2.0.0
Performance and validation
CXR Foundation was evaluated across a range of tasks, including data-efficient classification, zero-shot classification, semantic image retrieval, visual question answering, and report quality assurance.
Key performance metrics
- Data-efficient classification: Mean AUC of 0.898 across atelectasis, cardiomegaly, consolidation, pleural effusion, and pulmonary edema on the CheXpert test set.
- Zero-shot classification: Mean AUC of 0.846 across 13 findings on the CheXpert test set: atelectasis, cardiomegaly, consolidation, pleural effusion, pulmonary edema, enlarged cardiomediastinum, pleural other, pneumothorax, support devices, airspace opacity, lung lesion, pneumonia, and fracture.
- Semantic image retrieval: Normalized discounted cumulative gain (NDCG) @5 of 0.76 across 19 queries, including perfect retrieval on 12 of them.
Inputs and outputs
- Input: Serialized `tf.Example` with the bytes of a PNG written in the `image/encoded` feature key.
- Output: Embedding (a vector of floating-point values representing a projection of the original image into a compressed feature space).
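For example, a serialized input can be built from a PNG file as follows (a minimal sketch; the file path is a placeholder):

```python
import tensorflow as tf

# Build a serialized tf.Example with the PNG bytes stored under the
# "image/encoded" feature key, as the model expects.
with open("chest_xray.png", "rb") as f:  # placeholder path
    png_bytes = f.read()

example = tf.train.Example(
    features=tf.train.Features(
        feature={
            "image/encoded": tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[png_bytes])
            )
        }
    )
)
serialized = example.SerializeToString()  # pass this to the model
```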
Dataset details
Training dataset
CXR Foundation was trained using the following de-identified datasets:
- MIMIC-CXR, comprising 243,324 images of 60,523 unique patients (cited below);
- A private US dataset from an academic medical center (AMC) in Illinois, comprising 165,182 images of 12,988 unique patients; and
- A private Indian dataset from five hospitals, comprising 485,082 images of 348,335 unique patients.
Labeling
Supervised learning was used to label the data as abnormal or normal based on radiology reports.
A medically tuned LLM, Med-PaLM 2, was then applied to ensure that the labels were consistent with the reports, and a board-certified thoracic radiologist (CL) adjudicated cases where the LLM results differed from the ground truth in MIMIC-CXR.
Additional information about data and labels used to evaluate CXR Foundation for downstream tasks can be found in the following references:
- Sellergren A, Chen C, et al. Simplified Transfer Learning for Chest Radiography Models Using Less Data. Radiology. 2022. https://pubs.rsna.org/doi/10.1148/radiol.212482 (Tables 1, 2, and 3)
- https://github.com/google-research/google-research/tree/master/supcon
License
The use of CXR Foundation is governed by the Health AI Developer Foundations terms of use.
Data citation
- Johnson, A., Pollard, T., Mark, R., Berkowitz, S., & Horng, S. (2024). MIMIC-CXR Database (version 2.1.0). PhysioNet.
- Johnson, A.E.W., Pollard, T.J., Berkowitz, S.J. et al. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Sci Data 6, 317 (2019).
- Available on PhysioNet: Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101(23), pp. e215–e220.
Implementation information
Details about the model internals.
Software
Training was done using JAX. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models.
Use and limitations
Intended use
CXR Foundation can reduce the training data, compute, and technical expertise necessary to develop AI applications for radiographs. The model has been optimized for chest X-rays, but researchers have reported success using it for other types of X-rays, including X-rays of other body parts and even veterinary X-rays. Some example applications include:
Data-efficient classification:
With a small amount of labeled data, you can train a classifier model on top of CXR Foundation embeddings (ELIXR v2.0). Furthermore, each embedding can be used downstream as an input to a variety of different classifiers with very little additional compute. Some example classification tasks are listed below, followed by a minimal training sketch:
- Clinical findings like fracture or pneumothorax
- Determining X-ray image quality
- Determining the X-ray view or body part
- Determining the presence of devices
- Discovering misplaced tubes
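As a minimal sketch, assuming ELIXR v2.0 embeddings have already been computed for a labeled image set (the file paths and the flattening of the 32x768 output are illustrative assumptions), a linear probe can be trained with scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical arrays standing in for precomputed embeddings and labels.
embeddings = np.load("train_embeddings.npy")  # shape (num_images, 32, 768)
labels = np.load("train_labels.npy")          # e.g., 1 = fracture, 0 = no fracture

# Flatten the 32x768 token embeddings into one feature vector per image
# (one simple choice; pooling strategies are also possible).
X = embeddings.reshape(len(embeddings), -1)

clf = LogisticRegression(max_iter=1000).fit(X, labels)
probs = clf.predict_proba(X)[:, 1]  # per-image finding probability
```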
Zero-shot classification
By using the contrastive mode (ELIXR-contrastive / v2.0 text), users can get a classification score through textual prompts, without any additional training data. Zero-shot classification works by measuring the relative distance of the image embedding from a positive text prompt (e.g., "pleural effusion present") and a negative text prompt (e.g., "normal X-ray"). The use cases are the same as for data-efficient classification but don't require training data. The zero-shot method will outperform data-efficient classification at low levels of training data, while data-efficient classification will tend to exceed zero-shot performance with larger amounts of data. See the ELIXR paper for more details.
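A minimal sketch of this scoring, assuming the image embedding and the two prompt embeddings have already been computed in the shared contrastive space (pooling the 32x128 outputs to single vectors is an assumption):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def zero_shot_score(img_emb, pos_txt_emb, neg_txt_emb):
    """Higher score = image is closer to the positive prompt than the negative.

    Inputs are 1-D vectors in the shared contrastive embedding space.
    """
    return cosine(img_emb, pos_txt_emb) - cosine(img_emb, neg_txt_emb)

# Hypothetical usage, where embed() stands in for the text embedding call:
# score = zero_shot_score(img, embed("pleural effusion present"), embed("normal X-ray"))
```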
Semantic image retrieval
By using the contrastive mode (ELIXR-contrastive / v2.0 text), users can rank a set of X-rays against a search query. Similar to zero-shot classification, language-based image retrieval relies on the distance between the embeddings of the images and the text embedding of the search query.
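Under the same assumptions as the zero-shot sketch (image and query embeddings already computed in the shared space), ranking reduces to sorting images by cosine similarity to the query:

```python
import numpy as np

def rank_images(image_embs, query_emb, top_k=5):
    """Rank images by cosine similarity to a text query embedding.

    image_embs: (num_images, dim) array; query_emb: (dim,) vector.
    Returns the indices of the top_k most similar images, best first.
    """
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb)
    sims = imgs @ q                    # cosine similarity per image
    return np.argsort(-sims)[:top_k]   # descending order of similarity
```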
Benefits
CXR Foundation embeddings enable the efficient development of AI models for chest X-ray image analysis with significantly less data and compute than traditional methods.
By leveraging the large dataset that CXR Foundation was pre-trained on, users not only need less data but can also build more generalizable models than they could by training on more limited datasets.
Limitations
The following are known factors that might limit the generalizability or usefulness of the model output for application in downstream tasks:
- The model was trained using only de-identified data from the US and India and may not generalize well to data from other countries, patient populations, or manufacturers not used in training.
- The model has only been validated for a limited number of the many potential downstream tasks involving chest radiographs.
- Image quality and resolution can affect performance; a minimum resolution of 1024x1024 pixels is recommended.
- The model is only used to generate embeddings of user-provided data. It does not generate any predictions or diagnoses on its own.
- Task-specific validation remains an important aspect of downstream model development by the end user.
- As with any research, developers should ensure that any downstream application is validated to understand performance using data that is appropriately representative of the intended use setting for the specific application (e.g., age, sex, gender, condition, scanner, etc.).