CXR Foundation model card
Model documentation: CXR Foundation
Resources:
- Model on Google Cloud Model Garden: CXR Foundation
- Model on Hugging Face: google/cxr-foundation
- GitHub repository (supporting code, Colab notebooks, discussions, and issues): cxr-foundation
- Quick start notebook: notebooks/quick_start
- Support: See Contact.
Terms of use: Health AI Developer Foundations terms of use
Author: Google
Model information
This section describes the CXR Foundation model and how to use it.
Description
CXR Foundation is a machine learning model designed to accelerate AI development for chest X-ray image analysis. It is pre-trained on a large volume of chest X-rays to produce embeddings that capture dense features relevant to analyzing these images. As a result, these embeddings enable the efficient training of AI models with significantly less data and compute than traditional methods. CXR Foundation offers two types of embeddings:
- ELIXR v2.0: Produces 32x768 dimensional vectors, capturing detailed image features relevant to X-ray analysis.
- ELIXR-contrastive / v2.0 text: Generates 32x128 dimensional vectors and allows for projecting chest X-ray images and textual prompts into a shared embedding space. This enables powerful applications like semantic image retrieval and zero-shot classification.
You can read more about the research behind CXR Foundation in our manuscript: ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders.
How to use
To get started quickly with Hugging Face, refer to the Quick start notebook listed in the next section.
If you want to use the model at scale, we recommend that you create a production version using Model Garden.
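For local experimentation, the weights can be fetched from the Hugging Face Hub. The snippet below is a minimal sketch only: it assumes the repository ships TensorFlow SavedModel weights, and the loading step shown is an assumption; the quick start notebook is the authoritative reference.

```python
# Minimal sketch of fetching the model from Hugging Face (assumption: the
# repository ships TensorFlow SavedModel weights; see the quick start
# notebook for the authoritative loading code).
from huggingface_hub import snapshot_download
import tensorflow as tf

# Requires accepting the terms of use and logging in (`huggingface-cli login`).
local_dir = snapshot_download(repo_id="google/cxr-foundation")

# Hypothetical loading step: the actual SavedModel location within the
# repository may differ.
model = tf.saved_model.load(local_dir)
```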
Examples
See the following Colab notebooks for examples of how to use CXR Foundation:
- To give the model a quick try, running it locally with weights from Hugging Face, see the Quick start notebook in Colab.
- For an example of how to use the model to train a linear classifier, see the Linear classifier notebook in Colab.
- For an example of how to retrieve images from a database using text-image similarity, see the Text retrieval notebook in Colab.
- For an example of how to use the text embeddings to perform zero-shot inference, see the Zero-shot inference notebook in Colab.
Model architecture overview
The model uses the EfficientNet-L2 architecture for images and a BERT architecture for text. It was trained on 821,544 CXRs from India and the US. Training used abnormal vs. normal labels (i.e., whether the image contained any kind of abnormality) with a supervised contrastive loss, as well as accompanying radiology reports with the CLIP and BLIP-2 losses. The abnormal vs. normal labels were obtained from more granular labels (e.g., pneumothorax, fracture) as well as regular expressions on radiology reports.
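As a rough illustration of the CLIP-style objective (a sketch only, not the actual training code; ELIXR additionally combines it with supervised contrastive and BLIP-2 losses), a symmetric contrastive loss over a batch of matched image and text embeddings looks like this:

```python
import numpy as np

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive (CLIP-style) loss over a batch.

    img_emb, txt_emb: (batch, dim) arrays; row i of each is a matched pair.
    Illustrative sketch only, not the ELIXR training implementation.
    """
    # L2-normalize so dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (batch, batch) similarity matrix
    n = logits.shape[0]

    def xent(l):
        # Softmax cross-entropy with the matched pairs on the diagonal.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing this loss pulls matched image-report pairs together in the shared embedding space while pushing mismatched pairs apart.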
You can read more about the research behind CXR Foundation in our recent publication: Simplified Transfer Learning for Chest Radiography Models Using Less Data.
Technical specifications
- Model type: Convolutional neural network that produces embeddings
- Key publications:
  - ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders
  - Simplified Transfer Learning for Chest Radiography Models Using Less Data
- Model created: August 2, 2024
- Model version: 2.0.0
Performance and validation
CXR Foundation was evaluated across a range of tasks, including data-efficient classification, zero-shot classification, semantic image retrieval, visual question answering, and report quality assurance.
Key performance metrics
- Data-efficient classification: Mean AUC of 0.898 across atelectasis, cardiomegaly, consolidation, pleural effusion, and pulmonary edema on the CheXpert test set.
- Zero-shot classification: Mean AUC of 0.846 across 13 findings on the CheXpert test set: atelectasis, cardiomegaly, consolidation, pleural effusion, pulmonary edema, enlarged cardiomediastinum, pleural other, pneumothorax, support devices, airspace opacity, lung lesion, pneumonia, and fracture.
- Semantic image retrieval: Normalized discounted cumulative gain (NDCG) @5 of 0.76 across 19 queries, including perfect retrieval on 12 of them.
Inputs and outputs
- Input: Serialized `tf.Example` with the bytes of a PNG written in the `image/encoded` feature key.
- Output: Embedding (a vector of floating-point values representing a projection of the original image into a compressed feature space).
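For example, a serialized input can be built from a PNG file as follows (a minimal sketch; the file path is a placeholder):

```python
import tensorflow as tf

# Build a serialized tf.Example with the PNG bytes stored under the
# "image/encoded" feature key, as the model expects.
with open("chest_xray.png", "rb") as f:  # placeholder path
    png_bytes = f.read()

example = tf.train.Example(
    features=tf.train.Features(
        feature={
            "image/encoded": tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[png_bytes])
            )
        }
    )
)
serialized = example.SerializeToString()  # pass this to the model
```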
Dataset details
Training dataset
CXR Foundation was trained using the following de-identified datasets:
- MIMIC-CXR, comprising 243,324 images of 60,523 unique patients (cited below);
- A private US dataset from an academic medical center (AMC) in Illinois, comprising 165,182 images of 12,988 unique patients; and
- A private Indian dataset from five hospitals, comprising 485,082 images of 348,335 unique patients.
Labeling
Supervised learning was used to label the data as abnormal or normal based on radiology reports.
A medically tuned LLM, Med-PaLM 2, was then applied to ensure that the labels were consistent with the reports, and a board-certified thoracic radiologist (CL) adjudicated cases where the LLM results differed from the ground truth in MIMIC-CXR.
Additional information about data and labels used to evaluate CXR Foundation for downstream tasks can be found in the following references:
- Sellergren A, Chen C, et al. Simplified Transfer Learning for Chest Radiography Models Using Less Data. Radiology. 2022. https://pubs.rsna.org/doi/10.1148/radiol.212482 (Tables 1, 2, and 3)
- https://github.com/google-research/google-research/tree/master/supcon
License
The use of CXR Foundation is governed by the Health AI Developer Foundations terms of use.
Data citation
- Johnson, A., Pollard, T., Mark, R., Berkowitz, S., & Horng, S. (2024). MIMIC-CXR Database (version 2.1.0). PhysioNet.
- Johnson, A.E.W., Pollard, T.J., Berkowitz, S.J. et al. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Sci Data 6, 317 (2019).
- Available on PhysioNet: Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101(23), pp. e215–e220.
Implementation information
Details about the model internals.
Software
Training was done using JAX. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models.
Use and limitations
Intended use
CXR Foundation can reduce the training data, compute, and technical expertise necessary to develop AI applications for radiographs. The model has been optimized for chest X-rays, but researchers have reported success using it for other types of X-rays, including X-rays of other body parts and even veterinary X-rays. Some example applications include:
Data-efficient classification:
With a small amount of labeled data, you can train a classifier model on top of CXR Foundation embeddings (ELIXR v2.0). Furthermore, each embedding can be used downstream as an input to a variety of different classifiers with very little additional compute. Some example classification tasks are listed below, followed by a minimal training sketch:
- Clinical findings like fracture or pneumothorax
- Determining X-ray image quality
- Determining the X-ray view or body part
- Determining the presence of devices
- Discovering misplaced tubes
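As a minimal sketch, assuming ELIXR v2.0 embeddings have already been computed for a labeled image set (the file paths and the flattening of the 32x768 output are illustrative assumptions), a linear probe can be trained with scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical arrays standing in for precomputed embeddings and labels.
embeddings = np.load("train_embeddings.npy")  # shape (num_images, 32, 768)
labels = np.load("train_labels.npy")          # e.g., 1 = fracture, 0 = no fracture

# Flatten the 32x768 token embeddings into one feature vector per image
# (one simple choice; pooling strategies are also possible).
X = embeddings.reshape(len(embeddings), -1)

clf = LogisticRegression(max_iter=1000).fit(X, labels)
probs = clf.predict_proba(X)[:, 1]  # per-image finding probability
```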
Zero-shot classification
By using the contrastive mode (ELIXR-contrastive / v2.0 text), users can get a classification score through textual prompts, without any additional training data. Zero-shot classification works by measuring the relative distance of the image embedding from a positive text prompt (e.g., "pleural effusion present") and a negative text prompt (e.g., "normal X-ray"). The use cases are the same as for data-efficient classification but don't require training data. The zero-shot method will outperform data-efficient classification at low levels of training data, while data-efficient classification will tend to exceed zero-shot performance with larger amounts of data. See the ELIXR paper for more details.
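A minimal sketch of this scoring, assuming the image embedding and the two prompt embeddings have already been computed in the shared contrastive space (pooling the 32x128 outputs to single vectors is an assumption):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def zero_shot_score(img_emb, pos_txt_emb, neg_txt_emb):
    """Higher score = image is closer to the positive prompt than the negative.

    Inputs are 1-D vectors in the shared contrastive embedding space.
    """
    return cosine(img_emb, pos_txt_emb) - cosine(img_emb, neg_txt_emb)

# Hypothetical usage, where embed() stands in for the text embedding call:
# score = zero_shot_score(img, embed("pleural effusion present"), embed("normal X-ray"))
```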
Semantic image retrieval
By using the contrastive mode (ELIXR-contrastive / v2.0 text), users can rank a set of X-rays against a search query. Similar to zero-shot classification, language-based image retrieval relies on the distance between the embeddings of the images and the text embedding of the search query.
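Under the same assumptions as the zero-shot sketch (image and query embeddings already computed in the shared space), ranking reduces to sorting images by cosine similarity to the query:

```python
import numpy as np

def rank_images(image_embs, query_emb, top_k=5):
    """Rank images by cosine similarity to a text query embedding.

    image_embs: (num_images, dim) array; query_emb: (dim,) vector.
    Returns the indices of the top_k most similar images, best first.
    """
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb)
    sims = imgs @ q                    # cosine similarity per image
    return np.argsort(-sims)[:top_k]   # descending order of similarity
```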
Benefits
CXR Foundation embeddings enable the efficient development of AI models for chest X-ray image analysis with significantly less data and compute than traditional methods.
By leveraging the large dataset that CXR Foundation was pre-trained on, users not only need less data but can also build more generalizable models than they could by training on more limited datasets.
Limitations
The following are known factors that might limit the generalizability or usefulness of the model output for application in downstream tasks:
- The model was trained using only de-identified data from the US and India and may not generalize well to data from other countries, patient populations, or manufacturers not used in training.
- The model has only been validated for a limited number of the many potential downstream tasks involving chest radiographs.
- Image quality and resolution can affect performance; a minimum resolution of 1024x1024 pixels is recommended.
- The model is only used to generate embeddings of user-provided data. It does not generate any predictions or diagnoses on its own.
- Task-specific validation remains an important aspect of downstream model development by the end user.
- As with any research, developers should ensure that any downstream application is validated to understand performance using data that is appropriately representative of the intended use setting for the specific application (e.g., age, sex, gender, condition, scanner, etc.).