
metadata

license: apache-2.0
tags:
  - Pytorch
  - Geospatial
  - Temporal ViT

This repository includes the foundation model architecture of Prithvi, a first-of-its-kind temporal Vision Transformer pre-trained by the IBM and NASA team on continental US Harmonised Landsat Sentinel 2 (HLS) data. The architecture is contained in the hls-gfm folder, alongside the relevant information on how to obtain the pre-trained weights through Hugging Face. This repo also contains a practical implementation of fine-tuning Prithvi for flood detection and fire scar detection, as examples of specific downstream applications. See the fine-tuning-example folder for more details.

Model and Input

The model expects remote sensing data in a video format (B, C, T, H, W): batch, channels (spectral bands), time steps, height, and width. The temporal dimension T is central here and is absent from most other work on remote sensing modeling. Handling a time series of remote sensing images can help a variety of downstream tasks. The model can also handle static images, which can simply be fed into the model with T=1.
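As a minimal sketch of the layout described above, the snippet below stacks per-date frames into a single (B, C, T, H, W) sample. The helper name `to_model_input`, the 224x224 tile size, and the three time steps are illustrative assumptions, not values taken from this repository; NumPy is used only to show the shapes.

```python
import numpy as np


def to_model_input(frames):
    """Stack a list of (C, H, W) frames into one (1, C, T, H, W) sample.

    Hypothetical helper for illustration; not part of this repository.
    """
    video = np.stack(frames, axis=1)   # (C, T, H, W): time becomes axis 1
    return video[np.newaxis, ...]      # add a batch axis -> (1, C, T, H, W)


# Assumed sizes: 6 bands, 3 acquisition dates, 224x224 pixels.
frames = [np.zeros((6, 224, 224), dtype=np.float32) for _ in range(3)]

series = to_model_input(frames)        # time series, T=3
static = to_model_input(frames[:1])    # single static image, T=1

print(series.shape)
print(static.shape)
```

A static image goes through the same path as a time series; it is simply a sequence of length one.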

Code

The model follows the original MAE (masked autoencoder) repo, with modifications including:

replace 2D patch embed with 3D patch embed
replace 2D positional embed with 3D positional embed
replace 2D patchify and unpatchify with 3D
etc.
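To make the 2D-to-3D change concrete, here is a rough sketch of 3D patchify and unpatchify on a (B, C, T, H, W) array. It is written in NumPy for illustration and is not the code from this repository; the function names, the patch size `p=16`, the temporal patch length `t=1`, and the ordering of values inside each flattened patch are all assumptions.

```python
import numpy as np


def patchify_3d(x, t=1, p=16):
    """Split (B, C, T, H, W) into flattened patches of shape (B, N, t*p*p*C).

    N = (T // t) * (H // p) * (W // p). Illustrative sketch only.
    """
    B, C, T, H, W = x.shape
    assert T % t == 0 and H % p == 0 and W % p == 0, "sizes must divide evenly"
    gt, gh, gw = T // t, H // p, W // p
    x = x.reshape(B, C, gt, t, gh, p, gw, p)
    # Move the grid axes forward and the within-patch axes (plus C) to the end:
    # (B, gt, gh, gw, t, p, p, C)
    x = x.transpose(0, 2, 4, 6, 3, 5, 7, 1)
    return x.reshape(B, gt * gh * gw, t * p * p * C)


def unpatchify_3d(patches, shape, t=1, p=16):
    """Invert patchify_3d, restoring an array of the given (B, C, T, H, W) shape."""
    B, C, T, H, W = shape
    gt, gh, gw = T // t, H // p, W // p
    x = patches.reshape(B, gt, gh, gw, t, p, p, C)
    # Inverse permutation: back to (B, C, gt, t, gh, p, gw, p)
    x = x.transpose(0, 7, 1, 4, 2, 5, 3, 6)
    return x.reshape(B, C, T, H, W)


x = np.arange(2 * 6 * 3 * 32 * 32, dtype=np.float32).reshape(2, 6, 3, 32, 32)
patches = patchify_3d(x)
restored = unpatchify_3d(patches, x.shape)
```

Compared with the 2D case, the only structural difference is the extra time axis: each token now covers a small block in time as well as in space, so the token count scales with T.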

Pre-training

The model was pre-trained with Harmonised Landsat and Sentinel 2 data from NASA using the following bands:

Blue
Green
Red
Narrow NIR
SWIR 1
SWIR 2
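When preparing inputs, the six bands above form the channel axis C. The sketch below stacks named band arrays into a (C, H, W) frame. It assumes the channel order matches the order of the list above, which this README does not state explicitly; `BAND_ORDER` and `stack_bands` are hypothetical names used for illustration.

```python
import numpy as np

# Assumed channel order: the order in which the bands are listed above.
BAND_ORDER = ["Blue", "Green", "Red", "Narrow NIR", "SWIR 1", "SWIR 2"]


def stack_bands(bands):
    """Build a (C, H, W) frame from a dict mapping band name -> (H, W) array.

    Hypothetical helper; raises KeyError if any expected band is missing.
    """
    missing = [name for name in BAND_ORDER if name not in bands]
    if missing:
        raise KeyError(f"missing bands: {missing}")
    return np.stack([bands[name] for name in BAND_ORDER], axis=0)


# Toy data: each band is a constant 4x4 tile holding its own channel index.
toy = {name: np.full((4, 4), i, dtype=np.float32) for i, name in enumerate(BAND_ORDER)}
frame = stack_bands(toy)
```

Failing loudly on a missing band avoids silently feeding the model a frame with the wrong number of channels.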