ibm-nasa-geospatial / Prithvi-100M

	---
	license: apache-2.0
	tags:
	- Pytorch
	- Geospatial
	- Temporal ViT
	- Vit
	---

	### Model and Inputs
	Prithvi is a first-of-its-kind temporal Vision Transformer (ViT) pretrained by the IBM and NASA team on Harmonised Landsat Sentinel 2 (HLS) data covering the continental US. Specifically, the model adopts a self-supervised encoder built on a ViT architecture and trained with a Masked AutoEncoder (MAE) learning strategy, using mean squared error (MSE) as the loss function. The model applies spatial attention across multiple patches and temporal attention for each patch.

	![](Prithvi_training.png)
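
The masked-autoencoder objective described above can be illustrated with a small NumPy sketch. This is not the Prithvi training code: the 75% mask ratio and the choice to average the MSE over masked patches only are assumptions borrowed from the original MAE recipe, and `random_mask` and `masked_mse` are hypothetical helper names.

```python
import numpy as np

def random_mask(num_patches, mask_ratio, rng):
    """Return a boolean mask that is True for patches hidden from the encoder."""
    num_masked = int(num_patches * mask_ratio)
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask

def masked_mse(pred, target, mask):
    """Mean squared error averaged over the masked patches only."""
    per_patch = ((pred - target) ** 2).mean(axis=-1)  # shape: (num_patches,)
    return per_patch[mask].mean()

rng = np.random.default_rng(0)
target = rng.normal(size=(16, 8))  # 16 patches with 8 values each (toy sizes)
mask = random_mask(16, 0.75, rng)  # assumed mask ratio, not stated in this card

pred = target.copy()
pred[mask] += 1.0                  # offset every masked patch by 1
loss = masked_mse(pred, target, mask)
print(int(mask.sum()), round(float(loss), 6))
```

Visible patches contribute nothing to this loss, so the model is rewarded only for reconstructing content it never saw.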

	The model expects remote sensing data in a video format (B, C, T, H, W), i.e. batch, channels, time, height and width. Note that the temporal dimension is very important here and is absent from most other work on remote sensing modeling. Handling a time series of remote sensing images can benefit a variety of downstream tasks. The model can also handle static images, which can simply be fed into the model with T=1.
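
As a minimal sketch of the (B, C, T, H, W) layout, the snippet below stacks per-date frames into a single batch with NumPy. The helper name `to_video_batch` and the sizes (6 bands, 224x224 pixels, 3 dates) are illustrative assumptions, not values prescribed by this card.

```python
import numpy as np

def to_video_batch(frames):
    """Stack T frames of shape (C, H, W) into one (1, C, T, H, W) batch."""
    video = np.stack(frames, axis=1)  # (C, T, H, W): time goes right after channels
    return video[np.newaxis, ...]     # add the batch dimension B=1

# Hypothetical input: three acquisition dates of a 6-band, 224x224 tile.
frames = [np.zeros((6, 224, 224), dtype=np.float32) for _ in range(3)]

series = to_video_batch(frames)      # time series, T=3
static = to_video_batch(frames[:1])  # single static image, T=1

print(series.shape)  # (1, 6, 3, 224, 224)
print(static.shape)  # (1, 6, 1, 224, 224)
```

The same array can be wrapped with `torch.from_numpy` before being passed to a PyTorch model.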

	### Pre-training
	The model was pre-trained on NASA's HLS2 L30 product (30 m granularity) over the continental United States. The following bands were used:

	1. Blue
	2. Green
	3. Red
	4. Narrow NIR
	5. SWIR 1
	6. SWIR 2
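
When preparing inputs, it may help to fix the channel order explicitly. The sketch below assumes that the channel axis follows the order in which the bands are listed above; `BAND_ORDER` and `stack_bands` are hypothetical names used only for illustration.

```python
import numpy as np

# Assumed channel order: the order in which the bands are listed above.
BAND_ORDER = ["Blue", "Green", "Red", "Narrow NIR", "SWIR 1", "SWIR 2"]

def stack_bands(bands):
    """Stack per-band (H, W) arrays into a (C, H, W) array in BAND_ORDER."""
    missing = [name for name in BAND_ORDER if name not in bands]
    if missing:
        raise KeyError(f"missing bands: {missing}")
    return np.stack([bands[name] for name in BAND_ORDER], axis=0)

# Hypothetical scene supplied in reverse order; each band is filled with its index.
scene = {
    name: np.full((4, 4), index, dtype=np.float32)
    for index, name in reversed(list(enumerate(BAND_ORDER)))
}
image = stack_bands(scene)
print(image.shape)              # (6, 4, 4)
print(image[:, 0, 0].tolist())  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
```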

	### Code
	The model follows the [original MAE repo](https://github.com/facebookresearch/mae) with some modifications, including:

	1. replace 2D patch embed with 3D patch embed;
	2. replace 2D positional embed with 3D positional embed;
	3. replace 2D patchify and unpatchify with 3D.
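
To make the third modification concrete, here is a minimal NumPy sketch of 3D patchify and unpatchify on a (B, C, T, H, W) array. It is not the repository's implementation: the function names, the patch size of (1, 16, 16) and the ordering of values inside each patch are assumptions chosen for illustration.

```python
import numpy as np

def patchify_3d(x, pt, ph, pw):
    """(B, C, T, H, W) -> (B, N, pt*ph*pw*C), with N = (T/pt)*(H/ph)*(W/pw)."""
    b, c, t, h, w = x.shape
    if t % pt or h % ph or w % pw:
        raise ValueError("T, H and W must be divisible by the patch size")
    gt, gh, gw = t // pt, h // ph, w // pw
    x = x.reshape(b, c, gt, pt, gh, ph, gw, pw)
    x = x.transpose(0, 2, 4, 6, 3, 5, 7, 1)  # (B, gt, gh, gw, pt, ph, pw, C)
    return x.reshape(b, gt * gh * gw, pt * ph * pw * c)

def unpatchify_3d(patches, shape, pt, ph, pw):
    """Inverse of patchify_3d; `shape` is the original (B, C, T, H, W)."""
    b, c, t, h, w = shape
    gt, gh, gw = t // pt, h // ph, w // pw
    x = patches.reshape(b, gt, gh, gw, pt, ph, pw, c)
    x = x.transpose(0, 7, 1, 4, 2, 5, 3, 6)  # (B, C, gt, pt, gh, ph, gw, pw)
    return x.reshape(b, c, t, h, w)

# Toy input: 1 sample, 6 bands, 3 time steps, 32x32 pixels.
x = np.arange(1 * 6 * 3 * 32 * 32, dtype=np.float32).reshape(1, 6, 3, 32, 32)
patches = patchify_3d(x, 1, 16, 16)
restored = unpatchify_3d(patches, x.shape, 1, 16, 16)

print(patches.shape)                 # (1, 12, 1536)
print(np.array_equal(x, restored))   # True
```

Compared with the 2D case, the only change is the extra temporal grid axis, so each token covers a small space-time block rather than a flat image tile.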

	### Finetuning examples
	Two examples of finetuning the model for image segmentation (flood detection and burn scar detection) with the mmsegmentation library are available on [GitHub](https://github.com/NASA-IMPACT/hls-foundation-os/tree/main/fine-tuning-examples).