---
license: apache-2.0
tags:
- Pytorch
- Geospatial
- Temporal ViT
- Vit
---

### Model and Inputs
Prithvi is a first-of-its-kind temporal Vision Transformer pre-trained by the IBM and NASA team on Harmonized Landsat Sentinel-2 (HLS) data from the continental United States. Specifically, the model uses a self-supervised encoder built on a ViT architecture and trained with a Masked AutoEncoder (MAE) learning strategy and an L1 loss function. The model applies spatial attention across multiple patches and temporal attention for each patch.

![](GFM.png)
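
The masked-autoencoder objective with an L1 loss can be illustrated with a minimal NumPy sketch. This is not the training code: the function names are hypothetical, and both the 0.75 mask ratio and the choice to score only the masked patches follow the original MAE convention and are assumptions here.

```python
import numpy as np

def random_mask(num_patches, mask_ratio, rng):
    """Return a 0/1 vector where 1 marks a patch hidden from the encoder."""
    num_masked = int(num_patches * mask_ratio)
    mask = np.zeros(num_patches, dtype=np.float64)
    mask[rng.permutation(num_patches)[:num_masked]] = 1.0
    return mask

def masked_l1_loss(pred, target, mask):
    """Mean absolute error, averaged over the masked patches only.

    pred, target: (L, D) flattened patches; mask: (L,) with 1 = masked.
    """
    per_patch = np.abs(pred - target).mean(axis=-1)  # (L,)
    return (per_patch * mask).sum() / mask.sum()

rng = np.random.default_rng(0)
mask = random_mask(num_patches=12, mask_ratio=0.75, rng=rng)  # assumed ratio
target = np.zeros((12, 4))
pred = np.ones((12, 4))
loss = masked_l1_loss(pred, target, mask)
```

Visible patches contribute nothing to the loss in this sketch, so the reconstruction is scored only where the model had to infer the content.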

The model expects remote sensing data in a video format (B, C, T, H, W), that is, batch, channels, time steps, height, and width. The temporal dimension is important here and is absent from most other work on remote sensing modeling. Handling a time series of remote sensing images can benefit a variety of downstream tasks. The model can also handle static images, which are fed into the model with T=1.
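
As a rough illustration of this layout, the sketch below stacks single-date images into a (B, C, T, H, W) array with NumPy. The helper name `stack_time_series` and the 224 x 224 tile size are assumptions made for the example, not part of the released code.

```python
import numpy as np

def stack_time_series(images):
    """Stack T images of shape (C, H, W) into one (1, C, T, H, W) batch."""
    x = np.stack(images, axis=1)  # (C, T, H, W): time becomes the second axis
    return x[np.newaxis, ...]     # prepend the batch axis

# Three acquisition dates, six bands each (tile size is an assumption).
frames = [np.zeros((6, 224, 224), dtype=np.float32) for _ in range(3)]

video = stack_time_series(frames)       # time series, T=3
static = stack_time_series(frames[:1])  # single static image, T=1
```

The same helper covers the static case because a list with one image simply produces a time axis of length 1.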

### Pre-training
The model was pre-trained on NASA's HLS2 L30 product (30 m granularity) from the continental United States. The following bands were used:

1. Blue
2. Green
3. Red
4. Narrow NIR
5. SWIR 1
6. SWIR 2
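
A small sketch of working with this band list follows. It assumes the channel axis of an image is stored in the order listed above; the names `BANDS`, `BAND_INDEX`, and `rgb_composite` are hypothetical helpers for illustration.

```python
import numpy as np

# Band order as listed above (assumed to match the channel axis).
BANDS = ("Blue", "Green", "Red", "Narrow NIR", "SWIR 1", "SWIR 2")
BAND_INDEX = {name: i for i, name in enumerate(BANDS)}

def rgb_composite(image):
    """Turn a (C, H, W) image in the band order above into (H, W, 3) RGB."""
    idx = [BAND_INDEX[b] for b in ("Red", "Green", "Blue")]
    return np.transpose(image[idx], (1, 2, 0))

image = np.arange(6 * 2 * 2).reshape(6, 2, 2)
rgb = rgb_composite(image)
```

Because Blue comes first in the list, a true-color composite has to reorder the first three channels rather than take them as they are.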

### Code
The model follows the [original MAE repo](https://github.com/facebookresearch/mae) with some modifications, including:

1. replace the 2D patch embedding with a 3D patch embedding;
2. replace the 2D positional embedding with a 3D positional embedding;
3. replace 2D patchify and unpatchify with 3D versions;
4. add infrared bands alongside RGB.
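
To make the 3D patchify and unpatchify step concrete, here is a minimal NumPy sketch. It is not the repository's implementation: the function names, the patch size of (1, 16, 16), and the ordering of values inside each flattened patch are assumptions chosen for illustration.

```python
import numpy as np

def patchify_3d(x, pt, ph, pw):
    """Split (B, C, T, H, W) into (B, L, pt*ph*pw*C) flattened 3D patches."""
    B, C, T, H, W = x.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    t, h, w = T // pt, H // ph, W // pw
    x = x.reshape(B, C, t, pt, h, ph, w, pw)
    # Move the patch-grid axes (t, h, w) forward, within-patch axes last.
    x = x.transpose(0, 2, 4, 6, 3, 5, 7, 1)  # (B, t, h, w, pt, ph, pw, C)
    return x.reshape(B, t * h * w, pt * ph * pw * C)

def unpatchify_3d(patches, shape, pt, ph, pw):
    """Invert patchify_3d, restoring an array of the given (B, C, T, H, W)."""
    B, C, T, H, W = shape
    t, h, w = T // pt, H // ph, W // pw
    x = patches.reshape(B, t, h, w, pt, ph, pw, C)
    x = x.transpose(0, 7, 1, 4, 2, 5, 3, 6)  # (B, C, t, pt, h, ph, w, pw)
    return x.reshape(B, C, T, H, W)

x = np.arange(1 * 6 * 3 * 32 * 32, dtype=np.float32).reshape(1, 6, 3, 32, 32)
patches = patchify_3d(x, pt=1, ph=16, pw=16)  # assumed patch size
restored = unpatchify_3d(patches, x.shape, pt=1, ph=16, pw=16)
```

With these sizes each time step yields a 2 x 2 grid of patches, so three time steps give 12 tokens, each holding 16 x 16 pixels for all six bands.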

### Inference and demo
The inference script (`Prithvi_run_inference.py`) runs image reconstruction on a set of three HLS images. The images must be in GeoTIFF format and include the channels described above (Blue, Green, Red, Narrow NIR, SWIR 1, SWIR 2) in reflectance units. A **demo** that uses the same code is available [here](https://huggingface.co/spaces/ibm-nasa-geospatial/Prithvi-100M-demo).

### Finetuning examples
Two examples of finetuning the model for image segmentation (flood detection and burn scar detection) using the mmsegmentation library are available on [GitHub](https://github.com/NASA-IMPACT/hls-foundation-os/tree/main/fine-tuning-examples).