metadata

license: mit

OmniSat: Self-Supervised Modality Fusion for Earth Observation (ECCV 2024)

Guillaume Astruc, Nicolas Gonthier, Clement Mallet, Loic Landrieu

Official models for OmniSat: Self-Supervised Modality Fusion for Earth Observation

Abstract

We introduce OmniSat, a novel architecture that exploits the spatial alignment between multiple EO modalities to learn expressive multimodal representations without labels. We demonstrate the advantages of combining modalities of different natures across three downstream tasks (forestry, land cover classification, and crop mapping), and propose two augmented datasets with new modalities: PASTIS-HD and TreeSatAI-TS. For more details and results, please check out our GitHub repository and project page.

Datasets

Dataset name	Modalities	Labels	Link
PASTIS-HD	SPOT 6-7 (1m) + S1/S2 (30-140 / year)	Crop mapping (0.2m)	huggingface or zenodo
TreeSatAI-TS	Aerial (0.2m) + S1/S2 (10-70 / year)	Forestry (60m)	huggingface
FLAIR	Aerial (0.2m) + S2 (20-114 / year)	Land cover (0.2m)	huggingface

Inference 🔥

To load our pretrained models, run:

from models.huggingface import AnySat

# Load the pretrained weights
model = AnySat(size="base", pretrained=True)  # "small" and "tiny" are also available

To get features from a single observation or a batch of observations, pass the model a dictionary whose keys are taken from the following list:

"aerial": Single date tensor (Bx4xHxW) with 4 channels (RGB NiR), 0.2m resolution
"aerial-flair": Single date tensor (Bx5xHxW) with 5 channels (RGB NiR Elevation), 0.2m resolution
"spot": Single date tensor (Bx3xHxW) with 3 channels (RGB), 1m resolution
"naip": Single date tensor (Bx4xHxW) with 4 channels (RGB NiR), 1.25m resolution
"s2": Time series tensor (BxTx10xHxW) with 10 channels (B0,B1???), 10m resolution
"s1-asc": Time series tensor (BxTx2xHxW) with 2 channels (VV VH), 10m resolution
"s1": Time series tensor (BxTx3xHxW) with 3 channels, 10m resolution
"alos": Time series tensor (BxTx3xHxW) with 3 channels, 30m resolution
"l7": Time series tensor (BxTx6xHxW) with 6 channels, 30m resolution
"l8": Time series tensor (BxTx11xHxW) with 11 channels, rescaled to 10m resolution
"modis": Time series tensor (BxTx7xHxW) with 7 channels, 250m resolution

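The key list above can be captured as a small lookup table to sanity-check an input dictionary before it reaches the model. The sketch below is only an illustration: `MODALITIES` and `check_modalities` are hypothetical names, not part of the AnySat code, and the channel counts are copied from the tensor shapes listed above. It only reads each value's `.shape`, so it should work with any tensor-like object.

```python
from types import SimpleNamespace

# Channel counts taken from the tensor shapes in the list above.
# True marks a time series (BxTxCxHxW); False marks a single date (BxCxHxW).
MODALITIES = {
    "aerial": (4, False),
    "aerial-flair": (5, False),
    "spot": (3, False),
    "naip": (4, False),
    "s2": (10, True),
    "s1-asc": (2, True),
    "s1": (3, True),
    "alos": (3, True),
    "l7": (6, True),
    "l8": (11, True),
    "modis": (7, True),
}


def check_modalities(data):
    """Return a list of shape problems (empty when every entry looks valid).

    Hypothetical helper, not part of the AnySat code base. Values only need
    a `.shape` attribute, so tensors and arrays both work.
    """
    problems = []
    for key, value in data.items():
        if key.endswith("_dates"):
            continue  # date tensors are not image tensors; skip them here
        if key not in MODALITIES:
            problems.append(f"{key}: unknown modality key")
            continue
        channels, is_series = MODALITIES[key]
        shape = tuple(value.shape)
        expected_rank = 5 if is_series else 4
        channel_axis = 2 if is_series else 1
        if len(shape) != expected_rank:
            problems.append(f"{key}: expected {expected_rank} dimensions, got {len(shape)}")
        elif shape[channel_axis] != channels:
            problems.append(f"{key}: expected {channels} channels, got {shape[channel_axis]}")
    return problems


# Stand-ins that only carry a shape; real inputs would be tensors.
example = {
    "spot": SimpleNamespace(shape=(2, 3, 60, 60)),
    "s2": SimpleNamespace(shape=(2, 12, 10, 6, 6)),
}
print(check_modalities(example))  # prints []
```

An empty list means every key is known and every tensor has the expected rank and channel count.
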
Each time series key requires a companion "{key}_dates" tensor (for example "s2_dates") of size BxT whose values are integers representing the day of the year. Then, you can run:

features = model(data)

You can then use these features for the desired downstream task.
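The "{key}_dates" entries hold day-of-year integers, which often have to be derived from acquisition dates. The following is a minimal sketch using only the standard library; `day_of_year` and `build_dates` are hypothetical helper names, and it assumes 1-based counting (January 1 is day 1), which the text above does not specify.

```python
from datetime import date


def day_of_year(iso_date):
    """Convert an ISO date string such as "2021-03-01" to its day of the year.

    Assumption: counting is 1-based, so January 1 maps to 1.
    """
    return date.fromisoformat(iso_date).timetuple().tm_yday


def build_dates(batch):
    """Turn B lists of T ISO date strings into a BxT nested list of integers."""
    lengths = {len(series) for series in batch}
    if len(lengths) > 1:
        raise ValueError(f"all series must share the same length T, got {sorted(lengths)}")
    return [[day_of_year(d) for d in series] for series in batch]


# Two observations (B=2), each with three acquisition dates (T=3).
acquisitions = [
    ["2021-01-01", "2021-03-01", "2021-12-31"],
    ["2020-01-15", "2020-03-01", "2020-12-31"],
]
s2_dates = build_dates(acquisitions)
print(s2_dates)  # [[1, 60, 365], [15, 61, 366]]
```

The nested list can then be converted to a BxT integer tensor and stored under the matching key, for example "s2_dates".
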

To reproduce our results, add new modalities, or run further experiments, see the full code on GitHub.
