License: MIT
OmniSat: Self-Supervised Modality Fusion for Earth Observation (ECCV 2024)
Guillaume Astruc, Nicolas Gonthier, Clement Mallet, Loic Landrieu
Official models for OmniSat: Self-Supervised Modality Fusion for Earth Observation
Abstract
We introduce OmniSat, a novel architecture that exploits the spatial alignment between multiple EO modalities to learn expressive multimodal representations without labels. We demonstrate the advantages of combining modalities of different natures across three downstream tasks (forestry, land cover classification, and crop mapping), and propose two augmented datasets with new modalities: PASTIS-HD and TreeSatAI-TS. For more details and results, please check out our GitHub repository and project page.
Datasets
Dataset name | Modalities | Labels | Link |
---|---|---|---|
PASTIS-HD | SPOT 6-7 (1m) + S1/S2 (30-140 / year) | Crop mapping (0.2m) | Hugging Face or Zenodo |
TreeSatAI-TS | Aerial (0.2m) + S1/S2 (10-70 / year) | Forestry (60m) | Hugging Face |
FLAIR | Aerial (0.2m) + S2 (20-114 / year) | Land cover (0.2m) | Hugging Face |
Inference
To load our pretrained models, run:

```python
from models.huggingface import AnySat

# Load pretrained weights ("small" and "tiny" sizes are also available)
model = AnySat(size="base", pretrained=True)
```
To get features from an observation or a batch of observations, provide the model with a dictionary whose keys are taken from the following list:
- "aerial": Single-date tensor (Bx4xHxW) with 4 channels (RGB, NIR), 0.2m resolution
- "aerial-flair": Single-date tensor (Bx5xHxW) with 5 channels (RGB, NIR, elevation), 0.2m resolution
- "spot": Single-date tensor (Bx3xHxW) with 3 channels (RGB), 1m resolution
- "naip": Single-date tensor (Bx4xHxW) with 4 channels (RGB, NIR), 1.25m resolution
- "s2": Time series tensor (BxTx10xHxW) with 10 spectral bands, 10m resolution
- "s1-asc": Time series tensor (BxTx2xHxW) with 2 channels (VV, VH), 10m resolution
- "s1": Time series tensor (BxTx3xHxW) with 3 channels, 10m resolution
- "alos": Time series tensor (BxTx3xHxW) with 3 channels, 30m resolution
- "l7": Time series tensor (BxTx6xHxW) with 6 channels, 30m resolution
- "l8": Time series tensor (BxTx11xHxW) with 11 channels, rescaled to 10m resolution
- "modis": Time series tensor (BxTx7xHxW) with 7 channels, 250m resolution
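Since the model expects a specific layout per key, it can help to check the input dictionary before calling it. The sketch below is a hypothetical helper, not part of the OmniSat/AnySat code: `MODALITY_SPECS`, `expected_hw`, `day_of_year` and `validate_inputs` are names introduced here for illustration, and the spec table is hand-copied from the list above. It also checks the `{key}_dates` tensor that every time series key needs (described in the next paragraph). `expected_hw` shows why H and W differ between modalities: a square tile of fixed ground extent spans extent / resolution pixels per side, so a 60 m tile is 300 pixels wide in "aerial" but only 6 in "s2".

```python
from datetime import date
from typing import Any, List, Mapping, Sequence

# Hypothetical spec table, hand-copied from the list above:
# key -> (is_time_series, n_channels, resolution_m)
MODALITY_SPECS = {
    "aerial": (False, 4, 0.2),
    "aerial-flair": (False, 5, 0.2),
    "spot": (False, 3, 1.0),
    "naip": (False, 4, 1.25),  # 4 channels, following the Bx4xHxW shape
    "s2": (True, 10, 10.0),
    "s1-asc": (True, 2, 10.0),
    "s1": (True, 3, 10.0),
    "alos": (True, 3, 30.0),
    "l7": (True, 6, 30.0),
    "l8": (True, 11, 10.0),  # listed as rescaled to 10 m
    "modis": (True, 7, 250.0),
}


def expected_hw(key: str, extent_m: float) -> int:
    """Pixels per side for a square tile spanning `extent_m` metres on the ground."""
    resolution_m = MODALITY_SPECS[key][2]
    return int(round(extent_m / resolution_m))


def day_of_year(dates: Sequence[Sequence[date]]) -> List[List[int]]:
    """Turn a B x T nested list of acquisition dates into day-of-year ints (1-366)."""
    return [[d.timetuple().tm_yday for d in row] for row in dates]


def validate_inputs(data: Mapping[str, Any]) -> None:
    """Raise ValueError if `data` does not follow the documented layout.

    Only `.shape` is inspected, so torch tensors and NumPy arrays both work.
    """
    batch_size = None
    for key, value in data.items():
        if key.endswith("_dates"):
            continue  # dates are checked together with their modality below
        if key not in MODALITY_SPECS:
            raise ValueError(f"unknown modality key: {key!r}")
        is_time_series, n_channels, _ = MODALITY_SPECS[key]
        shape = tuple(value.shape)
        rank = 5 if is_time_series else 4  # BxTxCxHxW vs BxCxHxW
        if len(shape) != rank:
            raise ValueError(f"{key}: expected a {rank}-D tensor, got shape {shape}")
        channel_axis = 2 if is_time_series else 1
        if shape[channel_axis] != n_channels:
            raise ValueError(
                f"{key}: expected {n_channels} channels, got {shape[channel_axis]}"
            )
        if batch_size is None:
            batch_size = shape[0]
        elif shape[0] != batch_size:
            raise ValueError(f"{key}: batch size {shape[0]} differs from {batch_size}")
        if is_time_series:
            dates = data.get(f"{key}_dates")
            if dates is None:
                raise ValueError(f"{key}: missing the '{key}_dates' tensor")
            if tuple(dates.shape) != shape[:2]:
                raise ValueError(
                    f"{key}_dates: expected shape {shape[:2]}, got {tuple(dates.shape)}"
                )
```

In practice, the nested list returned by `day_of_year` would be wrapped with `torch.tensor(...)` to obtain the BxT integer tensor stored under a key such as "s2_dates".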
Each time series key also requires a matching "{key}_dates" tensor (for example "s2_dates") of shape BxT, whose integer values give the day of the year of each acquisition. Then, you can run:
```python
features = model(data)  # data: the dictionary described above
```
You can then use these features for the desired downstream task.
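One simple way to use the extracted features for classification is a nearest-centroid probe. This is a minimal sketch under assumptions, not the evaluation protocol used in the paper: it expects features already pooled to one D-dimensional vector per sample as a NumPy array (a PyTorch tensor can be converted with `features.detach().cpu().numpy()`), and `NearestCentroidProbe` is a hypothetical helper, not part of the released code. The model's exact output shape is not documented here; if it returns per-patch features, average them per sample first.

```python
import numpy as np


class NearestCentroidProbe:
    """Hypothetical probe: assign each sample to the class with the closest mean feature."""

    def fit(self, features: np.ndarray, labels: np.ndarray) -> "NearestCentroidProbe":
        # features: (N, D) pooled embeddings; labels: (N,) integer class ids
        self.classes_ = np.unique(labels)
        self.centroids_ = np.stack(
            [features[labels == c].mean(axis=0) for c in self.classes_]
        )
        return self

    def predict(self, features: np.ndarray) -> np.ndarray:
        # Squared Euclidean distance from every sample to every centroid: (N, C)
        diffs = features[:, None, :] - self.centroids_[None, :, :]
        dists = (diffs ** 2).sum(axis=-1)
        return self.classes_[dists.argmin(axis=1)]
```

A linear or MLP head trained with cross-entropy is the more common choice; nearest-centroid is shown only because it needs no training loop and makes the "frozen features + light head" pattern easy to see.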
To reproduce the results, add new modalities, or run further experiments, see the full code on GitHub.
Citing