Vít Růžička1,2, Anna Vaughan3, Daniele De Martini4, James Fulton5, Valentina Salvatelli6,7, Chris Bridges8, Gonzalo Mateo-Garcia9,7, Valentina Zantedeschi10,7.
Abstract
Applications such as disaster management benefit enormously from the rapid availability of satellite observations. Traditionally, data analysis is performed on the ground after the data has been transferred (downlinked) to a ground station. Constraints on the downlink capabilities, in terms of both data volume and timing, therefore heavily affect the response delay of any downstream application. In this paper, we introduce RaVÆn, a lightweight, unsupervised approach for change detection in satellite data based on Variational Auto-Encoders (VAEs), designed specifically for on-board deployment. RaVÆn pre-processes the sampled data directly on the satellite and flags changed areas to prioritise for downlink, shortening the response time. We verified the efficacy of our system on a dataset (which we release alongside this publication) composed of time series containing a catastrophic event, demonstrating that RaVÆn outperforms pixel-wise baselines. Finally, we tested our approach on resource-limited hardware to assess computational and memory limitations, simulating deployment on real hardware.
Year: 2022 PMID: 36209278 PMCID: PMC9547912 DOI: 10.1038/s41598-022-19437-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1: Locations used for training (a) and validation (b) images.
Figure 2: Example of a validation sample (in this case, a hurricane event) and its corresponding ground-truth mask, which contains labels of change and clouds.
The RaVÆn dataset statistics.
| Event type | Number of locations | Cumulative | Positive rate (%) |
|---|---|---|---|
| Landslides | 5 | 108 | 10.48 |
| Floods | 4 | 1301 | 6.74 |
| Hurricanes | 5 | 1622 | 24.31 |
| Fires | 5 | 3485 | 53.79 |
Each location is captured at four time steps before the event and once after it. Positive rate denotes the percentage of non-cloudy pixels labelled as changed in the last pair of images (the only frames that are annotated).
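The positive rate above can be computed directly from the annotation masks. A minimal sketch, assuming hypothetical boolean change and cloud masks; the names, shapes, and random contents are illustrative, not the dataset's actual format:

```python
import numpy as np

# Hypothetical annotation masks for one image pair.
rng = np.random.default_rng(0)
change_mask = rng.random((512, 512)) < 0.1  # pixels labelled as changed
cloud_mask = rng.random((512, 512)) < 0.2   # pixels labelled as cloudy

def positive_rate(change_mask, cloud_mask):
    """Percentage of non-cloudy pixels that are labelled as changed."""
    valid = ~cloud_mask
    return 100.0 * (change_mask & valid).sum() / valid.sum()

rate = positive_rate(change_mask, cloud_mask)
```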
Figure 3: Diagram of the proposed system. Tiles from the original Sentinel-2 multiband L1C data from the training dataset are fed to a VAE model. Here, a and b correspond to the location of the tile. The VAE is trained in an unsupervised fashion: its encoder learns to compress each tile into a Gaussian embedding representation, and the decoder learns to reconstruct the tile from it. At inference, only the trained encoder is needed: we compress evaluation dataset tiles into their embeddings, which can be compared against a history of k embeddings extracted at the same location to assess whether the tile has changed significantly and should be prioritised for downlink.
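The comparison step in Figure 3 can be sketched as follows. The `change_score` function, the choice of cosine distance, and the min-over-history rule are illustrative assumptions, not necessarily the paper's exact scoring function:

```python
import numpy as np

def cosine_distance(a, b):
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def change_score(current, history):
    # Compare the new embedding against each of the k cached embeddings
    # and keep the smallest distance: a tile is only flagged as changed
    # if it differs from all recent acquisitions, making the score
    # robust to a single noisy past frame.
    return min(cosine_distance(current, h) for h in history)

# Illustrative 128-dimensional embeddings (latent size from Table 2), k = 3.
rng = np.random.default_rng(0)
history = [rng.standard_normal(128) for _ in range(3)]
unchanged = history[-1] + 0.01 * rng.standard_normal(128)
changed = rng.standard_normal(128)
```

A tile whose embedding stays close to any recent embedding at the same location scores near zero; a genuinely changed tile scores higher.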
Differences in the architecture for different proposed model sizes.
| | Total params. (millions) | Encoder params. (millions) | Extra depth | Hidden channels | Latent size |
|---|---|---|---|---|---|
| Small model | 0.443 | 0.285 | 0 | 16, 32, 64 | 128 |
| Medium model | 0.979 | 0.617 | 0 | 32, 64, 128 | 128 |
| Large model | 1.463 | 1.007 | 2 | 32, 64, 128 | 128 |
Note that during inference, we only need the encoder network of the VAE model. We also only need to process the newly acquired image to obtain its latent representation, while the latent vectors of the previous images can be loaded from storage.
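A minimal sketch of this encoder-only inference loop, assuming a stand-in linear projection in place of the trained encoder and a fixed-length cache of past latents (all names, shapes, and the scoring rule are illustrative):

```python
from collections import deque
import numpy as np

rng = np.random.default_rng(0)
# Stand-in linear "encoder": a random projection from a hypothetical
# 12-band 32x32 tile to a 128-D latent (not the paper's architecture).
W = rng.standard_normal((128, 12 * 32 * 32))

def encode(tile):
    return W @ tile.ravel()

def cosine_distance(a, b):
    return 1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

history = deque(maxlen=3)  # cached latents of previous acquisitions

def on_new_acquisition(tile):
    """Encode only the newly acquired tile; past latents come from cache."""
    z = encode(tile)
    score = min((cosine_distance(z, h) for h in history), default=0.0)
    history.append(z)  # stored for the next pass; raw pixels are dropped
    return score

base = rng.random((12, 32, 32))
for _ in range(3):  # warm-up passes over an essentially unchanged scene
    on_new_acquisition(base + 0.01 * rng.standard_normal(base.shape))
score_unchanged = on_new_acquisition(base + 0.01 * rng.standard_normal(base.shape))
score_changed = on_new_acquisition(rng.random((12, 32, 32)))
```

Only the compact latent vectors persist between passes, which is what keeps the memory footprint small on board.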
Figure 4: Comparison of the change detected using the baseline and the large VAE method on an example of a flooding river. The two images immediately before and immediately after the change are shown, along with the human labels of change and the calculated change scores. Both methods used a history of frames.
Figure 5: Additional comparison of the change detected using the baseline and the large VAE method on an example of a fire disaster. Both methods used a history of frames. The cosine baseline prediction copies the details present in the image more closely, making it susceptible to small, noisy variations between the two images.
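A pixel-wise cosine baseline of the kind compared in these figures can be sketched as a per-pixel cosine distance across spectral bands between two co-registered images; the function name and normalisation details below are assumptions, not the paper's exact implementation:

```python
import numpy as np

def pixelwise_cosine_change(before, after, eps=1e-8):
    """Per-pixel cosine distance across spectral bands between two
    co-registered (bands, H, W) images; eps guards against zero vectors."""
    num = (before * after).sum(axis=0)
    den = np.linalg.norm(before, axis=0) * np.linalg.norm(after, axis=0)
    return 1.0 - num / (den + eps)

rng = np.random.default_rng(0)
before = rng.random((12, 64, 64))                   # e.g. 12 spectral bands
after = before.copy()
after[:, 20:40, 20:40] = rng.random((12, 20, 20))   # simulated change
score_map = pixelwise_cosine_change(before, after)
```

Because it operates on individual pixels with no spatial or learned context, such a baseline reacts to any local spectral variation, which is the noise-susceptibility the caption describes.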
AUPRC for baseline and VAE methods with time window (averaged over 5 runs).
| Detection method | Landslides | Floods | Hurricanes | Fires |
|---|---|---|---|---|
| Cosine baseline | 0.629 | 0.378 | 0.513 | 0.818 |
| Euclidean baseline | 0.267 | 0.326 | 0.351 | 0.770 |
| Cosine embedding | 0.599 ± 0.012 | | | |
| Euclidean embedding | 0.266 ± 0.004 | 0.478 ± 0.019 | 0.800 ± 0.011 | |
| KL-Divergence | 0.258 ± 0.022 | 0.247 ± 0.018 | 0.301 ± 0.035 | 0.731 ± 0.016 |
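Since a VAE encoder outputs the mean and (log-)variance of a diagonal Gaussian, the KL-Divergence row can use the closed form for the divergence between two diagonal Gaussians. A sketch of that formula (whether and how the paper symmetrises the divergence is not shown here):

```python
import numpy as np

def kl_diag_gaussians(mu1, logvar1, mu2, logvar2):
    """KL(N1 || N2) for diagonal Gaussians in closed form:
    0.5 * sum(log(v2/v1) + (v1 + (mu1-mu2)^2)/v2 - 1)."""
    v1, v2 = np.exp(logvar1), np.exp(logvar2)
    return 0.5 * np.sum(logvar2 - logvar1 + (v1 + (mu1 - mu2) ** 2) / v2 - 1.0)

mu, logvar = np.zeros(128), np.zeros(128)  # illustrative 128-D embedding
kl_same = kl_diag_gaussians(mu, logvar, mu, logvar)
kl_shifted = kl_diag_gaussians(mu + 1.0, logvar, mu, logvar)
```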
AUPRC for the best performing metrics from Table 3 with and without an extended history k (averaged over 5 runs).
| Detection method | k | Landslides | Floods | Hurricanes | Fires |
|---|---|---|---|---|---|
| Cosine baseline | 1 | 0.629 | 0.378 | 0.513 | 0.818 |
| | 3 | 0.622 | 0.378 | 0.570 | 0.865 |
| Cosine embedding | 1 | 0.599 ± 0.012 | 0.676 ± 0.014 | 0.833 ± 0.008 | |
| | 3 | | | | |
AUPRC and timings for different model sizes (averaged over 5 runs).
| | Landslides | Floods | Hurricanes | Fires | Runtime (seconds) |
|---|---|---|---|---|---|
| Small model | | | | 0.907 ± 0.002 | 2.06 |
| Medium model | | 0.428 ± 0.004 | | | 4.86 |
| Large model | 0.726 ± 0.011 | | | | 13.98 |
The AUPRC results are for the cosine similarity of the embedding with a history of 3 frames. Runtime is measured on board the Xilinx PYNQ board.
AUPRC for models with different latent sizes (averaged over 5 runs).
| Latent size | Landslides | Floods | Hurricanes | Fires | Total params. (millions) |
|---|---|---|---|---|---|
| 128 | | | | | 1.463 |
| 96 | 0.723 ± 0.005 | 0.419 ± 0.010 | 0.687 ± 0.034 | 0.905 ± 0.004 | 1.266 |
| 64 | 0.699 ± 0.015 | 0.392 ± 0.017 | | 0.903 ± 0.006 | 1.069 |
The AUPRC results are for the cosine similarity of the embedding with a history of 3 frames. The models use the parameters of the default large model, varying only the latent size. We also show the total number of parameters of each model.
Figure 6: UMAP visualisation of encoded tiles from the flooded scene presented in Fig. 4. Tiles from the image before the event are marked in green, while tiles from after the event are shown in red. The flooded tiles, marked in blue, can be seen clustered together, in contrast to the rest of the data from this scene.