Martijn Witjes1, Leandro Parente1, Chris J van Diemen1, Tomislav Hengl1,2, Martin Landa3, Lukáš Brodský3, Lena Halounova3, Josip Križan4, Luka Antonić4, Codrina Maria Ilie5,6, Vasile Craciunescu5,7, Milan Kilibarda8, Ognjen Antonijević8, Luka Glušica9.
Abstract
A spatiotemporal machine learning framework for automated prediction and analysis of long-term Land Use/Land Cover dynamics is presented. The framework includes: (1) harmonization and preprocessing of spatial and spatiotemporal input datasets (GLAD Landsat, NPP/VIIRS) including five million harmonized LUCAS and CORINE Land Cover-derived training samples, (2) model building based on spatial k-fold cross-validation and hyper-parameter optimization, (3) prediction of the most probable class, class probabilities and model variance of predicted probabilities per pixel, (4) LULC change analysis on time-series of produced maps. The spatiotemporal ensemble model consists of a random forest, gradient boosted tree classifier, and an artificial neural network, with a logistic regressor as meta-learner. The results show that the most important variables for mapping LULC in Europe are: seasonal aggregates of Landsat green and near-infrared bands, multiple Landsat-derived spectral indices, long-term surface water probability, and elevation. Spatial cross-validation of the model indicates consistent performance across multiple years with overall accuracy (a weighted F1-score) of 0.49, 0.63, and 0.83 when predicting 43 (level-3), 14 (level-2), and five classes (level-1). Additional experiments show that spatiotemporal models generalize better to unknown years, outperforming single-year models on known-year classification by 2.7% and unknown-year classification by 3.5%. Results of the accuracy assessment using 48,365 independent test samples show an 87% match with the validation points. Results of time-series analysis (time-series of LULC probabilities and NDVI images) suggest forest loss in large parts of Sweden, the Alps, and Scotland. Positive and negative trends in NDVI in general match the land degradation and land restoration classes, with "urbanization" showing the most negative NDVI trend.
An advantage of using spatiotemporal ML is that the fitted model can be used to predict LULC in years that were not included in its training dataset, allowing generalization to past and future periods, e.g., to predict LULC for years prior to 2000 and beyond 2020. The generated LULC time-series data stack (ODSE-LULC), including the training points, is publicly available via the ODSE Viewer. Functions used to prepare data and run modeling are available via the eumap library for Python. ©2022 Witjes et al.
Keywords: Big data; Ensemble; Environmental monitoring; Land use/land cover; Landsat; Machine learning; Probability; Spatial analysis; Spatiotemporal; Uncertainty
Year: 2022 PMID: 35891647 PMCID: PMC9308969 DOI: 10.7717/peerj.13573
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 3.061
Inventory and comparison of existing land cover data products at finer spatial resolutions (≤300 m) available for continental Europe.
| Product/reference | Time span | Spatial resolution | Mapping accuracy | Number of classes | Uncertainty/probability |
|---|---|---|---|---|---|
| CLC | 1990, 2000, 2006, 2012, 2018 | 100 m (25 ha) | ≤85% | 44 | N/N |
| ESA CCI-LC | 1998–2002, 2003–2007, 2008–2012 | 300 m | 73% | 22 | N/N |
| | 2006 | 100 m | 70% | 42 | N/N |
| S2GLC | 2017 | 10 m | 89% | 15 | N/N |
| | 2014–2016 | 30 m | 75% | 12 | N/N |
| GLCFCS30 | 2015, 2020 | 30 m | 83%/71%/69% | 9/16/24 | N/N |
| | 2015, 2016, 2017, 2018 | 100 m | 80% | 10 | N/ |
| ESA WorldCover | 2020 | 10 m | ≤75% | ≤10 | N/N |
| ELC10 | 2020 | 10 m | 90% | 8 | N/N |
| ODSE-LULC (our product) | 2000, 2001, …, 2019 | 30 m | | 43 | |
Figure 1. General workflow used to prepare point data and variable layers, fit models and generate annual land cover products (2000–2019).
Components of the workflows are described in detail via the eumap library (https://eumap.readthedocs.io/), with technical documentation available via https://gitlab.com/geoharmonizer_inea.
Figure 2. Structure of the ensemble.
Time-series data and static data are used to train three component models. Each component model predicts 43 probabilities (one per class). We calculate class-wise uncertainty as a separate output by taking the standard deviation of the three component probabilities per class. The 129 probabilities are used to train the logistic regression meta-learner, which predicts 43 probabilities that are used to map LULC.
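The data flow described in this caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the eumap implementation: the three probability vectors are hypothetical, the class count is shortened from 43 to 3 for readability, and the population standard deviation is assumed because the caption does not say which variant is used.

```python
from statistics import pstdev

def stack_features(component_probs):
    """Concatenate per-model class probabilities into one meta-learner input row."""
    return [p for probs in component_probs for p in probs]

def classwise_uncertainty(component_probs):
    """Standard deviation of the component probabilities, computed per class."""
    return [pstdev(class_probs) for class_probs in zip(*component_probs)]

# Hypothetical outputs of the three component models for one pixel (3 classes for brevity).
rf = [0.7, 0.2, 0.1]
gbt = [0.6, 0.3, 0.1]
ann = [0.8, 0.1, 0.1]

meta_input = stack_features([rf, gbt, ann])          # 3 models x 3 classes = 9 values
uncertainty = classwise_uncertainty([rf, gbt, ann])  # one value per class
```

With 43 classes the stacked row has 3 × 43 = 129 values, which matches the 129 probabilities passed to the logistic regression meta-learner.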
Minimum and maximum value of each hyperparameter that was optimized for the random forest and gradient boosted tree learners.
| Model | Hyperparameter | Lower value | Upper value |
|---|---|---|---|
| Random Forest | Number of estimators | 50 | 100 |
| Maximum tree depth | 5 | 50 | |
| Maximum number of features | 0 | 0.9 | |
| Minimum samples per leaf | 5 | 30 | |
| Gradient boosted trees | Eta | 0.001 | 0.9 |
| Gamma | 0 | 12 | |
| Alpha | 0 | 1 | |
| Maximum tree depth | 2 | 10 | |
| Number of estimators | 10 | 50 |
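The bounds in this table can be written down as a search space. The sketch below is only an illustration: the key names are descriptive labels rather than verified library arguments, and uniform random sampling is an assumed strategy, since the table gives the ranges but not the optimization method.

```python
import random

# Bounds copied from the table; key names are illustrative labels, not library arguments.
SEARCH_SPACE = {
    "random_forest": {
        "n_estimators": (50, 100, int),
        "max_depth": (5, 50, int),
        "max_features": (0.0, 0.9, float),
        "min_samples_leaf": (5, 30, int),
    },
    "gradient_boosted_trees": {
        "eta": (0.001, 0.9, float),
        "gamma": (0.0, 12.0, float),
        "alpha": (0.0, 1.0, float),
        "max_depth": (2, 10, int),
        "n_estimators": (10, 50, int),
    },
}

def sample_config(model, rng):
    """Draw one candidate configuration uniformly within the tabulated bounds."""
    config = {}
    for name, (low, high, kind) in SEARCH_SPACE[model].items():
        # Integer parameters are drawn inclusively; real-valued ones uniformly.
        config[name] = rng.randint(low, high) if kind is int else rng.uniform(low, high)
    return config

candidate = sample_config("random_forest", random.Random(0))
```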
Figure 3. Map of the study area, overlaid with a grid of 30 km tiles that was used for spatial 5-fold cross-validation.
Grid color indicates the number of training points aggregated per tile.
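Tile-based blocking of this kind keeps all points from one 30 km tile in the same fold, so nearby points never end up on both sides of a train/test split. The following sketch assumes projected coordinates in metres; the rule that assigns tiles to folds is an arbitrary placeholder, because the caption does not state how tiles were distributed over the five folds.

```python
TILE_SIZE_M = 30_000  # 30 km tiles, as in Figure 3
N_FOLDS = 5

def tile_id(x, y, tile_size=TILE_SIZE_M):
    """Return the (column, row) index of the tile containing a projected point."""
    return (int(x // tile_size), int(y // tile_size))

def fold_of(tile, n_folds=N_FOLDS):
    """Assign a tile to a fold; this rule is a deterministic placeholder."""
    col, row = tile
    return (col + 2 * row) % n_folds

# Hypothetical point coordinates (metres); the first two fall in the same tile.
points = [(4_000_100.0, 2_500_050.0), (4_010_000.0, 2_505_000.0), (4_100_000.0, 2_700_000.0)]
folds = [fold_of(tile_id(x, y)) for x, y in points]
```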
The ODSE-LULC land cover legend, based on CLC (Bossard et al., 2000).
Note: To make table formatting easier, we refer to class 243 as ‘Agriculture with significant natural vegetation’ in all other tables.
| Class name | Class description |
|---|---|
| 111: Continuous urban fabric | Surface area covered for more than 80% by urban structures and other impermeable, artificial features. |
| 112: Discontinuous urban fabric | Surface area covered between 30% and 80% by urban structures and other impermeable, artificial features. |
| 121: Industrial or commercial units | Land units that are under industrial or commercial use or serve for public service facilities. |
| 122: Road and rail networks | Motorways and railways, including associated installations. |
| 123: Port areas | Infrastructure of port areas, including quays, dockyards and marinas. |
| 124: Airports | Airport installations: runways, buildings and associated land. |
| 131: Mineral extraction sites | Areas of open-pit extraction of construction materials (sandpits, quarries) or other minerals (open-cast mines). |
| 132: Dump sites | Public, industrial or mine dump sites. |
| 133: Construction sites | Spaces under construction development, soil or bedrock excavations, earthworks. |
| 141: Urban green | Areas with vegetation within urban fabric. |
| 142: Sport and leisure facilities | Areas used for sports, leisure and recreation purposes. |
| 211: Non-irrigated arable land | Cultivated land parcels under rain-fed agricultural use for annually harvested non-permanent crops, normally under a crop rotation system. |
| 212: Permanently irrigated arable land | Cultivated land parcels under agricultural use for arable crops that are permanently or periodically irrigated. |
| 213: Rice fields | Cultivated land parcels prepared for rice production, consisting of periodically flooded flat surfaces with irrigation channels. |
| 221: Vineyards | Areas planted with vines. |
| 222: Fruit trees and berry plantations | Cultivated parcels planted with fruit trees and shrubs, including nuts, intended for fruit production. |
| 223: Olive groves | Cultivated areas planted with olive trees, including mixed occurrence of vines on the same parcel. |
| 231: Pastures | Meadows with dispersed trees and shrubs occupying up to 50% of surface characterized by rich floristic composition. |
| 241: Annual crops associated with permanent crops | Cultivated land parcels with a mixed coverage of non-permanent (e.g., wheat) and permanent crops (e.g., olive trees). |
| 242: Complex cultivation patterns | Mosaic of small cultivated land parcels with different cultivation types (annual and permanent crops, as well as pastures), potentially with scattered houses or gardens. |
| 243: Land principally occupied by agriculture with significant areas of natural vegetation | Areas principally occupied with agriculture, interspersed with significant semi-natural areas in a mosaic pattern. |
| 244: Agro-forestry areas | Annual crops or grazing land under the wooded cover of forestry species. |
| 311: Broad-leaved forest | Vegetation formation composed principally of trees, including shrub and bush understorey, where broad-leaved species predominate. |
| 312: Coniferous forest | Vegetation formation composed principally of trees, including shrub and bush understorey, where coniferous species predominate. |
| 313: Mixed forest | Vegetation formation composed principally of trees, including shrub and bush understory, where neither broad-leaved nor coniferous species predominate. |
| 321: Natural grasslands | Grasslands under no or moderate human influence. Low productivity grasslands. Often in areas of rough, uneven ground, also with rocky areas, or patches of other (semi-)natural vegetation. |
| 322: Moors and heathland | Vegetation with low and closed cover, dominated by bushes, shrubs (heather, briars, broom, gorse, laburnum … |
| 323: Sclerophyllous vegetation | Bushy sclerophyllous vegetation in a climax stage of development, including maquis, matorral and garrigue. |
| 324: Transitional woodland-shrub | Transitional bushy and herbaceous vegetation with occasional scattered trees. Can represent either woodland degradation or forest regeneration / re-colonization. |
| 331: Beaches, dunes, sands | Natural un-vegetated expanses of sand or pebble/gravel, in coastal or continental locations, like beaches, dunes, gravel pads. |
| 332: Bare rocks | Scree, cliffs, rock outcrops, including areas of active erosion. |
| 333: Sparsely vegetated areas | Areas with sparse vegetation, covering 10–50% of the surface. |
| 334: Burnt areas | Areas affected by recent fires. |
| 335: Glaciers and perpetual snow | Land covered by ice or permanent snowfields. |
| 411: Inland marshes | Low-lying land usually flooded in winter, and with ground more or less saturated by fresh water all year round. |
| 412: Peat bogs | Wetlands with accumulation of considerable amount of decomposed moss (mostly Sphagnum) and vegetation matter. Both natural and exploited peat bogs. |
| 421: Salt marshes | Vegetated low-lying areas in the coastal zone, above the high-tide line, susceptible to flooding by seawater. |
| 422: Salines | Sections of salt marsh exploited for the production of salt by evaporation, active or in process of abandonment, distinguishable from marsh by parcellation or embankment systems. |
| 423: Intertidal flats | Area between the average lowest and highest sea water level at low tide and high tide. Generally non-vegetated expanses of mud, sand or rock lying between high and low water marks. |
| 511: Water courses | Natural or artificial water courses serving as water drainage channels. |
| 512: Water bodies | Natural or artificial water surfaces covered by standing water most of the year. |
| 521: Coastal lagoons | Str |
| 522: Estuaries | The mouth of a river under tidal influence within which the tide ebbs and flows. |
Figure 4. General workflow for merging training points obtained from LUCAS and CLC.
Figure 5. Distribution of training points per data source (blue and green), class (top) and per survey year (bottom).
Each bar shows the proportion of points extracted from CLC centroids (blue) and from the LUCAS dataset (green). The proportion of CLC points removed by the OSM and HRL filter step is indicated in red.
Per-class conditions applied only to CLC points during the filtering step.
All raster layers were resampled to 30×30 m resolution by averaging, and points that did not meet the specified condition were omitted from the training dataset.
| Condition | HRL | OSM | HRL+OSM | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Code | Class | Tree cover | Grass | Imp. | Perm. Water | Perm. Wetness | Temp. Wetness | Rails | Roads | Buildings | |
| 111 | Continuous urban fabric | – | >50 and <150 | ||||||||
| 112 | Discontinuous urban fabric | >50 and <150 | |||||||||
| 121 | Industrial or commercial units | ||||||||||
| 122 | Road and rail networks and associated land | OR | >30 | >30 | >30 | ||||||
| 123 | Port areas | ||||||||||
| 124 | Airports | ||||||||||
| 131 | Mineral extraction sites | AND | = 0 | = 0 | |||||||
| 132 | Dump sites | ||||||||||
| 133 | Construction sites | ||||||||||
| 141 | Green urban areas | ( OR ) AND | >0 | >0 | <50 or >150 | ||||||
| 142 | Sport and leisure facilities | ||||||||||
| 211 | Non-irrigated arable land | AND | = 0 | = 0 | = 0 | <50 or >150 | |||||
| 212 | Permanently irrigated arable land | = 0 | = 0 | = 0 | <50 or >150 | ||||||
| 213 | Rice fields | = 0 | = 0 | <50 or >150 | |||||||
| 221 | Vineyards | AND | = 0 | = 0 | = 0 | <50 or >150 | |||||
| 222 | Fruit trees and berry plantations | AND | = 0 | = 0 | = 0 | <50 or >150 | |||||
| 223 | Olive groves | AND | = 0 | = 0 | = 0 | <50 or >150 | |||||
| 231 | Pastures | AND | = 0 | = 0 | = 0 | <50 or >150 | |||||
| 241 | Annual crops associated with permanent crops | = 0 | = 0 | <50 or >150 | |||||||
| 242 | Complex cultivation patterns | = 0 | = 0 | <50 or >150 | |||||||
| 243 | Agriculture with significant natural vegetation | = 0 | = 0 | <50 or >150 | |||||||
| 244 | Agro-forestry areas | >0 | = 0 | = 0 | <50 or >150 | ||||||
| 311 | Broad-leaved forest | AND | >0 | = 0 | = 0 | <50 or >150 | |||||
| 312 | Coniferous forest | AND | >0 | = 0 | = 0 | <50 or >150 | |||||
| 313 | Mixed forest | >0 | = 0 | = 0 | <50 or >150 | ||||||
| 321 | Natural grasslands | AND | = 0 | >0 | = 0 | = 0 | <50 or >150 | ||||
| 322 | Moors and heathland | = 0 | = 0 | <50 or >150 | |||||||
| 323 | Sclerophyllous vegetation | = 0 | = 0 | <50 or >150 | |||||||
| 324 | Transitional woodland-shrub | = 0 | = 0 | <50 or >150 | |||||||
| 331 | Beaches, dunes, sand | = 0 | = 0 | <50 or >150 | |||||||
| 332 | Bare rocks | = 0 | = 0 | <50 or >150 | |||||||
| 333 | Sparsely vegetated areas | = 0 | = 0 | <50 or >150 | |||||||
| 334 | Burnt areas | = 0 | = 0 | <50 or >150 | |||||||
| 335 | Glaciers and perpetual snow | = 0 | = 0 | <50 or >150 | |||||||
| 411 | Inland marshes | OR | >0 | >0 | = 0 | = 0 | <50 or >150 | ||||
| 412 | Peat bogs | = 0 | = 0 | <50 or >150 | |||||||
| 421 | Salt marshes | = 0 | = 0 | <50 or >150 | |||||||
| 422 | Salines | = 0 | = 0 | <50 or >150 | |||||||
| 423 | Intertidal flats | = 0 | = 0 | <50 or >150 | |||||||
| 511 | Water courses | >50 | |||||||||
| 512 | Water bodies | – | = 100 | ||||||||
| 521 | Coastal lagoons | >50 | |||||||||
| 522 | Estuaries | >50 |
Spectral indices derived from the Landsat data and used as additional variables in the spatiotemporal EML.
| Spectral index | Equation | Reference |
|---|---|---|
| NDVI | | |
| SAVI | | |
| MSAVI | | |
| NDWI | | |
| NBR | | |
| NDMI | | |
| NBR2 | | |
| REI | | |
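Several of the listed indices are normalized differences of two bands. Because the equation column is empty in this copy of the table, the sketch below uses the commonly cited definitions of NDVI, NDMI, NBR, and NBR2; the band pairings are therefore an assumption and may differ from the exact formulation used by the authors. The remaining indices (SAVI, MSAVI, NDWI, REI) are left out because their definitions vary between sources.

```python
def normalized_difference(a, b):
    """Compute (a - b) / (a + b), returning 0.0 when the denominator is zero."""
    denom = a + b
    return (a - b) / denom if denom != 0 else 0.0

def indices(red, nir, swir1, swir2):
    """Commonly used normalized-difference indices from surface reflectance values."""
    return {
        "NDVI": normalized_difference(nir, red),      # vegetation greenness
        "NDMI": normalized_difference(nir, swir1),    # vegetation moisture
        "NBR": normalized_difference(nir, swir2),     # burn ratio
        "NBR2": normalized_difference(swir1, swir2),  # second burn ratio
    }

# Hypothetical reflectance values for a vegetated pixel.
values = indices(red=0.05, nir=0.45, swir1=0.20, swir2=0.10)
```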
Reclassification key used to validate the predictions of our ensemble model on the S2GLC point dataset collected by Malinowski et al. (2020).
| S2GLC | ODSE-LULC |
|---|---|
| 111: Artificial Surfaces | 111: Continuous urban fabric |
| | 112: Discontinuous urban fabric |
| | 121: Industrial or commercial units |
| | 122: Road and rail networks and associated land |
| | 123: Port areas |
| | 124: Airports |
| | 132: Dump sites |
| | 133: Construction sites |
| 311: Broadleaf Tree Cover | 311: Broad-leaved forest |
| 312: Coniferous Tree Cover | 312: Coniferous forest |
| 211: Cultivated Areas | 211: Non-irrigated arable land |
| | 212: Permanently irrigated arable land |
| | 213: Rice fields |
| | 241: Annual crops associated with permanent crops |
| | 242: Complex cultivation patterns |
| | 243: Agriculture with significant natural vegetation |
| | 244: Agro-forestry areas |
| 231: Herbaceous Vegetation | 231: Pastures |
| | 321: Natural grasslands |
| 411: Marshes | 411: Inland Marshes |
| | 421: Salt Marshes |
| | 422: Salines |
| | 423: Intertidal Flats |
| 322: Moors and Heathland | 322: Moors and heathland |
| 331: Natural Material Surfaces | 131: Mineral extraction sites |
| | 331: Beaches, dunes, sands |
| | 332: Bare rocks |
| 000: None | 141: Green urban areas |
| | 142: Sport and leisure facilities |
| | 222: Fruit trees and berry plantations |
| | 223: Olive groves |
| | 313: Mixed Forest |
| | 324: Transitional woodland-shrub |
| | 333: Sparsely vegetated areas |
| | 334: Burnt areas |
| 412: Peat Bogs | 412: Peat Bogs |
| 335: Permanent Snow | 335: Glaciers and perpetual snow |
| 323: Sclerophyllous Vegetation | 323: Sclerophyllous vegetation |
| 221: Vineyards | 221: Vineyards |
| 511: Water Bodies | 511: Water courses |
| | 512: Water bodies |
| | 521: Coastal lagoons |
| | 522: Estuaries |
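The reclassification key amounts to a lookup from each ODSE-LULC code to an S2GLC code. The sketch below encodes the groups as read from the table; using `None` for the "000: None" group is an illustrative choice, not something prescribed by the source.

```python
# ODSE-LULC code -> S2GLC code, following the reclassification key.
# None marks ODSE-LULC classes without an S2GLC equivalent ("000: None").
ODSE_TO_S2GLC = {
    **dict.fromkeys([111, 112, 121, 122, 123, 124, 132, 133], 111),
    311: 311,
    312: 312,
    **dict.fromkeys([211, 212, 213, 241, 242, 243, 244], 211),
    221: 221,
    **dict.fromkeys([231, 321], 231),
    **dict.fromkeys([411, 421, 422, 423], 411),
    322: 322,
    323: 323,
    **dict.fromkeys([131, 331, 332], 331),
    335: 335,
    412: 412,
    **dict.fromkeys([511, 512, 521, 522], 511),
    **dict.fromkeys([141, 142, 222, 223, 313, 324, 333, 334], None),
}

def reclassify(odse_code):
    """Translate an ODSE-LULC class code to its S2GLC equivalent (or None)."""
    return ODSE_TO_S2GLC[odse_code]

s2glc_classes = {code for code in ODSE_TO_S2GLC.values() if code is not None}
```

All 43 ODSE-LULC classes are covered, and they collapse to 13 S2GLC classes plus the group without an equivalent.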
Figure 6. Example of deseasonalization (Seabold & Perktold, 2010) and subsequent logit OLS applied to a single pixel in Sweden (coordinates: 62°24′43.7″N, 13°56′00.3″E): (A) red dots represent pixel values, the blue line represents a locally weighted regression line fitted to the pixel values, with the light blue area indicating the confidence interval, and the red line represents the trend after removing the seasonal signal; (B) the red line and crosses represent the trend after removing the seasonal signal, and the blue line shows the regression model fitted to the NDVI values in logit space; (C) trend analysis on probability values for non-irrigated arable land. In the case above, the gradient value is 0.09 with a model R-squared of 0.88.
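The trend step in this figure can be illustrated with a logit transform followed by an ordinary least squares fit. The sketch below is a plain-Python stand-in rather than the statsmodels-based workflow cited in the caption: it skips deseasonalization, uses a hypothetical input series, and the clipping constant `eps` is an assumption.

```python
import math

def logit(p, eps=1e-6):
    """Map a value in (0, 1) to the real line, clipping to avoid infinities."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))

def ols_slope_r2(t, y):
    """Ordinary least squares fit of y on t; returns (slope, R-squared)."""
    n = len(t)
    t_mean, y_mean = sum(t) / n, sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    syy = sum((yi - y_mean) ** 2 for yi in y)
    slope = sxy / sxx
    r2 = (sxy ** 2) / (sxx * syy) if syy > 0 else 0.0
    return slope, r2

years = list(range(2000, 2020))
# Hypothetical, already deseasonalized values between 0 and 1.
trend = [0.30 + 0.02 * (yr - 2000) for yr in years]
slope, r2 = ols_slope_r2(years, [logit(v) for v in trend])
```

The slope plays the role of the gradient value reported in the caption, and `r2` corresponds to the model R-squared.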
Harmonization scheme used to convert ODSE-LULC nomenclature to Copernicus Global Land Cover classes.
On the left side, ODSE-LULC classes are converted to Forest, Other Vegetation, Wetland, Bare, Cropland, Urban, and Water classes. Each transition from one Copernicus class to another is then categorized into a change class in the cross-table.
|
|
|
|
|
|
|
|
|
|
|---|---|---|---|---|---|---|---|---|
| 311: Broad-leaved forest |
| Forest loss | Deforestation and crop expansion | Deforestation and urbanization | Water expansion | |||
| 312: Coniferous forest | ||||||||
| 321: Natural grasslands |
| Reforestation | Other | Desertification | Crop expansion | Urbanization | ||
| 322: Moors and heathland | ||||||||
| 324: Transitional woodland-shrub | ||||||||
| 323: Sclerophyllous vegetation | ||||||||
| 411: Inland wetlands |
| Wetland degradation | Wetland degradation and desertification | Wetland degradation and crop expansion | Wetland degradation and urbanization | |||
| 421: Maritime wetlands | ||||||||
| 332: Bare rocks |
| Other | Crop expansion | Urbanization | ||||
| 333: Sparsely vegetated areas | ||||||||
| 334: Burnt areas | ||||||||
| 335: Glaciers and perpetual snow | ||||||||
| 331: Beaches, dunes, and sands | ||||||||
| 211: Non-irrigated arable land |
| Land abandonment | Land abandonment and desertification | |||||
| 212: Permanently irrigated arable land | ||||||||
| 213: Rice fields | ||||||||
| 221: Vineyards | ||||||||
| 222: Fruit trees and berry plantations | ||||||||
| 223: Olive groves | ||||||||
| 231: Pastures | ||||||||
| 111: Urban fabric |
| Other | ||||||
| 122: Road and rail networks and associated land | ||||||||
| 123: Port areas | ||||||||
| 124: Airports | ||||||||
| 131: Mineral extraction sites | ||||||||
| 132: Dump sites | ||||||||
| 133: Construction sites | ||||||||
| 141: Green urban areas | ||||||||
| 511: Water courses |
| Water reduction | ||||||
| 512: Water bodies | ||||||||
| 523: Sea and ocean | ||||||||
| 522: Estuaries | ||||||||
| 521: Coastal lagoons | ||||||||
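The two-step scheme (class harmonization, then a transition lookup) can be sketched as follows. Only a subset is shown: the class grouping is inferred from the row order of the table, the lookup includes only transitions whose labels name both the source and the target class, and the "No change" and fallback labels are placeholders introduced for this illustration.

```python
# Subset of the ODSE-LULC -> Copernicus grouping, inferred from the table's row order.
ODSE_TO_COPERNICUS = {
    311: "Forest", 312: "Forest",
    321: "Other vegetation", 322: "Other vegetation",
    323: "Other vegetation", 324: "Other vegetation",
    411: "Wetland", 421: "Wetland",
    211: "Cropland", 212: "Cropland", 231: "Cropland",
    111: "Urban", 122: "Urban", 133: "Urban",
    511: "Water", 512: "Water",
}

# Partial cross-table: only transitions whose label is unambiguous in the source.
CHANGE_CLASS = {
    ("Forest", "Cropland"): "Deforestation and crop expansion",
    ("Forest", "Urban"): "Deforestation and urbanization",
    ("Wetland", "Cropland"): "Wetland degradation and crop expansion",
    ("Wetland", "Urban"): "Wetland degradation and urbanization",
}

def change_class(code_before, code_after):
    """Categorize a transition between two ODSE-LULC codes."""
    src = ODSE_TO_COPERNICUS[code_before]
    dst = ODSE_TO_COPERNICUS[code_after]
    if src == dst:
        return "No change"  # placeholder label, not taken from the table
    return CHANGE_CLASS.get((src, dst), "Not covered in this sketch")

example = change_class(311, 211)
```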
Weighted F1-score of other land cover products when validated with the ODSE-LULC training dataset.
| Land cover product | Validation year | Data source | Samples | Weighted F1-Score | Number of classes | Res. (m) |
|---|---|---|---|---|---|---|
| S2GLC | 2016 | LUCAS | 756 | 0.724 | 8 | 10 |
| | 2016 | LUCAS | 719 | 0.719 | 10 | 30 |
| GLCFCS30–2015 | 2016 | LUCAS | 724 | 0.677 | 10 | 30 |
| | 2015 | LUCAS | 144,027 | 0.657 | 11 | 30 |
| S2GLC | 2018 | LUCAS | 295,152 | 0.653 | 11 | 10 |
| S2GLC | 2018 | CLC | 1,000,063 | 0.604 | 12 | 10 |
| ELC10 | 2018 | LUCAS | 42,629 | 0.596 | 8 | 10 |
| GLCFCS30–2015 | 2015 | LUCAS | 138,342 | 0.503 | 12 | 30 |
| ELC10 | 2018 | CLC | 172,382 | 0.456 | 8 | 10 |
| GLCFCS30–2020 | 2018 | LUCAS | 308,838 | 0.424 | 12 | 30 |
| GLCFCS30–2020 | 2018 | CLC | 1,026,914 | 0.420 | 12 | 30 |
Figure 7. Standardized importance of the top-40 most important variables to the random forest and gradient boosted tree models.
The colored bar indicates the higher of the variable's two importance values; the model it belongs to is indicated to the right of each bar. The corresponding grey bar indicates the importance to the other model. The color of each bar indicates the data type. Each variable name is prefixed with LCV (a Landsat band or a Landsat-derived spectral index), HYD (hydrological data), CLM (climatic data), or DTM (digital terrain model). This prefix is followed by the specific data source, e.g., [color or index]_landsat indicates a Landsat band or derived spectral index. The last part of each name indicates the timespan over which the data was aggregated.
Producer’s and user’s accuracy, Weighted F1-score, and Log loss of the ensemble predictions during spatial cross-validation.
| Corine level | Number of classes | Prod acc. | User acc. | Weighted F1 | Log loss | Baseline log loss | Log loss ratio |
|---|---|---|---|---|---|---|---|
| 1 | 5 | 0.835 | 0.835 | 0.834 | 0.456 | 2.018 | 0.774 |
| 2 | 14 | 0.636 | 0.639 | 0.634 | 1.033 | 3.596 | 0.713 |
| 3 | 43 | 0.494 | 0.502 | 0.491 | 1.544 | 5.142 | 0.700 |
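The log loss ratio column is consistent with one minus the ratio of the model's log loss to the baseline log loss. This relation is inferred from the tabulated values rather than stated in the table, so treat it as an assumption; the short check below applies it to the three rows.

```python
def log_loss_ratio(log_loss, baseline_log_loss):
    """Relative improvement over the baseline: 1 - log_loss / baseline."""
    return 1.0 - log_loss / baseline_log_loss

# (log loss, baseline log loss) per Corine level, copied from the table.
rows = {1: (0.456, 2.018), 2: (1.033, 3.596), 3: (1.544, 5.142)}
ratios = {level: round(log_loss_ratio(ll, base), 3) for level, (ll, base) in rows.items()}
```

The results are 0.774, 0.713, and 0.700 for levels 1, 2, and 3, matching the last column. A value of 0 would mean no improvement over the baseline and 1 would mean a perfect probabilistic prediction.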
Classification report for 43 CLC level 3 classes, based on the predictions made with 5-fold spatial cross-validation.
| CLC code (level 3) | Producer Acc. | User Acc. | F1-score | Support | Log loss | Baseline log loss | Log loss ratio |
|---|---|---|---|---|---|---|---|
| 111: Continuous urban fabric | 0.523 | 0.166 | 0.252 | 51,989 | 0.0230 | 0.0388 | 0.409 |
| 112: Discontinuous urban fabric | 0.509 | 0.572 | 0.539 | 92,151 | 0.0256 | 0.0623 | 0.590 |
| 121: Industrial or commercial units | 0.496 | 0.623 | 0.552 | 129,661 | 0.0382 | 0.0821 | 0.535 |
| 122: Road and rail networks and associated land | 0.294 | 0.068 | 0.111 | 39,832 | 0.0244 | 0.0311 | 0.213 |
| 123: Port areas | 0.543 | 0.321 | 0.403 | 3,994 | 0.0018 | 0.0042 | 0.578 |
| 124: Airports | 0.300 | 0.023 | 0.043 | 6,702 | 0.0049 | 0.0067 | 0.265 |
| 131: Mineral extraction sites | 0.482 | 0.307 | 0.375 | 53,447 | 0.0264 | 0.0397 | 0.335 |
| 132: Dump sites | 0.375 | 0.013 | 0.026 | 6,509 | 0.0048 | 0.0065 | 0.267 |
| 133: Construction sites | 0.217 | 0.038 | 0.065 | 6,728 | 0.0047 | 0.0067 | 0.299 |
| 141: Green urban areas | 0.312 | 0.125 | 0.179 | 15,717 | 0.0091 | 0.0141 | 0.350 |
| 142: Sport and leisure facilities | 0.407 | 0.200 | 0.268 | 64,308 | 0.0326 | 0.0463 | 0.297 |
| 211: Non-irrigated arable land | 0.604 | 0.733 | 0.662 | 998,381 | 0.1892 | 0.3735 | 0.493 |
| 212: Permanently irrigated arable land | 0.447 | 0.146 | 0.221 | 29,786 | 0.0139 | 0.0243 | 0.428 |
| 213: Rice fields | 0.762 | 0.496 | 0.601 | 4,839 | 0.0020 | 0.0050 | 0.596 |
| 221: Vineyards | 0.506 | 0.308 | 0.383 | 66,213 | 0.0287 | 0.0474 | 0.394 |
| 222: Fruit trees and berry plantations | 0.411 | 0.131 | 0.199 | 63,659 | 0.0344 | 0.0459 | 0.251 |
| 223: Olive groves | 0.432 | 0.355 | 0.390 | 63,578 | 0.0244 | 0.0459 | 0.469 |
| 231: Pastures | 0.455 | 0.529 | 0.489 | 529,466 | 0.1509 | 0.2415 | 0.375 |
| 241: Annual crops associated with permanent crops | 0.269 | 0.067 | 0.107 | 16,883 | 0.0101 | 0.0150 | 0.326 |
| 242: Complex cultivation patterns | 0.348 | 0.351 | 0.349 | 594,648 | 0.1942 | 0.2624 | 0.260 |
| 243: Agriculture with significant natural vegetation | 0.355 | 0.373 | 0.363 | 782,237 | 0.2558 | 0.3176 | 0.194 |
| 244: Agro-forestry areas | 0.276 | 0.052 | 0.087 | 10,497 | 0.0060 | 0.0099 | 0.396 |
| 311: Broad-leaved forest | 0.537 | 0.660 | 0.592 | 855,499 | 0.1971 | 0.3373 | 0.416 |
| 312: Coniferous forest | 0.596 | 0.646 | 0.620 | 759,215 | 0.1644 | 0.3112 | 0.472 |
| 313: Mixed forest | 0.461 | 0.377 | 0.414 | 612,430 | 0.1707 | 0.2680 | 0.363 |
| 321: Natural grasslands | 0.406 | 0.314 | 0.354 | 400,875 | 0.1431 | 0.1971 | 0.274 |
| 322: Moors and heathland | 0.493 | 0.350 | 0.409 | 301,693 | 0.1100 | 0.1591 | 0.309 |
| 323: Sclerophyllous vegetation | 0.311 | 0.372 | 0.339 | 143,521 | 0.0532 | 0.0890 | 0.403 |
| 324: Transitional woodland-shrub | 0.472 | 0.431 | 0.450 | 724,404 | 0.2117 | 0.3013 | 0.297 |
| 331: Beaches, dunes, sand | 0.551 | 0.207 | 0.301 | 25,688 | 0.0147 | 0.0214 | 0.312 |
| 332: Bare rocks | 0.664 | 0.495 | 0.567 | 58,234 | 0.0162 | 0.0427 | 0.621 |
| 333: Sparsely vegetated areas | 0.522 | 0.471 | 0.495 | 152,571 | 0.0457 | 0.0935 | 0.511 |
| 334: Burnt areas | 0.224 | 0.006 | 0.011 | 2,263 | 0.0021 | 0.0026 | 0.177 |
| 335: Glaciers and perpetual snow | 0.852 | 0.818 | 0.834 | 7,250 | 0.0008 | 0.0072 | 0.883 |
| 411: Inland marshes | 0.425 | 0.228 | 0.297 | 39,784 | 0.0192 | 0.0310 | 0.382 |
| 412: Peat bogs | 0.684 | 0.731 | 0.707 | 174,314 | 0.0333 | 0.1039 | 0.680 |
| 421: Salt marshes | 0.505 | 0.441 | 0.471 | 5,598 | 0.0023 | 0.0057 | 0.600 |
| 422: Salines | 0.481 | 0.081 | 0.139 | 320 | 0.0002 | 0.0004 | 0.577 |
| 423: Intertidal flats | 0.497 | 0.209 | 0.295 | 788 | 0.0004 | 0.0010 | 0.570 |
| 511: Water courses | 0.360 | 0.108 | 0.166 | 11,214 | 0.0068 | 0.0105 | 0.353 |
| 512: Water bodies | 0.895 | 0.956 | 0.924 | 187,981 | 0.0108 | 0.1103 | 0.902 |
| 521: Coastal lagoons | 0.594 | 0.429 | 0.498 | 1,904 | 0.0006 | 0.0022 | 0.708 |
| 522: Estuaries | 0.382 | 0.082 | 0.135 | 353 | 0.0002 | 0.0005 | 0.566 |
| Macro average | 0.460 | 0.327 | 0.356 | 8,097,140 | 0.083 | 0.137 | 0.452 |
| Weighted average | 0.494 | 0.502 | 0.491 | | 0.157 | 0.253 | 0.389 |
| Accuracy | 0.502 | ||||||
| Kappa score | 0.459 | ||||||
| Log Loss (baseline) | 1.544 (5.142) | ||||||
Classification report for 14 CLC level 2 classes, based on the predictions made with 5-fold spatial cross-validation.
| CLC code (level 2) | Producer Acc. | User Acc. | F1-score | Support | Log loss | Baseline log loss | Log loss ratio |
|---|---|---|---|---|---|---|---|
| 11: Urban Fabric | 0.643 | 0.535 | 0.584 | 144,140 | 0.039 | 0.089 | 0.564 |
| 12: Industrial, commercial and transport units | 0.568 | 0.551 | 0.559 | 180,189 | 0.057 | 0.107 | 0.469 |
| 13: Mine, dump and construction sites | 0.533 | 0.283 | 0.370 | 66,684 | 0.032 | 0.048 | 0.331 |
| 14: Artificial, non-agricultural vegetated areas | 0.479 | 0.227 | 0.308 | 80,025 | 0.038 | 0.055 | 0.315 |
| 21: Arable land | 0.622 | 0.738 | 0.675 | 1,033,006 | 0.191 | 0.382 | 0.500 |
| 22: Permanent crops | 0.558 | 0.326 | 0.412 | 193,450 | 0.072 | 0.113 | 0.363 |
| 23: Pastures | 0.455 | 0.529 | 0.489 | 529,466 | 0.151 | 0.242 | 0.375 |
| 24: Heterogeneous agricultural areas | 0.488 | 0.496 | 0.492 | 1,404,265 | 0.364 | 0.461 | 0.212 |
| 31: Forests and seminatural areas | 0.788 | 0.840 | 0.813 | 2,227,144 | 0.302 | 0.588 | 0.487 |
| 32: Shrub and/or herbaceous vegetation associations | 0.592 | 0.511 | 0.548 | 1,570,493 | 0.384 | 0.492 | 0.218 |
| 33: Open spaces with little or no vegetation | 0.736 | 0.591 | 0.656 | 246,006 | 0.061 | 0.136 | 0.555 |
| 41: Inland wetlands | 0.719 | 0.697 | 0.708 | 214,098 | 0.044 | 0.122 | 0.643 |
| 42: Coastal wetlands | 0.591 | 0.465 | 0.520 | 6,706 | 0.003 | 0.007 | 0.618 |
| 51: Inland waters | 0.913 | 0.936 | 0.924 | 199,195 | 0.013 | 0.115 | 0.884 |
| 52: Marine waters | 0.614 | 0.392 | 0.479 | 2,273 | 0.001 | 0.003 | 0.699 |
| Macro average | 0.620 | 0.541 | 0.569 | 8,097,140 | 0.117 | 0.197 | 0.482 |
| Weighted average | 0.636 | 0.639 | 0.634 | | 0.262 | 0.420 | 0.393 |
| Accuracy | 0.639 | ||||||
| Kappa score | 0.565 | ||||||
| Log Loss (baseline) | 1.033 (3.596) | ||||||
Classification report for 5 CLC level 1 classes, based on the predictions made with 5-fold spatial cross-validation.
| CLC code (level 1) | Producer Acc. | User Acc. | F1-score | Support | Log loss | Baseline log loss | Log loss ratio |
|---|---|---|---|---|---|---|---|
| 1: Artificial surfaces | 0.784 | 0.613 | 0.688 | 471,038 | 0.123 | 0.222 | 0.445 |
| 2: Agricultural areas | 0.798 | 0.854 | 0.825 | 3,160,187 | 0.457 | 0.669 | 0.317 |
| 3: Forest and seminatural areas | 0.872 | 0.848 | 0.860 | 4,043,643 | 0.526 | 0.693 | 0.241 |
| 4: Wetlands | 0.722 | 0.696 | 0.708 | 220,804 | 0.045 | 0.125 | 0.639 |
| 5: Water bodies | 0.917 | 0.936 | 0.926 | 201,468 | 0.013 | 0.116 | 0.884 |
| Macro average | 0.819 | 0.789 | 0.802 | 8,097,140 | 0.233 | 0.365 | 0.505 |
| Weighted average | 0.835 | 0.835 | 0.834 | | 0.450 | 0.626 | 0.309 |
| Accuracy | 0.835 | ||||||
| Kappa score | 0.720 | ||||||
| Log Loss (baseline) | 0.456 (2.018) | ||||||
Figure 8. Comparison of number of samples and cross-validation performance.
Both metrics are visualized for each tile in the 30 km tiling system used for spatial cross-validation. Left: Number of samples per tile. Right: Weighted F1-score per tile.
Figure 9. Hexbin plot of the weighted F1-score and number of overlapping points per tile.
The Pearson correlation coefficient of 0.125 (p < 0.001) indicates a weak positive correlation between the number of points in a tile and the cross-validation weighted F1-score.
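For reference, the Pearson coefficient relates two per-tile series as shown below. The input values are hypothetical and serve only to show the calculation; they do not reproduce the reported coefficient.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-tile values: number of points and weighted F1-score.
points_per_tile = [120, 450, 900, 1500, 3000]
f1_per_tile = [0.41, 0.52, 0.47, 0.55, 0.50]
r = pearson_r(points_per_tile, f1_per_tile)
```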
Cross-validation performance of our ensemble model per year.
| Year | Weighted F1-score | Support |
|---|---|---|
| 2000 | 0.497 | 1,658,715 |
| 2006 | 0.491 | 1,852,645 |
| 2009 | 0.558 | 225,416 |
| 2012 | 0.487 | 1,971,812 |
| 2015 | 0.588 | 265,830 |
| 2016 | 0.632 | 65,235 |
| 2018 | 0.481 | 2,057,306 |
| 2019 | 0.535 | 180 |
| Average | 0.489 | 1,012,142 |
| Standard deviation | 0.135 | 882,783 |
Conservative classification report of our 2017 LULC prediction on 49,897 S2GLC points that counts 3484 points with predicted classes without an equivalent S2GLC class as errors (141: Green urban areas, 142: Sport and leisure facilities, 222: Fruit trees and berry plantations, 223: Olive groves, 313: Mixed forest, 324: Transitional woodland-shrub, 333: Sparsely vegetated areas, and 334: Burnt areas).
| S2GLC class | Producer Acc. | User Acc. | F1-score | Support |
|---|---|---|---|---|
| 111: Artificial surfaces | 0.933 | 0.933 | 0.933 | 1,826 |
| 211: Cultivated areas | 0.849 | 0.965 | 0.903 | 13,470 |
| 221: Vineyards | 0.826 | 0.694 | 0.754 | 500 |
| 231: Herbaceous vegetation | 0.861 | 0.686 | 0.764 | 6,776 |
| 311: Broadleaf tree cover | 0.967 | 0.814 | 0.884 | 10,944 |
| 312: Coniferous tree cover | 0.975 | 0.914 | 0.943 | 8,626 |
| 322: Moors and heathland | 0.641 | 0.491 | 0.556 | 2,070 |
| 323: Sclerophyllous vegetation | 0.780 | 0.265 | 0.396 | 815 |
| 331: Natural material surfaces | 0.915 | 0.751 | 0.825 | 2,110 |
| 335: Permanent snow cover | 0.624 | 0.800 | 0.701 | 85 |
| 411: Marshes | 0.331 | 0.327 | 0.329 | 324 |
| 412: Peatbogs | 0.629 | 0.482 | 0.546 | 745 |
| 511: Water bodies | 0.992 | 0.974 | 0.983 | 1,606 |
| Macro average | 0.737 | 0.650 | 0.680 | 49,897 |
| Weighted average | 0.892 | 0.830 | 0.854 | |
| Accuracy | 0.830 | |||
| Kappa score | 0.794 |
Optimistic classification report of our 2017 LULC prediction on 49,897 S2GLC points where all 3484 points with predicted classes without an equivalent S2GLC class were removed before calculating accuracy metrics (141: Green urban areas, 142: Sport and leisure facilities, 222: Fruit trees and berry plantations, 223: Olive groves, 313: Mixed forest, 324: Transitional woodland-shrub, 333: Sparsely vegetated areas, and 334: Burnt areas).
| S2GLC class | Producer Acc. | User Acc. | F1-score | Support |
|---|---|---|---|---|
| 111: Artificial surfaces | 0.933 | 0.935 | 0.934 | 1,823 |
| 211: Cultivated areas | 0.849 | 0.967 | 0.905 | 13,429 |
| 221: Vineyards | 0.826 | 0.720 | 0.769 | 482 |
| 231: Herbaceous vegetation | 0.861 | 0.722 | 0.785 | 6,441 |
| 311: Broadleaf tree cover | 0.967 | 0.937 | 0.952 | 9,512 |
| 312: Coniferous tree cover | 0.975 | 0.973 | 0.974 | 8,098 |
| 322: Moors and heathland | 0.641 | 0.672 | 0.656 | 1,511 |
| 323: Sclerophyllous vegetation | 0.780 | 0.378 | 0.509 | 571 |
| 331: Natural material surfaces | 0.915 | 0.866 | 0.889 | 1,831 |
| 335: Permanent snow cover | 0.624 | 0.819 | 0.708 | 83 |
| 411: Marshes | 0.331 | 0.351 | 0.341 | 302 |
| 412: Peatbogs | 0.629 | 0.494 | 0.554 | 726 |
| 511: Water bodies | 0.992 | 0.975 | 0.984 | 1,604 |
| Macro average | 0.794 | 0.755 | 0.766 | 46,413 |
| Weighted average | 0.893 | 0.892 | 0.889 | |
| Accuracy | 0.892 | | | |
| Kappa score | 0.867 | | | |
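The difference between the conservative and optimistic reports comes down to how points with a predicted class that has no S2GLC equivalent are handled. A minimal sketch of the two conventions, assuming label arrays of class codes; the function name and the five-point example are hypothetical:

```python
import numpy as np

def conservative_and_optimistic_accuracy(reference, predicted, unmatched_classes):
    """Overall accuracy under both conventions.

    reference: reference class codes (only classes with an S2GLC equivalent)
    predicted: predicted class codes (may include unmatched classes)
    unmatched_classes: predicted classes without an equivalent reference class
    """
    reference = np.asarray(reference)
    predicted = np.asarray(predicted)
    has_equivalent = ~np.isin(predicted, list(unmatched_classes))
    correct = reference == predicted

    # Conservative: keep every point; an unmatched prediction can never
    # equal the reference label, so it is counted as an error.
    conservative = correct.mean()
    # Optimistic: drop points with an unmatched prediction first.
    optimistic = correct[has_equivalent].mean()
    n_removed = int((~has_equivalent).sum())
    return conservative, optimistic, n_removed

# Hypothetical example: one point is predicted as 313 (Mixed forest)
ref = [111, 211, 311, 311, 511]
pred = [111, 211, 313, 311, 511]
print(conservative_and_optimistic_accuracy(ref, pred, {313}))
```

In the tables above, the same logic explains why the support drops from 49,897 to 46,413 points (a difference of 3,484) and why user's accuracy rises while producer's accuracy stays the same for the listed classes.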
Figure 10. Normalized confusion matrix of our prediction on the independently collected S2GLC validation points.
Each cell shows the percentage of the true label predicted as the predicted label.
Figure 11. Normalized confusion matrix of the predictions made by our model during spatial cross-validation on our own dataset, reclassed to the S2GLC nomenclature.
Each cell shows the percentage of the true label predicted as the predicted label.
Weighted F1-scores obtained by validating spatial and spatiotemporal models on data from known years and an unknown year (2018).
Models were trained on CLC points, LUCAS points, and a combination of both.
| Model | Training year | Points | Trained on CLC, tested on training year(s) | Trained on CLC, tested on 2018 | Trained on LUCAS, tested on training year(s) | Trained on LUCAS, tested on 2018 | Trained on CLC and LUCAS, tested on training year(s) | Trained on CLC and LUCAS, tested on 2018 |
|---|---|---|---|---|---|---|---|---|
| Spatial | 2000 | 100,000 | 0.610 | 0.542 | | | 0.611 | 0.515 |
| Spatial | 2006 | 100,000 | 0.595 | 0.437 | 0.604 | 0.563 | 0.587 | 0.534 |
| Spatial | 2009 | 100,000 | | | 0.595 | 0.482 | 0.602 | 0.415 |
| Spatial | 2012 | 100,000 | 0.559 | 0.476 | 0.611 | 0.574 | 0.565 | 0.529 |
| Spatial | Average | 400,000 | 0.583 | 0.465 | 0.608 | 0.560 | 0.591 | 0.498 |
| Spatiotemporal | All | 100,000 | 0.612 | 0.576 | 0.568 | 0.478 | 0.574 | 0.532 |
| Spatiotemporal | All | 400,000 | 0.625 | 0.579 | 0.608 | 0.491 | 0.595 | 0.543 |
Figure 12. Grouped bar plot of the F1-scores per CLC class, plotted separately per model of the ensemble.
Meta-learner performance is indicated in red on the background of each bar. If the random forest (blue), gradient boosted trees (orange) or neural network (green) outperformed the meta-learner, its bar will exceed the bigger meta-learner bar, indicating that the meta-learner did not learn to incorporate the model’s higher performance into its final prediction.
Figure 13. NDVI trend slope values of LUCAS points with selected LULC change dynamics, categorized according to the Copernicus change classes. The mean NDVI trend value is indicated with green triangles.
Figure 14. Detail plot of NDVI and LULC trends between 2000–2020 for 2 LUCAS points. NDVI trend is compared to forest increase (top) and urbanization (bottom).
Left (A and E): A graph comparing the two trends, with green depicting de-seasonalized NDVI data and its trend, as calculated by logit OLS regression. Red depicts the annual probability values and associated trend of the compared LULC change classes (“312: Coniferous forest” and “111: Continuous urban fabric”, respectively). The maps, from left to right, depict the spatial context of the two points in (B/F) high-resolution satellite RGB, (C/G) slope of Landsat ARD NDVI trends, and (D/H) slope of LULC change class trends as predicted by our ensemble. The “in-situ” observations of both points match the dynamic presented in the graph: Point 28681762 (top) experienced forest increase, while point 39143028 (bottom) is located in a recently constructed urban area.
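The trend lines described in this caption come from a logit OLS regression. As a rough illustration of that idea, the sketch below logit-transforms a series of values in (0, 1) and fits an ordinary least-squares line over time. The clipping threshold `eps`, the centering of the time axis, and the function name are assumptions made for this example; NDVI, which ranges from -1 to 1, would first need rescaling to (0, 1), and the rescaling used in the study is not stated here.

```python
import numpy as np

def logit_ols_slope(values, times, eps=1e-6):
    """Slope of an OLS line fitted to logit-transformed values.

    values: series in (0, 1), e.g. annual class probabilities
    times:  matching time stamps, e.g. years
    """
    p = np.clip(np.asarray(values, dtype=float), eps, 1 - eps)
    z = np.log(p / (1 - p))                 # logit transform
    t = np.asarray(times, dtype=float)
    t = t - t.mean()                        # center time for numerical stability
    slope, _intercept = np.polyfit(t, z, deg=1)
    return slope

# Synthetic series whose logit increases by 0.1 per year
years = np.arange(2000, 2020)
z_true = -1.0 + 0.1 * (years - 2000)
probs = 1.0 / (1.0 + np.exp(-z_true))
print(logit_ols_slope(probs, years))
```

A positive slope indicates an increasing probability (or greenness) over the period and a negative slope a decreasing one; working in logit space keeps the fitted trend inside the valid (0, 1) range when it is transformed back.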
Figure 15. Dominant LULC classes, predicted probability and model variance for Non-irrigated arable land, Coniferous forest and Urban Fabric, RGB Landsat temporal composite (Spring season) for the years 2000 and 2019.
Figure 16. Trends in NDVI values between 2000 and 2019 compared to trends in LULC probabilities predicted by our ensemble model, as well as the derived LULC change classes between 2001 and 2018.
Figure 17. Prevalent LULC change and change intensity on the British Isles aggregated to 5 × 5 km tiles, for three dynamics: Urbanization (A), Wetland degradation (B), and forest increase/decrease (C).
Figure 18. Example of model variance (prediction uncertainty) in the city of La Teste-de-Buch (France) for the class “Coniferous forest”, visualized in the ODSE viewer (https://maps.opendatascience.eu/).
(A) model variance map with examples of two locations (P1 at 44°33′33.6″N 1°10′33.2″W; P2 at 44°32′11.8″N 1°02′38.0″W) with low and high variances, (B) probability values showing relatively high confidence, (C) original Landsat images RGB composite used for classification.
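One way to obtain a per-pixel uncertainty layer like the one in this figure is to measure how much the component models of the ensemble disagree. The sketch below assumes that model variance is summarized as the standard deviation of the class probabilities predicted by the individual models; that definition, the array layout, and the function name are assumptions for illustration, not a description of the published pipeline.

```python
import numpy as np

def model_disagreement(probabilities):
    """Per-pixel, per-class spread of the component model predictions.

    probabilities: array of shape (n_models, n_pixels, n_classes)
    returns: array of shape (n_pixels, n_classes) with the standard
             deviation across models
    """
    probabilities = np.asarray(probabilities, dtype=float)
    return probabilities.std(axis=0)

# Hypothetical probabilities for one class from three models at two pixels:
# the models agree at the first pixel and disagree at the second.
probs = np.array([
    [[0.9], [0.2]],   # model 1
    [[0.9], [0.5]],   # model 2
    [[0.9], [0.8]],   # model 3
])
print(model_disagreement(probs))
```

Under this definition a pixel where all models agree gets a value of zero, while a pixel where they diverge gets a higher value, which matches the low- and high-variance locations contrasted in the caption.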
Figure 19. NDVI signal for 880 million pixel values in our Landsat data between 2000 and 2019.
Red dots indicate the average for each season for 880 million pixels over 11 tiles. The vertical line indicates the launch of Landsat 8, after which the acquisition scheme changed. This sample suggests that the structural difference between the two acquisition schemes in the Landsat ARD product created by Potapov et al. (2020) was not propagated into our aggregated and harmonized dataset.