Mapping land cover change over continental Africa using Landsat and Google Earth Engine cloud computing.

Alemayehu Midekisa¹, Felix Holl¹, David J Savory¹, Ricardo Andrade-Pacheco¹, Peter W Gething², Adam Bennett¹, Hugh J W Sturrock¹.

Abstract

Quantifying and monitoring the spatial and temporal dynamics of global land cover is critical for better understanding many of the Earth's land surface processes. However, the lack of regularly updated, continental-scale, high spatial resolution (30 m) land cover data limits our ability to understand the spatial extent and temporal dynamics of land surface changes. Despite the free availability of high spatial resolution Landsat satellite data, continental-scale land cover mapping with these data was not feasible until now because of the high-performance computing needed to store, process, and analyze such a large volume of satellite data. In this study, we present an approach to quantify continental land cover and impervious surface changes over a long period of time (15 years) using high resolution Landsat satellite observations and the Google Earth Engine cloud computing platform. The approach applied here, which uses cloud computing to overcome the computational challenges of handling big earth observation data, can help scientists and practitioners who lack high-performance computational resources.

Year: 2017 PMID: 28953943 PMCID: PMC5617164 DOI: 10.1371/journal.pone.0184926

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

Quantifying and monitoring the spatial and temporal dynamics of global land use land cover (LULC) is essential for better understanding many of the Earth’s land surface processes. Human-induced land cover changes may affect land surface processes including urbanization, drought, and flood, which impact the world’s population [1, 2]. Understanding these changes allows trends in agriculture [3], fresh water resources [4], forest cover [5-7], and disease transmission [8, 9] to be quantified and monitored. The lack of high spatial resolution (30 m) and regularly updated LULC data limits our ability to quantify the spatial extent and monitor the temporal dynamics of these changes. Earlier attempts to generate continental-scale LULC products were limited to coarse spatial scales (250 m to 1 km), which lacked sufficient spatial detail [10-13]. There is a need for high spatial resolution (30 m) and regularly updated LULC data to improve our understanding of changes in land surface processes.

However, there are challenges in generating global to continental scale LULC maps from high spatial resolution Landsat observations that span a long period of time. These challenges have included the lack of freely available earth observation data and of high-performance computing to process and analyze these data. Since 2008, the United States Geological Survey (USGS) has been distributing Landsat data at no cost for public use. This created an opportunity for scientists and practitioners to use Landsat satellite data to map and monitor LULC at continental to global scale and over a long time series [5, 7, 14]. Since the USGS released the Landsat data archive for public use, efforts to map land use land cover using these data have increased [15-18]. However, most of these efforts have been limited to local scale analyses due to the computational requirements of analyzing large volumes of data. For example, roughly 10,000 Landsat scenes, or about 3 terabytes of data, are required to produce a global land cover map for any given time point [19].

In recent years, there has been an increase in the availability of high performance cloud computing. For example, the NASA Earth Exchange (NEX) platform allows the processing and analysis of NASA earth observation data [20]. Amazon Web Services (AWS) also now provides access to the Landsat data archive, enabling analysis of this dataset on the cloud. Google Earth Engine (GEE) is a new high-performance computing platform which gives access to a vast and growing amount of earth observation data as well as the processing power to analyze these data at planetary scale. The launch of these high-performance cloud computing platforms has opened the door to global-scale geospatial data storage, processing, and analysis in the cloud in a low-cost and efficient manner [7, 19, 21].

One of the first global scale applications of the Landsat data archive was a study by Hansen et al., which used Landsat satellite data and machine learning to map global forest cover over the 2000–2012 period [7]. Although the focus of their study was to quantify global scale changes in forest cover, theirs is the only recent effort to apply high spatial resolution (30 m) Landsat satellite observation data to global scale land cover mapping over a long period of time (i.e. 2000–2012). In our study, we present the use of Landsat satellite observation data and the GEE cloud platform to map land cover and impervious surface changes over continental Africa for 2000–2015.

Methods

2.1 Earth observation data

In this study, Landsat 7 Enhanced Thematic Mapper Plus (ETM+) surface reflectance data, computed using the Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS), were used [22]. A cloud screening algorithm based on the quality assessment (QA) bands was applied to remove snow and cloud contaminated pixels from each Landsat image between 1999 and 2016. Annual composites were then produced by taking the median value across images from the target year, plus or minus one year. For example, for the year 2000, pixel values were obtained by calculating the median of all cloud-free pixels from images between January 1st, 1999 and December 31st, 2001. A three-year window was used to ensure that at least one cloud-free pixel was available for each annual composite. In addition to the annual raw image composites, annual normalized difference vegetation index (NDVI) and normalized difference water index (NDWI) layers were computed by taking the median value over the same 3-year windows. Additionally, annual inter-calibrated night-time light images from the National Oceanic and Atmospheric Administration (NOAA) were used [23-25].
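
The compositing step described above can be illustrated with a minimal NumPy sketch. It is not the GEE implementation used in the study: the array layout, the function names `median_composite` and `normalized_difference`, and the toy reflectance values are all assumptions made for illustration. NDVI is computed with the standard (NIR - Red) / (NIR + Red) formula; the text does not state which NDWI band pair was used, so NDWI is left out.

```python
import numpy as np

def median_composite(stack, cloud_mask):
    """Per-pixel median over time, ignoring flagged observations.

    stack:      float array of shape (time, rows, cols)
    cloud_mask: bool array of the same shape, True where the QA band
                flags cloud or snow (hypothetical input)
    """
    masked = np.where(cloud_mask, np.nan, stack)
    return np.nanmedian(masked, axis=0)

def normalized_difference(a, b):
    """(a - b) / (a + b), with NaN where the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

# Toy example: three images (one per year of the window), one row, two pixels.
red = np.array([[[0.10, 0.30]], [[0.12, 0.08]], [[0.50, 0.10]]])
nir = np.array([[[0.40, 0.90]], [[0.44, 0.30]], [[0.60, 0.34]]])
cloud = np.array([[[False, True]], [[False, False]], [[True, False]]])

red_composite = median_composite(red, cloud)

# NDVI is computed per image first, then composited by the median.
ndvi_stack = normalized_difference(nir, red)
ndvi_composite = median_composite(ndvi_stack, cloud)
```

Each pixel in this toy example keeps two cloud-free observations, so the median is well defined; a pixel with no cloud-free observation in the window would come out as NaN.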

2.2 Training data

As in a number of other studies [8, 26–28], training data were derived from visual inspection of freely available high spatial resolution imagery. Recently captured (2015–2016) high-resolution satellite imagery was visually inspected and used to identify Landsat pixels that were entirely made up of one of 7 classes (water, impervious surface, high biomass, low biomass, rock, sand and bare soil) to act as training data. The impervious surface class included asphalt roads, concrete, metal roofs and other built infrastructure, while the low biomass class included crop fields, grass, and shrubland. The high biomass class consisted of dense forest. To ensure that these training data were representative of the classes across the continent, training data were captured from all 49 countries in continental Africa, with the aim of capturing 1000 training points per class. For model validation purposes, 80% of sample points (5,664) from each class were randomly selected to act as training data, with 20% of sample points (1,420) withheld as a validation dataset (Fig 1).
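
The per-class 80/20 split can be sketched as follows. This is a minimal illustration in plain Python, not the code used in the study: the function name `stratified_split`, the fixed seed, and the synthetic points are assumptions.

```python
import random
from collections import defaultdict

def stratified_split(points, train_fraction=0.8, seed=0):
    """Split (point_id, class_label) pairs into training and validation
    sets, sampling the same fraction at random within every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for point in points:
        by_class[point[1]].append(point)

    train, validation = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        cut = round(train_fraction * len(members))
        train.extend(members[:cut])
        validation.extend(members[cut:])
    return train, validation

# Synthetic example: 10 points for each of the 7 classes.
classes = ["water", "impervious surface", "high biomass", "low biomass",
           "rock", "sand", "bare soil"]
points = [(f"{label}-{i}", label) for label in classes for i in range(10)]
train, validation = stratified_split(points)
```

Splitting within each class, rather than over the pooled sample, keeps every class represented in the validation set in proportion to its share of the data.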

Fig 1

A conceptual model for the land use land cover classification model.

2.3 Modeling

For each of the training and validation data points, Landsat spectral bands, NDVI, NDWI and night-time light from the 2015 layers were extracted to be used as covariates in the model (Fig 1). To model and predict the 7 LULC classes, a random forest classification algorithm was deployed on GEE. Decision tree classification algorithms have been used widely to classify satellite data for forest cover mapping [28, 29], wetland mapping [8, 26, 30, 31], cropland mapping [32], and land cover mapping [33-36]. A random forest is a nonparametric machine learning method composed of an ensemble of decision trees [37]. The random forest algorithm creates multiple decision trees, each of which classifies a random subset of the training data according to the covariate predictors. In our study, we used 500 trees in the random forest classification. The final classification was based on the majority vote across all the trees.

To generate final LULC layers across Africa for 2000–2015, training and validation data were combined into a single dataset before the model was run, and annual predictions were made from this model using the annual covariate layers. From the annual predictions, the changes in the area represented by each class from 2000 to 2015 across continental Africa were calculated. This also allowed for an exploration of the direction of change for each LULC class from 2000 to 2015.

As well as modeling 7 classes, estimating the likelihood of a given pixel being impervious was a focus of this study. To do this, the 6 non-impervious surface classes in the training and validation data were collapsed into a single class, resulting in a binary outcome of impervious and non-impervious. A random forest model was then applied to the binary outcome. This model was used to predict the probability (i.e. the proportion of trees voting impervious) that a pixel was impervious. As for predictions of the 7 LULC classes, final predictions across Africa for 2000–2015 were made by combining training and validation data before running the model. All analyses were performed on the GEE cloud platform.
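
How the per-tree votes turn into a class label and an impervious probability can be sketched in a few lines. This is a minimal illustration rather than the GEE classifier itself; the helper names and the vote counts are hypothetical.

```python
from collections import Counter

def majority_vote(tree_votes):
    """Class predicted by the largest number of trees.
    Ties are broken by whichever label was seen first."""
    return Counter(tree_votes).most_common(1)[0][0]

def collapse_label(label):
    """Map the 7 LULC classes onto the binary outcome."""
    return "impervious" if label == "impervious surface" else "non-impervious"

def vote_fraction(tree_votes, target="impervious"):
    """Proportion of trees voting for the target class."""
    return sum(vote == target for vote in tree_votes) / len(tree_votes)

# Hypothetical votes from a 500-tree, 7-class forest for one pixel.
multiclass_votes = (["impervious surface"] * 320
                    + ["bare soil"] * 150
                    + ["rock"] * 30)
predicted_class = majority_vote(multiclass_votes)

# Hypothetical votes from the separate binary forest for the same pixel.
binary_votes = ["impervious"] * 320 + ["non-impervious"] * 180
impervious_probability = vote_fraction(binary_votes)
```

In the study the binary model is trained separately on the collapsed labels, so its votes need not match a collapse of the 7-class votes; the two vote lists above are independent examples.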

2.4 Model validation

Using the validation data, a confusion matrix was generated to evaluate predictive accuracy across classes as well as overall accuracy. This allowed an assessment of user’s accuracy (the number of correctly classified pixels divided by the total number of pixels predicted within that class) and producer’s accuracy (the number of correctly classified pixels divided by the total number of pixels truly in that class) for each class. For the model focused on impervious surface, predictive accuracy was assessed by calculating both a confusion matrix and the area under the curve (AUC) statistic. The AUC statistic represents the probability that a randomly selected truly impervious pixel will be ranked higher by the model than a randomly selected truly non-impervious pixel. As such, it provides a measurement of the discriminatory power of the model.

In addition to validating the model using out of sample predictions, we compared our predictions to maps of percent tree cover generated by Hansen et al. [7]. This dataset was chosen because it has a comparable spatial (30 m) and temporal (annual) resolution. As the Hansen study was focused on percent tree cover, rather than LULC, this comparison focused on forest cover only. To compare results, we first classified the Hansen product for 2000 into high biomass (greater than 30% tree cover) and non-high biomass (less than 30% tree cover). This threshold was chosen because it is the closest to our high biomass class. We then compared our high biomass class predictions for 2000 to the high biomass class derived from the Hansen data.

Furthermore, we compared our 2010 prediction maps with the 2010 Global Land Cover product (GlobeLand30) at 30 meter spatial resolution, which was generated using Landsat data [38]. The GlobeLand30 product consists of 10 classes: water bodies, wetland, built-up, cropland, snow, forest, shrubland, grassland, bareland, and tundra. We compared our 2010 land cover product with the 2010 GlobeLand30 product because GlobeLand30 is available only for the years 2000 and 2010.
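
The ranking interpretation of the AUC given above can be computed directly by comparing every impervious pixel with every non-impervious pixel. The sketch below assumes small lists of predicted probabilities; the function name `auc_by_ranking` and the scores are hypothetical, and counting ties as half a win is a common convention rather than something stated in the text.

```python
def auc_by_ranking(scores_positive, scores_negative):
    """Fraction of (positive, negative) pairs in which the positive
    case receives the higher score; ties count as half."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))

# Hypothetical predicted probabilities of being impervious.
impervious_scores = [0.9, 0.8, 0.4]
non_impervious_scores = [0.7, 0.3, 0.2, 0.1]
auc = auc_by_ranking(impervious_scores, non_impervious_scores)
```

Here 11 of the 12 pairs are ranked correctly, so the AUC is 11/12. The pairwise loop is quadratic in the number of pixels and is meant only to make the definition concrete.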

Results

3.1 Land use land cover classification

A total of 7,084 data points were captured across the 7 classes, resulting in 5,664 training data points and 1,420 validation data points. Table 1 shows the confusion matrix of observed versus predicted values using the withheld validation data. The model achieved an overall accuracy of 88 percent (Table 1). Class specific user’s and producer’s accuracies ranged from 84–94% and 79–96% respectively. Water appeared to be most accurately predicted, with user’s and producer’s accuracy of 94 and 96% respectively, while low biomass was least accurately predicted with user’s and producer’s accuracies of 84 and 79% respectively.

Table 1

Accuracy assessment based on comparison of model predictions (left) against observed validation data (top) for 2015.

	Impervious Surface	Low Biomass	High Biomass	Bare Soil	Sand	Rock	Water	Total	User’s Accuracy (%)
Impervious Surface	186	1	0	19	3	1	0	210	89
Low Biomass	2	198	14	11	2	7	3	237	84
High Biomass	0	31	209	0	0	1	0	241	87
Bare Soil	10	12	2	172	2	3	1	202	85
Sand	3	2	0	2	149	10	2	168	89
Rock	2	4	1	0	4	131	2	144	91
Water	0	2	1	2	6	2	205	218	94
Total	203	250	227	206	166	155	213	1420
Producer’s Accuracy (%)	92	79	92	83	90	85	96
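
The accuracy figures in Table 1 follow directly from the confusion matrix counts. As a check, the sketch below recomputes user's, producer's and overall accuracy in plain Python, with rows as model predictions and columns as observed validation data; the variable and function names are assumptions made for illustration.

```python
classes = ["Impervious Surface", "Low Biomass", "High Biomass",
           "Bare Soil", "Sand", "Rock", "Water"]

# Rows: predicted class. Columns: observed class (same order as `classes`).
matrix = [
    [186,   1,   0,  19,   3,   1,   0],
    [  2, 198,  14,  11,   2,   7,   3],
    [  0,  31, 209,   0,   0,   1,   0],
    [ 10,  12,   2, 172,   2,   3,   1],
    [  3,   2,   0,   2, 149,  10,   2],
    [  2,   4,   1,   0,   4, 131,   2],
    [  0,   2,   1,   2,   6,   2, 205],
]

def accuracy_summary(matrix):
    """Return (user's, producer's, overall) accuracy as fractions."""
    n = len(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    diagonal = [matrix[i][i] for i in range(n)]
    users = [d / t for d, t in zip(diagonal, row_totals)]
    producers = [d / t for d, t in zip(diagonal, col_totals)]
    overall = sum(diagonal) / sum(row_totals)
    return users, producers, overall

users, producers, overall = accuracy_summary(matrix)
users_pct = [round(100 * u) for u in users]
producers_pct = [round(100 * p) for p in producers]
overall_pct = round(100 * overall)
```

The rounded values reproduce the user's and producer's accuracy entries of Table 1, and the overall accuracy of 1,250 correct out of 1,420 validation points rounds to the reported 88%.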

Annual predicted LULC maps consisting of seven classes (impervious surface, low biomass, high biomass, bare soil, sand, rock, and water) were generated for the 2000–2015 period using the random forest classification model. Comparison of the high biomass class with the Hansen forest map for 2000 showed good agreement. About 81% of the total high biomass predicted by our model (4,174,958 km² out of 5,156,559 km²) matched the Hansen high biomass class (percent tree cover greater than 30%), as shown in Fig 2. Generally, there was good agreement between products in the Congo Basin and western Africa, whereas there were mismatches in the East African highlands and Nile Delta regions, with our model predicting more high biomass in these regions. For the 3 sites selected for comparison purposes (Fig 3), a total of 1,465 km² were classified by GlobeLand30 as built-up, while our model predicted a total of 926 km² as impervious surface for 2010. Additionally, a total of 351,036 km² were classified as forest by GlobeLand30, whereas 281,248 km² were classified as high biomass for 2010 by our model. Fig 3 shows a visual comparison of our land cover prediction maps with the GlobeLand30 product over three sites: the Lake Victoria, Congo Basin and Nile Basin regions. Fig 4 shows the predicted LULC product for the years 2000 and 2015. As shown in Fig 4, there was a reduction in high biomass over the 15-year period throughout continental Africa, with high biomass areas becoming more concentrated around the equatorial belt. Additionally, Fig 5 shows regional subsets of the 2015 land use land cover map, including the Rift Valley Lakes, Congo Basin, Nile Delta and Lake Victoria.

Fig 2

Comparison of predicted high biomass class and Hansen forest cover for 2000.

Fig 3

Predicted land cover maps from this study (left) and GlobeLand30 products (right) for the year 2010.

(A) Lake Victoria; (B) Congo Basin; (C) Nile Basin.

Fig 4

Model predicted land use land cover maps for 2000 and 2015 over Africa.

Fig 5

Regional subsets of 2015 land use land cover maps.

(A) Rift Valley Lakes; (B) Congo Basin; (C) Nile Delta; (D) Lake Victoria.

With regard to the second model focused on impervious surface, an assessment using the validation data indicated that the model had very high predictive capacity, with an AUC value of 0.98. Fig 6 shows regional subsets of 2015 probability of being impervious including the Rift Valley Lakes, Congo Basin, Nile Delta and Lake Victoria areas.

Fig 6

Regional subsets of changes in the probability of impervious surface between 2000–2015.

(A) Addis Ababa, Ethiopia; (B) Cairo, Egypt; (C) Lagos, Nigeria.

Change analysis showed that the impervious surface class had the highest relative increase from 2000 to 2015 (38.54%), while high biomass and rock had the largest decreases, each of around 17% (Fig 7 and Table 2). There was relatively little change in the areas covered by water or sand. An exploration of annual change in the probability of being impervious suggests that changes over the study period were in both directions, as shown in four selected regions between 2000 and 2015 (Fig 6). High biomass and rock showed a declining trend, whereas impervious surface, low biomass, and bare soil showed an increasing trend.

Fig 7

Forest cover change between 2000 and 2015.

Table 2

Land use land cover change between 2000 and 2015.

LULC Class	2000 (km²)	2015 (km²)	Total Change (km²)	Total Change (%)
Impervious Surface	39,436	54,634	15,199	38.5
Low Biomass	9,060,578	10,181,331	1,120,752	12.4
High Biomass	5,156,559	4,261,541	-895,018	-17.4
Bare Soil	681,620	797,121	115,501	16.9
Sand	11,037,403	11,603,370	565,967	5.1
Rock	5,417,149	4,496,410	-920,739	-17.0
Water	306,802	305,140	-1,662	-0.5
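
The percentage changes in Table 2 are relative to the 2000 area of each class. The sketch below recomputes them from the 2000 and 2015 areas in plain Python; the dictionary and function names are assumptions made for illustration.

```python
# Area per class in km², taken from Table 2: (2000, 2015).
areas_km2 = {
    "Impervious Surface": (39_436, 54_634),
    "Low Biomass": (9_060_578, 10_181_331),
    "High Biomass": (5_156_559, 4_261_541),
    "Bare Soil": (681_620, 797_121),
    "Sand": (11_037_403, 11_603_370),
    "Rock": (5_417_149, 4_496_410),
    "Water": (306_802, 305_140),
}

def percent_change(area_start, area_end):
    """Relative change in percent, using the start year as the baseline."""
    return 100.0 * (area_end - area_start) / area_start

changes_pct = {
    name: round(percent_change(a2000, a2015), 1)
    for name, (a2000, a2015) in areas_km2.items()
}
```

Rounded to one decimal place, the results match the Total Change (%) column. The recomputed absolute differences for impervious surface and low biomass are 1 km² off the published totals, presumably because the published areas were rounded before printing.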

Growth in the impervious surface was mainly due to conversion from low biomass class to impervious surface. As shown in Figs 8 and 9, sand and water classes were the most stable and did not show substantial change. This study also showed that the decrease in high biomass class was predominantly due to high biomass becoming low biomass (Fig 9).

Fig 8

Percentage change in land use land cover classes over Africa 2000–2015.

Fig 9

Change of each land use land cover class from 2000 to 2015.

Discussion

Regularly updated, high resolution, continental-scale land cover data are essential for quantifying and monitoring the Earth’s dynamic land surface processes. Despite the free availability of high spatial resolution Landsat satellite data, generating continental to global scale LULC maps has been challenging due to the need for high-performance computing to store, process and analyze such a large volume of satellite data. As such, earlier efforts to produce land cover products at this scale have been limited to coarse spatial resolution. Platforms such as GEE have opened the door to planetary scale analyses and offer a mechanism to continually monitor the Earth’s surface at high spatial and temporal resolution. The utility of GEE to quantify various land surface changes has been demonstrated for forest mapping [7], population mapping [39], cropland mapping [40], and surface water mapping [21]. Here we presented an approach to quantify continental land cover change over a long period of time (15 years) using GEE and Landsat satellite observations.

In the present study, we produced annual LULC maps for continental Africa between 2000 and 2015, showing that the continent has changed dramatically during the study period. Our results indicated that the area covered by the high biomass class declined during the study period. These findings are in concordance with Hansen et al., who showed dramatic forest loss between 2000 and 2012 in subtropical Africa [7]. In an effort to compare our product with existing data, we made comparisons with the Hansen data. Although the two are not the same product, a comparative assessment of our high biomass class and the high biomass class generated from the Hansen product (greater than 30% tree cover) for 2000 showed about 80% agreement. Overall, our product showed very good agreement with the Hansen data. Additionally, comparison of our product with GlobeLand30 showed good agreement. However, some of our land cover classes differed from those of the GlobeLand30 product. For example, our low biomass class was represented by three separate classes (wetland, cropland, and grassland) in the GlobeLand30 data. As a result, we restricted the comparison to our high biomass class and their forest class, and to our impervious surface class and their built-up class.

Furthermore, there also appeared to be a large relative increase in man-made impervious surfaces during the 2000–2015 period. These changes in impervious surface were mostly the conversion of low biomass areas to impervious surfaces. This increase occurred steadily throughout the period of study, which appears in line with increasing urban growth in Africa [41].

Multi-temporal satellite observations and cloud-based analytic platforms such as GEE could enable cost-effective and efficient production of LULC products. The main advantages of using GEE include reduced processing times and enhanced capacity to perform global scale analysis on high resolution data. There are, however, challenges to using GEE. These include the need to master the JavaScript or Python programming languages, the limited number of GEE built-in functions, and the lack of integration of the current GEE platform with other open source geospatial analytic tools such as R and QGIS. Although GEE archives a large library of earth observation and geospatial data, data are not available in real time, which may limit its utility for operational applications that need real-time data access.

While this study suggests that the approach taken here can be used to produce continental LULC products, it has a number of limitations. Firstly, training and validation data were only available for 2015. Having more data points throughout the time period would likely improve the accuracy of the annual maps. This would also allow a better assessment of predictive performance through time. That said, as the spectral signature of these 7 classes is unlikely to have changed through the time period, predictions are likely to be robust. Secondly, the study relied on visual inspection of high-resolution imagery to produce training data; ground truth data from the field were not available. This may have resulted in some misclassification. Additionally, relying on a single image to represent a 1 year period ignores the fact that some training data points may move between classes seasonally (e.g. bare soil in the dry season and low biomass in the wet season). Thirdly, we restricted our analysis to 7 LULC classes, which may limit the utility of the current LULC product for some applications. With accurate training data on a wider variety of LULC classes, it may be possible to use the approach described here to produce maps with more than the 7 classes used here.

Despite these limitations, the standardized approaches demonstrated here and the model validation results indicated that the LULC maps presented in this research had high prediction accuracy. The approach used here to overcome the computational challenges of handling big earth observation data can help scientists and practitioners who lack high-performance computational resources. Future studies can expand on our study and apply our approach to generate global-scale land cover products. The LULC product presented in this study will be freely available (https://geodata.globalhealthapp.net/lulc/) for public use and can be used in other applications to monitor changes in disease transmission, natural resources, biodiversity, and urbanization.

References

1. Modelling spatial patterns of urban growth in Africa.

Authors: Catherine Linard; Andrew J Tatem; Marius Gilbert
Journal: Appl Geogr Date: 2013-10

2. The emergence of land change science for global environmental change and sustainability.

Authors: B L Turner; Eric F Lambin; Anette Reenberg
Journal: Proc Natl Acad Sci U S A Date: 2007-12-19 Impact factor: 11.205

3. High-resolution global maps of 21st-century forest cover change.

Authors: M C Hansen; P V Potapov; R Moore; M Hancher; S A Turubanova; A Tyukavina; D Thau; S V Stehman; S J Goetz; T R Loveland; A Kommareddy; A Egorov; L Chini; C O Justice; J R G Townshend
Journal: Science Date: 2013-11-15 Impact factor: 47.728

4. Multisensor earth observations to characterize wetlands and malaria epidemiology in Ethiopia.

Authors: Alemayehu Midekisa; Gabriel B Senay; Michael C Wimberly
Journal: Water Resour Res Date: 2014-11-17 Impact factor: 5.240

5. Night-Time Light Data: A Good Proxy Measure for Economic Activity?

Authors: Charlotta Mellander; José Lobo; Kevin Stolarick; Zara Matheson
Journal: PLoS One Date: 2015-10-23 Impact factor: 3.240

6. Using remotely sensed night-time light as a proxy for poverty in Africa.

Authors: Abdisalan M Noor; Victor A Alegana; Peter W Gething; Andrew J Tatem; Robert W Snow
Journal: Popul Health Metr Date: 2008-10-21
