Literature DB >> 31720462

Evaluating the conservation state of the páramo ecosystem: An object-based image analysis and CART algorithm approach for central Ecuador.

Víctor J García^1,2, Carmen O Márquez^3,4, Tom M Isenhart⁵, Marco Rodríguez³, Santiago D Crespo³, Alexis G Cifuentes³.

Abstract

Ecuadorian páramo ecosystems (EPEs) function as water sources, contain large soil carbon stores and high levels of biodiversity, and support human populations. The EPEs are mainly herbaceous páramo (HP). To inform policy and management and help drive ecological science toward a better understanding of the HP ecosystem, and the relationships among its multiple ecosystem services, we asked: (1) What is the state of the HP regarding its land use/land cover (LULC)?; and (2) Is the HP being pushed away from its natural state or it is regenerating? To answer these questions, we assessed the LULC in central EPEs using Landsat 8 imagery, Object-Based Image Analysis (OBIA) and a Classification and Regression Trees (CART) algorithm. Results show that two-fifths of the paramo ecosystem remain as native HP (NHP) and two-fifths as anthropogenic HP (AHP). Although the anthropic alteration of the pedogenesis of young paramo soil leads to the establishment of AHP, we found evidence of regeneration and resilience of the NHP. The results of this study will be useful to scientists and decision-makers with interest in páramo ecosystems in central Ecuador. The proposed methodology is simple, fast, and could be implemented in other landscapes to establish comprehensive monitoring systems useful in landscape assessment and planning.

Entities: Chemical Disease Gene Species

Keywords: Classifier decision tree; Ecology; Ecosystem change; Environmental analysis; Environmental assessment; Environmental impact assessment; Environmental science; Herbaceous páramo ecosystem; Human geography; Land use; Nature conservation; Páramo ecosystem; Páramo resilience

Year: 2019 PMID： 31720462 PMCID： PMC6838926 DOI： 10.1016/j.heliyon.2019.e02701

Source DB: PubMed Journal: Heliyon ISSN： 2405-8440

Introduction

Páramos are neotropical high-altitude mountain ecosystems dominant between 3000 and 4700 masl (above the upper forest line) and cover a surface area larger than 7,500,000 ha extending from Costa Rica to the northern Andes of Colombia, Ecuador, Venezuela, and Perú [1, 2, 3]. Ecuadorian páramo ecosystems (EPEs) cover a surface area of 1,833,834 ha, about 5 % of the national territory [2], and extend from the northern boundary with Colombia to the southern border with Perú, through the Andean corridor, and extending 600 km down across the spine of the Andes (about 180 km wide and 600 km length). EPEs function as important water sources with large soil carbon stores and high levels of biodiversity and have long supported human populations. EPEs are home to approximately 628 endemic plant species, which is equivalent to 15% of all of the endemic flora and 4% of the total flora of the country. Seventy five percent of the endemic species are threatened, and 48% are in protected natural areas [4]. EPEs can yield 506–933 mm of water per year [1], equivalent to 2/3 of the annual rainfall within the region. Thus, EPEs supply domestic water for many of the major cities in Ecuador. Due to a high retention of soil organic matter, EPEs topsoil can store up to 143 tons C/ha [5], representing a vital global carbon reservoir. Although volcanic ash soil covers approximately 0.85% of the world's land area, they contain approximately 5% of the global soil carbon [6], roughly 12 times more of the amount stored under the same land area of the world's forest soil. Changes in land use/land cover (LULC), such as conversion to agriculture and clearing for timber plantations and pasture, threaten the EPEs via soil degradation and climate change. Rising temperatures and rainfall pattern changes directly and indirectly influence soil evolution (pedogenesis) of young Andosols soils, altering carbon storage capacity and carbon release from humified soil organic matter (SOM) [7]. Thus, the effects of changes in climate and LU on SOM in volcanic ash soil are substantial [8]. Similarly, complications persist in the mapping of extensive areas in the páramos due to abrupt changes in humidity, altitude, temperature, and topography. The conservation of ecosystem services provided by EPEs in the face of environmental and anthropic changes has become a key priority. However, when determining conservation strategies, it is critically important to be able to identify when the ecosystem is about cross a threshold from a desirable to an undesirable stable state with a loss of resilience [9]. Thus, knowledge about the páramo ecosystem resilience has direct implications for the páramo ecosystem management, where LULC is a huge issue. Although many technological tools combine artificial intelligent algorithms and satellite imaginary for LULC monitoring, there are few examples that extend the use of these technologies to assess system resilience, such as within EPEs. As the páramos are sensitive to disturbances, a better understanding of natural responses of the EPEs to LULC and climate change could have significant implications for land management decisions [3], as well as for global climate change, adaptation and mitigation. Changes in LULC are pervasive, rapid, and can have a significant impacts on humans, the economy, and the environment [10]. LULC changes contribute to páramo ecosystem fragmentation and impact climate and weather conditions from local to global scales, reduce biological diversity, increase soil erosion, and alter other important ecosystem services, leading to the disruption of socio-cultural practices, and increasing natural disasters [11]. Accurate LULC mapping is of paramount importance for continuous monitoring of the changes and determining biotic and abiotic factors contributing to resilient páramo ecosystems [9]. Sustainable development of the EPE will falter without data. Therefore, there is a need for the establishment of comprehensive monitoring systems that utilize remote sensing data to assess changes in LULC and combine this information with knowledge of EPE function to inform spatial planning and achieve long-term conservation goals [12]. In this study, a classification based on objects is proposed as a viable alternative in the complex Andean landscapes. Geographic Object-Based Image Analysis (OBIA) refers to an approach that studies geographic entities through delineating and analyzing image-objects rather than individual pixels. Image-objects are visually perceptible ‘objects’ within the image, typically represented by clusters of similar neighboring pixels that share a common meaning [13]. OBIA not only overcomes the “salt-and-pepper” effect associated with other techniques, but also it adds additional information on spectra, geometry, context, and texture that may be useful in classification. For these reasons, OBIA has been used widely in image-based land cover classification [14]. To inform policy and management and foster a better understanding of the herbaceous páramo (HP) ecosystem service provisioning we asked: (1) What is the state of the HP regarding its LULC?; and (2) Is the HP being pushed away from its natural state or it is regenerating? To achieve these objectives, we used the OBIA framework, CART algorithm, and Landsat 8 images to analyze the state of the páramo ecosystems and generate an up-to-date database by mapping LULC in central EPE. We performed data mining of LULC objects and developed a classification decision tree (CDT) with six important predictive variables with the lowest misclassification rate. The run of the complete data set through the CDT model written in a PHYTON code, led to the distribution of the different LULC in the study area. The proposed methodology is simple, fast, and could be implemented in other landscapes to establish competent comprehensive monitoring systems useful in landscape assessment and planning.

Materials and methods

Study area

The study area is located in the province of Chimborazo (Central Ecuador). Chimborazo is situated 135 km south of Quito and extends between the coordinates: 78° 39′ W and 1° 39′ S, defined between the UTM coordinates South Zone 17, Datum WGS84 (X = 694531; Y = 9840051) and (X = 789975; Y = 9714929) (Fig. 1). The province of Chimborazo has an approximate surface area of 649,970 ha, of which 273,660 ha (42.1 %) belongs to the páramo ecosystem (Fig. 1). Chimborazo has irregular topography that is in an altitudinal range of 143 to 6,263 masl. Soils are mostly Andosols. The climate has a daily seasonality that can be expressed as “winter every night and summer every day,” with notable temperature variations from 2 °C and 20 °C according to the isotherms of the National Institute of Meteorology and Hydrology of Ecuador [15]. However, the climate is generally cold, which contrasts with the high UV radiation that is felt mainly at noon when the sky is clear. Usually, there is a bimodality in terms of rainfall, with peaks of rain around April and greater dryness around July–August. For this study, we segmented the area into 81 regions of 10 × 10 km2 (10,000 ha) each one and processed each region individually (Fig. 1).

Fig. 1

Study area.

The herbaceous páramo

The Ministry of Environment of Ecuador (MEE) defines EPEs according to vegetation types, precipitation, and soils. Although one can find wetland, forest, water body, and snow, the páramo ecosystem in the province of Chimborazo is mainly of the type of HP. Within the herbaceous class groups, the HP covers the largest area (more of 90 %) of mountain ecosystems located in glacial valleys, slopes and subglacial plains, and its soils are rich in organic matter. The humid herbaceous upper montane páramo that is in volcanic enclaves presents relatively low humidity and soils with a low concentration of organic carbon. The floodplain herbaceous páramo has a positive water balance and soils have anaerobic conditions that inhibit organic matter decomposition (organic carbon up to 50%). Finally, the subnival humid herbaceous páramo located in periglacial slopes in shallow entisoles soils and exhibits low retention capacity or water regulation [16].

Workflow

The LULC classification methodology had 5 major stages: image preprocessing, OBIA framework implementation, attribute extraction, data mining, and classification and analysis (Fig. 2).

Fig. 2

Flowchart depicting the essential stages in this classification scheme.

Flowchart depicting the essential stages in this classification scheme. Image preprocessing included downloading Landsat 8 images, pan-sharpening, and radiance and reflectance correction. The OBIA framework implementation included image segmentation and object generation. Attribute extraction comprised the extraction of the maximum, average, and minimum values of the reflectance, spectral indices and ancillary data for each object. Data mining of the classification scheme involved sampling 466 random objects and the extraction of its attribute values; finding the CDT and evaluation of its performance; and evaluating the classifier learnability of the sample and the classifier validation to obtain an estimate of classifier performances. Through sample mining, we performed feature selection to rank the importance of features (predictor variables). The classification and analysis include a phyton code generation for the classification of each object in the categories under study.

Landsat-8 OLI images

The study of the total area was completed using two images captured by the Landsat 8 satellite. The scenes of the studied area were captured on 2016/11/20 with the following tags: (1) LC80100612016325LGN01 and LC80100622016325LGN01; (2) Time of the year: summer; (3) Path/Row: 010/061 and 010/062; (4) Sun Azimuth (degrees): 130.33674 and 128.09132; (5) Sun Elevation (degrees): 60.56479 and 61.31854; (6) Scene center time: 15:26:38 and 15:27:15; (7) Cloud cover: 19% and 25%; (8) Sensors: OLI_TIRS; and, (9) Datum – projection: WGS 84 - UTM ZONE 17s. Although the United States Geological Survey provides Landsat-8 images which have a level of preprocessing L1T, these images have a systematic radiometric and geometric correction by incorporating Ground Control Points (GCP) [17], and are also ortho-rectified through the Digital Elevation Model (DEM). The geometric accuracy of the Landsat 8 images was verified by using the topographic charts and base cartography of rivers and roads, at a scale of 1:50000, georeferenced in the UTM Datum WGS84 projection, of the Ecuadorian Military Geographical Institute, [18].

OBIA framework

The image of each scene was processed in ArcGIS following the workflow suggested by Urbanski [19] for converting Landsat-8 OLI images into a land cover map using OBIA classification: Create a pan-sharpened composite from Landsat 8 data and pan-sharpened 15 × 15 m channels with preserved DN values. For pan-sharpening, the smoothing-filter based intensity modulation technique is used [20]. DN = Digital number (quantized and calibrated standard product pixel values). Combined multispectral band: Band 4 (red), Band 3 (green), Band 2 (blue), and Band 8 (panchromatic). Tool: Pan-sharpened composite (Lansat 8). Using pan-sharpened DN channels created channels with atmospherically corrected radiance at the Earth surface (Learth). The DN values are converted to radiance using radiometric rescaling coefficients from Landsat 8 MTL metadata file. Top of the Atmosphere (TOA) spectral radiance is calculated using band-specific multiplicative (M) and additive (A) rescaling factors. Atmospheric correction is performed using dark object subtraction method assuming one percent minimum reflectance. The one percent deducted from minimum radiation is calculated from formula where: cos θ = cosine of the solar zenith angle in degrees, ESUN = Mean solar exo-atmospheric irradiances, d = Earth-Sun distance in astronomical units [19]. Tool: Radiance atmospheric corrected (Lansat 8). Using pan-sharpened DN channels created channels with atmospherically corrected reflectance at the Earth surface. Calculation of spectral reflectance starts with the conversion of DN to TOA reflectance. Then the one percent minimum reflectance is estimated, and the reflectance at the Earth surface is calculated. The Lowest Valid Value is determined using the absolute minimum value in the band [19]. Tool: Reflectance atmospheric corrected (Lansat8). From reflectance, images perform segmentation, creating a polygon layer of segments. Partition the scene into a set of discrete regions or objects that are internally more uniform than when compared to their neighboring object. The segmentation is performed using the hybrid linkage region growing algorithm, which works in 2 steps. In the first step, multispectral slopes are calculated and converted to the edge map using an adequate threshold. This edge raster map is thinning by extraction of pixels with local “slope maxima” [21]. We selected the segmentation scale after a series of interactive “trial and error” test and visual validation [22]. Tool: Segmentation (Lansat8). Descriptive attributes extraction. Extract from raster pixels of objects and assign to each segment their statistics (mean, standard deviation, maximum, minimum) [19]. Tool: Extraction from raster (OBIA).

Ground truthing and known LULC

LU is characterized by the arrangements, activities, and inputs people undertake in a specific land cover to produce, change, or maintain it. Land cover (LC) is the observed biophysical cover on the Earth's surface [23]. For this study, the LULC data were divided into five classes: NHP, AHP, forest (FRS), water bodies (WTR), and snow (SNW). Ground truthing was conducted to provide field data to validate LULC class. . Data was collected at 466 locations (objects) randomly located in the study area and geographic coordinates of LULC object were recorded using a hand-held global positioning system device. The LULC classes based on visual observation of the imagery are shown in Fig. 3. Training objects for each class were randomly selected within the study area using well-known class locations.

Fig. 3

Typical characteristics of the LULC class identified from field testing and recognized in Landsat 8 imagery. (a) and (b) Water (WRT); Watercourse and water bodies. River, small lakes, and reservoirs. The water class includes the lacustrine system and adjacent areas with high flood susceptibility. (c) and (d) Forest (FRS) dominated by coniferous plants, mostly Pinus spp. Mixed vegetation which has a scattered distribution, mostly shrubs. Forest with close and open canopies more than 12 m height. (e) and (f) Native herbaceous paramo (NHP) ecosystem dominated by Calamagrostis spp. (g) and (h) Anthropogenic herbaceous paramo (AHP) ecosystem that has been disturbed or transformed by human LU. Crop field, pasture, bare soil, built-up areas/infrastructure, roads, burned areas, and regenerated herbaceous páramo with species association and altered phenology. (i) and (j) Snow (SNW) class corresponds to areas with an altitude higher than 3827 masl in which there are precipitation deposits of small ice crystals, with geometric shapes that are grouped in flakes resulting in highest reflectance values.

Object-based features (predictor variables)

The object-based features used in this study to complete the OBIA classification were retrieved from the Landsat 8 bands and the DEM. The mean of the ground reflectance values at Bottom-of-Atmosphere for all the pixels within an object were labeled as Reflectance basic spectral information. We tested 18 candidate features, including the Reflectance basic spectral information, 12 spectral vegetation indices (SVIs) derived from multispectral bands, and five indices derived from DEM (Table 1). The band algebra was used through the Math Band tool, from ENVI, to generate the indices detailed in Table 2 according to their corresponding Landsat 8 bands.

Table 1

Features used for classification.

Object features	Description
Reflectance basic spectral information (Landsat 8 bands B2, B3, B4, and B8).	The average of the ground REFLECTANCE values at Bottom-of-Atmosphere for all the pixel within an object.
Spectral vegetation indices derived from Landsat 8 bands B2, B3, B4, B5, B6, and B7.	EVI2, WDRVI, SAVI, NDVI, NDWI, VARIg, NDSI, BI, NDMI, NBR, NBR2, and NDBI.
Topographic indices derived from DEM.	ALTITUDE, SLOPE, CURVATURE, TWI (Topographic Wetness Index is an indicator of the effect of local topography on runoff flow direction and accumulation), and ASPECT.

Table 2

Spectral vegetation Indices derived from Landsat 8 OLI bands.

Spectral Vegetation Indices	Formulation	Description
NDVI: Normalized Difference Vegetation Index	(NIR-R)/(NIR+R)	The higher the index, the higher the chlorophyll content [28]
SAVI: Soil-Adjusted Vegetation Index	(1+L)(NIR-R)(NIR+R+L) : L = 0.05	Accounts for the effect of soil reflectance [29]
NDWI: Normalized Difference Water Index	(G-SWIR1)/(G+SWIR1)	Used mapping water [30]
WDRVI: Wide Dynamic Range Vegetation Index	(0.05×NIR-R)/(0.05×NIR+R)	More sensitive to leaf area index. Enable a robust characterization of crop physiological and phenological characteristic [31]
VARIg: Visible Atmospherically Resistant Vegetation Index green	(G-R)/(G+R-B)	Lineally sensitive to vegetation fraction, it exhibits a good correlation with nitrogen contents [32]
EVI2: Enhanced Vegetation Index 2	2.5×(NIR-R)(NIR+2.4×R+1)	Provide greater sensitivity in regions with high biomass while minimizing the influence of the soil and the atmosphere [33].
NDSI: Normalized Difference Snow Index	(G-NIR)/(G+NIR)	It is useful for snow mapping [34]
BI: Bare Soil Index	[(SWIR1+R)−(NIR+B)][(SWIR1+R)+(NIR+B)]	The difference amount bare soil areas, lands, vegetation, is marked using the BI [35]
NDMI: Normalized Difference Moisture Index	(NIR-SWIR1)/(NIR+SWIR1)	NDMI is for the detection of vegetation water content [36]
NBR: Normalized Burn Ratio	(NIR-SWIR2)/(NIR+SWIR2)	NBR allows discrimination of burned and unburned areas [37]
NBR2: Normalized Burn Ratio 2	(SWIR1-SWIR2)/(SWIR1+SWIR2)	NBR2 is useful for postfire recovery assessment [38].
NDBI: Normalized Difference Built-up Index	(SWIR1-NIR)/(SWIR1+NIR)	Mapping built-up areas [39]

Features used for classification. Spectral vegetation Indices derived from Landsat 8 OLI bands. The SVIs were used as predictor variables as opposed to the image band because the SVIs carry information about the spectral signature of the vegetative material, which image band by itself does not. The SVIs derived from information provided by remote sensors is the primary source of information for monitoring and assessing the vegetation cover of the land surface [24, 25].

Data processing

The CART algorithm available in the SPM®8 software suite from Salford Systems San Diego, USA was used for data mining. CART is a non-parametric, rule-based machine learning method that attempts to inductively discover the relationship between input attributes (predictor variables) and a target attribute. The method divides the predictor variable into different regions so that the target variable can be predicted more accurately. The discovered relationship is represented in a structure referred to as a decision tree (DT). The DT describes a hidden pattern in the dataset and which can be used for classification and to differentiate objects of LULC within the area of study by mapping the input space into a predefined class. Also, CART automatically chooses the best variables and split point by reducing the squared of the absolute error criterion. CART automatically handles variable selection, variable interaction modeling, nonlinear relationship modeling, missing values, outliers, and it is not affected by the monotonic transformation of variables [26, 27]. A DT is a non-parametric rule-based hierarchical model of decisions and their consequences. Thus, a DT is a graph that uses nodes and branch connecting nodes to illustrate a course of action and various outcome [26]. The critical issues on a DT are: The node at the top of the tree is called the root node (a node with a no incoming branch). A node that has no sub-branch is terminal. Also known as “leaves” or “decision” nodes. A node with an outgoing branch is referred to as an “internal node.” In a DT, each internal node considers a single attribute (predictive variable) and splits this according to the value of the attribute (range). Each split divides one node into two subsets (a binary split). Pruning involves removing a subset node from a DT. Each terminal node is assigned to one class representing the most appropriate target value. Alternatively, each terminal node may hold a probability vector indicating the probability of the target attribute has a particular value. The CART algorithm automatically grows a large tree. Also, the CART algorithm automatically prunes the large tree upward, and uses either a test sample or cross-validation to prune subtrees. The final tree results in the lowest error. All variables are considered for each split, but not all variables are necessarily used. Only variables that maximize the splitting criterion at any given split are used (the important variables). CART build up DTs with only six important predictor variables at a time, and the number of nodes ranges from 5 to 27. We chose the DT that led to the smallest error with the small number of nodes. CART randomly used 80 % (373) of the ground truth data for learning and 20 % (93) for validation and made up a confusion matrix for learning and validation. For the sake of clarity and completeness, a binary cross-tabulation was performed to assess the DT performance categorizing each class in agreement with the equation in Table 3.

Table 3

Systematic notation in a confusion matrix, the binary cross-tabulation for class “i” and indices used to assess the performance of the decision tree categorizing class “i.”

Confusion matrix	Actual Class	Predicted Classes
		Class 1	..	Class j	..	Class n
	Class 1	P11	..	P1j	..	P1n
	..	..	..	..	..	..
	Class i	Pi1	..	Pij	..	Pin
	..	..	..	..	..	..
	Class n	Pn1	..	Pnj	..	Pnn
Binary cross-tabulation for class “i.”		Predicted positive		Predicted negative
	Positive example	TPi=Pi,j=i		FPi=∑j=1nPi,j≠i
	Negative example	FNi=∑i=1nPi≠j,j		TNi=N−TPi−FPi−FNi
	TP = True positive; FP = False positive; FN = False negative; TN = True negative; n = total number of classes; N = Total number of objects.
Sensitivity (Recall)	=TPiTPi+FNi		Represents the likelihood that an object belongs to a class “i” and the classifier accurately assigns it to class “i” (User accuracy - UA).
Specificity	=TNiTNi+FPi		Measures how well the classifier can recognize negative samples. The proportion of real negative cases that are correctly predicted negative.
Precision	=TPiTPi+FPi		Expresses the likelihood of a class “i” being properly recognized (Producer accuracy - PA).
Accuracy	=TPi+TNiN		Expresses the probability that a randomly selected object on the map is appropriately classified (Overall accuracy - OA).
Misclassification rate	=FPi+FNiN
Informedness	=TNiTNi+FPi+TPiTPi+FNi−1		Probability of an informed decision about the random condition [40].
Markedness	=TNiTNi+FPi+TPiTPi+FNi−1		Measures of how much one variable is market as a predictor or possible cause of another [40].
Matthew's correlation=TNi×TPi−FPi×FNi(TNi+FNi)×(TPi+FPi)×(TNi+FPi)×(TPi+FNi)			A measure of the quality of binary classifications prediction; +1 value represents a perfect prediction, 0 no better than random prediction and -1 indicates total disagreement between prediction and observation [40, 41]. Matthew's correlation coefficient is a correlation coefficient between the observed and predicted classification.

Systematic notation in a confusion matrix, the binary cross-tabulation for class “i” and indices used to assess the performance of the decision tree categorizing class “i.”

Results

Knowledge discovery

Importance of predictor variables

The importance of the variables in the classifier performance decreased with the following trend: Reflectance > VARIg > Altitude > SAVI > BI > EVI2 > NDWI > NDVI > NBR > NDMI > NBR2 > NDBI > Slope (Fig. 4). Fig. 4 also shows the six predictive variables (Emphasized) used by the final CDT shown in Fig. 6. The CDT in Fig. 6 led to the smallest error with the smallest number of nodes. Variables CURVATURE, TWI, ASPECT, WDRVI, and NDSI showed little importance and provided limited information for classifier performance.

Fig. 4

Importance of the predictor variables. The final classification decision tree uses emphasized variables.

Fig. 6

The classifier decision tree.

Importance of the predictor variables. The final classification decision tree uses emphasized variables.

Decision tree

CART randomly built out 1500 sub-DTs, each one with one subset of variables, and ran the 466 ground truth data down the smaller trees and computed the errors for each tree as a function of the node numbers. The predictor's subset that led to the sub-DT with the lowest test error is highlighted in Fig. 3, and the sub-DT error as a function of the number of nodes is shown in Fig. 5. Thus, the sub-DT with the smallest error has 14 nodes (Fig. 6), and it is referred to as the final CDT.

Fig. 5

Error curve for the classifier in Fig. 5.

Error curve for the classifier in Fig. 5. The classifier decision tree.

Accuracy assessment

Classification results after learning and validation are shown in Table 4. During the learning process the CDT correctly categorized 344 of 373 ground truth objects (overall accuracy of 93.03 %). Moreover, of 74 WTR objects, 73 were correctly classified (98.65%), and of 83 NHP objects, 67 were correctly classified (78.31%) (Table 4). After validation, the CDT correctly categorized 88 of 93 ground truth objects (overall accuracy of 94.68 %). Moreover, of 14 WTR objects, 14 were correctly classified (100 %), and of 20 NHP objects, 18 were correctly classified (90 %) (Table 4).

Table 4

The resulting confusion matrix from the learning and validation process.

Confusion matrix – Learning (373) – 80%
Actual class ↓	Predicted class
Actual class ↓	Water (74)	Forest (33)	Native herbaceous páramo (67)	Anthropogenic herbaceous páramo (181)	Snow (18)
Water (74)	73	0	1	0	0
Forest (31)	0	30	0	1	0
Native herbaceous páramo (83)	1	0	65	17	0
Anthropogenic herbaceous páramo (167)	0	3	1	162	1
Snow (18)	0	0	0	1	17
Confusion matrix – Validation (93) – 20%
Actual class ↓	Predicted class
Actual class ↓	Water (14)	Forest (7)	Native herbaceous páramo (19)	Anthropogenic herbaceous páramo (47)	Snow (6)
Water (14)	14	0	0	0	0
Forest (7)	0	7	0	0	0
Native herbaceous páramo (20)	0	0	18	2	0
Anthropogenic herbaceous páramo (46)	0	0	1	44	1
Snow (6)	0	0	0	1	5

The resulting confusion matrix from the learning and validation process.

Decision tree features – binary cross-tabulation

The CDT features derived from the confusion matrix by a binary cross-tabulation (Table 3) are shown in Table 5. The overall accuracy per object class is above 94 %, and the misclassification rate is below 6 %. The CDT specificity (the proportion of real negative cases that are correctly predicted negative) is above 94%. The probability of an informed classification for each class was above 78 %. The measures of how much each class is marked as a possible cause of the negative ones (the whole set of another class) were above 72 %. Matthew's correlation coefficient between the observed and the predicted object was above 82 %.

Table 5

Decision tree features. L stands for learning and V for validation.

	Water		Forest		Native herbaceous páramo		Anthropogenic herbaceous páramo		Snow
	L	V	L	V	L	V	L	V	L	V
Sensitivity (User accuracy)	0.99	1.00	0.91	1.00	0.97	0.95	0.90	0.94	0.94	0.83
Specificity	1.00	1.00	1.00	1.00	0.94	0.97	0.97	0.96	1.00	0.99
Precision (Producer accuracy)	0.99	1.00	0.97	1.00	0.78	0.90	0.97	0.96	0.94	0.83
Accuracy (Overall accuracy)	0.99	1.00	0.99	1.00	0.95	0.97	0.94	0.95	0.99	0.98
Misclassification rate	0.01	0.00	0.01	0.00	0.05	0.03	0.06	0.05	0.01	0.02
Informedness	0.98	1.00	0.96	1.00	0.78	0.89	0.88	0.89	0.94	0.82
Markedness	0.98	1.00	0.96	1.00	0.72	0.87	0.94	0.91	0.94	0.82
Matthew's correlation	0.98	1.00	0.93	1.00	0.82	0.90	0.90	0.90	0.94	0.82

Decision tree features. L stands for learning and V for validation.

LULC in the páramo ecosystem of the province of Chimborazo

Fig. 7b shows the complete object's data set from the overall scene through the CDT model (Fig. 6) written in a PHYTON code. This map shows a predicted distribution of the different LULC in the study area. For the sake of clarity and comparison, we include the HP distribution reported by MEE in Fig. 7a. We used the ecosystem map of Ecuador as a baseline and for comparison as it is the official tool for the specialization, characterization, and definition of ecosystems at the national level. The Ecosystem Classification System of Ecuador has been established from continental scales to fine scales (landscape, local); and can be used at different levels according to the purpose of studies carried out in any territory [16].

Fig. 7

(a) The native herbaceous páramo reported in the Ecosystem map of Ecuador 2012 [16]. (b) The native herbaceous páramo and anthropogenic herbaceous páramo predicted by the classifier decision tree in this study. The surface area cover by each category derived from OBIA analysis for NHP, AHP, FRS, WTR, and SNW was 121,905 ha (44.55%), 105,470 ha (38.54%), 25,749 ha (9.41%), 19,973 ha (7.30%), and 560 ha (0.20%), respectively. However, in 2012, the MEE reported that there was 141,827 ha of NHP [16], and there is no report of AHP surface area. The distribution of LULC for the entire study area and each region is shown in Table 6. The regions with the largest area of NHP, AHP, FRS, WTR, and SNW is region 71, with (7,633 ha), 35 (6,970 ha), 11 (2,654 ha), 17 (1,842 ha) and 70 (120 ha), respectively (Table 6). As per, each region has 10,000 ha of surface area. Thus, 76 % of the surface area of region 71 is covered by NHP. Region 35 exhibits the largest area of anthropogenic activities, with 70% of its surface area of AHP. FRST covers 26 % of the surface area of region 11. WTR covers 17 % of the surface area of the region 11, and 0.1 % of the region 70 is covered by snow.

Table 6

The distribution of LULC for the complete study area and each region.

Region	Coordinates UTM - zone 17 southern hemisphere		Surface area (ha)
	Coordinates UTM - zone 17 southern hemisphere		Native herbaceous páramo (MEE 2012)	Native herbaceous páramo	Anthropogenic herbaceous páramo	Forest	Water	Snow
	X	Y	Native herbaceous páramo (MEE 2012)	Native herbaceous páramo	Anthropogenic herbaceous páramo	Forest	Water	Snow
2	749121.1	9840957.6	816	906	760	148	110	1
4	779121.1	9840957.6			55		7	1
6	739121.1	9830957.6	2949	1958	2373	86	477	86
7	749121.1	9830957.6	50	943	480	102	449	27
8	759121.1	9830957.6	1895	1497	1665	310	378	0
9	769121.1	9830957.6	494	496	430	152	41	2
10	779121.1	9830957.6	21	199	26	315	87
11	789121.1	9830957.6	1396	483	379	2654	876
12	739121.1	9820957.6	3961	2934	1742	285	381	10
13	749121.1	9820957.6	23	14	58		21
14	759121.1	9820957.6			7		10
16	779121.1	9820957.6	720	2479	362	682	341	2
17	789121.1	9820957.6	3724	1895	461	1978	1842	67
18	739121.1	9810957.6	3840	3702	2831	137	196	3
19	749121.1	9810957.6	230	188	3932	24	309
20	759121.1	9810957.6		34	2047	30	271
20_1	769121.1	9810957.6	321	265	39	44	5
21	779121.1	9810957.6	3805	3766	534	369	237	1
22	789121.1	9810957.6	41	63	8	2	16
24	729121.1	9800957.6	4372	2216	1338	1154	580
25	739121.1	9800957.6	5583	3820	2071	271	185	4
26	749121.1	9800957.6	1337	845	1701	16	239	26
27	759121.1	9800957.6		103	4577	64	528
28	769121.1	9800957.6	1274	746	1064	312	92
29	779121.1	9800957.6	4061	4045	781	328	193	2
30	789121.1	9800957.6	521	414	83	128	78
32	729121.1	9790957.6	778	276	584	1831	361
33	739121.1	9790957.6	6980	4528	3027	385	468	1
34	749121.1	9790957.6	908	957	1949	62	271
35	759121.1	9790957.6		59	6970	24	437
36	769121.1	9790957.6	1447	989	1295	283	139
37	779121.1	9790957.6	3949	2678	1503	704	493	2
38	789121.1	9790957.6	1259	970	98	91	207	1
41_1	749121.1	9780957.6	6003	3874	2637	574	402	2
41	739121.1	9780957.6	955	949	4344	56	203
42	759121.1	9780957.6	758	351	5866	343	861
43	769121.1	9780957.6	2367	3091	752	637	420
44	779121.1	9780957.6	5059	3217	1610	1547	757
45	719121.1	9770957.6			167		19
46	729121.1	9770957.6		5	676	21	118
47	739121.1	9770957.6	3157	980	5084	553	534	2
48	749121.1	9770957.6		52	2811	104	105
49	759121.1	9770957.6	4755	3323	4535	194	624	89
50	769121.1	9770957.6	1629	2163	2429	529	482	1
51	779121.1	9770957.6	2951	3991	2251	1430	515
54	729121.1	9760957.6	496	60	1059	131	376
55	739121.1	9760957.6	1086	226	1799	83	340
56	749121.1	9760957.6		114	2326	0	30
57	759121.1	9760957.6	5540	5054	3131	78	196	10
58	769121.1	9760957.6	2924	4366	3323	133	259
59	779121.1	9760957.6	2341	2091	593	278	229
63	739121.1	9750957.6	291	101	172	25	65
64	749121.1	9750957.6	866	608	613	15	57
65	759121.1	9750957.6	2473	4841	3787	299	219	5
66	769121.1	9750957.6	1953	5373	1408	63	288	4
66_1	779121.1	9750957.6	323	691	145	78	136
68	729121.1	9740957.6	712	391	262	65	54
69	739121.1	9740957.6	3591	1953	1813	210	296	81
70	749121.1	9740957.6	6615	5937	1756	131	70	120
71	759121.1	9740957.6	7162	7633	1527	173	85	10
72	769121.1	9740957.6	3939	4479	671	471	300	0
73	779121.1	9740957.6	434	278	17	731	175
74	749121.1	9730957.6	3678	3183	574	2	7	0
75	759121.1	9730957.6	4722	3525	1203	340	80
76	769121.1	9730957.6	4731	3024	544	1501	461
77	779121.1	9730957.6	1525	406	34	1056	600
77_1	759121.1	9720957.6	886	718	156	85	9
78	769121.1	9720957.6	771	336	147	321	24
79	779121.1	9720957.6	378	53	21	519	250
TOTAL			141827	121905	105470	25749	19973	560

The distribution of LULC for the complete study area and each region. There are eight regions (2, 9, 19, 20_1, 21, 29, 34, 41) where the difference between the surface area of the NHP reported in the MEE map and the surface area resulting from our analysis is less than 100 ha (Table 6). Thus, it can be inferred that the NHP within these regions remains in the same condition as in 2012. For example, the surface area of the NHP in region 21 and 29 reported by MEE was 3,805 ha and 4,061 ha, respectively; while our analysis suggests 3,766 ha and 4,044 ha, respectively. Similarly, there are 5 regions (16, 51, 58, 65, 66) where the CDT identifies that the HP ecosystem is in an upgrading stage, with all of these regions increasing in area by 1000 ha or more. This suggests that the NHP is being regenerated. Region 66 also gained 3,600 ha. On the other hand, the CDT identified 13 regions (17, 24, 25, 33, 37, 41_1, 44, 47, 49, 69, 75, 76, 77) where the NHP ecosystem is in a downgrading stage, with all of these regions decreasing in area by 1,000 ha or more). This suggests that NHP is being converted to other land uses or otherwise lost. Region 33 lost the greatest area of NHP at 2,500 ha. In summary, the sum of the all positive difference between the total area given by GEOBIA and the reported by MEE in each regions suggest a total gain of 14,265 ha, mean a while, the sum of the all negative differences between the total area given by GEOBIA and the reported by MEE in each region suggest a total loss of 34,186 ha. In contrast, regarding the MEE report the results suggest a total net loss of 19,921 ha (141,827–121,905) of NHP.

Discussion

Important variables

Low relative importance is associated with variables with little relevance for categorization and CDT performance. Predictor variables with little importance provide limited information and do not reduce the entropy of the information. However, this hierarchical framework must be taken with caution because the correlation among the variables could have had an impact on the relative assessment of variables importance, but do not affect the predictive performance of the CDT [25]. The variable “Reflectance” provided more information –with a relative importance of 100%– than any of the SVIs or topographic variables (Fig. 4). Thus, “Reflectance” significantly determines the CDT performance. It has been shown that it is crucial to employ surface reflectance values to achieve accurate and reliable classification results for large-scale application over which weather conditions can change significantly [42]. Reflectance value captures the physical and biophysical parameters of LULC objects, and depends on the 3-D complex LULC objects structure's effects on the reflectance, as well as topographic heterogeneity [14]. However, Berhane et al. [43] found that topographic heterogeneity from regions of interest did not reduce classification accuracy unless heterogeneity became extreme. In general, overall classifier accuracy trended downwards with increasing heterogeneity, but remained relatively high until extreme heterogeneity levels were reached [43]. The VARIg –with a relative importance of 62.54 %– was the second in importance to “Reflectance”, and higher than that of “Altitude.” Previous studies conducted by Gitelson et al. [31] showed that VARIg is sensitive to the entire range (0–100%) of variation of the vegetal fraction. The VARIg indicator was developed to estimate crop conditions because it exhibits a good correlation with nitrogen content (g N/m2) [31]. “Altitude” is a topographic variable derived from DEM scaled to the spatial resolution of the SVIs. The importance of altitude suggests that the spatial distribution of the categories of LULC in the study region depends on topography and related microclimate. In our studied area altitude varied from 3000 masl to 4730 masl, resulting in a range of microclimate which impacts plant physiology. “Altitude” was found to be the most important environmental variable predicting the spatial distribution forests in mountainous regions as a result of the relationship with altitude, slope, and aspect [25]. EVI2, with a relative importance of 46.02 %, –was fourth in importance to “Reflectance”, and its information contribution was higher than that provided by the NBR2. EVI2 is related to leaf area index and does not lose sensitivity in areas with dense vegetation and abundant chlorophyll production [33]. NBR2, with a –relative importance of 33.94 %, is used to evaluate postfire regrowth trajectory and as a means of identifying long-term impacts of fire on sensitive ecosystems [38]. Our results indicated that slope has the importance of 22.98 % in the categorization of LULC in our study region. Slopes range from undulating (5%) to very steep (70%). It is known that the “Slope” can be relevant in a spatial distribution of vegetation that requires large amounts of moisture, and has a significant influence on microclimate [25]. Because “Slope” affects water conditions in the soil as well as temperature, the slope is one of the sources of heterogeneity in the landscape controlling the spatial distribution of the vegetation in mountain regions [25]. Using our six predictor variables, the overall percentage of correct categorizations of the 93 true ground objects was 94.68 % (Table 4). This value is acceptable if we consider the heterogeneity of the studied region and that we only used 373 objects to train the algorithm. In general, these areas are located in hard-to-reach areas, possess very uneven topography and extreme climates, and offer little available local information.

CDT algorithm

Our CDT serves as an exploratory tool for data-driven discovery and prediction to gain new insights on LULC in central EPEs. Although other classifiers could achieve better classification results, CDT provides clear decision rules with fixed threshold values that can be used in future research without any training phase [44]. It has been shown that the CART algorithm can provide stable performance and reliable results in machine learning and data mining research [45]. It has been used to identify spectral bands with the highest discriminative capabilities between classes and low misclassification rates [45]. Classification utilizing the DT algorithm has already been applied in identifying both field-grown and greenhouse crops from remote sensing data with excellent and robust results [44]. However, the CDT found in this study can become unstable, as a change in one node affects all of the nodes that are connected below. Our CDT rules are constrained by the root variable, and only allows rules to be extracted that are indicated by the “Reflectance”, the top variable. Nevertheless, this CDT weakness should not be overlooked. Slight adjustments can help identify seasonal effect on LULC unit's categorization and improve accuracy. In a hypothetical scenario, one could identify the best CDT performance based on season, and match it to the season of data collection. Note, the differences with the Random Forest algorithm where the performance depends on the performance of the majorities of DTs.

Date base and knowledge discovery

Our results indicate that Páramo ecosystems lost surface area due to anthropogenic activities. Páramo residents, mostly indigenous people, are engaged in agricultural, livestock, and logging activities. There is not always a clear line between commercial and subsistence resource use. It is tempting but unrealistic to assume that residents will only seek subsistence or low-impact activities [46]. In the páramo, crop production is accomplished by removing native páramo surface vegetation, cultivation for 3–4 years, and relocation to other páramo areas. The formerly cultivated fields are left for regeneration. Alternatively, páramo residents cultivate pasture for livestock activities on previous crop production areas. Therefore, the natural páramo ecosystem can lose or gain surface area as a consequence of the anthropic activities. The itinerant LU by the páramo resident is due to the short productive capacity of the Ecuadorian páramo soil. These are young volcanic soils, mostly from the andosol order with low bulk density, high organic carbon content, and low nutrient stores. Conversion of the native páramo vegetation alters the pedogenetic trait of the young andosol soils, resulting in surface hardening and a loss of soil organic carbon to the atmosphere. Crop yields are highest in the first year after conversion, but are lower in subsequent years, requiring the use of fertilization to maintain productivity. After the third or fourth years, agricultural activity is unsustainable. Anthropic activities, including burning LC and livestock grazing, affect the structure and composition of the NHP and lead to the conversion to AHP. In areas where there was higher intensity of burning and grazing, the grasslands have a lower height, have lost biomass, and the shrub layer is absent. This ecosystem is characterized by dense vegetation dominated by grasses emanate from basal growing points. Shrubs and matted straw disappear gradually along the elevation gradient and are replaced by cushions, rosettes to caulescent, prostrate shrubs and short-stemmed grasses. The resulting composition and physiognomy of this AHP differ altitudinally and latitudinally, as well as driven by factors such as climate, geological history, initial habitat diversity, and the intensity of human influence [16]. These anthropogenic activities hasten biodiversity loss, climate change, deterioration of ecosystem function, and ultimately compromises the utility of the land for local communities and future generations [46]. In our study, the total area of AHP was estimated at 105,470 ha, with 36 regions exhibiting more than 1,000 ha of AHP (Table 6). This result suggests a long history of páramo soil exploitation, together with a substantial amount of anthropogenic activities. Although this conversion threatens the livelihoods of the páramo residents, this interaction is complex and conditioned by critical economic, social, and environmental factors [47]. In contrast, we estimated that 14,265 ha of NHP has regenerated and is now indistinguishable from unimpacted NHP. Additional research is needed to confirm the areal extent of the conversion of AHP back to NHP and assess the ecosystem attributes of these communities. However, our results could be considered as a sign of NHP resilience. Additional studies on NHP resilience are essential for exploring NHP's response as well as the adaptability of all páramo communities to global climate change. The response of the NHP ecosystem functions (productivity, hydric regulation, carbon sequestration, biodiversity, etc.) to LU change depends strongly the original páramo vegetation composition, intensity of human activities, and changes in climate [48]. Thus, the anthropogenic activities and the global clime change threatens the future of the NHP ecosystem. As anthropic pressure on the NHP accelerates, it is critical to strengthen conservation efforts. It is crucial to protect these valuable resources using realistically implemented managerial plans and the establishment of well-defined policies and laws [42]. However, any policies need to consider and recognize Indigenous Peoples’ rights as essential to meeting local and global conservation goals [49].

Conclusion

The methodology developed in this study could be the foundation for a comprehensive monitoring system to achieve sustainability goals related to EPEs. Data mining of LULC objects allowed discovery of a CDT with six important predictive variables (Reflectance, VARIg, Altitude, EVI2, NBR2, and slope). The CDT allows the categorization of the land use categories of NHP, AHP, FRS, WTR, and SNW with the lowest misclassification rate (below 6 %). The probability of an informed classification for each class was above 78 %, and Matthew's correlation coefficient between the observed and the predicted object was above 82 %. Analysis of the LULC database in the Central EPEs showed that two-fifths of the páramo ecosystem remains as native HP (NHP), two-fifths has been converted to anthropogenic HP (AHP), and one-fifths remains as FRS, WTR, and SNW. Although the anthropic alteration of the pedogenesis of young páramo soil led to the establishment of AHP, we found evidence of regeneration of the NHP, indicating its resilience. Additional study of NHP resilience is an essential task for exploring NHP's response to global climate change and the adaptability of all páramo communities to such change. Such assessment depends on the accurate monitoring the spatial pattern of LULC and on the evaluation of the driving mechanisms of LULC changes (natural or anthropogenic). Results of this study will be primarily useful to scientists and decision-makers with interest in páramo ecosystems in central Ecuador. The proposed methodology is simple, fast, and could be implemented in other landscapes to establish comprehensive monitoring systems useful in landscape assessment and planning.

Declarations

Author contribution statement

Victor J Garcia, Carmen O. Márquez: Conceived and designed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper. Tom M. Isenhart: Conceived and designed the experiments; Wrote the paper. Marco V Rodríguez: Performed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data. Santiago D. Crespo, Alexis G. Cifuentes: Performed the experiments.

Funding statement

This work was supported by Universidad Nacional de Chimborazo, Riobamba Ecuador, through the project “Soil organic carbon and sequestration in Ecuadorian páramo ecosystem”.

Competing interest statement

The authors declare no conflict of interest.

Additional information

No additional information is available for this paper.

5 in total

1. Wide Dynamic Range Vegetation Index for remote quantification of biophysical characteristics of vegetation.

Authors: Anatoly A Gitelson
Journal: J Plant Physiol Date: 2004-02 Impact factor: 3.549

2. Climate-land-use interactions shape tropical mountain biodiversity and ecosystem functions.

Authors: Marcell K Peters; Andreas Hemp; Tim Appelhans; Joscha N Becker; Christina Behler; Alice Classen; Florian Detsch; Andreas Ensslin; Stefan W Ferger; Sara B Frederiksen; Friederike Gebert; Friederike Gerschlauer; Adrian Gütlein; Maria Helbig-Bonitz; Claudia Hemp; William J Kindeketa; Anna Kühnel; Antonia V Mayr; Ephraim Mwangomo; Christine Ngereza; Henry K Njovu; Insa Otte; Holger Pabst; Marion Renner; Juliane Röder; Gemma Rutten; David Schellenberger Costa; Natalia Sierra-Cornejo; Maximilian G R Vollstädt; Hamadi I Dulle; Connal D Eardley; Kim M Howell; Alexander Keller; Ralph S Peters; Axel Ssymank; Victor Kakengi; Jie Zhang; Christina Bogner; Katrin Böhning-Gaese; Roland Brandl; Dietrich Hertel; Bernd Huwe; Ralf Kiese; Michael Kleyer; Yakov Kuzyakov; Thomas Nauss; Matthias Schleuning; Marco Tschapka; Markus Fischer; Ingolf Steffan-Dewenter
Journal: Nature Date: 2019-03-27 Impact factor: 49.962

2 in total

1. The Influence of Knowledge Base on the Dual-Innovation Performance of Firms.

Authors: Liping Zhang; Hailin Li; Chunpei Lin; Xiaoji Wan
Journal: Front Psychol Date: 2022-05-27

2. Wetland monitoring technification for the Ecuadorian Andean region based on a multi-agent framework.

Authors: Esteban Valencia; Iván Changoluisa; Kevin Palma; Patricio Cruz; Deyanira Valencia; Paul Ayala; Victor Hidalgo; Diego Quisi; Nelson Jara; Diana Puga
Journal: Heliyon Date: 2022-03-17