
Hybrid Integration Approach of Entropy with Logistic Regression and Support Vector Machine for Landslide Susceptibility Modeling.

Tingyu Zhang1, Ling Han1, Wei Chen2, Himan Shahabi3.   

Abstract

The main purpose of the present study is to apply three classification models, namely, the index of entropy (IOE) model, the logistic regression (LR) model, and the support vector machine (SVM) model with a radial basis function (RBF) kernel, to produce landslide susceptibility maps for Fugu County of Shaanxi Province, China. First, landslide locations were extracted from field investigation and aerial photographs, and a total of 194 landslide polygons were transformed into points to produce a landslide inventory map. Second, the landslide points were randomly split into two groups (70/30) for training and validation purposes, respectively. Then, 10 landslide explanatory variables, namely slope aspect, slope angle, altitude, lithology, mean annual precipitation, distance to roads, distance to rivers, distance to faults, land use, and normalized difference vegetation index (NDVI), were selected, and potential multicollinearity between these factors was checked using the Pearson correlation coefficient (PCC), the variance inflation factor (VIF), and tolerance (TOL). Subsequently, the landslide susceptibility maps for the study region were obtained using the IOE model, the LR-IOE model, and the SVM-IOE model. Finally, the performance of these three models was verified and compared using the receiver operating characteristic (ROC) curve. The success rate results showed that the LR-IOE model has the highest accuracy (90.11%), followed by the IOE model (87.43%) and the SVM-IOE model (86.53%). The prediction rate results showed the same ranking, with the LR-IOE model having the highest accuracy (81.84%), followed by the IOE model (76.86%) and the SVM-IOE model (76.61%). Thus, the landslide susceptibility map (LSM) for the study region can provide an effective reference for the Fugu County government to properly address land planning and mitigate landslide risk.

Keywords:  hybrid model; landslides; loess area; machine learning; statistical method

Year:  2018        PMID: 33266608      PMCID: PMC7512466          DOI: 10.3390/e20110884

Source DB:  PubMed          Journal:  Entropy (Basel)        ISSN: 1099-4300            Impact factor:   2.524


1. Introduction

Landslides often occur in mountainous and hilly areas and are among the most dangerous geological disasters [1]. They can cause huge economic losses and large numbers of casualties. According to statistics, landslides claim almost 1000 lives and cause about 4 billion dollars in losses worldwide each year [2], and these figures keep growing. China is also a region where landslides occur frequently; it has been reported that 7122 geological disasters occurred in 2017, causing 327 deaths, 173 injuries, 25 missing persons, and a loss of 3.54 billion CNY [3]. In northwestern China, landslides pose an even greater threat to residents' safety and to transportation because of the harsh environment and population concentration. However, controlling and remediating every landslide would require enormous manpower and material resources. Therefore, predicting landslide occurrence is both valuable and important. As the first step in predicting landslide occurrence, a landslide susceptibility analysis aims to recognize hazardous and high-risk regions and to support the prevention of the negative effects of landslides [4]. The landslide susceptibility map (LSM) is the final result of the landslide susceptibility analysis. However, the traditional methods for landslide susceptibility mapping, based on field investigation and manual analysis, are time-consuming and expensive, and their results are imprecise [5,6]. In recent years, geographical information systems (GIS) have developed rapidly, which has made the preparation of landslide susceptibility maps considerably more convenient [7].
Meanwhile, there has been a lot of research on the combination of geographical information systems, and statistical and nonstatistical methods to evaluate landslide susceptibility—in terms of the binary statistical method, for example, the frequency ratio (FR) model [8,9,10,11,12,13], the certainty factor (CF) model [14,15,16,17], the statistical index (SI) [18,19], the weights of evidence (WOE) [20,21,22], and the index of entropy (IOE) model [23,24]. The factor internal coefficient of certainty or weight of evidence is decided by landslide data, but the selection of factors would be influenced by humans. As a multivariate statistical method, the logistic regression (LR) model is extensively applied by many researchers [25,26,27,28,29,30]. Due to the limitation of statistical models, some machine learning algorithms that can avoid the influence from humans were also introduced and applied for landslide susceptibility analysis, such as artificial neural networks (ANN) [31,32,33], neuro-fuzzy [34,35,36,37], fuzzy logic [38,39], decision trees [40,41,42], kernel logistic regression (KLR) [43,44], and support vector machines (SVM) [45,46,47]. Statistical models and machine learning algorithms have their own advantages and disadvantages [48,49]. The internal parameters of the explanatory variables in binary statistical models are determined by landslide data, which can avoid the interference of human factors and be more objective. However, the selection of explanatory variables will receive interference from humans. By contrast, multivariable statistical models and machine learning methods can avoid the problem of factor dependence, but they are less widespread and limited to few cases of study for their intensive computation [50,51]. 
In recent years, many hybrid models have been used in the literature, such as the fuzzy weight of evidence method [17], adaptive network-based fuzzy inference system (ANFIS) based on frequency ratio (FR–ANFIS) model [52], wavelet packet–statistical (WP–SM) models [53], and integration of support vector machines and the multiboost [54]. According to plenty of research, the hybrid model generally performed better than the original models, so trying to mix different models and apply them to different regions is significant. Therefore, this research assembled the IOE model with the LR and SVM models to form two hybrid models (LR–IOE and SVM–IOE) for landslide susceptibility mapping in the Fugu County of Shaanxi Province, China.

2. Study Area

Fugu County, whose geographic coordinates are 110°25′ to 111°15′ east longitude and 38°42′ to 39°33′ north latitude, covers an area of 3229 km² (Figure 1). The elevation in the study area is between 761 and 1423 m above sea level, and increases from east to west. The temperate zone with an arid continental monsoon climate is the main climate type in the study region, and the maximum and minimum temperatures in history are 38.9 °C and −24 °C, while the average annual temperature is 9.1 °C. The average annual rainfall is 428.6 mm, and the geographical distribution of rainfall shows a gradual increase from northwest to southwest. Meanwhile, most of the precipitation is concentrated from July to September, accounting for 69% of the annual rainfall. There are 62 rivers with drainage areas above 1 × 10^7 m² in the study region, and the average annual runoff is 5.911 × 10^9 m³.
Figure 1

Landslide inventory map and the location of study area.

The overall topography of the study area is high in the northwest and low in the southwest. The main landforms can be divided into four types as follows: loess girder landform, loess gully landform, canyon hilly landform, and valley terraces. The dip direction of the rock formations is roughly southwest–northwest, with a dip angle of approximately 5–8 degrees, except in a few areas where it is about 20 degrees. The Carboniferous–Permian strata in the east and the Jurassic strata in the northwest are coal-bearing strata, and the lithology in the study area is shown in Table 1.
Table 1

Lithological units of study area.

| Category | Geological Age | Code | Main Lithology |
|---|---|---|---|
| A | Holocene | Q4 | Sand, gravel, loess |
| A | Pleistocene | Q3 | Loess, gravel |
| B | Pliocene | N2j | Sandy clay |
| B | Pliocene | N2b | Quartz sand, clay |
| C | Middle Jurassic | J2y | Siltstone, sandstone, mudstone, shale, coal seam |
| C | Late Jurassic | J1f | Mudstone, glutenite |
| D | Early Triassic | T3w | Mudstone, shale, coal seam |
| D | Early Triassic | T2-3y | Glutenite, mudstone, shale, siltstone |
| D | Middle Triassic | T2z | Sandstone, mudstone |
| D | Late Triassic | T1h | Medium-fine sandstone, siltstone, mudstone |
| D | Late Triassic | T1l | Sandstone, mudstone |
| E | Early Permian | P2s | Glutenite, sandstone, mudstone |
| E | Early Permian | P2sh | Mudstone, silty mudstone, sandstone, clay minerals, siliceous |
| E | Late Permian | P1sh | Feldspar quartz sandstone, conglomerate, sandstone, mudstone, shale |
| E | Late Permian | P1s | Mudstone, shale, sandstone, coal seam |
| F | Carboniferous | C2t | Calcaremaceous sandstone, coal seam, mudstone |

Because of the rich coal resources in the study area, the mining industry is well developed and the population is concentrated, which has caused serious damage to the environment. These activities have also contributed to the formation of a large number of landslides.

3. Data Used

3.1. Landslide Inventory Map

A landslide inventory map is the first step in a landslide susceptibility analysis and includes historical and newly discovered landslides and their related information [43], such as the location, the date of occurrence, the extent of landslide phenomena in a region, and the types of mass movements that have left discernible traces [55]. To obtain a practical and accurate landslide inventory map, data collection and an adequate field survey were essential in the current study. A digital elevation model (DEM) of the study region with 30 m resolution was obtained from ASTER GDEM, downloaded from Geospatial Data Cloud [56]. The geological map and mean annual precipitation data were provided by the government of Fugu County. Based on field investigations, a total of 194 landslide polygons, including 162 slides, 29 falls, and 3 debris flows, were drawn according to the depletion zone; these landslides were triggered by rainfall and excavation. In the study area, the smallest and largest landslides were about 39 m² and 13.5 × 10^4 m², respectively. Because only 12% of the landslides are over 10,000 m² in size, landslide polygons were transformed into points using the centroid method, and then the landslide inventory map (Figure 1) was obtained [57,58]. To avoid overfitting problems in modeling, a total of 194 nonlandslide points were randomly generated and mapped on the landslide inventory map. All of these landslide and nonlandslide points were randomly divided into two groups: the training dataset, comprising 272 (70%) points, was used to train the models, and the validating dataset, comprising 116 (30%) points, was used for validation purposes.
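The random 70/30 division of the 388 points can be illustrated with a minimal Python sketch. The point identifiers, the fixed seed, and the function name `split_points` are assumptions made for illustration; the study does not state how the random split was implemented.

```python
import random

def split_points(landslide_ids, non_landslide_ids, train_fraction=0.7, seed=42):
    """Pool labeled points, shuffle them, and split into training/validation sets."""
    # Label 1 marks a landslide point, label 0 a nonlandslide point.
    points = [(pid, 1) for pid in landslide_ids] + [(pid, 0) for pid in non_landslide_ids]
    rng = random.Random(seed)  # fixed seed (an assumption) for a repeatable split
    rng.shuffle(points)
    n_train = round(train_fraction * len(points))
    return points[:n_train], points[n_train:]

# 194 landslide points and 194 nonlandslide points, identified by hypothetical integer IDs.
train, valid = split_points(range(194), range(194, 388))
print(len(train), len(valid))  # 272 116
```

With 388 pooled points, a 70% share rounds to 272 training points and leaves 116 for validation, which matches the counts reported above.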

3.2. Landslide Explanatory Variables

In order to produce the landslide susceptibility map, 10 landslide explanatory variables, namely slope aspect, altitude, slope angle, lithology, mean annual precipitation, distance to roads, distance to rivers, distance to faults, land use, and normalized difference vegetation index (NDVI), were selected to produce data layers representing themselves with a resolution of 30 × 30 m. Slope aspect, altitude, and slope angle maps were extracted from DEM data using ArcGIS software. Land use and NDVI were extracted from GF-2 satellite images gathered from the China Center for Resources Satellite Data and Application. Lithology, distance to roads, mean annual precipitation, distance to rivers, and distance to faults maps were extracted based on existing data. The slope aspect, which is considered to be a prerequisite condition, was frequently adopted by many works in the literature to produce a landslide susceptibility map [30]. The slope aspect was reclassified into nine groups, based on the equal interval method, as follows: Northwest, west, southwest, south, southeast, east, northeast, north, flat, respectively (Figure 2a).
Figure 2

Landslide explanatory variable maps involving: (a) Slope aspect; (b) slope angle; (c) altitude; (d) lithology; (e) mean annual precipitation; (f) distance to roads; (g) distance to rivers; (h) distance to faults; (i) land use; (j) normalized difference vegetation index (NDVI).

As it is considered to be another critical factor, the slope angle was widely used by a lot of relevant research [59]. In the current research, the slope angle was divided into the following six categories, based on the Jenks natural break method, as follows: 0°–6.65°, 6.65°–11.40°, 11.40°–16.39°, 16.39°–22.09°, 22.09°–29.45°, 29.45°–60.57° (Figure 2b). Altitude is also considered a significant factor for landslide susceptibility mapping [1]. Thus, based on the Jenks natural break method, elevation values were classified into the following seven ranges: 761–903 m, 903–984 m, 984–1054 m, 1054–1124 m, 1124–1194 m, 1194–1262 m, and 1262–1423 m (Figure 2c). The difference of lithology is the basis of landslide formation conditions [60]. According to field investigations and the existing geological data and maps, lithological units were divided into six categories (Table 1) and the lithology map was produced (Figure 2d). Previous research has indicated that there is a strong correlation between mean annual precipitation and landslide occurrences [61,62,63]. According to the existing and local observation data, mean annual precipitation is divided into seven classes based on equal interval method as follows: <360 mm/y, 360–380 mm/y, 380–400 mm/y, 400–420 mm/y, 420–440 mm/y, 440–460 mm/y, and >460 mm/y (Figure 2e). Distance to roads is used as an important landslide explanatory variable to prepare the distance to roads map [64]. In this study, the values of distance to roads were reclassified into five ranges based on equal interval method as follows: <200 m, 200–400 m, 400–600 m, 600–800 m, and >800 m (Figure 2f). River erosion of slope is considered to be a significant explanatory variable inducing landslides; thus, distance to rivers is employed to be a quantitative index of river erosion [25]. 
In this study, with 200 m as the interval, the values of distance to rivers were reclassified into five ranges based on the equal interval method as follows: <200 m, 200–400 m, 400–600 m, 600–800 m, and >800 m (Figure 2g). Fault movement is not only a requirement for individual landslide occurrences, but also a controlling factor for regional landslide occurrences [12]. Numerous field surveys have indicated that the more intense the fault movement, the more landslides are triggered. In the current research, with 2000 m as the interval, the values of distance to faults were reclassified into five ranges based on the equal interval method as follows: <2000 m, 2000–4000 m, 4000–6000 m, 6000–8000 m, and >8000 m (Figure 2h). Land use differs from region to region, and the way the land is used may lead to an asymmetrical distribution of landslides [65]. Thus, land use was also employed as an explanatory variable in the study region, and it was divided into five categories as follows: water, residential areas, bare land, forest/grassland, and farmland (Figure 2i). NDVI reflects the surface condition and provides a quantitative estimate of vegetation growth and biomass. The influence of vegetation depends on the biomass, the position within the hillslope profile, the root-zone depth, and the capacity of roots to crack rocks and to prevent or ease water infiltration [66,67]. Therefore, NDVI is also considered to be a pivotal explanatory variable. The computational formula of NDVI is defined as follows:

$$\mathrm{NDVI} = \frac{NIR - R}{NIR + R} \qquad (1)$$

where R stands for the red part of the electromagnetic spectrum, while NIR represents the near-infrared part of the electromagnetic spectrum. Using the Jenks natural break method, the NDVI values were reclassified into five categories as follows: −0.39 to −0.019, −0.019 to 0.063, 0.063–0.134, 0.134–0.216, and 0.216–0.607 (Figure 2j).
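As an illustration of the NDVI step, the following Python sketch computes NDVI for a single pixel and assigns it to one of the five classes listed above. It assumes the standard definition NDVI = (NIR − R) / (NIR + R); the reflectance values, the zero-division convention, and the function names are hypothetical and are not taken from the study.

```python
from bisect import bisect_right

# Upper bounds of the first four NDVI classes reported in the text (Jenks breaks).
NDVI_BREAKS = [-0.019, 0.063, 0.134, 0.216]

def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: treat a zero-reflectance pixel as NDVI 0
    return (nir - red) / total

def ndvi_class(value):
    """Return the 1-based class index (1..5) for an NDVI value."""
    return bisect_right(NDVI_BREAKS, value) + 1

value = ndvi(nir=0.40, red=0.25)  # hypothetical reflectances
print(round(value, 3), ndvi_class(value))
```

The same lookup pattern applies to the equal-interval variables (distance to roads, rivers, and faults) by swapping in their break values.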

4. Methodologies

4.1. Multicollinearity Diagnosis

In the study region, not all explanatory variables have a positive impact on the classification results. Multicollinearity may exist between explanatory variables, which may lead to overfitting in modeling. Thus, the Pearson correlation coefficient (PCC), the variance inflation factor (VIF), and tolerance (TOL) were introduced to detect potential multicollinearity problems [68]. The PCC is a statistical linear correlation coefficient, and its analysis is usually used to measure the linear relationship between distance variables. For two sets of samples X = {x_i} and Y = {y_i} (i = 1, 2, 3, ..., n), the PCC between them can be expressed as:

$$\mathrm{PCC} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}} \qquad (2)$$

where x_i and y_i are the variable values for X and Y, and \bar{x} and \bar{y} are the averages of X and Y, respectively. In general, the greater the absolute value of PCC is, the higher the risk of multicollinearity between the landslide explanatory variables [69], and a PCC of >0.7 indicates a multicollinearity problem [70]. The VIF and TOL are two important indexes for a multicollinearity diagnosis. VIF refers to the ratio of the variance when there is multicollinearity between the conditioning factors to the variance when there is no multicollinearity, and the tolerance is the reciprocal of VIF [71]. In general, the larger the VIF values and the smaller the tolerance values are, the stronger the multicollinearity between the conditioning factors. In this study, the explanatory variables with VIF >2 or TOL <0.4 should be abandoned [72].
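The diagnostics described above can be sketched in a few lines of Python. This is an illustrative sketch, not the procedure used in the study: the sample values are hypothetical, and `vif_from_r2` assumes the usual definition VIF = 1 / (1 − R²), where R² comes from regressing one explanatory variable on the others.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def vif_from_r2(r_squared):
    """VIF of a variable given the R^2 of its regression on the other variables."""
    return 1.0 / (1.0 - r_squared)

def tolerance(vif):
    """Tolerance is the reciprocal of VIF."""
    return 1.0 / vif

slope_angle = [5.0, 12.0, 18.0, 25.0, 31.0]   # hypothetical sample values
ndvi_values = [0.30, 0.10, 0.25, 0.05, 0.20]  # hypothetical sample values
r = pearson(slope_angle, ndvi_values)
print(abs(r) > 0.7)                 # True would flag a multicollinearity problem
print(round(tolerance(1.926), 3))   # 0.519, using the largest VIF reported in this study
```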

4.2. Index of Entropy (IOE) Method

The first classification model applied in the present study is the index of entropy (IOE) model, which is a bivariate statistical model; the IOE results are also used as the input data to build the hybrid models in the subsequent modeling. Entropy expresses the degree of instability and uncertainty of a system, and also indicates which elements of the natural environment are most relevant to the development of mass movement [23]. In a landslide susceptibility analysis, the entropy represents the degree to which the different explanatory variables affect the development of landslides. The weight values (W_j) of each landslide explanatory variable are determined by the following equations [73]:

$$FR_{ij} = \frac{y}{x} \qquad (3)$$

$$S_{ij} = \frac{FR_{ij}}{\sum_{i=1}^{N_j} FR_{ij}} \qquad (4)$$

$$M_j = -\sum_{i=1}^{N_j} S_{ij}\log_2 S_{ij} \qquad (5)$$

$$M_{j\max} = \log_2 N_j \qquad (6)$$

$$I_j = \frac{M_{j\max} - M_j}{M_{j\max}} \qquad (7)$$

$$W_j = I_j \times \overline{FR}_j \qquad (8)$$

where FR_ij is the frequency ratio value of class i of explanatory variable j; x and y represent the percentage of domain and the percentage of landslides, respectively; S_ij stands for the probability density; the entropy values are represented by M_j and M_jmax; N_j is the number of categories or ranges of each explanatory variable; I_j is the information parameter; and \overline{FR}_j is the mean FR value over the N_j classes of variable j. Then, the final weight values are calculated by SPSS software. Because three explanatory variables (aspect, lithology, and land use) are generated from vector graphics with no attribute values, the FR values of aspect, lithology, and land use were used as input data for the computation of W_j. Finally, the landslide susceptibility map for the IOE model is produced using the following equation:

$$LSI_{IOE} = \sum_{i=1}^{j} \frac{e}{f_i} \times C \times W_i \qquad (9)$$

where LSI_IOE stands for the sum of all the categories; j represents the number of explanatory variable maps; e is the number of classes within the explanatory variable map with the greatest number of groups; f_i is the number of classes within the particular explanatory variable map; C indicates the value of the categories after secondary classification; and W_i is the weight of the i-th explanatory variable from Equation (8) [74].
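To make the weight computation concrete, the following Python sketch works through the slope aspect classes using the domain and landslide percentages reported in Table 4. It assumes that FR is the landslide percentage divided by the domain percentage, that S is FR normalized over the classes, that the entropy uses base-2 logarithms with empty classes contributing zero, and that the weight is the information parameter multiplied by the mean FR; this reading is consistent with the tabulated values but is a sketch, not the SPSS procedure used in the study. The function name `ioe_weight` is hypothetical.

```python
import math

def ioe_weight(domain_pct, landslide_pct):
    """Return (FR, S, M, M_max, I, W) for one explanatory variable."""
    fr = [y / x for x, y in zip(domain_pct, landslide_pct)]
    fr_sum = sum(fr)
    s = [value / fr_sum for value in fr]
    # Classes without landslides have S = 0 and contribute nothing to the entropy.
    m = -sum(p * math.log2(p) for p in s if p > 0)
    m_max = math.log2(len(fr))
    info = (m_max - m) / m_max
    weight = info * (fr_sum / len(fr))  # information parameter times mean FR
    return fr, s, m, m_max, info, weight

# Slope aspect rows of Table 4 (flat, N, NE, E, SE, S, SW, W, NW).
domain = [0.021, 12.234, 13.413, 12.733, 12.228, 13.806, 13.229, 11.598, 10.737]
slides = [0.000, 6.569, 15.328, 6.569, 23.358, 10.949, 18.248, 9.489, 9.489]
fr, s, m, m_max, info, weight = ioe_weight(domain, slides)
print(round(m, 2), round(m_max, 2), round(weight, 2))
```

For slope aspect, this gives an entropy of about 2.87, a maximum entropy of about 3.17, and a weight of about 0.08, in line with the values listed for that variable in Table 4.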

4.3. Integration of Logistic Regression and Index of Entropy Model

The logistic regression (LR) model is integrated with the IOE to build a new hybrid model, namely, the LR–IOE model, in this study. Logistic regression is a commonly used statistical method for regression analysis of binary dependent variables. The advantage of the LR model is that the independent variables can be discrete or continuous and need not follow a normal distribution [75]. In a logistic regression analysis, the dependent variable takes the values 0 and 1, representing nonlandslide occurrences and landslide occurrences, respectively. The LR model can be expressed as the following equation:

$$P = \frac{1}{1 + e^{-Z}} \qquad (10)$$

where P stands for the probability of landslide occurrence, whose value ranges from 0 to 1; Z is calculated by the following equation, with output values ranging from −∞ to +∞:

$$Z = B_0 + B_1X_1 + B_2X_2 + \cdots + B_nX_n \qquad (11)$$

where n is the number of independent variables; B_i (i = 1, 2, 3, ..., n) are the logistic regression coefficients and X_i are the values of the n explanatory variables; and B_0 is a constant. Because the values of S_ij were obtained from the IOE model and the dimension of S_ij is uniform, it can avoid the linear correlation between landslides and explanatory variables and also reduce the noise in modeling. In this study, the 10 explanatory variables were reclassified with the corresponding S_ij values. Then, the values of S_ij were regarded as the input data to build the hybrid model (LR–IOE) through the forward stepwise method to calculate B_0 and B_i.
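How a fitted logistic regression turns reclassified inputs into a probability can be sketched as follows. The sketch assumes the standard logistic form P = 1 / (1 + e^(−Z)) with Z a linear combination of the inputs; the coefficient values, the intercept, and the input values below are hypothetical and are not the fitted values of the LR–IOE model.

```python
import math

def landslide_probability(s_values, coefficients, intercept):
    """Logistic regression probability for one pixel.

    s_values: IOE-derived S value of each explanatory variable at the pixel.
    coefficients: regression coefficients, one per variable.
    intercept: the constant term.
    """
    z = intercept + sum(b * x for b, x in zip(coefficients, s_values))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical three-variable example.
p = landslide_probability(s_values=[0.239, 0.339, 0.617],
                          coefficients=[1.5, 2.0, 3.0],
                          intercept=-2.0)
print(0.0 < p < 1.0)  # True: the output is always a probability
```

Because the logistic function is monotonic in Z, ranking pixels by P is equivalent to ranking them by Z; the transformation only maps the score onto the 0 to 1 range used for the susceptibility index.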

4.4. Integration of Support Vector Machine and Index of Entropy Model

The basic theory of the support vector machine is to transform the input space into a high-dimensional space through an inner product function using the training data [76]. The support vectors are defined as the training samples that have the smallest distance from the optimal hyperplane [40]. In this study, SVM is designed to solve binary classification problems, which means that positive and negative samples exist at the same time. Consider a set of training vectors x_i (i = 1, 2, 3, ..., n) belonging to two classes denoted as y_i = ±1 [77]. SVM aims to find an n-dimensional hyperplane that distinguishes the two classes while keeping both classes as far from the hyperplane as possible. Mathematically, this can be expressed as minimizing:

$$\frac{1}{2}\|w\|^2 \qquad (12)$$

subject to the constraints:

$$y_i\left((w \cdot x_i) + k\right) \geq 1 \qquad (13)$$

where ‖w‖ stands for the norm of the hyperplane normal and k is a constant. By applying the Lagrangian multipliers (λ_i), the cost function can be written as:

$$L = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n}\lambda_i\left(y_i\left((w \cdot x_i) + k\right) - 1\right) \qquad (14)$$

In addition, slack variables ξ_i are applied to solve nonseparable problems [76]; thus, Equations (12) and (13) can be modified as:

$$L = \frac{1}{2}\|w\|^2 + \frac{1}{vn}\sum_{i=1}^{n}\xi_i \qquad (15)$$

$$y_i\left((w \cdot x_i) + k\right) \geq 1 - \xi_i \qquad (16)$$

where v accounts for misclassification, with values ranging from 0 to 1. In addition, by introducing a kernel function, a nonlinear decision boundary can be calculated. In the current research, the radial basis function (RBF), which is considered to be one of the most powerful kernels [78], is selected to calculate LSI_SVM and produce the landslide susceptibility map. The radial basis function is shown as follows:

$$K(x_i, x_j) = \exp\left(-\gamma\|x_i - x_j\|^2\right) \qquad (17)$$

where γ accounts for the width of the Gaussian kernel function [19]. Similarly, S_ij was used as the input data for the SVM model to build the new hybrid model (SVM–IOE).
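The role of the kernel width can be illustrated with a short Python sketch of the radial basis function. It assumes the common Gaussian form K(x, z) = exp(−γ‖x − z‖²); the input vectors and the γ values are hypothetical and are not the parameters selected in this study.

```python
import math

def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel between two feature vectors of equal length."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

# Two hypothetical pixels described by IOE-derived S values of three variables.
pixel_a = [0.239, 0.339, 0.617]
pixel_b = [0.067, 0.119, 0.036]

for gamma in (0.1, 1.0, 10.0):
    # A larger gamma makes the similarity decay faster with distance.
    print(gamma, round(rbf_kernel(pixel_a, pixel_b, gamma), 4))
```

The kernel equals 1 for identical vectors and approaches 0 as they move apart, so γ controls how local the resulting decision boundary is.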

4.5. The ROC Curve

To test the performance of the LSMs obtained by the three models, the receiver operating characteristic (ROC) curve was applied. Based on a series of different dichotomies (cutoffs or decision thresholds), the ROC curve plots 1 − specificity on the X-axis and sensitivity on the Y-axis, which can be expressed as:

$$X = 1 - \mathrm{specificity} = 1 - \frac{TN}{TN + FP} \qquad (18)$$

$$Y = \mathrm{sensitivity} = \frac{TP}{TP + FN} \qquad (19)$$

where TP represents true positive, TN is true negative, FP is false positive, and FN is false negative [79]. The quality of these three models in predicting the occurrence or non-occurrence of landslides can be measured by the area under the ROC curve (AUC) [9]. The AUC values range from 0 to 1; the closer the AUC value is to 1, the higher the accuracy of the model prediction. Conversely, if the AUC value is less than 0.5 and closer to 0, the model prediction has no practical value [80].
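The construction of the ROC curve and its area can be sketched in plain Python. This is a minimal illustration with hypothetical labels and scores, assuming every distinct score is used as a threshold and the area is obtained with the trapezoidal rule; it is not the software procedure used in the study.

```python
def roc_points(labels, scores):
    """Return (1 - specificity, sensitivity) pairs, one per score threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical validation points: 1 = landslide, 0 = nonlandslide.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # hypothetical susceptibility indices
print(round(auc(roc_points(labels, scores)), 4))  # 0.8889
```

In this toy example, eight of the nine landslide/nonlandslide pairs are ranked correctly, which is why the area equals 8/9.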

5. Results

5.1. Assessment of Explanatory Variables

In this study, the training dataset was used to evaluate the explanatory variables, and the Pearson correlation coefficient between each pair of explanatory variables was calculated (Table 2). According to the results, the lowest PCC value is −0.009, between altitude and NDVI, and the highest PCC value is 0.368, between slope aspect and distance to rivers. All PCC values are less than 0.7.
Table 2

Pearson correlation coefficient between pairs of explanatory variables.

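| Explanatory Variables | Slope Aspect | Slope Angle | Altitude | Lithology | Mean Annual Precipitation | Distance to Roads | Distance to Rivers | Distance to Faults | Land Use |
|---|---|---|---|---|---|---|---|---|---|
| Slope aspect | 1 | | | | | | | | |
| Slope angle | 0.037 | 1 | | | | | | | |
| Altitude | 0.116 | 0.003 | 1 | | | | | | |
| Lithology | 0.165 | 0.170 | 0.010 | 1 | | | | | |
| Mean annual precipitation | 0.140 | 0.100 | −0.021 | 0.025 | 1 | | | | |
| Distance to roads | 0.280 | 0.067 | 0.079 | 0.048 | 0.205 | 1 | | | |
| Distance to rivers | 0.368 | 0.104 | 0.112 | −0.010 | 0.004 | 0.160 | 1 | | |
| Distance to faults | 0.320 | 0.054 | −0.070 | 0.075 | 0.024 | 0.034 | 0.119 | 1 | |
| Land use | 0.123 | −0.116 | 0.087 | 0.053 | 0.287 | 0.050 | 0.084 | 0.019 | 1 |
| NDVI | 0.038 | 0.011 | −0.009 | 0.179 | 0.146 | −0.065 | −0.055 | 0.047 | 0.082 |
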
The calculation results of VIF and TOL are shown in Table 3. It can be observed that the maximum VIF value is 1.926 and the minimum TOL value is 0.519, which means all the explanatory variables can be applied for landslide susceptibility modeling.
Table 3

VIF and tolerances for explanatory variables.

| Explanatory Variables | Tolerance | VIF |
|---|---|---|
| Slope angle | 0.657 | 1.523 |
| Slope aspect | 0.962 | 1.040 |
| Altitude | 0.790 | 1.265 |
| Distance to rivers | 0.687 | 1.455 |
| Distance to roads | 0.573 | 1.746 |
| Distance to faults | 0.909 | 1.100 |
| NDVI | 0.770 | 1.298 |
| Land use | 0.910 | 1.099 |
| Lithology | 0.519 | 1.926 |
| Mean annual precipitation | 0.611 | 1.637 |

5.2. Result of IOE Model

The calculation method of W_j has already been described in Section 4.2, Equations (3)–(8), and the results are shown in Table 4. The FR values shown in Table 4 were used as the input data for slope aspect, lithology, and land use. For the remaining explanatory variables, the original (continuous) data were used as input data to compute the IOE values. Based on the obtained results, the landslide susceptibility index for the IOE model (LSI_IOE) was calculated using Equation (9).
Table 4

Spatial relationship between each landslide explanatory variable and landslide by the index of entropy (IOE) model.

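| Explanatory Variables | Classes | No. of Pixels in Domain | % of Domain | No. of Landslides | % of Landslides | FR_ij | S_ij | M_j | M_jmax | I_j | W_j | B_i |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Slope aspect | Flat | 736 | 0.021 | 0 | 0.000 | 0.000 | 0.000 | 2.870 | 3.170 | 0.095 | 0.084 | 0.061 |
| Slope aspect | North | 436,175 | 12.234 | 9 | 6.569 | 0.537 | 0.067 | | | | | |
| Slope aspect | Northeast | 478,233 | 13.413 | 21 | 15.328 | 1.143 | 0.143 | | | | | |
| Slope aspect | East | 453,979 | 12.733 | 9 | 6.569 | 0.516 | 0.065 | | | | | |
| Slope aspect | Southeast | 435,974 | 12.228 | 32 | 23.358 | 1.910 | 0.239 | | | | | |
| Slope aspect | South | 492,245 | 13.806 | 15 | 10.949 | 0.793 | 0.099 | | | | | |
| Slope aspect | Southwest | 471,646 | 13.229 | 25 | 18.248 | 1.379 | 0.173 | | | | | |
| Slope aspect | West | 413,514 | 11.598 | 13 | 9.489 | 0.818 | 0.103 | | | | | |
| Slope aspect | Northwest | 382,820 | 10.737 | 13 | 9.489 | 0.884 | 0.111 | | | | | |
| Slope angle (°) | 0–6.65 | 434,598 | 12.190 | 16 | 11.679 | 0.958 | 0.135 | 2.445 | 2.585 | 0.054 | 0.064 | 0.043 |
| Slope angle (°) | 6.65–11.40 | 954,012 | 26.758 | 31 | 22.628 | 0.846 | 0.119 | | | | | |
| Slope angle (°) | 11.40–16.39 | 937,524 | 26.296 | 25 | 18.248 | 0.694 | 0.098 | | | | | |
| Slope angle (°) | 16.39–22.09 | 640,546 | 17.966 | 28 | 20.438 | 1.138 | 0.161 | | | | | |
| Slope angle (°) | 22.09–29.45 | 349,550 | 9.804 | 14 | 10.219 | 1.042 | 0.147 | | | | | |
| Slope angle (°) | 29.45–60.57 | 249,092 | 6.987 | 23 | 16.788 | 2.403 | 0.339 | | | | | |
| Altitude (m) | 761–903 | 71,702 | 2.011 | 26 | 18.978 | 9.437 | 0.675 | 1.577 | 2.807 | 0.438 | 0.874 | −0.252 |
| Altitude (m) | 903–984 | 354,938 | 9.955 | 26 | 18.978 | 1.906 | 0.136 | | | | | |
| Altitude (m) | 984–1054 | 796,328 | 22.335 | 27 | 19.708 | 0.882 | 0.063 | | | | | |
| Altitude (m) | 1054–1124 | 851,004 | 23.869 | 26 | 18.978 | 0.795 | 0.057 | | | | | |
| Altitude (m) | 1124–1194 | 989,546 | 27.755 | 28 | 20.438 | 0.736 | 0.053 | | | | | |
| Altitude (m) | 1194–1262 | 487,438 | 13.672 | 4 | 2.920 | 0.214 | 0.015 | | | | | |
| Altitude (m) | 1262–1423 | 14,366 | 0.403 | 0 | 0.000 | 0.000 | 0.000 | | | | | |
| Lithology | Category A | 80,805 | 2.266 | 1 | 0.730 | 0.322 | 0.109 | 1.963 | 2.585 | 0.240 | 0.119 | −0.013 |
| Lithology | Category B | 650,270 | 18.239 | 14 | 10.219 | 0.560 | 0.189 | | | | | |
| Lithology | Category C | 2,029,316 | 56.918 | 115 | 83.942 | 1.475 | 0.497 | | | | | |
| Lithology | Category D | 736,194 | 20.649 | 6 | 4.380 | 0.212 | 0.072 | | | | | |
| Lithology | Category E | 65,704 | 1.843 | 1 | 0.730 | 0.396 | 0.134 | | | | | |
| Lithology | Category F | 3033 | 0.085 | 0 | 0.000 | 0.000 | 0.000 | | | | | |
| Mean annual precipitation (mm/y) | <360 | 63,468 | 1.780 | 2 | 1.460 | 0.820 | 0.081 | 2.357 | 2.807 | 0.160 | 0.232 | 0.239 |
| Mean annual precipitation (mm/y) | 360–380 | 630,456 | 17.683 | 5 | 3.650 | 0.206 | 0.020 | | | | | |
| Mean annual precipitation (mm/y) | 380–400 | 537,282 | 15.070 | 20 | 14.599 | 0.969 | 0.096 | | | | | |
| Mean annual precipitation (mm/y) | 400–420 | 850,900 | 23.866 | 22 | 16.058 | 0.673 | 0.066 | | | | | |
| Mean annual precipitation (mm/y) | 420–440 | 999,895 | 28.045 | 44 | 32.117 | 1.145 | 0.113 | | | | | |
| Mean annual precipitation (mm/y) | 440–460 | 451,402 | 12.661 | 39 | 28.467 | 2.248 | 0.222 | | | | | |
| Mean annual precipitation (mm/y) | >460 | 31,919 | 0.895 | 5 | 3.650 | 4.077 | 0.042 | | | | | |
| Distance to roads (m) | <200 | 385,498 | 10.812 | 77 | 56.204 | 5.198 | 0.617 | 1.609 | 2.322 | 0.307 | 0.517 | −0.533 |
| Distance to roads (m) | 200–400 | 311,580 | 8.739 | 20 | 14.599 | 1.670 | 0.198 | | | | | |
| Distance to roads (m) | 400–600 | 282,125 | 7.913 | 9 | 6.569 | 0.830 | 0.099 | | | | | |
| Distance to roads (m) | 600–800 | 248,289 | 6.964 | 4 | 2.920 | 0.419 | 0.050 | | | | | |
| Distance to roads (m) | >800 | 2,337,830 | 65.571 | 27 | 19.708 | 0.301 | 0.036 | | | | | |
| Distance to rivers (m) | <200 | 1,108,722 | 31.097 | 86 | 62.774 | 2.019 | 0.501 | 1.956 | 2.322 | 0.158 | 0.127 | −0.269 |
| Distance to rivers (m) | 200–400 | 881,383 | 24.721 | 26 | 18.978 | 0.768 | 0.191 | | | | | |
| Distance to rivers (m) | 400–600 | 642,145 | 18.011 | 12 | 8.759 | 0.486 | 0.121 | | | | | |
| Distance to rivers (m) | 600–800 | 389,497 | 10.925 | 7 | 5.109 | 0.468 | 0.116 | | | | | |
| Distance to rivers (m) | >800 | 543,575 | 15.246 | 6 | 4.380 | 0.287 | 0.071 | | | | | |
| Distance to faults (m) | <2000 | 526,624 | 14.771 | 19 | 13.869 | 0.939 | 0.190 | 2.251 | 2.322 | 0.030 | 0.030 | 0.110 |
| Distance to faults (m) | 2000–4000 | 459,271 | 12.882 | 10 | 7.299 | 0.567 | 0.115 | | | | | |
| Distance to faults (m) | 4000–6000 | 431,651 | 12.107 | 14 | 10.219 | 0.844 | 0.171 | | | | | |
| Distance to faults (m) | 6000–8000 | 344,339 | 9.658 | 20 | 14.599 | 1.512 | 0.307 | | | | | |
| Distance to faults (m) | >8000 | 1,803,437 | 50.583 | 74 | 54.015 | 1.068 | 0.217 | | | | | |
| Land use | Water | 13,266 | 0.372 | 0 | 0.000 | 0.000 | 0.000 | 1.258 | 2.322 | 0.458 | 0.974 | 0.061 |
| Land use | Residential areas | 86,117 | 2.415 | 25 | 18.248 | 7.555 | 0.711 | | | | | |
| Land use | Bare land | 1,780,712 | 49.945 | 71 | 51.825 | 1.038 | 0.098 | | | | | |
| Land use | Forest/Grassland | 1,317,845 | 36.963 | 17 | 12.409 | 0.336 | 0.032 | | | | | |
| Land use | Farmland | 367,382 | 10.304 | 24 | 17.518 | 1.700 | 0.160 | | | | | |
| NDVI | −0.39 to −0.019 | 278,430 | 7.809 | 40 | 19.197 | 3.739 | 0.577 | 1.779 | 2.322 | 0.234 | 0.303 | −0.354 |
| NDVI | −0.019 to 0.063 | 988,700 | 27.731 | 38 | 27.737 | 1.000 | 0.154 | | | | | |
| NDVI | 0.063–0.134 | 1,233,777 | 34.605 | 43 | 31.387 | 0.907 | 0.140 | | | | | |
| NDVI | 0.134–0.216 | 837,512 | 23.491 | 12 | 8.759 | 0.373 | 0.058 | | | | | |
| NDVI | 0.216–0.607 | 226,903 | 6.364 | 4 | 2.920 | 0.459 | 0.071 | | | | | |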

B0 is 2.345.

In the end, all 10 explanatory variables were used to build the IOE model, and the LSI_IOE values range from −10.37 to 11.67. LSI_IOE values reflect the probability of landslide occurrence: the closer the value is to 11.67, the higher the probability of landslide occurrence, and the closer the value is to −10.37, the lower the probability. Then, the natural break method was applied to classify the final LSM produced by the IOE model into four categories: low (−10.37 to −4.33), moderate (−4.33 to −1.65), high (−1.65 to 1.64), and very high (1.64 to 11.67) (Figure 3a). The low, moderate, high, and very high regions account for 31.24%, 16.39%, 33.23%, and 19.14% of the area, respectively.
Figure 3

Landslide susceptibility map derived from: (a) The IOE model; (b) logistic regression (LR)–IOE model; (c) support vector machine (SVM)–IOE model.

5.3. Result of LR–IOE Model

The calculation method of Z has already been described in Section 4.3, Equations (10) and (11). The S_ij values shown in Table 4 were used as the input data for all 10 explanatory variables through the reclassification method to build the LR–IOE model and to compute B_0 and B_i using SPSS software. Based on the results, Equation (11) was written with the fitted constant B_0 and the coefficients B_i reported in Table 4. Subsequently, the LSI_LR–IOE values were obtained, which range from 0.016 to 0.983. LSI_LR–IOE values reflect the probability of landslide occurrence: the closer the value is to 1, the higher the probability of landslide occurrence, and the closer the value is to 0, the lower the probability. Similarly, the natural break method was applied to classify the final LSM produced by the LR–IOE model into four categories: low (0.016–0.248), moderate (0.248–0.445), high (0.445–0.688), and very high (0.688–0.983) (Figure 3b). The low, moderate, high, and very high regions account for 16.77%, 33.06%, 21.05%, and 29.12% of the area, respectively.

5.4. Result of SVM–IOE Model

In the current research, the parameters of the radial basis function were selected by the grid search method with 10-fold cross validation, and then the entropy values were used as the input data to calculate the LSI_SVM–IOE values with the SVM–IOE model. The LSI_SVM–IOE values range from 0.061 to 0.984. The closer the value is to 1, the higher the probability of landslide occurrence, and the closer the value is to 0, the lower the probability. Then, the natural break method was applied to classify the final LSM produced by the SVM–IOE model into four categories: low (0.061–0.271), moderate (0.271–0.437), high (0.437–0.658), and very high (0.658–0.984) (Figure 3c). The low, moderate, high, and very high regions account for 15.08%, 29.56%, 33.39%, and 21.97% of the area, respectively.

5.5. Validation of Landslide Susceptibility Maps

In the current study, the ROC curve was used to validate and compare the performance of the IOE, LR–IOE, and SVM–IOE models. The final AUC values represent the success and prediction rate derived from the training and validating dataset, respectively. In the end, for success rate results, the AUC values for the IOE, LR–IOE, and SVM–IOE models were observed to be 0.8743, 0.9011, and 0.8653, respectively (Figure 4a). That is to say, the training accuracy of the susceptibility maps is 87.43%, 90.11%, and 86.53%, respectively. In terms of prediction rate results, the AUC values for the IOE, LR–IOE, and SVM–IOE models were found to be 0.7686, 0.8184, and 0.7661, respectively (Figure 4b). In other words, the prediction accuracy of the susceptibility maps is 76.86%, 81.84%, and 76.61%, respectively.
Figure 4

Receiver operating characteristics (ROC) curves of models: (a) Training dataset; (b) validating dataset.

Generally, both the success rate and the prediction rate indicate reasonable and practical accuracies for all three models in the current research. Among them, the LR–IOE model shows the best result.
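For readers who want to see what an AUC value measures, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen landslide point receives a higher susceptibility score than a randomly chosen nonlandslide point, with ties counted as one half. The function name and the toy scores are illustrative assumptions, not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for landslide points, 0 for nonlandslide points.
    scores: susceptibility index values for the same points.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example with made-up scores.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

Applied to the training points this gives the success rate, and applied to the validation points it gives the prediction rate. The pairwise loop is quadratic in the number of points, which is acceptable for a few hundred points such as those used here.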

6. Discussion

Spatial prediction of landslides is a critical step in landslide research, and its accuracy is affected by both the models used and the input data extracted from the explanatory variables. However, there is no definitive conclusion about the methods used to select and evaluate explanatory variables, so it is necessary to investigate methods that help to obtain reasonable conclusions. In this study, we calculated the IOE and PCC to assess 10 explanatory variables, and evaluated three classification models, namely, IOE, LR–IOE, and SVM–IOE, for landslide susceptibility mapping. According to Table 2, the PCC values among all 10 factors are less than 0.7, which means that these factors do not introduce noise from multicollinearity into landslide susceptibility modeling.

From the index of entropy (Table 4), we can see that residential areas have the highest value (7.555), which means that most landslides occurred in this land use class. We believe that the reason is the concentration of population and the intense human engineering activities in these areas. Similarly, the closer to a road, the higher the frequency of landslides. For the slope aspect, most landslides occurred on south-facing slopes; the reason may be the climate, and the same result was also reported by the authors of [37] (p. 82). Category C of lithology (siltstone, sandstone, mudstone, shale, coal seam, glutenite) is where the largest number of landslides occurred. This may be due to the softness of sandstone and siltstone structures and to strong weathering and erosion. For slope angle and mean annual precipitation, the rate of landslide occurrence is roughly proportional to the variable. The reason may be that the infiltration of a large amount of water increases the water content and weight of the rock and soil mass, and therefore its sliding force, and that the steeper the slope, the stronger the sliding force. Interestingly, as the values of distance to faults, distance to rivers, distance to roads, altitude, and NDVI increase, the IOE gradually decreases. The reason for this phenomenon is that road construction usually causes instability, while roads in the study region are generally built at low altitudes and away from faults. Vegetation roots are conducive to soil stability, while river erosion reduces slope stability. These conditions are roughly the same as those observed in the field.

In this study, the selection of explanatory variables was based on previous studies and field observations, which introduces a degree of subjective bias. In addition, although we calculated all the W values for the 10 explanatory variables, it is not clear how sensitive the method is to the number of classes and to the choice of break points. This will be the focus of future research.

As shown in Figure 4, the AUC value of the LR–IOE model is the highest among the three models for both the success rate and the prediction rate, which means that the LR–IOE model performs best in landslide susceptibility mapping in this study. The AUC value of the SVM–IOE model is the lowest, which may be because the SVM–IOE model depends more heavily on the selection of the kernel function, and there is no objective way to make that selection. In terms of the proportions of the final susceptibility classes (Figure 5), the proportion of high and very high regions obtained by the three models is about 52%. Among them, the LR–IOE model has the lowest proportion (50.17%), which implies an efficient result for the LR–IOE model that can also improve the efficiency of decision-making and reduce costs.
Figure 5

Percentages of different landslide susceptibility classes for the three models.
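The combined share of the high and very high classes discussed above can be checked from the area percentages reported for the LR–IOE and SVM–IOE maps. The short sketch below does only that arithmetic; the dictionary layout and function name are illustrative assumptions, and the IOE model is omitted because its class percentages are not given in this part of the text.

```python
# Area percentages in the order: low, moderate, high, very high,
# as reported in the results for the LR-IOE and SVM-IOE maps.
AREA_PERCENT = {
    "LR-IOE": [16.77, 33.06, 21.05, 29.12],
    "SVM-IOE": [15.08, 29.56, 33.39, 21.97],
}


def high_and_very_high(percentages):
    """Sum of the 'high' and 'very high' class shares, rounded to 2 decimals."""
    return round(percentages[2] + percentages[3], 2)


if __name__ == "__main__":
    for model, pct in AREA_PERCENT.items():
        total = round(sum(pct), 2)
        print(f"{model}: high + very high = {high_and_very_high(pct)}% "
              f"(all classes sum to {total}%)")
```

The LR–IOE figure reproduces the 50.17% quoted in the discussion, and both sets of class percentages sum to 100%.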

7. Conclusions

In the present study, the IOE model, LR–IOE model, and SVM–IOE model were used to obtain landslide susceptibility maps for the Fugu County of Shaanxi Province, China. Ten explanatory variables, namely, altitude, slope aspect, mean annual precipitation, slope angle, lithology, distance to roads, land use, distance to rivers, distance to faults, and NDVI, were selected and the potential multicollinearity problem among them was checked by PCC, VIF, and TOL. The results of the analysis showed that there are no potential multicollinearity problems between these 10 factors and that they are suitable for landslide susceptibility modeling. A total of 194 landslides were identified from extensive field investigations and historical landslide records, and 194 nonlandslide points were randomly generated. To build the models, 272 (70%) landslide and nonlandslide points were randomly selected and the remaining 116 (30%) landslide and nonlandslide points were used for validation. A natural break method was used to split the study region into four categories: low, moderate, high, and very high. Finally, the performance of the resulting landslide susceptibility maps was evaluated using AUC values. In terms of the success rate, the LR–IOE model has the highest training accuracy (90.11%), followed by the IOE model (87.43%) and the SVM–IOE model (86.53%). As for the prediction rate, the LR–IOE model has the highest prediction accuracy (81.84%), followed by the IOE model (76.86%) and the SVM–IOE model (76.61%). Thus, the results indicate that these three models perform well in landslide susceptibility mapping. The LR–IOE model performed best in this research and is the most suitable of the three for landslide susceptibility mapping in the study area. The results of this study provide useful information for the engineers, decision makers, and urban planners in this study region.
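The 70/30 partition of the 388 landslide and nonlandslide points can be sketched as a seeded random split. This is a minimal illustration: the point representation, the seed, and the function name are assumptions, and the split here is not stratified by class, whereas the study may have split landslide and nonlandslide points separately.

```python
import random


def split_points(points, train_fraction=0.7, seed=42):
    """Randomly split points into training and validation subsets."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder identifiers standing in for the 194 + 194 mapped points.
points = [("landslide", i) for i in range(194)]
points += [("nonlandslide", i) for i in range(194)]

train, valid = split_points(points)

if __name__ == "__main__":
    print(len(train), len(valid))
```

With 388 points and a 0.7 fraction, rounding gives 272 training points and 116 validation points, matching the counts stated above.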