Estimation of soil pH with geochemical indices in forest soils.

Wei Wu, Hong-Bin Liu.

Abstract

Soil pH is a critical soil quality index that controls soil microbial activity, soil nutrient availability, and plant root growth and development. The current study evaluates various pedotransfer functions for predicting soil pH from geochemical indices (CaO and the ratios of Al2O3, Fe2O3, TiO2, SiO2, MgO, and K2O to CaO) in forest soils. Empirical functions (quadratic, cubic, sigmoid, logarithmic) and artificial neural networks (ANNs) built on these geochemical indices were assessed with an independent testing set. Mean bias error (MBE), root mean square error (RMSE), mean absolute percentage error (MAPE), mean absolute error (MAE), coefficient of determination (R2), t-statistics (t-stat), and Akaike's Information Criterion (AIC) were used to evaluate model performance. Additionally, a new indicator, the global performance indicator (GPI), was introduced in this study and used to rank the models. According to GPI, the sigmoid functions and ANNs performed better than the other models. On average, they explained more than 70% of the variability in soil pH. Both model structure and dataset shape affected model performance. The best input was CaO for ANNs, sigmoid, and logarithmic functions. The ratios of K2O to CaO and Al2O3 to CaO were the best inputs for quadratic and cubic equations, respectively.

Year:  2019        PMID: 31613923      PMCID: PMC6793886          DOI: 10.1371/journal.pone.0223764

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Soil pH indicates soil acidity and alkalinity. Generally, slightly acidic soils are optimal for macro- and micronutrient availability [1]. Soil pH affects soil nutrients and plant growth and development [2]. It is a critical element for understanding soil nutrient availability and weathering as well as relationships between soil and biota. The relationship between soil pH and base saturation has been well studied. Some researchers observed a curvilinear relationship between soil pH and Ca saturation [3, 4]. Others reported a linear relationship between them [5, 6]. Soil CaO has been used together with other geochemical elements to predict soil pH. For example, Lukens et al. used ratios of Fe2O3, TiO2, and Al2O3 to CaO to predict soil pH with sigmoid functions [7]. The models produced similar prediction accuracy, with coefficients of determination between 0.7 and 0.74 and root mean square errors between 0.83 and 0.88. Nordt and Driese found that bulk soil CaO + MgO could be used to predict soil pH in Vertisols [8]. The prediction of soil pH from bulk soil elemental oxides is also an issue in pedotransfer functions. Because soil CaO is one source of Ca2+ supply to the soil solution, we believe that CaO by itself could be used to estimate soil pH. However, studies on this topic are limited. The objectives of the current study were to (1) evaluate various pedotransfer functions for predicting soil pH using several geochemical indices and (2) investigate the usefulness of soil CaO for predicting soil pH. To do this, five models with different geochemical indices were compared and tested. Specifically, artificial neural networks were evaluated with respect to the non-linear relationship between soil pH and the geochemical indices. Model performance was evaluated with an independent validation set.

Materials and methods

Study site

The study area covering 13326 km2 is located in the core region of the Three Gorges Reservoir of China (Fig 1). It has a humid subtropical monsoon climate with a mean annual precipitation of 1267 mm and a mean annual temperature of 16.02°C. The elevation varies between 175 and 2033 m with a mean of 643 m. The slope changes between 0.45° and 52.96° with a mean of 17.83°.
Fig 1

Maps of study area location and sample sites.

Data

A total of 1163 samples were collected from forest soils in the study area (Fig 1), where the major bedrock lithologies are carbonate rocks and sandstone and the soil type is Cambisols [9]. The study did not involve private land, protected land, or endangered or protected species. No specific permissions were required for these locations/activities. To ensure an even distribution of selected sites, systematic sampling on a regular grid was applied in this work [10]. Surface soils at 0–20 cm depth were collected at a density of 1 sample/km2. For each sampling site, 3 to 5 subsamples collected within 50 m of the site were mixed to represent the sample. All sampling locations were recorded by Global Positioning System (GPS). Standard measurements were performed on the soil samples. Prior to laboratory analysis, samples were air-dried and passed through a 2 mm soil sieve. Soil pH was determined at a soil-to-water ratio of 1:2.5 with a glass electrode. The oxides (Al2O3, Fe2O3, TiO2, SiO2, K2O, MgO, and CaO) were measured by inductively coupled plasma-optical emission spectrometry (ICP-OES) [10]. Ratios of Al2O3, Fe2O3, TiO2, SiO2, MgO, and K2O to CaO (hereafter AlCa, FeCa, TiCa, SiCa, MgCa, and KCa) and CaO were used to develop the pedotransfer functions to predict soil pH in forest soils [7]. These geochemical indices were calculated by

$$X\mathrm{Ca} = \frac{X}{X + \mathrm{CaO}} \times 100$$

where X represents Al2O3, Fe2O3, TiO2, SiO2, MgO, and K2O. All data were divided into calibration and validation sets for each dataset. Approximately 2/3 of the data were used to develop (or train) the models. The remaining 1/3 of the data were used to validate the models.
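The index calculation and the calibration/validation split can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the index form 100 · X / (X + CaO) is assumed because it reproduces the percent-scale values in Table 2, and the function names, the random shuffle, and the seed are hypothetical choices (the text does not say how samples were assigned to each set).

```python
import random

def ca_index(x_oxide, cao):
    """Index of oxide X relative to CaO, in percent (assumed form: 100 * X / (X + CaO))."""
    return 100.0 * x_oxide / (x_oxide + cao)

def split_calibration_validation(samples, frac_cal=2 / 3, seed=0):
    """Randomly split samples into calibration and validation sets (hypothetical scheme)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_cal = round(len(shuffled) * frac_cal)
    return shuffled[:n_cal], shuffled[n_cal:]

# Mean Al2O3 (14.4%) and median CaO (1.10%) from the data overview
alca = ca_index(14.4, 1.10)

# Split 1163 sample identifiers into roughly 2/3 and 1/3
cal_ids, val_ids = split_calibration_validation(range(1163))
```

With these inputs the AlCa index comes out near 93%, in line with the median AlCa reported in Table 2.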

Models

Both empirical functions (quadratic, cubic, sigmoid, and logarithmic) and artificial neural networks were tested in this work. The expressions of the empirical functions are given in Table 1. For the sigmoid function, parameters k and p are the minimum and range of the response, respectively.
Table 1

Empirical models used in the current study.

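| Name | Ab. | Equation |
|---|---|---|
| Quadratic | Q | $y = b_0 + b_1 x + b_2 x^2$ |
| Cubic | C | $y = b_0 + b_1 x + b_2 x^2 + b_3 x^3$ |
| Sigmoid (a) | Sig | $y = k + \dfrac{p}{1 + (x / b_0)^{b_1}}$ |
| Logarithmic | Log | $y = b_0 + b_1 \ln(x)$ |

(a) k and p are the minimum and range of the response, respectively.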
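The four empirical forms in Table 1 translate directly into code. The sketch below only evaluates the functions; it does not fit them, and every parameter value in the usage lines is a hypothetical placeholder rather than a fitted coefficient.

```python
import math

def quadratic(x, b0, b1, b2):
    return b0 + b1 * x + b2 * x ** 2

def cubic(x, b0, b1, b2, b3):
    return b0 + b1 * x + b2 * x ** 2 + b3 * x ** 3

def sigmoid(x, k, p, b0, b1):
    """k is the minimum of the response and p its range; b0 is the inflection point."""
    return k + p / (1.0 + (x / b0) ** b1)

def logarithmic(x, b0, b1):
    return b0 + b1 * math.log(x)

# Hypothetical parameter values, for illustration only
ph_quadratic = quadratic(2.0, 6.0, 0.4, -0.01)
ph_sigmoid_mid = sigmoid(50.0, 4.0, 4.0, 50.0, 3.0)  # at x = b0 the response is k + p / 2
ph_log = logarithmic(1.0, 7.0, 0.8)                  # ln(1) = 0, so the response is b0
```

At x = b0 the sigmoid returns k + p/2, which is why b0 can be read as the index value at which predicted pH sits halfway through its range.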

Artificial neural networks (ANNs), which are inspired by biological neural networks, are widely used tools in various fields [11-13]. ANNs can deal with both linear and non-linear relationships between variables [11, 12]. In the current study, ANNs with three layers (an input, a hidden, and an output layer) were tested and trained with the scaled conjugate gradient backpropagation algorithm (Fig 2). The output of a node is

$$y_j = f\left(\sum_{i} w_{ij} x_i + b_j\right)$$

where f is an activation function, y_j is the output of node j, x_i is the ith element of the input vector, w_ij is the weight connecting input x_i to node j, and b_j is the bias associated with node j. The parameters (weights and biases) are determined during the training stage based on a set of input data and targets. The tangent and linear activation functions were used in the hidden layer and output layer, respectively [14-17].
Fig 2

ANN structure.

Hidden layers with 2 to 20 neurons were tried. To train the ANNs, the calibration dataset was randomly divided into three subsets for training (70%), validating (15%), and testing (15%). The ANNs with the lowest root mean square error (RMSE) and the highest coefficient of determination (R2) were selected to predict soil pH from the geochemical indices. The number of parameters was calculated by [18]

$$k = (N_i + 1)\,N_h + (N_h + 1)\,N_o$$

where N_i, N_h, and N_o are the numbers of nodes in the input, hidden, and output layers, respectively, and 1 accounts for the bias.
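A three-layer network of this kind is small enough to write out by hand. The sketch below shows a forward pass and the parameter count; it is illustrative only and does not include training. It assumes that "tangent" means the hyperbolic tangent, that every hidden and output node has one bias, and all names and weight values are hypothetical.

```python
import math

def node_output(inputs, weights, bias, activation):
    """Output of one node: activation(sum_i w_i * x_i + b)."""
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)

def forward(x, hidden_weights, hidden_biases, out_weights, out_bias):
    """Forward pass: tanh hidden layer (assumed) followed by one linear output node."""
    hidden = [node_output(x, w, b, math.tanh)
              for w, b in zip(hidden_weights, hidden_biases)]
    return node_output(hidden, out_weights, out_bias, lambda z: z)

def n_parameters(n_in, n_hidden, n_out):
    """Weights plus biases of a fully connected input-hidden-output network."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out

# One input (e.g. CaO), 18 hidden nodes, one output (pH)
k_cao = n_parameters(1, 18, 1)

# Tiny hypothetical network: 1 input, 1 hidden node, 1 output
y = forward([0.0], [[1.0]], [0.0], [2.0], 1.0)  # tanh(0) = 0, so y equals the output bias
```

For a single input and 18 hidden nodes the count is 55 parameters, far more than the 3 or 4 coefficients of the empirical functions, which matters for any criterion that penalizes model size.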

Performance evaluation

Model performance can be evaluated by comparing predicted and measured data using a set of statistical error indicators. In this work, mean bias error (MBE), root mean square error (RMSE), mean absolute percentage error (MAPE), mean absolute error (MAE), coefficient of determination (R2), t-statistics (t-stat), and Akaike’s Information Criterion (AIC) [19] were employed to assess model performance on the independent validation set:

$$\mathrm{MBE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

$$t\text{-stat} = \sqrt{\frac{(n-1)\,\mathrm{MBE}^2}{\mathrm{RMSE}^2 - \mathrm{MBE}^2}}$$

where n is the number of observations, y_i and ŷ_i are the measured and estimated soil pH of the ith soil sample, respectively, ȳ is the mean value of the measured soil pH, and k is the number of model parameters, which AIC penalizes in addition to the error. MBE shows the overall under- or overestimation tendency. A negative value of MBE indicates overestimation by the model, and a positive value indicates underestimation. The most accurate model has an MBE close to zero, lower values of RMSE, MAPE, MAE, t-stat, and AIC, and a higher value of R2. Each statistical error indicator has its specific strengths and weaknesses. For example, RMSE is not a better indicator than MBE for evaluating average model performance [20]. However, MBE can be misleading when a model both over- and underestimates, because errors of opposite sign cancel. Therefore, to identify the best model based on the above-mentioned indicators, a new Global Performance Indicator (GPI) was introduced in this work. Each indicator is scaled to the range 0–1, with 0 being the best and 1 the worst. For indicators that can take either sign (e.g., MBE), absolute values are used in GPI. For indicators where lower is better (e.g., RMSE and MAPE), the minimum is scaled to 0 and the maximum to 1 (Eq 11). For indicators where higher is better (e.g., R2), the maximum is scaled to 0 and the minimum to 1 (Eq 12).

$$I_{ij} = \frac{P - P_{\min}}{P_{\max} - P_{\min}} \quad (11)$$

$$I_{ij} = \frac{P_{\max} - P}{P_{\max} - P_{\min}} \quad (12)$$

For the ith model, the GPI was defined as

$$\mathrm{GPI}_i = \sum_{j=1}^{m} I_{ij}$$

where P is the performance indicator, Pmax and Pmin are the maximum and minimum of P for the corresponding indicator across the evaluated models, Iij is the scaled value of indicator j for the ith model, and m is the number of performance indicators. Models with GPI closer to zero perform better.
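The error indicators and the GPI ranking described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the indicator formulas are the standard definitions (AIC is left out because its closed form is not recoverable from the text, so tabulated AIC values are used as given), GPI is taken as the sum of the scaled indicators, and absolute values are applied only to MBE. Function and variable names are hypothetical. The usage example feeds in the seven sigmoid rows of Table 6.

```python
import math

def error_indicators(y, y_hat):
    """MBE, RMSE, MAPE, MAE, R2 and t-stat for measured y and predicted y_hat."""
    n = len(y)
    resid = [obs - pred for obs, pred in zip(y, y_hat)]
    mbe = sum(resid) / n
    mse = sum(r * r for r in resid) / n
    y_bar = sum(y) / n
    denom = mse - mbe ** 2  # zero when all residuals are identical
    return {
        "MBE": mbe,
        "RMSE": math.sqrt(mse),
        "MAPE": sum(abs(r / obs) for r, obs in zip(resid, y)) / n,
        "MAE": sum(abs(r) for r in resid) / n,
        "R2": 1.0 - (mse * n) / sum((obs - y_bar) ** 2 for obs in y),
        "t-stat": math.sqrt((n - 1) * mbe ** 2 / denom) if denom > 0 else math.inf,
    }

def gpi(table, higher_is_better=("R2",), signed=("MBE",)):
    """Sum of min-max scaled indicators per model; 0 is best for every indicator."""
    scores = dict.fromkeys(table, 0.0)
    for name in next(iter(table.values())):
        values = {m: abs(row[name]) if name in signed else row[name]
                  for m, row in table.items()}
        lo, hi = min(values.values()), max(values.values())
        if hi == lo:
            continue  # indicator does not discriminate between models
        for model, v in values.items():
            scaled = (hi - v) if name in higher_is_better else (v - lo)
            scores[model] += scaled / (hi - lo)
    return scores

# Sigmoid rows of Table 6: MBE, RMSE, MAPE, MAE, R2, AIC, t-stat
names = ("MBE", "RMSE", "MAPE", "MAE", "R2", "AIC", "t-stat")
rows = {
    "AlCa": (-0.065, 0.557, 0.063, 0.435, 0.75, -1.135, 1.988),
    "FeCa": (-0.043, 0.625, 0.072, 0.494, 0.68, -0.904, 1.161),
    "SiCa": (-0.072, 0.550, 0.062, 0.429, 0.76, -1.159, 2.239),
    "TiCa": (0.003, 0.593, 0.066, 0.453, 0.70, -1.012, 0.085),
    "MgCa": (-0.005, 0.701, 0.079, 0.541, 0.59, -0.677, 0.125),
    "KCa": (-0.027, 0.564, 0.062, 0.429, 0.74, -1.110, 0.800),
    "CaO": (0.033, 0.533, 0.057, 0.397, 0.76, -1.231, 1.045),
}
sigmoid_table = {model: dict(zip(names, vals)) for model, vals in rows.items()}
scores = gpi(sigmoid_table)
best_input = min(scores, key=scores.get)  # CaO, with a GPI of about 0.88

# Small made-up example for the indicators themselves
demo = error_indicators([7.0, 6.0, 8.0], [6.5, 6.0, 8.5])
```

Under these assumptions the sketch reproduces the tabulated GPI for sigmoid with CaO (about 0.88) and for sigmoid with MgCa (about 5.05) to within rounding, which supports reading GPI as a plain sum computed within each function group.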

Statistical analysis

A one-way analysis of variance (ANOVA) was used to test the difference in variables between calibration and validation sets. Pearson’s correlation coefficients were calculated to determine the strength of correlations between soil pH and geochemical indices. The analyses of descriptive statistics were performed in SPSS v13.0. Model development and validation were done by MATLAB v9.0.
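For readers without SPSS, the two test statistics named above are straightforward to compute by hand. The sketch below uses only the Python standard library and hypothetical function names; it returns the one-way ANOVA F statistic and Pearson's r but not their p values, which require the F and t distributions.

```python
import math

def one_way_anova_f(*groups):
    """F statistic of a one-way ANOVA: between-group over within-group mean square."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Made-up numbers, for illustration only
f_stat = one_way_anova_f([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
r_up = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_down = pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

With two groups, as in the calibration versus validation comparison, the F statistic equals the square of the pooled-variance two-sample t statistic.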

Results

Data overview

On average, the soils were neutral. Soil pH varies between 4.34 and 8.7 with a mean of 7.16 (Table 2). CaO mainly ranged between 0 and 30% (mean = 2.63%), Al2O3 between 12 and 15% (mean = 14.4%), Fe2O3 between 3 and 6% (mean = 5.2%), TiO2 between 0.5 and 0.8% (mean = 0.75%), SiO2 between 50 and 70% (mean = 62.9%), MgO between 0 and 2% (mean = 1.9%), K2O between 2.2 and 2.7% (mean = 2.5%) (Fig 3). In terms of coefficient of variation (CV%), soil pH showed low variability (< 25%). Among the geochemical indices, SiCa and AlCa presented low variability (< 25%), FeCa, TiCa, MgCa, KCa showed medium variability (25% - 75%) and CaO presented high variability (> 75%).
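The variability classes used above follow from the coefficient of variation, CV% = 100 × standard deviation / mean. A minimal sketch with hypothetical function names, using the thresholds stated in the text (the handling of values exactly at 25% or 75% is an assumption):

```python
def cv_percent(std_dev, mean):
    """Coefficient of variation in percent."""
    return 100.0 * std_dev / mean

def variability_class(cv):
    """Low (< 25%), medium (25% to 75%) or high (> 75%) variability."""
    if cv < 25.0:
        return "low"
    if cv <= 75.0:
        return "medium"
    return "high"

# Standard deviation and mean of soil pH, CaO and TiCa as reported in Table 2
cv_ph = cv_percent(1.09, 7.16)
class_ph = variability_class(cv_ph)
class_cao = variability_class(cv_percent(4.05, 2.63))
class_tica = variability_class(cv_percent(21.32, 39.57))
```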
Table 2

Descriptive statistics of soil pH and geochemical indices (N = 1163).

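| Variable | Min | Max | Median | Mean | Std. Dev | CV% |
|---|---|---|---|---|---|---|
| pH | 4.34 | 8.7 | 7.46 | 7.16 | 1.09 | 15.22 |
| CaO (%) | 0.08 | 29.98 | 1.10 | 2.63 | 4.05 | 153.77 |
| AlCa (%) | 24.62 | 99.54 | 93.09 | 87.55 | 13.59 | 15.52 |
| FeCa (%) | 10.09 | 98.53 | 82.67 | 75.48 | 19.05 | 25.24 |
| TiCa (%) | 1.35 | 91.00 | 39.71 | 39.57 | 21.32 | 53.86 |
| SiCa (%) | 48.65 | 99.87 | 98.31 | 95.71 | 6.93 | 7.24 |
| MgCa (%) | 4.46 | 92.26 | 61.57 | 55.87 | 17.23 | 30.84 |
| KCa (%) | 5.73 | 95.99 | 70.33 | 63.4 | 21.35 | 33.67 |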
Fig 3

Histogram plots for the geochemical elements.

Soil pH showed significant correlation with these geochemical indices (Table 3 and Fig 4).
Table 3

Pearson’s correlation coefficients between soil pH and geochemical indices (p<0.01).

| CaO (%) | AlCa (%) | FeCa (%) | TiCa (%) | SiCa (%) | MgCa (%) | KCa (%) |
|---|---|---|---|---|---|---|
| 0.5 | -0.61 | -0.68 | -0.83 | -0.49 | -0.71 | -0.76 |
Fig 4

Relationships between soil pH and the geochemical indices.

Differences in soil pH and geochemical indices between calibration and validation sets were given in Table 4. Results of ANOVA indicated that there was no significant difference in these variables between calibration and validation sets.
Table 4

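Differences in soil pH and geochemical indices between calibration and validation sets (N = 877 for calibration (Cal) and N = 286 for validation (Val)).

| Item | Set | Min | Max | Median | Mean | Std.Dev | F | p value |
|---|---|---|---|---|---|---|---|---|
| pH | Cal | 4.52 | 8.6 | 7.41 | 7.14 | 1.09 | 0.954 | 0.329 |
| | Val | 4.34 | 8.7 | 7.57 | 7.21 | 1.08 | | |
| CaO (%) | Cal | 0.11 | 29.98 | 1.10 | 2.62 | 4.04 | 0.014 | 0.904 |
| | Val | 0.08 | 24.74 | 1.11 | 2.66 | 4.08 | | |
| AlCa (%) | Cal | 24.62 | 99.03 | 93.18 | 87.57 | 13.67 | 0.008 | 0.929 |
| | Val | 31.14 | 99.54 | 93.07 | 87.48 | 13.38 | | |
| FeCa (%) | Cal | 10.09 | 97.46 | 82.67 | 75.55 | 19.12 | 0.048 | 0.827 |
| | Val | 13.79 | 98.53 | 82.67 | 75.26 | 18.85 | | |
| TiCa (%) | Cal | 1.35 | 85.70 | 39.71 | 39.65 | 21.25 | 0.050 | 0.823 |
| | Val | 1.82 | 91.00 | 39.7 | 39.33 | 21.54 | | |
| SiCa (%) | Cal | 48.65 | 99.87 | 98.31 | 95.72 | 6.90 | 0.015 | 0.902 |
| | Val | 57.80 | 99.87 | 98.3 | 95.66 | 7.00 | | |
| MgCa (%) | Cal | 4.46 | 91.72 | 61.73 | 56.03 | 0.58 | 0.284 | 0.594 |
| | Val | 6.83 | 92.26 | 61.07 | 55.4 | 1.05 | | |
| KCa (%) | Cal | 5.73 | 94.31 | 70.2 | 63.52 | 0.72 | 0.105 | 0.746 |
| | Val | 7.08 | 95.99 | 70.66 | 63.04 | 1.27 | | |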

Model calibration

The coefficients of determination (R2) of the models developed on the calibration set are given in Table 5. ANNs with 18, 7, 11, 7, 14, 19, and 15 hidden nodes were applied to estimate soil pH using CaO, AlCa, FeCa, SiCa, TiCa, MgCa, and KCa, respectively (Fig 5). On average, ANN produced the highest value of R2 (0.73), followed by the sigmoid (R2 = 0.7) and cubic (R2 = 0.63) equations. The values of R2 ranged between 0.21 (p < 0.01, logarithmic equation with SiCa) and 0.77 (p < 0.01, ANN with SiCa).
Table 5

Model calibration (N = 877, p<0.01).

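| Input | Function | b0 | b1 | b2 | b3 | R2 |
|---|---|---|---|---|---|---|
| CaO | Quadratic | 6.4103 | 0.4148 | -0.0156 | | 0.43 |
| | Cubic | 6.0427 | 0.8099 | -0.0684 | 0.0016 | 0.56 |
| | Sigmoid | 0.6823 | 1.3914 | | | 0.74 |
| | Logarithmic | 6.9224 | 0.7888 | | | 0.64 |
| | ANN | | | | | 0.76 |
| AlCa | Quadratic | 1.281 | 0.2493 | -0.002 | | 0.59 |
| | Cubic | 24.0871 | -0.8736 | 0.0151 | -0.00008 | 0.71 |
| | Sigmoid | 94.9733 | 31.04 | | | 0.73 |
| | Logarithmic | 20.2979 | -2.9535 | | | 0.3 |
| | ANN | | | | | 0.77 |
| FeCa | Quadratic | 6.3399 | 0.1023 | -0.0011 | | 0.65 |
| | Cubic | 9.7379 | -0.1244 | 0.0031 | -0.00002 | 0.69 |
| | Sigmoid | 87.0113 | 10.6616 | | | 0.67 |
| | Logarithmic | 14.689 | -1.7664 | | | 0.34 |
| | ANN | | | | | 0.7 |
| TiCa | Quadratic | 8.6033 | -0.0276 | -0.00018 | | 0.69 |
| | Cubic | 8.2332 | 0.02226 | -0.00167 | 0.000012 | 0.7 |
| | Sigmoid | 51.1643 | 2.465 | | | 0.7 |
| | Logarithmic | 10.4478 | -0.9595 | | | 0.51 |
| | ANN | | | | | 0.72 |
| SiCa | Quadratic | -22.1396 | 0.8114 | -0.0053 | | 0.42 |
| | Cubic | -2.4129 | 0 | 0.0055 | -0.000046 | 0.45 |
| | Sigmoid | 98.8379 | 120.5882 | | | 0.73 |
| | Logarithmic | 34.5359 | -6.0105 | | | 0.21 |
| | ANN | | | | | 0.77 (a) |
| MgCa | Quadratic | 7.6946 | 0.0516 | -0.001 | | 0.6 |
| | Cubic | 7.4604 | 0.0718 | -0.0015 | 3.352E-6 | 0.6 |
| | Sigmoid | 67.2264 | 5.7029 | | | 0.6 |
| | Logarithmic | 13.2347 | -1.5416 | | | 0.37 |
| | ANN | | | | | 0.61 |
| KCa | Quadratic | 7.4643 | 0.0595 | -0.0009 | | 0.73 |
| | Cubic | 8.1355 | 0.0045 | 0.0003 | -7.469E-6 | 0.73 |
| | Sigmoid | 77.0693 | 7.1629 | | | 0.73 |
| | Logarithmic | 12.8597 | -1.4093 | | | 0.41 |
| | ANN | | | | | 0.76 |

(a) Box in grey denotes the highest value of R2.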

Fig 5

Root mean square error (RMSE) and coefficient of determination (R2) for ANNs with different numbers of hidden nodes (The black box indicates the lowest value of RMSE or highest value of R2).

Model performance

Performances of the models were evaluated based on the validation set and the statistical error indicators were shown in Table 6. On average, all models except sigmoid functions presented underestimation tendency according to MBE. In terms of MAPE, models gave good estimation of soil pH (mean MAPE = 7.4%). ANN and sigmoid models could explain above 70% of the variability in soil pH (R2 = 0.73 and 0.71, respectively). Logarithmic model performed worst with the highest values of MBE, RMSE, MAPE, MAE, AIC, and the lowest values of R2. ANN gave the best estimations of soil pH according to RMSE, MAPE, MAE, t-stat, and R2. Sigmoid model performed best based on AIC and MBE. The geochemical indices gave varied prediction performances with models. For example, SiCa produced the highest R2 in ANNs, KCa in quadratic and cubic functions, CaO in logarithmic and sigmoid models. Lukens et al. [7] predicted soil pH by AlCa, FeCa, and TiCa using sigmoid models. They reported that TiCa and FeCa gave slightly better performances than AlCa. In the current work, CaO, AlCa, SiCa, and KCa produced better predictions of soil pH than FeCa and TiCa using sigmoid functions based on R2.
Table 6

Model performance (N = 286).

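| Fun. | Input | MBE | RMSE | MAPE | MAE | R2 | AIC | t-stat | GPI | Rank |
|---|---|---|---|---|---|---|---|---|---|---|
| ANN | AlCa | 0.014 | 0.514 | 0.054 | 0.371 | 0.78 | -0.446 | 0.472 | 1.01 | 2 |
| | FeCa | 0.007 | 0.587 | 0.064 | 0.439 | 0.71 | -0.226 | 0.207 | 2.84 | 5 |
| | SiCa | 0.024 | 0.508 | 0.054 | 0.365 | 0.78 | -0.455 | 0.794 | 1.21 | 3 |
| | TiCa | 0.039 | 0.565 | 0.061 | 0.414 | 0.73 | -0.195 | 1.165 | 3.3 | 6 |
| | MgCa | 0.034 | 0.679 | 0.075 | 0.507 | 0.61 | -0.362 | 0.847 | 5.52 | 7 |
| | KCa | 0.069 | 0.534 | 0.057 | 0.393 | 0.76 | -0.925 | 2.218 | 2.6 | 4 |
| | CaO | 0.014 | 0.512 | 0.054 | 0.367 | 0.78 | -0.557 | 0.476 | 0.81 | 1 |
| Q | AlCa | 0.047 | 0.698 | 0.085 | 0.589 | 0.59 | -0.692 | 1.132 | 3.08 | 4 |
| | FeCa | 0.04 | 0.638 | 0.074 | 0.514 | 0.65 | -0.87 | 1.059 | 1.55 | 2 |
| | SiCa | 0.068 | 0.825 | 0.104 | 0.71 | 0.42 | -0.356 | 1.389 | 6.48 | 7 |
| | TiCa | 0.061 | 0.591 | 0.065 | 0.446 | 0.71 | -1.024 | 1.743 | 2.33 | 3 |
| | MgCa | 0.058 | 0.694 | 0.077 | 0.527 | 0.5 | -0.702 | 1.412 | 3.72 | 5 |
| | KCa | 0.052 | 0.556 | 0.061 | 0.42 | 0.74 | -1.147 | 1.579 | 1.19 | 1 |
| | CaO | 0.066 | 0.815 | 0.102 | 0.7 | 0.44 | -0.389 | 1.377 | 6.19 | 6 |
| C | AlCa | 0.036 | 0.579 | 0.067 | 0.465 | 0.72 | -1.056 | 1.043 | 1.21 | 1 |
| | FeCa | 0.046 | 0.595 | 0.065 | 0.447 | 0.7 | -1.004 | 1.308 | 2.23 | 3 |
| | SiCa | 0.051 | 0.721 | 0.089 | 0.61 | 0.56 | -0.619 | 1.206 | 6 | 7 |
| | TiCa | 0.051 | 0.584 | 0.063 | 0.431 | 0.71 | -1.04 | 1.484 | 2.42 | 4 |
| | MgCa | 0.056 | 0.694 | 0.077 | 0.529 | 0.59 | -0.695 | 1.37 | 5.26 | 5 |
| | KCa | 0.054 | 0.543 | 0.058 | 0.403 | 0.75 | -1.185 | 1.689 | 1.9 | 2 |
| | CaO | 0.048 | 0.71 | 0.088 | 0.601 | 0.57 | -0.658 | 1.151 | 5.49 | 6 |
| Log | AlCa | 0.072 | 0.919 | 0.117 | 0.797 | 0.28 | -0.148 | 1.326 | 5.88 | 6 |
| | FeCa | 0.07 | 0.885 | 0.112 | 0.765 | 0.34 | -0.223 | 1.343 | 5.33 | 5 |
| | SiCa | 0.068 | 0.976 | 0.125 | 0.85 | 0.19 | -0.027 | 1.179 | 6.32 | 7 |
| | TiCa | 0.063 | 0.769 | 0.095 | 0.654 | 0.5 | -0.505 | 1.4 | 3.43 | 3 |
| | MgCa | 0.046 | 0.861 | 0.108 | 0.736 | 0.61 | -0.278 | 0.903 | 2.75 | 2 |
| | KCa | 0.059 | 0.845 | 0.106 | 0.726 | 0.4 | -0.317 | 1.175 | 4.01 | 4 |
| | CaO | 0.056 | 0.642 | 0.076 | 0.53 | 0.65 | -0.873 | 1.486 | 1.38 | 1 |
| Sig | AlCa | -0.065 | 0.557 | 0.063 | 0.435 | 0.75 | -1.135 | 1.988 | 2.68 | 4 |
| | FeCa | -0.043 | 0.625 | 0.072 | 0.494 | 0.68 | -0.904 | 1.161 | 4.06 | 6 |
| | SiCa | -0.072 | 0.55 | 0.062 | 0.429 | 0.76 | -1.159 | 2.239 | 2.69 | 5 |
| | TiCa | 0.003 | 0.593 | 0.066 | 0.453 | 0.7 | -1.012 | 0.085 | 1.89 | 3 |
| | MgCa | -0.005 | 0.701 | 0.079 | 0.541 | 0.59 | -0.677 | 0.125 | 5.05 | 7 |
| | KCa | -0.027 | 0.564 | 0.062 | 0.429 | 0.74 | -1.11 | 0.8 | 1.67 | 2 |
| | CaO | 0.033 | 0.533 | 0.057 | 0.397 | 0.76 | -1.231 | 1.045 | 0.88 | 1 |
| Mean | Q | 0.056 | 0.688 | 0.081 | 0.558 | 0.58 | -0.740 | 1.384 | 4.16 | 4 |
| | Cubic | 0.049 | 0.632 | 0.072 | 0.498 | 0.66 | -0.894 | 1.322 | 2.79 | 3 |
| | Log | 0.062 | 0.842 | 0.106 | 0.723 | 0.42 | -0.339 | 1.259 | 6.75 | 5 |
| | Sig | -0.025 | 0.589 | 0.066 | 0.454 | 0.71 | -1.033 | 1.063 | 0.82 | 1 |
| | ANN | 0.029 | 0.557 | 0.060 | 0.408 | 0.73 | -0.452 | 0.883 | 0.93 | 2 |
| Overall mean | | 0.034 | 0.629 | 0.074 | 0.507 | 0.59 | -0.566 | 1.184 | | |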

Box in grey presented the best performance suggested by the corresponding error indicator.

The statistical error indicators did not agree on which model was most accurate. For example, ANN with SiCa was the best in terms of RMSE, MAPE, MAE, and R2. The sigmoid function with TiCa performed best based on MBE and t-stat. Cubic with KCa was the best according to AIC. Because the statistical error indicators did not always give consistent results, the GPI was introduced and calculated by combining them. The ranking of the models according to GPI is reported in Table 6. On average, GPI ranked the sigmoid model, ANN, and cubic function 1st, 2nd, and 3rd, respectively. Ranking by GPI was considered more reliable than ranking by any single indicator because it combines all the performance tests. GPIs were also calculated within each model type. The geochemical indices performed differently across the evaluated models. CaO ranked 1st in ANNs, sigmoid, and logarithmic functions. KCa ranked 1st in quadratic models. Therefore, CaO and KCa were the best inputs to predict soil pH for both ANNs and the empirical equations over the study site. Scatter plots of the observed and predicted soil pH by ANN with CaO and sigmoid with CaO are given in Fig 6. Statistics of the validation results are listed in Table 7. For both models, the maximum pH values were underestimated while the minimums were overestimated. There was no significant difference in soil pH between observations and predictions for the two models.
Fig 6

Scatter plot of the observed and predicted soil pH by (a) artificial neural network with CaO and (b) sigmoid with CaO.

The red dash line is the 1:1 line.

Table 7

Statistics of validation results (N = 286).

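| pH | Min | Max | Median | Mean | Std.Dev | F | p value |
|---|---|---|---|---|---|---|---|
| Observation | 4.34 | 8.7 | 7.57 | 7.21 | 1.08 | | |
| Predicted by ANN with CaO | 4.87 | 8.36 | 7.35 | 7.2 | 0.98 | 0.028 | 0.868 |
| Predicted by sigmoid with CaO | 4.72 | 8.57 | 7.22 | 7.18 | 1 | 0.143 | 0.706 |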

Discussion

On average, ANNs performed better than cubic, quadratic, and logarithmic functions. Among the empirical approaches, the sigmoid function was the best. Differences in model structure account for these differences in performance [21]. An ANN constructs a network of weighted nodes that are trained by a chosen algorithm. Compared with other models, the main advantages of ANNs are that 1) they are non-parametric techniques and do not need any model assumptions, and 2) they make no assumption about data distribution. However, ANNs are often criticized for their complex network structure, which makes the results difficult to interpret [22]. The indicator AIC, based on an “information-theoretical approach”, has been widely used for model selection [23-25]. In this case, ANNs produced higher values of AIC than the other models because of their larger number of model parameters. In addition, dataset shape also affects model performance, especially for the empirical functions. Their rank order was sigmoid > cubic > quadratic > logarithmic. The best input was CaO for ANNs, sigmoid, and logarithmic functions. The ratios of K2O to CaO and Al2O3 to CaO were the best inputs for quadratic and cubic equations, respectively. CaO and the ratios of elemental oxides to CaO could be used to predict soil pH because Ca2+ is the main driver affecting soil pH [7]. The sigmoid functions indicated that soil pH changes at different rates with the different geochemical indices. This was also shown by the scatter plots (Fig 4). The oxides that were more abundant than CaO had higher values of growth rate and inflection point (e.g., SiO2, Al2O3, Fe2O3) and vice versa (e.g., TiO2, MgO, K2O). Lukens et al. [7] stated that samples collected from calcareous soils could have relatively large values of FeCa or AlCa and compressed intervals at higher index values, where pH decreases as a function of Ca loss and Fe or Al gain. This could also explain the relationships between soil pH and the ratios of elemental oxides to CaO over the current study site. Soil pH is a key parameter for understanding soil weathering and relationships between soil nutrient availability and environmental factors. Weathering indices that incorporate Ca in some form could track soil pH. A recent study reported that soil pH values are closely correlated with water balance (mean annual precipitation minus mean annual potential evapotranspiration) at the global scale [26]. The pedotransfer functions and geochemical proxies compared and evaluated in the current study could be used to estimate significant environmental components of the past [7].

Conclusions

Various pedotransfer functions with different geochemical indices were applied to estimate soil pH in forest soils. The predicted data were compared with the measurements of an independent validation dataset. To do so, 7 statistical indicators were applied to test model performance. Moreover, a new accuracy factor, the Global Performance Indicator (GPI), was introduced in this study and used to rank the proposed models. The rank order was sigmoid > artificial neural network > cubic > quadratic > logarithmic. Soil CaO could be used to predict soil pH with ANNs, sigmoid, and logarithmic functions. KCa and AlCa were the best inputs for quadratic and cubic equations, respectively.

  3 in total

Review 1.  Variables selection methods in near-infrared spectroscopy.

Authors:  Zou Xiaobo; Zhao Jiewen; Malcolm J W Povey; Mel Holmes; Mao Hanpin
Journal:  Anal Chim Acta       Date:  2010-03-30       Impact factor: 6.558

2.  Water balance creates a threshold in soil pH at the global scale.

Authors:  E W Slessarev; Y Lin; N L Bingham; J E Johnson; Y Dai; J P Schimel; O A Chadwick
Journal:  Nature       Date:  2016-11-21       Impact factor: 49.962

3.  Feedforward neural network model estimating pollutant removal process within mesophilic upflow anaerobic sludge blanket bioreactor treating industrial starch processing wastewater.

Authors:  Philip Antwi; Jianzheng Li; Jia Meng; Kaiwen Deng; Frank Koblah Quashie; Jiuling Li; Portia Opoku Boadi
Journal:  Bioresour Technol       Date:  2018-02-20       Impact factor: 9.642
