Benjamin R Fitzpatrick, David W Lamb, Kerrie Mengersen.
Abstract
Modern soil mapping is characterised by the need to interpolate point referenced (geostatistical) observations and the availability of large numbers of environmental characteristics for consideration as covariates to aid this interpolation. Modelling tasks of this nature also occur in other fields such as biogeography and environmental science. This analysis employs the Least Angle Regression (LAR) algorithm for fitting Least Absolute Shrinkage and Selection Operator (LASSO) penalized Multiple Linear Regression models. This analysis demonstrates the efficiency of the LAR algorithm at selecting covariates to aid the interpolation of geostatistical soil carbon observations. Where an exhaustive search of the models that could be constructed from 800 potential covariate terms and 60 observations would be prohibitively demanding, LASSO variable selection is accomplished with trivial computational investment.
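To give a sense of scale for the exhaustive search mentioned in the abstract, the following minimal sketch counts candidate models. It assumes every subset of the 800 terms is a candidate model. As a second, purely illustrative assumption that is not stated in the source, it also counts only subsets with at most 59 terms (one fewer than the number of observations); the `max_size` parameter is hypothetical.

```python
from math import comb

def count_subsets(n_terms, max_size=None):
    """Number of distinct covariate subsets, including the empty model."""
    if max_size is None:
        return 2 ** n_terms
    return sum(comb(n_terms, k) for k in range(max_size + 1))

n_terms = 800  # potential covariate terms (from the abstract)
n_obs = 60     # observations (from the abstract)

all_models = count_subsets(n_terms)
# Hypothetical cap: at most n_obs - 1 terms per model (an assumption, not from the source)
capped_models = count_subsets(n_terms, max_size=n_obs - 1)

print(f"all subsets: about 10^{len(str(all_models)) - 1}")
print(f"subsets with at most {n_obs - 1} terms: about 10^{len(str(capped_models)) - 1}")
```

Even under the capped assumption the count is far beyond what could be fitted one model at a time, which is the contrast the abstract draws with LASSO variable selection.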
Year: 2016 PMID: 27603135 PMCID: PMC5014409 DOI: 10.1371/journal.pone.0162489
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Table 1. The 63 potential covariates.
| Source | Covariate Name | Acronym |
|---|---|---|
| ATV Top of Pasture Surveys (12 covariates from each of February, May & November = 36 covariates) | Soil Apparent Electrical Conductivity | ECA |
| | Near InfraRed Reflectance | NIR |
| | Red Reflectance | RED |
| | Simple Ratio | SR |
| | Difference Vegetation Index | DVI |
| | Normalized Difference Vegetation Index | NDVI |
| | Soil Adjusted Vegetation Index | SAVI |
| | Non-Linear Vegetation Index | NLVI |
| | Modified Non-Linear Vegetation Index | MNLVI |
| | Modified Simple Ratio | MSR |
| | Transformed Vegetation Index | TVI |
| | Re-normalised Difference Vegetation Index | RDVI |
| Terrain & Hydrology Metrics Calculated from 25 | Catchment Area | CatAr |
| | Catchment Height | CatHe |
| | Catchment Slope | CatSl |
| | Cosine(Aspect) | CosAsp |
| | Elevation | Elev |
| | Slope Length Factor | LSF |
| | Plan Curvature | PlanC |
| | Profile Curvature | ProfC |
| | Sky View Factor | SVF |
| | Slope | Slp |
| | Stream Power Index | SPI |
| | Terrain Ruggedness Index | TRI |
| | Topographic Position Index | TPI |
| | Vector Terrain Ruggedness | VTR |
| | Visible Sky | VS |
| | Wetness Index | WI |
| Foliar Projective Cover Layers = 2 Covariates | 2011 | FPCI |
| | 2012 | FPCII |
| Electromagnetic Channels = 6 Covariates | 1 to 6 | MagI to MagVI |
| | Potassium | K |
| | Thorium | Th |
| | Uranium | U |
Summary statistics for the absolute values of validation set element prediction error (VSEPE) distributions from each variable selection method conducted on design matrices filtered to enforce a maximum correlation coefficient magnitude between covariate pairs of 0.4 or 0.95 (|r| ≤ 0.4 or |r| ≤ 0.95).
The final column contains the coefficient of determination (R²) values for the model-averaged predictions (MAP) from the models resulting from the combinations of variable selection technique and design matrix filtering austerity specified by that row. LAR = Least Angle Regression Variable Selection, Exh = Exhaustive Search Variable Selection, Seq = Sequential Replacement Variable Selection, Fwd = Forward Stepwise Variable Selection, Bwd = Backward Stepwise Variable Selection, Min. = Minimum, 1st Qu. = First Quartile, 3rd Qu. = Third Quartile and Max. = Maximum.
| Method | Max. correlation magnitude | VSEPE Min. | VSEPE 1st Qu. | VSEPE Median | VSEPE Mean | VSEPE 3rd Qu. | VSEPE Max. | MAP R² |
|---|---|---|---|---|---|---|---|---|
| LAR | 0.95 | 1.332e-05 | 0.1482 | 0.3184 | 0.4744 | 0.5446 | 4.437 | 0.5963 |
| LAR | 0.40 | 1.097e-05 | 0.1517 | 0.3324 | 0.4776 | 0.5695 | 4.063 | 0.3666 |
| Exh | 0.40 | 5.571e-05 | 0.1644 | 0.3419 | 0.4964 | 0.5997 | 4.290 | 0.2882 |
| Seq | 0.40 | 5.571e-05 | 0.1677 | 0.3448 | 0.4960 | 0.6044 | 3.961 | 0.3055 |
| Fwd | 0.40 | 5.571e-05 | 0.1604 | 0.3392 | 0.4955 | 0.5994 | 4.063 | 0.3046 |
| Bwd | 0.40 | 1.036e-05 | 0.1654 | 0.3593 | 0.5053 | 0.6037 | 4.422 | 0.2382 |
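The columns of the table above can be illustrated with a minimal sketch that summarises absolute prediction errors and computes a coefficient of determination. The source does not state which quartile convention or which R² formula was used, so the interpolating ("inclusive") quartiles and the definition 1 - SS_res / SS_tot below are assumptions. The function names and the numbers are made up for illustration.

```python
from statistics import mean, quantiles

def abs_error_summary(observed, predicted):
    """Min, quartiles, mean and max of absolute prediction errors."""
    errors = sorted(abs(o - p) for o, p in zip(observed, predicted))
    # "inclusive" interpolates between order statistics (assumed convention)
    q1, median, q3 = quantiles(errors, n=4, method="inclusive")
    return {
        "min": errors[0], "q1": q1, "median": median,
        "mean": mean(errors), "q3": q3, "max": errors[-1],
    }

def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Made-up values for illustration only, not data from the study
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.0, 2.0, 4.5, 5.0]

summary = abs_error_summary(observed, predicted)
r2 = r_squared(observed, predicted)
print(summary, r2)
```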
Fig 1. Histograms depicting the distribution of subset sizes selected by each variable selection technique applied to training sets constructed from the 27 covariate design matrix.
LAR = Least Angle Regression Variable Selection, Exh = Exhaustive Search Variable Selection, Seq = Sequential Replacement Variable Selection, Fwd = Forward Stepwise Variable Selection, Bwd = Backward Stepwise Variable Selection, Min. = Minimum, 1st Qu. = First Quartile, 3rd Qu. = Third Quartile and Max. = Maximum.
The 15 most frequently selected covariates from the LAR variable selection executions on the 500 unique, 35 observation training sets constructed from the design matrix created by filtering the full design matrix to enforce a maximum permitted correlation coefficient magnitude between remaining covariate pairs of 0.95.
The second column contains the frequencies with which the selected covariates occurred in the 500 selected models. Accompanying each selected covariate in the final column are the covariates from the full design matrix that had correlation coefficient magnitudes with the covariate in question greater than 0.95 and thus were excluded from the design matrix supplied to the variable selection. Colons denote interaction terms for the two covariate terms which the colon separates. Numeric superscripts denote polynomial terms for the covariate indicated by the acronym. Acronyms are expanded in Table 1.
| Covariate | Freq | Correlated Covariates |
|---|---|---|
| ECA.Nov⁴ | 219 | - |
| LSF³ | 139 | Slp³, TRI³, LSF⁴, Slp⁴, TRI⁴ |
| DVI.May | 102 | SAVI.May, NLVI.May, MNLVI.May, RDVI.May |
| WI | 100 | - |
| ECA.Feb:Slp | 95 | ECA.Feb:TRI |
| Mag.II:FPCI | 95 | - |
| SVF:Mag.IV | 94 | - |
| Slp² | 89 | LSF:Slp, LSF:TRI, Slp:TRI, TRI:WI, TRI² |
| ECA.Feb:SR.May | 88 | ECA.Feb:NDVI.May, ECA.Feb:SAVI.May, ECA.Feb:MSR.May, ECA.Feb:TVI.May, ECA.Feb:RDVI.May |
| LSF:SVF | 82 | LSF:VTR, SVF:Slp, SVF:TRI |
| ECA.Nov:DVI.Nov | 78 | ECA.Nov:MNLVI.Nov |
| Elev:SVF | 76 | - |
| ECA.Feb:DVI.Nov | 74 | ECA.Feb:MNLVI.Nov, ECA.Feb:RDVI.Nov |
| ECA.Nov³ | 73 | - |
| ECA.Feb:Elev | 72 | - |
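The filtering step that produces the "Correlated Covariates" column can be sketched as a greedy pass over the candidate terms. This is only an illustration: the source does not say which member of a highly correlated pair is retained, so the rule used here (the first term encountered is kept) is an assumption, and `pearson`, `filter_correlated` and the toy values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient (columns must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def filter_correlated(columns, max_abs_r):
    """Keep a term unless |r| with an already kept term exceeds max_abs_r.

    columns: dict of term name -> list of values; insertion order sets priority.
    Returns (kept, excluded), where excluded maps each kept term to the
    terms that were dropped because of it.
    """
    kept, excluded = [], {}
    for name, values in columns.items():
        partner = next(
            (k for k in kept if abs(pearson(columns[k], values)) > max_abs_r),
            None,
        )
        if partner is None:
            kept.append(name)
            excluded[name] = []
        else:
            excluded[partner].append(name)
    return kept, excluded

# Toy values for illustration only, not data from the study
columns = {
    "Slp": [1.0, 2.0, 3.0, 4.0, 5.0],
    "TRI": [2.1, 3.9, 6.2, 8.0, 9.9],  # nearly collinear with Slp
    "WI": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept, excluded = filter_correlated(columns, max_abs_r=0.95)
print(kept, excluded)
```

Recording the displaced terms alongside each kept term mirrors the layout of the table, where a selected covariate also stands in for the terms it excluded.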
Fig 2. The frequencies with which covariate terms were selected across 500 selected models.
These selected models were obtained by applying the Least Angle Regression variable selection algorithm to training sets constructed by taking 35 observation subsets of a design matrix. This design matrix was produced by filtering the full design matrix to enforce a maximum permitted correlation coefficient magnitude between covariate pairs of 0.95. The curved lines (Poincaré segments) represent interaction terms between the covariates they connect. Covariate acronyms are expanded in Table 1.
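The resampling scheme described in this caption can be sketched as follows: draw 500 distinct 35-observation subsets from the 60 observations, run a variable selection routine on each, and tally how often each term is chosen. The selector below is a hypothetical stand-in, not the LAR algorithm, and all function names and the seed are assumptions made for illustration.

```python
import random
from collections import Counter

def unique_training_sets(n_obs, train_size, n_sets, seed=0):
    """Draw n_sets distinct index subsets of size train_size from range(n_obs)."""
    rng = random.Random(seed)
    seen = set()
    while len(seen) < n_sets:
        seen.add(tuple(sorted(rng.sample(range(n_obs), train_size))))
    return sorted(seen)

def selection_frequencies(training_sets, select_terms):
    """Count, per term, the number of training sets in which it was selected."""
    counts = Counter()
    for rows in training_sets:
        counts.update(set(select_terms(rows)))
    return counts

def toy_selector(rows):
    """Hypothetical placeholder for a real variable selection routine."""
    selected = ["A"]       # always selected
    if 0 in rows:          # selected only when observation 0 is in the training set
        selected.append("B")
    return selected

training_sets = unique_training_sets(n_obs=60, train_size=35, n_sets=500)
freqs = selection_frequencies(training_sets, toy_selector)
print(freqs.most_common())
```

In practice `toy_selector` would be replaced by a function that fits the chosen selection method to the rows of the design matrix indexed by `rows` and returns the names of the retained terms.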
Fig 3. The observed soil organic carbon percentages (%SOC) at the soil core locations have been represented by the shade filling the circles located at each of the soil core sample locations.
The observed %SOC values have been represented with the same grey scale as the predicted %SOC values and associated uncertainties in the rasters. (a) The sum of the covariate based predictions and the predictions from the model for the spatial component of the errors from the covariate based model. The more westerly pixel annotated with a vertical cross represents a predicted %SOC value of 17.92 and the more easterly pixel annotated with a vertical cross represents a predicted %SOC value of 9.54. (b) The uncertainty estimated to accompany the %SOC predictions. The three pixels annotated with vertical crosses represent estimates of the uncertainty associated with the model-averaged predicted %SOC values of 20.57, 21.66 and 43.66 units on the predicted %SOC scale. The estimated uncertainty of 43.66 is at the most westerly of these three pixels and the estimated uncertainty of 20.57 is at the most northerly.