Tomislav Hengl, Johan G B Leenaars, Keith D Shepherd, Markus G Walsh, Gerard B M Heuvelink, Tekalign Mamo, Helina Tilahun, Ezra Berkhout, Matthew Cooper, Eric Fegraus, Ichsani Wheeler, Nketia A Kwabena.
Abstract
Spatial predictions of soil macro- and micro-nutrient content across Sub-Saharan Africa at 250 m spatial resolution and for the 0–30 cm depth interval are presented. Predictions were produced for 15 target nutrients: organic carbon (C) and total (organic) nitrogen (N), total phosphorus (P), and extractable phosphorus (P), potassium (K), calcium (Ca), magnesium (Mg), sulfur (S), sodium (Na), iron (Fe), manganese (Mn), zinc (Zn), copper (Cu), aluminum (Al) and boron (B). Model training was performed using soil samples from ca. 59,000 locations (a compilation of soil samples from the AfSIS, EthioSIS, One Acre Fund, VitalSigns and legacy soil data) and an extensive stack of remote sensing covariates in addition to landform, lithologic and land cover maps. An ensemble model was then created for each nutrient from two machine learning algorithms (random forest and gradient boosting, as implemented in the R packages ranger and xgboost) and then used to generate predictions in a fully optimized computing system. Cross-validation revealed that, apart from S, P and B, significant models can be produced for most targeted nutrients (R-square between 40–85%). Further comparison with the OFRA field trial database shows that soil nutrients are indeed critical for agricultural development, with Mn, Zn, Al, B and Na appearing as the most important nutrients for predicting crop yield. The limiting factors for mapping nutrients using the existing point data in Africa appear to be (1) the high spatial clustering of sampling locations, and (2) the lack of more detailed parent material/geological maps. Logical steps towards improving prediction accuracies include: further collection of input (training) point samples, further harmonization of measurement methods, addition of more detailed covariates specific to Africa, and implementation of a full spatiotemporal statistical modeling framework.
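The ensemble step mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the predictions of the two learners are combined as a weighted average, with weights proportional to each model's cross-validation R-square. The abstract does not state the weighting rule, so that rule, the function name `ensemble_predict` and the example numbers are illustrative assumptions, not the authors' implementation (which used the R packages ranger and xgboost).

```python
def ensemble_predict(pred_rf, pred_gb, r2_rf, r2_gb):
    """Weighted average of two models' predictions.

    Assumption (not confirmed by the source): each model's weight is
    proportional to its cross-validation R-square.
    """
    if len(pred_rf) != len(pred_gb):
        raise ValueError("prediction vectors must have equal length")
    total = r2_rf + r2_gb
    if total <= 0:
        raise ValueError("at least one R-square must be positive")
    w_rf, w_gb = r2_rf / total, r2_gb / total
    return [w_rf * a + w_gb * b for a, b in zip(pred_rf, pred_gb)]


# Hypothetical predictions (ppm) at three locations, for illustration only
rf = [100.0, 200.0, 300.0]   # random forest
gb = [120.0, 180.0, 330.0]   # gradient boosting
combined = ensemble_predict(rf, gb, r2_rf=0.6, r2_gb=0.4)
```

With these made-up inputs the better-scoring model contributes 60% of each combined value; setting both R-square values equal reduces the sketch to a plain average.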
Keywords: Africa; Machine learning; Macro-nutrients; Micro-nutrients; Random forest; Soil nutrient map; Spatial prediction
Year: 2017 PMID: 33456317 PMCID: PMC7745107 DOI: 10.1007/s10705-017-9870-x
Source DB: PubMed Journal: Nutr Cycl Agroecosyst ISSN: 1385-1314 Impact factor: 3.270
Fig. 1 Combined histograms (at log-scale) for the soil macro-nutrients based on a compilation of soil samples for Sub-Saharan Africa
Fig. 2 Combined histograms (at log-scale) for the soil micro-nutrients based on a compilation of soil samples for Sub-Saharan Africa
Fig. 3 Comparison of spatial coverage of sampling locations for four nutrients: ext. P, ext. K, ext. Mg and ext. Fe. Data sources: AfSIS Sentinel Sites soil samples, EthioSIS soil samples, Africa Soil Profiles DB soil samples, IFDC-PBL soil samples, One Acre Fund soil samples, University of California soil samples and Vital Signs soil samples. See text for more details
Fig. 9 Accuracy assessment plots for all nutrients. Predictions derived using 5-fold cross-validation. All values expressed in ppm and displayed on a log-scale
Fig. 10 Predicted spatial distribution of the determined clusters (20) (above), and the corresponding map of scaled Shannon Entropy Index (below). High values in scaled Shannon Entropy Index indicate higher prediction uncertainty. Cluster centers are given in Table 3
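As a rough illustration of the uncertainty measure named in the Fig. 10 caption, the sketch below computes a scaled Shannon entropy from a vector of cluster memberships at one location. It assumes the common definition in which the entropy is divided by the logarithm of the number of classes, so that values fall between 0 (one cluster certain) and 1 (all clusters equally likely). The function name `scaled_shannon_entropy` and this exact scaling are assumptions; the caption does not give the formula.

```python
import math


def scaled_shannon_entropy(memberships):
    """Shannon entropy of class memberships, scaled to the range [0, 1].

    Assumption: scaling divides by log(k), where k is the number of
    classes; the source caption does not spell out the formula.
    """
    k = len(memberships)
    if k < 2:
        raise ValueError("need at least two classes")
    total = sum(memberships)
    if total <= 0:
        raise ValueError("memberships must sum to a positive value")
    # Normalize so the memberships sum to 1
    probs = [m / total for m in memberships]
    # Zero-probability classes contribute nothing to the entropy
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(k)


# Hypothetical memberships over 20 clusters
uniform = scaled_shannon_entropy([1.0] * 20)            # maximal uncertainty
certain = scaled_shannon_entropy([1.0] + [0.0] * 19)    # no uncertainty
```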
Class centers for 20 clusters determined using supervised fuzzy k-means clustering
| Cluster | org. C | org. N | K | P | P tot. | Ca | Mg | Na | S | Fe | Mn | Zn | Cu | B |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c1 | 247 | 13.7 | 1570 | 306 | 113 | 13 | 123 | 119 | 3.8 | 2.2 | 0.4 | |||
| c2 | 1840 | 280 | 12.4 | 344 | 321 | 49 | 97 | 250 | 0.7 | |||||
| c3 | 2230 | 286 | 115 | 78.7 | 366 | 463 | 114 | 48 | 29 | 134 | 128 | 5.3 | 2.7 | 0.9 |
| c4 | 3580 | 335 | 109 | 33.7 | 333 | 631 | 162 | 69 | 23 | 117 | 120 | 4.7 | 2.8 | 0.6 |
| c5 | 3090 | 383 | 211 | 27.9 | 449 | 1720 | 282 | 171 | 99 | 134 | 4.2 | 2.7 | 2.4 | |
| c6 | 1890 | 246 | 76.3 | 13.4 | 628 | 166 | 76 | 24 | 86 | 130 | 9.5 | 5.4 | 0.4 | |
| c7 | 1190 | 291 | 19.5 | 237 | 295 | 46 | 28 | 115 | 121 | 4.7 | 2.3 | |||
| c8 | 62 | 290 | 231 | 104 | 36 | 9 | 1.7 | |||||||
| c9 | 2840 | 372 | 335 | 21.3 | 303 | 3780 | 669 | 56 | 116 | 4.1 | 2.5 | 0.7 | ||
| c10 | 3870 | 439 | 297 | 27 | 444 | 3200 | 416 | 267 | 246 | 90 | 166 | 4.1 | 2.9 | 2.1 |
| c11 | 5780 | 704 | 607 | 2840 | 572 | 474 | 33 | 86 | 117 | 4.4 | 2.6 | 0.9 | ||
| c12 | 112 | 7.74 | 278 | 797 | 214 | 41 | 140 | 74 | 1.5 | |||||
| c13 | 34 | 5 | 79.4 | 821 | 447 | 133 | 37 | 114 | ||||||
| c14 | 3750 | 803 | 101 | 22.7 | 465 | 518 | 150 | 59 | 21 | 116 | 113 | 4.0 | 2.6 | 0.6 |
| c15 | 1620 | 1040 | 82.2 | 26.6 | 482 | 332 | 115 | 56 | 28 | 108 | 116 | 3.9 | 2.8 | 0.6 |
| c16 | 4260 | 514 | 269 | 21.8 | 451 | 5790 | 496 | 41 | 76 | 136 | 4.3 | 3.2 | 0.8 | |
| c17 | 3200 | 393 | 65.6 | 21.6 | 301 | 357 | 127 | 51 | 22 | 139 | 122 | 5.0 | 2.4 | 1.8 |
| c18 | 2600 | 330 | 64.7 | 15 | 742 | 145 | 51 | 101 | 27.5 | 30.1 | 0.6 | |||
| c19 | 6890 | 580 | 133 | 24.4 | 413 | 665 | 162 | 56 | 17 | 122 | 114 | 4.0 | 2.3 | 0.5 |
| c20 | 13,700 | 1100 | 416 | 20.8 | 756 | 944 | 262 | 17 | 132 | 2.3 | 3.2 | 0.7 |
Underlined numbers indicate highest values per nutrient; italic indicates top two lowest concentrations per class
Fig. 4 Principal component analysis plots generated using sampled data: (left) biplot using the first two components, (right) biplot using the third and fourth components. Prior to PCA, original values were transformed to compositions using the compositions package. P is the extractable phosphorus, and P.T is the total phosphorus
List of target soil macro- and micro-nutrients of interest and summary results of model fitting and cross-validation
| Nutrient | Method | N | 1% | 50% | 99% | R-square | RMSE |
|---|---|---|---|---|---|---|---|
| org. N | total (organic) N extractable by wet oxidation | 63,937 | 0.0 | 600.0 | 4200 | 0.66 | 558 |
| tot. P | total phosphorus | | 0 | 132 | 3047 | 0.85 | 284 |
| ext. K | extractable by Mehlich 3 | 104,784 | 0 | 130 | 1407.5 | 0.64 | 201 |
| ext. Ca | extractable by Mehlich 3 | 105,173 | 14 | 1162 | 14288 | 0.69 | 1950 |
| ext. Mg | extractable by Mehlich 3 | 103,356 | 1.2 | 242 | 2437 | 0.78 | 241 |
| ext. Na | extractable by Mehlich 3 | 71,986 | 0 | 30.13 | 2690 | 0.61 | 452 |
| ext. S | extractable by Mehlich 3 | 43,666 | 0.6 | 9 | 51 | | 78 |
| ext. Al | extractable by Mehlich 3 | 30,945 | 0 | 874 | 2120 | 0.84 | 171 |
| ext. P | extractable by Mehlich 3 | 42,984 | 0 | 6 | 188 | | 43 |
| ext. B | extractable by Mehlich 3 | 43,338 | 0 | 0.33 | 2.09 | 0.41 | 0.47 |
| ext. Cu | extractable by Mehlich 3 | 45,572 | 0.001 | 2.2 | 10.6 | 0.54 | 2.11 |
| ext. Fe | extractable by Mehlich 3 | 18,341 | 0 | 121 | 574 | 0.68 | 53 |
| ext. Mn | extractable by Mehlich 3 | 44,689 | 1.8 | 124 | 440 | 0.53 | 69 |
| ext. Zn | extractable by Mehlich 3 | 45,626 | 0.1 | 2.1 | 26.03 | 0.47 | 4.0 |
All values are expressed in ppm. N = "Number of samples used for training", R-square = "Coefficient of determination" (amount of variation explained by the model based on cross-validation) and RMSE = "Root Mean Square Error". Underlined cells indicate poorer models (or too small sample sizes)
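The two accuracy measures defined in the table note can be written out as a short Python sketch. It assumes the usual definitions: RMSE as the square root of the mean squared difference between observed and cross-validation predicted values, and R-square as one minus the ratio of residual to total sum of squares. Whether the authors computed these on original or log-transformed values is not stated here, and the function names and sample numbers are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root Mean Square Error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sq_err / len(observed))


def r_square(observed, predicted):
    """Coefficient of determination: 1 - SSE / SST."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - sse / sst


# Hypothetical observed vs. cross-validation predicted values (ppm)
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.0, 2.5, 4.0]
example_rmse = rmse(obs, pred)
example_r2 = r_square(obs, pred)
```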
Top ten most important covariates per nutrient, reported by the ranger package
| Nutrient | Most important covariates (10) |
|---|---|
| org. N | |
| tot. P | Precipitation July, density of mineral exploration sites (Al), precipitation August, September, lithology, precipitation February, LSTD August, mean annual precipitation, water vapor January-February, precipitation June |
| ext. K | Soil pH, water vapor July-August, DEM, precipitation January, std. EVI April, precipitation February, water vapor January-February, depth, cloud fraction February, water vapor November-December |
| ext. Ca | |
| ext. Mg | Soil pH, water vapor January-February, Landsat NIR, Landsat SWIR1, cloud fraction February, Landsat SWIR2, water vapor November-December, LSTD March, water vapor March-April, Landsat SWIR1 |
| ext. Na | |
| ext. S | Lithology, Landsat SWIR2, cloud fraction December, precipitation October, May, TWI (DEM), precipitation November, std. EVI July-August, LSTD November |
| ext. Al | Soil pH, LSTD November, precipitation November, TWI, LSTD December, cloud fraction November, DEM, cloud fraction December, precipitation total, precipitation February |
| ext. P | Valley depth (DEM), precipitation July, Deviation from mean (DEM), precipitation November, DEM, std. EVI May-June, precipitation January, positive openness (DEM), mean EVI July-August, mean EVI May-June |
| ext. B | Precipitation August, January, depth, precipitation November, soil pH, DEM, std. EVI July-August, precipitation September, positive openness (DEM), precipitation December |
| ext. Cu | Water vapor May-June, precipitation December, water vapor November-December, July-August, September-October, depth, water vapor January-February, precipitation July, cloud fraction November, precipitation August |
| ext. Fe | Water vapor January-February, density of mineral exploration sites (Phosphates), water vapor September-October, July-August, cloud fraction seasonality, water vapor May-June, March-April, depth, DEM, cloud fraction mean annual |
| ext. Mn | |
| ext. Zn | Precipitation January, December, mean EVI May-June, precipitation March, std. EVI March-April, precipitation February, November, April, TWI |
Explanation of codes: depth = depth from soil surface, LSTD = MODIS mean monthly Land Surface Temperature day-time, LSTN = MODIS mean monthly Land Surface Temperature night-time, EVI = MODIS Enhanced Vegetation Index, TWI = topographic wetness index, DEM = Digital Elevation Model, NIR = Landsat Near Infrared band, SWIR = Landsat Shortwave Infrared band. Underlined covariates indicate distinct importance
Fig. 5 Predicted soil macro-nutrient concentrations (0–30 cm) for Sub-Saharan Africa. All values are expressed in ppm
Fig. 6 Predicted soil micro-nutrient concentrations (0–30 cm) and extractable Al for Sub-Saharan Africa. All values are expressed in ppm
Fig. 7 Examples of nutrient deficiency maps based on our results: zoom in on the town of Bukavu at the border between the eastern Democratic Republic of the Congo (DRC) and Rwanda. Points indicate samples used for model training. The threshold levels are based on Roy et al. (2006, p.78) ranging from very low (<50% expected yield) to medium (80–100% yield) to very high (100% yield). All values are in ppm. Background data source: OpenStreetMap
Fig. 8 Examples of locally defined nutrient deficiency maps based on our results: Eastern Africa. The adopted threshold levels are based on Roy et al. (2006, p.78) ranging from very low (<50% expected yield) to medium (80–100% yield) to very high (100% yield). All values are in ppm. Background data source: OpenStreetMap
Fig. 11 Variable importance plot for prediction of the crop yield using the model from Eq. (5). Training points include 7954 legacy rows for 606 trials
Fig. 12 Examples of predicted potential crop yield for the land mask of SSA (excluding: forests, semi-deserts and deserts, tropical jungles and wetlands). Circles indicate the OFRA field trials database points used to train the model