Sylvain Delerce, Hugo Dorado, Alexandre Grillon, Maria Camila Rebolledo, Steven D Prager, Victor Hugo Patiño, Gabriel Garcés Varón, Daniel Jiménez.
Abstract
Seasonal and inter-annual climate variability have become important issues for farmers, and climate change has been shown to increase them. Simultaneously, farmers and agricultural organizations are increasingly collecting observational data about in situ crop performance. Agriculture thus needs new tools to cope with changing environmental conditions and to take advantage of these data. Data mining techniques make it possible to extract embedded knowledge associated with farmer experiences from these large observational datasets in order to identify best practices for adapting to climate variability. We introduce new approaches through a case study on irrigated and rainfed rice in Colombia. Preexisting observational datasets of commercial harvest records were combined with in situ daily weather series. Using Conditional Inference Forest and clustering techniques, we assessed the relationships between climatic factors and crop yield variability at the local scale for specific cultivars and growth stages. The analysis showed clear relationships in the various location-cultivar combinations, with climatic factors explaining 6 to 46% of spatiotemporal variability in yield, and with crop responses to weather being non-linear and cultivar-specific. Climatic factors affected cultivars differently during each stage of development. For instance, one cultivar was affected by high nighttime temperatures in the reproductive stage but responded positively to accumulated solar radiation during the ripening stage. Another was affected by high nighttime temperatures during both the vegetative and reproductive stages. Clustering of the weather patterns corresponding to individual cropping events revealed different groups of weather patterns for irrigated and rainfed systems with contrasting yield levels. Best-suited cultivars were identified for some weather patterns, making weather-site-specific recommendations possible.
This study illustrates the potential of data mining for adding value to existing observational data in agriculture by allowing embedded knowledge to be quickly leveraged. It generates site-specific information on cultivar response to climatic factors and supports on-farm management decisions for adaptation to climate variability.
Year: 2016 PMID: 27560980 PMCID: PMC4999131 DOI: 10.1371/journal.pone.0161620
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Location map of the study areas.
Summary of data sources for cropping events.
| Dataset | Years with available data for the study | Number of observations in Saldaña (irrigated rice) | Number of observations in Villavicencio (rainfed rice) |
|---|---|---|---|
| (a) Harvest monitoring | 2007 to 2014 | 945 | 268 |
| (b) National Rice Survey | 2007 to 2012 | 95 | 28 |
| (c) Sowing date experiments | 2012 to 2013 | 200 | 79 |
List of the variables used in the models.
| Variable name | Meaning | Type | Unit |
|---|---|---|---|
| Cultivar | Cultivar that was grown | Categorical | |
| TX_Avg_VEG | Average maximum temperature in vegetative stage | Continuous | °C |
| TX_Avg_REP | Average maximum temperature in reproductive stage | Continuous | °C |
| TX_Avg_RIP | Average maximum temperature in ripening stage | Continuous | °C |
| TM_Avg_VEG | Average minimum temperature in vegetative stage | Continuous | °C |
| TM_Avg_REP | Average minimum temperature in reproductive stage | Continuous | °C |
| TM_Avg_RIP | Average minimum temperature in ripening stage | Continuous | °C |
| TA_Avg_VEG | Average temperature in vegetative stage | Continuous | °C |
| TA_Avg_REP | Average temperature in reproductive stage | Continuous | °C |
| TA_Avg_RIP | Average temperature in ripening stage | Continuous | °C |
| DR_Avg_VEG | Average diurnal range in vegetative stage | Continuous | °C |
| DR_Avg_REP | Average diurnal range in reproductive stage | Continuous | °C |
| DR_Avg_RIP | Average diurnal range in ripening stage | Continuous | °C |
| TX_35_Freq_VEG | Frequency of days with maximum temperature above 35°C in vegetative stage | Continuous | --- |
| TX_37_Freq_REP | Frequency of days with maximum temperature above 37°C in reproductive stage | Continuous | --- |
| TX_31_Freq_RIP | Frequency of days with maximum temperature above 31°C in ripening stage | Continuous | --- |
| P_Accu_VEG | Accumulated precipitation in vegetative stage | Continuous | mm |
| P_Accu_REP | Accumulated precipitation in reproductive stage | Continuous | mm |
| P_Accu_RIP | Accumulated precipitation in ripening stage | Continuous | mm |
| P_10_Freq_VEG | Frequency of days with more than 10 mm precipitation in vegetative stage | Continuous | --- |
| P_10_Freq_REP | Frequency of days with more than 10 mm precipitation in reproductive stage | Continuous | --- |
| P_10_Freq_RIP | Frequency of days with more than 10 mm precipitation in ripening stage | Continuous | --- |
| RH_Avg_VEG | Average relative humidity in vegetative stage | Continuous | % |
| RH_Avg_REP | Average relative humidity in reproductive stage | Continuous | % |
| RH_Avg_RIP | Average relative humidity in ripening stage | Continuous | % |
| SR_Accu_VEG | Accumulated solar energy in vegetative stage | Continuous | cal·cm⁻² |
| SR_Accu_REP | Accumulated solar energy in reproductive stage | Continuous | cal·cm⁻² |
| SR_Accu_RIP | Accumulated solar energy in ripening stage | Continuous | cal·cm⁻² |
| Yield | Crop productivity | Continuous | kg·ha⁻¹ |
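The stage-level predictors in this table can be derived from a daily weather series once the days belonging to each growth stage are known. The following is a minimal sketch, not the authors' code: the dictionary keys `tmax`, `tmin` and `precip`, the function name `stage_indicators`, and the toy values are all hypothetical. It also assumes that "frequency" means the fraction of days in the stage exceeding the threshold and that the diurnal range is the mean of the daily maximum minus minimum.

```python
from statistics import mean

def stage_indicators(daily, stage, tx_threshold):
    """Summarize the daily weather of one growth stage of one cropping event.

    `daily` is a list of dicts, one per day of the stage, with the
    hypothetical keys "tmax" and "tmin" (°C) and "precip" (mm).
    """
    n_days = len(daily)
    tmax = [d["tmax"] for d in daily]
    tmin = [d["tmin"] for d in daily]
    precip = [d["precip"] for d in daily]
    return {
        f"TX_Avg_{stage}": mean(tmax),
        f"TM_Avg_{stage}": mean(tmin),
        # Assumption: diurnal range = mean of daily (tmax - tmin).
        f"DR_Avg_{stage}": mean(hi - lo for hi, lo in zip(tmax, tmin)),
        # Assumption: frequency = fraction of days above the threshold.
        f"TX_{tx_threshold}_Freq_{stage}": sum(t > tx_threshold for t in tmax) / n_days,
        f"P_Accu_{stage}": sum(precip),
        f"P_10_Freq_{stage}": sum(p > 10 for p in precip) / n_days,
    }

# Toy four-day "vegetative stage" (made-up values, for illustration only).
days = [
    {"tmax": 34.0, "tmin": 23.0, "precip": 0.0},
    {"tmax": 36.0, "tmin": 24.0, "precip": 12.0},
    {"tmax": 35.5, "tmin": 22.0, "precip": 3.0},
    {"tmax": 33.0, "tmin": 23.0, "precip": 20.0},
]
veg = stage_indicators(days, "VEG", 35)
```

Calling the function once per stage (`VEG`, `REP`, `RIP`) and merging the resulting dictionaries would give one row of model inputs per cropping event.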
Summary of the variability observed among all cropping events between 2007 and 2014 in each site.
| Variable | Saldaña: minimum | Saldaña: maximum | Saldaña: coefficient of variation | Villavicencio: minimum | Villavicencio: maximum | Villavicencio: coefficient of variation |
|---|---|---|---|---|---|---|
| TX (°C) | 23.4 | 39.6 | 0.07 | 23 | 36 | 0.07 |
| TM (°C) | 18.6 | 27.3 | 0.04 | 18 | 26 | 0.05 |
| P_accu (mm) | 115 | 1,229 | 0.36 | 987 | 1,934 | 0.16 |
| P_10_Freq | 0.03 | 0.23 | 0.37 | 0.22 | 0.43 | 0.15 |
| RH (%) | 42 | 95.6 | 0.10 | 61.9 | 96 | 0.07 |
| SR_accu (cal·cm⁻²) | 40,146 | 69,543 | 0.06 | 39,508 | 52,543 | 0.04 |
| Yield (kg·ha⁻¹) | 2,000 | 10,750 | 0.21 | 1,750 | 8,200 | 0.29 |
See Table 2 for variable definitions.
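The coefficient of variation reported in this table is the standard deviation divided by the mean, which makes variability comparable across variables with different units. A minimal sketch follows; the yield values are made up, and the use of the sample (n − 1) standard deviation is an assumption, since the source does not say which estimator was used.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Return the standard deviation divided by the mean (unitless).

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return stdev(values) / mean(values)

# Hypothetical yields in kg per hectare, for illustration only.
yields = [4000, 5000, 6000, 7000, 8000]
cv = coefficient_of_variation(yields)
```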
Comparison of regression methods relative to their ability to handle different types of data problems.
| Linear Models | Neural Networks | Trees | Support Vector Machine |
|---|---|---|---|
| [-] Non-linear relationships require transformation before training the model, which requires prior knowledge. | [+] Neural networks have universal approximation capabilities for non-linear relationships. | [+] Can model non-linear relationships. | [+] Can model non-linear relationships. |
| [-] Needs preliminary transformation of categorical variables. | [-] Needs preliminary transformation of categorical variables. | [+] Uses recursive binary partitions and therefore handles categorical variables inherently. | [-] Needs preliminary transformation of categorical variables. |
| [-] Needs preliminary imputation of missing values. | [-] Needs preliminary imputation of missing values. | [+] Can use surrogate splits to overcome missing data. | [-] Needs preliminary imputation of missing values. |
| [-] Typically influenced by outliers and therefore needs preliminary filtering of such values. | [-] Known to suffer a lack of robustness towards outliers when using a classical error measure. | [+] Resilient to the effects of predictor outliers. | [-] One of the well-known risks of large margin training methods, such as SVMs, is their sensitivity to outliers. |
| [-] Typically influenced by any transformation of the inputs. | [-] Typically influenced by any transformation of the inputs. | [+] Invariant under (strictly monotone) transformations of the individual predictors. | [-] Typically influenced by any transformation of the inputs. |
| [+] Assigns a low coefficient to irrelevant inputs. | [-] Does not cope well with irrelevant input. | [+] Performs internal feature selection and is thereby resistant to inclusion of many irrelevant predictors. | [-] Does not cope well with irrelevant input. |
| [+] White-box model. | [-] Black-box model. | [+] Grey-box model. Tends to be white-box if the number of splits is small. | [-] Black-box model. |
| [-] Needs previous filtering of correlated predictors. | [-] Needs previous filtering of correlated predictors. | [-] Needs previous filtering of correlated predictors. | [-] Needs previous filtering of correlated predictors. |
([+] = good, [-] = poor).
Fig 2. Variable importance of CIF models including all cultivars.
(a) Saldaña and (b) Villavicencio. Lowercase letters to the right of the boxplots show the results of the Kruskal-Wallis test, with statistically similar variables grouped by the same letter.
CIF model inputs and results.
| Model | Observations | Runs | Average R-squared | R-squared standard deviation |
|---|---|---|---|---|
| Saldaña—F733 | 267 | 100 | 29.9% | 7.9 |
| Saldaña—F60 | 150 | 100 | 46.6% | 8.6 |
| Saldaña—Lagunas | 187 | 100 | 6.9% | 5.6 |
| Villavicencio—F174 | 134 | 100 | 28.1% | 11.9 |
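The "Average R-squared" and its standard deviation summarize model fit over repeated runs. As a rough illustration only, the sketch below computes the usual coefficient of determination for each run and then averages it. The function names, the toy yields and the three toy "runs" are hypothetical, and this section does not state which data (for example out-of-bag or held-out observations) each run was scored on.

```python
from statistics import mean, stdev

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

def summarize_runs(scores):
    """Return the mean and sample standard deviation of per-run scores."""
    return mean(scores), stdev(scores)

# Made-up observed yields and predictions from three hypothetical runs.
observed = [5000, 6000, 7000, 8000]
runs = [
    [5200, 6100, 6800, 7900],
    [5500, 5900, 7200, 7400],
    [4800, 6400, 6900, 8300],
]
scores = [r_squared(observed, predicted) for predicted in runs]
avg_r2, sd_r2 = summarize_runs(scores)
```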
Fig 3. Boxplots of conditional permutation-based variable importance (VI) scores using CIF for specific cultivars.
In Saldaña for (a) cultivars F733 and (b) F60, and in Villavicencio for (c) cultivar F174. Lowercase letters to the right of the boxplots show the results of the Kruskal-Wallis test, with statistically similar variables grouped by the same letter.
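Permutation-based importance measures how much a model's error grows when the values of one predictor are shuffled. The sketch below shows the plain, unconditional version of that idea, which is simpler than the conditional scheme named in the caption: the conditional variant permutes a predictor within groups defined by correlated predictors, and that step is not reproduced here. The toy model, its coefficients and the data are hypothetical.

```python
import random

def mse(model, rows, targets):
    """Mean squared error of `model` over the given rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, feature, n_repeats=20, seed=0):
    """Average increase in MSE after shuffling one feature (unconditional)."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    increases = []
    for _ in range(n_repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        # Rebuild each row with the shuffled value of the chosen feature.
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        increases.append(mse(model, permuted, targets) - baseline)
    return sum(increases) / n_repeats

def toy_model(row):
    """Hypothetical yield model that only uses minimum temperature."""
    return 9000 - 300 * (row["TM_Avg_REP"] - 22)

rows = [{"TM_Avg_REP": 22 + 0.5 * i, "RH_Avg_VEG": 70 + i} for i in range(10)]
targets = [toy_model(r) for r in rows]
vi_tm = permutation_importance(toy_model, rows, targets, "TM_Avg_REP")
vi_rh = permutation_importance(toy_model, rows, targets, "RH_Avg_VEG")
```

In this toy setup, shuffling `RH_Avg_VEG` leaves the error unchanged because the model ignores it, while shuffling `TM_Avg_REP` increases the error. Repeating the shuffle many times gives the kind of score distribution that a boxplot can display.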
Fig 4. Partial dependence plots of the most relevant predictors.
(a) Saldaña-F733, (b) Saldaña-F60 and (c) Villavicencio-F174. Tick marks represent individual observations.
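A partial dependence curve is obtained by forcing one predictor to a fixed value for every observation, averaging the model's predictions, and repeating over a grid of values. The following minimal sketch illustrates that computation under stated assumptions: the toy model with a threshold response above 23 °C, its coefficients and the four observations are invented for illustration and are not results from the study.

```python
def partial_dependence(model, rows, feature, grid):
    """Return (value, mean prediction) pairs with `feature` forced to each value."""
    curve = []
    for value in grid:
        predictions = [model({**row, feature: value}) for row in rows]
        curve.append((value, sum(predictions) / len(predictions)))
    return curve

def toy_model(row):
    """Hypothetical non-linear response: penalty only above 23 °C."""
    heat_penalty = 200 * max(0, row["TM_Avg_REP"] - 23)
    return 1000 + row["SR_Accu_RIP"] / 10 - heat_penalty

# Made-up observations, for illustration only.
rows = [
    {"TM_Avg_REP": 22, "SR_Accu_RIP": 15000},
    {"TM_Avg_REP": 23, "SR_Accu_RIP": 16000},
    {"TM_Avg_REP": 24, "SR_Accu_RIP": 17000},
    {"TM_Avg_REP": 25, "SR_Accu_RIP": 18000},
]
curve = partial_dependence(toy_model, rows, "TM_Avg_REP", [22, 23, 24, 25])
```

With this toy model the curve is flat up to 23 °C and then declines, which is the kind of non-linear response such a plot is meant to reveal.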
Fig 5. Patterns of the daily series of the 5 climatic variables characterizing the clusters.
(left) Cluster 1 and (right) cluster 15. Individual patterns of each cropping event appear in grey; the black line represents the median of the cluster.
Fig 6. Boxplots of the observed yield distributions in each cluster in Saldaña.
Clusters are sorted from left to right in decreasing order of median yield value. Lowercase letters above the boxplots show the results of the Kruskal-Wallis test, with statistically similar clusters grouped by the same letter.
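The Kruskal-Wallis test compares groups through the ranks of the pooled observations rather than their raw values. The sketch below computes only the H statistic with the standard tie correction. It does not compute a p-value (which would come from a chi-square distribution with k − 1 degrees of freedom for k groups) or the pairwise comparisons behind the letter groupings. The yields are made up.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction.

    `groups` is a list of lists of numbers. Raises ZeroDivisionError
    if every pooled value is identical.
    """
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    ranks = {}      # value -> average rank among the pooled data
    tie_term = 0    # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_term = sum(sum(ranks[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    return h / correction

# Hypothetical yields (kg per hectare) for three clusters.
clusters = [
    [7200, 7500, 7900],
    [6100, 6400, 6600],
    [4800, 5200, 5600],
]
h_stat = kruskal_wallis_h(clusters)
```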
Fig 7. Boxplots of the yield distributions by cultivar in Saldaña.
(a) Cluster 2 and (b) cluster 10. Cultivars are sorted from left to right in decreasing order of median yield value. Lowercase letters above the boxplots show the results of the Kruskal-Wallis test, with statistically similar cultivars grouped by the same letter.