Abelardo Montesinos-López, Osval A Montesinos-López, Gustavo de Los Campos, José Crossa, Juan Burgueño, Francisco Javier Luna-Vazquez.
Abstract
BACKGROUND: Modern agriculture uses hyperspectral cameras that record reflectance at hundreds of discrete narrow bands, often in several environments. Recently, Montesinos-López et al. (Plant Methods 13(4):1-23, 2017a. 10.1186/s13007-016-0154-2; Plant Methods 13(62):1-29, 2017b. 10.1186/s13007-017-0212-4) proposed using functional regression analysis (a form of functional data analysis) to reduce the dimensionality of the bands and thus decrease the computational cost. The purpose of this paper is to discuss the advantages and disadvantages of functional regression analysis for hyperspectral image data. We provide a brief review of functional regression analysis and examples that illustrate the methodology. We highlight critical elements of model specification: (i) the type and number of basis functions, (ii) the degree of the polynomial, and (iii) the methods used to estimate regression coefficients. We also show how functional data analysis can be integrated into Bayesian models. Finally, we include an in-depth discussion of the challenges and opportunities presented by functional regression analysis.
Keywords: Bayesian Ridge Regression; Bayesian functional regression; Functional data; Functional regression analysis; Hyperspectral data
Year: 2018 PMID: 29991959 PMCID: PMC5994840 DOI: 10.1186/s13007-018-0314-7
Source DB: PubMed Journal: Plant Methods ISSN: 1746-4811 Impact factor: 4.993
Fig. 1 Reflectance (centered to a zero mean) measured over 250 wavelengths in the 392 to 850 nm range of the light spectrum. Each curve corresponds to data of a maize genotype planted in an irrigated environment and measured at Cd. Obregón, Mexico
Fig. 2 Scatterplot of the hypothetical phenomenon. The dots represent the 100 measured data points. The smoothed curves were fitted with L = 11 basis functions for three values of the period: T = 4 (black), 6 (blue) and 8 (red)
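A smoothing fit of this kind can be sketched as ordinary least squares on a periodic (Fourier) basis. The code below is a minimal illustration, not the authors' implementation: the simulated data, the names `fourier_basis` and `smooth_fourier`, and the reading of L = 11 as one constant plus five sine/cosine pairs are all assumptions.

```python
import numpy as np

def fourier_basis(t, n_basis, period):
    """Return a (len(t), n_basis) matrix: a constant column, then sin/cos pairs."""
    if n_basis % 2 == 0:
        raise ValueError("n_basis must be odd: 1 constant + sin/cos pairs")
    t = np.asarray(t, dtype=float)
    omega = 2.0 * np.pi / period          # fundamental angular frequency
    cols = [np.ones_like(t)]
    for k in range(1, (n_basis - 1) // 2 + 1):
        cols.append(np.sin(k * omega * t))
        cols.append(np.cos(k * omega * t))
    return np.column_stack(cols)

def smooth_fourier(t, y, n_basis, period):
    """Least-squares fit of y on the Fourier basis; returns fitted values."""
    Phi = fourier_basis(t, n_basis, period)
    coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return Phi @ coef

# Hypothetical data: 100 noisy observations of a periodic signal.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 8.0, 100)
y = np.sin(2.0 * np.pi * t / 4.0) + rng.normal(scale=0.3, size=t.size)

# One fit per period, with L = 11 basis functions in each case.
fits = {T: smooth_fourier(t, y, n_basis=11, period=T) for T in (4, 6, 8)}
```

Changing `period` changes which frequencies the basis can represent, so the same L can give visibly different curves.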
Fig. 3 Scatterplot of the hypothetical phenomenon. The dots represent the 100 measured data points. Each panel shows B-spline smoothing with degree 1 (linear; black), 2 (quadratic; red) and 3 (cubic; blue): a L = 5 basis functions; b L = 11 basis functions; c L = 25 basis functions; d L = 51 basis functions
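The interplay between the number of basis functions L and the B-spline degree can be explored with a small sketch. It builds a B-spline basis with the Cox-de Boor recursion and fits it by least squares. The simulated data, the function name `bspline_basis`, and the choice of equally spaced interior knots with repeated boundary knots are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def bspline_basis(x, n_basis, degree, lo, hi):
    """Evaluate n_basis B-splines of the given degree on [lo, hi] (Cox-de Boor)."""
    n_interior = n_basis - degree - 1
    if n_interior < 0:
        raise ValueError("need n_basis >= degree + 1")
    interior = np.linspace(lo, hi, n_interior + 2)[1:-1]
    # Clamped knot vector: boundary knots repeated degree + 1 times.
    knots = np.concatenate([np.full(degree + 1, lo), interior,
                            np.full(degree + 1, hi)])
    x = np.asarray(x, dtype=float)
    n0 = len(knots) - 1
    # Degree 0: indicator of each half-open knot interval.
    B = np.zeros((x.size, n0))
    for j in range(n0):
        B[:, j] = (knots[j] <= x) & (x < knots[j + 1])
    # Assign x == hi to the last non-empty interval so the right end is covered.
    B[x == hi, n_basis - 1] = 1.0
    # Raise the degree one step at a time.
    for d in range(1, degree + 1):
        B_next = np.zeros((x.size, n0 - d))
        for j in range(n0 - d):
            left = knots[j + d] - knots[j]
            right = knots[j + d + 1] - knots[j + 1]
            if left > 0:
                B_next[:, j] += (x - knots[j]) / left * B[:, j]
            if right > 0:
                B_next[:, j] += (knots[j + d + 1] - x) / right * B[:, j + 1]
        B = B_next
    return B

# Hypothetical data: 100 noisy observations of a smooth curve.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 100)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

# Residual sum of squares for each (L, degree) combination in the caption.
rss = {}
for n_basis in (5, 11, 25, 51):
    for degree in (1, 2, 3):
        B = bspline_basis(x, n_basis, degree, x[0], x[-1])
        coef, *_ = np.linalg.lstsq(B, y, rcond=None)
        rss[(n_basis, degree)] = float(np.sum((y - B @ coef) ** 2))
```

In-sample residuals shrink as L grows, which is why a large L tends to follow the noise rather than the underlying curve.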
Methods proposed, predictors, basis type and type of model
| Method | Predictor of the model | Basis type | Type of model |
|---|---|---|---|
| M1 | | None | Conventional regression |
| M2 | | B-splines | Functional regression (Eq. |
| M3 | | Fourier | Functional regression (Eq. |
| M4 | | B-splines | Alternative 1 for Functional regression (Eq. |
| M5 | | Fourier | Alternative 1 for Functional regression (Eq. |
| M6 | | B-splines | Alternative 2 for Functional regression (Eq. |
| M7 | | Fourier | Alternative 2 for Functional regression (Eq. |
Grain yield is the response vector (the trait of interest)
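The contrast between conventional regression on all bands and functional regression on a few basis coefficients can be sketched as follows. This is one common formulation, offered under stated assumptions: each reflectance curve is integrated against L basis functions to give L derived predictors, which then enter a penalized regression. The simulated curves, the rescaling of wavelengths to [0, 1], the Fourier basis, and the plain ridge penalty (standing in for the Bayesian priors named in this record) are all illustrative choices, and the paper's own equations are not reproduced here.

```python
import numpy as np

def fourier_basis(t, n_basis, period=1.0):
    """Constant plus sin/cos pairs; n_basis is assumed to be odd."""
    omega = 2.0 * np.pi / period
    cols = [np.ones_like(t)]
    for k in range(1, (n_basis - 1) // 2 + 1):
        cols += [np.sin(k * omega * t), np.cos(k * omega * t)]
    return np.column_stack(cols)

def ridge_fit(Z, y, lam):
    """Ridge regression with an unpenalized intercept."""
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    Zc = Z - z_mean
    beta = np.linalg.solve(Zc.T @ Zc + lam * np.eye(Z.shape[1]),
                           Zc.T @ (y - y_mean))
    return y_mean - z_mean @ beta, beta

rng = np.random.default_rng(1)
n, p, L = 200, 250, 11                 # lines, bands, basis functions
t = np.linspace(0.0, 1.0, p)           # wavelengths rescaled to [0, 1]
dt = t[1] - t[0]
Phi = fourier_basis(t, L)              # p x L basis matrix

# Simulated smooth reflectance curves plus measurement noise.
scores = rng.normal(size=(n, 5))
X = scores @ Phi[:, :5].T + rng.normal(scale=0.05, size=(n, p))

# Simulated response: integral of x_i(t) * beta(t) plus noise.
beta_t = np.sin(2.0 * np.pi * t)
y = X @ beta_t * dt + rng.normal(scale=0.1, size=n)

# Functional predictors: z_il approximates the integral of x_i(t) * phi_l(t).
Z = X @ Phi * dt                       # n x L instead of n x p

train, test = np.arange(150), np.arange(150, n)
b0, b = ridge_fit(Z[train], y[train], lam=0.01)
y_pred = b0 + Z[test] @ b
r = float(np.corrcoef(y[test], y_pred)[0, 1])   # Pearson's correlation
```

The regression now estimates L = 11 coefficients instead of 250, which is the dimensionality reduction the abstract refers to.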
Prediction accuracy of grain yield with Pearson’s correlation for the 7 proposed methods with BRR prior distribution for different numbers of basis functions
| Method | Parameter | Number of basis | | | | | | | | | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | | 5 | 11 | 17 | 23 | 29 | 35 | 41 | 45 | 51 | |
| Pearson’s correlation | | | | | | | | | | | |
| M1 | Mean | 0.494 a A | 0.494 a A | 0.494 a A | 0.494 a A | 0.494 a A | 0.494 a A | 0.494 a A | 0.494 a A | 0.494 a A | 0.494 a |
| | SE | 0.023 | 0.023 | 0.023 | 0.023 | 0.023 | 0.023 | 0.023 | 0.023 | 0.023 | 0.023 |
| M2 | Mean | 0.466 a A | 0.482 a A | 0.494 a A | 0.495 a A | 0.496 a A | 0.498 a A | 0.497 a A | 0.498 a A | 0.497 a A | 0.491 a |
| | SE | 0.020 | 0.023 | 0.021 | 0.024 | 0.022 | 0.022 | 0.022 | 0.023 | 0.237 | 0.022 |
| M3 | Mean | 0.482 a A | 0.496 a A | 0.495 a A | 0.494 a A | 0.498 a A | 0.498 a A | 0.498 a A | 0.499 a A | 0.498 a A | 0.495 a |
| | SE | 0.020 | 0.023 | 0.023 | 0.024 | 0.025 | 0.025 | 0.025 | 0.024 | 0.025 | 0.024 |
| M4 | Mean | 0.480 a A | 0.495 a A | 0.496 a A | 0.497 a A | 0.496 a A | 0.495 a A | 0.497 a A | 0.496 a A | 0.498 a A | 0.494 a |
| | SE | 0.024 | 0.025 | 0.024 | 0.025 | 0.025 | 0.025 | 0.026 | 0.026 | 0.024 | 0.025 |
| M5 | Mean | 0.483 a A | 0.494 a A | 0.495 a A | 0.494 a A | 0.495 a A | 0.500 a A | 0.497 a A | 0.499 a A | 0.498 a A | 0.495 a |
| | SE | 0.021 | 0.023 | 0.025 | 0.024 | 0.026 | 0.023 | 0.025 | 0.024 | 0.024 | 0.024 |
| M6 | Mean | 0.466 a A | 0.482 a A | 0.493 a A | 0.495 a A | 0.497 a A | 0.498 a A | 0.497 a A | 0.498 a A | 0.497 a A | 0.491 a |
| | SE | 0.020 | 0.023 | 0.022 | 0.024 | 0.022 | 0.022 | 0.022 | 0.023 | 0.024 | 0.022 |
| M7 | Mean | 0.482 a A | 0.496 a A | 0.495 a A | 0.493 a A | 0.498 a A | 0.498 a A | 0.498 a A | 0.499 a A | 0.498 a A | 0.495 a |
| | SE | 0.020 | 0.023 | 0.023 | 0.024 | 0.025 | 0.025 | 0.025 | 0.024 | 0.025 | 0.024 |
| Implementation time (s) | | | | | | | | | | | |
| M1 | Mean_T | 30.89 a A | 30.89 a A | 30.89 a A | 30.89 a A | 30.89 a A | 30.89 a A | 30.89 a A | 30.89 a A | 30.89 a A | 30.89 a |
| | SE_T | 3.27 | 3.27 | 3.27 | 3.27 | 3.27 | 3.27 | 3.27 | 3.27 | 3.27 | 3.27 |
| M2 | Mean_T | 8.4 c AB | 8.9 c AB | 9.32 c AB | 9.51 c AB | 9.34 b AB | 10.1 b A | 10.33 b A | 9.65 b AB | 6.79 c B | 9.150 c |
| | SE_T | 0.22 | 0.25 | 0.3 | 0.13 | 0.16 | 0.22 | 0.41 | 0.96 | 0.06 | 0.67 |
| M3 | Mean_T | 7.89 c AB | 8.32 c AB | 8.61 c AB | 9.31 c AB | 9.79 b AB | 10.37 b A | 9.97 b AB | 8.91 b AB | 6.71 c B | 8.879 c |
| | SE_T | 0.19 | 0.04 | 0.14 | 0.33 | 0.04 | 0.07 | 0.52 | 0.82 | 0.01 | 0.7 |
| M4 | Mean_T | 27.17 b A | 27.01 b A | 26.83 b A | 26.78 b A | 27.15 a A | 27.16 a A | 27.12 a A | 27.32 a A | 16.68 b B | 25.910 b |
| | SE_T | 0.22 | 0.11 | 0.28 | 0.36 | 0.63 | 0.52 | 0.44 | 0.5 | 0.16 | 1.95 |
| M5 | Mean_T | 27.53 b A | 27.4 b A | 27.15 b A | 27.21 b A | 27.12 a A | 27.16 a A | 27.25 a A | 27.27 a A | 16.84 b B | 26.110 b |
| | SE_T | 1.12 | 1.12 | 0.83 | 0.91 | 0.63 | 0.67 | 0.62 | 0.73 | 0.3 | 2.04 |
| M6 | Mean_T | 8.03 c AB | 8.5 c AB | 8.68 c AB | 9.12 c AB | 10.19 b AB | 10.59 b AB | 10.91 b A | 10.97 b A | 7.17 c B | 9.350 c |
| | SE_T | 0.04 | 0.08 | 0.09 | 0.07 | 0.19 | 0.23 | 0.05 | 0.46 | 0.59 | 0.79 |
| M7 | Mean_T | 7.92 c AB | 8.34 c AB | 8.61 c AB | 9.2 c AB | 9.55 b AB | 9.76 b AB | 10.09 b A | 10.05 b AB | 7.15 c B | 8.970 c |
| | SE_T | 0.08 | 0.1 | 0.1 | 0.07 | 0.24 | 0.22 | 0.31 | 0.47 | 0.49 | 0.61 |
Mean is the average Pearson’s correlation and SE is its standard error. Mean_T and SE_T are the average and standard error of the time (in seconds) needed to implement each scenario. Average is the mean across the numbers of basis functions. Within a column, different lowercase letters indicate statistically significant differences between methods (Tukey test, 5% significance level). Within a row, different uppercase letters indicate statistically significant differences between numbers of basis functions (Tukey test, 5% significance level)
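The Mean and SE entries can be read as summaries of Pearson's correlation over repeated test partitions. The sketch below shows one way to compute such summaries. The number of partitions (10), the simulated observations and predictions, and the definition of SE as the sample standard deviation divided by the square root of the number of partitions are assumptions; this record does not state the cross-validation design.

```python
import numpy as np

def pearson(a, b):
    """Pearson's correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

def mean_and_se(values):
    """Mean and standard error (sample SD / sqrt(n)) of a set of values."""
    values = np.asarray(values, dtype=float)
    return float(values.mean()), float(values.std(ddof=1) / np.sqrt(values.size))

# Hypothetical observed values and out-of-sample predictions.
rng = np.random.default_rng(2)
n_obs, n_folds = 100, 10
y = rng.normal(size=n_obs)
y_hat = 0.5 * y + rng.normal(scale=0.8, size=n_obs)

# One correlation per test partition, then summarize across partitions.
folds = np.array_split(rng.permutation(n_obs), n_folds)
cors = [pearson(y[idx], y_hat[idx]) for idx in folds]
mean_r, se_r = mean_and_se(cors)
```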
Fig. 4 Optimal number of basis (L) for method 7 with three regularization methods
Prediction accuracy of grain yield with Pearson’s correlation for the 7 proposed methods with BRR prior distribution for different numbers of periods for the Fourier basis
| Period | M3 Mean | M3 SE | M5 Mean | M5 SE | M7 Mean | M7 SE |
|---|---|---|---|---|---|---|
| 51 | 0.4609 a | 0.0224 | 0.4607 a | 0.0219 | 0.4616 a | 0.0218 |
| 57.38 | 0.4658 a | 0.0211 | 0.4655 a | 0.0213 | 0.4658 a | 0.0214 |
| 65.57 | 0.4639 a | 0.0201 | 0.4623 a | 0.0206 | 0.4636 a | 0.02 |
| 76.5 | 0.4706 a | 0.0219 | 0.4666 a | 0.0214 | 0.4705 a | 0.0217 |
| 91.8 | 0.4757 a | 0.019 | 0.4755 a | 0.0192 | 0.4755 a | 0.0191 |
| 114.75 | 0.4636 a | 0.0204 | 0.4631 a | 0.0202 | 0.4638 a | 0.0201 |
| 153 | 0.4854 a | 0.0276 | 0.4851 a | 0.0275 | 0.4853 a | 0.0276 |
| 229.5 | 0.4726 a | 0.0214 | 0.4732 a | 0.0213 | 0.4727 a | 0.0214 |
| 459 | 0.4935 a | 0.0238 | 0.4936 a | 0.0239 | 0.4931 a | 0.0239 |
Mean is the average Pearson’s correlation and SE is the standard error. Different letters by the columns indicate statistical differences between periods with the Tukey test at 5% level of significance
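The nine periods in the table coincide, to two decimals, with 459/k for k = 9, 8, ..., 1. The short check below verifies that arithmetic. Reading 459 as the width of the 392 to 850 nm range counted in 1 nm steps (850 - 392 + 1) is an assumption, and this record does not say how the authors chose the periods.

```python
# Periods listed in the table, in the order shown.
table_periods = [51, 57.38, 65.57, 76.5, 91.8, 114.75, 153, 229.5, 459]

# Candidate rule (an assumption): the longest period divided by k = 9, ..., 1.
longest = 850 - 392 + 1                      # 459
candidate = [longest / k for k in range(9, 0, -1)]

# Largest gap between the rule and the table, allowing for 2-decimal rounding.
max_gap = max(abs(c - p) for c, p in zip(candidate, table_periods))
```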
Prediction accuracy of grain yield with Pearson’s correlation for the 7 proposed methods, under BayesA, BayesB and Bayes Lasso (BL) for three numbers of basis (5, 29 and 51)
| Method | Parameter | BayesA | | | BayesB | | | BayesLasso | | | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of basis | | 5 | 29 | 51 | 5 | 29 | 51 | 5 | 29 | 51 | |
| Pearson’s correlation | | | | | | | | | | | |
| M1 | Mean | 0.501 a A | 0.501 a A | 0.501 a A | 0.510 a A | 0.510 a A | 0.510 a A | 0.494 a A | 0.494 a A | 0.494 a A | 0.501 a |
| | SE | 0.022 | 0.022 | 0.022 | 0.020 | 0.020 | 0.020 | 0.022 | 0.022 | 0.022 | 0.021 |
| M2 | Mean | 0.469 a A | 0.499 a A | 0.511 a A | 0.476 a A | 0.507 a A | 0.508 a A | 0.474 a A | 0.498 a A | 0.504 a A | 0.494 a |
| | SE | 0.023 | 0.024 | 0.021 | 0.027 | 0.020 | 0.023 | 0.216 | 0.022 | 0.024 | 0.023 |
| M3 | Mean | 0.486 a A | 0.508 a A | 0.508 a A | 0.485 a A | 0.507 a A | 0.503 a A | 0.482 a A | 0.504 a A | 0.504 a A | 0.499 a |
| | SE | 0.020 | 0.025 | 0.025 | 0.020 | 0.024 | 0.026 | 0.020 | 0.026 | 0.026 | 0.024 |
| M4 | Mean | 0.482 a A | 0.503 a A | 0.510 a A | 0.482 a A | 0.504 a A | 0.513 a A | 0.478 a A | 0.497 a A | 0.506 a A | 0.497 a |
| | SE | 0.023 | 0.023 | 0.024 | 0.023 | 0.024 | 0.025 | 0.023 | 0.025 | 0.021 | 0.024 |
| M5 | Mean | 0.486 a A | 0.505 a A | 0.507 a A | 0.487 a A | 0.511 a A | 0.513 a A | 0.483 a A | 0.505 a A | 0.496 a A | 0.499 a |
| | SE | 0.012 | 0.024 | 0.024 | 0.020 | 0.025 | 0.025 | 0.021 | 0.021 | 0.026 | 0.023 |
| M6 | Mean | 0.470 a A | 0.499 a A | 0.510 a A | 0.476 a A | 0.500 a A | 0.511 a A | 0.474 a A | 0.498 a A | 0.504 a A | 0.493 a |
| | SE | 0.022 | 0.024 | 0.021 | 0.027 | 0.023 | 0.022 | 0.022 | 0.022 | 0.024 | 0.023 |
| M7 | Mean | 0.486 a A | 0.508 a A | 0.507 a A | 0.485 a A | 0.509 a A | 0.505 a A | 0.482 a A | 0.504 a A | 0.504 a A | 0.499 a |
| | SE | 0.020 | 0.025 | 0.024 | 0.020 | 0.025 | 0.023 | 0.020 | 0.026 | 0.026 | 0.023 |
| Implementation time (s) | | | | | | | | | | | |
| M1 | Mean_T | 30.911 a A | 30.911 a A | 30.911 a A | 30.911 a A | 30.911 a A | 30.911 a A | 30.911 a A | 30.911 a A | 30.911 a A | 30.911 a |
| | SE_T | 2.16 | 2.16 | 2.16 | 2.16 | 2.16 | 2.16 | 2.16 | 2.16 | 2.16 | 2.16 |
| M2 | Mean_T | 9.180 c C | 11.233 c C | 11.106 de A | 10.170 c B | 12.160 c B | 11.093 c A | 12.720 c A | 15.340 b A | 9.900 c A | 11.430 c |
| | SE_T | 0.08 | 0.05 | 0.08 | 0.02 | 0.2 | 0.54 | 0.03 | 0.08 | 0.22 | 1.02 |
| M3 | Mean_T | 9.230 c B | 10.826 c C | 10.910 e A | 9.893 c B | 12.176 c B | 10.973 c A | 12.190 c A | 15.113 bc A | 9.840 c A | 11.240 c |
| | SE_T | 0.2 | 0.22 | 0.06 | 0.1 | 0.2 | 0.64 | 0.34 | 0.17 | 0.07 | 1 |
| M4 | Mean_T | 30.110 a B | 29.920 a AB | 27.186 b A | 30.110 a B | 26.746 b B | 26.310 b AB | 36.213 a A | 32.083 a A | 21.390 b B | 28.890 ab |
| | SE_T | 1.08 | 1 | 0.05 | 0.9 | 0.03 | 0.94 | 1.46 | 0.78 | 1.73 | 2.46 |
| M5 | Mean_T | 26.720 b B | 26.426 b B | 26.633 c A | 26.620 b B | 26.463 b B | 26.030 b A | 32.390 b A | 31.776 a A | 21.463 b B | 27.170 b |
| | SE_T | 0.08 | 0.14 | 0.18 | 0.13 | 0.2 | 0.6 | 0.3 | 0.8 | 1.62 | 1.87 |
| M6 | Mean_T | 8.060 c C | 9.423 c C | 11.353 d A | 8.676 c B | 10.433 d B | 11.980 c A | 10.806 c A | 13.360 bc A | 10.256 c A | 10.480 c |
| | SE_T | 0.1 | 0.11 | 0.05 | 0.06 | 0.11 | 0.27 | 0.14 | 0.04 | 1.85 | 1.05 |
| M7 | Mean_T | 7.970 c C | 9.610 c C | 11.196 de A | 8.796 c B | 10.583 d B | 11.973 c A | 10.943 c A | 13.083 c A | 10.060 c A | 10.460 c |
| | SE_T | 0.16 | 0.07 | 0.03 | 0.08 | 0.04 | 0.7 | 0.05 | 0.25 | 1.66 | 1.01 |
Mean is the average Pearson’s correlation and SE is its standard error. Mean_T and SE_T are the average and standard error of the time (in seconds) needed to implement each scenario. The Average column was calculated by row across the numbers of basis functions of the three methods. Within a column, different lowercase letters indicate statistically significant differences between methods (Tukey test, 5% significance level). Within a row, different uppercase letters indicate statistically significant differences between regularization methods (Tukey test, 5% significance level)