Xuelin Xie, Xinye Zhang, Jingfang Shen, Kebing Du.
Abstract
Floods, among the most common natural disasters, cause huge losses of human life and property. Predicting the flood resistance of poplar can help researchers select seedlings scientifically and respond to floods precisely. Models of poplar's waterlogging tolerance were established and evaluated using machine learning algorithms. First, the evaluation indexes of poplar's waterlogging tolerance were analyzed and determined. Then, significance testing, correlation analysis, and three feature-selection algorithms (hierarchical clustering, Lasso, and stepwise regression) were used to screen photosynthesis, chlorophyll fluorescence, and environmental parameters. On this basis, four machine learning methods, BP neural network regression (BPR), extreme learning machine regression (ELMR), support vector regression (SVR), and random forest regression (RFR), were used to predict the flood resistance of poplar. The results show that RFR and SVR achieve high precision: on the test set, the coefficients of determination (R2) are 0.8351 and 0.6864, the root mean square errors (RMSE) are 0.2016 and 0.2780, and the mean absolute errors (MAE) are 0.1782 and 0.2031, respectively. Therefore, RFR and SVR can be given priority when predicting poplar flood resistance.
Keywords: feature selection; flood disaster; machine learning; model establishment and evaluation; prediction of waterlogging tolerance
Year: 2022 PMID: 35222479 PMCID: PMC8874143 DOI: 10.3389/fpls.2022.821365
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
Scientific names of 20 poplar varieties.
| Varieties | Scientific names |
| LS68 | |
| LS81 | |
| NL895 | |
| I-63 | |
| I-69 | |
| I-72 | |
| I-214 | |
| I-45-51 | |
| Flevo | |
| Juba | |
| LH04-13 | |
| LH04-17 | |
| Triplo | |
| DD102-4 | |
| Raspalje | |
| Danhong | |
| Canadensis | |
| 2L2025 | |
| Ningshanica | |
| Lushan | |
The specific meanings and correlation coefficients of 26 features.
| Features | Specific meaning | Unit |
| AHs/Cs | Ball-Berry parameter | Dimensionless |
| Cond | Conductance to H2O | mol H2O m⁻² s⁻¹ |
| CndCO2 | Total conductance to CO2 | mol CO2 m⁻² s⁻¹ |
| CO2S | CO2 concentration in the sample cell | μmol CO2 mol⁻¹ |
| CO2S/CO2R | CO2 concentration in the sample cell / CO2 concentration in the reference cell | Dimensionless |
| C2sfc | CO2 concentration at the leaf surface | μmol CO2 mol⁻¹ |
| Ci_Pa | Intercellular CO2 partial pressure | Pa |
| Ci/Ca | Intercellular CO2 / ambient CO2 | Dimensionless |
| CndTotal | Total conductance to water vapor | mol H2O m⁻² s⁻¹ |
| CTleaf | Computed leaf temperature | °C |
| Fo | Minimal fluorescence (dark) | bit |
| Fm | Maximal fluorescence (dark) | bit |
| Fv | Variable fluorescence | bit |
| H2OS | H2O concentration in the sample cell | mmol H2O mol⁻¹ |
| H2OS/H2OR | H2O concentration in the sample cell / H2O concentration in the reference cell | Dimensionless |
| H2Oi | Intercellular H2O concentration | mmol H2O mol⁻¹ |
| H2Odiff | Difference between intercellular H2O and sample-cell H2O | mmol H2O mol⁻¹ |
| Photo | Photosynthetic rate | μmol CO2 m⁻² s⁻¹ |
| PARabs | Absorbed photosynthetically active radiation | μmol m⁻² s⁻¹ |
| PhiCO2 | Quantum yield corresponding to CO2 assimilation rate | Dimensionless |
| qN_Fo | Non-photochemical quenching (calculated from Fo) | Dimensionless |
| RH_S | Relative humidity in the sample cell | % |
| RH_S/RH_R | Relative humidity in the sample cell / relative humidity in the reference cell | Dimensionless |
| SVTleaf | Saturated vapor pressure calculated from leaf temperature | Pa |
| Trans | Transpiration rate | mol H2O m⁻² s⁻¹ |
| VpdL | Vapor pressure deficit based on leaf temperature | kPa |
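The feature-screening step described in the abstract relies on Pearson's correlation coefficient between each measured parameter and the waterlogging-tolerance score. A minimal pure-Python sketch of that computation (function names and the 0.5 threshold are illustrative assumptions, not taken from the paper):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def screen_features(features, score, threshold=0.5):
    """Keep features whose |r| with the tolerance score exceeds the threshold.

    `features` maps feature name -> list of measurements; `score` is the
    per-sample tolerance score. The threshold is a hypothetical example.
    """
    return [name for name, values in features.items()
            if abs(pearson_r(values, score)) > threshold]
```

For instance, `pearson_r([1, 2, 3], [2, 4, 6])` returns 1.0, since the two sequences are perfectly linearly related.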
FIGURE 1. Flow chart of the methodology.
RMSE of models with different Nodes or Mtry.
| BPR | | ELMR | | RFR | |
| Nodes | RMSE | Nodes | RMSE | Mtry | RMSE |
| 3 | 0.5654 | 2 | 0.3836 | 1 | 0.1940 |
| 5 | 0.4221 | 3 | 0.3785 | 2 | 0.2654 |
| 7 | 0.3703 | 4 | 0.3591 | 3 | 0.3210 |
| 9 | 0.3680 | 5 | 0.3426 | 4 | 0.3558 |
| 11 | 0.3939 | 6 | 0.3438 | 5 | 0.3982 |
Machine learning model parameters.
| Methods | Model parameter |
| BPR | Training function: trainbr |
| | Input layer nodes: 3 |
| | Hidden layer nodes: 9 |
| | Output layer nodes: 1 |
| | Transfer functions: logsig (input→hidden), purelin (hidden→output) |
| | net.trainparam.goal: 0.0001 |
| | net.trainparam.lr: 0.01 |
| | net.trainparam.epochs: 1000 |
| ELMR | Training function: elmtrain |
| | Input layer nodes: 3 |
| | Hidden layer nodes: 5 |
| | Output layer nodes: 1 |
| | Activation function: sigmoid |
| SVR | Training function: svmtrain |
| | Model: ε-SVR |
| | Kernel function: RBF |
| | Regularization parameter C: 65 |
| | Gamma: 0.001 |
| | ε (p): 0.01 |
| RFR | Training function: TreeBagger |
| | Number of decision trees: 200 |
| | Minimum number of leaves: 1 |
| | Fraction of in-bag observations (FBoot): 1 |
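The models above were trained in MATLAB (trainbr, elmtrain, svmtrain, TreeBagger). As a hedged sketch, the reported SVR and RFR hyperparameters map onto scikit-learn roughly as follows; the synthetic data, `random_state`, and the exact parameter correspondences are illustrative assumptions, not the authors' code:

```python
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor

# ε-SVR with RBF kernel, mirroring the reported C, gamma, and p (epsilon).
svr = SVR(kernel="rbf", C=65, gamma=0.001, epsilon=0.01)

rfr = RandomForestRegressor(
    n_estimators=200,     # number of decision trees
    min_samples_leaf=1,   # minimum number of leaves
    max_features=1,       # Mtry = 1 (best RMSE in the tuning table)
    bootstrap=True,       # FBoot = 1: full-size in-bag bootstrap samples
    random_state=0,       # illustrative; not from the paper
)

# Synthetic 3-feature data, matching the 3 input nodes used by the models.
X = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]
y = [0.1, 0.4, 0.7, 1.0]
svr.fit(X, y)
rfr.fit(X, y)
pred = rfr.predict([[0.4, 0.5, 0.6]])
```

Note that MATLAB's TreeBagger and scikit-learn's RandomForestRegressor differ in defaults (e.g., split criteria), so this mapping is approximate.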
Descriptive statistics of the three evaluation indicators.
| Indicators | Min | Q1 | Med | Q3 | Max |
| Zbio | −2.409917 | −0.748933 | −0.095463 | 0.615642 | 2.614667 |
| Zsap | −1.984857 | −0.799594 | −0.033385 | 0.722554 | 2.419308 |
| Zscore | −2.076712 | −0.554362 | −0.039611 | 0.466923 | 2.257776 |
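The Zbio, Zsap, and Zscore indicators above are standardized (mean 0, unit variance) evaluation indexes. A minimal sketch of Z-score standardization; how the composite Zscore weights its components is not shown here:

```python
from math import sqrt

def zscore(values):
    """Standardize a list of measurements to mean 0 and unit variance.

    Uses the population standard deviation (divide by n); whether the
    paper used n or n-1 is an assumption either way.
    """
    n = len(values)
    mean = sum(values) / n
    std = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]
```

For example, `zscore([1, 2, 3])` returns values symmetric about 0, so the standardized list sums to zero.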
FIGURE 2. Distribution of the three evaluation indexes.
FIGURE 3. Heat map of Pearson’s correlation coefficient. (A) Heat map of 26 features. (B) Heat map of 6 features. (C) Specific values of 6 features.
FIGURE 4. Regression prediction of single features. (A) Fv. (B) qN_Fo. (C) Fm. (D) H2OS/H2OR. (E) RH_S/RH_R.
FIGURE 5. Results of poplar feature and variety clustering. (A) Total. (B) Varieties. (C) Features.
Main information of poplar varieties.
| Samples | Z-score | Variety type |
| Canadensis | −0.31136376 | A |
| DD102-4 | −0.659417544 | A |
| Flevo | −0.149084082 | A |
| I-214 | −1.018704692 | A |
| I-63 | 0.244020869 | A |
| LH04-17 | 0.348729466 | A |
| Ningshanica | −0.843303666 | A |
| Danhong | 0.714574083 | B |
| Juba | 0.528127585 | B |
| LH04-13 | 0.845992953 | B |
| I-69 | 0.356889463 | C |
| I-72 | −0.12622992 | C |
| LS68 | −0.662543236 | C |
| LS81 | 0.717527146 | C |
| Lushan | 0.282405203 | C |
| NL895 | −0.227444975 | C |
| I-45-51 | −0.702652584 | A |
| Triplo | −0.561219766 | A |
| 2L2025 | 0.570264018 | B |
| Raspalje | −0.103562886 | C |
FIGURE 6. Regression results of the four machine learning methods on the training set. (A) Predicted results. (B) Predicted and actual values. (C) Comparison of the four methods. (D) R2, RMSE, and MAE of the four methods on the training set.
The R2, RMSE and MAE of the training set.
| Methods | BPR | ELMR | SVR | RFR |
| R2 | 0.5847 | 0.6401 | 0.7027 | 0.8847 |
| RMSE | 0.3680 | 0.3426 | 0.3113 | 0.1940 |
| MAE | 0.3019 | 0.2741 | 0.1920 | 0.1591 |
FIGURE 7. Regression results of the four machine learning methods on the test set. (A) Predicted results. (B) Predicted and actual values. (C) Comparison of the four methods. (D) R2, RMSE, and MAE of the four methods on the test set.
The R2, RMSE and MAE of the test set.
| Methods | BPR | ELMR | SVR | RFR |
| R2 | 0.5703 | 0.6207 | 0.6864 | 0.8351 |
| RMSE | 0.3254 | 0.3057 | 0.2780 | 0.2016 |
| MAE | 0.2806 | 0.2456 | 0.2032 | 0.1782 |
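The three metrics reported in the training- and test-set tables have standard definitions, and can be computed as follows; this is a generic sketch, not the authors' code:

```python
from math import sqrt

def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mae(actual, predicted):
    """Mean absolute error."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n
```

A perfect predictor gives R2 = 1 and RMSE = MAE = 0; predicting the mean of the actual values gives R2 = 0.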
FIGURE A1. Results of multiple linear regression.