Simon P Lailvaux, Avdesh Mishra, Pooja Pun, Md Wasi Ul Kabir, Robbie S Wilson, Anthony Herrel, Md Tamjidul Hoque.
Abstract
Completing the genotype-to-phenotype map requires rigorous measurement of the entire multivariate organismal phenotype. However, phenotyping on a large scale is not feasible for many kinds of traits, resulting in missing data that can also cause problems for comparative analyses and the assessment of evolutionary trends across species. Measuring the multivariate performance phenotype is especially logistically challenging, and our ability to predict several performance traits from a given morphology is consequently poor. We developed a machine learning model to accurately estimate multivariate performance data from morphology alone by training it on a dataset containing performance and morphology data from 68 lizard species. Our final, stacked model predicts missing performance data accurately at the level of the individual from simple morphological measures. This model performed exceptionally well, even for performance traits that were missing values for >90% of the sampled individuals. Furthermore, incorporating phylogeny did not improve model fit, indicating that the phenotypic data alone preserved sufficient information to predict the performance based on morphological information. This approach can both significantly increase our understanding of performance evolution and act as a bridge to incorporate performance into future work on phenomics.
Year: 2022 PMID: 35061733 PMCID: PMC8782310 DOI: 10.1371/journal.pone.0261613
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Species names and sample size for each of the 68 taxa comprising the training and verification dataset.
Fig 2. Phylogenetic relationships among the 68 lizard taxa from 8 families included in the final model.
Note that phylogeny had no effect on the predictive accuracy of the final, stacked model.
Derivation of indices used to evaluate model classification and prediction.
| Name of Metric | Definition |
|---|---|
| P | True value of the performance feature |
| Pavg | Mean of true values |
| Ppred | Predicted value of the corresponding performance feature |
| Ppred_avg | Mean of predicted values |
| N | Number of samples |
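| Pearson Correlation Coefficient (PCC) | $\dfrac{\sum_{i=1}^{N}(P_i - P_{avg})(P_{pred,i} - P_{pred\_avg})}{\sqrt{\sum_{i=1}^{N}(P_i - P_{avg})^2 \, \sum_{i=1}^{N}(P_{pred,i} - P_{pred\_avg})^2}}$ |
| Mean Absolute Error (MAE) | $\dfrac{1}{N}\sum_{i=1}^{N}\lvert P_i - P_{pred,i} \rvert$ |
| Root Mean Square Error (RMSE) | $\sqrt{\dfrac{1}{N}\sum_{i=1}^{N}(P_i - P_{pred,i})^2}$ |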
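
The three indices named in the table above can be computed directly from paired true and predicted values. The following is a minimal sketch using the standard textbook definitions of PCC, MAE, and RMSE. The function names and the sample numbers are illustrative assumptions, not taken from the study.

```python
import math


def pearson_cc(p, p_pred):
    """Pearson correlation between true values p and predictions p_pred."""
    n = len(p)
    p_avg = sum(p) / n
    pred_avg = sum(p_pred) / n
    cov = sum((a - p_avg) * (b - pred_avg) for a, b in zip(p, p_pred))
    var_p = sum((a - p_avg) ** 2 for a in p)
    var_pred = sum((b - pred_avg) ** 2 for b in p_pred)
    return cov / math.sqrt(var_p * var_pred)


def mae(p, p_pred):
    """Mean absolute error, in the same units as the trait."""
    return sum(abs(a - b) for a, b in zip(p, p_pred)) / len(p)


def rmse(p, p_pred):
    """Root mean square error, in the same units as the trait."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, p_pred)) / len(p))


# Made-up values for illustration only (not data from the study).
true_vals = [1.0, 2.0, 3.0, 4.0]
pred_vals = [1.5, 2.0, 2.5, 4.0]

print(pearson_cc(true_vals, pred_vals))
print(mae(true_vals, pred_vals))
print(rmse(true_vals, pred_vals))
```

Because MAE and RMSE carry the units of the trait being predicted, they are only comparable across traits after accounting for each trait's scale, whereas PCC is unitless.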
Optimum K-value search result (range 1 to 100), for various performance traits.
| Feature | Optimum K (based on RMSE) | Root Mean Square Error (RMSE) |
|---|---|---|
| Jump power | 165 | 7.47 |
| Jump acceleration | 29 | 1.98 |
| Bite force | 57 | 4.53 |
| Jump velocity | 16 | 0.07 |
| Endurance | 154 | 32.08 |
| Sprint speed | 84 | 0.65 |
| Jump distance | 46 | 0.05 |
| Stamina | 25 | 3.66 |
| Angle | 34 | 1.84 |
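
The table reports, for each trait, the K that minimised RMSE, but it does not say which algorithm K belongs to. The sketch below assumes, purely for illustration, that K is the neighbour count of a k-nearest-neighbours predictor and that candidate values are compared by leave-one-out RMSE. The helper names (`knn_predict`, `loo_rmse`, `best_k`) and the toy data are hypothetical.

```python
import math


def knn_predict(train, query_x, k):
    """Mean target of the k training rows nearest to query_x (Euclidean)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query_x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def loo_rmse(data, k):
    """Leave-one-out RMSE of the k-NN predictor on data."""
    sq_err = 0.0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # hold out row i
        sq_err += (knn_predict(rest, x, k) - y) ** 2
    return math.sqrt(sq_err / len(data))


def best_k(data, k_values):
    """Return the k with the lowest leave-one-out RMSE, plus all scores."""
    scores = {k: loo_rmse(data, k) for k in k_values}
    return min(scores, key=scores.get), scores


# Toy data: (morphology vector, performance value); hypothetical numbers.
toy = [((float(i),), 2.0 * i) for i in range(10)]
k_opt, scores = best_k(toy, range(1, 6))
print(k_opt, scores[k_opt])
```

In this pattern the search is run separately for each performance trait, which is consistent with the table listing a different optimum K per trait.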
Configurations of the five stacked models.
| Model | Base Layer | Meta Layer |
|---|---|---|
| SM1 | XGBR, RFR, GBR, ETR | ETR |
| SM2 | XGBR, RFR, GBR, ETR | GBR |
| SM3 | XGBR, RFR, GBR, ETR | RFR |
| SM4 | XGBR, RFR, GBR, ETR | XGBR |
| SM5 | XGBR, RFR, GBR, ETR | SVR |
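
One way to read the table: each stacked model (SM1 to SM5) feeds the predictions of the same four base regressors into a single meta regressor. The following is a minimal pure-Python sketch of that two-layer pattern, not the authors' implementation. `MeanRegressor`, `NearestRegressor`, `BestColumnMeta`, and `stack_fit_predict` are hypothetical toy stand-ins so the example runs without third-party libraries, and training the meta layer on out-of-fold base predictions is a common stacking convention that the source does not confirm.

```python
# Table above, transcribed: one shared base layer, one meta learner per stack.
BASE_LAYER = ["XGBR", "RFR", "GBR", "ETR"]
META_LAYER = {"SM1": "ETR", "SM2": "GBR", "SM3": "RFR", "SM4": "XGBR", "SM5": "SVR"}


class MeanRegressor:
    """Toy stand-in base learner: always predicts the training mean."""
    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class NearestRegressor:
    """Toy stand-in base learner: copies the target of the closest training row."""
    def fit(self, X, y):
        self.rows_ = list(zip(X, y))
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((u - v) ** 2 for u, v in zip(a, b))
        return [min(self.rows_, key=lambda r: sq_dist(r[0], x))[1] for x in X]


class BestColumnMeta:
    """Toy meta learner: keeps the base-model column with the lowest squared error."""
    def fit(self, X, y):
        errs = [sum((row[j] - t) ** 2 for row, t in zip(X, y))
                for j in range(len(X[0]))]
        self.col_ = errs.index(min(errs))
        return self

    def predict(self, X):
        return [row[self.col_] for row in X]


def stack_fit_predict(base_factories, meta_factory, X, y, X_new, n_folds=3):
    """Train a two-layer stack on (X, y) and predict for X_new."""
    n = len(X)
    # Out-of-fold base predictions become the meta learner's training features.
    oof = [[0.0] * len(base_factories) for _ in range(n)]
    for fold in range(n_folds):
        held = [i for i in range(n) if i % n_folds == fold]
        kept = [i for i in range(n) if i % n_folds != fold]
        for j, make in enumerate(base_factories):
            model = make().fit([X[i] for i in kept], [y[i] for i in kept])
            for i, pred in zip(held, model.predict([X[i] for i in held])):
                oof[i][j] = pred
    meta = meta_factory().fit(oof, y)
    # Refit every base learner on all rows to build meta features for new data.
    full = [make().fit(X, y) for make in base_factories]
    new_feats = [list(row) for row in zip(*(m.predict(X_new) for m in full))]
    return meta.predict(new_feats)


# Hypothetical one-feature data: morphology measure -> performance value.
X = [[float(i)] for i in range(12)]
y = [3.0 * i for i in range(12)]
preds = stack_fit_predict([MeanRegressor, NearestRegressor], BestColumnMeta,
                          X, y, [[4.2], [10.9]])
print(preds)
```

Swapping the meta learner while holding the base layer fixed, as the five configurations do, isolates the effect of the meta layer on predictive accuracy.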
Pearson correlation coefficient (PCC) and mean absolute error (MAE) of features.
Jump acceleration exhibited the highest prediction accuracy (bolded). To aid in the interpretation of MAE, we have also provided the mean value for each performance feature from the overall training dataset, as well as the associated standard errors. Note that MAE has the same units as the associated performance trait.
| Feature | Regression method | Mean (±SE) | PCC | MAE |
|---|---|---|---|---|
| Jump power (W/kg) | SVR | 45.94(±0.15) | 0.77 | 1.21 |
| Jump acceleration (m/s²) | | 32.17(±0.05) | | |
| Bite force (N) | GBR | 7.74(±0.18) | 0.94 | 1.35 |
| Jump velocity (m/s) | XGBR | 1.57(±0.002) | 0.95 | 0.02 |
| Endurance (s) | GBR | 213.71(±0.65) | 0.28 | 6.70 |
| Sprint speed (m/s) | RFR | 1.35(±0.02) | 0.88 | 0.23 |
| Jump distance (m) | ETR | 0.33(±0.001) | 0.84 | 0.01 |
| Stamina (m) | XGBR | 16.53(±0.11) | 0.83 | 1.42 |
| Angle | XGBR | 36.44(±0.06) | 0.75 | 0.527 |
Pearson correlation coefficient (PCC) and mean absolute error (MAE) of different stacking models for various performance features.
| Performance feature | Stacked configuration | PCC | MAE |
|---|---|---|---|
| Jump power (W/kg) | SM2 | 0.98 | 0.49 |
| Jump acceleration (m/s²) | SM2 | 0.99 | 0.17 |
| Bite force (N) | SM2 | 0.99 | 0.57 |
| Jump velocity (m/s) | SM2 | 0.99 | 0.01 |
| Endurance (s) | SM2 | 0.95 | 1.73 |
| Sprint speed (m/s) | SM2 | 0.98 | 0.11 |
| Jump distance (m) | SM2 | 0.93 | 0.003 |
| Stamina (m) | SM2 | 0.98 | 0.63 |
| Angle | SM2 | 0.97 | 0.20 |