Ravena Rocha Bessa de Carvalho, Diego Fernando Marmolejo Cortes, Massaine Bandeira E Sousa, Luciana Alves de Oliveira, Eder Jorge de Oliveira.
Abstract
Phenotyping to quantify the total carotenoid content (TCC) is sensitive, time-consuming, tedious, and costly. The development of high-throughput phenotyping tools is essential for screening hundreds of cassava genotypes in a short period of time in the biofortification program. This study aimed to (i) use digital images to extract information on the pulp color of cassava roots and estimate correlations with TCC, and (ii) select predictive models for TCC using colorimetric indices. Red, green and blue (RGB) images were captured from root samples of 228 biofortified genotypes, and the difference in color was analyzed using the L*, a*, b*, hue and chroma indices of the International Commission on Illumination (CIELAB) color system, together with lightness. Colorimetric data were used for principal component analysis (PCA), correlation analysis, and the development of prediction models for TCC based on regression and machine learning. A high positive correlation between TCC and the variables b* (r = 0.90) and chroma (r = 0.89) was identified, while the other correlations were moderate and negative, and the L* parameter did not present a significant correlation with TCC. In general, the accuracy of most prediction models (with all variables and with only the most important ones) was high (R2 ranging from 0.81 to 0.94). The artificial neural network prediction model presented the best predictive ability (R2 = 0.94), associated with the smallest error in the TCC estimates (root-mean-square error of 0.24). The structure of the studied population revealed five groups and high genetic variability based on PCA of the colorimetric indices and TCC. Our results demonstrated that data obtained from digital image analysis are an economical, fast, and effective alternative for the development of TCC phenotyping tools in cassava roots with high predictive ability.
Year: 2022 PMID: 35100324 PMCID: PMC8803208 DOI: 10.1371/journal.pone.0263326
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
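For reference, the chroma and hue indices named in the abstract are conventionally derived from the CIELAB a* and b* coordinates as C* = sqrt(a*^2 + b*^2) and h = atan2(b*, a*). The short Python sketch below illustrates these standard relations; the function name and the example values are illustrative assumptions and are not taken from the study.

```python
import math


def chroma_and_hue(a_star, b_star):
    """Return (chroma, hue angle in degrees) from CIELAB a* and b* values."""
    # Chroma is the distance from the neutral (grey) axis in the a*-b* plane.
    chroma = math.hypot(a_star, b_star)
    # Hue is the angle in the a*-b* plane, mapped to the range [0, 360).
    hue = math.degrees(math.atan2(b_star, a_star)) % 360.0
    return chroma, hue


# Hypothetical reading, not data from the study.
chroma, hue = chroma_and_hue(3.0, 4.0)
print(f"chroma = {chroma:.2f}, hue = {hue:.2f} degrees")
```

Because b* enters the chroma formula directly, a strongly yellow pulp (large positive b*) would also show a large chroma, which fits the similar correlations reported for the two indices.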
Fig 1. Scatter plots of the total carotenoid content and colorimetric indices.
Numerical values represent Pearson’s correlation coefficients between colorimetric indices and total carotenoid content. The blue line represents the 1:1 isoline.
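The Pearson correlation coefficients shown in the panels can be computed as sketched below. This is a minimal pure-Python illustration; the function name and the sample values are hypothetical and are not the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical b* readings and carotenoid contents, for illustration only.
b_star = [10.0, 20.0, 30.0, 40.0]
tcc = [1.1, 2.0, 2.9, 4.2]
print(round(pearson_r(b_star, tcc), 3))
```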
Fig 2. Within-group sum of squares of 228 biofortified cassava genotypes based on phenotypic data of total carotenoid content and colorimetric indices (CIELAB).
The grouping criterion was based on the K-means clustering algorithm.
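The within-group sum of squares plotted in this figure is the total squared distance of each genotype from the centroid of its assigned cluster. The sketch below computes it for a given cluster assignment; it assumes the labels come from a K-means run performed elsewhere, and the function name and toy points are hypothetical.

```python
def within_group_ss(points, labels):
    """Total within-group sum of squares for a given cluster assignment."""
    # Collect the points belonging to each cluster label.
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)

    total = 0.0
    for members in groups.values():
        dim = len(members[0])
        # Centroid = coordinate-wise mean of the cluster members.
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        # Add the squared Euclidean distance of each member to its centroid.
        total += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return total


# Toy two-dimensional data, for illustration only.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
print(within_group_ss(points, [0, 0, 0, 0]))  # one cluster
print(within_group_ss(points, [0, 0, 1, 1]))  # two clusters
```

Plotting this quantity against the number of clusters and looking for the point where the decrease levels off is the usual way to choose k.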
Fig 3. Principal Component Analysis (PCA) based on phenotypic data of total carotenoid content and colorimetric indices (CIELAB) evaluated in 228 biofortified cassava genotypes.
Fig 4. Boxplot of total carotenoid content and colorimetric indices for each cluster of 228 biofortified cassava genotypes based on principal component analysis.
Different letters represent significant differences between accession groups with p < 0.05 by the Tukey Honest Significant Difference test.
Fig 5. Relative importance of colorimetric indices for predicting total carotenoid content using 12 prediction models: Linear Regression with Forward Selection (LRFS), Linear Regression with Backward Selection (LRBS), Ridge Regression (RR), Linear Regression with Stepwise Selection (LRSS), Generalized Linear Model with Stepwise Feature Selection (GLMSS), Random Forest (RF), Partial Least Squares (PLS), Bayesian Lasso (BL), Bayesian Blasso (BBL), Artificial Neural Network (ANN), Support Vector Machine (SVM), and Classification and Regression Trees (CART).
Performance of different prediction models for total carotenoid content in cassava roots using colorimetric indices obtained from digital images considering the complete model (all variables) and reduced model (variables with more than 50% relative importance), using the random cross-validation without test set (V-Random), PCA clustering-based cross-validation (IV-Cluster), and random cross-validation with test set (IV-Random).
| Model | V-Random | | | | | | IV-Cluster | | | | | | IV-Random | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Complete model | | | Reduced model | | | Complete model | | | Reduced model | | | Complete model | | | Reduced model | | |
| | R2 | RMSE | MAPE | R2 | RMSE | MAPE | R2 | RMSE | MAPE | R2 | RMSE | MAPE | R2 | RMSE | MAPE | R2 | RMSE | MAPE |
| LRFS | 0.93 | 0.26 | 0.05 | 0.93 | 0.26 | 0.04 | 0.71 | 0.33 | 0.14 | 0.71 | 0.31 | 0.14 | 0.93 | 0.26 | 0.05 | 0.93 | 0.26 | 0.05 |
| LRBS | 0.93 | 0.26 | 0.05 | 0.93 | 0.26 | 0.05 | 0.71 | 0.34 | 0.16 | 0.71 | 0.31 | 0.15 | 0.93 | 0.26 | 0.06 | 0.93 | 0.26 | 0.05 |
| RR | 0.93 | 0.26 | 0.05 | 0.93 | 0.26 | 0.05 | 0.71 | 0.29 | 0.13 | 0.71 | 0.31 | 0.15 | 0.93 | 0.26 | 0.05 | 0.93 | 0.26 | 0.05 |
| LRSS | 0.93 | 0.26 | 0.05 | 0.93 | 0.26 | 0.05 | 0.69 | 0.29 | 0.13 | 0.71 | 0.31 | 0.15 | 0.93 | 0.26 | 0.05 | 0.93 | 0.26 | 0.05 |
| GLMSS | 0.93 | 0.26 | 0.05 | 0.93 | 0.26 | 0.05 | 0.69 | 0.29 | 0.13 | 0.71 | 0.31 | 0.15 | 0.93 | 0.26 | 0.05 | 0.93 | 0.26 | 0.05 |
| RF | 0.90 | 0.31 | 0.07 | 0.90 | 0.32 | 0.07 | 0.52 | 0.58 | 0.23 | 0.54 | 0.57 | 0.20 | 0.90 | 0.31 | 0.08 | 0.90 | 0.31 | 0.08 |
| PLS | 0.93 | 0.26 | 0.05 | 0.93 | 0.26 | 0.05 | 0.71 | 0.30 | 0.13 | 0.71 | 0.31 | 0.15 | 0.93 | 0.25 | 0.05 | 0.93 | 0.25 | 0.05 |
| BL | 0.93 | 0.26 | 0.05 | 0.93 | 0.26 | 0.04 | 0.66 | 0.30 | 0.14 | 0.66 | 0.32 | 0.17 | 0.93 | 0.26 | 0.05 | 0.93 | 0.26 | 0.05 |
| BBL | 0.93 | 0.26 | 0.06 | 0.93 | 0.26 | 0.04 | 0.71 | 0.32 | 0.15 | 0.71 | 0.32 | 0.15 | 0.93 | 0.26 | 0.05 | 0.93 | 0.26 | 0.05 |
| ANN | 0.94 | 0.24 | 0.10 | 0.94 | 0.24 | 0.10 | 0.66 | 0.39 | 0.33 | 0.68 | 0.46 | 0.31 | 0.94 | 0.24 | 0.07 | 0.94 | 0.24 | 0.07 |
| SVM | 0.90 | 0.33 | 0.10 | 0.90 | 0.33 | 0.09 | 0.23 | 0.97 | 0.92 | 0.29 | 0.91 | 0.77 | 0.90 | 0.32 | 0.11 | 0.90 | 0.32 | 0.09 |
| CART | 0.82 | 0.43 | 0.13 | 0.82 | 0.43 | 0.13 | 0.29 | 0.80 | 0.22 | 0.29 | 0.80 | 0.22 | 0.81 | 0.43 | 0.14 | 0.81 ns | 0.43 | 0.14 |
a: Linear regression with forward selection (LRFS), linear regression with backward selection (LRBS), ridge regression (RR), linear regression with stepwise selection (LRSS), generalized linear model with stepwise feature selection (GLMSS), random forest (RF), partial least squares (PLS), Bayesian Lasso (BL), Bayesian Blasso (BBL), artificial neural network (ANN), support vector machine (SVM), and classification and regression trees (CART).
b: R2: coefficient of determination.
c: RMSE: root-mean-square error.
d: MAPE: mean absolute percentage error.
ns: not significant by paired t-test.
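The three accuracy measures in the table can be computed as in the sketch below. Two points are assumptions, because the page does not state them: R2 is written here as 1 - SS_res/SS_tot (some software reports the squared correlation between observed and predicted values instead), and MAPE is returned as a fraction rather than a percentage, which seems consistent with the magnitudes in the table. The function names and example values are hypothetical.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination, computed as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root-mean-square error, in the units of the observed values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(observed, predicted):
    """Mean absolute percentage error as a fraction (0.05 means 5%)."""
    # Observed values must be non-zero, since each error is divided by them.
    ratios = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return sum(ratios) / len(ratios)


# Hypothetical observed and predicted values, for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
print(r_squared(observed, predicted), rmse(observed, predicted), mape(observed, predicted))
```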
Fig 6. Relationship between observed and predicted values for total carotenoid content of cassava roots.
Predictions were obtained from 12 different models using random cross-validation without a test set (V-Random; 80% training and 20% validation). The models were artificial neural network (ANN), Bayesian Blasso (BBL), Bayesian Lasso (BL), classification and regression trees (CART), generalized linear model with stepwise feature selection (GLMSS), linear regression with backward selection (LRBS), linear regression with forward selection (LRFS), linear regression with stepwise selection (LRSS), partial least squares (PLS), random forest (RF), ridge regression (RR), and support vector machine (SVM). Each cross-validation fold is shown in a different color. The values in each panel are the fitted linear equation (y) and the coefficient of determination (R2).
Fig 7. Relationship between observed and predicted values for total carotenoid content of cassava roots.
Predictions were obtained from 12 different models using random cross-validation with a test set (IV-Random; 60% training, 20% validation, and 20% test). The models were artificial neural network (ANN), Bayesian Blasso (BBL), Bayesian Lasso (BL), classification and regression trees (CART), generalized linear model with stepwise feature selection (GLMSS), linear regression with backward selection (LRBS), linear regression with forward selection (LRFS), linear regression with stepwise selection (LRSS), partial least squares (PLS), random forest (RF), ridge regression (RR), and support vector machine (SVM). Each cross-validation fold is shown in a different color. The values in each panel are the fitted linear equation (y) and the coefficient of determination (R2).
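The random partitions described for V-Random (80/20) and IV-Random (60/20/20) can be sketched as a single shuffled split of the sample indices. This is an illustration under stated assumptions only: the captions do not give the number of folds or repetitions, so the sketch draws one partition, and the function name and seed are hypothetical.

```python
import random


def random_split(n_samples, fractions, seed=0):
    """Shuffle indices 0..n_samples-1 and cut them into consecutive parts."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle

    parts = []
    start = 0
    for i, fraction in enumerate(fractions):
        # The last part takes the remainder so every index is used exactly once.
        if i == len(fractions) - 1:
            end = n_samples
        else:
            end = start + round(fraction * n_samples)
        parts.append(indices[start:end])
        start = end
    return parts


# 228 genotypes, as in the study; the partition itself is illustrative.
train, valid = random_split(228, (0.8, 0.2))
train3, valid3, test3 = random_split(228, (0.6, 0.2, 0.2))
print(len(train), len(valid))
print(len(train3), len(valid3), len(test3))
```

Holding back a test set that is never used during model fitting or tuning gives an error estimate that is not influenced by model selection, which is the practical difference between the two schemes.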
Fig 8. Relationship between observed and predicted values for total carotenoid content of cassava roots.
Predictions were obtained from 12 different models using PCA clustering-based cross-validation with k = 5 (IV-Cluster). This figure shows six of them: artificial neural network (ANN), Bayesian Blasso (BBL), Bayesian Lasso (BL), classification and regression trees (CART), generalized linear model with stepwise feature selection (GLMSS), and linear regression with backward selection (LRBS). The values in each panel are the fitted linear equation (y) and the coefficient of determination (R2).
Fig 9. Relationship between observed and predicted values for total carotenoid content of cassava roots.
Predictions were obtained from 12 different models using PCA clustering-based cross-validation with k = 5 (IV-Cluster). This figure shows six of them: linear regression with forward selection (LRFS), linear regression with stepwise selection (LRSS), partial least squares (PLS), random forest (RF), ridge regression (RR), and support vector machine (SVM). The values in each panel are the fitted linear equation (y) and the coefficient of determination (R2).
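The captions do not say how the five PCA-based clusters were assigned to training and validation. One common scheme, shown below purely as an assumption, holds out each cluster in turn and trains on the remaining ones, so that a model is always evaluated on a group of genotypes it has not seen. The function name and the toy labels are hypothetical.

```python
def leave_one_cluster_out(cluster_labels):
    """Yield (held_out_cluster, train_indices, validation_indices) per cluster."""
    for held_out in sorted(set(cluster_labels)):
        # Train on every sample outside the held-out cluster.
        train_idx = [i for i, label in enumerate(cluster_labels) if label != held_out]
        # Validate on the samples of the held-out cluster only.
        valid_idx = [i for i, label in enumerate(cluster_labels) if label == held_out]
        yield held_out, train_idx, valid_idx


# Toy cluster labels for eight genotypes, for illustration only.
labels = [1, 1, 2, 3, 3, 4, 5, 5]
for cluster, train_idx, valid_idx in leave_one_cluster_out(labels):
    print(cluster, train_idx, valid_idx)
```

Under such a scheme the validation genotypes differ systematically from the training genotypes, which would be consistent with the lower R2 values reported in the IV-Cluster columns of the table compared with the random schemes.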