Salvatore Esposito, Domenico Carputo, Teodoro Cardi, Pasquale Tripodi.
Abstract
Crops are the major source of food supply and raw materials for the processing industry. A balance between crop production and food consumption is continually threatened by plant diseases and adverse environmental conditions. This leads to serious losses every year and results in food shortages, particularly in developing countries. Presently, cutting-edge technologies for genome sequencing and phenotyping of crops combined with progress in computational sciences are leading a revolution in plant breeding, boosting the identification of the genetic basis of traits at a precision never reached before. In this frame, machine learning (ML) plays a pivotal role in data-mining and analysis, providing relevant information for decision-making towards achieving breeding targets. To this end, we summarize the recent progress in next-generation sequencing and the role of phenotyping technologies in genomics-assisted breeding toward the exploitation of the natural variation and the identification of target genes. We also explore the application of ML in managing big data and predictive models, reporting a case study using microRNAs (miRNAs) to identify genes related to stress conditions.
Keywords: PacBio; QTLs dissection; genome-wide association studies; genomics; genotyping by sequencing; machine learning; microRNA; nanopore; phenomics
Year: 2019 PMID: 31881663 PMCID: PMC7020215 DOI: 10.3390/plants9010034
Source DB: PubMed Journal: Plants (Basel) ISSN: 2223-7747
Figure 1. Number of indexed publications in the last 20 years concerning plant phenotyping (source: Scopus). Search queries in Title-Abstract-Keywords were: phenotyping and plant (blue line), phenotyping and crop (red line), phenotyping and sensors (green line).
Figure 2. Integration of genomics and phenomics for the exploitation of genetic resources in genome-wide association studies (GWAS), genotype by environment (GxE) estimation, quantitative trait loci (QTLs) analysis, investigation of crop diversity and genomic selection. Related big data are exploited in ML-based algorithms implemented in computational tools leading to precision breeding.
Figure 3. Flow chart of a machine learning (ML) approach. A training population is divided into a training set (where ML makes predictions) and a testing set (where ML validates the results and its accuracy is estimated). The validated model may then be applied to a new population.
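The workflow in Figure 3 can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the case study: the nearest-centroid classifier, the toy feature values, and the helper names `stratified_split`, `fit_centroids`, and `predict` are all assumptions introduced here.

```python
import random

def stratified_split(labels, test_fraction=0.4, seed=0):
    """Return (train_idx, test_idx), holding out a share of each class."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for ids in by_class.values():
        rng.shuffle(ids)
        n_test = max(1, round(len(ids) * test_fraction))
        test_idx.extend(ids[:n_test])
        train_idx.extend(ids[n_test:])
    return train_idx, test_idx

def fit_centroids(samples, labels):
    """Training step: compute one mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))

# Hypothetical toy data: two features per sample; S = susceptible, R = resistant.
samples = [[1.0, 1.2], [0.8, 1.1], [1.3, 0.9], [1.1, 1.0], [0.9, 0.7],
           [5.0, 5.2], [4.8, 5.1], [5.3, 4.9], [5.1, 5.0], [4.9, 4.7]]
labels = ["S"] * 5 + ["R"] * 5

# 1. Split the labeled population into training and testing sets.
train_idx, test_idx = stratified_split(labels)

# 2. Train on the training set only.
model = fit_centroids([samples[i] for i in train_idx],
                      [labels[i] for i in train_idx])

# 3. Validate: compare predictions on held-out samples with their known labels.
correct = sum(predict(model, samples[i]) == labels[i] for i in test_idx)
test_accuracy = correct / len(test_idx)

# 4. Apply the validated model to a new, unlabeled sample.
new_prediction = predict(model, [4.7, 5.3])
print(test_accuracy, new_prediction)
```

The split is stratified so that both classes are represented in the training set. Because the toy classes are well separated, every held-out sample is classified correctly; real expression data would rarely be this clean.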
Prediction of testing data sets. S (susceptible) and R (resistant/tolerant) refer to the reference dataset. SVM correctly predicted 10 out of 11 samples, making it the least accurate model. The training set model accuracy was calculated by dividing the number of correct predictions (true positive + true negative) by the total number of predictions (true positive + true negative + false positive + false negative) and using the sparsity values for the proportion of features used in training each model.
| Model | Model Accuracy |
|---|---|
| SVM | 0.89 |
| NSC | 0.96 |
| PLDA | 0.93 |
| PLDA2 | 0.96 |
| VoomDLDA | 0.95 |
| VoomNSC | 0.96 |
| VoomNBLDA | 0.95 |
SVM = Support Vector Machine; NSC = Nearest Shrunken Centroids; PLDA = Poisson Linear Discriminant Analysis; Voom = Variance modeling at the observational level; DLDA and NBLDA are diagonal discriminant classifiers.
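The accuracy definition in the table caption, correct predictions divided by all predictions, can be written out as a short Python sketch. The reference and predicted labels below are hypothetical placeholders (the per-sample values of the table are not reproduced here), and treating R as the positive class is an assumption made for illustration.

```python
def confusion_counts(reference, predicted, positive="R"):
    """Count true/false positives and negatives for a two-class problem."""
    tp = tn = fp = fn = 0
    for ref, pred in zip(reference, predicted):
        if pred == positive:
            if ref == positive:
                tp += 1
            else:
                fp += 1
        else:
            if ref == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no predictions to score")
    return (tp + tn) / total

# Hypothetical labels for 11 test samples; one resistant sample is misclassified.
reference = ["S", "S", "S", "S", "S", "R", "R", "R", "R", "R", "R"]
predicted = ["S", "S", "S", "S", "S", "R", "R", "R", "R", "R", "S"]

tp, tn, fp, fn = confusion_counts(reference, predicted)
acc = accuracy(tp, tn, fp, fn)
print(tp, tn, fp, fn, round(acc, 2))
```

With 10 of 11 samples predicted correctly, as in the SVM test-set case described in the caption, this gives 10/11, or about 0.91. The 0.89 reported for SVM in the table is the training set accuracy, which is a separate quantity.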
Figure 4. Venn diagram showing the number of common and unique miRNAs associated with cold tolerance predicted by ML models PLDA, PLDA2, and VoomNSC using 325 selected features from published miRNA data related to cold tolerance in S. commersonii.
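The region counts of a three-set Venn diagram like Figure 4 follow directly from set operations. The sketch below assumes each model yields a set of selected miRNA identifiers; the identifiers `m1` to `m6` and the function name `venn3_counts` are hypothetical and do not correspond to the miRNAs reported in the study.

```python
def venn3_counts(a, b, c, names=("A", "B", "C")):
    """Return the size of each of the seven regions of a three-set Venn diagram."""
    na, nb, nc = names
    return {
        f"{na} only": len(a - b - c),
        f"{nb} only": len(b - a - c),
        f"{nc} only": len(c - a - b),
        f"{na} & {nb} only": len((a & b) - c),
        f"{na} & {nc} only": len((a & c) - b),
        f"{nb} & {nc} only": len((b & c) - a),
        f"{na} & {nb} & {nc}": len(a & b & c),
    }

# Hypothetical feature sets selected by each model.
plda = {"m1", "m2", "m3", "m4"}
plda2 = {"m1", "m2", "m5"}
voomnsc = {"m1", "m3", "m5", "m6"}

regions = venn3_counts(plda, plda2, voomnsc, names=("PLDA", "PLDA2", "VoomNSC"))
for region, count in regions.items():
    print(f"{region}: {count}")
```

The seven regions are disjoint, so their counts sum to the size of the union of the three sets, which is a convenient sanity check.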