Heike Sprenger¹, Alexander Erban¹, Sylvia Seddig², Katharina Rudack², Anja Thalhammer¹, Mai Q Le³, Dirk Walther¹, Ellen Zuther¹, Karin I Köhl¹, Joachim Kopka¹, Dirk K Hincha¹.
Abstract
Potato (Solanum tuberosum L.) is one of the most important food crops worldwide. Current potato varieties are highly susceptible to drought stress. In view of global climate change, selection of cultivars with improved drought tolerance and high yield potential is of paramount importance. Drought tolerance breeding of potato is currently based on direct selection according to yield and phenotypic traits and requires multiple trials under drought conditions. Marker-assisted selection (MAS) is cheaper and faster, and it reduces classification errors caused by uncontrolled environmental effects. We analysed 31 potato cultivars grown under optimal and reduced water supply in six independent field trials. Drought tolerance was determined from tuber starch yield. Leaf samples from young plants were screened for the abundance of preselected transcripts and of nontargeted metabolites using qRT-PCR and GC-MS profiling, respectively. Transcript marker candidates were selected from a published RNA-Seq data set. A Random Forest machine learning approach extracted metabolite and transcript markers that predicted drought tolerance with low error rates of 6% and 9%, respectively. Combining transcript and metabolite markers reduced the prediction error further, to 4.3%. Feature selection from the Random Forest models allowed model minimization, yielding a minimal combination of only 20 metabolite and transcript markers whose reproducibility was successfully tested in 16 independent agronomic field trials. We demonstrate that a minimal combination of transcript and metabolite markers sampled at early cultivation stages predicts potato yield stability under drought largely independently of seasonal and regional agronomic conditions.
Keywords: drought tolerance; machine learning; metabolite markers; potato (Solanum tuberosum); prediction models; transcript markers
Year: 2017 PMID: 28929574 PMCID: PMC5866952 DOI: 10.1111/pbi.12840
Source DB: PubMed Journal: Plant Biotechnol J ISSN: 1467-7644 Impact factor: 9.803
Figure 1. Drought tolerance of 31 potato cultivars (Table S1) based on six field experiments (F1–F5 and F7; Table S2). Drought tolerance was calculated as the deviation of relative starch yield from the experimental median (DRYM). DRYM values are means across experiments, and error bars represent the SE of the means. Zero indicates average tolerance, negative values indicate sensitivity, and positive values indicate tolerance.
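The DRYM calculation described in the caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that relative starch yield is the ratio of drought to control starch yield for a cultivar within one experiment, and the function names and yield values are hypothetical.

```python
from math import sqrt
from statistics import mean, median, stdev


def drym_per_experiment(control: dict[str, float],
                        drought: dict[str, float]) -> dict[str, float]:
    """DRYM of each cultivar in one experiment.

    Assumption: relative starch yield = drought yield / control yield.
    DRYM = relative starch yield minus its median over all cultivars.
    """
    relative = {c: drought[c] / control[c] for c in control}
    experiment_median = median(relative.values())
    return {c: r - experiment_median for c, r in relative.items()}


def summarize_drym(experiments: list[dict[str, float]]) -> dict[str, tuple[float, float]]:
    """Mean DRYM per cultivar across experiments and the standard error of that mean."""
    summary = {}
    for cultivar in experiments[0]:
        values = [exp[cultivar] for exp in experiments if cultivar in exp]
        se = stdev(values) / sqrt(len(values)) if len(values) > 1 else float("nan")
        summary[cultivar] = (mean(values), se)
    return summary


# Hypothetical starch yields for three cultivars in two experiments
exp1 = drym_per_experiment({"A": 10.0, "B": 10.0, "C": 10.0},
                           {"A": 4.0, "B": 6.0, "C": 8.0})
exp2 = drym_per_experiment({"A": 8.0, "B": 8.0, "C": 8.0},
                           {"A": 4.0, "B": 4.0, "C": 6.0})
print(summarize_drym([exp1, exp2]))
```

In this toy example cultivar C ends with a positive mean DRYM (tolerant) and A with a negative one (sensitive), following the sign convention of the caption.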
Results of ANOVA on drought tolerance in 31 potato cultivars
| Source | DF | F value | Pr > F |
|---|---|---|---|
| Cultivar | 33 | 3.52 | <0.0001 |
| SI | 1 | 0.02 | 0.8876 |
| NSY | 1 | 354.79 | <0.0001 |
Degrees of freedom (DF), F‐statistics and error probability (Pr > F) for the effect of cultivar, stress index (SI) and starch yield under drought conditions normalized to the median starch yield under control conditions over all cultivars (NSY) on DRYM in six field experiments (DF (error) = 532).
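To show how the DF and F columns relate, here is a one-way ANOVA with cultivar as the only factor. This is a deliberate simplification: the model in the table also contains SI and NSY as terms, which this hypothetical sketch omits, so it will not reproduce the reported F values.

```python
from statistics import mean


def one_way_anova(groups: list[list[float]]) -> tuple[int, int, float]:
    """Return (df_between, df_within, F) for a one-way ANOVA."""
    values = [x for group in groups for x in group]
    grand_mean = mean(values)
    group_means = [mean(group) for group in groups]
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group (error) sum of squares: spread of values around their group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return df_between, df_within, f_stat


# Hypothetical DRYM values of two cultivars, three experiments each
print(one_way_anova([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))  # (1, 4, 13.5)
```

The error probability (Pr > F) is then the upper-tail probability of the F distribution with (df_between, df_within) degrees of freedom, for example `scipy.stats.f.sf(f_stat, df_between, df_within)`.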
Figure 2. Expression plots for the selection of reference genes. (a) Relation between log2 FPKM mean and log2 FPKM variance measured by RNA‐Seq. Vertical lines mark the expression range from 5 to 50 FPKM at intervals of 5. Candidates selected as reference genes are highlighted in red. (b) Expression of 15 candidate genes measured as Ct values by qRT‐PCR. The final selection of four reference genes is indicated in grey.
Selected reference genes for qRT‐PCR with their annotated function and coefficient of variation (CV) across 124 tested samples
| Number | PGSC Gene Identifier | Functional annotation | CV |
|---|---|---|---|
| 4 | PGSC0003DMG400011723 | Paramyosin | 0.050 |
| 9 | PGSC0003DMG400026492 | ATP binding protein | 0.056 |
| 27 | PGSC0003DMG400014497 | AP‐2 complex subunit β1 (β‐adaptin B) | 0.064 |
| 50 | PGSC0003DMG400031374 | Zinc finger CCCH domain‐containing protein 17 | 0.064 |
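The selection logic behind Figure 2 and this table can be sketched in two steps: keep candidates whose mean RNA-Seq expression lies in the 5 to 50 FPKM window, then rank them by the coefficient of variation (CV) of their qRT-PCR Ct values across samples. This sketch assumes CV = sample standard deviation / mean; the gene names and values are hypothetical.

```python
from statistics import mean, stdev


def in_fpkm_window(mean_fpkm: dict[str, float], low: float = 5.0,
                   high: float = 50.0) -> set[str]:
    """Genes whose mean RNA-Seq expression lies within [low, high] FPKM."""
    return {gene for gene, fpkm in mean_fpkm.items() if low <= fpkm <= high}


def cv(values: list[float]) -> float:
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def rank_reference_genes(ct_values: dict[str, list[float]], candidates: set[str],
                         k: int = 4) -> list[tuple[str, float]]:
    """The k candidates with the most stable (lowest-CV) Ct values."""
    scored = [(gene, cv(cts)) for gene, cts in ct_values.items() if gene in candidates]
    return sorted(scored, key=lambda item: item[1])[:k]


mean_fpkm = {"g1": 12.0, "g2": 30.0, "g3": 8.0, "g4": 400.0}
ct_values = {
    "g1": [20.0, 20.5, 19.5],
    "g2": [25.0, 28.0, 22.0],
    "g3": [30.0, 30.3, 29.7],
    "g4": [18.0, 18.1, 17.9],  # very stable, but outside the FPKM window
}
print(rank_reference_genes(ct_values, in_fpkm_window(mean_fpkm), k=2))
```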
Figure 3. Results of qRT‐PCR for 88 selected marker candidates for drought tolerance: correlation between gene expression measured by qRT‐PCR (log2 scale) and by RNA‐Seq (log2 FPKM) from Sprenger et al. (2016).
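The agreement between platforms in Figure 3 is a correlation between paired log2 values. A minimal Pearson correlation sketch, assuming both inputs are already on a log2 expression scale; the example values are hypothetical.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


log2_fpkm = [2.0, 3.5, 5.0, 6.5]  # hypothetical RNA-Seq values
log2_qpcr = [1.1, 2.0, 3.2, 4.1]  # hypothetical qRT-PCR values
print(round(pearson_r(log2_fpkm, log2_qpcr), 3))
```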
Figure 4. PCA score plots of metabolite (a) and transcript (b) data of samples from field experiments and agronomic trials, shown for PC1 and PC2. The plots indicate the difference between well‐watered control (blue) and drought‐stressed plants (red), as well as between the 2 years of agronomic trials (2011: green, 2012: orange).
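PCA scores such as those plotted in Figure 4 can be obtained from a singular value decomposition of the preprocessed samples-by-features matrix. A minimal NumPy sketch; the autoscaling step (centring plus unit-variance scaling) is an assumption, since the caption does not state the preprocessing, and the data are random placeholders.

```python
import numpy as np


def pca_scores(data: np.ndarray, n_components: int = 2) -> tuple[np.ndarray, np.ndarray]:
    """PCA scores and explained-variance fractions for a samples x features matrix.

    Assumption: every feature is mean-centred and scaled to unit variance.
    """
    centred = data - data.mean(axis=0)
    scale = centred.std(axis=0, ddof=1)
    scale[scale == 0] = 1.0  # constant features: avoid division by zero
    scaled = centred / scale
    # Singular value decomposition: scaled = U @ diag(s) @ Vt
    u, s, vt = np.linalg.svd(scaled, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates on PC1, PC2, ...
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
samples = rng.normal(size=(12, 6))  # hypothetical: 12 samples x 6 features
samples[6:, 0] += 3.0               # shift one feature in half the samples
scores, explained = pca_scores(samples)
print(scores.shape, explained.round(2))
```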
Figure 5. Plots illustrating the metabolite marker selection. (a) Out‐of‐bag (OOB) error rate and its standard deviation (dashed lines) of the Random Forest model in relation to the number of metabolite markers (predictors). The model was based on field training data. The least important predictors were eliminated successively from the model, resulting in a set of 24 predictors (red diamond) according to the ‘1 SE rule’. (b) Importance of the 24 selected metabolite markers, measured as mean decrease in Gini index, for Random Forest models of field trial data.
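The ‘1 SE rule’ used to stop the backward elimination can be sketched as: find the minimum OOB error along the elimination path, then keep the smallest predictor set whose error does not exceed that minimum plus one standard deviation. This follows the same idea as the standard-deviation multiplier in varSelRF (named in the transcript table note) but is not its source code; the function name and path values are hypothetical.

```python
def select_one_se(path: list[tuple[int, float, float]], c_sd: float = 1.0) -> int:
    """Smallest predictor-set size whose OOB error is within c_sd SDs of the minimum.

    path: (number_of_predictors, oob_error, oob_error_sd) for each elimination step.
    """
    # Step with the lowest OOB error (ties broken by the smaller model)
    _, min_error, min_sd = min(path, key=lambda step: (step[1], step[0]))
    threshold = min_error + c_sd * min_sd
    return min(n for n, error, _ in path if error <= threshold)


# Hypothetical elimination path: (predictors, OOB error, SD of OOB error)
path = [(120, 0.080, 0.010), (60, 0.060, 0.010), (24, 0.065, 0.010), (10, 0.090, 0.012)]
print(select_one_se(path))  # 24
```

Trading a slightly higher OOB error for a much smaller predictor set is what makes the reduced models in the following table possible.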
Random Forest model performance for training and validation estimated by out‐of‐bag (OOB) error rate and overall accuracy. Full models based on metabolite, transcript and combined data are compared to reduced models with selected predictors
| Data set | Training OOB error rate, full model | Training OOB error rate, reduced model | Validation overall accuracy, full model | Validation overall accuracy, reduced model |
|---|---|---|---|---|
| Metabolite data | 6.02% | 5.81% | 91.6% | 90.0% |
| Transcript data | 8.91% | 10.89% | 69.7% | 66.5% |
| Combined data | 4.3% | 4.3% | 82.6% | 77.7% |
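The training columns report the out-of-bag (OOB) error: each tree is grown on a bootstrap sample, and each sample is classified only by the trees that did not see it. The sketch below illustrates that estimate with bagged one-feature decision stumps in place of full randomised trees, on a single hypothetical predictor, so it shows how an OOB error is obtained rather than re-creating the authors' models.

```python
import random
from collections import Counter


def fit_stump(xs: list[float], ys: list[str]) -> tuple[float, str, str]:
    """Threshold classifier on one feature that minimises training errors."""
    majority = Counter(ys).most_common(1)[0][0]
    best, best_errors = (float("inf"), majority, majority), sum(y != majority for y in ys)
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if errors < best_errors:
            best, best_errors = (threshold, left_label, right_label), errors
    return best


def predict_stump(stump: tuple[float, str, str], x: float) -> str:
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


def oob_error(xs: list[float], ys: list[str], n_trees: int = 50, seed: int = 0) -> float:
    """Bagged stumps; each sample is scored only by trees for which it was out-of-bag."""
    rng = random.Random(seed)
    n = len(xs)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        bootstrap = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        stump = fit_stump([xs[i] for i in bootstrap], [ys[i] for i in bootstrap])
        in_bag = set(bootstrap)
        for i in range(n):
            if i not in in_bag:
                votes[i][predict_stump(stump, xs[i])] += 1
    scored = [i for i in range(n) if votes[i]]
    if not scored:
        raise ValueError("no sample was ever out-of-bag; increase n_trees")
    wrong = sum(votes[i].most_common(1)[0][0] != ys[i] for i in scored)
    return wrong / len(scored)


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0, 14.0]  # hypothetical marker level
ys = ["low"] * 5 + ["high"] * 5                                # hypothetical tolerance class
print(oob_error(xs, ys))
```

Because the OOB samples are never used to grow the trees that score them, the OOB error works as a built-in estimate of generalisation error; the validation columns instead test the models on independent agronomic trials.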
Figure 6. (a) Out‐of‐bag (OOB) error rate and its standard deviation (dashed lines) of the Random Forest model in relation to the number of transcript markers. (b) Equivalent plot of the OOB error rate of the Random Forest model for the combination of metabolite and transcript data. Both models were based on field training data. The least important predictors were eliminated successively from the model, resulting in sets of 14 transcripts (a) and 27 transcripts/metabolites (b), indicated by red diamonds.
Importance of the top 20 transcript marker candidates in Random Forest models for drought tolerance prediction based on field training data
| Identifier | Functional annotation | MapMan BIN | Importance |
|---|---|---|---|
| 400021019 | Glucosyltransferase | 26.2‐misc.UDP glucosyl and glucuronyl transferases | 7.966 |
| 400031370 | O‐Methyltransferase | 16.2‐secondary metabolism.phenylpropanoids | 5.263 |
| 400028434 | Serine/threonine protein kinase, plant‐type | 35.2‐not assigned.unknown | 5.235 |
| 400082012 | Extensin | 35.2‐not assigned.unknown | 5.234 |
| 400008092 | Glutamyl‐tRNA (Gln) amidotransferase subunit A | 26.8‐misc.nitrilases | 5.219 |
| 400035714 | BED finger‐NBS‐LRR resistance protein | 20.1‐stress.biotic | 4.988 |
| 400083025 | Betaine aldehyde dehydrogenase | 5.10‐fermentation.aldehyde dehydrogenase | 4.551 |
| 400082023 | Lipoxygenase | 17.7.1.2‐hormone metabolism.jasmonate.synthesis‐degradation.lipoxygenase | 4.216 |
| 400068787 | Serine/threonine protein kinase, plant‐type | 30.2.11‐signalling.receptor kinases.leucine rich repeat XI | 4.202 |
| 400075512 | Poly(ADP‐ribose) glycohydrolase | 29.5‐protein.degradation | 4.154 |
| 400068776 | Flagellin‐sensing 2 | 30.2.11‐signalling.receptor kinases.leucine rich repeat XI | 3.690 |
| 400071885 | LRR receptor‐like serine/threonine protein kinase | 30.2.11‐signalling.receptor kinases.leucine rich repeat XI | 3.451 |
| 400045689 | Receptor protein kinase | 30.2.11‐signalling.receptor kinases.leucine rich repeat XI | 3.387 |
| 400062379 | Gene of unknown function | 35.2‐not assigned.unknown | 3.227 |
| 400004539 | Glutathione S‐transferase | 26.9‐misc.glutathione S transferases | 3.190 |
| 400020366 | Ethylene‐inducing xylanase | 30.2.11‐signalling.receptor kinases.leucine rich repeat XI | 3.092 |
| 400046899 | TMV resistance protein N | 20.1.7‐stress.biotic.PR‐proteins | 3.089 |
| 400046308 | Reticuline oxidase | 26.8‐misc.nitrilases | 3.072 |
| 400046445 | Serine/threonine protein kinase, plant‐type | 30.2.11‐signalling.receptor kinases.leucine rich repeat XI | 3.033 |
| 400006231 | Bacterial spot disease resistance protein 4 | 20.1.7‐stress.biotic.PR‐proteins | 3.011 |
Variable importance was estimated by the varImp function based on the Gini index. Transcripts highlighted in grey resulted from the variable selection using the varSelRF function. Genes in the biotic stress bin are highlighted in blue, and signalling receptor kinases in red.
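Both importance tables rank predictors by the Gini index. For a single split, the Gini impurity decrease can be computed as below; Random Forest importance (mean decrease in Gini) accumulates such decreases over all nodes that split on a given predictor and averages them over trees, with node-size weighting that differs between implementations. The labels are hypothetical.

```python
from collections import Counter


def gini(labels: list[str]) -> float:
    """Gini impurity: probability that two draws (with replacement) differ in class."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent: list[str], left: list[str], right: list[str]) -> float:
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - children


node = ["low", "low", "high", "high"]
print(gini(node))                                             # 0.5
print(gini_decrease(node, ["low", "low"], ["high", "high"]))  # 0.5
```

A split that separates the classes perfectly removes all impurity, whereas a split that leaves each child as mixed as the parent contributes nothing to a predictor's importance.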
Importance of metabolite and transcript marker candidates in the combined Random Forest model for drought tolerance prediction based on field training data
| Predictor identifier | Name | Importance |
|---|---|---|
| 400021019 | Glucosyltransferase | 3.697 |
| 400031370 | O‐Methyltransferase | 2.908 |
| 400035714 | BED finger‐NBS‐LRR resistance protein | 2.704 |
| 400008092 | Glutamyl‐tRNA (Gln) amidotransferase subunit A | 2.610 |
| A175010‐101 | A175010‐101 | 2.585 |
| 400082012 | Extensin | 2.467 |
| 400075512 | Poly(ADP‐ribose) glycohydrolase | 2.383 |
| 400028434 | Serine/threonine protein kinase, plant‐type | 2.281 |
| 400082023 | Lipoxygenase | 1.933 |
| 400083025 | Betaine aldehyde dehydrogenase | 1.857 |
| 400068787 | Serine/threonine protein kinase, plant‐type | 1.750 |
| 400052517 | 70‐kDa subunit of replication protein A | 1.742 |
| A158004‐101 | Glutaric acid, 2‐oxo‐ | 1.674 |
| 400068776 | Flagellin‐sensing 2 | 1.653 |
| 400045689 | Receptor protein kinase | 1.590 |
| A177001‐101 | Ribonic acid | 1.532 |
| 400046308 | Reticuline oxidase | 1.524 |
| A179012‐101 | A179012‐101 | 1.521 |
| A228001‐101 | A228001‐101 | 1.518 |
| 400062379 | Gene of unknown function | 1.511 |
| 400030682 | Gamma aminobutyrate transaminase isoform1 | 1.387 |
| 400004539 | Glutathione S‐transferase | 1.346 |
| A308004‐101 | A308004‐101 | 1.278 |
| A250002‐101 | A250002‐101 | 1.239 |
| A199002‐101 | Galactonic acid | 1.186 |
| 400027201 | Acidic class II 1 3‐beta‐glucanase | 1.152 |
| 400071885 | LRR receptor‐like serine/threonine protein kinase | 1.140 |
Variable importance was estimated by the varImp function based on the Gini index. Metabolites are highlighted in grey.
Drought tolerance prediction accuracy for all cultivars from samples taken in 16 agronomic field trials using the full Random Forest model based on metabolite data
| Predicted (rows) / Observed (columns) | Low | Intermediate | High |
|---|---|---|---|
| Low | 143 | 5 | 3 |
| Intermediate | 13 | 157 | 8 |
| High | 2 | 10 | 149 |
| Total | 158 | 172 | 160 |
| Sensitivity (%) | 90.5 | 91.3 | 93.1 |
| Specificity (%) | 97.6 | 93.4 | 96.4 |
Drought tolerance prediction accuracy for all cultivars from samples taken in six agronomic trials using the full Random Forest model based on transcript data
| Predicted (rows) / Observed (columns) | Low | Intermediate | High |
|---|---|---|---|
| Low | 48 | 18 | 11 |
| Intermediate | 1 | 34 | 1 |
| High | 11 | 14 | 47 |
| Total | 60 | 66 | 59 |
| Sensitivity (%) | 80.0 | 51.5 | 79.7 |
| Specificity (%) | 76.8 | 98.3 | 80.2 |
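The sensitivity and specificity rows follow from the confusion matrix by one-versus-rest counting, with rows as predicted and columns as observed classes. A short sketch, with a hypothetical function name, that recomputes them from the transcript-model counts in the table:

```python
def confusion_metrics(matrix: list[list[int]],
                      labels: list[str]) -> tuple[float, dict[str, tuple[float, float]]]:
    """Overall accuracy and per-class (sensitivity, specificity), all in percent.

    matrix[i][j] counts samples predicted as labels[i] whose observed class is labels[j].
    """
    k = len(labels)
    total = sum(sum(row) for row in matrix)
    accuracy = 100 * sum(matrix[i][i] for i in range(k)) / total
    per_class = {}
    for c, label in enumerate(labels):
        tp = matrix[c][c]
        observed = sum(matrix[i][c] for i in range(k))  # column total
        false_pos = sum(matrix[c]) - tp                 # predicted c, observed otherwise
        true_neg = total - observed - false_pos          # neither observed nor predicted c
        per_class[label] = (100 * tp / observed, 100 * true_neg / (total - observed))
    return accuracy, per_class


# Transcript-model counts (rows: predicted, columns: observed)
transcript = [[48, 18, 11], [1, 34, 1], [11, 14, 47]]
acc, per_class = confusion_metrics(transcript, ["Low", "Intermediate", "High"])
print(round(acc, 1), {c: tuple(round(v, 1) for v in m) for c, m in per_class.items()})
```

Rounded to one decimal, the output matches the Sensitivity (%) and Specificity (%) rows and the 69.7% full-model validation accuracy for transcript data in the model performance table.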
Results of model reproducibility measured as overall accuracy for the prediction of drought tolerance using agronomic field trials