
Can biochemical traits bridge the gap between genomics and plant performance? A study in rice under drought.

Giovanni Melandri1,2, Eliana Monteverde2,3, David Riewe4,5, Hamada AbdElgawad6,7, Susan R McCouch2, Harro Bouwmeester1,8.   

Abstract

The possibility of introducing metabolic/biochemical phenotyping to complement genomics-based predictions in breeding pipelines has been considered for years. Here we examine to what extent and under what environmental conditions metabolic/biochemical traits can effectively contribute to understanding and predicting plant performance. In this study, multivariable statistical models based on flag leaf central metabolism and oxidative stress status were used to predict grain yield (GY) performance for 271 indica rice (Oryza sativa) accessions grown in the field under well-watered and reproductive stage drought conditions. The resulting models displayed significantly higher predictability than multivariable models based on genomic data for the prediction of GY under drought (Q2 = 0.54-0.56 versus 0.35) and for stress-induced GY loss (Q2 = 0.59-0.64 versus 0.03-0.06). Models based on the combined datasets showed predictabilities similar to metabolic/biochemical-based models alone. In contrast to genetic markers, models with enzyme activities and metabolite values also quantitatively integrated the effect of physiological differences such as plant height on GY. The models highlighted antioxidant enzymes of the ascorbate-glutathione cycle and a lipid oxidation stress marker as important predictors of rice GY stability under drought at the reproductive stage, and these stress-related variables were more predictive than leaf central metabolites. These findings provide evidence that metabolic/biochemical traits can integrate dynamic cellular and physiological responses to the environment and can help bridge the gap between the genome and the phenome of crops as predictors of GY performance under drought.
© The Author(s) 2022. Published by Oxford University Press on behalf of American Society of Plant Biologists.


Year:  2022        PMID: 35166848      PMCID: PMC9157150          DOI: 10.1093/plphys/kiac053

Source DB:  PubMed          Journal:  Plant Physiol        ISSN: 0032-0889            Impact factor:   8.005


Introduction

In rice (Oryza sativa), as in most crops, grain yield (GY) is a highly complex trait. It is controlled by many genes of small effect, and these genes operate in coordinated networks that are influenced by pleiotropic and epistatic effects as well as by genotype-by-environment-by-management interactions (Xing and Zhang, 2010). The intrinsic complexity and polygenic nature of GY make it a difficult trait to improve using marker-assisted selection. Genomic selection (GS), by contrast, overcomes the limitation imposed by the absence of major-effect genes by simultaneously estimating the effects of many markers (and underlying genes) distributed across the whole genome. However, the need to account for different sources of nongenetic variability and for nonadditive modes of gene action has made model choice and implementation of GS challenging for improving complex traits (Rice and Lipka, 2021). One limitation of genomics for predicting complex phenotypes such as GY is that the information encoded in genetic markers is a poor predictor of an organism’s ability to respond dynamically to environmental stimuli at the physiological level (Yin et al., 2004). For these reasons, in other important cereal crops such as maize (Zea mays) and wheat (Triticum aestivum), physiological traits have been studied in connection with genetics to improve crop performance (Cooper et al., 2014; Reynolds and Langridge, 2016). Cellular physiology provides a key interface between genotype and phenotype. It represents an internal phenotype (endophenotype) that integrates interconnected transcriptomic, proteomic, and metabolomic regulatory networks that continuously respond to environmental factors (Großkinsky et al., 2015). 
Among these multiple cellular layers of information, metabolite levels are more directly linked to the phenotype than are gene transcripts and protein levels (Fernie and Stitt, 2012), and the possibility of introducing metabolic/biochemical phenotyping to complement genomics in breeding pipelines for crop improvement has been considered for years (Fernandez et al., 2021). Over the last 15 years, metabolome-based models have been used to predict complex traits, such as biomass, in large Arabidopsis (Arabidopsis thaliana) and maize populations of recombinant inbred lines (Meyer et al., 2007; Sulpice et al., 2009; Steinfath et al., 2010; Riedelsheimer et al., 2012). In rice, these models were successfully employed to predict the yield of hybrids using either the hybrids’ own metabolite profiles (Xu et al., 2016) or those of their parents (Dan et al., 2016). Despite the value of these findings for hybrid breeding programs, the narrow genetic background of the materials used meant that these studies did not explore the large qualitative and quantitative genetic diversity available for rice metabolism (Chen et al., 2014). In addition, most metabolomics studies in crop species, including rice, have been conducted under control conditions, whereas in Arabidopsis natural variation in metabolic plasticity (i.e. metabolic changes induced by environmental changes) was shown to be an important factor contributing to phenotypic plasticity (Kleessen et al., 2014). It therefore remains necessary to evaluate the power of metabolic/biochemical-based models for predicting GY under different environmental conditions in large panels of genetically diverse crop accessions. 
It is also necessary to compare the predictive ability of metabolic/biochemical-based models, genomic-based models, and models based on combined datasets for the same trait, to understand whether and when metabolic/biochemical traits complement or potentially outperform genomics-based prediction for yield improvement. It was recently shown that a multivariable model based on levels of flag leaf central metabolites and oxidative stress markers/enzymes efficiently predicted drought-induced GY loss in a large panel of genetically diverse indica rice accessions grown in the field (Melandri et al., 2020a). Here we use the same dataset to predict GY under both well-watered and drought conditions, in addition to predicting stress-induced GY loss. We also use a genomic dataset as the basis for predicting the same GY traits under the same conditions. This allowed us to: (1) analyze the differences between metabolic/biochemical-based models, genomic-based models, and models that integrate the two datasets for the prediction of rice GY traits under well-watered and drought conditions; and (2) identify which biochemical pathways/antioxidants are important predictors of GY performance in this crop.

Results

Relationships between GY, plant height (PH), and flowering time (FT)

In this study, the GY performance of 271 tropical and subtropical, traditional and improved indica rice varieties (Supplemental Table S1) was assessed in a field experiment under irrigated (control) and reproductive-stage drought conditions. Drought stress reduced GY by an average of 29.3% (GYLOSS; paired t test: P < 0.001) (Supplemental Table S2). GY values under control (GYCON) and drought (GYDRO) conditions were highly correlated (Pearson correlation, r = 0.75, P < 0.001), and broad-sense heritability estimates were high under both treatments (H2 = 0.89 and 0.84 for GYCON and GYDRO, respectively) (Figure 1; Supplemental Table S3). Interestingly, GYLOSS was significantly (P < 0.001) and negatively correlated with GYDRO (r = −0.61) but not with GYCON (Figure 1). This observation indicates that the yield performance of the accessions showed a genotype-by-treatment interaction and was strongly influenced by reproductive-stage drought.
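The per-accession quantities used in this paragraph (relative yield loss, Pearson correlation, paired t statistic) can be sketched in a few lines of dependency-free Python. This is a minimal illustration, not the authors' pipeline: it assumes GYLOSS is the relative reduction 100 × (GYCON − GYDRO)/GYCON computed from the BLUEs, and the function names (percent_loss, pearson_r, paired_t) and toy values are hypothetical. P values would additionally require the t distribution (for example from scipy.stats), which is omitted to keep the sketch self-contained.

```python
import math
from statistics import mean, stdev


def percent_loss(gy_control, gy_drought):
    """Per-accession yield loss in percent, assuming GYLOSS = 100 * (CON - DRO) / CON."""
    return [100.0 * (c - d) / c for c, d in zip(gy_control, gy_drought)]


def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for the differences x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical toy BLUEs (g/m²) for five accessions; not data from this study.
gy_con = [500.0, 420.0, 610.0, 380.0, 550.0]
gy_dro = [350.0, 300.0, 430.0, 290.0, 360.0]
loss = percent_loss(gy_con, gy_dro)
t_stat, df = paired_t(gy_con, gy_dro)
r_loss_dro = pearson_r(loss, gy_dro)
```

The paired test is used because the same accessions are measured under both treatments, so the comparison is made on within-accession differences rather than on two independent groups.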
Figure 1

Correlation matrix between values (BLUEs) of PH, FT, and GY under control (CON) and drought (DRO) conditions, and GY loss (GYLOSS), for the 271 indica rice accessions. PH is expressed in centimeters, FT in days, GY in g/m², and GYLOSS in percentage. Pearson correlations (r; stronger correlations are represented by larger numbers) and levels of significance (in green, ***P < 0.001, **P < 0.01, *P < 0.05) are reported in the upper-right portion of the matrix. Scatterplots of the pairwise combinations between traits (trendline in red) are reported in the bottom-left portion of the matrix. Trait distributions are represented along the diagonal of the matrix (trendline in blue).

To explore the nature of this interaction, we further assessed the relationships between GY-related traits and two important agronomic phenotypes, plant height (PH) and flowering time (FT). Both traits showed high heritability estimates (PH: H2 = 0.97; FT: H2 = 0.99; Supplemental Table S3) and displayed significant variation in the diversity panel (Figure 1; Supplemental Table S2). In this indica rice panel, PHCON and PHDRO were highly correlated (r = 0.95, P < 0.001), and the distribution of PH under both treatments was bimodal, with two distinct normal distributions around two different peaks (Figure 1). PH differences under both treatments are strongly associated with allelic variation at a single-nucleotide polymorphism (SNP) marker on chromosome 1 (position: 38,286,772 bp; Welch’s t test: P < 0.001; Supplemental Figure S1). This SNP marker was previously mapped for PH differences in a larger version of this panel (Kadam et al., 2017), and the linkage disequilibrium block (259 kbp) surrounding the marker includes the gibberellin biosynthetic gene encoding GA 20-oxidase 2 (OsGA20ox2, LOC_Os01g66100), also known as SEMI-DWARF1, whose semi-dwarf allele was introduced during the Green Revolution. 
The genotypic difference associated with the diagnostic SNP marker is a strong predictor of PH across environments, despite a mean reduction of 8 cm in PHDRO compared with PHCON (paired t test: P < 0.001; Supplemental Table S2). This stature-associated SNP is also strongly associated with yield performance (Welch’s t test: P < 0.001; Supplemental Figure S1), consistent with the significant (P < 0.001) negative correlation between PH and GY under both control (r = −0.31) and drought (r = −0.26) conditions (Figure 1). In contrast, GYLOSS was randomly distributed among both short and tall accessions of the panel (Supplemental Figure S2). In this study, FT was synchronized by sowing and transplanting accessions on different dates to ensure that drought stress was imposed on all genotypes at the flowering stage (Kadam et al., 2018). FT synchronization was largely, but not entirely, achieved (Supplemental Table S1). Reproductive-stage drought stress resulted in an almost uniform delay of three days in FT for all accessions (Supplemental Table S2), consistent with the high correlation (r = 0.97, P < 0.001) between FT under control (FTCON) and drought (FTDRO) conditions (Figure 1). Despite the synchronization of FT for stress application, FTDRO was weakly and negatively correlated with GYLOSS (r = −0.16, P < 0.01), suggesting that drought-induced yield loss was partially mitigated by later flowering.
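The allele-based comparisons above (PH and GY split by the genotype call at the chromosome 1 SNP) can be illustrated with Welch's unequal-variance t test. The sketch below is a minimal, dependency-free Python version that groups hypothetical PH values by a hypothetical allele code and computes the Welch t statistic with Welch–Satterthwaite degrees of freedom; the names split_by_allele and welch_t are illustrative only. In practice, a P value could be obtained with scipy.stats.ttest_ind(a, b, equal_var=False).

```python
import math
from statistics import mean, variance


def split_by_allele(values, alleles, allele_a, allele_b):
    """Group phenotype values by the allele carried at one (homozygous) SNP."""
    group_a = [v for v, g in zip(values, alleles) if g == allele_a]
    group_b = [v for v, g in zip(values, alleles) if g == allele_b]
    return group_a, group_b


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    se2_a = variance(group_a) / len(group_a)  # squared standard error of mean A
    se2_b = variance(group_b) / len(group_b)  # squared standard error of mean B
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t, df


# Hypothetical plant heights (cm) and allele calls; not data from this study.
heights = [95.0, 102.0, 98.0, 140.0, 135.0, 150.0, 99.0, 145.0]
calls = ["A", "A", "A", "G", "G", "G", "A", "G"]
short, tall = split_by_allele(heights, calls, "A", "G")
t_stat, df = welch_t(short, tall)
```

Welch's version is a reasonable default here because the two allele groups need not have equal sizes or equal variances.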

Variation in GY is better predicted by metabolome/oxidative stress status-based models than by genomic-based models

Our previous work (Melandri et al., 2020a) showed that cross-validated (CV) partial least squares regression (PLSR) modeling based on 111 flag leaf metabolites, oxidative stress markers, and enzyme activities (hereafter MetabOxi) measured under drought efficiently predicted stress-induced GYLOSS in 292 genetically diverse indica rice genotypes. In this study, we extended the PLSR modeling approach to predict GYCON and GYDRO, in addition to GYLOSS, in a subset of 271 accessions from the same experiment using the MetabOxi dataset (Supplemental Table S4). Control values of the MetabOxi dataset were used to predict GYCON, while drought values of the same dataset were used to predict GYDRO and GYLOSS. To compare the strength of modeling with these biochemical markers against genomic prediction, a genomic dataset consisting of 81,347 SNP markers for the 271 accessions (Supplemental Data Set 1) was also used to build PLSR models for the prediction of GYCON, GYDRO, and GYLOSS. In addition, Ridge-Regression Best Linear Unbiased Prediction (RR-BLUP) and BayesB models, which are more commonly employed in GS studies, were fitted to the MetabOxi and genomic datasets so that the predictive ability of all three modeling approaches could be compared for the same traits. Goodness of prediction (Q2) was used to quantify the predictability of the models (for details on the calculation of Q2, see “Materials and methods”) and is used throughout the manuscript to describe and discuss the results. Prediction accuracies were also calculated as Pearson correlation coefficients (Pearson’s r) between observed and predicted GY values. For each GY trait, Q2 values were similar among the 10-fold CV PLSR, RR-BLUP, and BayesB models and were always higher for the MetabOxi dataset than for the genomic dataset (Figure 2, A and C). Q2 values were most similar for GYCON when MetabOxi-based (Q2 = 0.32–0.40) and genomic-based (Q2 = 0.31–0.32) models were compared. 
Differences in predictability were greater for GYDRO, for which genomic-based models displayed values similar to those for GYCON (Q2 = 0.35) but MetabOxi-based models showed markedly better values (Q2 = 0.54–0.56). The gap between MetabOxi- and genomic-based models increased further when predicting GYLOSS, with MetabOxi-based models showing good predictability (Q2 = 0.59–0.64) and genomic-based models showing almost null values (Q2 = 0.03–0.06). In all cases, Pearson correlation coefficients showed the same trends as Q2 values (Figure 2, A and C). Overall, the higher predictive power of the MetabOxi dataset compared with the genomic dataset suggests that metabolite levels and enzyme activities are more closely aligned with a plant’s ability to respond dynamically to stress than are fixed genetic determinants. This is especially true for drought-induced GYLOSS, followed by GYDRO and, to a lesser extent, GYCON.
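Because Q2 is the central metric of these comparisons, a minimal sketch of its usual definition may help: Q2 = 1 − PRESS/TSS, where PRESS is the sum of squared differences between observed values and out-of-fold predictions and TSS is the total sum of squares around the mean. The sketch assumes this standard definition (the exact computation used in the study is described in “Materials and methods”); unlike R2 on training data, Q2 can be negative when predictions are worse than the trait mean. To stay dependency-free, a one-predictor least-squares fit (fit_line, hypothetical) stands in for PLSR, RR-BLUP, or BayesB; any function that takes training data and returns a predictor can be passed as fit.

```python
import random
from statistics import mean


def q2(observed, predicted):
    """Goodness of prediction: Q2 = 1 - PRESS / TSS (predictions must be out-of-fold)."""
    y_bar = mean(observed)
    press = sum((y - p) ** 2 for y, p in zip(observed, predicted))  # prediction error sum of squares
    tss = sum((y - y_bar) ** 2 for y in observed)  # total sum of squares
    return 1.0 - press / tss


def kfold_predictions(x, y, fit, k=10, seed=1):
    """Predict every sample with a model trained on the other k - 1 folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    preds = [0.0] * len(y)
    for held_out in folds:
        held_set = set(held_out)
        train = [i for i in idx if i not in held_set]
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in held_out:
            preds[i] = model(x[i])
    return preds


def fit_line(xs, ys):
    """Hypothetical stand-in model: one-predictor least squares (not PLSR or RR-BLUP)."""
    mx, my = mean(xs), mean(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    intercept = my - slope * mx
    return lambda v: intercept + slope * v
```

Pearson's r between observed values and the same out-of-fold predictions gives the second accuracy measure reported alongside Q2.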
Figure 2

Multivariate models for the prediction of GY performance in the 271 indica rice accessions of the panel. Scatterplots of observed (BLUEs) versus predicted values of the 10-fold CV MetabOxi-based (A, in blue), MetabOxi + Genomic-based (B, in orange), and genomic-based (C, in purple) PLSR, RR-BLUP, and BayesB models for the prediction of GY under control (CON) and drought (DRO) conditions and of GYLOSS. GY units are expressed in g/m² and GYLOSS in percentage. Predictability values (Q2 and Pearson’s r) of the models are displayed in each scatterplot (Pearson’s r values in brackets).


Models based on combined MetabOxi and genomic data have predictive power similar to that of MetabOxi-based models alone

We compared the predictability of the 10-fold CV PLSR, RR-BLUP, and BayesB models based on the combined MetabOxi and genomic datasets (MetabOxi + Genomic in Figure 2B) with that of the same models based on a single dataset (Figure 2, A and C) for GY traits. The MetabOxi + Genomic-based RR-BLUP and BayesB models always predicted GY traits better than genomic-based models alone, while the PLSR models showed no difference (Figure 2, B and C). Compared with MetabOxi-based models, MetabOxi + Genomic-based RR-BLUP and BayesB models showed virtually identical predictability for GYCON, GYDRO, and GYLOSS (Figure 2, A and B). The MetabOxi + Genomic-based PLSR models displayed lower predictability than the MetabOxi-based PLSR models, particularly for GYDRO and GYLOSS (predictability for GYCON was unchanged). Overall, these results suggest that combining MetabOxi and genomic information into a single model did not improve the prediction of GY-related traits compared with MetabOxi-based models alone. In the case of PLSR, integrating the two datasets reduced the predictability of GY under stress compared with the MetabOxi-based models.
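One practical detail when merging the two datasets is their very different size: 81,347 SNP columns can dominate 111 MetabOxi columns in a latent-variable model such as PLSR. A common remedy in multiblock analysis is block scaling: autoscale every variable, then divide each block by the square root of its number of variables so that both blocks carry equal total variance. The sketch below illustrates this in plain Python under the assumption that data are stored as lists of columns; it is not necessarily how the combined datasets were built in this study, the function names are hypothetical, and constant columns (for example monomorphic SNPs) must be removed beforehand because they have zero standard deviation.

```python
import math
from statistics import mean, pstdev


def autoscale(columns):
    """Center each column to mean 0 and scale it to unit population standard deviation."""
    scaled = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        if s == 0:
            raise ValueError("constant column; filter it out before scaling")
        scaled.append([(v - m) / s for v in col])
    return scaled


def combine_blocks(*blocks):
    """Concatenate autoscaled blocks, each divided by sqrt(number of columns in the block)."""
    combined = []
    for block in blocks:
        weight = 1.0 / math.sqrt(len(block))  # gives every block a total variance of 1
        combined.extend([[weight * v for v in col] for col in autoscale(block)])
    return combined


# Hypothetical data: 2 MetabOxi columns and 4 SNP columns for 3 accessions.
metab_oxi = [[1.0, 2.0, 3.0], [2.0, 4.0, 7.0]]
snps = [[0, 1, 2], [2, 1, 0], [0, 0, 2], [1, 1, 0]]
x_combined = combine_blocks(metab_oxi, snps)
```

Without the block weight, each block's total variance equals its number of columns, so the SNP block would outweigh the MetabOxi block by roughly 700 to 1.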

Adjusting GY for PH and FT consistently improves the predictive power of genomic-based models only

Given the influence of PH and FT variation on GY performance (described above), we next tested whether accounting for differences in PH and FT could improve the predictability of MetabOxi- and/or genomic-based models. To address this question, the 10-fold CV PLSR, RR-BLUP, and BayesB models were re-run using values of GYCON, GYDRO, and GYLOSS re-estimated with PH, FT, or both (PH + FT) as covariates. The MetabOxi-based models showed virtually no improvement in predicting GYCON when GY values were adjusted using PH and/or FT as covariates, whereas prediction of GYDRO and GYLOSS improved slightly (a maximum Q2 increase of ∼0.14, observed for GYDRO using PH + FT-adjusted values) (Table 1). In contrast, the genomic-based models showed a larger increase in predictability with the covariate-adjusted GY traits. The increase was again minimal for GYCON (maximum Q2 increase of 0.05) and larger for GYDRO and GYLOSS (maximum Q2 increase of ∼0.25 for both traits). Predictability of the genomic-based models improved most for GYDRO and GYLOSS when the data were adjusted using either PH alone or PH + FT, whereas the improvement was minimal (maximum Q2 increase of 0.08) when FT alone was used as a covariate (Table 1). This suggests that variation in PH exerts a stronger influence on GY performance under drought than variation in FT, consistent with the correlations among agronomic traits described above (Figure 1). Although covariate adjustment increased the predictive ability of both genomic- and MetabOxi-based models, the MetabOxi-based models still outperformed the genomic-based models for GYDRO and GYLOSS, whereas for GYCON the two were comparable (Table 1). In all cases, Pearson correlation coefficients showed the same trends as Q2 values (Table 1).
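Conceptually, the covariate adjustment removes the part of GY that is linearly explained by PH and/or FT. In the study, adjusted BLUEs were re-estimated with PH and FT as covariates (see “Materials and methods”); the sketch below is a simplified stand-in that regresses hypothetical per-accession GY values on the covariates by ordinary least squares and returns the residuals shifted back to the original mean. The helper names (solve, adjust_for_covariates) are hypothetical, and the small Gaussian-elimination solver only keeps the example free of third-party dependencies.

```python
from statistics import mean


def solve(a, b):
    """Solve a small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def adjust_for_covariates(y, covariates):
    """Remove the linear effect of covariates (e.g. PH, FT) from y, keeping the mean of y."""
    rows = [[1.0] + [cov[i] for cov in covariates] for i in range(len(y))]  # intercept + covariates
    p = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    beta = solve(xtx, xty)  # ordinary least squares via the normal equations
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = mean(y)
    return [yi - fi + y_bar for yi, fi in zip(y, fitted)]


# Hypothetical per-accession values: GY (g/m²), PH (cm), FT (days); not study data.
gy = [420.0, 515.0, 380.0, 610.0, 450.0, 530.0]
ph = [140.0, 105.0, 150.0, 98.0, 120.0, 110.0]
ft = [88.0, 92.0, 85.0, 95.0, 90.0, 91.0]
gy_adjusted = adjust_for_covariates(gy, [ph, ft])
```

A mixed-model implementation would estimate the covariate effects jointly with the genotype and design effects rather than after the fact, so this two-step version is only an approximation of that procedure.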
Table 1

Predictability of MetabOxi- and genomic-based models for GY traits nonadjusted and adjusted by PH and FT

                        MetabOxi-based models                 Genomic-based models
                        PLSR        RR-BLUP     BayesB        PLSR        RR-BLUP     BayesB
BLUEs         GY trait  Q2   r      Q2   r      Q2   r        Q2   r      Q2   r      Q2   r
no cov        GYCON     0.32 0.58   0.37 0.61   0.40 0.63     0.31 0.57   0.32 0.56   0.32 0.56
              GYDRO     0.54 0.74   0.56 0.75   0.55 0.74     0.35 0.60   0.35 0.60   0.35 0.59
              GYLOSS    0.61 0.78   0.59 0.77   0.64 0.80     0.03 0.25   0.06 0.26   0.06 0.26
cov PH        GYCON     0.34 0.59   0.37 0.61   0.40 0.63     0.31 0.57   0.31 0.56   0.31 0.56
              GYDRO     0.60 0.78   0.64 0.80   0.64 0.80     0.54 0.73   0.53 0.73   0.53 0.73
              GYLOSS    0.58 0.76   0.60 0.77   0.60 0.77     0.24 0.51   0.32 0.57   0.32 0.57
cov FT        GYCON     0.30 0.56   0.35 0.59   0.38 0.61     0.35 0.60   0.35 0.60   0.35 0.59
              GYDRO     0.58 0.77   0.62 0.79   0.62 0.79     0.41 0.64   0.42 0.65   0.42 0.64
              GYLOSS    0.66 0.81   0.66 0.81   0.67 0.82     0.03 0.30   0.14 0.37   0.14 0.38
cov PH and FT GYCON     0.32 0.57   0.36 0.60   0.38 0.62     0.36 0.61   0.36 0.60   0.36 0.60
              GYDRO     0.65 0.81   0.68 0.83   0.69 0.83     0.59 0.77   0.58 0.76   0.58 0.76
              GYLOSS    0.67 0.82   0.66 0.82   0.67 0.82     0.24 0.51   0.30 0.55   0.30 0.55

Predictability (Q2 and Pearson’s r) values of the PLSR, RR-BLUP, and BayesB models for the best linear unbiased estimators (BLUEs) of GY—under control (CON) and drought (DRO) conditions—and GY loss (GYLOSS) calculated considering PH and FT as covariates (cov PH, cov FT, cov PH&FT) or without (no cov, same values as in Figure 2, A and C).

We also considered PH and FT (individually and together) as secondary traits for predicting GY by running multi-trait PLSR and RR-BLUP models (for details on the calculation of the models, see “Materials and methods”) and report the results in Supplemental Table S5. Interestingly, a comparison of Supplemental Table S5 with Table 1 shows that GY predictability was higher when PH, FT, or both were used as covariates in calculating the BLUEs than when they were incorporated into multi-trait models. Taken together, these results suggest that metabolite values and enzyme activities provide a way to quantitatively estimate the dynamic physiological responses to stress that differentiate individual plants in a population, and that the MetabOxi-variables already integrate inherent differences in PH and FT known to affect GY performance.

Rankings of MetabOxi-based model predictors reveal the importance of biochemical pathways and antioxidants for GY performance

Each of the 10 MetabOxi-based sub-models generated by the CV procedure for the prediction of GY traits provided an importance ranking of the 111 MetabOxi-variables. The overall ranking of each MetabOxi-variable was calculated as the product of its 10 ranks from the individual sub-models (rank-product; Supplemental Tables S6–S8). We next calculated the correlations between the MetabOxi-variables and GY traits, PH, and FT (Supplemental Tables S9 and S10) to assess the direction (positive or negative) and strength (r) of their associations. The top three MetabOxi-variables from each GY model (Table 2), that is, those with the lowest rank-products (a lower rank-product implies higher importance), indicate which biochemical and antioxidant pathways are important for GY prediction in our dataset (Figure 3). We additionally tested whether these top-ranked predictors had a significant effect on GY by fitting each of them in a linear model as the single explanatory variable for the trait (Supplemental Table S11).
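The rank-product aggregation described above is simple to reproduce. In this minimal sketch, each CV sub-model is represented by a dictionary of importance scores (for PLSR these might be, for example, VIP scores; that choice, the scores, and the function names are assumptions here), which are converted to ranks and multiplied across the 10 sub-models. Ties are broken arbitrarily by sort order, which a real analysis would need to handle explicitly.

```python
import math


def ranks_from_importance(importance):
    """Convert importance scores (higher = more important) into ranks 1..n."""
    ordered = sorted(importance, key=importance.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


def rank_product(rank_tables):
    """Multiply each variable's ranks across CV sub-models (lower product = more important)."""
    names = rank_tables[0].keys()
    return {name: math.prod(table[name] for table in rank_tables) for name in names}


# Hypothetical importance scores from 10 sub-models that always agree.
one_fold = ranks_from_importance({"DHAR": 0.9, "MDA": 0.5, "MDHAR": 0.1})
products = rank_product([one_fold] * 10)
ordered = sorted(products, key=products.get)  # most important first
```

A variable ranked first, second, or third in all 10 sub-models obtains a rank-product of 1, 2^10 = 1,024, or 3^10 = 59,049, respectively; these are the values listed for DHAR, MDA, and MDHAR in the PLSR model for GYLOSS in Table 2.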
Table 2

Best predictive variables of the MetabOxi-based models for the prediction of GY traits

MetabOxi-based PLSR model
Trait   Rank  Variable              Rank-prod   r GY        r PH        r FT
GYCON   1     Chlorogenic acid      1           −0.36***    0.39***     −0.33***
        2     Isocitric acid        1,024       0.39***     −0.38***    −0.21*
        3     Citric acid           275,562     0.37***     −0.45***    −0.08 ns
GYDRO   1     DHAR                  1           0.58***     −0.10 ns    0.02 ns
        2     MDA                   1,024       −0.41***    0.15 ns     −0.17 ns
        3     MDHAR                 233,280     0.21 ns     −0.09 ns    −0.19 ns
GYLOSS  1     DHAR                  1           −0.62***    −0.10 ns    0.02 ns
        2     MDA                   1,024       0.61***     0.15 ns     −0.17 ns
        3     MDHAR                 59,049      −0.05 ns    −0.09 ns    −0.19 ns

MetabOxi-based RR-BLUP model
Trait   Rank  Variable              Rank-prod   r GY        r PH        r FT
GYCON   1     α-ketoglutaric acid   48          0.27***     −0.31***    −0.09 ns
        2     Chlorogenic acid      1,152       −0.36***    0.39***     −0.33***
        3     Uridine               23,328      0.18 ns     0.22*       −0.22*
GYDRO   1     DHAR                  1           0.58***     −0.10 ns    0.02 ns
        2     MDA                   17,280      −0.41***    0.15 ns     −0.17 ns
        3     α-ketoglutaric acid   20,736      0.20 ns     −0.38***    0.05 ns
GYLOSS  1     DHAR                  1           −0.62***    −0.10 ns    0.02 ns
        2     MDA                   1,024       0.61***     0.15 ns     −0.17 ns
        3     MDHAR                 59,049      −0.05 ns    −0.09 ns    −0.19 ns

MetabOxi-based BayesB model
Trait   Rank  Variable              Rank-prod   r GY        r PH        r FT
GYCON   1     α-ketoglutaric acid   144         0.27***     −0.31***    −0.09 ns
        2     Galactinol            432         −0.23*      −0.01 ns    0.50***
        3     Chlorogenic acid      4,608       −0.36***    0.39***     −0.33***
GYDRO   1     DHAR                  1           0.58***     −0.10 ns    0.02 ns
        2     α-ketoglutaric acid   5,760       0.20 ns     −0.38***    0.05 ns
        3     MDA                   1,244,160   −0.41***    0.15 ns     −0.17 ns
GYLOSS  1     DHAR                  1           −0.62***    −0.10 ns    0.02 ns
        2     MDHAR                 2,304       −0.05 ns    −0.09 ns    −0.19 ns
        3     MDA                   26,244      0.61***     0.15 ns     −0.17 ns

Top three ranked predictive variables of the 10-fold CV MetabOxi-based PLSR, RR-BLUP, and BayesB models for prediction of GY under control (GYCON) and drought (GYDRO) conditions, and of GYLOSS. Variables are ranked by their rank-product value (Rank-prod). Correlations between the MetabOxi-variables and the GY traits, PH, and FT are reported. r: Pearson correlation coefficient. Bonferroni-corrected significance of the correlation (P): ***P < 0.001, *P < 0.05, ns = not significant.
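The significance codes in Table 2 are based on Bonferroni-corrected P values. As a minimal sketch, the correction multiplies each raw P value by the number of tests m and caps the result at 1. Which correlations were counted in m is not stated in this section, so here m is simply the length of the list passed in (an assumption), and the function names are hypothetical. The ** level is included for completeness, following the convention used in Figure 1.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted P values: min(1, p * m) for m tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def significance_code(p_adjusted):
    """Map an adjusted P value to the star codes used in the tables and figures."""
    if p_adjusted < 0.001:
        return "***"
    if p_adjusted < 0.01:
        return "**"
    if p_adjusted < 0.05:
        return "*"
    return "ns"


# Hypothetical raw P values from three correlation tests.
adjusted = bonferroni([0.0001, 0.01, 0.2])
codes = [significance_code(p) for p in adjusted]
```

Bonferroni is conservative: with 111 variables tested against several traits, a raw P value must be very small to remain significant after correction.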

Figure 3

Summary of the main biochemical pathway predictors of GY performance in the indica rice panel and their relationships with GY under control (CON) and drought (DRO) conditions and with GYLOSS. The blue triangle represents the TCA cycle (isocitric, citric, and α-ketoglutaric acids) and constitutive antioxidants (chlorogenic acid and galactinol), which displayed higher prediction importance from left to right (GYCON → GYDRO → GYLOSS). The purple triangle represents the ascorbate–glutathione cycle (DHAR and MDHAR) and lipid peroxidation (MDA), which displayed higher prediction importance from right to left (GYCON ← GYDRO ← GYLOSS). The influence of PH and FT on the pathways of the two triangles is represented by the red arrow (up = high; down = low).

Among the 111 MetabOxi-variables, organic acids consistently ranked high as predictors of GYCON (Table 2). Chlorogenic acid (3-caffeoyl-quinic acid), a compound with antioxidant and pathogen defense activity in plants, is among the top-ranked variables in all three models (ranked first, second, and third in the PLSR, RR-BLUP, and BayesB models, respectively). It correlates negatively with GYCON (r = −0.36, P < 0.001), positively with PHCON (r = 0.39, P < 0.001), and negatively with FTCON (r = −0.33, P < 0.001) (Table 2; Supplemental Table S9). 
These correlations suggest that the variation in concentration of this compound among rice accessions integrates the complex interconnections between GY, PH, and FT observed in the rice panel (Figure 1). A second organic acid, α-ketoglutaric acid (2-oxoglutaric acid), ranked first in both the RR-BLUP and BayesB models. Like isocitric and citric acid, the second and third top-ranked variables of the PLSR model, it is an intermediate of the tricarboxylic acid (TCA) cycle. All three TCA cycle intermediates are positively correlated with GYCON (P < 0.001; r = 0.27, 0.39, and 0.37 for α-ketoglutaric, isocitric, and citric acid, respectively) and negatively correlated with PHCON (P < 0.001; r = −0.31, −0.38, and −0.45, respectively), while they show no significant correlation with FTCON, except for a weak negative correlation for isocitric acid (r = −0.21, P < 0.05) (Table 2). This indicates that higher abundance of TCA cycle intermediates is associated with shorter PH and higher GYCON performance in the panel, largely independently of differences in FT (Figure 3). A third variable, galactinol, a sugar alcohol with osmoprotective/antioxidant activity, ranked second in the BayesB model for GYCON (fourth in the RR-BLUP model, Supplemental Table S6). Like chlorogenic acid, it is negatively correlated with GYCON (r = −0.23, P < 0.05), but unlike chlorogenic acid it is positively correlated with FTCON (r = 0.50, P < 0.001) and shows no association with differences in PHCON (Table 2). This may indicate that variation in galactinol levels, which affects GY performance, is mainly associated with the imperfect FT synchronization of the panel. The biochemical contribution of uridine, the third-best predictor in the RR-BLUP model (fourth in the BayesB model, Supplemental Table S6), is less clear: its variation is not associated with differences in GYCON, although it is positively correlated with PHCON (r = 0.22, P < 0.05) and negatively with FTCON (r = −0.22, P < 0.05) (Table 2). 
The presence of uridine among the top-ranked predictors of the RR-BLUP and BayesB models hints at the importance of both PH and FT differences for the GY performance of the accessions under control conditions. In contrast to the predictors identified for GYCON, the top-ranked predictors in the MetabOxi-based GYDRO and GYLOSS models are mostly antioxidant enzymes or oxidative stress markers that are not significantly correlated with variation in either PHDRO or FTDRO (Table 2). The antioxidant enzyme dehydroascorbate reductase (DHAR) is the highest-ranking predictor (rank-prod = 1 in all models) for both GYDRO and GYLOSS. It is positively correlated with GYDRO (r = 0.58, P < 0.001) and negatively correlated with GYLOSS (r = −0.62, P < 0.001), and these are the most significant correlations among the 111 MetabOxi-variables (Supplemental Table S10). The lipid peroxidation product malondialdehyde (MDA) ranked as the second-best variable in the PLSR and RR-BLUP models, and third in BayesB, for the prediction of both GYDRO and GYLOSS (Table 2). In contrast to DHAR, MDA is negatively correlated with GYDRO (r = −0.41, P < 0.001) and positively with GYLOSS (r = 0.61, P < 0.001). That DHAR and MDA are the top-ranked predictors of GYDRO and GYLOSS suggests that, during drought, the oxidative stress status of the flag leaf is more predictive of GY performance than flag leaf central metabolism (Figure 3). This is underscored by the fact that another antioxidant enzyme, monodehydroascorbate reductase (MDHAR), was identified as the second (BayesB model) and third (PLSR and RR-BLUP models) most important predictor of GYLOSS. MDHAR is only marginally correlated with GYDRO (r = 0.21, P = 0.05) (Supplemental Table S10) and not correlated with GYLOSS (Table 2). In addition, MDHAR was the only top-ranked predictor without a significant effect on the trait (GYLOSS) when fitted as the single explanatory variable in a linear model (Supplemental Table S11). 
Like DHAR, MDHAR regenerates reduced ascorbate from its oxidized form and is part of the ascorbate–glutathione antioxidant cycle. Its presence as a top-ranked model variable suggests that this cycle is important in counteracting drought-induced GY reduction, despite the weak relationship between MDHAR activity and GY. Interestingly, α-ketoglutaric acid, which was also positively associated with GYCON, ranked second in the BayesB model and third in the RR-BLUP model as a predictor of GYDRO, although it was not significantly correlated with GYDRO (Table 2). α-Ketoglutaric acid values were negatively correlated with PH under both control and drought conditions (PHDRO: r = −0.38, P < 0.001), suggesting that accumulation of this TCA cycle intermediate in shorter plants contributes positively both to constitutive GYCON performance and to GYDRO performance, even though the latter is also strongly determined by dynamic responses to stress (Figure 1).
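The single-variable linear models referred to in this section (Supplemental Table S11) amount to regressing a GY trait on one MetabOxi-variable and testing whether the slope differs from zero. A minimal sketch, assuming ordinary least squares with an intercept, is given below; it returns the slope, its standard error, and the t statistic with n − 2 degrees of freedom. The function name and the values are hypothetical, and scipy.stats.linregress would return the corresponding P value directly.

```python
import math
from statistics import mean


def simple_regression(x, y):
    """OLS fit of y = a + b*x; returns slope b, SE(b), and t = b / SE(b) with df = n - 2."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((u - mx) ** 2 for u in x)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((v - (a + b * u)) ** 2 for u, v in zip(x, y))  # residual sum of squares
    se_b = math.sqrt(rss / (n - 2) / sxx)
    return b, se_b, b / se_b


# Hypothetical enzyme activities (arbitrary units) and yield values (g/m²); not study data.
activity = [1.0, 2.0, 3.0, 4.0]
yield_values = [200.0, 410.0, 590.0, 800.0]
slope, se, t_stat = simple_regression(activity, yield_values)
```

A predictor can rank highly in a multivariable model yet show a nonsignificant slope on its own, as reported for MDHAR, because its contribution may only emerge in combination with other variables.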

Discussion

The major goal of this study was to compare the potential of flag leaf metabolism/oxidative stress status and genetic markers to predict GY performance in a diversity panel of indica rice accessions grown under well-watered and drought conditions in the field. Our results show that MetabOxi-based models predict GY with higher accuracy than genomic-based models. This higher accuracy can be explained by the fact that MetabOxi-variables integrate quantitative estimates of complex biological processes, summarized as metabolite levels and/or enzyme activities. These measurements capture the output of a multi-layered regulatory network (DNA, RNA, and protein) that responds to dynamic internal and external stimuli (Keurentjes, 2009; Sulpice and McKeown, 2015). This is especially true for environmental stress-driven perturbations, including those caused by drought. Indeed, the dynamic response to external signals from a changing environment (phenotypic plasticity) is often characterized by changes in metabolite levels and enzyme activities resulting from transcriptional and/or post-translational regulation (Stitt et al., 2010). In support of this hypothesis, we observed that MetabOxi-based models were superior at predicting GYLOSS and GYDRO, whereas genomic-based models predicted GYDRO with lower accuracy and were essentially unable to predict GYLOSS (Figure 2, A and C). Interestingly, under well-watered conditions, the predictive power of MetabOxi- and genomic-based models for GYCON was virtually identical. These findings indicate that under nonstress conditions, genetic determinants are as predictive of plant performance as basal flag leaf central metabolites and oxidative stress markers/enzymes, but that under suboptimal conditions, metabolic/biochemical traits provide valuable endophenotypes that are much more predictive of crop yield stability than genomic information (Kumar et al., 2017; Sulpice, 2020).
The possibility of modeling multi-omics data for a deeper understanding of complex phenotypic traits (e.g. crop yield under stress) and to improve the accuracy of selection in breeding (mainly through GS) is a “hot topic” in plant systems biology and plant breeding (Jamil et al., 2020; Scossa et al., 2021; Tong and Nikoloski, 2021; van Dijk et al., 2021). Our results indicate that the integration of biochemical (MetabOxi) and genomic data in the same statistical model may slightly increase (GYCON and GYDRO) or decrease (GYLOSS) trait predictability compared with the best single omics (MetabOxi-based) models (Figure 2). These results differ from previous studies where the integration of multi-omics data improved the predictability of GY in maize and rice hybrids under nonstressed field conditions (Westhues et al., 2017; Schrag et al., 2018; Wang et al., 2019; Xu et al., 2020). A possible explanation for the difference might be that in our study we brought together two “distant” omics layers which are difficult to connect without the information carried by intermediate omics layers, that is, transcriptome and proteome. In support of this hypothesis, Schrag et al. (2018) found that the combination of two “close” omics layers, genome and transcriptome, was more predictive of GY in maize hybrids than combining genome and metabolome. However, Xu et al. (2020) found that genome and metabolome was the best omics combination for the prediction of GY in rice hybrids, and that a combination of three or four omics layers (adding transcriptome and proteome) provided no improvement. Thus, the value of integrating different types of omics data for the prediction of GY might also depend on factors such as crop species, data quality, and the presence of quantifiable environmental stress at the field site, with the latter having a strong association with metabolic/biochemical data, as clearly demonstrated in this study. 
This study also underscores the importance of the statistical model used for prediction, as the model itself can affect trait predictability when different omics datasets are integrated. The predictability values of the MetabOxi + Genomic PLSR models were similar to those of the corresponding MetabOxi-based model for GYCON, but lower for GYDRO and GYLOSS (Figure 2). The fact that this did not happen for the MetabOxi + Genomic-based RR-BLUP and BayesB models suggests that the PLSR algorithm might not be able to integrate the MetabOxi and genomic datasets efficiently. Indeed, concatenating two data matrices of vastly different sizes (111 versus 81,347 variables for the MetabOxi and genomic datasets, respectively) into a single matrix in the combined PLSR model might have allowed the genomic block to dominate the global data structure, since the weighting of each variable is governed by the total sum of squares. This reduced the relative contribution of the MetabOxi variables to the prediction (Höskuldsson and Svinning, 2006; Reinke et al., 2018). Overall, these results highlight the need for further studies on how best to integrate omics data to fully exploit their explanatory power for complex quantitative traits. Another important finding of our study is that the performance of MetabOxi-based models improved little when PH and/or FT were introduced as covariates, whereas the genomic-based models improved significantly, particularly for GYDRO and GYLOSS. The same analysis demonstrated that variation in PH had a stronger influence on GY performance than FT (not entirely surprising, given that FT was synchronized in the study). In rice, variation in the gibberellin 20-oxidase gene (OsGA20ox2), also known as the SEMI-DWARF1 (SD1) gene, is significantly associated with both PH and GY.
A recessive allele of this gene, sd1-d (distinguished by a 382-bp deletion in Exons 1 and 2), was introduced during the “Green Revolution” in the 1960s and has since become widely disseminated in modern, high-yielding rice varieties (Asano et al., 2011). Semi-dwarf plants carrying sd1-d thrive under favorable conditions (e.g. in paddy fields with ample water and nitrogen), but yield similarly to the taller, traditional varieties carrying the SD1 allele under unfavorable conditions (e.g. in upland/rainfed fields where water and/or nitrogen are in short and/or variable supply) (Lafitte et al., 2007). Thus, variation at the SD1 locus affects not only PH but also, through far-reaching physiological repercussions, yield performance. Many of the physiological differences associated with PH in the indica rice diversity panel are implicitly integrated into the values of metabolites and oxidative stress markers/enzymes in the MetabOxi dataset, whereas this level of integration is not observed in the genomic data. An examination of the top-ranked MetabOxi-variables selected as predictors of GYCON, GYDRO, and GYLOSS underscores the integrative nature of the metabolic data. All three GYCON models identified chlorogenic acid among the top predictors. Chlorogenic acid is strongly and negatively correlated with GYCON and positively correlated with PHCON. This compound is widely described in the literature for its beneficial antioxidant and anti-herbivore activity in plants (Niggeweg et al., 2004; Ferreres et al., 2011; Kundu and Vadassery, 2019), which is consistent with the high degree of environmental plasticity associated with tall, low-yielding traditional rice varieties adapted to environmentally variable, low-input production systems (Lempe et al., 2013; Dwivedi et al., 2016). In contrast, the shorter, higher-yielding modern varieties have been bred for relatively stable, high-input systems where high levels of chlorogenic acid provide few advantages.
It might be that in this field trial, under irrigated conditions and with the application of fertilizers and weed, insect, and disease control, a constitutively higher activity of the chlorogenic acid pathway in the traditional, tall accessions represented a metabolic cost and conferred little or no advantage, as evidenced by their lower GY performance. Chlorogenic acid is also described in the literature as an intermediate in lignin biosynthesis (Volpi e Silva et al., 2019). Thus, a higher availability of this metabolite may positively affect growth, resulting in increased PH. Three intermediates of the TCA cycle (citric, isocitric, and α-ketoglutaric acid) were also among the top MetabOxi predictors in the GYCON models. This fundamental pathway provides energy and carbon skeletons for many plant biosynthetic processes (Sweetlove et al., 2010; Araújo et al., 2012), and these three organic acids are positively correlated with GYCON and negatively with PHCON. The negative correlation with PHCON suggests altered TCA cycle/biosynthetic activity in the short, high-yielding varieties of the panel compared with the tall, lower-yielding traditional accessions. This indicates that the translation of increased radiation- and nitrogen-use efficiency into higher yields in modern semi-dwarf rice varieties (Zhu et al., 2016) is also determined by metabolic adaptations of central metabolism (Figure 3). During drought, leaf oxidative stress status is more predictive of GY performance than central metabolism, as evidenced by the selection of three antioxidant enzymes/oxidative stress markers (DHAR, MDHAR, and MDA) as top-ranked variables in the MetabOxi-based models for predicting GYDRO and GYLOSS.
When drought is imposed at the flowering stage, as in this study, plants are constrained in their ability to make system-wide metabolic adjustments because assimilates are exported preferentially from the flag leaf to the developing panicles (Yoshida, 1972; Biswal and Kohli, 2013). This likely increases the importance of antioxidant mechanisms that counteract oxidative damage resulting from the enhanced generation of reactive oxygen species (ROS) in response to drought (Melandri et al., 2021). Among the top predictors, MDA, which is negatively correlated with GYDRO and positively with GYLOSS, is a lipid peroxidation product indicative of stress-induced oxidative damage to cellular lipid membranes (Møller et al., 2007). DHAR and MDHAR are two antioxidant enzymes involved in the ascorbate–glutathione cycle, the central redox hub in planta, where oxidized ascorbate is recycled to its reduced form, which in turn can be used to scavenge ROS (Foyer and Shigeoka, 2010; Foyer and Noctor, 2011; Smirnoff, 2011). In contrast to MDA, DHAR is positively correlated with GYDRO and negatively with GYLOSS. The opposite relationships of MDA and DHAR with GY traits highlight the importance of the ascorbate–glutathione cycle in preventing drought-induced oxidative damage and its negative impact on rice GY (Melandri et al., 2020a). Surprisingly, MDHAR is not correlated with GYDRO or GYLOSS. The importance of MDHAR in the prediction models may be associated with its ascorbate-reducing activity, which contributes to increasing the efficiency of DHAR in a nonlinear, synergistic fashion (Shin et al., 2013). Interestingly, the drought values of DHAR (as well as those of MDHAR and MDA) do not significantly correlate with variation in PHDRO. This suggests that there is abundant genotypic variation for the activity of this enzyme under drought in both the tall, low-yielding traditional landrace varieties and the shorter, higher-yielding modern varieties of the panel.
This combination of findings makes the ascorbate–glutathione cycle, and DHAR in particular, interesting as breeding targets for improving drought tolerance of rice varieties at the reproductive stage.

Conclusions

This study provides evidence that metabolic/biochemical traits, referred to as endophenotypes, can help bridge the gap between the genome and the visible phenome of plants, and that their explanatory power exceeds that of genetic markers when they are used as variables in models for predicting yield performance under stress. Therefore, breeding pipelines aimed at improving drought resilience in rice could benefit from integrating the information carried by metabolic/biochemical traits representative of the plant endophenome. In particular, our study identified antioxidant enzymes and oxidative stress markers as strong predictors of drought tolerance in rice at the reproductive stage, with higher importance than variables associated with leaf central metabolism. Thus, high activity of leaf antioxidant enzymes and low oxidative damage represent two phenotypes that could guide the development of drought-tolerant rice varieties. Despite their value as predictors, using oxidative stress markers and antioxidant enzyme activities as selection tools in breeding is challenging because they respond to environmental changes, developmental stage, and even diurnal variation. The effort involved in collecting a large number of plant tissue samples in the field within a limited time window, and in synchronizing the developmental stage of many hundreds of accessions, as done in this study, will likely remain a task for the fundamental research community or prebreeding experts rather than for commercial breeders. Further efforts are needed to translate the information captured by metabolic/biochemical traits into rapid and cost-effective tools for routine breeding application.

Materials and methods

Genetic resources and experimental design

The 271 accessions of rice (O. sativa subsp. indica) (Supplemental Table S1) were part of a larger panel (approximately 300 accessions) used in a field experiment at the International Rice Research Institute, Los Baños, Philippines, during the 2013 dry season. The panel includes traditional and improved indica rice varieties originating from rice-growing countries in tropical and subtropical regions around the world. The panel was evaluated for a number of diverse traits as the basis for GWA mapping (Rebolledo et al., 2016; Kadam et al., 2017, 2018; Melandri et al., 2020b). The experiment comprised a control field and a drought stress field, with three replicates (experimental blocks) of the panel arranged in a serpentine design for each treatment. To synchronize flowering, the accessions were divided into six groups according to their number of days to flowering (based on previously collected data) and were sown and transplanted group by group at 10-day intervals. Drought stress consisted of 14 consecutive days of water withholding applied only to the stress field at the reproductive stage (targeting 50% flowering). At the end of the stress period, the field was re-watered until all the accessions reached maturity for harvest. Further details on the field experiment can be found in Kadam et al. (2018).
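The staggered sowing used to synchronize flowering can be sketched as a small scheduling function. This is a minimal illustration under assumptions that the text does not state: accessions are split into six groups of near-equal size by previously recorded days to flowering, and the latest-flowering group is sown first so that all groups reach flowering at about the same time. The function name `sowing_schedule` and the example data are hypothetical.

```python
def sowing_schedule(days_to_flowering, n_groups=6, interval_days=10):
    """Hypothetical helper: assign accessions to flowering groups and sowing offsets.

    days_to_flowering maps accession -> previously recorded days to flowering.
    Assumptions (not from the study): groups have near-equal size, and the
    latest-flowering group is sown first (offset 0), each following group
    interval_days later.
    Returns accession -> (group index, sowing offset in days).
    """
    # Latest-flowering accessions first.
    ordered = sorted(days_to_flowering, key=days_to_flowering.get, reverse=True)
    n = len(ordered)
    schedule = {}
    for rank, accession in enumerate(ordered):
        group = rank * n_groups // n  # 0 = latest-flowering group
        schedule[accession] = (group, group * interval_days)
    return schedule


if __name__ == "__main__":
    # Made-up days to flowering for 12 accessions.
    dtf = {f"acc{i}": 70 + 2 * i for i in range(12)}
    for acc, (group, day) in sorted(sowing_schedule(dtf).items(), key=lambda kv: kv[1]):
        print(acc, "group", group, "sown on day", day)
```

Equal-size grouping is only one possible rule; grouping by fixed days-to-flowering bins would keep flowering dates tighter when the distribution is uneven.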

Statistical analysis of agronomic traits

Best linear unbiased estimators (BLUEs) of PH, FT, and GY for individual accessions in the same treatment were calculated considering only the field replicates (two for control and three for drought) used for the metabolomics and oxidative stress status analyses (Supplemental Table S1). The BLUEs for each line under the two experimental conditions were calculated with the following general mixed model:

$$y_{ij} = \mu + g_i + b_j + \varepsilon_{ij}$$

where $y_{ij}$ is the response variable for the ith genotype in the jth block, $\mu$ is the intercept, $g_i$ is the effect of the ith genotype, $b_j$ is the random effect of the jth block with $b_j \sim N(0, \sigma_b^2)$, and $\varepsilon_{ij}$ is the experimental error. PH under control (PHCON) and drought (PHDRO) conditions was expressed in centimeters. FT under control (FTCON) and drought (FTDRO) conditions was expressed as the number of days (calendar days in 2013) required for 50% flowering. GY under control (GYCON) and drought (GYDRO) conditions was expressed in grams per square meter (g m−2). The percentage of GY loss (GYLOSS) of each accession was calculated as

$$\mathrm{GY}_{\mathrm{LOSS}} = 100 \times \frac{\mathrm{GY}_{\mathrm{CON}} - \mathrm{GY}_{\mathrm{DRO}}}{\mathrm{GY}_{\mathrm{CON}}}$$

BLUEs of GYCON, GYDRO, and GYLOSS were also calculated with PH, FT, or both PH and FT as covariates, using the following general mixed model:

$$y_{ij} = \mu + g_i + b_j + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \varepsilon_{ij}$$

where $y_{ij}$, $\mu$, $g_i$, $b_j$ (with $b_j \sim N(0, \sigma_b^2)$), and $\varepsilon_{ij}$ are as above, $x_{1ij}$ and $x_{2ij}$ are the covariates (PH and/or FT) for the ith genotype in the jth block, and $\beta_1$ and $\beta_2$ are the corresponding regression slopes used to obtain the corrected BLUEs (only one covariate term is present when PH or FT is used alone). For each agronomic trait within each condition, broad-sense heritability (H²), which captures the proportion of phenotypic variance explained by genetic factors (Supplemental Table S3), was calculated as

$$H^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2 / r}$$

where $\sigma_g^2$ is the genotypic variance, $\sigma_e^2$ is the environmental variance, and $r$ is the number of replications.
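The two derived quantities defined above, GY loss and broad-sense heritability, reduce to one-line formulas. The sketch below assumes the variance components have already been estimated (for example, from a mixed model fit with random genotype and block effects); the function names and the variance values in the usage example are hypothetical.

```python
def gy_loss(gy_con, gy_dro):
    """Percentage of GY lost under drought: 100 * (GYCON - GYDRO) / GYCON."""
    if gy_con <= 0:
        raise ValueError("GYCON must be positive to compute a percentage loss")
    return 100.0 * (gy_con - gy_dro) / gy_con


def broad_sense_h2(var_g, var_e, n_reps):
    """Broad-sense heritability: H2 = var_g / (var_g + var_e / n_reps)."""
    return var_g / (var_g + var_e / n_reps)


if __name__ == "__main__":
    print(gy_loss(500.0, 200.0))  # 60.0 (% loss; inputs in g m^-2)
    # Hypothetical variance components; r = 2 replicates (control), 3 (drought).
    print(broad_sense_h2(40.0, 30.0, 2))
    print(broad_sense_h2(40.0, 30.0, 3))
```

Because the environmental variance is divided by r, the same variance components give a higher H² with three drought replicates than with two control replicates.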

Leaf tissue sampling, metabolite profiling, oxidative stress status analysis, and data pre-processing

Flag/top leaves of the 271 rice accessions were sampled from the control and drought field replicates (two for control and three for drought) and immediately frozen in liquid nitrogen as described in Melandri et al. (2020a). Drought field replicates were sampled between 09:30 and 11:00 h on Day 14 of the stress treatment; control field replicates were sampled 2 days later, during the same time window. For each accession and condition, equal amounts of leaf tissue from each field replicate were pooled before biochemical analyses. Leaf tissues were analyzed by untargeted GC–MS-based metabolite profiling to assess variation in polar metabolites as described by Riewe et al. (2012) and Riewe et al. (2016). A total of 88 metabolites were identified, predominantly primary metabolites (amino acids, sugars, and organic acids). Glucose, fructose, and sucrose were quantified spectrophotometrically (Riewe et al., 2008). The same leaf material was analyzed for oxidative stress status: the levels of molecular antioxidants (2) and oxidative stress markers (2), and the activities of enzymes (16) involved in the antioxidant system and in photorespiration, were quantified using high-throughput colorimetric assays (Zinta et al., 2014; AbdElgawad et al., 2016). Further details on metabolite profiling and analysis of the oxidative stress status of the samples can be found in Melandri et al. (2020a). The values of metabolites and oxidative stress markers/enzyme activities were log10-transformed to improve normality before statistical analyses. Missing values of metabolites and oxidative stress markers/enzyme activities were imputed with the function knnImputation in the R package “DMwR” (Torgo, 2010). The list of the 111 metabolites and oxidative stress markers/enzyme activities considered in this study, and their variation among accessions and treatments, are shown in Supplemental Table S4.
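The log10 transformation and k-nearest-neighbour imputation can be illustrated with a short pure-Python sketch. It is a simplified, unweighted analogue written for illustration, not the DMwR `knnImputation` implementation, which may scale variables and weight neighbours differently. The function names, the choice of k, and the use of a root-mean-square distance over co-observed variables are assumptions.

```python
import math


def log10_transform(rows):
    """log10-transform every observed (positive) value; None marks a missing value."""
    return [[None if v is None else math.log10(v) for v in row] for row in rows]


def knn_impute(rows, k=3):
    """Replace each missing value with the unweighted mean of that variable in the
    k nearest samples that observe it. Distance is the root-mean-square difference
    over the variables observed in both samples (an assumption of this sketch)."""
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        neighbours = []
        for m, other in enumerate(rows):
            if m == i:
                continue
            shared = [j for j, v in enumerate(row) if v is not None and other[j] is not None]
            if not shared:
                continue
            d = math.sqrt(sum((row[j] - other[j]) ** 2 for j in shared) / len(shared))
            neighbours.append((d, m))
        neighbours.sort()  # closest samples first
        for j in missing:
            donors = [rows[m][j] for _, m in neighbours if rows[m][j] is not None][:k]
            if donors:  # leave the value missing if no neighbour observes it
                imputed[i][j] = sum(donors) / len(donors)
    return imputed


if __name__ == "__main__":
    # Rows = accessions, columns = MetabOxi variables (made-up values).
    raw = [[10, 100, 1000], [10, 100, None], [100, 1000, 10000], [10, 100, 1000]]
    print(knn_impute(log10_transform(raw), k=2))
```

Imputed values are always drawn from the originally observed data, so the order in which samples are processed does not affect the result.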

Genotypic data

The 271 accessions of this study represent a subgroup of a larger panel of 329 indica accessions that were genotyped by genotyping-by-sequencing. The genotypic dataset consisted of 91,591 SNP markers with minor allele frequency (MAF) ≥ 0.05, with 22.8% missing data imputed using the fastPHASE hidden Markov model (Scheet and Stephens, 2006) (Rebolledo et al., 2016). Because restricting the panel to the 271 accessions used in this study changes allele frequencies, the 91,591 SNPs were re-filtered for MAF ≥ 0.05 to exclude rare alleles. The resulting 81,347-SNP map is available in nucleotide-based hapmap format (hmp) as Supplemental Data Set 1 (.rds). Before modeling, the SNP map was converted from hapmap format to a numeric “0, 1, 2” format using an R script, where “0” and “2” denote homozygotes for the major and minor alleles, respectively, and “1” denotes heterozygotes.
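The MAF re-filtering and the conversion to “0, 1, 2” codes can be sketched in Python as a stand-in for the R script mentioned above. This is a minimal illustration assuming two-letter diploid nucleotide calls such as "AG", with "NN" marking missing data; hapmap files produced by some tools use single-letter IUPAC codes instead, which this sketch does not handle. The function names are hypothetical.

```python
def to_numeric(calls):
    """Convert one SNP's diploid calls (e.g. 'AA', 'AG', 'GG'; 'NN' = missing) to
    0 = homozygous major, 1 = heterozygous, 2 = homozygous minor.
    Returns (codes, MAF); codes is None for monomorphic or multi-allelic SNPs."""
    counts = {}
    for call in calls:
        if call == "NN":
            continue
        for allele in call:
            counts[allele] = counts.get(allele, 0) + 1
    if len(counts) != 2:
        return None, 0.0
    # Major allele = more frequent one (alphabetical order breaks ties).
    major, minor = sorted(counts, key=lambda a: (-counts[a], a))
    maf = counts[minor] / (counts[major] + counts[minor])
    codes = [None if c == "NN" else c.count(minor) for c in calls]
    return codes, maf


def filter_snps(snp_calls, min_maf=0.05):
    """Recompute MAF in the retained accessions and keep SNPs with MAF >= min_maf."""
    kept = {}
    for snp_id, calls in snp_calls.items():
        codes, maf = to_numeric(calls)
        if codes is not None and maf >= min_maf:
            kept[snp_id] = codes
    return kept


if __name__ == "__main__":
    snps = {
        "s1": ["AA", "AA", "AG", "GG", "AA"],  # MAF 0.3 -> kept
        "s2": ["CC"] * 19 + ["CT"],            # MAF 0.025 -> removed
    }
    print(filter_snps(snps))
```

Recomputing MAF on the subset is what makes the re-filtering necessary: a SNP that passed the 0.05 threshold in the full panel can fall below it once some accessions are removed.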

Multivariable models for the prediction of GY traits

Metabolome/oxidative stress status-based and genomic-based multivariable models were used to predict the GY performance of the rice accessions. Three different methods were used to generate the prediction models: PLSR, RR-BLUP, and BayesB. To obtain prediction estimates on independent data, the multivariable models were built using a 10-fold CV procedure, for which the 271 accessions were randomly subdivided into 10 groups without replacement. These 10 groups were kept the same when generating the PLSR, RR-BLUP, and BayesB prediction models, allowing full comparability of their results. Each multivariable model was fit with data from nine of the groups (training set), while data from the remaining group were used for model testing (test set), and the process was iterated such that each group of samples was used for model testing once. The predictability (Q²) of the 10-fold CV models was calculated as

$$Q^2 = 1 - \frac{\mathrm{PRESS}}{\mathrm{TSS}}$$

where PRESS is the predictive residual error sum of squares and TSS is the total sum of squares, calculated as

$$\mathrm{PRESS} = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2, \qquad \mathrm{TSS} = \sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2$$

where $y_i$ is the observed GY (GYCON and GYDRO, or GYLOSS) value of the ith individual, $\hat{y}_i$ is the predicted GY value of the ith individual, and $\bar{y}$ is the mean of the predicted GY values of the n (271) individuals.
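The 10-fold CV loop and the Q² statistic can be sketched as follows. This is an illustrative sketch, not the study's code: fixing the random seed is one way to reuse the same 10 groups for PLSR, RR-BLUP, and BayesB, and `fit_predict` is a hypothetical placeholder for any of the three model types. Here ȳ is computed from the observed values, the usual convention for a total sum of squares; that choice is an assumption of the sketch.

```python
import random


def make_folds(n, k=10, seed=1):
    """Randomly split indices 0..n-1 into k disjoint groups (without replacement).
    A fixed seed reproduces the same groups for every model type."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validated_predictions(X, y, folds, fit_predict):
    """Predict each group from a model trained on the other k - 1 groups.
    fit_predict(X_train, y_train, X_test) is a placeholder for PLSR, RR-BLUP, etc."""
    preds = [None] * len(y)
    for test in folds:
        test_set = set(test)
        train = [i for i in range(len(y)) if i not in test_set]
        fitted = fit_predict([X[i] for i in train], [y[i] for i in train],
                             [X[i] for i in test])
        for i, p in zip(test, fitted):
            preds[i] = p
    return preds


def q2(observed, predicted):
    """Q2 = 1 - PRESS / TSS, with TSS taken around the observed mean (assumption)."""
    y_bar = sum(observed) / len(observed)
    press = sum((y - yh) ** 2 for y, yh in zip(observed, predicted))
    tss = sum((y - y_bar) ** 2 for y in observed)
    return 1.0 - press / tss


if __name__ == "__main__":
    folds = make_folds(271, k=10)
    print(sorted(len(f) for f in folds))  # nine groups of 27 and one of 28
    # Baseline model that always predicts the training mean (hypothetical data).
    X = [[float(i)] for i in range(271)]
    y = [2.0 * i + random.Random(i).gauss(0, 5) for i in range(271)]
    mean_model = lambda X_tr, y_tr, X_te: [sum(y_tr) / len(y_tr)] * len(X_te)
    print("Q2 of mean-only model:", q2(y, cross_validated_predictions(X, y, folds, mean_model)))
```

A model that only predicts the training-set mean yields a Q² at or slightly below zero under cross-validation, which is a useful reference point for the near-zero genomic Q² values reported for GYLOSS.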

PLSR models

Let Y be an n × 1 vector of GY responses (BLUEs of GYCON and GYDRO, or GYLOSS) and let X be an n × p matrix of predictors (n observations by p variables: the set of 111 metabolites/oxidative stress markers and enzymes or the 81,347 SNP markers). PLSR decomposes X into a set of A orthogonal scores such that their covariance with the corresponding Y scores is maximized. The X-weight and Y-loading vectors that result from the decomposition are used to estimate the vector of regression coefficients, $\beta_{\mathrm{PLS}}$, such that $Y = X\beta_{\mathrm{PLS}} + \varepsilon$, where $\varepsilon$ is an n × 1 vector of error terms. The R package “pls” (Mevik and Wehrens, 2007) was used for PLSR in this study. Each variable was centered (mean subtraction) and scaled (division by the standard deviation) before analysis. In the 10-fold CV procedure, for each training set, a PLSR model was constructed with the GY trait as a single dependent variable (Y) and the set of metabolites/oxidative stress markers and enzymes or the SNP markers as the independent variables (X). To choose the appropriate number of factors (A) for each training model, leave-one-out cross-validation was used to estimate the root mean squared error of cross-validation (RMSECV) for models fit with 0 through 10 factors (linear combinations of the metabolites/oxidative stress markers and enzymes or of the SNP markers), and the model with the smallest RMSECV was selected to predict the GY trait in the test set.
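A single-response PLSR (PLS1) fit with leave-one-out selection of the number of factors can be sketched in NumPy. This NIPALS version is an illustration only: the R package “pls” used in the study provides its own fitting algorithms, which for a single response should give equivalent coefficients up to numerical precision. In the procedure described above, `choose_n_comp_loo` would run inside each outer CV training set. All function names are hypothetical.

```python
import numpy as np


def pls1_coefficients(X, y, n_comp):
    """NIPALS PLS1 regression coefficients for centered/scaled X and centered y."""
    n, p = X.shape
    Xk, yk = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk                    # weight: direction of max covariance with y
        norm = np.linalg.norm(w)
        if norm == 0:                    # nothing left to explain
            break
        w /= norm
        t = Xk @ w                       # X score
        tt = t @ t
        p_load = Xk.T @ t / tt           # X loading
        q_a = (yk @ t) / tt              # y loading
        Xk = Xk - np.outer(t, p_load)    # deflate X and y
        yk = yk - q_a * t
        W.append(w); P.append(p_load); q.append(q_a)
    if not W:
        return np.zeros(p)               # zero factors: predict the training mean
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)


def pls_fit_predict(X_train, y_train, X_test, n_comp):
    """Autoscale with training statistics, fit PLS1, and predict the test rows."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0                    # guard against constant columns
    y_mean = y_train.mean()
    beta = pls1_coefficients((X_train - mu) / sd, y_train - y_mean, n_comp)
    return y_mean + ((X_test - mu) / sd) @ beta


def choose_n_comp_loo(X, y, max_comp=10):
    """Return the number of factors (0..max_comp) with the lowest LOO RMSECV."""
    n = len(y)
    max_comp = min(max_comp, n - 2, X.shape[1])
    rmsecv = []
    for a in range(max_comp + 1):
        sq_err = []
        for i in range(n):
            keep = np.arange(n) != i
            pred = pls_fit_predict(X[keep], y[keep], X[i:i + 1], a)[0]
            sq_err.append((y[i] - pred) ** 2)
        rmsecv.append(float(np.sqrt(np.mean(sq_err))))
    return int(np.argmin(rmsecv)), rmsecv


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 8))         # synthetic predictors
    y = X @ np.array([1.5, -1.0, 0.5, 0, 0, 0, 0, 0.2]) + 0.3 * rng.normal(size=40)
    a, curve = choose_n_comp_loo(X[:30], y[:30], max_comp=6)
    pred = pls_fit_predict(X[:30], y[:30], X[30:], a)
    print("factors:", a, "test predictions:", np.round(pred, 2))
```

With as many factors as (full-rank) predictors, PLS1 reproduces ordinary least squares; fewer factors act as a form of regularization, which is why the factor count is tuned by cross-validation.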

RR-BLUP models

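The RR-BLUP model is described as follows:

$$y_i = \mu + \sum_{k=1}^{p} z_{ik} u_k + e_i$$

where $y_i$ is the GY response (BLUE of GYCON and GYDRO, or GYLOSS) of the ith individual, $\mu$ is the intercept, $z_{ik}$ is the value of the kth predictor (the genotype, for SNP markers) of the ith individual, $p$ is the total number of predictors (the set of 111 metabolites/oxidative stress markers and enzymes or the 81,347 SNP markers), $u_k$ is the estimated random additive effect of the kth predictor with $u_k \sim N(0, \sigma_u^2)$, and $e_i$ is the residual error term with $e_i \sim N(0, \sigma_e^2)$. The BLUP of each $u_k$ received the following penalty:

$$\hat{\mathbf{u}} = \arg\min_{\mathbf{u}} \left[ \sum_{i=1}^{n} \left( y_i - \mu - \sum_{k=1}^{p} z_{ik} u_k \right)^2 + \lambda \sum_{k=1}^{p} u_k^2 \right], \qquad \lambda = \frac{\sigma_e^2}{\sigma_u^2}$$

where all the terms are the same as those described above. This model was implemented using the R package “rrBLUP” (Endelman, 2011).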
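For a fixed λ, the penalized problem above has a closed-form ridge solution. The NumPy sketch below assumes λ is given; in the rrBLUP package, `mixed.solve` instead estimates the variance components (and hence λ) by REML, which this sketch does not attempt. It also shows why the many-SNP case stays tractable: with p ≫ n, an equivalent n × n system can be solved instead of the p × p one. Function names are hypothetical.

```python
import numpy as np


def ridge_blup(Z, y, lam):
    """Ridge (RR-BLUP-style) fit of y = mu + Z u + e with penalty lam * sum(u_k^2).

    The intercept is left unpenalized by centering Z and y. With more predictors
    than individuals (e.g. 81,347 SNPs vs 271 accessions), the equivalent n x n
    'dual' system is solved instead of the p x p one.
    """
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    Zc, yc = Z - z_mean, y - y_mean
    n, p = Zc.shape
    if p <= n:
        # Primal form: (Zc'Zc + lam I) u = Zc' yc
        u = np.linalg.solve(Zc.T @ Zc + lam * np.eye(p), Zc.T @ yc)
    else:
        # Dual form via (Zc'Zc + lam I)^-1 Zc' = Zc' (Zc Zc' + lam I)^-1
        u = Zc.T @ np.linalg.solve(Zc @ Zc.T + lam * np.eye(n), yc)
    mu = y_mean - z_mean @ u
    return mu, u


def ridge_predict(mu, u, Z_new):
    """Predicted responses for new individuals."""
    return mu + Z_new @ u


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.integers(0, 3, size=(50, 500)).astype(float)  # synthetic 0/1/2 genotypes
    y = Z[:, :5] @ np.array([1.0, -1.0, 0.5, 0.5, -0.5]) + rng.normal(size=50)
    mu, u = ridge_blup(Z[:40], y[:40], lam=100.0)          # hypothetical lambda
    print(np.round(ridge_predict(mu, u, Z[40:]), 2))
```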

BayesB models

The basic model of BayesB is the same as that of RR-BLUP, but here all parameters are treated as random variables in a Bayesian framework, and the predictor effects are not assumed to share a common variance. The prior distributions were defined as $u_k \mid \sigma_{u_k}^2 \sim N(0, \sigma_{u_k}^2)$ and $e_i \sim N(0, \sigma_e^2)$, with a flat prior for the intercept $\mu$. For each k, the prior distribution of $\sigma_{u_k}^2$ is zero with probability π and a scaled inverse chi-squared distribution with probability (1 − π). The prior of π is a beta distribution. The prior of $\sigma_e^2$ is also a scaled inverse chi-squared distribution. A Gibbs sampler was then applied to infer all the parameters in the model. The BayesB model was implemented using the R package “BGLR” (Pérez and De Los Campos, 2014).
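The spike-and-slab character of the BayesB prior can be made concrete by simulating predictor effects from it. This sketch draws from the prior only; it is not a Gibbs sampler and does not reproduce BGLR, which uses its own parameterization and updates all parameters from the data. The hyperparameter values (π, degrees of freedom, scale) and function names are hypothetical. A scaled inverse chi-squared draw with ν degrees of freedom and scale s² is obtained as ν s² / X, with X ~ χ²(ν).

```python
import random


def draw_scaled_inv_chi2(df, scale, rng):
    """One draw from Scale-inv-chi^2(df, scale) = df * scale / chi^2_df."""
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi^2_df as Gamma(shape=df/2, scale=2)
    return df * scale / chi2


def draw_bayesb_effects(p, pi, df, scale, seed=0):
    """Draw p predictor effects from a BayesB-type prior: with probability pi the
    effect variance (and hence the effect) is zero; otherwise the variance is
    scaled-inverse-chi^2 and the effect is normal with that variance."""
    rng = random.Random(seed)
    effects = []
    for _ in range(p):
        if rng.random() < pi:
            effects.append(0.0)            # spike: predictor has no effect
        else:
            var_k = draw_scaled_inv_chi2(df, scale, rng)
            effects.append(rng.gauss(0.0, var_k ** 0.5))  # slab
    return effects


if __name__ == "__main__":
    # Hypothetical hyperparameters, for illustration only.
    effects = draw_bayesb_effects(p=1000, pi=0.95, df=5.0, scale=0.01, seed=42)
    print("nonzero effects:", sum(e != 0.0 for e in effects))
```

Unlike the common variance of RR-BLUP, this prior lets most predictors have exactly zero effect while a few carry large effects, which suits traits controlled by a small number of influential variables.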

Predictor importance for the MetabOxi-based models

For the metabolome/oxidative stress status-based models, the relative importance of the predictors was summarized using rank-products (Smit et al., 2007; Mumm et al., 2016). To this end, for each of the 10 sub-models generated by the cross-validation procedure, the predictors were ranked from 1 to 111 by their absolute regression coefficient (rank 1 for the predictor with the highest absolute value). Then, for each predictor, the ranks from the 10 sub-models were multiplied together, giving a final rank-product for the overall model. A low rank-product implies that a predictor is of high importance in the model.
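The rank-product summary described above can be sketched in a few lines. The predictor names and coefficients in the example are made up; ties in absolute coefficient are broken by input order, an assumption the text does not address. A predictor that ranks first in every sub-model, as DHAR does for GYDRO and GYLOSS, has a rank-product of 1.

```python
def ranks_by_abs(coefs):
    """Rank predictors 1..p by descending absolute coefficient (1 = largest).
    Ties keep their input order (Python's sort is stable)."""
    order = sorted(coefs, key=lambda name: abs(coefs[name]), reverse=True)
    return {name: r for r, name in enumerate(order, start=1)}


def rank_products(submodel_coefs):
    """Multiply each predictor's ranks across the cross-validation sub-models and
    return predictors sorted from most to least important (lowest product first)."""
    products = {}
    for coefs in submodel_coefs:
        for name, r in ranks_by_abs(coefs).items():
            products[name] = products.get(name, 1) * r
    return dict(sorted(products.items(), key=lambda kv: kv[1]))


if __name__ == "__main__":
    # Made-up coefficients for three predictors in two sub-models.
    subs = [{"v1": 0.9, "v2": -0.5, "v3": 0.1},
            {"v1": 0.7, "v2": -0.8, "v3": 0.05}]
    print(rank_products(subs))  # {'v1': 2, 'v2': 2, 'v3': 9}
```

Python integers do not overflow, so products of ten ranks of up to 111 are computed exactly; comparing log-sums would give the same ordering.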

Combined MetabOxi and genomic-based models

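For the PLSR models based on the combined MetabOxi and genomic data, a single predictor matrix was generated by concatenating the two datasets column-wise, matching rows by sample (n observations). The PLSR models for the prediction of GY responses were then built as described above for the single datasets. The combined RR-BLUP and BayesB models were built as follows:

$$y_i = \mu + \sum_{k=1}^{p_{\mathrm{SNP}}} z_{ik}^{\mathrm{SNP}} u_k^{\mathrm{SNP}} + \sum_{l=1}^{p_{\mathrm{MET}}} z_{il}^{\mathrm{MET}} u_l^{\mathrm{MET}} + e_i$$

where SNP and MET denote the genomic and MetabOxi data, respectively, $p_{\mathrm{SNP}} = 81{,}347$ and $p_{\mathrm{MET}} = 111$ are the numbers of predictors in each dataset, and the remaining terms are as defined above. Parameter assumptions were the same as for the prediction models based on a single dataset described above.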
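Two aspects of the combined models can be illustrated numerically. First, when every column is autoscaled, each contributes n − 1 to the total sum of squares, so a concatenated block's weight is proportional to its number of columns: the 111 MetabOxi variables account for only about 0.14% of the total against 81,347 SNPs. Dividing each block by the square root of its size is a common multiblock remedy, shown here only as a hypothetical option that was not used in the models described. Second, giving each block its own random-effect variance corresponds, for fixed variances, to a ridge fit with a separate penalty per block. The penalty values, function names, and synthetic data are assumptions of this sketch.

```python
import numpy as np


def block_ss_shares(block_sizes, n, weights=None):
    """Share of the total sum of squares contributed by each block of autoscaled
    columns. An autoscaled column (mean 0, SD 1 over n samples) has sum of squares
    n - 1; multiplying a column by w scales that to w**2 * (n - 1)."""
    if weights is None:
        weights = [1.0] * len(block_sizes)
    ss = [p * w ** 2 * (n - 1) for p, w in zip(block_sizes, weights)]
    total = sum(ss)
    return [s / total for s in ss]


def two_block_ridge(Z_snp, Z_met, y, lam_snp, lam_met):
    """Ridge fit of y = mu + Z_snp u_snp + Z_met u_met + e with one penalty per
    block (lam = sigma_e^2 / sigma_block^2 for fixed variance components)."""
    Z = np.hstack([Z_snp, Z_met])
    d = np.concatenate([np.full(Z_snp.shape[1], lam_snp),
                        np.full(Z_met.shape[1], lam_met)])
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    Zc, yc = Z - z_mean, y - y_mean
    # Push-through identity: (Zc'Zc + D)^-1 Zc' = D^-1 Zc' (Zc D^-1 Zc' + I)^-1,
    # so only an n x n system is solved even when the SNP block is very wide.
    Zd = Zc / d                          # Zc @ D^-1 (column-wise division)
    alpha = np.linalg.solve(Zd @ Zc.T + np.eye(len(y)), yc)
    u = Zd.T @ alpha
    mu = y_mean - z_mean @ u
    p_snp = Z_snp.shape[1]
    return mu, u[:p_snp], u[p_snp:]


if __name__ == "__main__":
    shares = block_ss_shares([111, 81_347], n=271)
    print(f"MetabOxi share of total SS: {shares[0]:.4%}")
    scaled = block_ss_shares([111, 81_347], n=271, weights=[111 ** -0.5, 81_347 ** -0.5])
    print("after block scaling:", scaled)
    rng = np.random.default_rng(0)
    Z_snp = rng.integers(0, 3, size=(30, 200)).astype(float)  # synthetic genotypes
    Z_met = rng.normal(size=(30, 10))                          # synthetic MetabOxi
    y = Z_met[:, 0] + 0.1 * Z_snp[:, 0] + rng.normal(scale=0.5, size=30)
    mu, u_snp, u_met = two_block_ridge(Z_snp, Z_met, y, lam_snp=50.0, lam_met=1.0)
    print("largest MetabOxi effect index:", int(np.argmax(np.abs(u_met))))
```

A smaller penalty on the MetabOxi block lets its few variables carry larger effects without being swamped by the SNP block, which a single concatenated matrix with uniform scaling does not allow.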

Multi-trait MetabOxi and genomic-based models

For the multi-trait PLSR and RR-BLUP models based on MetabOxi and genomic data, the values of the secondary traits (PH, FT, and both) were included in the training sets of the CV procedure, but not in the test sets. The models were then run as described above. A multi-trait BayesB model could not be run with the specific R package we used (“BGLR”; Pérez and De Los Campos, 2014).

Supplemental data

The following materials are available in the online version of this article.

Boxplots representing the PH performance under control (PHCON) and drought (PHDRO) conditions of the rice accessions carrying the minor (AA) or major (GG) alleles at the locus (Chr1 pos: 38,286,772 bp; Supplemental Data Set 1) associated with the gibberellin 20-oxidase biosynthetic gene (SEMI-DWARF1; recessive sd1 and functional wild-type SD1 allele).
Distributions of GY traits in the 271 indica rice accessions sorted by increasing PH.
PH, FT, and GY of the 271 indica rice accessions.
Agronomic trait performance of the 271 indica rice accessions.
Heritabilities and variances for PH, FT, and GY of the 271 indica rice accessions.
Flag leaf values of the 111 MetabOxi variables in the 271 indica rice accessions under control and drought conditions.
Predictability of multi-trait MetabOxi- and genomic-based models for the prediction of GY traits.
Ranking of the MetabOxi-variables in the PLSR, RR-BLUP, and BayesB models for the prediction of GY under control conditions.
Ranking of the MetabOxi-variables in the PLSR, RR-BLUP, and BayesB models for the prediction of GY under drought conditions.
Ranking of the MetabOxi-variables in the PLSR, RR-BLUP, and BayesB models for the prediction of drought-induced GY loss.
Correlations between control values of the 111 MetabOxi-variables and GY, FT, and PH under control conditions.
Correlations between drought values of the 111 MetabOxi-variables and GY, GY loss, FT, and PH under drought conditions.
Results of the linear models created by fitting GY traits (response) and the top-ranked predictors (single explanatory variable) identified by the MetabOxi-based models.
The 81,347-SNP map in hapmap (hmp) format.
References (58 in total)

1. Understanding oxidative stress and antioxidant functions to enhance photosynthesis.
Authors: Christine H Foyer; Shigeru Shigeoka
Journal: Plant Physiol; Date: 2010-11-02

2. Integrating multi-omics data for crop improvement (Review).
Authors: Federico Scossa; Saleh Alseekh; Alisdair R Fernie
Journal: J Plant Physiol; Date: 2020-12-17

3. Metabolic efficiency underpins performance trade-offs in growth of Arabidopsis thaliana.
Authors: Sabrina Kleessen; Roosa Laitinen; Corina M Fusari; Carla Antonio; Ronan Sulpice; Alisdair R Fernie; Mark Stitt; Zoran Nikoloski
Journal: Nat Commun; Date: 2014-03-28

4. Genome-wide association analyses provide genetic and biochemical insights into natural variation in rice metabolism.
Authors: Wei Chen; Yanqiang Gao; Weibo Xie; Liang Gong; Kai Lu; Wensheng Wang; Yang Li; Xianqing Liu; Hongyan Zhang; Huaxia Dong; Wan Zhang; Lejing Zhang; Sibin Yu; Gongwei Wang; Xingming Lian; Jie Luo
Journal: Nat Genet; Date: 2014-06-08

5. Chlorogenic acid-mediated chemical defence of plants against insect herbivores.
Authors: A Kundu; J Vadassery
Journal: Plant Biol (Stuttg); Date: 2019-01-08

6. Engineering plants with increased levels of the antioxidant chlorogenic acid.
Authors: Ricarda Niggeweg; Anthony J Michael; Cathie Martin
Journal: Nat Biotechnol; Date: 2004-04-25

7. Metabolic and developmental adaptations of growing potato tubers in response to specific manipulations of the adenylate energy status.
Authors: David Riewe; Lukasz Grosman; Henrik Zauber; Cornelia Wucke; Alisdair R Fernie; Peter Geigenberger
Journal: Plant Physiol; Date: 2008-02-27

8. Genetic Improvements in Rice Yield and Concomitant Increases in Radiation- and Nitrogen-Use Efficiency in Middle Reaches of Yangtze River.
Authors: Guanglong Zhu; Shaobing Peng; Jianliang Huang; Kehui Cui; Lixiao Nie; Fei Wang
Journal: Sci Rep; Date: 2016-02-15

9. Biomarkers for grain yield stability in rice under drought stress.
Authors: Giovanni Melandri; Hamada AbdElgawad; David Riewe; Jos A Hageman; Han Asard; Gerrit T S Beemster; Niteen Kadam; Krishna Jagadish; Thomas Altmann; Carolien Ruyter-Spira; Harro Bouwmeester
Journal: J Exp Bot; Date: 2020-01-07
