Literature DB >> 30854479

A large-scale metabolomics study to harness chemical diversity and explore biochemical mechanisms in ryegrass.

Arvind K Subbaraj1, Jan Huege2, Karl Fraser2, Mingshu Cao2, Susanne Rasmussen2,3, Marty Faville2, Scott J Harrison2,4, Chris S Jones2,5.   

Abstract

Perennial ryegrass (Lolium perenne) is integral to temperate pastoral agriculture, which contributes most of the milk and meat production worldwide. Chemical profiles and diversity of ryegrass offer several opportunities to harness specific traits and elucidate underlying biological mechanisms for forage improvement. We conducted a large-scale metabolomics study of perennial ryegrass comprising 715 genotypes, representing 118 populations from 21 countries. Liquid/gas chromatography-mass spectrometry based targeted and non-targeted techniques were used to analyse fructan oligosaccharides, lipids, fatty acid methyl esters, polar and semi-polar compounds. Fructan diversity across all genotypes was evaluated, high- and low-sugar groups identified, and fructan accumulation mechanisms explored. Metabolites differentiating the two groups were characterised, modules and pathways they represent deduced, and finally, visualisation and interpretation provided in a biological context. We also demonstrate a workflow for large-scale metabolomics studies from raw data through to statistical and pathway analysis. Raw files and metadata are available at the MetaboLights database.

Entities:  

Mesh:

Substances:

Year:  2019        PMID: 30854479      PMCID: PMC6399292          DOI: 10.1038/s42003-019-0289-6

Source DB:  PubMed          Journal:  Commun Biol        ISSN: 2399-3642


Introduction

Perennial ryegrass (Lolium perenne L. Family: Poaceae)[1] supports most of the milk and meat production worldwide[2]. Geno-phenotypic characteristics of ryegrass are therefore critical in determining feed quality for the animal[3], degradation in the rumen[4] and livestock production responses[5-7]. Consequently, breeding techniques[8-11] are employed to produce cultivars with desirable traits such as high water soluble carbohydrate content, neutral detergent fibre, crude protein content and digestibility, in addition to forage yield, seed yield, pest and disease resistance[12,13]. Of special interest and relevance to this study are the high-sugar cultivars, which have elevated levels of fructans. Fructans are the major storage carbohydrate in ryegrass, and are made up of varying degrees and complexities of linear or branched fructose polymers[14], denoted by the degree of polymerisation (DP). DP directs fructan accumulation and thereby the total sugar content of these high-sugar cultivars[15]. High-sugar cultivars are proposed to increase milk and meat production through enhanced protein utilisation by ruminants[16]. In addition, lipid composition of ryegrass affects quality of animal products[12], and secondary metabolites possess anti-parasitic activity in ruminants[17]. We hypothesised that chemical diversity of ryegrass, especially fructan content, offers opportunities to harness variation in these traits into cultivars for improved ruminant performance and novel product characteristics and, exploring underlying biochemical mechanisms of high-sugar grasses, in addition to a better mechanistic understanding of fructan accumulation, will also help decipher major changes in primary and secondary metabolism. Metabolomics[18,19] provides a snapshot of chemical diversity, enabling metabotypic classification of genotypes[20-22] and in conjunction with other –omics sciences[23], a better understanding of biological mechanisms[18]. The advent of advanced bio/cheminformatic tools and techniques[24] have since propelled metabolomics towards system-level evaluations via data-fusion[25] and pathway mapping[26]. We conducted a mass spectrometry based metabolomics study of 5 clonal replicates of 715 ryegrass genotypes (3575 plants), representing 118 populations from 21 countries (Supplementary Figure 1). Fructans, fatty acid methyl esters (FAMEs), lipids, polar and semi-polar compounds were analysed using ultra-high-performance liquid chromatography (U)HPLC and gas chromatography–mass spectrometry (GC–MS) systems. The objectives of the current study were to verify and demonstrate quality control measures undertaken for big metabolomics data, evaluate diversity of ryegrass genotypes for fructan/sugar content, and thereby identify high- and low-sugar plant genotypes under New Zealand climatic conditions, determine the role of DP of fructans in directing total sugar content, and finally elucidate potential metabolic variation between high- and low-sugar grasses in the context of data from other analytical streams (lipids, FAMEs, polar and semi-polar compounds). Constant monitoring and post-run evaluation of quality control parameters accounted for technical variation in samples, and where these parameters were not met, batches were re-run following instrument calibration. The quality control procedures adopted here were therefore appropriate for large-scale metabolomics studies, rendering reliable data for downstream processing. The sum of low- (3–5), mid- (10–12) and high- (18–20) DP fructans was used as a measure of total sugar content, and of the 715 genotypes surveyed, 39 high- and 31 low-sugar genotypes were identified. High-DP fructans contributed significantly more to the total sugar content, measured as hexose units, in high-sugar grasses. A negative correlation between high- and low-DP fructans in the high-sugar group, further identified 11 genotypes which had greater high-DP content than a reference high-sugar genotype (Aberdart). These results, in addition to immediate inclusion in breeding exercises, offer a better understanding of fructan accumulation in high-sugar grasses, and subsequently ample scope for genetic improvement. Between high- and low-sugar grasses, major differences in primary metabolism were observed, with most lipid classes and fatty acids significantly higher in the low-sugar group. Differences in secondary metabolism were also noticed, where high-sugar grasses recorded lower concentrations of flavonoids and lignins. Identification of compounds and mapping them to metabolic pathways, successfully led to visualisation of a biochemical snapshot of high-sugar grasses.

Results

Quality control monitoring and evaluation

In large-scale metabolomics studies, demonstration of quality control monitoring and verification of quality control parameters is a prerequisite which accounts for technical variation, and affects data quality and thereby subsequent interpretation. Drifts in mass accuracy and retention time of the internal standard in quality control samples indicates technical variation between batches of samples. Here, drifts in mass accuracy and retention times of the internal standard 2′,7′-Dichlorofluorescein in quality control samples of the semi-polar stream (positive ionisation mode), across all 36 batches was demonstrated (Supplementary Figure 2). Mass accuracy was within the ±5 ppm threshold (Supplementary Figure 2A), and retention time drifts were within ±0.2 min from the median (Supplementary Figure 2B). Quality control monitoring for the lipid and polar streams also generated identical results. A post-run evaluation of run-order effects was also conducted immediately after each batch was completed. Supplementary Figure 3 shows an exemplar principal component analysis (PCA) of a single batch of samples classified based on run-order, where Supplementary Figure 3A shows no significant run-order effect, while Supplementary Figure 3B shows a notable run-order effect. In this case, the batch representing Supplementary Figure 3A was proceeded to the super batch, whereas that representing Supplementary Figure 3B was re-run. Since quality control samples were not representative of the sample set, they clustered separately from the samples.

Chemical diversity

Fructan content: A typical total ion chromatogram of a sample run for fructan measurement, depicting low (3–5), mid (10–12) and high (18–20) DP ranges, is shown in Supplementary Figure 4. These ranges were used to measure the total sugar content of the sample. Samples in a single batch for fructan/sugar estimates before (Supplementary Figure 5A) and after (Supplementary Figure 5B) normalisation by a linear trend are shown in Supplementary Figure 5. Likewise, estimates between all 36 batches of samples, before (Supplementary Figure 6A) and after (Supplementary Figure 6B) normalisation for batch-effects, are shown in Supplementary Figure 6. The resultant data matrix, obtained after these normalisation procedures was used for classification of genotypes based on total sugar content. Normal distribution of ryegrass genotypic diversity based on total sugar content is shown in Fig. 1. Of the top 10%, only genotypes that fulfilled our two-tier criterion of having a minimum of three replicates in the top 10% of the whole sample set (3575), were classified as the high-sugar group (n = 133 samples; p = 0.0001; Tukey’s HSD). Based on these conditions, the high-sugar group comprised 39 genotypes, of which 19 had genetic lineage to New Zealand (Supplementary Table 1). As anticipated, genotypes of the high-sugar grasses Aberdart and Aurora, bred in the UK[27], were present in this group. The remainder of this pool was made up of genotypes from Netherlands, Denmark, Australia, Slovakia, Tunisia and Germany (Supplementary Table 1). Likewise, the bottom 10%, low-sugar group (n = 106), comprised 31 genotypes, with 14 being of New Zealand origin and the remainder of genotypes from Tunisia (Supplementary Table 1).
Fig. 1

Diversity of ryegrass genotypes (open circle, black up pointing triangle, black down pointing triangle) (a) based on total sugar content ±SE (n = 5), with black up pointing triangle denoting the top 10% (high-sugar group) and, black down pointing triangle denoting the bottom 10% (low-sugar group) that fulfilled the two-tier criterion, and b showing the genotypic diversity based on total sugar content, relative to a normal distribution curve

Diversity of ryegrass genotypes (open circle, black up pointing triangle, black down pointing triangle) (a) based on total sugar content ±SE (n = 5), with black up pointing triangle denoting the top 10% (high-sugar group) and, black down pointing triangle denoting the bottom 10% (low-sugar group) that fulfilled the two-tier criterion, and b showing the genotypic diversity based on total sugar content, relative to a normal distribution curve Fructan accumulation in high- vs. low-sugar grasses: Fructan accumulation was significantly higher (p = 0.0001; Tukey’s HSD) in high-sugar grasses across all DPs (Fig. 2a). However, within the high-sugar group, the contribution of high- and mid-DPs to the total sugar content was more prominent than low-DP (Fig. 2a). A negative correlation between high- and low-DP was evident within high-sugar genotypes (Fig. 2b), which when extended to the 39 genotypes, revealed the ones with higher high- to low-DP ratios (Fig. 2c). Provided genotypes with high-DP levels are preferred within the high-sugar group, 11 genotypes with greater high-DP content than the current standard for high-sugar genotypes (Aberdart) were identified (Fig. 2c).
Fig. 2

a Boxplots of total sugar content between high- and low-sugar groups, distributed across average peak intensities of low (DP3–5), mid (DP10–12) and high (DP18–20) degree of polymerisation (DP) of fructans. Patterned and shaded boxes denote high- and low-sugar groups, respectively. Different upper case letter codes presented between the two groups, indicate significantly different (p < 0.05) values by Tukey’s HSD. b Correlation analysis (Pearson method) between low-, mid- and high-DP fructans within the 39 high-sugar genotypes. Positive correlations are displayed in blue and negative correlations in red colour. Colour intensity and the size of the circle are proportional to the correlation coefficients. Legend colour at the bottom shows the correlation coefficients and the corresponding colours. c Comparison of average peak intensities ± SE of high- (Patterned bars) and low-DP (shaded bars) fructans across the 39 high-sugar genotypes, along with respective linear regression trends. The high-sugar genotype Aberdart is marked as the current standard. Data underlying the plots in c are available in Supplementary Data 3 

a Boxplots of total sugar content between high- and low-sugar groups, distributed across average peak intensities of low (DP3–5), mid (DP10–12) and high (DP18–20) degree of polymerisation (DP) of fructans. Patterned and shaded boxes denote high- and low-sugar groups, respectively. Different upper case letter codes presented between the two groups, indicate significantly different (p < 0.05) values by Tukey’s HSD. b Correlation analysis (Pearson method) between low-, mid- and high-DP fructans within the 39 high-sugar genotypes. Positive correlations are displayed in blue and negative correlations in red colour. Colour intensity and the size of the circle are proportional to the correlation coefficients. Legend colour at the bottom shows the correlation coefficients and the corresponding colours. c Comparison of average peak intensities ± SE of high- (Patterned bars) and low-DP (shaded bars) fructans across the 39 high-sugar genotypes, along with respective linear regression trends. The high-sugar genotype Aberdart is marked as the current standard. Data underlying the plots in c are available in Supplementary Data 3 Polar and semi-polar compounds: Following data processing, the final data matrices from the HILIC streams had 222 (positive) and 198 (negative), and those from the C18 streams had 175 (positive) and 152 (negative) metabolic features, respectively. Data for high- and low-sugar groups were compared, and of the total 747 features, 293 were significantly different between the two groups, based on t tests with a false-discovery rate cut-off of p < 0.05 (Fig. 3; Supplementary Data 1). Multivariate analysis with PCA for each analytical stream, failed to discriminate the two groups (Supplementary Figure 7).
Fig. 3

Cloudplot of t stat and −log10 p values of metabolic features from the HILIC and C18 streams (positive- and negative-ionisation modes), significantly different between high- and low-sugar groups, based on t tests with a false-discovery rate cut-off of p < 0.05. A positive t stat value indicates high- > low-sugar group, whereas a negative value indicates high- < low-sugar group. Cloud size represents the magnitude of –log10 p value. HP (Purple), HN (Green), CP (Brown) and CN (Blue) represent different analytical streams corresponding to HILIC positive, negative and C18 positive and negative, respectively

Cloudplot of t stat and −log10 p values of metabolic features from the HILIC and C18 streams (positive- and negative-ionisation modes), significantly different between high- and low-sugar groups, based on t tests with a false-discovery rate cut-off of p < 0.05. A positive t stat value indicates high- > low-sugar group, whereas a negative value indicates high- < low-sugar group. Cloud size represents the magnitude of –log10 p value. HP (Purple), HN (Green), CP (Brown) and CN (Blue) represent different analytical streams corresponding to HILIC positive, negative and C18 positive and negative, respectively An overview of the discriminating features (Fig. 3) showed that polar compounds from HILIC positive and negative-ionisation streams, indicative of primary metabolism, demonstrated maximum variation between the high- and low-sugar groups, compared to semi-polar compounds from C18 positive- and negative-ionisation streams, largely representative of secondary metabolism. The HILIC positive stream accounted for 104 significantly different features, 35 of which were higher in the high-sugar group (Fig. 3). The HILIC negative stream revealed 64 significantly different features, of which 29 were higher in the high-sugar group (Fig. 3). C18 positive and negative streams had 80 and 45 discriminating features, of which 29 and 16 respectively, were significantly higher in the high-sugar group (Fig. 3). Lipids: Major lipid classes identified by the non-targeted lipidomics method (Fig. 4a) and the targeted FAMEs method (Fig. 4b) are shown in Fig. 4. Taken together, concentrations of phosphatidylserine (p = 0.005), phosphatidylglycerol (p = 0.0001), phosphatidylcholine (p = 0.001), monogalactosyldiacylglycerol (p = 0.004), sulfoquinovosyldiacylglycerol (p = 0.0001), digalactosyldiacylglycerol (p = 0.009), diglycerides (p = 0.0001) and fatty acids C16:0 (p = 0.0001), C16:1 (p = 0.0001), C18:1 (p = 0.0001), C18:2 (p = 0.0001) and C18:3 (p = 0.0001) were higher in the low-sugar group (Fig. 4; Tukey’s HSD). Lysophosphatidylethanolamine, lysophosphatidylglycerol, lysophosphatidylcholine, phosphatidic acid, phosphatidylmethanol, phosphatidylethanolamine, phosphatidylinositol, monogalactosylmonoacylglycerol, digalactosylmonoacylglycerol, monoglycerides and fatty acid C18:0, were not significantly different between the two groups (p > 0.05). Triglycerides alone were in higher concentrations in the high-sugar group (p = 0.002; Tukey’s HSD; Fig. 4). Overall, 100 lipid species belonging to 18 lipid classes were identified by the non-targeted lipidomics stream (Supplementary data 2).
Fig. 4

Boxplots of peak intensities of lipid classes identified by a the non-targeted lipidomics method and b normalised peak intensities of fatty acid methyl esters (FAMEs) identified by the targeted method, between the high- and low-sugar groups. Different upper case letter codes, where presented between the two groups, indicate significantly different (p < 0.05) values by Tukey’s HSD. LPE lysophosphatidylethanolamine, LPG lysophosphatidylglycerol, LPC lysophosphatidylcholine, PA phosphatidic acid, Pme phosphatidylmethanol, PE phosphatidylethanolamine, PI phosphatidylinositol, PS phosphatidylserine, PG phosphatidylglycerol, PC phosphatidylcholine, TG triglyceride, MGMG monogalactosylmonoacylglycerol, DGMG digalactosylmonoacylglycerol, MGDG monogalactosyldiacylglycerol, SQDG sulfoquinovosyldiacylglycerol, DGDG digalactosyldiacylglycerol, MG monoglyceride, DG diglyceride. C16:0, C16:1, C18:0, C18:1, C18:2 and C18:3 refer to fatty acids with their respective number of carbon atoms and double bonds

Boxplots of peak intensities of lipid classes identified by a the non-targeted lipidomics method and b normalised peak intensities of fatty acid methyl esters (FAMEs) identified by the targeted method, between the high- and low-sugar groups. Different upper case letter codes, where presented between the two groups, indicate significantly different (p < 0.05) values by Tukey’s HSD. LPE lysophosphatidylethanolamine, LPG lysophosphatidylglycerol, LPC lysophosphatidylcholine, PA phosphatidic acid, Pme phosphatidylmethanol, PE phosphatidylethanolamine, PI phosphatidylinositol, PS phosphatidylserine, PG phosphatidylglycerol, PC phosphatidylcholine, TG triglyceride, MGMG monogalactosylmonoacylglycerol, DGMG digalactosylmonoacylglycerol, MGDG monogalactosyldiacylglycerol, SQDG sulfoquinovosyldiacylglycerol, DGDG digalactosyldiacylglycerol, MG monoglyceride, DG diglyceride. C16:0, C16:1, C18:0, C18:1, C18:2 and C18:3 refer to fatty acids with their respective number of carbon atoms and double bonds Compound identification: Compounds identified in the current study based on matching with a local library of authentic standards, de novo matching with public domain mass spectral databases, and/or the Mummichog programme, are presented in Table 1. A matching with the local library was given maximum confidence (Level 1), followed by matching with spectral databases (Level 2). Supplementary Figure 8 shows one such match of quinic acid/quinate (C7H12O6; KEGG ID—C00296), where the extracted ion chromatogram for the parent mass [M–H]− m/z 191.0554 from a sample in HILIC-negative ionisation stream co-elutes with those for diagnostic fragments m/z 173.0449 (C7H9O5), 127.0392 (C6H7O3), 111.0443 (C5H3O3) and 93.0336 (C6H5O) (Supplementary Figure 8A). Also, mass spectra of diagnostic fragments m/z 93.0336, 111.0078, and 127.0392 (Supplementary Figure 8B) matched with corresponding spectra in the public domain MS database METLIN[28] (Supplementary Figure 8C).
Table 1

Summary of compounds identified by matching with a local library of authentic standards, public domain mass spectral databases and/or Mummichog, with their respective KEGG IDs, analytical stream, univariate statistics and level of confidence in identification

KEGG IDNameStreamUnivariate statisticsIdentification
Foldt statAUCConfidenceLibraryDatabase Mummichog
Parent [M ± H]±Diagnostic m/z m/z Tentative match
C00317AmylopectinHP4.6217.111.0Level 3867.2384 M + K[1+]
C00208MaltoseHP2.4910.150.9Level 3307.1023, 325.1129, 381.0794 M–H4O2 + H[1+], M–H2O + H[1+], M + K[1+]
C01083alpha,alpha-TrehaloseHP2.4910.150.9Level 3307.1023, 325.1129, 381.0794M–H4O2 + H[1+], M–H2O + H[1+], M + K[1+]
C01235GalactinolHP2.4910.150.9Level 3307.1023, 325.1129, 381.0794M–H4O2 + H[1+], M–H2O + H[1+], M + K[1+]
C043326,7-Dimethyl-8-(1'-D-Ribityl)LumazineHP2.4910.150.9Level 3325.1129M[1+]
C00117d-Ribose 5'-PhosphateHP2.049.600.8Level 3145.0496M–HCOOK + H[1+]
C00124d-GalactoseHP2.049.600.8Level 3145.0496M–H4O2 + H[1+]
C00137Myo-inositolHN2.049.600.8Level 1179.0556
C00221Beta-d-GlucoseHP2.049.600.8Level 3145.0496M–H4O2 + H[1+]
C00231d-Xylulose 5'-PhosphateHP2.049.600.8Level 3145.0496M–HCOOK + H[1+]
C00267Alpha-d-GlucoseHP2.049.600.8Level 3145.0496M–H4O2 + H[1+]
C00962Beta-d-GalactoseHP2.049.600.8Level 3145.0496M–H4O2 + H[1+]
C009662-DehydropantoateHP2.049.600.8Level 3145.0496M[1+]
C01077O-acetyl-l-homoserineHP2.049.600.8Level 2102.0550, 74.060990.0556, 116.071, 118.0867, 145.0496, 180.0867M–C3H4O2 + H[1+], M–HCOOH + H[1+], M–CO2 + H[1+], M-NH3 + H[1+], M + H2O + H[1+]
C01112d-Arabinose 5-PhosphateHP2.049.600.8Level 3145.0496M–HCOOK + H[1+]
C01825l-GalactoseHP2.049.600.8Level 3145.0496M–H4O2 + H[1+]
C01906HamameloseHP2.049.600.8Level 2181.0714, 163.0608145.0496M–H4O2 + H[1+]
C02336Beta-d-FructoseHP2.049.600.8Level 3145.0496M–H4O2 + H[1+]
C03906Beta-l-Arabinose 1-PhosphateHP2.049.600.8Level 3145.0496M–HCOOK + H[1+]
C04236(2S)-2-Isopropyl-3-OxosuccinateHP2.049.600.8Level 3101.0237, 127.0394, 145.0496M–C3H4O2 + H[1+], M–HCOOH + H[1+], M–CO + H[1+]
C06006(S)-2-Aceto-2-HydroxybutanoateHP2.049.600.8Level 3145.0496M[1+]
C005554-AminobutyraldehydeHP1.706.950.8Level 3127.0394M + K[1+]
C01210N-Methylethanolamine PhosphateHP1.706.950.8Level 3109.0288, 127.0394M–HCOOH + H[1+], M–CO + H[1+]
C023511,2-BenzoquinoneHP1.706.950.8Level 2109.0282, 81.0342109.0288, 81.034, 127.0394M + H[1+], M-CO + H[1+], M + H2O + H[1+]
C00296Quinate/Quinic acidHN2.266.720.8Level 2191.0554
C00111Glycerone PhosphateHP1.666.650.8Level 3125.0003, 85.029M–CO2 + H[1+], M–HCOOK + H[1+]
C00661d-Glyceraldehyde 3-PhosphateHP1.666.650.8Level 3125.0003, 85.029M–CO2 + H[1+], M–HCOOK + H[1+]
C012341-Aminocyclopropane-1-CarboxylateHP1.666.650.8Level 3102.0554, 74.0606, 84.0449, 85.029, 120.066M + H[1+], M–CO + H[1+], M–H2O + H[1+], M–NH3 + H[1+], M + H2O + H[1+]
C026312-IsopropylmaleateHP1.666.650.8Level 385.029M–C3H4O2 + H[1+]
C13482PhosphodimethylethanolamineHP1.666.650.8Level 397.029, 85.029M–C3H4O2 + H[1+], M + 2 H[2+]
C177591-O-Feruloyl-Beta-d-GlucoseHN1.534.450.7Level 3337.0929, 371.0984M–H2O-H[−], M–H + O[−]
C10883(+)-SesamolinHN1.723.430.7Level 3371.0984, 390.0726M(37Cl)–H[−], M + Na-2H[−]
C00152l-AsparagineHP1.77−2.540.6Level 1133.0614
C09315UmbelliferoneCN2.15−2.750.6Level 2161.0237, 162.0272161.0239, 202.0505M–H[−], M + ACN-H[−]
C01460VitexinCN1.65−2.750.6Level 2431.0976, 311.0555447.0929, 477.1036M–H + O[−], M + HCOO[−]
C01714IsovitexinCN1.65−2.750.6Level 3447.0929, 477.1036M-H + O[−], M + HCOO[−]
C01821IsoorientinCN1.65−2.750.6Level 3447.0929, 506.1017M(13C)–H[−], M + CH3COO[−]
C08604ChrysantheminCN1.65−2.750.7Level 3447.0929, 463.088, 493.1007, 506.1017M–H[−], M–H + O[−], M + HCOO[−], M + CH3COO[−]
C10114OrientinCN1.65−2.750.6Level 2447.0925, 327.0499447.0929, 506.1017M(13C)–H[−], M + CH3COO[−]
C12137Pelargonidin-3-O-Beta-D-GlucosideCN1.65−2.750.6Level 3447.0929, 477.1036M–H + O[−], M + HCOO[−]
C16298Cyanidin 5-O-Beta-D-GlucosideCN1.65−2.750.6Level 3447.0929, 506.1017M(13C)–H[−], M + CH3COO[−]
C01617TaxifolinCP1.77−2.850.6Level 3287.0548M–H2O + H[1+]
C05631EriodictyolCP1.77−2.850.6Level 3287.0548M[1 + ]
C05909LeucodelphinidinCP1.77−2.850.6Level 3287.0548M–H4O2 + H[1+]
C00327l-CitrullineHP1.66−4.250.7Level 2176.1038, 159.0772159.0766M–NH3 + H[1+]
C00065l-SerineHP1.51−4.410.6Level 1106.0506
C010928-Amino-7-OxononanoateCN1.75−4.910.7Level 3223.0608M + K-2H[−]
C02666Coniferyl AldehydeCN1.75−4.910.7Level 2177.0550, 162.0321193.0501, 223.0608M–H + O[−], M + HCOO[−]
C05610Sinapoyl AldehydeCN1.75−4.910.7Level 3207.0658, 223.0608M–H[−], M–H + O[−]
C03319DTDP-Beta-l-RhamnoseHP2.15−5.040.8Level 3501.0645M–HCOOH + H[1+]
C11907DTDP-4-Dehydro-6-Deoxy-Alpha-D-GlucopyranoseHP2.15−5.040.8Level 3501.0645M–CO2 + H[1+]
C00021S-Adenosyl-l-HomocysteineCN1.75−5.710.7Level 3385.1138M(34S)–H[−]
C011751-O-Sinapoyl-Beta-d-GlucoseCN1.75−5.710.7Level 2385.1133, 205.0499, 191.0554385.1138, 367.103M–H[−], M–H2O–H[−]
C168271-O-(4-Coumaroyl)-Beta-d-GlucoseCN1.75−5.710.7Level 3341.0875, 371.098, 385.1138M–H + O[−], M + HCOO[−], M + CH3COO[−]

HP, HN, CP and CN denote different analytical streams corresponding to HILIC positive, negative and C18 positive and negative, respectively; a positive t stat value indicates high- > low-sugar group (↑), whereas a negative value indicates high- < low-sugar group (↓); area under the curve (AUC) is a summary statistic for receiver–operator characteristic (ROC) curves, and denotes the trade-off between the specificity and sensitivity of a compound to enable binary classification of the two groups. On a rough scale, AUC values of 0.9–1.0 = excellent, 0.8–0.9 = good, 0.7–0.8 = fair, 0.6–0.7 = poor and 0.5–0.6 = fail, denote respective powers of the compound to direct binary classification[79]; compounds with matches in the library, spectral database or Mummichog were scored with 1, 2 or 3 levels of confidence, respectively[44].

Summary of compounds identified by matching with a local library of authentic standards, public domain mass spectral databases and/or Mummichog, with their respective KEGG IDs, analytical stream, univariate statistics and level of confidence in identification HP, HN, CP and CN denote different analytical streams corresponding to HILIC positive, negative and C18 positive and negative, respectively; a positive t stat value indicates high- > low-sugar group (↑), whereas a negative value indicates high- < low-sugar group (↓); area under the curve (AUC) is a summary statistic for receiver–operator characteristic (ROC) curves, and denotes the trade-off between the specificity and sensitivity of a compound to enable binary classification of the two groups. On a rough scale, AUC values of 0.9–1.0 = excellent, 0.8–0.9 = good, 0.7–0.8 = fair, 0.6–0.7 = poor and 0.5–0.6 = fail, denote respective powers of the compound to direct binary classification[79]; compounds with matches in the library, spectral database or Mummichog were scored with 1, 2 or 3 levels of confidence, respectively[44]. Metabolic features (m/z) and their tentative matches used by Mummichog for identification are presented in Table 1. As is evident, a single ion mass may relate to many compounds or a group of masses may relate to one compound. Nevertheless, the compound classes identified by Mummichog provides sufficient information to interrogate the data further towards a higher level of confidence (Level 2). In the case of coniferyl aldehyde (Table 1), which was initially identified by Mummichog using m/z 193.0501 and 223.0608 corresponding to M–H + O[−] and M + HCOO[−], respectively, the parent mass of coniferyl aldehyde (C10H10O3), [M–H]− m/z 177.0550, and its diagnostic fragment m/z 162.0321 (C9H6O3) were subsequently queried in the sample and spectral databases. As explained in Supplementary Figure 8, co-elution of the extracted ion chromatograms of these features, and matching of these spectra in the sample with corresponding spectra in MassBank[29], with a mass error of <5 ppm, led to tentative identification of coniferyl aldehyde with Level 2 confidence. Even so, redundancies in identifications by Mummichog, for example, d-Ribose 5-Phosphate, alpha-d-Galactose, beta-d-Glucose, beta-d-Galactose etc., all identified for m/z 145.0496, only conform to the identification of monosaccharides or monosaccharide phosphates in the high-sugar group. Therefore, Mummichog results helped identify potential leads for compound identification, albeit with a lower level of confidence. Modules and pathway analysis: Of all identified compounds input into KEGG Mapper (Table 1) with rice pathways (osa) as a reference, 29 mapped to metabolic pathways (osa01100), 23 to the biosynthesis of secondary metabolites (osa01110), 8 to the biosynthesis of amino acids (osa01230), 6 to carbon metabolism (osa01200) and the rest to miscellaneous pathways related to primary and secondary metabolism (Table 2). Each pathway is characterised by several modules, and each module comprises several compounds and corresponding reactions. Redundancies in a single compound represented in multiple modules, and a single module accommodating several identified compounds, was observed (Table 2). A pictorial representation of compounds mapped to respective pathways/modules in the context of high-sugar grasses is depicted in Fig. 5.
Table 2

KEGG pathway modules and hierarchical bin structure for MapMan style representation of compounds identified in Table 1, matched with rice reference pathways (osa) using KEGG Mapper

Bin codeNameModuleReactionCompoundKEGG ID
1Metabolic pathways
1.1 Nucleotide and amino acid metabolism
1.1.1 Cysteine and methionine metabolism
1.1.1.1Methionine degradationM00035S-Adenosyl-l-Methionine → l-HomocysteineS-Adenosyl-l-HomocysteineC00021
1.1.1.2Methionine degradationM00035l-Serine → l-Cystathioninel-SerineC00065
1.1.1.3Cysteine biosynthesisM00609Methionine → CysteineS-Adenosyl-l-HomocysteineC00021
1.1.1.4Cysteine biosynthesisM00021Serine → Cysteinel-SerineC00065
1.1.1.5Cysteine biosynthesisM00338Homocysteine + Serine → Cysteinel-SerineC00065
1.1.2 Serine and threonine metabolism
1.1.2.1Serine biosynthesisM00020Glycerate-3P → Serinel-SerineC00065
1.1.3 Cofactor and vitamin biosynthesis
1.1.3.1Ascorbate degradationM00550Ascorbate → d-Xylulose 5-Phosphated-Xylulose 5-PhosphateC00231
1.1.3.2Pantothenate biosynthesisM00119Valine/l-Aspartate → Pantothenate2-DehydropantoateC00966
1.1.3.3Biotin biosynthesisM00123Pimeloyl-ACP/CoA → Biotin8-Amino-7-OxononanoateC01092
1.1.3.4Biotin biosynthesis, BioI pathwayM00573Long-chain-acyl-ACP → Pimeloyl-ACP → Biotin8-Amino-7-OxononanoateC01092
1.1.3.5Biotin biosynthesis, BioW pathwayM00577Pimelate → Pimeloyl-CoA → Biotin8-Amino-7-OxononanoateC01092
1.1.3.6Ascorbate biosynthesisM00114Glucose 6-Phosphate → Ascorbatel-GalactoseC01825
1.1.3.7Riboflavin biosynthesisM00125GTP → Riboflavin/FMN/FAD6,7-Dimethyl-8-(d-Ribityl)LumazineC04332
1.1.4 Arginine and proline metabolism
1.1.4.1Urea cycleM00029l-CitrullineC00327
1.1.5 Branched-chain amino acid metabolism
1.1.5.1Valine/Isoleucine biosynthesisM00019Pyruvate → Valine/2-Oxobutanoate → Isoleucine(S)-2-Aceto-2-HydroxybutanoateC06006
1.1.5.2Isoleucine biosynthesisM00570Threonine → 2-Oxobutanoate → Isoleucine(S)-2-Aceto-2-HydroxybutanoateC06006
1.2 Carbohydrate and lipid metabolism
1.2.1 Lipid metabolism
1.2.1.1Ceramide biosynthesisM00094l-SerineC00065
1.2.1.2Inositol phosphate metabolismM00131Ins(1,3,4,5)P4 → Ins(1,3,4)P3 → Myo-inositolMyo-InositolC00137
1.2.2 Other carbohydrate metabolism
1.2.2.1PhotorespirationM00532l-SerineC00065
1.2.2.2Nucleotide sugar biosynthesisM00554Galactose → UDP-Galactosed-GalactoseC00124
1.2.2.3Galactose degradation, Leloir pathwayM00632Galactose → Alpha-D-Glucose 1-Phosphated-GalactoseC00124
1.2.2.4Glucuronate pathway, Uronate pathwayM00014d-Xylulose 5-PhosphateC00231
1.2.2.5Nucleotide sugar biosynthesisM00549Glucose → UDP-GlucoseAlpha-d-GlucoseC00267
1.2.2.6Trehalose biosynthesisM00565D-Glucose 1-Phosphate → Trehalosealpha,alpha-TrehaloseC01083
1.2.3 Central carbohydrate metabolism
1.2.3.1Glycolysis (Embden–Meyerhof pathway)M00001Glucose → PyruvateGlycerone PhosphateC00111
1.2.3.2Glycolysis, core module involving three-carbon compoundsM00002Glycerone PhosphateC00111
1.2.3.3GluconeogenesisM00003Oxaloacetate → Fructose 6-PhosphateGlycerone PhosphateC00111
1.2.3.4Pentose phosphate pathway (pentose phosphate cycle)M00004d-Ribose 5-PhosphateC00117
1.2.3.5PRPP biosynthesisM00005Ribose 5-Phosphate → PRPPd-Ribose 5-PhosphateC00117
1.2.3.6Pentose phosphate pathway, non-oxidative phaseM00007Fructose 6-Phosphate → Ribose 5-Phosphated-Ribose 5-PhosphateC00117
1.2.3.7Pentose phosphate pathway, archaeaM00580Fructose 6-Phosphate → Ribose 5-Phosphated-Ribose 5-PhosphateC00117
1.2.3.8Pentose phosphate pathway (pentose phosphate cycle)M00004d-Xylulose 5-PhosphateC00231
1.2.3.9Pentose phosphate pathway, non-oxidative phaseM00007Fructose 6-Phosphate → Ribose 5-Phosphated-Xylulose 5-PhosphateC00231
1.2.3.10Glycolysis (Embden–Meyerhof pathway)M00001Glucose → PyruvateAlpha-d-GlucoseC00267
1.2.4 Lipopolysaccharide metabolism
1.2.4.1CMP-KDO biosynthesisM00063d-Arabinose 5-PhosphateC01112
1.3 Energy metabolism
1.3.1 Carbon fixation
1.3.1.1Reductive pentose phosphate cycle (Calvin cycle)M00165d-Ribose 5-PhosphateC00117
1.3.1.2Reductive pentose phosphate cycleM00167Glyceraldehyde 3-Phosphate → Ribulose 5-Phosphated-Ribose 5-PhosphateC00117
1.4 Secondary metabolism
1.4.1 Biosynthesis of secondary metabolites
1.4.1.1Monolignol biosynthesisM00039Phenylalanine/Tyrosine → MonolignolConiferyl AldehydeC02666
1.4.1.2Monolignol biosynthesisM00039Phenylalanine/Tyrosine → MonolignolSinapoyl AldehydeC05610
1.5 Others
1.5.1Amino acidl-AsparagineC00152
1.5.2OligosaccharidesMaltoseC00208
1.5.3Glycolysis/GluconeogenesisBeta-d-GlucoseC00221
1.5.4Arginine and proline metabolism4-AminobutyraldehydeC00555
1.5.5Cysteine and methionine metabolismO-Acetyl-l-HomoserineC01077
1.5.6Lipopolysaccharide biosynthesisd-Arabinose 5-PhosphateC01112
1.5.7Biosynthesis of secondary metabolites1-Aminocyclopropane-1-CarboxylateC01234
1.5.8Biosynthesis of phenylpropanoidsTaxifolinC01617
1.5.9Ascorbate and aldarate metabolisml-GalactoseC01825
1.5.10Amino sugar and nucleotide sugar metabolismBeta-l-Arabinose 1-PhosphateC03906
1.5.11Biosynthesis of secondary metabolites(2S)-2-Isopropyl-3-OxosuccinateC04236
1.5.12Biosynthesis of secondary metabolitesPelargonidin 3-O-GlucosideC12137
2Biosynthesis of secondary metabolites
2.1 Nucleotide and amino acid metabolism
2.1.1 Cysteine and methionine metabolism
2.1.1.1Cysteine biosynthesisM00021Serine → Cysteinel-SerineC00065
2.1.1.2Ethylene biosynthesisM00368Methionine → Ethylene1-Aminocyclopropane-1-CarboxylateC01234
2.1.2 Cofactor and vitamin biosynthesis
2.1.2.1Pantothenate biosynthesisM00119Valine/l-Aspartate → Pantothenate2-DehydropantoateC00966
2.1.2.2Ascorbate biosynthesisM00114Glucose 6-Phosphate → Ascorbatel-GalactoseC01825
2.1.2.3Riboflavin biosynthesisM00125GTP → Riboflavin/FMN/FAD6,7-Dimethyl-8-(d-Ribityl)LumazineC04332
2.1.3 Branched-chain amino acid metabolism
2.1.3.1Valine/Isoleucine biosynthesisM00019Pyruvate → Valine/2-Oxobutanoate → Isoleucine(S)−2-Aceto-2-HydroxybutanoateC06006
2.2 Carbohydrate and lipid metabolism
2.2.1 Other carbohydrate metabolism
2.2.1.1PhotorespirationM00532l-SerineC00065
2.2.1.2Trehalose biosynthesisM00565D-Glucose 1-Phosphate → Trehalosealpha,alpha-TrehaloseC01083
2.3 Secondary metabolism
2.3.1 Biosynthesis of secondary metabolites
2.3.1.1Monolignol biosynthesisM00039Phenylalanine/Tyrosine → MonolignolConiferyl AldehydeC02666
2.3.1.2Monolignol biosynthesisM00039Phenylalanine/Tyrosine → MonolignolSinapoyl AldehydeC05610
2.4 Others
2.4.1Flavonoid biosynthesisEriodictyolC05631
2.4.2Flavonoid biosynthesisLeucodelphinidinC05909
2.4.3Biosynthesis of phenylpropanoidsUmbelliferoneC09315
2.4.4Biosynthesis of secondary metabolitesPelargonidin 3-O-GlucosideC12137
2.4.5Biosynthesis of phenylpropanoidsTaxifolinC01617
2.4.6Biosynthesis of secondary metabolites2-IsopropylmaleateC02631
2.4.7Biosynthesis of secondary metabolites(2S)-2-Isopropyl-3-OxosuccinateC04236
3Biosynthesis of amino acids
3.1 Nucleotide and amino acid metabolism
3.1.1 Cysteine and methionine metabolism
3.1.1.1Cysteine biosynthesisM00609Methionine → CysteineS-Adenosyl-l-HomocysteineC00021
3.1.1.2Cysteine biosynthesisM00021Serine → Cysteinel-SerineC00065
3.1.1.3Cysteine biosynthesisM00338Homocysteine + Serine → Cysteinel-SerineC00065
3.1.2 Serine and threonine metabolism
3.1.2.1Serine biosynthesisM00020Glycerate 3-Phosphate → Serinel-SerineC00065
3.1.3 Arginine and proline metabolism
3.1.3.1Urea cycleM00029l-CitrullineC00327
3.1.3.2Arginine biosynthesisM00844Ornithine → Argininel-CitrullineC00327
3.1.3.3Arginine biosynthesisM00845Glutamate → Acetylcitrulline → Argininel-CitrullineC00327
3.1.4 Branched-chain amino acid metabolism
3.1.4.1Valine/Isoleucine biosynthesisM00019Pyruvate → Valine/2-Oxobutanoate → Isoleucine(S)-2-Aceto-2-HydroxybutanoateC06006
3.1.4.2Isoleucine biosynthesisM00570Threonine → 2-Oxobutanoate → Isoleucine(S)-2-Aceto-2-HydroxybutanoateC06006
3.2 Carbohydrate and lipid metabolism
3.2.1 Central carbohydrate metabolism
3.2.1.1Glycolysis, core module involving three-carbon compoundsM00002Glycerone PhosphateC00111
3.2.1.2PRPP biosynthesisM00005Ribose 5-Phosphate → PRPPd-Ribose 5-PhosphateC00117
3.2.1.3Pentose phosphate pathway, non-oxidative phaseM00007Fructose 6-Phosphate → Ribose 5-Phosphated-Ribose 5-PhosphateC00117
3.2.1.4Pentose phosphate pathway, archaeaM00580Fructose 6-Phosphate → Ribose 5-Phosphated-Ribose 5-PhosphateC00117
3.2.1.5Pentose phosphate pathway, non-oxidative phaseM00007Fructose 6-Phosphate → Ribose 5-Phosphated-Xylulose 5-PhosphateC00231
3.3 Others
3.3.1Amino acidl-AsparagineC00152
4Carbon metabolism
4.1 Nucleotide and amino acid metabolism
4.1.1 Serine and threonine metabolism
4.1.1.1Serine biosynthesisM00020Glycerate 3-Phosphate → Serinel-SerineC00065
4.1.2Cysteine and methionine metabolism
4.1.2.1Cysteine biosynthesisM00021Serine → Cysteinel-SerineC00065
4.2 Energy metabolism
4.2.1 Methane metabolism
4.2.1.1Formaldehyde assimilation, serine pathwayM00346l-SerineC00065
4.2.1.2Formaldehyde assimilation, xylulose monophosphate pathwayM00344Glycerone PhosphateC00111
4.2.1.3Formaldehyde assimilation, ribulose monophosphate pathwayM00345Glycerone PhosphateC00111
4.2.2 Carbon fixation
4.2.2.1Reductive pentose phosphate cycle (Calvin cycle)M00165d-Ribose 5-PhosphateC00117
4.2.2.2Reductive pentose phosphate cycleM00167Glyceraldehyde 3-Phosphate → Ribulose 5-Phosphated-Ribose 5-PhosphateC00117
4.3 Carbohydrate and lipid metabolism
4.3.1 Central carbohydrate metabolism
4.3.1.1Glycolysis (Embden-Meyerhof pathway)M00001Glucose → PyruvateGlycerone PhosphateC00111
4.3.1.2Glycolysis, core module involving three-carbon compoundsM00002Glycerone PhosphateC00111
4.3.1.3Pentose phosphate pathway (Pentose phosphate cycle)M00004d-Ribose 5-PhosphateC00117
4.3.1.4PRPP biosynthesisM00005Ribose 5-Phosphate → PRPPd-Ribose 5-PhosphateC00117
4.3.1.5Pentose phosphate pathway, non-oxidative phaseM00007Fructose 6-Phosphate → Ribose 5-Phosphated-Ribose 5-PhosphateC00117
4.3.1.6Pentose phosphate pathway, archaeaM00580Fructose 6-Phosphate → Ribose 5-Phosphated-Ribose 5-PhosphateC00117
4.3.1.7Pentose phosphate pathway (Pentose phosphate cycle)M00004d-Xylulose 5-PhosphateC00231
4.3.1.8Pentose phosphate pathway, non-oxidative phaseM00007Fructose 6-Phosphate → Ribose 5-Phosphated-Xylulose 5-PhosphateC00231
4.3.1.9Glycolysis (Embden-Meyerhof pathway)M00001Glucose → PyruvateAlpha-d-GlucoseC00267
4.3.2 Other carbohydrate metabolism
4.3.2.1PhotorespirationM00532l-SerineC00065
4.4 Others
4.4.1Glycolysis/GluconeogenesisBeta-d-GlucoseC00221
5Galactose metabolism
5.1Glycolysis, core module involving three-carbon compoundsGlycerone PhosphateC00111
5.2Galactose degradationd-GalactoseC00124
5.3Inositol phosphate metabolismMyo-InositolC00137
5.4Glycolysis (Embden–Meyerhof pathway)Alpha-d-GlucoseC00267
5.5Galactose metabolismGalactinolC01235
6ABC transporters
6.1Phosphate and amino acid transportersl-SerineC00065
6.2Monosaccharide transportersMyo-InositolC00137
6.3Oligosaccharide, polyol, and lipid transportersMaltoseC00208
6.4Oligosaccharide, polyol, and lipid transportersalpha,alpha-trehaloseC01083
7Cysteine and methionine metabolism
7.1Methionine degradationM00035S-Adenosyl-l-HomocysteineC00021
7.2Glycine, serine and threonine metabolisml-SerineC00065
7.3Aspartate metabolismO-Acetyl-l-HomoserineC01077
7.4Propanoate metabolism1-Aminocyclopropane-1-CarboxylateC01234
8Flavonoid biosynthesis
8.1Flavone and flavonol biosynthesisVitexinC01460
8.2Flavone and flavonol biosynthesisTaxifolinC01617
8.3Flavanone biosynthesisEriodictyolC05631
8.4Flavan 3,4-diols biosynthesisLeucodelphinidinC05909
9Glycerophospholipid metabolism
9.1Ether lipid metabolismGlycerone PhosphateC00111
9.2Phosphocholine biosynthesisN-Methylethanolamine PhosphateC01210
9.3Phosphocholine biosynthesisPhosphodimethylethanolamineC13482
10Glycolysis/gluconeogenesis
10.1Core module involving three-carbon compoundsGlycerone PhosphateC00111
10.2Starch and sucrose metabolismBeta-d-GlucoseC00221
10.3Starch and sucrose metabolismAlpha-d-GlucoseC00267
11Amino sugar and nucleotide sugar metabolism
11.1Uridine diphosphate sugar metabolismAlpha-d-GlucoseC00267
11.2Uridine diphosphate sugar metabolismBeta-d-FructoseC02336
11.3Uridine diphosphate sugar metabolismBeta-l-Arabinose 1-PhosphateC03906
12Carbon fixation in photosynthetic organisms
12.1GlycolysisGlycerone PhosphateC00111
12.2Reductive pentose phosphate cycled-Ribose 5-PhosphateC00117
12.3Reductive pentose phosphate cycled-Xylulose 5-PhosphateC00231
132-Oxocarboxylic acid metabolism
13.1Pyruvate reductive amination2-IsopropylmaleateC02631
13.2Pyruvate reductive amination(2S)−2-Isopropyl-3-OxosuccinateC04236
13.3Pyruvate reductive amination(S)−2-Aceto-2-HydroxybutanoateC06006
14Phenylpropanoid biosynthesis
14.1Sinapate derivatives1-O-Sinapoyl-Beta-d-GlucoseC01175
14.2Coniferyl alcohol derivativesConiferyl AldehydeC02666
14.3Sinapate derivativesSinapoyl AldehydeC05610
15Pentose phosphate pathway
15.1PRPP biosynthesisRibose 5-Phosphate → PRPPd-Ribose 5-PhosphateC00117
15.2PRPP biosynthesisBeta-d-GlucoseC00221
15.3PRPP biosynthesisd-Xylulose 5-PhosphateC00231
16Anthocyanin biosynthesis
16.1FlavonoidsChrysantheminC08604
16.2FlavonoidsPelargonidin 3-O-GlucosideC12137
16.3FlavonoidsCyanidin 5-O-GlucosideC16298
17Ascorbate and aldarate metabolism
17.1Sugar alcoholsMyo-InositolC00137
17.2Ascorbate degradationAscorbate → d-Xylulose 5-Phosphated-Xylulose 5-PhosphateC00231
17.3Aldosesl-GalactoseC01825
18Valine, leucine and isoleucine biosynthesis
18.1Pyruvate metabolism2-IsopropylmaleateC02631
18.2Leucine biosynthesis(2S)-2-Isopropyl-3-OxosuccinateC04236
18.3Isoleucine biosynthesis(S)-2-Aceto-2-HydroxybutanoateC06006
19Glycine, serine and threonine metabolism
19.1Pyruvate, cysteine and tryptophan metabolisml-SerineC00065
20Pentose and glucuronate interconversions
20.1Glucuronate interconversionGlycerone PhosphateC00111
20.2Pentose interconversiond-Xylulose 5-PhosphateC00231
21Fructose and mannose metabolism
21.1Core module involving three-carbon compoundsGlycerone PhosphateC00111
21.2Fructose biosynthesisAlpha-d-GlucoseC00267
22Starch and sucrose metabolism
22.1OligosaccharidesMaltoseC00208
22.2Oligosaccharidesalpha,alpha-TrehaloseC01083
23Inositol phosphate metabolism
23.1Core module involving three-carbon compoundsGlycerone PhosphateC00111
23.2Inositol phosphate metabolismIns(1,3,4,5)P4 → Ins(1,3,4)P3 → Myo-InositolMyo-InositolC00137
24Flavone and flavonol biosynthesis
24.1FlavonesVitexinC01460
24.2FlavonesIsovitexinC01714
25Phenylalanine, tyrosine and tryptophan biosynthesis
25.1Shikimate pathwayQuinateC00296

KEGG pathways and modules that involve compounds identified in Table 1 were generated by KEGG Mapper[75]; Bin codes denote the hierarchical structure of KEGG pathways and modules used to generate MapMan[77] style representation of the identified compounds (Fig. 4); ↑ refers to a compound with positive t stat value (high- > low-sugar group), whereas ↓ refers to a compound with a  negative t stat value (high- < low-sugar group).

Fig. 5

Pictorial representation of biochemical activity of high-sugar grasses in MapMan[77] (Ver 3.6.0RC1; copyright of the Max–Planck-Institute for Molecular Plant Physiology, Golm, Germany), depicting KEGG pathways and modules involving the compounds identified in Table 1, with rice pathways (osa) as a reference (Table 2). Dots represent compounds identified in respective modules, where red dots denote compounds with positive t stat value (high- > low-sugar group), and blue dots denote compounds with negative t stat value (high- < low-sugar group). CHO: carbohydrates, Bra-AA: branched chain amino acids, CoF and Vit: cofactor and vitamins, ABC: ATP-binding cassette

KEGG pathway modules and hierarchical bin structure for MapMan style representation of compounds identified in Table 1, matched with rice reference pathways (osa) using KEGG Mapper KEGG pathways and modules that involve compounds identified in Table 1 were generated by KEGG Mapper[75]; Bin codes denote the hierarchical structure of KEGG pathways and modules used to generate MapMan[77] style representation of the identified compounds (Fig. 4); ↑ refers to a compound with positive t stat value (high- > low-sugar group), whereas ↓ refers to a compound with a  negative t stat value (high- < low-sugar group). Pictorial representation of biochemical activity of high-sugar grasses in MapMan[77] (Ver 3.6.0RC1; copyright of the Max–Planck-Institute for Molecular Plant Physiology, Golm, Germany), depicting KEGG pathways and modules involving the compounds identified in Table 1, with rice pathways (osa) as a reference (Table 2). Dots represent compounds identified in respective modules, where red dots denote compounds with positive t stat value (high- > low-sugar group), and blue dots denote compounds with negative t stat value (high- < low-sugar group). CHO: carbohydrates, Bra-AA: branched chain amino acids, CoF and Vit: cofactor and vitamins, ABC: ATP-binding cassette All major pathways, i.e., metabolic, biosynthesis of secondary metabolites, biosynthesis of amino acids and carbon metabolism, were broadly classified into nucleotide and amino acid, carbohydrate and lipid and energy metabolism modules (Fig. 5). Other modules related to primary metabolism comprised galactose metabolism, glycolysis, amino sugar metabolism, carbon fixation, oxocarboxylic acid metabolism, pentose phosphate pathway, ascorbate metabolism, pentose and glucuronate conversions, fructose and mannose metabolism, starch and sucrose metabolism, inositol phosphate metabolism and glycerophospholipid metabolism, while modules related to secondary metabolism comprised flavone, flavonol, anthocyanin, flavonoid and phenylpropanoid biosynthesis (Table 2; Fig. 5). In all major pathways, compounds related to metabolism of the amino acids cysteine, methionine, serine, threonine, arginine and proline were found in low concentrations in the high-sugar group. However, their intermediate products (S)-2-Aceto-2-Hydroxybutanoate, O-Acetyl-l-Homoserine, (2S)-2-Isopropyl-3-Oxosuccinate and 2-Dehydropantoate were at higher concentrations (Fig. 5). These intermediate products were in turn involved in the biosynthesis of branched-chain amino acids leucine, isoleucine and valine (M00019, M00570 and M00432; Fig. 5; Table 2). On the other hand, compounds related to carbohydrate, lipid and energy metabolism were at high concentrations, while those related to the biosynthesis of secondary metabolites were low (Fig. 5). L-Serine and S-Adenosyl-l-Homocysteine were primarily involved in cysteine, methionine, serine and threonine metabolism (Modules M00035, M00609, M00021, M00338 and M00020; Table 2). l-Citrulline was involved in arginine and proline metabolism via the urea cycle (M00029), and (S)-2-Aceto-2-Hydroxybutanoate was involved in branched chain amino acid metabolism (M00019 and M00570). Glycerone phosphate, a breakdown product of fructose 1, 6-Biphosphate and an isomer of 3-Phosphoglyceraldehyde (3-PGA), d-Ribose 5-Phosphate, d-Xylulose 5-Phosphate and alpha-d-Glucose, all found in higher concentrations in the high-sugar group, were involved in the central carbohydrate metabolism via glycolysis/gluconeogenesis (M00001, M00002 and M00003) and the pentose phosphate pathway (M00007, M00580 and M00004; Fig. 5; Table 2). Other compounds such as myo-Inositol were involved in lipid metabolism via the inositol phosphate metabolism module (M00131: Table 2), and d-Arabinose 5-Phosphate was involved in lipopolysaccharide biosynthesis (M00063: Table 2), both reportedly at increased concentrations in the high-sugar group (Fig. 5). l-Serine (low concentration), d-Galactose, d-Xylulose 5-Phosphate, alpha-d-Glucose and alpha,alpha-Trehalose (all high concentrations) were some of the metabolites involved in other carbohydrate metabolism modules which comprised photorespiration (M00532), galactose degradation (M00632), glucuronate pathway (M00014) and nucleotide sugar biosynthesis (M00549). Ryegrass being a C-3 plant[30], the energy metabolism pathway comprising carbon fixation and methane metabolism modules involved d-Ribose 5-Phosphate (high concentration) in the Calvin cycle (M00165) and glycerone phosphate (high concentration) in formaldehyde assimilation (M00344 and M00345), respectively. Biosynthesis of secondary metabolites involved coniferyl aldehyde, sinapoyl aldehyde, 1-O-Sinapoyl-beta-d-Glucose and 1-O-(4-Coumaroyl)-beta-d-Glucose, all in low concentrations, and related to the biosynthesis of monolignols (M00039). Compounds classified as others’ represented metabolites that did not feature in any modules, but were indirectly involved in the corresponding pathways. In that light, compounds classified as others’ in metabolic pathways mainly comprised amino acids, sugars and sugar phosphates (predominantly in higher concentrations), and those classified under the biosynthesis of secondary metabolites comprised umbelliferone, vitexin, isovitexin, orientin, isoorientin, pelargonidin-3-O-beta-d-Glucoside, cyanidin-3-O-beta-d-Glucoside and cyanidin 5-O-beta-d-Glucoside, (predominantly in lower concentrations; Fig. 5; Table 2). Compounds belonging to minor pathways were classified as those involved in primary or secondary metabolism (Fig. 5). Clearly, compounds related to the primary metabolism of sugars via galactose metabolism, glycolysis, amino sugar metabolism, carbon fixation, oxocarboxylic acid metabolism, pentose phosphate pathway, ascorbate metabolism, pentose and glucuronate conversions, fructose and mannose metabolism, starch and sucrose metabolism, inositol phosphate metabolism and lipids via glycerophospholipid metabolism, were all found at higher concentrations in high-sugar grasses. Likewise, compounds related to secondary metabolism via flavone, flavonol, anthocyanin, flavonoid, and phenylpropanoid biosynthesis, were all found in lower concentrations (Fig. 5; Table 2). Amongst the ABC transporters, l-Serine, a phosphate and amino acid transporter was found in low concentration, whereas myo-Inositol, maltose and alpha, alpha-Trehalose, mono- and oligosaccharide transporters, were found at higher concentrations (Fig. 5; Table 2). Quinate/Quinic acid, involved in the biosynthesis of phenylalanine, tyrosine and tryptophan via the shikimate pathway, was found at higher concentration in high-sugar grasses, and so were compounds involved in the biosynthesis of the branched chain amino acids valine, leucine and isoleucine (Fig. 5; Table 2). Overall, a snapshot of metabolic pathways in high-sugar grasses revealed: an increase in concentrations of compounds involved in primary metabolism, mainly comprising sugars; decrease in concentrations of compounds involved in secondary metabolism, mainly comprising flavonoids and lignins; decrease in concentrations of compounds involved in cysteine, methionine, serine, arginine, proline and threonine metabolism, and; a concomitant increase in concentrations of compounds involved in the metabolism of branched chain amino acids valine, isoleucine and leucine, and aromatic amino acids phenylalanine, tyrosine and tryptophan.

Discussion

The distribution of 715 ryegrass genotypes based on their total sugar content (Fig. 1), provides an overview of their performance under natural NZ autumn conditions. High-sugar content in leaves has a direct relationship with the metabolisable energy of ryegrass, which in turn enhances protein capture and supply to the ruminant[16,31]. While, seasonal variation of fructan content has also been hypothesised[32], we have only measured fructan content at a single seasonal time point. A time-resolved analysis of fructan diversity is therefore expected to shed more light on the genotype and/or genotype × environment interactions[33] in the high-sugar group identified here (Supplementary Table 1). We also hypothesised that the DP contributes to differences in total sugar content, both between and within the high- and low-sugar groups. Between the high- and low-sugar groups, all three DP classes (low, mid or high) were significantly higher in the high-sugar group (Fig. 2a). However, as reported earlier[15], within the high-sugar group, the contribution of high-DP fructans to the total fructan content was greater than the low-DP fructans (Fig. 2a). For the standard high-sugar genotypes (Aber cultivars), high-DP fructans are considered critical in retaining the total sugar content under seasonal variation compared to normal genotypes[34]. Therefore, considering high-DP fructans are sought for breeding reliable high-sugar genotypes, 11 genotypes which have greater high-DP content than Aberdart, have been identified (Fig. 2c). Given that fructan accumulation and degradation in ryegrass is still poorly understood[14], the negative correlation between high- and low-DPs within the high-sugar group (Fig. 2b), remains unexplained and deems further investigation. The metabolomics approach employed here for fructan analysis has therefore led to a better understanding of fructan accumulation in high-sugar genotypes prior to genotype selection. Moreover, the contrast provided here with the low-sugar group, in the context of lipids, polar and semi-polar compounds (Figs. 3 and 4), has established a platform to scrutinise underlying biological mechanisms (Fig. 5; Table 2). Another route to high-metabolisable energy has been via increased lipid content in leaves[35]. Elevated levels of triglycerides/triacylglycerols and poly-unsaturated fatty acids at the cost of proteins, would lead to improved nitrogen utilisation and feed conversion efficiency of the ruminants[35], resulting in enhanced nutritional value of the end-products[36]. Exclusive efforts aimed at high-lipid content, primarily based on metabolic engineering, are therefore in practice[35,37,38]. While fatty acids have been the focus of this endeavour[36], other lipid classes or the lipidome have largely been ignored. The lipid profile presented here (Fig. 4a), therefore potentially marks the first comprehensive report of the ryegrass lipidome (Supplementary Data 2). A negative relationship between sugar and fatty acid content (Fig. 4b) has also been reported by Morgan[36]. When the sugar content is low due to diurnal, environmental or genotypic factors, plants have the ability to redirect cellular activity towards lipid catabolism, resulting in an increase in fatty acids[39]. Manipulating this carbon flux towards lipid biosynthesis has also been suggested to achieve elevated levels of lipids in plants[37]. A few classes of phospholipids, i.e., phosphatidylserine, phosphatidylglycerol and phosphatiylcholine, diglycerides and galactolipids were found in higher concentrations in the low-sugar group and triglycerides were higher in the high-sugar group (Fig. 4a). Galactolipids are found in thylakoid membranes of chloroplasts and are abundant in photosynthetically active leaves[40]. A hypothesis that an increase in FA content correlates with chlorophyll content in leaves, and thereby galactolipids, was tested by Morgan[36]. In general, polar lipids mainly contributed by galactolipids had a positive correlation with FA content, whereas neutral lipids mainly comprising triglycerides had a negative correlation[36]. These results are in accordance with the present study (Fig. 4). Triglycerides are storage lipids and, until recently, were thought to be absent in leaves[41]. They are believed to be intermediate products in the catabolism of galactolipids to sucrose, mostly accumulating in leaves during the day[41]. While their role in lipid metabolism is largely unknown, their elevated levels in high-sugar grasses provides sufficient context to pursue this result further. Few studies have investigated the impact of changes in primary metabolism on the fluxes into secondary metabolism[42,43]. Non-targeted metabolomics, through a wide coverage of primary and secondary metabolites as demonstrated in this study, is well positioned to elucidate global metabolic changes, and thereby the cross-linkages between primary and secondary metabolism. Combined with other –omics data, metabolomics provides a robust platform for biological interpretation. In spite of these inherent advantages, non-targeted studies are limited by the bottleneck of compound identification, resulting in compounds predominantly identified with low to medium-level confidence[44]. Taking cues from non-targeted studies and proceeding towards unequivocal identification using targeted approaches is therefore recommended. Here, a snapshot of metabolic activity in the high-sugar group (Fig. 5) revealed an increase in concentrations of compounds involved in primary metabolism, mainly comprising sugars, a decrease in concentrations of compounds related to metabolism of the amino acids cysteine, methionine, serine, threonine, arginine and proline, increase in concentrations of compounds involved in the biosynthesis of aromatic and branched chain amino acids, and a concomitant decrease in compounds involved in secondary metabolism, mainly comprising flavonoids and lignins. Plant secondary metabolites are of interest due to their bewildering diversity, and their roles in defence, stress response, UV protection, allelopathy and signalling[45]. The link between carbohydrate and secondary metabolism is primarily mediated by the shikimate pathway, through synthesis of aromatic amino acids, leading into the phenylpropanoid pathway[46,47]. Quinic acid, a constituent of the phenylpropanoid pathway, is a precursor of chlorogenic acid[47], and also involved in the biosynthesis of lignins and flavonols. Quinic acid was found at higher concentrations in the high-sugar group, and on the other hand, coniferyl aldehyde and sinapoyl aldehyde (lignin subunits[48]), were found in lower concentrations (Table 1). Chlorogenic acid concentrations were not significantly different between the high- and low-sugar groups. In sorghum (Sorghum bicolor) plants with high biomass, quinic acid and lignin contents were high, whereas in the high-sugar ryegrass cultivar ‘Aberdove’, the concentration of chlorogenic acid was high[49]. A partitioning of quinic acid towards biosynthesis of either lignin or chlorogenic acid was therefore hypothesised[47]. However, the preferential flux towards either of these compounds is unknown. Grass-fibre composition is another key target of breeding which affects feed intake and digestibility[50]. This is largely determined by cell wall components, and lignins, complex polyphenolic polymers and end products of the phenylpropanoid pathway are amongst the major cell wall constituents[51]. The shikimate pathway described here in the context of total sugar content, should persuade further studies on secondary metabolism in ryegrass. In addition to the shikimate pathway, the interface between primary and secondary metabolism in plants is far more complex, with the production of secondary metabolites associated with glycolysis, TCA cycle, aliphatic amino acids, pentose phosphate pathway[52] and nitrogen status[53]. Towards the primary objective of screening for high-sugar cultivars, we set-off with a survey of 715 ryegrass genotypes based on their fructan content. However, breeding for select traits requires a thorough understanding of the population wide diversity of these traits, and their genetic control mechanisms[21]. Non-targeted metabolomics studies, as employed here, delivered the broadest coverage of metabolites, thereby enabling a better understanding of the underlying biochemical mechanisms. A snapshot of metabolites and the corresponding modules/pathways they represent in the high-sugar group (Fig. 5), established key insights on primary and secondary metabolism, that merit further investigation. This study therefore signifies one of many avenues that can be explored with these data. For example, the low-sugar group indirectly led to genotypes with greater lipid content, thereby creating opportunities to tap into lipid content through breeding applications. Likewise, the flux from primary to secondary metabolism, as reported here, may facilitate breeding strategies specifically targeting select secondary metabolites. Studies exploring these avenues in secondary metabolites are already underway[20,54]. In conclusion, we have established levers that help explain biochemical activity in ryegrass, which when operated towards specific objectives can deliver desired traits. The raw files from this study, maintained at MetaboLights database, we envisage will cater to further explorations towards these objectives.

Methods

Experimental

Plant material: Ryegrass seeds were obtained from the Margot Forde Germplasm Centre at AgResearch Limited, Grasslands Research Centre, Palmerston North, New Zealand. Seeds of 724 genotypes (denoted by a specific code, e.g., PG0698) which included NZ cultivars (82), natural population/ecotypes (49), NZ elite breeding populations (336), overseas cultivars (181), enhanced germplasm (71) and unknown cultivars (5) were selected. Five clonal replicates (n = 5; denoted as PG0698_1 for the first replicate) per genotype were cultivated and used for metabolomics studies. Plant trial conditions are described in Supplementary Methods. Perennial ryegrass forms a natural symbiotic association with epichloae fungi (Epichloë spp.)[50], which produce metabolites that protect the host from biotic and abiotic stresses[55]. It has been shown that infection with these endophytes affects the metabolic profiles of ryegrass[49,56]. To ensure that the ryegrass metabolome alone was analysed and reported, endophyte-free plants were used. This was achieved by heat and fungicide treatment of seed to kill the fungus, prior to germinating and growing the plants. After growing out and prior to clonal replication, the endophyte-free status of plants was assessed by an immunoblot analysis[57] of 10 tillers per plant. Subsequently, the presence of peramine, a pyrrolopyrazine alkaloid[58] specific to most endophytes, was also monitored. Nine genotypes that showed presence of peramine were identified and excluded (Supplementary Figure 9). This resulted in 715 genotypes from 118 populations for subsequent analyses. Here, a genotype refers to a group of plants with the same genetic makeup, while a population consists of one or more genotypes grouped together based on a common trait (genotypic/phenotypic/geographical origin). For example, PG0698 refers to the genotype in a population named ‘Hillary’, a NZ based cultivar (Supplementary Table 1). Extractions: At 60 days after transplanting, leaves were harvested during late autumn (May 2012), snap-frozen in liquid nitrogen, freeze-dried, ground to a coarse powder, transferred to glass vials and stored at −80 °C until further use. Five aliquots of 50 mg each (4 for analyses and 1 standby) of each sample were weighed into microcentrifuge tubes. A surrogate QC sample[59] comprising random amounts of ground ryegrass leaf material, irrespective of the presence or absence of endophyte was used. Since this QC sample was not representative of the sample set, it was solely used for monitoring sample degradation and for tracking run-order effects within a batch and not used for batch normalisations or ensuing data processing. Polar and semi-polar compounds. A single aliquot (50 mg) was used for analysing both polar and semi-polar compounds. One millilitre (1 ml) of extraction solvent comprising acetonitrile:water containing 0.1% formic acid (50:50, v/v), and 2′,7′-Dichlorofluorescein (1 µg/ml; CAS No. 76-54-0; MW = 401.2) and l-Tyrosine-3, 3-d2 (10 µg/ml; CAS No. 72963-72-0; MW = 183.20) as internal standards for semi-polar and polar compounds respectively, were added to the aliquot. A ceramic bead was added and samples were mixed in a bead-mill homogeniser (TissueLyser II; Qiagen, Valencia, CA, USA) for 3 min, after which they were centrifuged for 6 min (18188g). Approximately 500 µl of the aqueous extract was transferred to an autosampler vial and stored at −20 °C until analysis. Lipids. For lipids, all procedures were similar to that of polar and semi-polar compounds, except that the extraction solvent was made of damp chloroform:methanol (67:33, v/v), containing 2′,7′-Dichlorofluorescein (20 µg/ml) as an internal standard. Fructans. Boiling water (1.5 ml) was added to another aliquot (50 mg). After homogenisation, samples were placed in a hot water bath (90 °C) for 30 min. Samples were cooled to room temperature, centrifuged, and approximately 500 µl of the aqueous extract was transferred to an autosampler vial for analysis. Samples were stored (−20 °C) for a minimal period to avoid fructan degradation. Fatty acid methyl esters (FAMEs). After extraction of fructans, the remaining supernatant was discarded and the pellet was freeze-dried overnight. The dried aliquot was transferred to a 15 ml Falcon tube containing 50 µl of pentadecanoic acid (C15:0; CAS No. 1002-84-2; MW = 242.4) in heptane (4 mg/ml) as an internal standard. After the addition of 1 ml of 1 M methanolic HCl reagent[60], the tubes were purged with nitrogen, sealed and heated in a hot water bath (80 °C) for 1 h. Tubes were then cooled to room temperature, and 50 µl of heptadecanoic acid methyl ester (Premethyl ester C17:0; CAS No. 1731-92-6; MW = 284.48) in heptane (4 mg/ml) was added as another internal standard. A total of 600 µl of heptane and 1 ml of 0.9% sodium chloride (w/v; NaCl) were added and the tubes were manually shaken to extract FAMEs into the heptane phase. Totally, 150 µl of the heptane layer was transferred to a 250 µl glass insert fitted in an autosampler vial for GC–MS analysis. Chromatography and tandem mass-spectrometry: (U)HPLC-MS. (U)HPLC-MS conditions for analysis of polar and semi-polar compounds using ZIC-pHILIC[61] and C18[62] columns, were as described in respective references. An Exactive Orbitrap (Thermo Fisher Scientific, Waltham, MA, USA) mass spectrometer with electrospray ionisation was used for analyses in both positive and negative modes. Fructans were analysed as described by Harrison, et al.[63] using a porous graphitised carbon column and LTQ linear ion trap mass spectrometer (Thermo Fisher Scientific, Waltham, MA, USA) with electrospray ionisation in negative mode. The Thermo LC–MS system (Thermo Fisher Scientific, Waltham, MA, USA) for analysis of lipids consisted of an Accela 1250 quaternary UHPLC pump, a PAL auto-sampler fitted with a 15,000 psi injection valve (CTC Analytics AG., Zwingen, Switzerland) a 20 μl injection loop, and a Q-Exactive OrbitrapTM mass spectrometer with electrospray ionisation. A C1 column (50 × 2.1 mm, 5 µm; Thermo Fisher Scientific, Waltham, MA, USA), maintained at 25 °C with a gradient elution programme and a flow rate of 500 µl/min, was used for chromatographic separation. The mobile phase comprised water containing 0.1% formic acid (Solvent A) and isopropanol:acetonitrile containing 0.1% formic acid (50:50, v/v; Solvent B). The gradient was set to hold solvent A at 80% from 0 to 1 mins, gradually decline to 0% at 18.1 mins, maintained at the same level up to 20 mins, increased to 80% at 20.1 mins and finally allowed to equilibrate as such for the rest of the programme, i.e., 25 min. The samples were cooled in the auto-sampler at 4 °C and the injection volume of each sample was 2 μl. The first 1.5 min and the last 5 min of the chromatogram were diverted to waste. Both full and data dependent MS2 scans were collected in profile data acquisition mode. For full scan mode, a mass resolution setting of 35,000 was set to record a mass range of m/z 200–2000 with a maximum trap fill time of 250 ms. For MS2 scan mode, the same mass resolution setting was maintained with a maximum trap fill time of 120 ms. The isolation window of selected MS1 scans was ± 1.5 m/z with a collision energy of 30 eV. Samples were run in both positive and negative ionisation modes separately. Positive-ion mode parameters were as follows: spray voltage, 4.0 kV; capillary temperature, 275 °C; capillary voltage, 90 V, tube lens 120 V. Negative-ion mode parameters were as follows: spray voltage, −4.0 kV; capillary temperature, 275 °C; capillary voltage, −90 V, tube lens −100 V. The nitrogen source gas desolvation settings were the same for both modes (arbitrary units): sheath gas, 40; auxiliary gas, 10; sweep gas, 5. The Xcalibur software package provided by the manufacturer was used to create these settings. GC–MS. Analysis of FAMEs was undertaken using a Thermo DSQ II Trace Ultra gas chromatograph (Thermo Fisher Scientific, Waltham, MA, USA) fitted with a DB5 GC capillary column. GC–MS conditions were as described by Browse et al.[60]. Standards and reagents: All standards were purchased from Sigma–Aldrich Chemicals Co. (St. Louis, MO). Ultrapure water was obtained from a Milli-Q system (Millipore, Bedford, MA). All solvents used were of Optima LC–MS grade and were purchased from Thermo Fisher Scientific (Auckland, New Zealand).

Data analysis

Sample sequence and batches: For each stream of analysis, samples (3575) were systematically randomised across 36 batches, making sure that two clonal replicates of a genotype were not present in the same batch. A single batch comprised approximately 100 samples interspersed with a QC sample for every 10 samples (10–12 QC samples per batch). The sequence comprised blanks, QC and samples run in positive mode, followed by the same order in negative mode. Fructans and FAMEs were analysed in negative-[63] and positive- ionisation modes alone, respectively. QC monitoring and troubleshooting: For each batch, the quality of runs was determined by constantly monitoring the respective internal standard in QC samples for: (1) consistency in retention time (±0.5 min), (2) mass accuracy (±5 ppm) and (3) signal intensity. In the instance of constant drift in one of these parameters, the batch was immediately stopped and re-run after recalibrating the mass spectrometer. After completion of the run sequence, samples of each batch were retained at −20 °C until another QC check (based on PCA) was conducted. PCA of samples classified based on run-order within each batch was used to reveal any significant run-order effects. Where a significant run-order effect was apparent, the batches were re-run. Otherwise, batches were passed on to a super batch (a collection of batches that have passed the QC tests), for further processing. Score plots of PCA provide a simple and quick qualitative assessment of variability within a sample set[64]. Fructans: Fructan identification and DP measurements were based on XCMS[65] (Supplementary Table 2) and an in-house R script. Essentially, a target list from DP2 to DP20, denoting the parent mass, dimer and commonly formed adducts (formic acid and chlorine) for each DP, with corresponding retention times, was created based on respective extracted ion chromatograms. The script was used for (1) peak detection, (2) peak grouping, (3) retention time correction, (4) peak filling, (5) normalisation of run-order within each batch[20], (6) normalisation of batch-effects using comBat[66], (7) matching the resultant data matrix with the target list and (8) creating a table with peak intensities for each DP, including the molecular ion, dimer and adduct masses, across all samples. Peak intensities of all ions representing each DP were summed, and the sum was multiplied by the number of hexose units, to establish a common baseline for comparisons. An exemplar is presented in the case of DP3 (Supplementary Table 3). As a measure of the total sugar content, fructans in the low (DP3, 4 and 5), mid (DP10, 11 and 12) and high (DP18, 19 and 20) DP ranges were added. To delineate maximum separation between high- and low-sugar groups and maintain consistency of results, a two-tier criterion was established. First, the 715 genotypes (n = 5) were ranked as top 10% and bottom 10% based on total sugar content, and second, the full sample set (3575) was ranked as top 10% and bottom 10% based on total sugar content. Within the top 10% and bottom 10% of genotypes identified, only genotypes with three or more (out of five) clonal replicates were selected for comparisons. This two-tier criterion enabled identification of genotypes with minimum variation. Raw files corresponding to these high- (n = 133) and low-sugar (n = 106) samples from other analytical streams (polar, semi-polar, FAMEs and lipids) were collated for further comparative analysis. Lipids, polar and semi-polar compounds: XCMS[65] (Supplementary Table 2) and in-house R scripts with appropriate parameters for UHPLC (C18) and HPLC (Lipids and HILIC) settings were used for: (1) peak detection using centwave; (2) grouping; (3) retention time alignment using obiwarp; (4) peak re-grouping and (5) filling of missing peaks. The resultant data matrices were cleaned and post-processed by using: (1) diffreport function of XCMS to generate extracted ion chromatograms of all identified peaks, and eliminating the ones that represented background noise, and; (2) CAMERA[67] to identify and eliminate isotopes (HILIC and C18) and (3) normalisation for batch-effects using comBat[66]. The final data matrices of metabolic features were used for local library matching and statistical analysis. Metabolic features, defined here as molecular entities or ion types with a unique mass-to-charge ratio (m/z) and retention time. LipidSearch and local library matching: Lipid identification was performed using LipidSearch software (Thermo Fisher Scientific, USA)[68]. Raw lipid files corresponding to high- and low-sugar groups were uploaded to LipidSearch separately for positive and negative ionisation modes. Product ion search on Q-ExactiveTM data was selected with a mass tolerance of 6 ppm for precursor ions and 10 ppm for product ions, along with a selection of lipid classes (Supplementary Table 4). Lipid species/classes identified for each file were then merged based on retention time alignments and a single file with the merged results, one each for each ionisation mode, was generated. This library of identified lipids was matched against the final data matrices based on parent mass and retention times. Peak intensities generated by XCMS settings for the identified lipids were ultimately used for further statistical analyses. The lipidomics results shown here were obtained by a low level data fusion[25] of the positive and negative ionisation modes by horizontal concatenation, and taking an average of the different lipid ions that were detected for a specific lipid class. For example, an average of the peak intensities of lysophosphatidylethanolamine LPE(16:0) − H, LPE(18:3) − H, LPE(18:2) − H, LPE(16:0) + H, LPE(18:3) + H and LPE(18:2) + H was taken to obtain the overall peak intensity of the LPE class. Authentic standards, mostly plant based, from the AgResearch chemical inventory were run through the HILIC and C18 streams, under conditions identical to the current study. Parent masses in respective ionisation modes and retention times were listed in a table, and this library of authentic standards was matched against the corresponding final data matrices with tolerances of 5 ppm for parent mass and 3 s for retention time, to identify any hits. The libraries had 297, 170, 142 and 227 parent masses to match in C18 positive and negative, and HILIC positive- and negative-ionisation modes, and predominantly contained amino acids, secondary metabolites and their derivatives in C18, and amino acids, sugars, organic acids and their derivatives in HILIC streams, respectively. Statistical analysis: Statistical analyses were performed using MetaboAnalyst ver 3.0, an online metabolomics analysis suite[69]. A data scaling procedure (auto-scaling) was carried out, where the data were normalised (mean-centred and divided by standard deviation of each variable) so that the features (peak intensities) are comparable. Univariate and multivariate data analyses[70] were conducted. For univariate analysis using t tests, features detected as significant with a false discovery rate cut-off of p < 0.05 were evaluated. For the multivariate approach, PCA was used to interrogate the data. Score plots which display the distribution of each sample along the composite variables of the score plot graph are presented. Minitab 18 (Minitab Inc., USA) software was used to generate the interval plot (Fig. 1), cloud plot (Fig. 3), and boxplots (Fig. 2a, Fig.4); MS Excel (Microsoft Inc., USA) was used to generate Fig. 2c; and an online correlation analysis tool (www.sthda.com) was used to generate Fig. 2b. Pathway analysis and compound identification: While eliminating redundant signals is vital for the identification of biomarkers, retaining these signals will facilitate network predictions[25]. Therefore, the raw data matrix obtained from the diffreport function (high- vs. low-sugar groups) of XCMS was used for pathway analysis. Network predictions/pathway analysis was performed using Mummichog[71], a programme that combines metabolite prediction and network analysis in one step[72,73]. In contrast to the traditional approach, Mummichog first populates related spectral features to a network, with the hypothesis that if features reflect biological activity, then the metabolites they represent must exhibit enrichment in the local network. Metabolite identifications along with enriched pathway modules impart a broad understanding of tentative mechanisms, and provides scope for further probing. Input files for Mummichog from the HILIC and C18 streams comprised m/z values, retention time, p values and fold-change values. Due to its genetic synteny with ryegrass[74], metabolic networks/pathways of barley (Hordeum vulgare) from the Plant Metabolic Network (PMN), www.plantcyc.org, December, 2016, were used as a reference. Appropriate parameters were selected for the ionisation mode, instrument (Orbitrap) and mandatory identification of parent masses, while other parameters were set as default. Significant nodes of all four streams from respective activity networks were collated, and m/z features showing fold-change greater than 1.5 alone were selected. Features that matched authentic standards in the local library, and manually identified compounds that had fold-change greater than 1.5, were also added to this list. KEGG IDs for these shortlisted features were uploaded to KEGG Mapper[75], http://www.kegg.jp/kegg/tool/map_pathway2.html, December, 2016. Modules directly involving the identified compounds with reference to rice (Oryza sativa) pathways, were displayed. In addition to compounds identified with level 1 confidence[44] by matching with a local library of authentic standards, de novo compound identification with level 2 confidence was performed as demonstrated by Subbaraj et al.[76]. Compounds identified by Mummichog were only given a level 3 status, a level that implies compound class. Pathway visualisation using MapMan: MapMan uses a hierarchical ontology approach to visualise pathways and their corresponding modules in a functional context[77]. A customised map, based on the module/pathway data generated by KEGG Mapper, was built for the current study. The experimental data file comprised the compounds identified in Table 1, along with the respective t stat values. KEGG Mapper results were sorted into appropriate bins in accordance with MapMan syntax (Table 2) and added as the mapping file. Finally, a custom made picture (.bmp) that accommodates all the identified pathways and modules was added to the pathways folder. The ImageAnnotator module of MapMan uses mapping files as its data source, and then paints the input experimental data to the custom made images/maps according to the hierarchical structure of the mapping files[77]. A summarised overview of data analysis is presented in Fig. 6.
Fig. 6

Summarised overview of data analysis from raw files of fructan data through to total sugar content, classification of high- and low-sugar groups, and subsequent analysis of fatty acid methyl esters (FAMEs), lipids, polar and semi-polar compounds

Summarised overview of data analysis from raw files of fructan data through to total sugar content, classification of high- and low-sugar groups, and subsequent analysis of fatty acid methyl esters (FAMEs), lipids, polar and semi-polar compounds Description of Supplementary Data
  43 in total

1.  Adjusting batch effects in microarray expression data using empirical Bayes methods.

Authors:  W Evan Johnson; Cheng Li; Ariel Rabinovic
Journal:  Biostatistics       Date:  2006-04-21       Impact factor: 5.899

2.  CAMERA: an integrated strategy for compound spectra extraction and annotation of liquid chromatography/mass spectrometry data sets.

Authors:  Carsten Kuhl; Ralf Tautenhahn; Christoph Böttcher; Tony R Larson; Steffen Neumann
Journal:  Anal Chem       Date:  2011-12-12       Impact factor: 6.986

Review 3.  Sugar sensing and signaling in plants: conserved and novel mechanisms.

Authors:  Filip Rolland; Elena Baena-Gonzalez; Jen Sheen
Journal:  Annu Rev Plant Biol       Date:  2006       Impact factor: 26.379

Review 4.  The importance of experimental design and QC samples in large-scale and MS-driven untargeted metabolomic studies of humans.

Authors:  Warwick B Dunn; Ian D Wilson; Andrew W Nicholls; David Broadhurst
Journal:  Bioanalysis       Date:  2012-09       Impact factor: 2.681

5.  Responses of primary and secondary metabolism to sugar accumulation revealed by microarray expression analysis of the Arabidopsis mutant, pho3.

Authors:  Julie C Lloyd; Oksana V Zakhleniuk
Journal:  J Exp Bot       Date:  2004-05-07       Impact factor: 6.992

6.  Fatty acid composition of leaf lipids determined after combined digestion and fatty acid methyl ester formation from fresh tissue.

Authors:  J Browse; P J McCourt; C R Somerville
Journal:  Anal Biochem       Date:  1986-01       Impact factor: 3.365

7.  A QTL analysis of host plant effects on fungal endophyte biomass and alkaloid expression in perennial ryegrass.

Authors:  Marty J Faville; Lyn Briggs; Mingshu Cao; Albert Koulman; M Z Zulfi Jahufer; John Koolaard; David E Hume
Journal:  Mol Breed       Date:  2015-07-18       Impact factor: 2.589

8.  MetaboAnalyst 3.0--making metabolomics more meaningful.

Authors:  Jianguo Xia; Igor V Sinelnikov; Beomsoo Han; David S Wishart
Journal:  Nucleic Acids Res       Date:  2015-04-20       Impact factor: 16.971

Review 9.  Bioinformatics: the next frontier of metabolomics.

Authors:  Caroline H Johnson; Julijana Ivanisevic; H Paul Benton; Gary Siuzdak
Journal:  Anal Chem       Date:  2014-11-20       Impact factor: 6.986

10.  Low pyrrolizidine alkaloid levels in perennial ryegrass is associated with the absence of a homospermidine synthase gene.

Authors:  Geoffrey P Gill; Catherine J Bryant; Mikhail Fokin; Jan Huege; Karl Fraser; Chris Jones; Mingshu Cao; Marty J Faville
Journal:  BMC Plant Biol       Date:  2018-04-06       Impact factor: 4.215

View more
  4 in total

1.  Predicting the quality of ryegrass using hyperspectral imaging.

Authors:  Paul R Shorten; Shane R Leath; Jana Schmidt; Kioumars Ghamkhar
Journal:  Plant Methods       Date:  2019-06-06       Impact factor: 4.993

2.  DBnorm as an R package for the comparison and selection of appropriate statistical methods for batch effect correction in metabolomic studies.

Authors:  Nasim Bararpour; Federica Gilardi; Cristian Carmeli; Jonathan Sidibe; Julijana Ivanisevic; Tiziana Caputo; Marc Augsburger; Silke Grabherr; Béatrice Desvergne; Nicolas Guex; Murielle Bochud; Aurelien Thomas
Journal:  Sci Rep       Date:  2021-03-11       Impact factor: 4.379

3.  Development of RIKEN Plant Metabolome MetaDatabase.

Authors:  Atsushi Fukushima; Mikiko Takahashi; Hideki Nagasaki; Yusuke Aono; Makoto Kobayashi; Miyako Kusano; Kazuki Saito; Norio Kobayashi; Masanori Arita
Journal:  Plant Cell Physiol       Date:  2022-03-11       Impact factor: 4.927

4.  Untargeted Multimodal Metabolomics Investigation of the Haemonchus contortus Exsheathment Secretome.

Authors:  Nikola Palevich; Paul H Maclean; Paul M Candy; Wendy Taylor; Ivona Mladineo; Mingshu Cao
Journal:  Cells       Date:  2022-08-15       Impact factor: 7.666

  4 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.