Exploring the genic resources underlying metabolites through mGWAS and mQTL in wheat: From large-scale gene identification and pathway elucidation to crop improvement.

Jie Chen^1,2, Mingyun Xue^1,2, Hongbo Liu^1, Alisdair R Fernie^3, Wei Chen^1,2.

Abstract

Common wheat (Triticum aestivum L.) is a leading cereal crop, but has lagged behind with respect to the interpretation of the molecular mechanisms of phenotypes compared with other major cereal crops such as rice and maize. The recently available genome sequence of wheat affords the pre-requisite information for efficiently exploiting the potential molecular resources for decoding the genetic architecture of complex traits and identifying valuable breeding targets. Meanwhile, the successful application of metabolomics as an emergent large-scale profiling methodology in several species has demonstrated this approach to be accessible for reaching the above goals. One such productive avenue is combining metabolomics approaches with genetic designs. However, this trial is not as widespread as that for sequencing technologies, especially when the acquisition, understanding, and application of metabolic approaches in wheat populations remain more difficult and even arguably underutilized. In this review, we briefly introduce the techniques used in the acquisition of metabolomics data and their utility in large-scale identification of functional candidate genes. Considerable progress has been made in delivering improved varieties, suggesting that the inclusion of information concerning these metabolites and genes and metabolic pathways enables a more explicit understanding of phenotypic traits and, as such, this procedure could serve as an -omics-informed roadmap for executing similar improvement strategies in wheat and other species.

Entities: Chemical

Keywords: mGWAS; mQTL; metabolomics; pathway elucidation; wheat

Substances: Plant Proteins

Year: 2021 PMID: 34327326 PMCID: PMC8299079 DOI: 10.1016/j.xplc.2021.100216

Source DB: PubMed Journal: Plant Commun ISSN: 2590-3462

Introduction

Metabolomics is an -omics tool that emerged following several other -omics, such as genomics, transcriptomics, and proteomics. This -omics approach has been followed for more than 20 years (Alseekh and Fernie, 2018), and techniques have become mainstream and protocols feasible even at a commercial level. Therefore, we will not detail the basic protocols or techniques, since they have been extensively described previously (Lisec et al., 2006; Vos et al., 2007; Kruger et al., 2008; Cajka and Fiehn, 2014). Common wheat (Triticum aestivum L.), also known as bread wheat, is a leading cereal crop ultimately accounting for approximately 20% of the calories consumed by humans (He et al., 2018). However, the genomic reference for this hexaploid crop has only recently become available (IWGSC, 2018), a fact that has significantly constrained fundamental research in this species. For example, it took several years from mapping the wheat vernalization gene VRN1 onto a chromosomal interval (Galiba et al., 1995; Dubcovsky et al., 1998; Barrett et al., 2002; Iwaki et al., 2002) to finally obtaining the gene by positional cloning (Yan et al., 2003). Benefiting from advanced technologies, multiple -omics applications, including genomics, transcriptomics, proteomics, and metabolomics (Biyiklioglu et al., 2018; IWGSC, 2018; Xiang et al., 2019; Shi et al., 2020c; Ma et al., 2020), are available in the -omics toolbox to interpret the molecular alterations, and to contribute the genic resources, of wheat plants encountering various environmental conditions. Among these approaches, metabolomics was considered to be the bridge between genotypes and phenotypes (Fiehn, 2002) and has been widely applied to profile numerous wheat samples (Saia et al., 2019). 
However, early reports largely depicted metabolic adjustments rather than effectively pinpointing the underlying responsible genes, as has long been achieved for other crops (Albinsky et al., 2010; Gong et al., 2013; Wen et al., 2014; Chen et al., 2018). One way to accomplish this at large scale would be the combination of metabolomics and genetics, providing numerous candidate loci that may affect the relative contents of metabolites, and the potential responsible genes have thus been postulated and validated (Fernie and Tohge, 2017). In this review, we intend to introduce this predominant application of metabolomics in common wheat. The subsequently unveiled candidate genes and delineated metabolic pathways will help us to achieve the ultimate goal of wheat crop improvement.

The application of metabolomics in wheat should deliver the swift detection of a vast category of metabolites

The term “metabolomics” was coined at the end of the last century in parallel with terms such as “genomics” and “transcriptomics” (Oliver, 1998), and was subsequently defined as the investigation of all metabolites in an organism, which requires simultaneous measurement of all metabolites in a given biological system (Dixon and Strack, 2003). However, it has been estimated that there are in excess of 200 000 metabolites in the plant kingdom (Windsor et al., 2005), with some recent estimates suggesting that the number is close to a million (Wang et al., 2019). These metabolites are, however, distributed in a species-specific manner, with any given species predicted to contain over 5000 metabolites (Fernie et al., 2004). The uncertainty concerning specificity, numbers, and contents of metabolites within a given species/organism at a certain developmental stage renders comprehensive detection and identification a difficult task. For instance, it has long been recognized that glucosinolates are mainly found in the Brassicaceae species (or the “cabbage clade” of Brassicales) (Halkier and Du, 1997); however, the total number of glucosinolate metabolites remains uncertain (Blažević et al., 2020). Likewise, the varied chemo-decorative forms of benzoxazinoids, which represent an important class of plant defense metabolites that are distributed mainly in the Poaceae species, including wheat, maize, and rye, require extensive efforts to unveil their metabolic routes within these species (Bruijn et al., 2018). Ideally, metabolomics should be rapid, unbiased, and comprehensive to achieve the complete detection and identification of the metabolome from a given species. Regrettably, no single current analytical technique meets all these criteria at the same time. Of the two main technological platforms, NMR-based methodologies require the metabolites to be highly abundant or, alternatively, to be extracted from a large amount of tissue samples (Alseekh and Fernie, 2018).
By contrast, mass spectrometry (MS)-based techniques can detect the analytes more sensitively. In addition, MS can be coupled to either gas chromatography (GC) or liquid chromatography (LC), adding another dimension to differentiating the metabolite signals (Alseekh and Fernie, 2018). By employing the above-mentioned technological platforms, one could acquire hundreds of metabolic analytes from wheat samples using NMR-based (Byeon et al., 2020) or MS-based techniques (Ameye et al., 2020; Wang et al., 2020). Although this number is far from covering the whole metabolome of a given species (hundreds detected versus thousands predicted), this is likely reflective of what can be currently achieved in wheat. Indeed, numerous experiments have been conducted to profile the wheat metabolic landscape in recent years (for review, see Saia et al., 2019); these surveys may have been inspired by pre-existing metabolomics reports. For instance, a study in Arabidopsis revealed the detailed gene clusters responsible for root-specific metabolic pathways, wherein the specific metabolites involved selectively influence the root rhizosphere microbes (Huang et al., 2019). Following this clue, the interactions of metabolites in wheat root rhizosphere exudates with soil microbes were inspected using an LC–MS-based technique (Rieusset et al., 2021) and a GC–MS platform (Iannucci et al., 2021), respectively. Although these wheat metabolomics examinations seldom provide candidate genes for swift functional validation, as commented elsewhere (Saia et al., 2019), recognizing the metabolic profiles of wheat samples under various environmental conditions or developmental stages provides requisite knowledge for parallel and in-depth experimental designs. In the current review, we will not simply compile the wheat metabolomics reports to date, since this has been recently accomplished (Saia et al., 2019).
Instead, we intend to probe how the underlying genes responsible for the content variation of metabolites were unveiled, and introduce some details of efficiently revealing the gene–metabolite linkage by combining the metabolomics approaches with genetic designs.

Metabolite genome-wide association studies and metabolite quantitative trait loci are powerful ways to dissect the genetic architectures underlying a highly varied metabolome

Plant metabolites are indispensable for the plant itself, both for supporting growth and for its interactions with the surrounding environment (Saito and Matsuda, 2010; Huang et al., 2019). Some of these metabolites are also necessary for humans, as they constitute our nutritional supply (Luca et al., 2012; Martin and Li, 2017). Hence, it is of vital importance to elucidate the functional genes associated with plant metabolism. In comparison with the complexity of phenotyping only a small number of field phenotypes (Yang et al., 2020), one can easily profile the relative contents of multiple metabolites, which may directly or indirectly relate to the phenotypes of interest. Therefore, identifying candidate functional genes for content variation of metabolites would lead us to a better understanding of important crop phenotypes. For instance, the metabolite trigonelline (N-methyl nicotinic acid) and the biosynthetic gene underlying its production were found to be responsible for rice grain width (Chen et al., 2016). Similarly, in tomato, a key flavonoid-regulating MYB transcription factor that is responsible for the fruit color (Adato et al., 2009) was found to be indirectly selected during the tomato improvement process (Zhu et al., 2018). A wheat MYB transcription factor not only positively regulates the flavonoid pathway that forms red kernels (Himi and Noda, 2005), but also positively elevates the relative content of abscisic acid that normally suppresses seed germination (Ju et al., 2019). The work of Lang et al. (2021) added an explanation for the long-questioned association between red-kernel and pre-harvest sprouting traits of wheat and in doing so provided a new potential target for breeding. To unveil the candidate genes underlying metabolic routes, a reverse genetic strategy via homologous sequence alignment would be a straightforward scheme. 
An example is that of the benzoxazinoids, a class of plant defense metabolites that are distributed mainly in the Poaceae species. Several Bx genes responsible for the biosynthesis of benzoxazinoids have been cloned in wheat (Nomura et al., 2002, 2003; Sue et al., 2006, 2011; Li et al., 2018a), mainly via a homologous sequence alignment strategy against their respective maize orthologs. However, this procedure is not workable when wheat Bx genes have low sequence homology to their maize orthologs (Li et al., 2018a), nor is it likely to be applicable if the Poaceae species have specialized benzoxazinoid metabolites (Bruijn et al., 2018). Moreover, the homologous sequence alignment-based procedure can considerably increase the experimental workload. For instance, a total of 29 potential SNAT genes from grapevine that contain the key structural domain (Pfam00583) were identified, and of these, the authors tested five, and only one of the gene products was validated to have the designated enzymatic activity (Yu et al., 2019). The chance of success may be even smaller in common wheat, given that it is a hexaploid species harboring an approximately 16 Gb genome. One way to potentially circumvent such an obstacle is the combination of metabolomics and forward genetics designs, as the large number of samples tested may link genic alterations to metabolic variations of interest. Indeed, combinations of genetic approaches such as quantitative trait loci (QTL) and genome-wide association study (GWAS) with metabolomic profiling have been widely applied in order to identify the functional genes underlying the content variation of metabolites (Table 1). These methodologies are termed mQTL (metabolite QTL) and mGWAS (metabolite GWAS), respectively. mQTL tools were first utilized in tomato (Schauer et al., 2006) and Arabidopsis (Keurentjes et al., 2006).
Following these pioneering studies, the mQTL approach was broadly adapted to address the diverse aspects of genetics toward plant metabolomes (Rowe et al., 2008; Gong et al., 2013; Joseph et al., 2015), providing valuable insights into the genetic and biochemical bases of metabolic traits.

Table 1

Recent examples of functional genes identified from mGWAS and mQTL.

Species	Method	Trait	Candidate	Associated metabolites	Reference
Blueberry	mGWAS	volatile organic compounds	linalool synthase and α-terpineol synthase genes	linalool, limonene, and eucalyptol	(Ferrão et al., 2020)
Barley	mGWAS	grain oligosaccharide content	acid β-fructofuranosidase genes	fructan	(Matros et al., 2021)
Arabidopsis	mGWAS	resistance to Ralstonia solanacearum	CYP735A1, CKX2, and CKX4	cytokinin	(Alonso-Díaz et al., 2021)
Maize	mGWAS	salt-induced osmotic stress	ZmCS3, ZmUGT, and ZmCYP709B2	citrate and flavonoids	(Liang et al., 2021)
Tomato	mGWAS	vitamin E content	genes involved in the chorismate–tyrosine pathway	tocochromanols	(Burgos et al., 2021)
Soybean	mQTL	seed oil content	diacylglycerol lipase, phospholipase, and acyl-CoA dehydrogenase genes	fatty acids	(Li et al., 2021b)
Rice	mQTL	insect resistance	Glucosyl-transferase genes	flavonoid O-glucosides	(Yang et al., 2021)
Carrot	mQTL	flavor-associated sabinene	Terpene synthase genes	sabinene, α-thujene, α-terpinene, γ-terpinene, terpinen-4-ol, and 4-carene	(Reichardt et al., 2020)

However, given that linkage mapping populations are usually derived from merely two parents, mQTL results do not scale to the tremendous chemical and content diversity of metabolites across the diverse germplasm resources of a given species. This prompted researchers to utilize mGWAS to probe the genetic contributions to metabolic diversity by simultaneously evaluating large numbers of accessions. Pioneers in this field inspected the quantitative and qualitative variations in glucosinolates in Arabidopsis (Kliebenstein et al., 2001), finding that variations in 34 glucosinolate metabolites occurred in 39 Arabidopsis ecotypes. A follow-up study examined over 40 glucosinolates in 96 Arabidopsis accessions, revealing additional genes that may control natural variation in these traits (Chan et al., 2011). In recent years, mGWAS methodology has been widely utilized in major crops, including tomato (Tieman et al., 2017), maize (Wen et al., 2014), rice (Matsuda et al., 2015), barley (Zeng et al., 2020), and tea (Zhang et al., 2020). These studies provided numerous candidates underlying important taste and nutritional traits.
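To make the idea of an association scan concrete, the following is a minimal sketch and not the pipeline used in any of the studies cited here. It assumes biallelic SNPs coded as allele dosages (0, 1, 2) and a single metabolite measured across accessions, and it ranks markers by the squared Pearson correlation between dosage and metabolite content. Published mGWAS analyses typically rely on mixed models that correct for population structure and kinship, which this sketch omits; the marker names and values are invented for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker or constant trait: no signal
    return sxy / sqrt(sxx * syy)

def scan(genotypes, metabolite):
    """Return (marker, r^2) pairs sorted from strongest to weakest."""
    scores = [(m, pearson_r(d, metabolite) ** 2) for m, d in genotypes.items()]
    return sorted(scores, key=lambda t: t[1], reverse=True)

# Hypothetical toy data: 6 accessions, 3 markers (all values invented).
genotypes = {
    "snp_A": [0, 0, 1, 1, 2, 2],  # dosage tracks the metabolite level
    "snp_B": [0, 2, 0, 2, 0, 2],  # dosage unrelated to the metabolite
    "snp_C": [1, 1, 1, 1, 1, 1],  # monomorphic in this panel
}
metabolite = [1.0, 1.2, 2.1, 1.9, 3.0, 3.1]

ranking = scan(genotypes, metabolite)
lead_marker = ranking[0][0]
```

In this toy panel the lead marker is `snp_A`. In practice the lead SNP only marks a chromosomal region, and candidate genes still have to be picked from the surrounding linkage disequilibrium block.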

Annotated genomes, genotyped populations, and requisite knowledge are key elements to identify candidate genes from mGWAS/mQTL output

Technically, the direct output generated from mQTL/mGWAS is merely linkages/associations between chromosomal locations and metabolite contents, and as such, further steps are needed to identify the potential candidate genes underlying the respective linkages/associations. The first and most facile element for candidate gene identification is the availability of genomic information, which is an essential, if not the only, way to sift through the possible genes included within the genomic interval (in the case of mQTL results) or adjacent to the associated marker loci (in the case of mGWAS results). For example, wheat mQTL surveys have been deployed for several years (Hill et al., 2013, 2015). In these two cases, 205 compounds (93 of which were unknown metabolites) (Hill et al., 2013) and 558 mass features (197 of which were identified) (Hill et al., 2015) were profiled in 179 doubled haploid lines using GC–MS and LC–MS, respectively. However, these two investigations did not yield any candidate genes regarding the content variations of profiled metabolites, largely due to the absence of reference genomic information, which became publicly available only afterward (IWGSC, 2018). Following these pioneering wheat mQTL studies, a wheat mGWAS revealed possible candidate genes for the involved metabolic traits (Matros et al., 2017), benefiting from partially pre-released wheat genomic data. Recently, a wheat mGWAS successfully identified 26 candidate genes with high confidence, two of which the authors validated enzymatically, thereby unveiling the flavonoid metabolism pathway of wheat (Chen et al., 2020). In parallel, this team conducted a wheat mQTL investigation (Shi et al., 2020c), which yielded and substantiated candidates and probed the possibility of linkage between wheat yield and the contents of certain metabolites.
The brief history and progress of wheat mGWAS/mQTL studies (Hill et al., 2013, 2015; Matros et al., 2017; Chen et al., 2020; Shi et al., 2020c) clearly revealed the indispensable role of the availability of genomic reference information for assigning possible candidates. However, access to a reference genome does not necessarily render the candidate gene assignment process facile. It is important to note that when we assign potential candidate genes to mQTL/mGWAS outcomes, we are postulating the probable genes encoding products that may enzymatically catalyze, translocate, or in some other way regulate the linked/associated metabolites. For this purpose, genes located within or around the mapped intervals/associated linkage disequilibrium blocks were evaluated, including their annotations and reports regarding their orthologs in other species, along with the chemical structures and pathway architectures of designated metabolites (Chen et al., 2014, 2020). For instance, trigonelline has long been thought to be synthesized by methylation from nicotinic acid (Joshi and Handler, 1960), indicating that the biosynthetic enzyme for this key step is probably a methyl transferase. In a rice mGWAS (Chen et al., 2014), LOC_Os02g57760 was annotated as a methyl-transferase-encoding gene and was located within a linkage disequilibrium block of the leading SNP associated with the content variation of trigonelline. Thereafter, this gene was designated and proved as a candidate for trigonelline (Chen et al., 2014, 2016). Likewise, a flavonoid glycosyl-transferase-encoding gene located within the QTL interval was assigned and enzymatically validated to be the candidate for apigenin 7-O-rutinoside, a glycosyl-decorated flavonoid, in a recent wheat mQTL study (Shi et al., 2020c). Two homoeologous wheat genes (TraesCS4B01G371700 and TraesCS4D01G365800) were selected as candidates behind the accumulation of the metabolite sucrose (Chen et al., 2020). 
Their ortholog in Arabidopsis AT3G19940, which is also known as STP10, encodes a high-affinity hexose transporter (Rottmann et al., 2016). These examples thus collectively demonstrate that the approach enables us to identify possible candidate genes from mQTL/mGWAS outputs, once we have a reference genome, a genotyped population, and requisite biochemical knowledge concerning the target metabolites. For wheat, the reference genomes of hexaploid wheat (IWGSC, 2018) and its progenitors (Avni et al., 2017; Ling et al., 2018; Maccaferri et al., 2019) have been released; numerous genotyped segregating populations/natural accessions are available (He et al., 2016; Guo et al., 2017; Chen et al., 2019, 2020); and an enormous range of functional genes for diverse metabolites can be postulated via reference to species such as Arabidopsis (http://www.arabidopsis.org/), rice (http://rice.plantbiology.msu.edu), and maize (http://www.maizegdb.org/). Thus, the stage is clearly set to perform mQTL analyses and mGWASs in wheat (Figure 1).
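The candidate assignment logic described above (genes near the lead marker, filtered by annotation against what the chemistry of the metabolite suggests) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure of the cited studies: the `Gene` record, the `candidates_near` helper, the fixed window standing in for a linkage disequilibrium block, and all gene identifiers and coordinates are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    start: int
    end: int
    annotation: str

def distance_to(gene, pos):
    """Distance in bp from a position to a gene (0 if inside the gene)."""
    if gene.start <= pos <= gene.end:
        return 0
    return min(abs(gene.start - pos), abs(gene.end - pos))

def candidates_near(genes, chrom, lead_pos, window, keywords):
    """Genes overlapping [lead_pos - window, lead_pos + window] whose
    annotation mentions any keyword (case-insensitive), nearest first."""
    lo, hi = lead_pos - window, lead_pos + window
    hits = []
    for g in genes:
        overlaps = g.chrom == chrom and g.start <= hi and g.end >= lo
        relevant = any(k.lower() in g.annotation.lower() for k in keywords)
        if overlaps and relevant:
            hits.append(g)
    return sorted(hits, key=lambda g: distance_to(g, lead_pos))

# Hypothetical annotation table (identifiers and coordinates invented).
genes = [
    Gene("hypo_g1", "chr2", 100_000, 102_000, "protein kinase"),
    Gene("hypo_g2", "chr2", 150_000, 153_000, "N-methyltransferase family protein"),
    Gene("hypo_g3", "chr2", 400_000, 402_000, "methyltransferase"),
    Gene("hypo_g4", "chr5", 150_000, 152_000, "methyltransferase"),
]

# A methylated metabolite suggests looking for methyltransferases.
hits = candidates_near(
    genes, "chr2", lead_pos=160_000, window=50_000,
    keywords=["methyltransferase", "methyl transferase"],
)
```

Only `hypo_g2` survives both filters here. A keyword filter of this kind merely shortlists genes; as the text stresses, the shortlist still needs ortholog evidence and enzymatic validation.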

Figure 1

A brief workflow for applying metabolomics in wheat to identify candidate genes on a large scale and to elucidate the metabolic pathway.

First, specific wheat organisms are collected, genotyped, and profiled. Candidate genes responsible for the mGWAS/mQTL assay loci are then identified by combining the necessary information regarding the chemical structures and pathway architectures of associated/linked metabolites and the reported ortholog functions on metabolite analogs. Finally, the metabolic pathways are elucidated by integrating the validated enzymatic functions. Parts of the elements involved are unpublished or adapted from previous studies (Chen et al., 2020; Shi et al., 2020c).


Delineating the metabolic pathways is essential to explain or explore the connections between metabolites and the end phenotypes

As stated above, numerous candidate genes underlying the variation in metabolite abundance could be obtained. This raises the question of how to systematically evaluate these candidates/metabolites and finally link them to end phenotypes. For phenotype-driven studies, on one hand, we need to explain how metabolite contents are linked to the designated traits. For example, numerous Arabidopsis mutants with colorless seeds have been generated, suggesting that their seed coat pigment accumulation is impaired. These mutants were named transparent testa (tt), and the responsible mutated genes (TT genes) were demonstrated to encode components of the flavonoid/anthocyanin pathways (Lepiniec et al., 2006). The indication that flavonoids/anthocyanins are one major class of plant chromogenic compounds is probably why researchers have tended to conduct targeted metabolomics measurements covering the flavonoids and/or anthocyanins in plant-color-related cases (Shi et al., 2020b; Jiao et al., 2020; Qiu et al., 2020). Subsequently, differential metabolites and the corresponding candidate genes were collected to explain the molecular mechanisms of color variation at the metabolomic and, probably, the transcriptomic levels (Li et al., 2020; Zhan et al., 2020). For phenotype-independent metabolomics surveys, on the other hand, we need to establish the pathway through experimental output and further explore the possible biological functions of the metabolites themselves. The establishment of a pathway is essentially achieved in the same manner as that for identifying candidate genes illustrated above. We need to take the existing knowledge of pathway architectures, chemical structures of metabolites, and reported candidate genes into consideration. In a recent wheat mGWAS (Chen et al., 2020), the authors assigned a glycosyl-transferase-coding gene as a candidate gene regulating tricin abundance, and an acyl-transferase-coding gene for tricin O-malonyl hexoside abundance.
The pathway was primarily established to be the transformation from tricin to tricin 7-O-glucoside and finally to tricin 7-O-malonyl glucoside, which was sequentially catalyzed by the two designated gene products. The authors also explored the possible involvement of this pathway in lignin formation through the by-product tricin 4′-O-glucoside (Chen et al., 2020). Similarly, Chen et al. (2018) established a flavonoid decoration pathway in a rice mQTL survey and explored the O-methylated apigenin that may, similar to its structural analog sakuranetin (Kodama et al., 1992), confer disease-resistance activity in rice (Chen et al., 2018). Collectively, the metabolic pathways that were established from the specified metabolites and underlying candidates could be used to explain the phenotypic differences, or to explore the potential connections between comprised metabolites and seemingly relevant traits.

Genetic statistical power and metabolic detection capacity need to be improved in future mGWAS/mQTL studies

Although stand-alone mQTL and mGWAS can be powerful tools in the large-scale identification of candidate genes and are almost instantly applicable in wheat, it would be imprudent to neglect their drawbacks or the use of other multi-omics approaches. The main disadvantages of mQTL and mGWAS, at the level of genetic design, are the insufficient variant coverage beyond the two parents of biparental populations (Chen et al., 2018), hidden population structures (Flint-Garcia et al., 2003), and inadequate power to detect rare variations (Breseghello and Sorrells, 2006) within natural accessions. Accordingly, utilizing multi-parental populations such as the NAM (nested association mapping) population (Tian et al., 2011; Jordan et al., 2018) or the MAGIC (multi-parent advanced generation intercross) population (Dell'Acqua et al., 2015) could compensate to some degree for the deficient variant source of biparental populations. Alternatively, one could combine mGWAS and mQTL (Shi et al., 2020a) to enable a more powerful dissection of candidate genes. Aside from utilizing varied combinations of populations to maximize the statistical power that can be leveraged in gene identification, the detection capacity of metabolomics tools determines the overall set of signals available for analysis. Two strategies are usually available: targeted metabolomics, which normally detects hundreds of known metabolites, and untargeted metabolomics, which can generate more signals, although most of the mass features cannot be assigned to exact chemical structures (Roepenack-Lahaye et al., 2004). Given that knowledge concerning the chemical skeletons of metabolites and the underlying pathway architectures is indispensable for identifying responsible candidate genes, the low rate of identification of the detected analytes leads to a relatively low overall efficacy when it comes to assigning responsible candidates.
Meanwhile, the low coverage (hundreds detected versus thousands predicted) of targeted metabolomics over the metabolome of given plant organisms has limited the efficacy of uncovering the metabolic pathways and underlying genes. Given this, we have developed a widely targeted metabolomics method that combines merits from both targeted and untargeted metabolomics (Chen et al., 2013), which has been successfully utilized in numerous species (Chen et al., 2014; Wen et al., 2014; Song et al., 2015; Geng et al., 2016; Li et al., 2017, 2021a; Wang et al., 2017; Ghareeb et al., 2018; Cao et al., 2019; Lee et al., 2020; Liu et al., 2020; Zeng et al., 2020; Fu et al., 2021; Nie et al., 2021). By applying this protocol, we are now able to detect some 5000 metabolites from bread wheat samples within a single run that typically takes 20 min (data not published). However, two major drawbacks are hindering the instant implementation of this methodology in other species. The first is the species or organism specificity of metabolites (Fernie et al., 2004). Accordingly, we have constructed the MS2T (MS2 spectral tag) library covering a collection of wheat samples spanning the whole growth period to maximize metabolic detection capacity, which is time consuming and expertise dependent. Misuse of MS2T libraries will likely cause a significant decrease in the number of metabolites detected. The other issue that needs improvement is deciphering the chemistry. We are capable of identifying some 2000 metabolites within the above-mentioned wheat MS2T library (data not published), indicating that the remaining 3000 are currently unknown analytes. However, this does not indicate that we should pre-exclude all machine reads that could not be allocated to exact chemical structures in wheat metabolomics studies. Indeed, genetics can be a powerful tool by which to identify unknown metabolites.
For example, in a rice mGWAS report (Chen et al., 2016), the authors showed that the identification of unknown metabolites and the assignment of candidate genes could be complementary to one another. The underlying logic is that the different decorations derived from the same chemical skeletons (tryptamine derivatives in this case) could be present in proportional levels and comprise the same major ion fragments, which enables us to link the unknown metabolites to the known ones. Accordingly, we have discovered several sub-networks, including flavonoids, amino acids, nucleotides, and indole skeleton-involved metabolites, in a recent wheat mGWAS (Chen et al., 2020). Further downstream efforts could be devoted to these clusters of metabolites in order to fully delineate their positions within metabolic networks.
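The linking logic described above, in which unknown analytes are tied to known ones when they share major fragment ions and vary proportionally across samples, can be illustrated with a small sketch. It is a simplified stand-in rather than the method of the cited studies: the thresholds (`min_shared`, `min_r`), the m/z tolerance, and every m/z value and abundance below are assumptions invented for the example.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant abundance carries no co-variation signal
    return sxy / sqrt(sxx * syy)

def shared_fragments(frags_a, frags_b, tol=0.01):
    """Count fragments in frags_a matched by one in frags_b within tol m/z."""
    return sum(1 for a in frags_a if any(abs(a - b) <= tol for b in frags_b))

def is_linked(known, unknown, min_shared=2, min_r=0.8):
    """Link if both the fragment overlap and the abundance correlation pass."""
    enough_fragments = shared_fragments(
        known["fragments"], unknown["fragments"]) >= min_shared
    return enough_fragments and pearson_r(
        known["levels"], unknown["levels"]) >= min_r

# Hypothetical spectra and abundances across five samples (all invented).
known = {"fragments": [144.08, 117.07, 161.11],
         "levels": [1.0, 2.0, 3.0, 4.0, 5.0]}
unknowns = {
    "unknown_1": {"fragments": [144.08, 117.07, 203.12],
                  "levels": [2.1, 3.9, 6.2, 8.0, 9.9]},
    "unknown_2": {"fragments": [91.05, 65.04],
                  "levels": [5.0, 1.0, 4.0, 2.0, 3.0]},
    "unknown_3": {"fragments": [144.08, 117.07],
                  "levels": [3.0, 3.0, 1.0, 5.0, 2.0]},
}

linked = [name for name, u in unknowns.items() if is_linked(known, u)]
```

Requiring both criteria is a deliberate choice in this sketch: `unknown_3` shares fragments with the known metabolite but does not co-vary with it, so it is not linked.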

Breeding wheat to be a better crop assisted by metabolomics approaches as indication, prediction, and navigation tools

Wheat, as an indispensable food crop, is facing the eternal breeding goals concerning yield and quality. However, the conventional breeding processes of wheat have been constrained by linkage drags and relatively low recombination frequencies rendering them time consuming and of low efficiency and predictability (Holland, 2007). Metabolomics has long been regarded as a bridge between genotypes and phenotypes (Fiehn, 2002). One form of the underlying contact is that certain metabolites contribute largely to phenotypes. In this regard a widely accepted concept is the pivotal position of phytohormones in plant growth and in response to stress (Yu et al., 2020), wherein the different decorative forms of phytohormones would lead to role transition (active or not) of these metabolites (Korasick et al., 2013). To a lesser extent, trehalose 6-phosphate has recently been implicated to play roles in a wide range of developmental processes (Figueroa and Lunn, 2016; Li et al., 2019), while a specific glycosyl-decorated flavonoid (kaempferol 3-O-rhamnoside-7-O-rhamnoside) was demonstrated to be an endogenous inhibitor of polar auxin transport, thus greatly affecting the architecture of Arabidopsis (Yin et al., 2013). Furthermore, two independent studies have proposed that glycosylated flavones (Peng et al., 2017) and phenylacylated flavonoids (Tohge et al., 2016) confer UV resistance and associate with the distribution latitudes of rice and Arabidopsis. More recently, a wheat work illustrated how the anthocyanin-related red-kernel trait was linked to the abscisic acid-related pre-harvest sprouting trait (Lang et al., 2021), although the association between these two agronomic traits has long been observed (Groos et al., 2002). 
In this context, the pivotal metabolites and the underlying metabolic pathways could be subjected to direct detection and selection at early stages of plant development, instead of tedious physiological tests, when breeders intend to acquire the related agronomic traits. Theoretically, the metabotype–phenotype ties (Adato et al., 2009; Zhu et al., 2018) enable the prediction of discernable yet complex phenotypes from the numerous and less discernable machine reads, provided that the metabolomics data are properly modeled. For instance, good prediction of performance was achieved by analyzing the metabolic profiles of maize testcrosses (Riedelsheimer et al., 2012). Similarly, a recent study showed that the metabolomes of wheat leaves and spike bracts were each good predictors of grain yield (Vergara-Diaz et al., 2020b). Another study showed that remote hyperspectral imaging of wheat allows one to estimate metabolite content, potentially rendering the development of biomarkers even more powerful (Vergara-Diaz et al., 2020a). That said, the prediction power of metabolomics data in wheat is still controversial (Zhao et al., 2015; Shi et al., 2020c), and it is thus perhaps too early to draw conclusions. Aside from contributing to various agronomic traits, the contents of metabolites themselves could be direct breeding goals (Martin and Li, 2017). One pre-eminent example is the introduction of provitamin A into the carotenoid-free rice endosperm via metabolic engineering (Ye et al., 2000; Paine et al., 2005). Following this, several crop species, such as cassava, maize, and potato, have been targeted for vitamin biofortification (Bai et al., 2011; Sayre et al., 2011). The improvement of vitamin contents has also been a concern in wheat breeding (Table 2). For instance, wheat germ oils contain the highest concentration of vitamin E of all species tested (Trela and Szymańska, 2019).
Although substantial efforts have been made to elucidate vitamin metabolic pathways in other crops (Zhou et al., 2012; Quadrana et al., 2014; Liu et al., 2017; Wang et al., 2018; Schuy et al., 2019; Zhang et al., 2019), less has been achieved in wheat (Li et al., 2018b; Watkins et al., 2019). In view of this, key candidate genes could be swiftly selected and validated through multi-omics approaches including metabolomics, providing molecular resources for breeding more nutritious wheat. Alternatively, the introgression of elevated vitamin contents (or other less discernable metabolites as immediate breeding targets) could be conducted directly, without knowing the precise genes or markers in advance (Figure 2). Following this route, crosses between widely planted cultivars as acceptors and high-vitamin accessions as donors are first established. Subsequently, several backcrosses are conducted to restore the genetic background of the acceptor, followed by selfing to attain homozygosity (Wing et al., 2018). During these steps, metabolic measurements serve as a type of functional selection that is applied to one half of each wheat kernel, ensuring that the high-vitamin phenotype is maintained. Meanwhile, the remaining half of each kernel is grown for the next phase of breeding. The whole process can be further accelerated by speed breeding techniques (Ghosh et al., 2018), in which a generation takes merely 3 months rather than a full year. Finally, simultaneous use of wheat SNP chips (Cavanagh et al., 2013; Wang et al., 2014; Allen et al., 2016; Boeven et al., 2016; Winfield et al., 2016; Rasheed and Xia, 2019) to genotype the acceptors, donors, and progeny would enable the development of functional markers for the desired metabolic traits, and thereby allow the underlying candidate genes to be subsequently cloned (Figure 2).
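
The introgression route described above can be put into rough numbers. The sketch below assumes unlinked loci and no marker or phenotypic selection, so that the expected share of the acceptor genome after n backcrosses of the F1 is 1 - (1/2)^(n+1), and that each selfing generation halves the residual heterozygosity. The scheme of three backcrosses and two selfings and the function names are hypothetical; only the generation times of about 3 months (speed breeding) versus a full year come from the text.

```python
def acceptor_genome_fraction(n_backcrosses: int) -> float:
    """Expected acceptor (recurrent parent) genome share after n backcrosses of the F1.
    Assumes unlinked loci and no selection: 1 - (1/2)**(n + 1)."""
    return 1.0 - 0.5 ** (n_backcrosses + 1)

def fixed_fraction_after_selfing(n_selfings: int) -> float:
    """Expected share of initially heterozygous loci made homozygous by n selfings."""
    return 1.0 - 0.5 ** n_selfings

def scheme_months(n_backcrosses: int, n_selfings: int,
                  months_per_generation: float) -> float:
    """Total duration, counting the initial cross plus every backcross and selfing
    as one generation each (an assumption of this sketch)."""
    return (1 + n_backcrosses + n_selfings) * months_per_generation

# Hypothetical scheme: 3 backcrosses followed by 2 selfings
N_BACKCROSSES, N_SELFINGS = 3, 2

for n in range(N_BACKCROSSES + 1):
    label = "F1" if n == 0 else f"BC{n}"
    print(f"{label}: expected acceptor genome = {acceptor_genome_fraction(n):.4f}")

print(f"loci fixed after {N_SELFINGS} selfings = "
      f"{fixed_fraction_after_selfing(N_SELFINGS):.2f}")
print(f"speed breeding (3 months/generation): "
      f"{scheme_months(N_BACKCROSSES, N_SELFINGS, 3):.0f} months")
print(f"one generation per year: "
      f"{scheme_months(N_BACKCROSSES, N_SELFINGS, 12):.0f} months")
```

Under these assumptions, the six-generation scheme takes 18 months at 3 months per generation, compared with 72 months at one generation per year. Linkage drag around the introgressed locus means that real recovery of the acceptor background near the target can be slower than these genome-wide expectations.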

Table 2

Key genes underlying vitamin contents that may be targeted for nutritional improvement of wheat grain.

Vitamin	Related metabolites	Key candidate genes	Orthologs
VA	β-carotene, provitamin A, β-cryptoxanthin, retinol derivatives	RALDH, retinal dehydrogenase; REH, retinyl ester hydrolase; PSY∗, phytoene synthase	7A1357000; 7B1296800; 7D1285000
VB1	thiamine, thiamin pyrophosphate	TMP-PPase, thiamin-phosphate pyrophosphorylase; THI∗, thiamine thiazole synthase	7A0916800; 7B0760700; 7D0879600
VB2	riboflavin, flavin adenine dinucleotide, flavin mononucleotide	RibA, GTP cyclohydrolase II; RibB, 3,4-dihydroxy-2-butanone 4-phosphate synthase; PyrR∗, pyrimidine reductase	6A0988100; 6B1211700; 6D0873600
VB3	niacin, niacinamide, nicotinamide adenine dinucleotide (phosphate)	NadA, quinolinate synthase; nitrate reductase∗	6A0038200; 6B0056200; 6D0042700
VB5	pantothenic acid, pantetheine, pantethine	PanB, 3-methyl-2-oxobutanoate hydroxymethyltransferase; PanK∗, pantothenate kinase	5A0779300; 5B0811000; 5D0737600
VB6	pyridoxine/pyridoxal/pyridoxamine 5′-phosphate	PDX∗, pyridoxal 5′-phosphate synthase	2A0661600; 2B0746500; 2D0617200
VB7	biotin, biocytin	DTBS, desthiobiotin synthetase; BIO∗, biotin synthase	6A0389100; 6B0491900; 6D0338900
VB9	folic acid, tetrahydrofolic acid derivatives	FPGS, folylpolyglutamate synthetase; DHFR∗, dihydrofolate reductase; DHFS, dihydrofolate synthase; DHPS, dihydropteroate synthase	2A1204700; 2B1374200; 2D1159900
VC	ascorbic acid	PMI, phosphomannose isomerase; Alase, aldonolactonase; GGP∗, GDP-L-galactose phosphorylase	4A0537700; 4B0239000; 4D0202300
VE	tocopherols and tocotrienols	VTE1∗, tocopherol cyclase; VTE2, homogentisate phytyl transferase; VTE3, dimethyl-phytylquinol methyl transferase; VTE4, γ-tocopherol C-methyl transferase	1A0584500; 1B0677200; 1D0555800

The wheat candidate genes were identified by sequence alignment against the reported genes marked by asterisks. Wheat gene IDs are abbreviations based on the IWGSC Chinese Spring genome v.2.1 annotation. For instance, 7A1357000 denotes TraesCS7A03G1357000.

Figure 2

Employing metabolomics to improve wheat cultivars.

For less discernable metabolic traits that are direct breeding targets (vitamin E is taken as an example), metabolic approaches are used to quantify the desired metabolite in one half of the kernel. Simultaneously, the other half is planted and genotyped, producing the next generation. The overall cultivar improvement procedure (Wing et al., 2018) does not require prior knowledge of candidate genes or linked molecular markers, and can be accelerated through the speed breeding system (Ghosh et al., 2018).

Conclusions and perspectives

Fundamental wheat research has been considerably boosted by the availability of a reference genome sequence, which has greatly benefited candidate gene identification. For mQTL/mGWAS protocols, the inherent flaws of segregating populations and natural collections call for more comprehensive genetic designs, particularly in a species with a genome as complex as that of bread wheat, as well as more powerful statistical methods, to enable more accurate detection of the loci underlying metabolite variation. Improved computational capacity is also needed for large-scale deciphering of unknown analytes into possible chemical structures (Blaženović et al., 2018; Kind et al., 2018), which is likewise pivotal for candidate gene identification. Meanwhile, wheat samples spanning different tissues and developmental stages should be included to cover tissue-specific metabolites and their underlying candidate genes. Finally, new metabolic technologies such as metabolome-scale labeling (Tsugawa et al., 2019) or single-cell metabolomics (Souza et al., 2020), which have been similarly applied in other -omics fields, will likely improve our understanding of the metabolic pathways of more specialized organisms. Thus, two major challenges lie ahead. The first is to improve the coverage of the metabolome. The second is to move toward functional metabolomics that delimits the biological roles of the molecules themselves. The framework presented here should allow advances toward both. Collectively, one can anticipate accelerated candidate gene identification and more comprehensive metabolic pathway construction, and these molecular resources will ultimately assist wheat crop improvement.

Funding

This work was supported by the (91935304, 31770328, and 32001541), the Huazhong Agricultural University Scientific & Technological Self-Innovation Foundation (2017RC006), the (2018M642866 and 2021T140246), and the (2020CFB149).

134 in total

1. Genetic analysis of the metabolome exemplified using a rice population.

Authors: Liang Gong; Wei Chen; Yanqiang Gao; Xianqing Liu; Hongyan Zhang; Caiguo Xu; Sibin Yu; Qifa Zhang; Jie Luo
Journal: Proc Natl Acad Sci U S A Date: 2013-11-20 Impact factor: 11.205

2. ZmcrtRB3 encodes a carotenoid hydroxylase that affects the accumulation of α-carotene in maize kernel.

Authors: Yi Zhou; Yingjia Han; Zhigang Li; Yang Fu; Zhiyuan Fu; Shutu Xu; Jiansheng Li; Jianbing Yan; Xiaohong Yang
Journal: J Integr Plant Biol Date: 2012-04 Impact factor: 7.061

3. Metabolomic and Transcriptomic Analyses of Anthocyanin Biosynthesis Mechanisms in the Color Mutant Ziziphus jujuba cv. Tailihong.

Authors: Qianqian Shi; Jiangtao Du; Dajun Zhu; Xi Li; Xingang Li
Journal: J Agric Food Chem Date: 2020-12-10 Impact factor: 5.279

Review 4. Auxin biosynthesis and storage forms.

Authors: David A Korasick; Tara A Enders; Lucia C Strader
Journal: J Exp Bot Date: 2013-04-11 Impact factor: 6.992

5. 1H NMR metabolite fingerprinting and metabolomic analysis of perchloric acid extracts from plant tissues.

Authors: Nicholas J Kruger; M Adrian Troncoso-Ponce; R George Ratcliffe
Journal: Nat Protoc Date: 2008 Impact factor: 13.491

6. Metabolome-genome-wide association study dissects genetic architecture for generating natural variation in rice secondary metabolism.

Authors: Fumio Matsuda; Ryo Nakabayashi; Zhigang Yang; Yozo Okazaki; Jun-ichi Yonemaru; Kaworu Ebana; Masahiro Yano; Kazuki Saito
Journal: Plant J Date: 2014-11-03 Impact factor: 6.417

7. Characterization of a recently evolved flavonol-phenylacyltransferase gene provides signatures of natural light selection in Brassicaceae.

Authors: Takayuki Tohge; Regina Wendenburg; Hirofumi Ishihara; Ryo Nakabayashi; Mutsumi Watanabe; Ronan Sulpice; Rainer Hoefgen; Hiromitsu Takayama; Kazuki Saito; Mark Stitt; Alisdair R Fernie
Journal: Nat Commun Date: 2016-08-22 Impact factor: 14.919

8. Metabolome and transcriptome analyses of the molecular mechanisms of flower color mutation in tobacco.

Authors: Fangchan Jiao; Lu Zhao; Xingfu Wu; Zhongbang Song; Yongping Li
Journal: BMC Genomics Date: 2020-09-07 Impact factor: 3.969
