
Metabolite database for root, tuber, and banana crops to facilitate modern breeding in understudied crops.

Elliott J Price¹, Margit Drapal¹, Laura Perez-Fons¹, Delphine Amah², Ranjana Bhattacharjee², Bettina Heider³, Mathieu Rouard⁴, Rony Swennen⁵,⁶,⁷, Luis Augusto Becerra Lopez-Lavalle⁸, Paul D Fraser¹.

Abstract

Roots, tubers, and bananas (RTB) are vital staples for food security in the world's poorest nations. A major constraint to current RTB breeding programmes is limited knowledge on the available diversity due to lack of efficient germplasm characterization and structure. In recent years large-scale efforts have begun to elucidate the genetic and phenotypic diversity of germplasm collections and populations and, yet, biochemical measurements have often been overlooked despite metabolite composition being directly associated with agronomic and consumer traits. Here we present a compound database and concentration range for metabolites detected in the major RTB crops: banana (Musa spp.), cassava (Manihot esculenta), potato (Solanum tuberosum), sweet potato (Ipomoea batatas), and yam (Dioscorea spp.), following metabolomics-based diversity screening of global collections held within the CGIAR institutes. The dataset including 711 chemical features provides a valuable resource regarding the comparative biochemical composition of each RTB crop and highlights the potential diversity available for incorporation into crop improvement programmes. Particularly, the tropical crops cassava, sweet potato and banana displayed more complex compositional metabolite profiles with representations of up to 22 chemical classes (unknowns excluded) than that of potato, for which only metabolites from 10 chemical classes were detected. Additionally, over 20% of biochemical signatures remained unidentified for every crop analyzed. Integration of metabolomics with the on-going genomic and phenotypic studies will enhance 'omics-wide associations of molecular signatures with agronomic and consumer traits via easily quantifiable biochemical markers to aid gene discovery and functional characterization.


Keywords: Banana and plantain (Musa spp.); cassava (Manihot esculenta); genebanks; metabolomics; modern breeding; potato (Solanum tuberosum); sweet potato (Ipomoea batatas); yam (Dioscorea spp.)


Year: 2020 PMID: 31845400 PMCID: PMC7383867 DOI: 10.1111/tpj.14649

Source DB: PubMed Journal: Plant J ISSN: 0960-7412 Impact factor: 6.417

Root, tuber and banana (RTB) crops are consumed by over 2 billion people. A comparative metabolomics workflow is applied to RTB crops. Biochemical diversity of understudied species was captured and is a freely available data resource. Potential application in breeding programmes, for example bio‐fortification, disease resistance mechanisms, and stress tolerance. Integration into multiomic workflows.

Introduction

Importance of RTB crops

The annual global production of root, tuber, and banana (RTB) crops exceeds 1000 million tonnes (Food and Agriculture Organization of the United Nations, 2019) and feeds over 2 billion people worldwide (Scott et al., 2000) (Figure 1). RTBs are especially vital in the least developed countries where they provide ≥15% of daily calories and are a source of economic subsistence to over 750 million people (Kennedy et al., 2019). In Africa, the production of RTBs exceeds that of all other staples combined (Sanginga, 2015), and RTBs are the most important crops for direct human consumption. Over 30 000 RTB crop accessions are currently held in the genebanks of four CGIAR institutes with many further accessions in national and regional collections, representing the diversity currently available for breeding (Tay, 2013). Whilst the RTB crops are cited to have high yield potential (especially regarding calories per hectare production) when compared with other staples (cereals), the extent of diversity available for breeding cannot be capitalized upon due to limited knowledge on the biological potential of these accessions. In addition to the dearth of genetic resources, basic characterization data, such as phenotypic and agronomic traits including growth and yield parameters, are scarce for a large proportion of accessions. Consequently, insufficient germplasm characterization and evaluation have hindered the exploitation of the available diversity within breeding programmes (Jansky et al., 2015). Depending on the RTB crop, three factors have contributed, to a varying degree, to the current situation: (i) poor or under‐representation of crop wild relatives in germplasm collections (Castañeda‐Álvarez et al., 2016); (ii) high levels of accession duplication and misidentifications in the collections, particularly prevalent in clonal crop collections (yam up to 30% (Girma et al., 2012), potato varies from c. 4.5% (Ellis et al., 2018) to c. 75% (Huamán et al., 2000) across different subsets); and (iii) the poorly recorded assessment of germplasm diversity, which is especially complex in RTB crops due to crop wild gene flow via ennoblement, hybridization from overlapping natural and cultivation habitats, and genetic assimilation from vegetative propagation (Scarcelli et al., 2017).

Figure 1

Production of root, tuber, and banana (RTB) crops. Global and continental production of RTB crops highlights their importance as a staple food and livelihood for billions of people especially in Low Income Food Deficit Countries (LIFDCs). Data taken from FAOSTAT (production data for 2017, value data for 2016) (Food and Agriculture Organization of the United Nations, 2019). World map image modified from www.freevectormaps.com.

In recent years many large‐scale efforts have sought to further understand these crops using genome sequences (Xu et al., 2011; D'Hont et al., 2012; Wang et al., 2014; Tamiru et al., 2017; Yang et al., 2017; Li et al., 2019) and genome diversity studies (Bredeson et al., 2016; Hardigan et al., 2017; Nyine et al., 2017; Christelová et al., 2017; Muñoz‐Rodríguez et al., 2018; Němečková et al., 2018), genetic selection (Wolfe et al., 2016), molecular markers (QTLs) (Monden and Tahara, 2017; Kim et al., 2017; Sharma and Bryan, 2017), and comparative transcriptome resources (Kundapura Venkataramana et al., 2015; Sarah et al., 2017; van Wesemael et al., 2019; Cenci et al., 2019), developed alongside morphologic, agronomic and phenotypic classifications (Oliveira et al., 2015; Rahajeng and Rahayuningsih, 2017; Dépigny et al., 2018; Girma et al., 2018; van Wesemael et al., 2019). The progress of the CGIAR Research Program on Roots, Tubers and Bananas (www.rtb.cgiar.org), applying genomics‐assisted breeding to RTBs, has recently been reviewed (Friedmann et al., 2018). Although the work is typically in its early stages, the authors noted that success will depend on the quality of phenotypic characterization.

Why metabolomics in breeding?

Agronomic and consumer traits can often be directly associated with metabolite composition (Bino et al., 2004), which favours the use of metabolomics to generate measurable biochemical signatures for characterization. Metabolomics approaches can provide a standalone technique when genetic mechanisms are not well understood (Price et al., 2017), as evident in RTB crops. Phenotypic evaluation of materials is required multiple times along the breeding pipeline and integration of metabolomics into current practices is advocated to greatly shorten the development time of new varieties, reduce costs, and provide unbiased phenotypic profiles for validation of genetic parameters (Fernie and Schauer, 2009), and has the potential of being a powerful approach for future precision breeding (Zivy et al., 2015). Various different metabolomics approaches can be undertaken, generally encompassing untargeted metabolite profiling including broad‐scale relative quantification of known and unknown metabolites and targeted profiling and absolute quantification of identified metabolites. As the accuracy of identification and quantification increases, so does the time required for analysis. Through integration with other ’omics to associate genotype with phenotype, the regulation of agronomic/ phenotypic traits (phenomics) at the genetic (genomics, epigenomics), transcriptional (transcriptomics), translational (proteomic) and metabolic level (metabolomics) can be dissected in a holistic systems biology manner to enhance the understanding of crop development and its responses to biotic and abiotic changes. The development of bioinformatics tools and resources has rapidly progressed alongside ’omics technologies to facilitate the integration and management of these large and complex datasets. 
However, the interpretation of integrated datasets is complex, requiring expertise and collaboration across many scientific fields, and remains the major challenge for multiomics investigations (Pinu et al., 2019; Misra et al., 2019). This system biology approach has already been applied to model crops such as tomato, rice, and wheat, in which metabolomics analyses have provided a richness of resources (Grennan, 2009; Perez‐Fons et al., 2014) available to integrate with genetic breeding approaches. These resources rapidly accelerated progress for identifying trait markers (Schwahn et al., 2014; Li et al., 2016a; Sprenger et al., 2018), elucidation of biosynthetic pathways contributing to traits (Schwahn et al., 2014; Daygon et al., 2017), and validation of genetic/ metabolic prediction (Wei et al., 2018). For example, integrating genetic and metabolite markers for phenotypic traits of wheat has provided more robust signatures than either alone (Ward et al., 2015), and both were equally predictive for complex traits (Riedelsheimer et al., 2012). Furthermore, metabolite markers are inherently affected by environmental factors and can provide more precise measures for crop trait variation compared with genetic markers. Metabolite markers can be stably inherited (Chan et al., 2010) and, as such, the metabolome can be viewed in an analogous manner to the epigenome, acting as a dynamic yet conserved network comprised from genetic and environmental influence. Consequently, when performing comparative analyses of crop growth under different environments, quantifying the contributions of biochemical signatures towards phenotype is often simpler than for genetic markers, especially in highly heterozygous crops, like RTBs. 
This gives rise to the potential to generate chemotype core collections (CCC) for use in breeding, in which material selection is based on fixation of a complement of biochemical signatures that could confer the desired characteristics more robust to environmental variation. This is contrary to genotypic core collections, in which breeding tries to fix gene variants that can then often harbour different traits under different environments. Furthermore, increased trait stability of CCCs would provide a suitable base for comparative GxE (Genotype × Environment) studies to elucidate environmental effects on crop production (Xu, 2016). CCCs would therefore complement genotypic core collections to facilitate localized precision breeding in the future. Despite these advantages, the deployment of enhanced cultivars directly from metabolomics‐directed breeding is still limited, largely based on the slow uptake by breeders and the limited access to this technology, with the field still being listed as prospective but with the potential to be game‐changing for future agricultural practice (Kumar et al., 2017).

Prospective societal impact

Given the role that RTB crops play in the livelihoods of millions of people in the least developed nations, improvement is paramount. On the whole, RTBs are primarily grown through small‐holder farms with a large proportion of child and female labour and, therefore, the crops hold extreme importance for the most vulnerable portions of society. Increasing the precision and speed of phenotyping during the breeding ladder (Figure 2) would enable faster crop improvements and, therefore, a multitude of benefits: (i) enhanced agronomic, breeding efficiency and consumer traits (e.g. increased yields, increased flowering, reduced dormancy and bio‐fortification) to tackle food insecurity and malnutrition, which are more prevalent in RTB growing regions; (ii) decreased fertilizer inputs and improved pest and disease resistance to lower production costs and increase incomes; (iii) increased abiotic stress tolerance to improve climate change adaptation and yields on marginal, saline or drought prone soils; and (iv) a better understanding of basic phenomena such as crop evolution/domestication, ploidy, and inheritance mechanisms for understudied clonal crops.

Figure 2

Workflow of metabolomics analysis established to screen biochemical diversity of root, tuber, and banana crops. The use of numerous and complementary analytical platforms provides a more comprehensive coverage of the metabolome; customized libraries specific for each crop reduce matrix effects. Metabolic fingerprint analysis typically takes c. 20 min per sample and generates c. 10 000 features, with data analysis being c. 1 h per 100 samples. Library creation is on‐going but requires c. 20 h per crop before implementing automation, inclusive of machine time. Untargeted metabolite profiling takes c. 60 min per sample per analytical platform and data analysis plus manual curation takes c. 10 h per 100 samples. Example statistical visualizations created using SIMCA‐P (Umetrics), Metscape (Basu et al., 2017) in Cytoscape (Shannon, 2003), and an in‐house pathway mapper, Biosynlab (Royal Holloway University of London, UK).
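The approximate timings quoted in the caption can be turned into a rough planning estimate. The following is a minimal sketch, not part of the published workflow: the function name `screening_hours` and its parameters are hypothetical, and it assumes that the data analysis time is counted once per sample set rather than once per platform, which the caption does not state.

```python
def screening_hours(n_samples, mode="fingerprint", n_platforms=1):
    """Rough (instrument_hours, analysis_hours) from the c. figures in the caption.

    Assumption: analysis time is not multiplied by the number of platforms.
    """
    if mode == "fingerprint":
        run_min_per_sample = 20   # c. 20 min per sample
        analysis_h_per_100 = 1    # c. 1 h per 100 samples
        n_platforms = 1           # fingerprinting uses a single LC-MS run
    elif mode == "untargeted":
        run_min_per_sample = 60   # c. 60 min per sample per platform
        analysis_h_per_100 = 10   # c. 10 h per 100 samples incl. manual curation
    else:
        raise ValueError("mode must be 'fingerprint' or 'untargeted'")

    instrument_hours = n_samples * run_min_per_sample * n_platforms / 60
    analysis_hours = n_samples / 100 * analysis_h_per_100
    return instrument_hours, analysis_hours


if __name__ == "__main__":
    print(screening_hours(100, mode="fingerprint"))
    print(screening_hours(100, mode="untargeted", n_platforms=2))
```

Under these assumptions, the library creation time (c. 20 h per crop) would be added once per crop on top of the per-sample figures.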

Results and Discussion

Metabolomics approach – general screening

The metabolomics workflow implemented and optimized for each crop was based on a general concept (Figure 2). All plant materials collected were flash‐frozen, lyophilized, and ground to a homogenous powder before undergoing the metabolite profiling workflow, to ensure reproducibility. A common two‐phase solvent extraction method was implemented to extract a broad range of metabolites from each type of sample. This standardized and widely used method also allowed rapid optimization for different tissue types. Furthermore, the partition into aqueous and organic phases allowed the independent analysis of polar and non‐polar extracts, which simplified sample handling, chromatographic method development, and metabolite identification. During analysis, extraction blanks, quality controls and internal standards were included to maintain consistency and good laboratory practices and to enable normalization and batch correction (Fernie and Klee, 2011).

Database curation

The data generated can be deposited in public repositories addressing metabolomics in general (Metabolights, Dataverse, Metabolomics Workbench, Metexplore or Metabolonote) and/or crop specific database such as CassavaBase and MusaBase or PlantCyc. Initial fingerprinting via LC‐MS was conducted on materials to enable a rapid screen of biochemical diversity, especially focussed on secondary metabolism as this is typically where the largest proportion of chemical diversity resides (De Luca et al., 2012). The bottleneck in many LC‐MS based metabolomics studies is compound identification and use of the same chromatographic method meant data generated could also be used to guide the purchase of metabolite standards for LC‐MS library generation. Typical fingerprinting screens were performed on methanol extracts and measured only one biological replicate for speed. A minimum of three biological replicates and at least two analytical platforms were used for untargeted studies, including study of both aqueous and organic extracts for more comprehensive coverage of the metabolome. For the identification of features/compounds detected during the untargeted analysis, quality controls representing a pool of samples for each species were used. Peaks detected during GC‐MS and LC‐MS analyses were identified using published libraries (e.g. NIST, GMD (Kopka et al., 2005), MassBank (Horai et al., 2010) etc.) and confirmed by authentic commercial standards to build a crop specific library. After database curation, automated analysis was possible for the whole dataset of each species and the identification process integrated as an element of the metabolomics data analysis pipeline. Nevertheless, manual curation was undertaken for each dataset to reduce matching errors. The analysis of isoprenoid derived metabolites, such as carotenoids and chlorophylls, was carried out using ultra high or high performance liquid chromatography coupled with a diode array detector (U/HPLC‐DAD). 
As the composition of leaf and tuber materials has been reported extensively (Burns et al., 2003; Drapal et al., 2017; Price et al., 2018; Drapal et al., 2019b; Drapal et al., 2019c) and methods previously validated (Fraser et al., 2000; Nogueira et al., 2013), this was performed in a semitargeted mode in which the majority of compounds was quantified absolutely. This approach remains essential due to the intrinsic chemical nature of the photosynthetic pigments displaying a lack of amenability to MS.

Current progress in defining the metabolome of RTB crops

The database curated for banana, cassava, potato, sweet potato, and yam, currently includes over 300 identified metabolites (Table S1). Additionally, significant numbers of reoccurring unidentified features summarized as ‘unknowns’ were measured (Figure 3 and Table S2). The metabolites identified in each crop present a broad range of the plant metabolome including amino acids, organic acids, compounds of the tricarboxylic acid (TCA) cycle, isoprenoid derived compounds, phenylpropanoids, sugars, fatty acids, sterols, and corresponding subfamilies. The metabolite libraries have been implemented in the current projects of the RTB programme, facilitating the assessment of biochemical diversity, with future intentions to aid the identification of trait biomarkers in the RTB crops. The limits of metabolite concentrations have been reported to include all the available quantitative range for use in targeted breeding. This is exploitable because extremes are often favoured in crop breeding to achieve the maximum gains and enhancements above the average range and contrasts with other databases reporting the average and/or standard deviation.
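Reporting concentration limits rather than averages amounts to keeping the minimum and maximum value observed for each metabolite in each crop. The sketch below illustrates that idea only; the function name, the tuple layout, and the example values are hypothetical and are not taken from Table S1.

```python
def concentration_ranges(measurements):
    """Collapse (crop, metabolite, value) records into (min, max) per crop and metabolite."""
    ranges = {}
    for crop, metabolite, value in measurements:
        key = (crop, metabolite)
        if key in ranges:
            low, high = ranges[key]
            ranges[key] = (min(low, value), max(high, value))
        else:
            ranges[key] = (value, value)
    return ranges


if __name__ == "__main__":
    # Illustrative, made-up relative abundances
    records = [
        ("cassava", "sucrose", 1.2),
        ("cassava", "sucrose", 3.4),
        ("yam", "dopamine", 0.05),
        ("cassava", "sucrose", 0.9),
    ]
    for key, (low, high) in concentration_ranges(records).items():
        print(key, low, high)
```

Keeping the extremes, rather than a mean and standard deviation, preserves the accessions at the tails of the distribution, which are the ones the text identifies as most useful for targeted breeding.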

Figure 3

Pie‐charts showing total number of annotated compounds in RTB crops following the metabolomics workflow (Figure 2) and displayed (a) per crop and (b) for all RTB crops combined. Colours represent different compound classes and colouration follows the legend clock‐wise for each pie chart.

Potato had the simplest biochemical profile with the presence of just 10 chemical classes (excluding unknowns); four of these related to primary metabolism. Sweet potato and banana comprised 13 and 16 chemical families, respectively, whilst the cassava and yam chemo‐libraries sum to over 20 families of compounds (Figure 3a). Sugars were the largest annotated chemical class in all crops. This is expected for sink/storage organs, such as the tissues analyzed in the collection. Similarly, chemical classes related to primary metabolism (namely amino acids, organic acids and components of the TCA cycle) were also well annotated in all species. Potato's chemical composition presented the largest proportion of these primary metabolite sectors, with sugars comprising a larger share than in the other crops, reflecting a higher starch content. The divergence between crop compositions resided mostly in components related to secondary metabolism. For example, yams had a greater proportion of odd‐chain fatty acids, which are rare in plants. Also characteristic of yam was the higher content and diversity of nitrogen‐containing compounds such as amines, nucleobases, and catecholamines. Nevertheless, the catecholamine dopamine was vastly more abundant (up to one order of magnitude) in Musa. Triterpenoids also constituted a source of chemical diversity within the RTB crops with a more complex composition found in both cassava and yam. Whilst typically these compounds were detected in the leaf tissue of the accessions, yam tubers also presented significant amounts of sterols.
Crude extracts of yam presented a range of triterpenoids, including cholesterol, reflecting the production of glycosylated steroidal saponins within this crop (Sautour et al., 2007). Similarly, cassava leaves showed an accumulation of amyrins and isomers, which are likely to represent the glycosylated pentacyclic saponins. High levels of β‐carotene and xanthophylls were also observed for orange‐fleshed lines of sweet potato and yam tubers, cassava roots, and Musa fruit, as to be expected. The largest diversity of phenolic compounds such as phenylpropanoids, coumarins, flavonoids and lignin/lignin oligomers was encountered in cassava and sweet potato, although for sweet potato many phenolics remain structurally elusive (level 3 unknown). Unknowns comprised over half of all metabolites measured (Figure 3b) and ranged from approximately one‐quarter to one‐third of features recorded, for each individual crop following the analysis of crude extracts (Figure 3a). Distinguishing the chemical features detected via LC‐MS, and turning these into distinct compounds was challenging and will require further work to determine whether each peak is of biological origin. Given that in typical LC‐MS screening over 90% of features detected are not true metabolites (Mahieu and Patti, 2017; Aksenov et al., 2017), a conservative approach to limit false positives was chosen in which only unknowns that are well characterized (e.g. via MS/MS, clear UV–vis spectra) were included in the database. The drawback to this is that the true level of unknowns may be greatly underestimated in the current database. As to be expected, the unknowns that could be assigned to a compound class were predominantly secondary metabolites (Table S2). Unknowns have been given unique identifiers to allow on‐going annotations of compounds for libraries and curation and updating of the database (Table S2). 
The diversity of compound classes recorded was highest in yam and cassava, then banana, sweet potato, and lowest in potato (Figure 3a). This finding is not surprising, given that cassava was most intensively studied (most accessions and on all platforms) and yam is a multispecies crop for which large biochemical diversity has previously been evidenced across the genus (Price et al., 2016). In line with this, yam presented the highest proportion of unknowns (c. 50%, Figure 3a), despite not undergoing LC‐MS study as per the other crops. Sweet potato also had a comparably large proportion of unknowns (c. 45%) mostly comprising phenolic‐derived compounds, which are likely to be conjugates (Drapal et al., 2019c). Accurate identification of such compounds has been shown to require comprehensive MS3 fragmentation and is therefore beyond that typically conducted in current metabolite screening practices (Akimoto et al., 2017). Interestingly, even with the relatively extensive application of metabolomics to potato (Puzanskiy et al., 2017), a large number of unknowns still exists and is mostly sugars (Table S2). Carbohydrate analysis is particularly complex, with high numbers of isomers and complex polymers that are likely to contribute to the lack of conclusive annotation. Level 3 unknowns detected in banana extracts were mostly sugars and phenolics. Furthermore, cassava had the lowest proportion of unidentified metabolites. Cassava material has been the most intensively studied (subjected to all three analytical platforms and the largest number of tissues and accessions analyzed). This highlights that extensive analysis via diverse methods can elucidate unknowns and slowly conquer the challenge of identification, commonly touted as metabolomics’ biggest hurdle. Overall, the observed differences between crops' metabolite databases may be the result of the application of different analytical platforms to each crop within the modular pipeline.
However, current observations match those expected from the literature. Dominance of particular classes of compounds in each crop reflected the plasticity of plant metabolism to develop physiological features that can be linked to particular phenotypes.

Future developments

Presenting the ranges of metabolites recorded in a simple spreadsheet format enables the easy use of information regarding the comparative biochemical diversity of these under‐characterized crops. All compounds detected represent a portion of the steady‐state metabolome of the plant samples and can be used for untargeted data analysis to unravel the great amount of variation that can be used to guide breeding decisions. The system has proven to be robust over datasets even when measured months apart. Therefore, it is possible for future work to extend the platform from relative to proximate absolute quantification for many compounds through the generation of relative response factors to the internal standard (Cifkova et al., 2012) and subsequent correction following testing of extraction recovery. Therefore, the next step will represent the transition of the untargeted pipeline to a holistic semitargeted system. From this, data can be more informative for use in flux modelling and genome‐wide reconstructions, which are essential for understanding the fundamental processes governing plant physiology (Kruger and Ratcliffe, 2015). More elaborate sample preparations, such as solid phase extraction (SPE) and molecular recognition, via immunoaffinity, or imprinting, can be used to extend the breadth of metabolites captured and increase metabolome coverage. However, this would concurrently increase the number of unidentified compounds, which already represent a considerable proportion of the dataset (Figure 3b). Extensive structural elucidation via multistage MS fragmentation (MSn) and/or coupling of LC to NMR platforms (e.g. LC‐SPE‐MS/NMR) or ion mobility (e.g. LC‐IMS‐MS) has not yet become routine, largely hindered by the high capital costs at outset, and expert knowledge required for data interpretation, which is labour intensive. 
That said, in recent years a great deal of progress has been made towards the accessibility of tools for computational interpretation of such data (Spicer et al., 2017; Tsugawa, 2018). Investments in automated structural elucidation of unidentified compounds have the potential to revolutionize metabolomics workflows by overcoming the current bottleneck of structural elucidation. However, knowing the structure of a compound does not allow one to fully assess biological relevance. Recent years have seen a shift towards increased spatial resolution via mass spectrometry imaging and localization through cell sorting and laser microdissection etc., alongside flux‐omics and longitudinal (time‐series/developmental) applications. These applications evidence that contextualizing metabolomics data requires a detailed understanding of metabolic network dynamics and functional activity, which will become the next hurdle for the field. Screening of complete germplasm collections will allow the establishment of a CCC that comprises the majority of biochemical diversity available. CCCs would therefore represent an advance in precision over morphological core collections and can be overlaid with genotypic collections to reduce and focus the selection on accessions with the highest prospects for successful transfer of desired traits, that is through overcoming genetic differences that do not translate through to phenotype and by encompassing biochemical traits not observed at the morphological level.
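The move from relative to approximate absolute quantification described in this section, using relative response factors to the internal standard followed by a recovery correction, can be sketched as follows. This uses the general textbook form of a relative response factor; the function names, argument names, and example numbers are hypothetical, and the exact procedure of Cifkova et al. (2012) may differ.

```python
def relative_response_factor(area_analyte, conc_analyte, area_is, conc_is):
    """RRF = (analyte response per unit concentration) / (internal standard response per unit concentration)."""
    return (area_analyte / conc_analyte) / (area_is / conc_is)


def estimate_concentration(area_analyte, area_is, conc_is, rrf, recovery=1.0):
    """Approximate analyte concentration in a sample.

    recovery is the fractional extraction recovery (assumed measured separately);
    1.0 means no correction is applied.
    """
    if recovery <= 0:
        raise ValueError("recovery must be positive")
    return (area_analyte / area_is) * conc_is / rrf / recovery


if __name__ == "__main__":
    # Calibration run: analyte and internal standard at the same known concentration
    rrf = relative_response_factor(area_analyte=2000, conc_analyte=10,
                                   area_is=1000, conc_is=10)
    # Sample run: internal standard spiked at 10 units, 80% extraction recovery
    print(estimate_concentration(4000, 1000, 10, rrf, recovery=0.8))
```

A response factor obtained this way is only valid for the platform and method on which it was measured, which is one reason the reproducibility of the system across datasets matters for this step.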

Conclusion

Outlook for metabolomics in breeding of RTBs

Future work appears set to capitalize on the synergy of pursuing a multiple ’omics platform for rapid progress during crop improvement and breeding. At the forefront of this pursuit is the combination of genomics and transcriptomics for breeding and trait understanding. Moreover, recently, metabolomics has been favoured to enhance precision during molecular phenotyping, and the utilization of such methods looks set to increase. Metabolomics can prove especially useful when tackling complex traits, that is those with many determinants, as the metabolome inherently reflects environmental factors and other stimuli such as chemical interactions. This is evidenced by the preference for elucidation of ‘interactomes’ such as the rhizosphere and volatile‐ome of plants by incorporating deep sequencing of the microbiome (Hu et al., 2018; Jacoby and Kopriva, 2019) or atmospheric transformation of volatiles (Blande et al., 2014; Li et al., 2016b), respectively. Combining these measurements expands the biological system to the complete local environment and therefore characterization occurs at the ecosystem level. Improvement of RTB crops is vital for the attainment of the UN Sustainable Development Goals and improving livelihoods in the most deprived regions of the globe. In addition, the RTB crops show potential as scientific models for the analysis of complex genetic architectures, revealing the interplay between evolution and domestication in clonal crops. Breeding and development for each of the RTB crops shows unique pitfalls and problems, yet each is widely grown due to the unique traits they present. The complexities that have hindered crop improvement and agronomic development for production of RTBs to date may also be the crops’ largest saviours. 
In light of climate change, the large morphological plasticity, limited genetic assimilation, and resilience of these crops to extreme conditions and low technology agricultural systems provide the potential to adapt and overcome the impacts of global warming and, therefore, provide the incentive to increase research efforts towards these critically important understudied RTB crops. To ensure this, the breeding community needs to move beyond viewing metabolomics and other ’omics as a hypothesis‐free service science to techniques that can be integrated to solve complex biological questions in a rapid, large‐scale manner. Ironically, the initial characterization of plant genetic resources and diversity available is crucial to pose the biological questions for investigation and, as such, metabolomics can progress on both fronts.

Experimental procedures

Samples from in vitro cultures and plants grown in the field were harvested, flash‐frozen with liquid nitrogen, and lyophilized to remove all water content. The samples comprised a collection of different tissues, for example leaf, root, tuber, stem, and fruit from each crop. The tissue samples were then ground to a fine powder and metabolites extracted. Sample preparation and extraction and the profiling procedure of the extracts was based on previously published protocols and optimized for each crop to account for the matrix effects of the respective tissue (Perez‐Fons et al., 2014; Price et al., 2016; Drapal et al., 2017; Price et al., 2017; Price et al., 2018; Drapal et al., 2019a; Drapal et al., 2019b; Drapal et al., 2019c). To account for the difference in chemical properties of the metabolites, three different platforms were utilized in a modular manner for the screening process: ultra/high performance liquid chromatography with diode array detector (U/HPLC‐DAD), liquid chromatography‐mass spectrometry (LC‐MS) and gas chromatography‐mass spectrometry (GC‐MS). The yam materials underwent GC‐MS of both polar and non‐polar extracts alongside HPLC‐DAD of the non‐polar phase. All other crops underwent GC‐MS and LC‐MS analysis on polar extracts and UPLC‐DAD of non‐polar extracts. Non‐polar extracts from cassava and sweet potato were also subjected to GC‐MS analysis. The curation of crop specific libraries with identified metabolites followed the same workflow for both the GC‐MS and LC‐MS analytical platforms (Figure 3), whereas an established UPLC‐DAD library was used for all crops (Fraser et al., 2000; Burns et al., 2003) with an extended version used for yam and sweet potato (Price et al., 2018). All features detected in the generated sample set were aligned and following statistical analysis, significant features were identified and confirmed with standards (Fernie and Klee, 2011). 
GC‐MS data were processed via AMDIS (v2.71, NIST), whereas the alignment and filtering of chromatograms for LC‐MS was achieved via metaMS (Wehrens et al., 2014; Franceschi et al., 2014). U/HPLC‐PDA data were analyzed via Empower 2™ software (Waters Corp.). The identified compounds were confirmed manually (Table S1), and recurrent unidentified features that represent hypothetical compounds have been reported with unique identifiers per species (Table S2) (Bino et al., 2004). Normalization to internal standards and sample weight allowed relative quantification, concatenation of data from the platforms, and subsequent comparison between tissue types and species. For the UPLC, absolute quantification of the major photosynthetic compounds (β‐carotene, violaxanthin, neoxanthin, phytoene, phytofluene, chlorophyll a, chlorophyll b, β‐cryptoxanthin, lutein, antheraxanthin, and zeaxanthin) was achieved via comparison with dose–response curves of authentic, commercially available standards. For carotenoids for which an authentic standard was not available, quantification was based on standard curves of the carotenoids most similar in chemical structure and spectral properties. When compounds were detected on more than one analytical platform, the values reported in the database represent the maxima recorded, and the analytical technique that proved more amenable is cited first. The database and pie‐charts were created in Microsoft Excel 2013. Because the compiled dataset comprised numerous independent analyses undertaken over a three‐year time‐frame, the metabolite ranges reported for each crop differ in the number of samples analyzed and replicate measurements made. However, for each metabolite reported per crop, a minimum of 12 measurements were taken, and the validity and repeatability of measures were controlled within each independent study.
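The quantification steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' processing pipeline: all function names are hypothetical, the calibration is assumed to be a straight line fitted by least squares (the text only states that dose–response curves of standards were used), and every number in the usage lines is invented for illustration.

```python
from statistics import mean


def relative_quantity(peak_area, internal_std_area, sample_weight_mg):
    """Normalize a peak area to the internal standard and to sample weight."""
    if internal_std_area <= 0 or sample_weight_mg <= 0:
        raise ValueError("internal standard area and sample weight must be positive")
    return peak_area / internal_std_area / sample_weight_mg


def fit_linear_curve(amounts, responses):
    """Least-squares fit of response = slope * amount + intercept.

    Assumption: the standard curve is linear over the calibrated range.
    """
    x_bar, y_bar = mean(amounts), mean(responses)
    sxx = sum((x - x_bar) ** 2 for x in amounts)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(amounts, responses))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def amount_from_response(response, slope, intercept):
    """Invert the fitted line to estimate the amount behind a detector response."""
    return (response - intercept) / slope


def merge_by_maximum(per_platform):
    """Keep, per metabolite, the maximum value and the platform that recorded it.

    per_platform maps platform name -> {metabolite name: normalized value}.
    """
    merged = {}
    for platform, values in per_platform.items():
        for metabolite, value in values.items():
            if metabolite not in merged or value > merged[metabolite][0]:
                merged[metabolite] = (value, platform)
    return merged


# Usage with invented numbers.
rel = relative_quantity(peak_area=5000, internal_std_area=2500, sample_weight_mg=10)
slope, intercept = fit_linear_curve([1, 2, 3, 4], [10, 20, 30, 40])
estimated = amount_from_response(25, slope, intercept)
merged = merge_by_maximum({
    "GC-MS": {"sucrose": 1.2, "citric acid": 0.4},
    "LC-MS": {"sucrose": 0.9, "chlorogenic acid": 2.0},
})
```

Recording the platform alongside the maximum keeps the provenance of each reported value, which mirrors the database convention of citing the more amenable technique first.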
Furthermore, analytical drift and different response factors were controlled platform‐to‐platform, batch‐to‐batch and study‐to‐study via the analysis of both a reference sample (quality control) and a reference metabolite (internal standard) to ensure robustness.
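One way a batch‐level drift check of this kind could look is sketched below. The text does not say which statistic or acceptance threshold was applied, so the coefficient of variation, the 20% cutoff, the function names, and the example responses are all assumptions made for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def flag_drifting_batches(qc_responses, max_cv=0.2):
    """Return the batches whose quality-control responses vary more than max_cv.

    qc_responses maps batch name -> internal standard responses measured in
    repeated injections of the reference (quality control) sample.
    max_cv is a hypothetical acceptance threshold, not a value from the paper.
    """
    return sorted(
        batch
        for batch, values in qc_responses.items()
        if coefficient_of_variation(values) > max_cv
    )


# Usage with invented responses: batch2 varies far more than batch1.
flagged = flag_drifting_batches({
    "batch1": [100, 102, 98],
    "batch2": [100, 150, 60],
})
```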

Conflicts of interest

The authors declare that they have no conflicts of interest in accordance with journal policy.

Author contributions

EP, MD and LP‐F devised the concept, generated the datasets, assembled the figures, compiled supplementary tables, and drafted the manuscript. DA, RB, BH, MR and RS selected plant materials, aided interpretation of results, and elaborated the manuscript. LABL‐L selected plant materials, aided interpretation of results, coordinated across centres, and elaborated the manuscript. PDF aided interpretation of results, drafted and edited the manuscript, secured funding and devised the concept.

Table S1. Database of metabolite concentration range per crop.

Table S2. Lists of recurrent unknowns identified per crop.

66 in total

Review 1. Metabolic niches in the rhizosphere microbiome: new tools and approaches to analyse metabolic mechanisms of plant-microbe nutrient exchange.

Authors: Richard P Jacoby; Stanislav Kopriva
Journal: J Exp Bot Date: 2019-02-20 Impact factor: 6.992

2. Genomic and metabolic prediction of complex heterotic traits in hybrid maize.

Authors: Christian Riedelsheimer; Angelika Czedik-Eysenberg; Christoph Grieder; Jan Lisec; Frank Technow; Ronan Sulpice; Thomas Altmann; Mark Stitt; Lothar Willmitzer; Albrecht E Melchinger
Journal: Nat Genet Date: 2012-01-15 Impact factor: 38.330

3. Sequencing wild and cultivated cassava and related species reveals extensive interspecific hybridization and genetic diversity.

Authors: Jessen V Bredeson; Jessica B Lyons; Simon E Prochnik; G Albert Wu; Cindy M Ha; Eric Edsinger-Gonzales; Jane Grimwood; Jeremy Schmutz; Ismail Y Rabbi; Chiedozie Egesi; Poasa Nauluvula; Vincent Lebot; Joseph Ndunguru; Geoffrey Mkamilo; Rebecca S Bart; Tim L Setter; Roslyn M Gleadow; Peter Kulakow; Morag E Ferguson; Steve Rounsley; Daniel S Rokhsar
Journal: Nat Biotechnol Date: 2016-04-18 Impact factor: 54.908

4. Sparse network modeling and metscape-based visualization methods for the analysis of large-scale metabolomics data.

Authors: Sumanta Basu; William Duren; Charles R Evans; Charles F Burant; George Michailidis; Alla Karnovsky
Journal: Bioinformatics Date: 2017-05-15 Impact factor: 6.937

Review 5. Genetic linkage analysis using DNA markers in sweetpotato.

Authors: Yuki Monden; Makoto Tahara
Journal: Breed Sci Date: 2017-02-11 Impact factor: 2.086

6. Trait variation and genetic diversity in a banana genomic selection training population.

Authors: Moses Nyine; Brigitte Uwimana; Rony Swennen; Michael Batte; Allan Brown; Pavla Christelová; Eva Hřibová; Jim Lorenzen; Jaroslav Doležel
Journal: PLoS One Date: 2017-06-06 Impact factor: 3.240

7. An Integrative Genetic Study of Rice Metabolism, Growth and Stochastic Variation Reveals Potential C/N Partitioning Loci.

Authors: Baohua Li; Yuanyuan Zhang; Seyed Abolghasem Mohammadi; Dongxin Huai; Yongming Zhou; Daniel J Kliebenstein
Journal: Sci Rep Date: 2016-07-21 Impact factor: 4.379

8. The 'Plantain-Optim' dataset: Agronomic traits of 405 plantains every 15 days from planting to harvest.

Authors: Sylvain Dépigny; Frédéric Tchotang; Médard Talla; Désirée Fofack; David Essomé; Jean-Pierre Ebongué; Bernard Kengni; Thierry Lescot
Journal: Data Brief Date: 2018-02-02

9. Molecular and Cytogenetic Study of East African Highland Banana.

Authors: Alžběta Němečková; Pavla Christelová; Jana Čížková; Moses Nyine; Ines Van den Houwe; Radim Svačina; Brigitte Uwimana; Rony Swennen; Jaroslav Doležel; Eva Hřibová
Journal: Front Plant Sci Date: 2018-10-04 Impact factor: 5.753

10. Metabolite profiling of yam (Dioscorea spp.) accessions for use in crop improvement programmes.

Authors: Elliott J Price; Ranjana Bhattacharjee; Antonio Lopez-Montes; Paul D Fraser
Journal: Metabolomics Date: 2017-10-14 Impact factor: 4.290
