Metabolomic selection for enhanced fruit flavor.

Vincent Colantonio¹, Luis Felipe V Ferrão¹, Denise M Tieman¹, Nikolay Bliznyuk²,³,⁴, Charles Sims⁵, Harry J Klee⁶, Patricio Munoz⁶, Marcio F R Resende⁶.

Abstract

Although they are staple foods in cuisines globally, many commercial fruit varieties have become progressively less flavorful over time. Due to the cost and difficulty associated with flavor phenotyping, breeding programs have long been challenged in selecting for this complex trait. To address this issue, we leveraged targeted metabolomics of diverse tomato and blueberry accessions and their corresponding consumer panel ratings to create statistical and machine learning models that can predict sensory perceptions of fruit flavor. Using these models, a breeding program can assess flavor ratings for a large number of genotypes, previously limited by the low throughput of consumer sensory panels. The ability to predict consumer ratings of liking, sweet, sour, umami, and flavor intensity was evaluated by a 10-fold cross-validation, and the accuracies of 18 different models were assessed. The prediction accuracies were high for most attributes and ranged from 0.87 for sourness intensity in blueberry using XGBoost to 0.46 for overall liking in tomato using linear regression. Further, the best-performing models were used to infer the flavor compounds (sugars, acids, and volatiles) that contribute most to each flavor attribute. We found that the variance decomposition of overall liking score estimates that 42% and 56% of the variance was explained by volatile organic compounds in tomato and blueberry, respectively. We expect that these models will enable an earlier incorporation of flavor as breeding targets and encourage selection and release of more flavorful fruit varieties.

Entities: Chemical

Keywords: artificial intelligence; flavor; fruit quality

Year: 2022 PMID: 35131943 PMCID: PMC8860002 DOI: 10.1073/pnas.2115865119

Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205

Plant breeders and geneticists have made continuous and substantial progress in the development of varieties that are more resilient and higher-yielding—much to the benefit of producers worldwide. Yet, during this extended period of progress, consumer-oriented quality traits such as flavor have been often neglected or treated as low-priority breeding targets, contributing to widespread consumer dissatisfaction with modern varieties of fruits and vegetables (1). An important reason for this low priority is a reward system that pays growers based on crop yield, leading to prioritization of breeding targets that are mainly producer-oriented. However, as consumer willingness to pay premiums for higher-quality products rises, demand for consumer-oriented traits in food production systems is increasing (2). Accordingly, a reemerging interest in fruit and vegetable flavor quality creates the need for high-yielding varieties with exceptional flavor profiles. Fruit flavor is the product of the complex interactions between the chemical composition of a fruit and the taste, olfaction, and psychology of the consumer (3–5). To breed and develop varieties with improved flavor properties, the genetic complexities of fruit flavor must be captured and assessed. Flavor is currently evaluated by consumer sensory panels or individually by breeders. Field evaluations are generally subjective and error-prone as they typically consist of the sensory preferences of one or few individuals. However, field evaluation has an advantage in that many varieties can be evaluated in a given day. In contrast, population-based sensory panels are more objective, accurate, and well-established, but they can be costly, time-consuming, and difficult to scale to a large breeding program. 
The difficulties associated with accurate flavor phenotyping have contributed to the lack of selection for fruit flavor and thereby contributed to the widespread consumer belief that commercial fruit flavor has declined (6, 7). Cheap and scalable flavor selection methods would greatly benefit the breeding process. The main driver of fruit flavor perception is its chemical composition. Fruits contain a diverse array of sugars, acids, and volatiles whose concentrations are driven by genetic and environmental effects. Sugars and acids are largely perceived by taste receptors on the tongue and the volatiles by receptors located in the olfactory epithelium (8). We hypothesized that by quantifying the chemical profile of a fruit and its corresponding consumer perception, models predicting consumer flavor preferences can be created. These prediction models can increase throughput of flavor phenotyping, allowing a breeder to make selections for improved flavor on hundreds of genotypes per season. This approach is analogous to the concept of genomic selection (9), where DNA markers are used in plant breeding programs to predict the genetic merit of individuals for highly complex traits. Here, we propose the use of statistical methods to model the metabolomic profile in a breeding population and predict flavor perception. Additionally, by leveraging the trained models for inference, specific metabolites that underlie consumer flavor preferences can be elucidated, identifying targets for marker-assisted selection (MAS) and for the food industry to enhance flavor in its products. In flavor studies, the most widespread statistical modeling approaches to date include multiple linear regression and partial least squares (PLS) regression (10, 11). 
However, the process of developing metabolomic-based prediction models can be challenging due to the large number of chemical compounds present in a fruit and the fact that the concentrations of many of the flavor-associated chemicals are correlated with one another due to shared biosynthetic pathways. Fortunately, breeders and quantitative geneticists are already dealing with similar types of data in the area of genomic selection to make selection of complex traits using genomic information. With the advent of genomic selection, for example, a variety of Bayesian linear regression models with different priors were proposed to predict complex phenotypes using DNA markers. These models included Bayes A and Bayes B (9), Bayes Cπ (12), Bayesian LASSO (13), and Bayesian ridge regression (14, 15), among others. Recently, there has been increasing interest in machine learning models applied to genomic (16) and metabolomic data (11) as well as metabolomic data applied to trait biomarker research (17–21). However, few empirical studies have applied machine learning models at the metabolome level, or specifically for the enhancement of fruit flavor. Here we address the limitations in flavor phenotyping and propose an indirect phenotyping approach that has significantly higher throughput compared to current standards. We assessed a range of statistical and machine learning models that take the chemical profile of a fruit and make predictions of its consumer flavor perception. To this end, we combined information at the metabolome and sensory panel level for two important horticultural crops, tomato and blueberry, and demonstrate that metabolomic prediction models can be employed in a breeding program to make simpler and more accurate selections for flavorful varieties. Additionally, we leverage the trained models to infer the contributions of volatiles, sugars, and acids to sensory perceptions and consumer likeability. 
Our results suggest that up to 56% of the variance associated with overall consumer liking can be attributed to volatile compounds. Furthermore, we demonstrate that machine learning approaches are generally the best predictors of consumer flavor preferences and metabolomic selection accuracies are superior to genomic selection models, highlighting the potential in breeding applications.

Results

Data.

In order to study the capacity of different prediction models and the importance of different metabolites in flavor perception, we performed an analysis of previously published data (4, 5, 10, 22) combined with new data added in this study for two fruit species: tomato (Solanum lycopersicum) and blueberry (Vaccinium spp.). For each fruit, targeted sets of sugars, acids, and volatiles were quantified in diverse accessions including commercial cultivars, heirloom varieties, and germplasm selections from the University of Florida tomato and blueberry breeding programs. The tomato population includes a greater range of genetically diverse materials than previously analyzed, while the blueberry population is more representative of an elite breeding population. Consumer sensory panels rated each accession for flavor attributes including sweetness, sourness, flavor intensity, and overall liking. Additionally, sensory perceptions of umami were quantified solely for tomato.

Network Analysis Recapitulates Metabolic Pathways.

A weighted correlation network analysis was performed on the metabolite concentrations across all fruit accessions for tomato and blueberry (Figs. 1 and 2). The results are largely consistent with knowledge of the individual biosynthetic pathways and provide insights into the relationships between pathways. For example, there are strong associations between the apocarotenoid volatiles (e.g., geranial and β-cyclocitral) and the fatty acid-derived volatiles (e.g., 1-pentanol and E-2-heptenal). Apocarotenoid volatiles are derived from precursors localized in plastids and their contents substantially increase during the conversion of chloroplasts to chromoplasts (23), while the precursors of fatty-acid-derived volatiles are membrane lipids. TomloxC is an essential enzyme for the synthesis of five- and six-carbon fatty acid-derived volatiles (24, 25). Recently, a potential link between these pathways was proposed with a quantitative trait locus (QTL) analysis that implicated a role for TomloxC in apocarotenoid synthesis, possibly by a cooxidation mechanism (26).
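The node sizes in the network figures are based on betweenness centrality. As a rough illustration only, the sketch below builds a simplified, unweighted network by thresholding pairwise Pearson correlations and scores each node with Brandes' algorithm. The 0.7 cutoff, the function names, and the thresholding step are assumptions made for this example; the study itself used a weighted correlation network analysis, which this does not reproduce.

```python
from collections import deque
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation; returns 0.0 by convention if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def correlation_network(profiles, threshold=0.7):
    """Link two metabolites when |r| across accessions reaches the threshold.

    `profiles` maps metabolite name -> concentrations (one value per accession).
    The cutoff is an illustrative assumption, not a value from the study.
    """
    graph = {name: set() for name in profiles}
    for u, v in combinations(profiles, 2):
        if abs(pearson(profiles[u], profiles[v])) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        stack = []
        preds = {v: [] for v in graph}
        n_paths = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        n_paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:  # breadth-first search counting shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:  # accumulate dependencies from the farthest nodes back
            w = stack.pop()
            for v in preds[w]:
                delta[v] += n_paths[v] / n_paths[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every unordered pair was visited from both endpoints, so halve the totals.
    return {v: s / 2.0 for v, s in score.items()}
```

A metabolite that sits on many shortest paths between other metabolites receives a high score, which is what a large node in such a figure would indicate.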

Fig. 1.

(A) Weighted correlation network analysis of tomato metabolites and their assigned clusters based on their known biochemical classification. The size of each metabolite node indicates betweenness centrality, a measure of how often a node exists on the shortest path between other nodes. The thickness of the lines connecting metabolites is scaled relative to the correlation between the metabolites. The identity of each metabolite is denoted by number in the legend. (B) Distribution of metabolite concentrations for each volatile group across the tomato population. Volatile concentrations are reported in nanograms per gram fresh weight per hour (ng/gfw/h) on a log10 scale.

Fig. 2.

(A) Weighted correlation network analysis of blueberry metabolites and their assigned clusters based on their known biochemical classification. The size of each metabolite node indicates betweenness centrality. The thickness of the lines connecting metabolites is scaled relative to the correlation between the metabolites. The identity of each metabolite is denoted by number in the legend. (B) Distribution of metabolite concentrations for each volatile group across the blueberry population. Volatile concentrations are reported in nanograms per gram fresh weight per hour (ng/gfw/h) on a log10 scale.

Contributions of Sugars, Acids, and Volatiles to Flavor Perceptions.

In order to determine if the fruit metabolome could explain variation in consumer sensory panel ratings, we partitioned the metabolites into modules according to their biochemical classifications (Figs. 1 and 2 and Datasets S1 and S2). We then separated the consumer sensory variance into aggregated components explained by each module (Fig. 3 and Dataset S3). We further combined the individual variance components into two main groups for analysis: sugars/acids and volatiles (Fig. 3). In both tomato and blueberry, a large proportion of the variance was explained by sugars/acids and volatiles, while little variance was attributed to the residuals (Fig. 3). Furthermore, the proportion of variance explained by sugars/acids varied across the flavor attributes and contrasted between the two species. For instance, 77% of the tomato sourness variance was explained by the content of sugars/acids, while these compounds only explained 43% of blueberry sourness. Similarly, while sugars/acids predominantly (60%) explained blueberry sweetness, a larger portion of the tomato phenotypic variance (62%) could be explained by the volatiles. As previously described (3), the results indicate the large influence that volatile compounds can have on sensory attributes in both species, which in turn highlights how important these compounds are to breeding programs for improvement of fruit flavor. For example, the variance decomposition of overall liking score estimates that 42% and 56% of the variance was explained by volatile organic compounds in tomato and blueberry, respectively.
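One way to write the partitioning described above is as a multi-kernel linear mixed model. The notation below is an assumption introduced for illustration; this section does not spell out the exact model. Here $\mathbf{y}$ holds the panel ratings for one attribute, $\mathbf{u}_k$ is the random effect of metabolite module $k$, $\mathbf{K}_k$ is a similarity matrix among accessions computed from that module's metabolites, and $p_k$ is the proportion of variance attributed to module $k$.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{1}\mu + \sum_{k=1}^{K} \mathbf{u}_k + \boldsymbol{\varepsilon},
\qquad \mathbf{u}_k \sim \mathcal{N}\!\left(\mathbf{0},\, \mathbf{K}_k \sigma_k^2\right),
\qquad \boldsymbol{\varepsilon} \sim \mathcal{N}\!\left(\mathbf{0},\, \mathbf{I}\sigma_\varepsilon^2\right), \\
p_k &= \frac{\sigma_k^2}{\sum_{j=1}^{K} \sigma_j^2 + \sigma_\varepsilon^2}.
\end{aligned}
```

Under this formulation, the share reported for a group such as "volatiles" would be the sum of $p_k$ over the modules belonging to that group, and the residual share is $\sigma_\varepsilon^2$ divided by the same denominator.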

Fig. 3.

Variation in sensory panel ratings explained by sugars/acids and volatiles overall, and by groups of metabolites of known biochemical classification in tomato and blueberry.

To further understand how the fruit chemical profile affected consumer flavor, we analyzed the variance explained by each metabolite module. The sugar module was a strong driver of liking (43% in tomato and 18% in blueberry) and sweetness (29% in tomato and 27% in blueberry), while the module representing acids drove sourness (54% in tomato and 38% in blueberry) in both fruits. Some volatile modules were found to make large contributions to flavor ratings. For instance, phenylalanine-derived and lipid-derived compounds contributed to sweetness perception (34 and 16%, respectively) and overall liking score (16 and 13%, respectively) in tomato. Lipid-derived volatiles and compounds grouped as carotenoid/terpenes explained 15 and 21% of blueberry overall liking score, respectively. These results are consistent with previous results that showed strong positive correlations of specific volatiles with fruit sweetness (4, 10).

Predicting Consumer Preferences.

Eighteen statistical and machine-learning methods were employed to predict sensory traits from sugar, acid, and volatile concentrations. Each model was evaluated in a 10-fold cross-validation and each fold was assessed by the correlation between predicted and observed consumer taste panel ratings (Fig. 4 and Datasets S4 and S5). The cross-validation was repeated 10 times and results were averaged for the final prediction accuracies. We observed the highest prediction accuracies from the XGBoost, gradient-boosting machines, and random-forest models. The XGBoost model showed an average improvement of 20% over the linear regression and 11% over PLS, models traditionally used in food science applications. The accuracy for the model that, on average, performed the best (XGBoost) ranged from 0.62 to 0.87 across all traits and in both species. We found the most predictable traits in tomato to be sweetness (0.80), flavor intensity (0.77), and sourness (0.69) and the most predictable traits in blueberry to be sourness (0.87) and sweetness (0.75). The improvement of the full model that accounted for all the compounds over the model that included only sugars and acids ranged from 3.2 to 36.7%.
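The evaluation scheme described above (10-fold cross-validation, repeated 10 times, scored by the correlation between predicted and observed ratings) can be sketched in a few lines. This is a minimal standard-library illustration, not the authors' pipeline: the k-nearest-neighbour predictor is only a stand-in for the 18 models in the study, and all function names, the random seed, and the way folds are drawn are assumptions.

```python
import random
from math import sqrt


def pearson(a, b):
    """Pearson correlation; returns 0.0 by convention if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def knn_predict(train_X, train_y, row, k=3):
    """Stand-in model: mean rating of the k training samples closest to `row`."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], row)),
    )[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def repeated_kfold_accuracy(X, y, folds=10, repeats=10, seed=0):
    """Average per-fold Pearson r over `repeats` reshuffled k-fold splits.

    X: list of metabolite profiles (one list of floats per sample).
    y: list of panel ratings for one flavor attribute.
    """
    rng = random.Random(seed)
    order = list(range(len(X)))
    scores = []
    for _ in range(repeats):
        rng.shuffle(order)  # new random partition for every repeat
        for f in range(folds):
            test = order[f::folds]
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            train_X = [X[i] for i in train]
            train_y = [y[i] for i in train]
            predicted = [knn_predict(train_X, train_y, X[i]) for i in test]
            observed = [y[i] for i in test]
            scores.append(pearson(predicted, observed))
    return sum(scores) / len(scores)
```

Swapping `knn_predict` for any other model leaves the scoring loop unchanged, which is what makes accuracies comparable across models under this scheme.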

Fig. 4.

(A) Accuracy of predicting flavor ratings from metabolome data across a range of statistical and machine learning models for tomato and blueberry. Averages and model rankings are inclusive of umami accuracies depicted in Datasets S4 and S5. (B) Accuracy of predicting perception traits for tomato using 70 individuals with genomic and metabolomic data. (C) Accuracies for tomato flavor prediction with a consistent test set of 39 samples and increasing training set sizes ranging from 50 to 170 samples.

To further evaluate the opportunity to use metabolomic selection in breeding and to understand its prediction potential compared to genomic selection, we compiled information from 70 varieties of tomato for which we had whole-genome sequence, chemical profile, and sensory panel data (5). Using a 10-fold cross-validation, we applied the genomic selection gBLUP (genomic best linear unbiased prediction) method (27) to predict the consumer sensory ratings from a subset of 79,821 single-nucleotide polymorphisms (SNPs) (Fig. 4 and Dataset S6). We then used the metabolomic information for the same 70 varieties and the same cross-validation partitioning to predict the panel ratings. These 70 genotypes represent a subset of the total 147 accessions. We found that metabolomic selection outperformed genomic selection in the prediction of all these complex traits, especially for sweetness and overall flavor liking. For these traits, the accuracies of metabolomic selection using 70 genotypes were 0.68 and 0.45 for sweetness and overall flavor liking, respectively. These traits were poorly predicted by gBLUP with accuracies of 0.16 and −0.11 for sweetness and overall flavor liking, respectively. 
While these results are not surprising, given the small population size and the fact that the metabolite data are capturing both genetic and environmental components of fruit flavor, they highlight the complexity of flavor perception as a breeding target and the potential of metabolomic selection as a phenotyping tool to support breeding programs compared with other available methods such as genomic selection. Next, in order to test how many samples are needed to train metabolomic selection models, we performed a subsampling analysis in tomato. For this analysis we randomly selected 39 samples as the test set and trained the model with increasing training set sample sizes from 50 to 170 in steps of 10. We repeated this process 10 times and averaged the accuracies at each sample size (Fig. 4 and Dataset S7). We found that the accuracies predominantly increased with increasing sample sizes but note that the accuracies can be relatively high for certain traits using as few as 50 samples. Sourness was more accurately predicted with the gBLUP and Bayes A models, while gradient-boosting machines achieved higher accuracies when predicting the more complex traits like overall liking.
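The subsampling analysis described above can be sketched as a learning-curve loop: hold out 39 samples, train on 50 to 170 samples in steps of 10, and average over 10 repeats. This is a hedged illustration under several assumptions: the test set is redrawn on each repeat, the training sets are nested (each larger set contains the smaller ones), and a k-nearest-neighbour predictor stands in for the models actually used. All names are hypothetical.

```python
import random
from math import sqrt


def pearson(a, b):
    """Pearson correlation; returns 0.0 by convention if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def knn_predict(train_X, train_y, row, k=5):
    """Stand-in model: mean rating of the k training samples closest to `row`."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], row)),
    )[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def learning_curve(X, y, n_test=39, sizes=range(50, 171, 10), repeats=10, seed=0):
    """Mean test-set accuracy (Pearson r) for each training-set size."""
    if len(X) - n_test < max(sizes):
        raise ValueError("not enough samples for the largest training set")
    rng = random.Random(seed)
    totals = {n: 0.0 for n in sizes}
    for _ in range(repeats):
        order = list(range(len(X)))
        rng.shuffle(order)
        test, pool = order[:n_test], order[n_test:]
        observed = [y[i] for i in test]
        for n in sizes:
            train = pool[:n]  # nested subsets of the remaining samples
            train_X = [X[i] for i in train]
            train_y = [y[i] for i in train]
            predicted = [knn_predict(train_X, train_y, X[i]) for i in test]
            totals[n] += pearson(predicted, observed)
    return {n: total / repeats for n, total in totals.items()}
```

With 209 samples, the defaults use every remaining sample at the largest training size (39 + 170 = 209), which matches the sample counts given for tomato in this article.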

Metabolites Associated with Desirable Flavor.

In order to find sugars, acids, and volatiles that enhance or suppress consumer sensory perceptions of flavor, models for each fruit were trained using all samples for which we had metabolome and sensory panel data (209 for tomato and 244 for blueberry). Two contrasting modeling approaches, Bayes A and gradient-boosting machines, were chosen for further inference analysis. In Bayes A, the beta coefficients indicate the individual additive effect of that chemical free of interactions. This coefficient predicts if a chemical is important for enhancing the flavor attribute (positive value) or decreasing the flavor attribute (negative value). For gradient-boosting machines the variable importance represents the marginal effect of that chemical including the interaction effects with other chemicals. This value is scaled between 0 and 100, where 0 is not an important predictor and 100 is an important predictor (Fig. 5).
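To make the two quantities concrete, the sketch below rescales raw importance scores to a 0 to 100 range and pairs each with the sign of an additive coefficient. The min-max rescaling is an assumption (the text only states that the value is scaled between 0 and 100), and the function names and input numbers are hypothetical placeholders, not results from the study.

```python
def scale_importance(raw):
    """Min-max rescale raw importances: weakest predictor -> 0, strongest -> 100."""
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        return {name: 0.0 for name in raw}
    return {name: 100.0 * (value - lo) / (hi - lo) for name, value in raw.items()}


def summarize(raw_importance, beta):
    """Rank metabolites by scaled importance and label the direction of effect.

    raw_importance: unscaled importance per metabolite (e.g. from a tree ensemble).
    beta: additive coefficient per metabolite (e.g. from a linear model).
    """
    scaled = scale_importance(raw_importance)
    rows = []
    for name in sorted(scaled, key=scaled.get, reverse=True):
        if beta[name] > 0:
            direction = "enhancer"
        elif beta[name] < 0:
            direction = "suppressor"
        else:
            direction = "neutral"
        rows.append((name, scaled[name], direction))
    return rows


# Placeholder inputs for illustration only.
example = summarize(
    {"metabolite_a": 2.0, "metabolite_b": 6.0, "metabolite_c": 4.0},
    {"metabolite_a": -0.1, "metabolite_b": 0.3, "metabolite_c": 0.0},
)
```

Reading the two values together separates how much a compound matters to the prediction from whether it pushes the rating up or down.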

Fig. 5.

Relative importance estimated with gradient-boosting machines (x axis) and β coefficients estimated with Bayes A (y axis) of each metabolite in predicting ratings of flavor attributes in blueberry and tomato. Colors indicate the type of metabolite. TA, titratable acidity.

For sweetness in tomato, we found glucose and fructose to be the most important sensory perception enhancers. The gradient-boosting machines also estimated 1-penten-3-one and 2-phenylethanol to be important for perceived sweetness, while the Bayes A model highly ranked two volatiles (E-2-pentenal and 4-carene) as important for sweetness enhancement. E-2-pentenal was also found to be an important contributor to overall flavor intensity and umami. In blueberry, components important for liking included soluble solids, fructose, and glucose. Additionally, volatiles found to be important for enhancing liking included 2-undecanone, 2-hexenyl-butyrate, and ethyl propionate, while volatiles that were negative to liking included eucalyptol and phenylacetaldehyde. Interestingly, two lipid-derived volatiles (2-hexen-1-al and 2-pentenal) had a high positive contribution and the highest negative contribution to sourness in blueberry, respectively. Glutamic acid was highly ranked by both methods as influencing umami perception for tomato, which by definition represents the taste of the amino acid L-glutamate. Three phenylalanine-derived compounds (benzyl cyanide, 2-phenylethanol, and 1-nitro-2-phenylethane) were also highly ranked by gradient-boosting machines as umami influencers. It is important to note, however, that this targeted metabolomic panel is enriched for putative sugar-enhancing compounds and may be limited in the characterization of compounds affecting umami.

Discussion

Fruit flavor is a complex trait at the intersection between the fruit biochemistry and the consumer sensory perception. Quantification of sensory perception using consumer flavor panels is time- and resource-consuming and not readily amenable to a high-throughput assay, which has hindered plant breeders from selecting for fruit flavor for many years. This has contributed to the widespread decline in consumer satisfaction with many commercial fruit varieties (2, 7). Recently, different high-throughput phenotyping applications were proposed to use two-dimensional visible light imaging as a proxy for plant biomass (28), reflectance ratios as proxies for yield (29), hyperspectral reflectance as a proxy for leaf chlorophyll and nitrogen content (30), and canopy temperature as a proxy for drought response (31). Here, in order to create higher-throughput flavor phenotyping methods, we applied statistical and machine learning models that can predict consumer sensory panel ratings from the chemistry of a fruit.

Chemical Profiling of a Fruit.

Although flavor is a complex trait, relatively simple metrics have historically been used to quantify flavor preferences in most breeding programs, including titratable acidity, soluble solids, firmness, and the breeder “bite tests” (32, 33). Using two independent fruit species we showed that volatiles play an important role in consumer flavor perception and should therefore be broadly assayed when selecting for enhanced flavor profiles. In this case, a metabolomic approach will achieve a higher selection accuracy by identifying metabolites with small but nonnegligible effects. In recent years, the cost of targeted and untargeted metabolomics has decreased (34) and the throughput for metabolite profiling has increased. The largest cost in this system is labor to process the fruit and to analyze the data. For the profiling described here, we estimate an in-house cost per sample similar to the per-sample genotyping costs used for genomic selection in many species, which in turn can be an order of magnitude cheaper than sensory evaluations. While these estimates assume an in-house analysis and do not consider the capital expenditure to acquire the instrumentation, they highlight the per-sample cost reduction over the years and the feasibility of high-throughput metabolite profiling in plant breeding programs. Network analysis of tomato flavor compounds demonstrated correlations among biochemicals in the same biosynthetic pathways (Fig. 1). The associations are consistent with the postulated biochemical groupings identified by Buttery and Ling (35), Baldwin et al. (36), and Mathieu et al. (37). The chemistry of blueberry flavor has not been as extensively studied as that of tomato. Our results offer insights into the biochemical pathways for blueberry volatile synthesis. For instance, the long chain lipid–derived volatiles (denoted in Fig. 2 as numbers 34, 35, 36, 37, and 38) are found linked together within the lipid-derived volatile cluster. 
Also, linalool levels are not correlated with levels of other terpenes, suggesting that linalool biosynthesis may not be regulated in the same manner as other terpenes.

Applications of Metabolomic Selection to Breeding of Fruits and Vegetables.

One alternative to phenotype fruits and vegetables for flavor quality is the establishment of consumer sensory panels. This approach has low throughput, from a breeding standpoint, because a sensory panel can usually only taste a limited number of samples (4–6) per day. The tomato and blueberry breeding programs at the University of Florida have been using consumer sensory panels to guide breeding decisions for several years (5, 10). To do this, selections currently in development by the breeding programs are subjected to biochemical analysis and simultaneous consumer evaluations each year. However, due to the low throughput of the assay, sensory characterization is typically performed in the final stages of selection prior to release, when favorable alleles may no longer be segregating in the population. The use of metabolomic profiling as a phenotyping assay can enable accurate characterization of flavor profiles in earlier stages of a breeding program, when more genetic variability is available for selection (Fig. 6). Metabolic profiling at earlier stages of the breeding program opens up the possibility of identification of superior flavor genotypes that may otherwise have been discarded. Chemical profiling of fruits can capture the genetic potential of the variety as well as environmental variability. To reduce this variability and generate a phenotype that better represents the genetic potential of the individual, the breeder can characterize replicates from different environments and/or harvests. Replicates within plots in a single experiment, within harvest dates within a season, or even within environments can be pooled prior to running the instrument, maintaining the per-sample cost and resulting in an average prediction of the genotypic effect. 
Furthermore, in situations where fruit quality is under the influence of genotype-by-environment interactions, the breeder may choose to estimate fruit quality stability by profiling the chemical composition in each environment. While this additional analysis would increase the per-genotype cost of the analysis, the information would facilitate selection of stable genotypes that perform well across multiple environments.

Fig. 6.

Schematic representation of how the use of metabolomic selection could be applied in earlier stages of a breeding program, when compared to sensory panels.

Moreover, the use of metabolomic selection to estimate flavor perception complements a molecular breeding program. A by-product of applying metabolomic selection is the metabolomic profiling of many breeding lines, which in turn enables QTL mapping or genome-wide association study (GWAS) against metabolomic datasets. Thus, flavor-related metabolites identified by metabolomic selection could then be further used in GWAS analysis to identify the contributing genes/loci and create markers for molecular breeding (5, 22). This two-step approach can enable the use of MAS at the earliest stages of a breeding program and thereby speed up the genetic enrichment for flavor-associated traits (Fig. 6, step 2). This approach is especially useful for fruit crops where there is much less available information on markers affecting flavor chemical composition. It is important to note that the chemical composition of a fruit can be highly affected by weather and agronomic practices. Like other quantitative phenotypes currently evaluated in breeding programs, flavor-related traits have large variability and low heritability and are subjected to complex interaction effects (38–40). With the availability of data from multiple environments, the MAS or genomic selection application can also be tailored to selecting early for stability of important metabolic classes. However, MAS alone cannot address the complexity of flavor perception. Hence, the value of metabolomic selection is derived from including all metabolites in the prediction models, even those with small effects, leading to better overall estimates of flavor perception. 
Thus, the most practical application of metabolomic selection is in the middle stages of a breeding program where genes involved in the biosynthesis of inferred volatiles still retain enough genetic variability to select flavorful cultivars (Fig. 6, steps 3 and 4). Finally, consumer sensory analyses can be restricted to the final stages, in which a few target genotypes will be subject to consumer evaluations prior to release (Fig. 6, step 5).

Machine Learning Models Can Accurately Predict Flavor Attributes.

The use of metabolomics to predict flavor attributes has important implications not only in plant breeding but also in food science and genetics research. Prediction using metabolomics is challenging due to the correlated nature of the metabolomic predictors and because it requires a large number of sensory panels for model calibration. Flavor prediction has been attempted before using linear regression models (41, 42), random-forest models (39), and PLS regression (10), achieving variable levels of prediction accuracy. One of the objectives of our work was to evaluate a range of statistical and machine learning models to determine the best performers for metabolite-based phenotype prediction of flavor quality traits. Importantly, we wanted to assess the predictive power of methods known to handle correlated features well and thus simultaneously predict the effect of all metabolites. Identification of the most accurate predictive models provides a simple way to improve phenotyping accuracy with the same available dataset. Here, machine learning models such as gradient-boosting machines and XGBoost were the most predictive across all the traits and in both species, whereas multiple linear regression and PLS methods were found to be the least predictive. Considering that PLS is still the standard in food science applications (43), these proposed predictive models show marked improvements with increases relative to PLS ranging from 3.3% for sweetness in blueberry to 44.6% for umami in tomato. Furthermore, the fact that the models worked well in two entirely different systems (blueberry and tomato; breeding population and diversity panel) supports the effectiveness of the models proposed. To better understand the factors affecting flavor perception in each fruit species, we grouped compounds based on their biochemical classification and estimated a proportion of the phenotypic variance associated with each group. 
Biochemical pathways that were represented by a small number of chemicals were also grouped to minimize the effect of sampling variance in the creation of distance (variance/covariance) matrices (44). As would be expected, sugars (glucose, fructose, and soluble solids) were important predictors of sweetness as well as overall liking in both crops, while acids explained a large portion of the sourness variance. By grouping volatiles by their biochemical pathway, we were able to estimate a proportion of the total variance jointly explained by the chemicals within each group. Phenylalanine- and lipid-derived volatiles explained a large fraction of flavor variance in tomato, while lipid-derived, esters, and carotenoid/terpenoid volatiles explained most of the blueberry variance for liking score. Interestingly, ester compounds were shown to be negatively selected in red-fruited tomato as compared to related green-fruited species (45, 46), which potentially explains the lack of contribution to liking score in tomato contrasting to blueberry. Although tomato is botanically a fruit, it is not used as such in most cuisines. Thus, the fruity esters that are so important for flavor in most fruits do not serve the same function in a tomato. The statistical models were used to infer which volatiles contribute to each flavor attribute (Fig. 5). Although many of these compounds have been shown to contribute to tomato liking and flavor intensity, our results show that several compounds including E-3-hexen-1-ol; (E,E)-2,4-decadienal; and benzyl alcohol are important flavor components. Although benzyl alcohol and (E,E)-2,4-decadienal were shown to contribute to flavor intensity, they did not contribute to overall liking when a simple regression or multivariate analysis was used (4, 5). Also, the contributions of methional and benzothiazole to sourness intensity are interesting, as these compounds have not typically been associated with sour flavor. 
Methional odor is described as malty or cooked potato-like, while benzothiazole odor is described as sulfurous or meaty. Multiple linear regression analysis of volatiles associated with sweetness identified three that contributed to sweetness independently of sugars, but the relative contribution of these volatiles was not determined (4). By grouping volatiles by biochemical pathway and using a linear mixed model, the important role of volatiles in sweetness perception was highlighted. These results also emphasize the need to include volatile detection in breeding programs because, by focusing only on sugars and acids during breeding, part of the flavor profile may be lost (Fig. 3). On the other hand, the results suggest that the magnitude of the effect of each individual volatile is much smaller than the individual effect of sugar compounds, highlighting the complexity of breeding for fruit flavor and the challenges of improving flavor using marker-assisted selection (MAS). The important contribution of volatiles to overall liking of tomatoes is illustrated by the negative effects of extended refrigeration on volatile contents and consumer preferences (47). Refrigeration substantially reduces volatile contents but not sugars or acids (48).

Conclusions and Future Directions.

In this work, we compared different algorithms for predicting consumer preferences. This information can help plant breeding programs improve the flavor perception of new varieties. It is important to note that while we believe that the approach outlined here is generally useful, the specific chemical contributions to overall liking will likely vary with the ethnic and geographic makeup of the consumer panel. Future extensions of this approach could include the modeling of information and parameters for each individual in the panel, such as the inclusion of demographic parameters to predict more nuanced variations in taste preferences. In summary, by creating predictive models for consumer perceptions of flavor, we are able to increase the throughput of flavor phenotyping and provide new tools to make more informed, flavorful selections in breeding programs. Through inference, candidate flavor enhancers and suppressors were identified, indicating the possibility of their use as natural food additives in the food industry. Furthermore, genes involved in the biosynthesis of these flavor-enhancing or flavor-suppressing metabolites can now be targets for marker-assisted selection or direct engineering of more flavorful fruit varieties.

Materials and Methods

Prediction analysis was carried out in two fruit species: tomato (S. lycopersicum) and blueberry (Vaccinium spp.). For tomato, 68 sugars, acids, and volatiles were analyzed in 147 genotypes grown and evaluated in multiple seasons. A total of 209 samples were used, with 160 samples having been previously evaluated (4, 5). For blueberry, firmness and 55 sugars, acids, and volatiles were analyzed. Firmness was only available for a small number of genotypes in blueberry, but it was kept in the model since it is an important component of blueberry quality (49). Sixty-three genotypes were grown and evaluated in multiple seasons for a total of 244 samples, of which 164 were evaluated previously (10). Fruit flavor of tomato and blueberry accessions was assessed by consumer sensory panels. Our sensory panels averaged ∼80 participants sampled from a diverse university population (Datasets S8 and S9) with the intention of accounting for potential person-to-person variation in flavor preferences. This study was approved by the University of Florida Institutional Review Board 2 (case #2003-U-0491). All participants provided informed consent. Panels were conducted in the Food Science and Human Nutrition Department at the University of Florida in Gainesville, FL. Flavor attributes including sweetness, sourness, flavor intensity, and overall liking were rated. Additionally, sensory perceptions of umami were quantified solely for tomato. Overall liking was rated on a scale from −100 to 100, while the remaining attributes were rated from 0 to 100 (3). All data were normalized to a mean of 0 and a variance of 1 for further analyses. Missing data were imputed by the mean value per metabolite. Volatile concentrations were quantified by gas chromatography as described in ref. 23, while sugars, soluble solids, and acids were quantified as described in ref. 50. Sensory analysis was conducted as described in Tieman et al. (4), and scaled data can be found in Datasets S1 and S2. 
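The mean imputation and scaling described above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the code used in the study; the function name `impute_and_standardize` and the toy values are hypothetical. The use of the sample variance (n − 1 denominator) is an assumption, since the text does not state which estimator was applied, and the sketch assumes each metabolite has at least two distinct observed values.

```python
import math

def impute_and_standardize(columns):
    """Mean-impute missing values (None), then scale each column to mean 0, variance 1.

    `columns` maps a metabolite name to its list of measurements.
    Assumes every column has at least two distinct observed values.
    """
    result = {}
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        col_mean = sum(observed) / len(observed)
        # Replace each missing value with the mean of that metabolite.
        filled = [col_mean if v is None else v for v in values]
        # Mean imputation leaves the column mean unchanged, so col_mean still applies.
        n = len(filled)
        variance = sum((v - col_mean) ** 2 for v in filled) / (n - 1)  # assumption: n - 1
        sd = math.sqrt(variance)
        result[name] = [(v - col_mean) / sd for v in filled]
    return result

# Toy data (hypothetical values, not taken from the study).
toy = {
    "glucose": [1.0, 2.0, None, 3.0],
    "citric_acid": [0.5, None, 0.7, 0.9],
}
scaled = impute_and_standardize(toy)
```

In this sketch an imputed entry ends up at exactly 0 after scaling, because it sits at the column mean.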
All blueberry data collection was described in Gilbert et al. (10). Network analysis was performed using the R package WGCNA (51). Briefly, the pairwise Pearson correlation coefficient between each pair of metabolites was used to construct a weighted metabolite coexpression network. The process assumed an unsigned network and the network was visualized and represented using Cytoscape 3.7.1 (52). The network for each species is provided as a Cytoscape file in the GitHub repository.
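As a rough illustration of how an unsigned weighted network can be derived from pairwise Pearson correlations, the sketch below raises the absolute correlation to a soft-thresholding power, which is how WGCNA defines adjacency for unsigned networks. It is a Python stand-in for the R workflow, not the study's code. The power used in the study is not stated in this section, so `power=6` is an assumption, and the metabolite profiles are hypothetical toy values.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def unsigned_adjacency(profiles, power=6):
    """Edge weight |r| ** power for every pair of metabolites.

    `power` is the soft-thresholding exponent (assumed value, see lead-in).
    Taking the absolute value means negative and positive correlations
    of the same magnitude give the same edge weight (unsigned network).
    """
    names = list(profiles)
    adjacency = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(profiles[a], profiles[b])
            adjacency[(a, b)] = abs(r) ** power
    return adjacency

# Toy profiles across four samples (hypothetical values).
profiles = {
    "glucose": [1.0, 2.0, 3.0, 4.0],
    "fructose": [2.0, 4.0, 6.0, 8.0],
    "citric_acid": [4.0, 3.0, 2.0, 1.0],
}
edges = unsigned_adjacency(profiles)
```

The resulting edge list could then be exported to a network viewer; the export step is omitted here.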

Calculating Contributions of Metabolites to Flavor Ratings.

To estimate the proportion of variance in flavor ratings that each metabolite group explains, we divided the metabolites identified in tomato and blueberry into six (Nonaromatic Amino Acid-derived, Carotenoid/Terpenes, Lipid-derived, Phe-derived, Sugars, and Acids) and seven groups (Nonaromatic Amino Acid-derived, Carotenoid/Terpenes, Lipid-derived, Ester, Phe-derived, Sugars, and Acids), respectively. In tomato, for example, we fit a linear model in which $\mathbf{y} = \mathbf{1}\mu + \sum_{k=1}^{6} \mathbf{Z}_k \mathbf{u}_k + \mathbf{e}$, where element $y_i$ of $\mathbf{y}$ is the averaged consumer rating for cultivar $i$, $\mu$ is the fixed model intercept, $\mathbf{e}$ is a vector of normally distributed and independent random residual effects, $\mathbf{Z}_k$ are design matrices for the random effects associated with each biochemical group, and $\mathbf{u}_k$ are the random terms associated with the chemical groups. For each random term $\mathbf{u}_k$, we assumed $\mathbf{u}_k \sim \mathrm{MVN}(\mathbf{0}, \mathbf{K}_k \sigma^2_k)$, where MVN denotes the multivariate normal distribution and $\mathbf{K}_k$ represents the Gaussian kernel matrix built from the pairwise Euclidean distances computed from the chemicals in a given group. The proportion of the variance explained by a given metabolomic group was determined by $\sigma^2_k / \left(\sum_{j} \sigma^2_j + \sigma^2_e\right)$, where $\sigma^2_k$ is the variance component estimated for a given metabolomic group and the denominator is the total variance, that is, the sum of the variance explained by all chemical groups ($\sum_{j} \sigma^2_j$) and the residual term ($\sigma^2_e$). To further represent the contribution of sugars/acids versus volatiles, we also present these separately by summing the variance components estimated within each group. All analyses were carried out using the ASReml-R package (53).
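The two computational pieces of this section, building a Gaussian kernel from Euclidean distances and turning variance components into proportions, can be sketched as follows. This is an illustrative Python sketch, not the ASReml-R analysis. The kernel form exp(−h·d²/mean(d²)), the `bandwidth` parameter, and all numeric values are assumptions, since the exact kernel parameterization is not given in this section; fitting the mixed model itself is not shown.

```python
import math

def gaussian_kernel(rows, bandwidth=1.0):
    """Gaussian kernel between samples from squared Euclidean distances.

    `rows[i]` holds the scaled concentrations of one group's metabolites
    for sample i. Distances are divided by their mean off-diagonal value
    (an assumed normalization). Needs at least two distinct rows.
    """
    n = len(rows)
    sq = [[sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
           for j in range(n)] for i in range(n)]
    off_diag = [sq[i][j] for i in range(n) for j in range(n) if i != j]
    scale = sum(off_diag) / len(off_diag)
    return [[math.exp(-bandwidth * sq[i][j] / scale) for j in range(n)]
            for i in range(n)]

def variance_proportions(group_variances, residual_variance):
    """Share of total variance attributed to each group's variance component."""
    total = sum(group_variances.values()) + residual_variance
    return {group: v / total for group, v in group_variances.items()}

# Three samples described by two metabolites of one group (hypothetical).
K = gaussian_kernel([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]])

# Hypothetical variance components, as if returned by a fitted mixed model.
components = {"sugars": 0.30, "acids": 0.10, "volatiles": 0.40}
proportions = variance_proportions(components, residual_variance=0.20)
```

Summing the proportions of several groups, for example all volatile groups, gives their joint share under this decomposition.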

Comparing Genomic and Metabolomic Selection.

To compare prediction performance between genomic and metabolomic selection models, we organized a group of 70 tomato accessions that had whole genome sequencing, metabolomic evaluation, and sensory panel information. The genomic data comprised 26,262,280 SNPs, which were mapped to the S. lycopersicum reference genome SL3.0 as described in ref. 5. We applied additional quality filters and retained only biallelic SNPs with minor allele frequencies ≥0.1, excluded markers mapped to chromosome 0 (unassigned scaffolds), and allowed no more than 30% missing data and a 20% heterozygosity rate. Using the SNPRelate R package (54), we removed redundant SNPs by pruning markers with r² ≥ 0.9 in a 100-kilobase genome window. After this step, we retained 79,821 SNPs for use in the genomic prediction steps. Sensory traits were predicted using gBLUP models and compared to metabolomic predictions. The general model for genetic values is $\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{u} + \mathbf{e}$, where $\mathbf{y}$ is the vector of observed values, $\mu$ is the fixed model intercept, $\mathbf{Z}$ is a design matrix that relates the observations to the vector of random genetic effects $\mathbf{u}$, and $\mathbf{e}$ is the vector of residuals. For genomic prediction, the random effect has null mean and a kernel covariance matrix ($\mathbf{K}$) that represents the realized relationship among individuals computed as described by ref. 27. For metabolomic prediction, the kernel was defined in the Euclidean space as described by ref. 55. Residuals were defined as normally distributed and independent. As the number of accessions with metabolomic data is larger than the number of genotyped individuals, we considered the same number of individuals for the genomic and metabolomic prediction (70 individuals). To evaluate the prediction performance of each model, a 10-fold cross-validation was employed, as described in the following section.
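The per-marker quality filters listed above can be expressed as a short predicate. The sketch below is illustrative Python, not the pipeline used in the study. It assumes genotypes are coded as alternate-allele counts (0, 1, 2, with `None` for missing), which implies biallelic markers, and it assumes that the missing-data and heterozygosity thresholds are applied per marker, which the text does not state explicitly. The function name `passes_filters` is hypothetical, and LD pruning is not shown.

```python
def passes_filters(chrom, genotypes, min_maf=0.1, max_missing=0.3, max_het=0.2):
    """Return True if a biallelic SNP passes the quality filters.

    chrom     : chromosome label; "0" denotes unassigned scaffolds.
    genotypes : alternate-allele counts per accession (0, 1, 2) or None.
    """
    if chrom == "0":
        return False  # drop markers on unassigned scaffolds
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    missing_rate = 1 - len(called) / len(genotypes)
    if missing_rate > max_missing:
        return False
    # Heterozygosity is computed among called genotypes only (assumption).
    het_rate = sum(1 for g in called if g == 1) / len(called)
    if het_rate > max_het:
        return False
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    return maf >= min_maf

# Toy markers (hypothetical genotypes for ten or twenty accessions).
common_snp = [0, 0, 2, 2, 0, 2, 0, 0, 2, 0]
rare_snp = [0] * 19 + [1]
sparse_snp = [None, None, None, None, 0, 0, 2, 2, 0, 2]

kept = [passes_filters("1", g) for g in (common_snp, rare_snp, sparse_snp)]
```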

Cross-Validation.

To evaluate the predictive performance of each model, a 10-fold cross-validation was employed. In this way, the dataset was randomly split into 10 equal groups of varieties. For each of the 10 iterations, nine groups of varieties were used to train the model (training set), and one unseen group of varieties was used as a “holdout” group to test the model (test set). During training, a secondary, nested 10-fold cross-validation was used to calibrate the tuning parameters of the machine learning models. The root-mean-squared error between predicted and observed flavor ratings in the secondary test set was minimized to obtain the optimal parameter values for the primary model. The trained models were then applied to the metabolite concentrations of the varieties within the primary test set, and predicted flavor ratings were obtained. The correlation between predicted and observed flavor ratings was recorded. The average correlation of predicted and observed flavor ratings in the test set is referred to here as the accuracy of the model.
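The nested scheme described above can be sketched in a few lines. This Python example is a simplified illustration, not the study's R implementation. It stands in a one-predictor ridge regression for the actual models, tunes a single penalty `lam` by inner-fold RMSE, and reports the mean outer-fold Pearson correlation as accuracy. All function names, the candidate penalties, and the simulated data are hypothetical. Folds are formed over individual samples here, whereas the study split by variety.

```python
import math
import random

def k_folds(indices, k, rng):
    """Shuffle indices and split them into k nearly equal folds."""
    shuffled = indices[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def fit_ridge(x, y, lam):
    """One-predictor ridge regression; returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / (sxx + lam)
    return my - slope * mx, slope

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def nested_cv(x, y, lambdas, k=10, seed=1):
    """Mean outer-fold correlation, with lam tuned by inner-fold RMSE."""
    rng = random.Random(seed)
    all_idx = list(range(len(x)))
    accuracies = []
    for test in k_folds(all_idx, k, rng):
        held_out = set(test)
        train = [i for i in all_idx if i not in held_out]
        inner = k_folds(train, k, rng)

        def inner_rmse(lam):
            errors = []
            for val in inner:
                val_set = set(val)
                fit_idx = [i for i in train if i not in val_set]
                b0, b1 = fit_ridge([x[i] for i in fit_idx],
                                   [y[i] for i in fit_idx], lam)
                errors.append(rmse([y[i] for i in val],
                                   [b0 + b1 * x[i] for i in val]))
            return sum(errors) / len(errors)

        best_lam = min(lambdas, key=inner_rmse)
        # Refit on the full training set with the tuned penalty.
        b0, b1 = fit_ridge([x[i] for i in train], [y[i] for i in train], best_lam)
        accuracies.append(pearson([y[i] for i in test],
                                  [b0 + b1 * x[i] for i in test]))
    return sum(accuracies) / len(accuracies)

# Simulated data: one metabolite strongly related to a rating (hypothetical).
data_rng = random.Random(0)
x = [data_rng.gauss(0, 1) for _ in range(60)]
y = [2 * v + data_rng.gauss(0, 0.5) for v in x]
accuracy = nested_cv(x, y, lambdas=[0.01, 0.1, 1.0, 10.0])
```

The key design point is that the held-out outer fold never influences the choice of `lam`, so the reported accuracy is not inflated by tuning.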

Statistical Models.

A diverse sample of 18 statistical and machine learning models representing a range of regression, regularization, genomic selection, decision tree, and neural network models was chosen for assessment. These include a linear model and PLS as our baseline models; regularization methods such as ridge regression, elastic net, and LASSO; kernel methods such as support vector machines, relevance vector machines, and reproducing kernel Hilbert space; neural network models such as a multilayer perceptron neural network and a Bayesian neural network; decision tree-based models such as random forest, gradient boosting machines, and XGBoost; and models frequently used in genomic selection such as Bayes A, Bayes B, and Bayes Cπ. Each model has its individual strengths, weaknesses, and assumptions. Here we assess which models are most useful for the application of flavor phenotyping by metabolomic selection. All models were implemented in R (56). Each model is described in more detail in . The Bayesian models were implemented in BGLR (57) and the machine learning models were implemented with caret (58) and a package specific to each model (Dataset S10).

41 in total

1. Prediction of total genetic value using genome-wide dense marker maps.

Authors: T H Meuwissen; B J Hayes; M E Goddard
Journal: Genetics Date: 2001-04 Impact factor: 4.562

Review 2. Metabolomics-assisted breeding: a viable option for crop improvement?

Authors: Alisdair R Fernie; Nicolas Schauer
Journal: Trends Genet Date: 2008-11-21 Impact factor: 11.639

3. A chemical genetic roadmap to improved tomato flavor.

Authors: Denise Tieman; Guangtao Zhu; Marcio F R Resende; Tao Lin; Cuong Nguyen; Dawn Bies; Jose Luis Rambla; Kristty Stephanie Ortiz Beltran; Mark Taylor; Bo Zhang; Hiroki Ikeda; Zhongyuan Liu; Josef Fisher; Itay Zemach; Antonio Monforte; Dani Zamir; Antonio Granell; Matias Kirst; Sanwen Huang; Harry Klee
Journal: Science Date: 2017-01-27 Impact factor: 47.728

Review 4. Better fruits and vegetables through sensory analysis.

Authors: Linda M Bartoshuk; Harry J Klee
Journal: Curr Biol Date: 2013-05-06 Impact factor: 10.834

Review 5. The genetics of fruit flavour preferences.

Authors: Harry J Klee; Denise M Tieman
Journal: Nat Rev Genet Date: 2018-06 Impact factor: 53.242

6. Aroma and quality of breads baked from old and modern wheat varieties and their prediction from genomic and flour-based metabolite profiles.

Authors: Friedrich Longin; Heiner Beck; Hermann Gütler; Wendelin Heilig; Michael Kleinert; Matthias Rapp; Norman Philipp; Alexander Erban; Dominik Brilhaus; Tabea Mettler-Altmann; Benjamin Stich
Journal: Food Res Int Date: 2019-12-09 Impact factor: 6.475

7. The chemical interactions underlying tomato flavor preferences.

Authors: Denise Tieman; Peter Bliss; Lauren M McIntyre; Adilia Blandon-Ubeda; Dawn Bies; Asli Z Odabasi; Gustavo R Rodríguez; Esther van der Knaap; Mark G Taylor; Charles Goulet; Melissa H Mageroy; Derek J Snyder; Thomas Colquhoun; Howard Moskowitz; David G Clark; Charles Sims; Linda Bartoshuk; Harry J Klee
Journal: Curr Biol Date: 2012-05-24 Impact factor: 10.834

8. Identification of a specific isoform of tomato lipoxygenase (TomloxC) involved in the generation of fatty acid-derived flavor compounds.

Authors: Guoping Chen; Rachel Hackett; David Walker; Andy Taylor; Zhefeng Lin; Donald Grierson
Journal: Plant Physiol Date: 2004-09-03 Impact factor: 8.340

9. Identification of loci affecting flavour volatile emissions in tomato fruits.

Authors: Denise M Tieman; Michelle Zeigler; Eric A Schmelz; Mark G Taylor; Peter Bliss; Matias Kirst; Harry J Klee
Journal: J Exp Bot Date: 2006-02-10 Impact factor: 6.992

10. Exploring Blueberry Aroma Complexity by Chromatographic and Direct-Injection Spectrometric Techniques.

Authors: Brian Farneti; Iuliia Khomenko; Marcella Grisenti; Matteo Ajelli; Emanuela Betta; Alberto Alarcon Algarra; Luca Cappellin; Eugenio Aprea; Flavia Gasperi; Franco Biasioli; Lara Giongo
Journal: Front Plant Sci Date: 2017-04-26 Impact factor: 5.753
