
Discriminating Dietary Responses by Combining Transcriptomics and Metabolomics Data in Nutrition Intervention Studies.

Kathryn J Burton-Pimentel¹, Grégory Pimentel¹, Maria Hughes^2,3,4, Charlotte Cjr Michielsen⁵, Attia Fatima^2,4, Nathalie Vionnet⁶, Lydia A Afman⁵, Helen M Roche^2,3,4,7, Lorraine Brennan⁸, Mark Ibberson^9,10, Guy Vergères¹.

Abstract

SCOPE: Combining different "omics" data types in a single, integrated analysis may better characterize the effects of diet on human health.
METHODS AND RESULTS: The performance of two data integration tools, similarity network fusion tool (SNFtool) and Data Integration Analysis for Biomarker discovery using Latent variable approaches for "Omics" (DIABLO; mixOmics), in discriminating responses to diet and metabolic phenotypes is investigated by combining transcriptomics and metabolomics datasets from three human intervention studies: a postprandial crossover study testing dairy foods (n = 7; study 1), a postprandial challenge study comparing obese and non-obese subjects (n = 13; study 2), and an 8-week parallel intervention study that assessed three diets with variable lipid content on fasting parameters (n = 39; study 3). In study 1, combining datasets using SNF or DIABLO significantly improves sample classification. For studies 2 and 3, the value of SNF integration depends on the dietary groups being compared, while DIABLO discriminates samples well but does not perform better than transcriptomic data alone.
CONCLUSION: The integration of associated "omics" datasets can help clarify the subtle signals observed in nutritional interventions. The performance of each integration tool is differently influenced by study design, size of the datasets, and sample size.


Keywords: Data Integration Analysis for Biomarker discovery using Latent variable approaches for “Omics”; Similarity Network Fusion tool; classification; data integration; nutritional intervention

Year: 2021 PMID: 33325641 PMCID: PMC8221028 DOI: 10.1002/mnfr.202000647

Source DB: PubMed Journal: Mol Nutr Food Res ISSN: 1613-4125 Impact factor: 5.914

Introduction

Nutrigenomics approaches are increasingly applied in human nutritional sciences to comprehensively model the complex mechanisms that link diet to health. Large multi‐omics datasets have been generated in nutritional studies, including whole‐genome gene expression (transcriptome) and the profiling of small molecules detectable in biofluids (metabolome). To date, analysis of data is usually performed separately for each “omics” layer.[ , ] However, combining “omics” datasets may be key to understanding how the system functions as a whole, as each dataset contributes to a larger, common biological system.[ , , ] This integration may identify related changes in gene expression and metabolite flux that would be difficult to detect when analyzed independently. Indeed, integrating transcriptomic and metabolomic data from human blood samples has already revealed novel insights into the molecular mechanisms of clinical traits underlying normal physiology and disease by highlighting the cross‐talk between biological layers at the pathway level.[ ] In human nutrition intervention studies, data integration techniques could also advance understanding of the metabolic changes attributable to diet or food compounds. Such signals of metabolic change are often subtle and difficult to elucidate against a complex background of biological and environmental variation. Several studies have successfully used data integration tools to identify groups of biomarkers that are associated with diet‐related outcomes, for example the modulation of insulin sensitivity induced by caloric restriction intervention[ ] or change in body weight.[ ] However, data integration of nutritional “omics” datasets remains underexploited. Different methods have been developed for integrating “omics” layers.
This study evaluates two recently developed methods for integrating “omics” data that can be used to group samples and identify biomarkers, themes that are relevant to nutrition and disease‐focused studies. The first method is the similarity network fusion tool (SNF),[ ] a correlation‐based, unsupervised tool that previously performed well in detecting subtle signatures in datasets compared to alternative unsupervised methods.[ ] SNF uses correlation matrices, generated separately for the related “omics” datasets, to create a “fused”, integrated network that models the relationship between individual samples. The second method is the Data Integration Analysis for Biomarker discovery using Latent variable approaches for “Omics” studies (DIABLO),[ , ] a supervised, multivariate method (component‐based) that maximizes the discrimination between predefined sample groups while associating each pair of “omics” datasets.[ , , , ] Therefore DIABLO can identify a set of correlated variables (such as genes or metabolites) that could discriminate different responses to diet. This study evaluates the performance of the SNF and DIABLO methods for the integration of nutritional “omics” data in three independent nutrition studies. The studies represent some of the common types of approaches that are used in nutritional studies, including crossover and parallel design, postprandial tests (to assess the immediate response to food intake), and fasting measurements (to assess the longer‐term consequences of diet on metabolism).

Experimental Section

Study Designs

SNF and DIABLO data integration methods were applied in an exploratory analysis of data obtained from three clinical human studies in which the effects of diet on transcriptomics and metabolomics or lipidomic signatures were evaluated in blood samples (Figure ). Study 1 (the F3 study), a randomized controlled crossover design, assessed the dynamic postprandial effects of two dairy products (800 g yogurt or acidified milk) on the whole blood transcriptome (Illumina RNAseq) and serum metabolome (untargeted UHPLC/Q‐TOF‐MS analysis) of seven healthy young men.[ , , ] Study 2 (the MECHE study) also used postprandial testing (response at 4h post‐challenge) to evaluate the transcriptome (human whole‐genome GeneChip microarray, in PBMCs) and serum lipid profile (Orbitrap LC‐MS, identification of a selection of lipids) of thirteen men with different metabolic phenotypes (obese/non‐obese) to an oral lipid tolerance test (OLTT).[ , , , ] Study 3 (the MARIS study) used a randomized controlled parallel study design to evaluate an 8 week dietary intervention, testing three dietary patterns: a Western‐type diet high in saturated fatty acids (SFA diet), a Western‐type diet high in MUFA from olive oil (MUFA diet), and a Mediterranean‐type diet (MED diet) with equivalent MUFA amounts to the MUFA diet.[ ] The transcriptome (human whole‐genome GeneChip microarray, in PBMCs) and serum metabolome (NMR, identification of a selection of metabolites from various classes) were assessed in 39 overweight or obese men and women using fasting samples collected after a 2 week run‐in period (SFA diet) and after the dietary interventions.[ , ] All three studies included in this analysis were completed in accordance with the ethical standards of the responsible committee on human experimentation and with the guidelines laid down in the 1975 Helsinki Declaration, as revised in 1983. 
The studies are registered at ClinicalTrials.gov (study 1: NCT02230345, study 2: NCT01172951, and study 3: NCT00405197). A full description of study designs is given in Sections S1‐S3, Supporting Information.

Figure 1

Overview of study designs used for data integration. A) Study 1 used a randomized, controlled, crossover study design to test the postprandial and short‐term effects of acidified milk and probiotic yogurt.[ , ] On dairy test days (D1 and D2) blood samples were collected fasting and postprandially for transcriptome and untargeted metabolome profiling (n = 7 subjects). Dietary restriction applied 3 days before the postprandial tests and diet was controlled throughout the study. B) Study 2 evaluated the response of obese (n = 5) and non‐obese (n = 8) participants to a standard metabolic challenge comprised of a lipid overload that was completed after an overnight fast. The postprandial response to the challenge was evaluated by blood sampling over 5 h for lipid profiling and transcriptome analysis.[ ] C) Study 3 used a parallel, controlled study design to evaluate the effect of different dietary patterns on the fasting metabolic profile and transcriptome. After a 2 week run‐in phase of SFA diet, participants were randomly assigned to test for 8 weeks either a MED diet (n = 17), SFA diet (n = 16), or MUFA diet (n = 14).[ ] D1/D2, dairy test; MED, Mediterranean; SFA, saturated fatty acid.

Preparation of Data

Each dataset was preprocessed to filter artifacts and low‐level signal according to the original study protocols (see Sections S1‐S3, Supporting Information). Additional filtering of the data with baseline measures deducted was applied to refine data integration performance[ ] (Section S4, Supporting Information). Total filtered features retained for data integration analysis were: study 1, n = 5043 genes, n = 1100 metabolites (untargeted, not identified); study 2, n = 1040 genes, n = 42 lipids (identified and quantified); study 3, n = 1012 genes, n = 129 metabolites (identified and quantified). The datasets were then processed according to a common analysis pipeline (Figure S1, Supporting Information) in the R environment (R v 3.5.3; R Foundation for Statistical Computing, Vienna, Austria).

SNF Analysis

The SNF analysis protocol was based on the methods described by Wang et al.[ ] using the R package “SNFtool” (v 2.3.0) (Section S4, Supporting Information). SNF first calculates dissimilarity distances between samples for each separate dataset to create affinity/similarity matrices that can be interpreted as networks. Each individual “network” is then iteratively modified by SNF, finally converging on a unique, integrated network. Transcriptome, metabolome/lipidome and integrated SNF networks were visualized by extracting the strongest connections between the samples. SNF integrated models were validated using bootstrapping tests that compared the model classification to models using randomized data (p < 0.05) (see Section S4, Supporting Information). Classification performance of the models was evaluated by the classification error rate (CER) (ranging from 0 to 1), which was estimated by an internal (M‐fold) cross‐validation analogous to that of the mixOmics “perf” function. CERs for the integrated and non‐integrated models were compared using linear mixed‐effects models. Post hoc pairwise comparisons were estimated, where indicated (p < 0.05), using marginal means (emmeans)[ ] (significant effects considered where p adj < 0.05).
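As a rough illustration of how a cross‐validated CER can be obtained, the Python sketch below splits the samples into M folds, trains on the remaining folds, and counts the misclassified held‐out samples. The nearest‐centroid classifier, the function names, the interleaved fold assignment, and the toy data are all assumptions made for illustration; they are not the classifier, folds, or data used in this study.

```python
from math import dist

def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose training centroid is closest to x
    (an illustrative stand-in classifier, not the study's model)."""
    groups = {}
    for features, label in zip(train_x, train_y):
        groups.setdefault(label, []).append(features)
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }
    return min(centroids, key=lambda label: dist(centroids[label], x))

def m_fold_cer(samples, labels, m):
    """Classification error rate (0 to 1) estimated by M-fold cross-validation."""
    n = len(samples)
    errors = 0
    for fold in range(m):
        # Simple interleaved fold assignment (an assumption of this sketch).
        test_idx = [i for i in range(n) if i % m == fold]
        train_idx = [i for i in range(n) if i % m != fold]
        train_x = [samples[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        for i in test_idx:
            if nearest_centroid_predict(train_x, train_y, samples[i]) != labels[i]:
                errors += 1
    return errors / n

# Toy data (hypothetical): two well-separated groups of three samples each.
toy_samples = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
toy_labels = ["milk", "milk", "milk", "yogurt", "yogurt", "yogurt"]
toy_cer = m_fold_cer(toy_samples, toy_labels, m=3)
```

Because every sample is held out exactly once, the CER is simply the fraction of samples predicted incorrectly; repeating the procedure over several random fold assignments would give the spread from which a standard error could be derived.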

PLS‐DA and DIABLO Analyses

For each study, separate partial least squares discriminant analysis (PLS‐DA) models for transcriptomics and metabolomics datasets were created using mixOmics (v 6.6.2).[ ] Classification performance of the models was evaluated by the CERs using the same cross‐validation test as described for SNF. Goodness‐of‐fit and predictability were verified with R2 and Q2 parameters. The integration of the two “omics” datasets using DIABLO applied the workflow proposed by Rohart et al.[ ] and Singh et al.[ ] using mixOmics. For each study, a DIABLO model was built with settings to maximize the separation between treatment groups while accounting for correlation between the “omics” datasets. Validity of the DIABLO models was assessed by a permutation test that compared the CERs of the original DIABLO models to CERs of models built with randomly permuted samples (significance where p < 0.05). Relative “weights” of each dataset in the final integrated model were compared to evaluate the relative importance of each dataset.[ ] Finally, CERs of DIABLO models were compared to those of PLS‐DA models using the same method as defined for SNF to evaluate whether the integration of the two “omics” datasets improved classification performance. A multilevel decomposition was applied to PLS‐DA and DIABLO analyses for study 1 to account for the crossover design.[ , ] A detailed description of the PLS‐DA and DIABLO workflows is given in Section S4, Supporting Information and model settings are presented in Table S1, Supporting Information.
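The logic of such a permutation check can be sketched in Python as follows. The sketch assumes a common empirical p‐value, (1 + number of permuted models with a CER at or below the observed CER) / (1 + number of permutations); the statistic actually used in this study is described in Section S4, Supporting Information, and may differ. The function name and the example CERs are hypothetical.

```python
def permutation_p_value(observed_cer, permuted_cers):
    """Empirical one-sided p-value: how often a model built on permuted
    samples classifies at least as well (CER <= observed) as the real model.
    The +1 correction (an assumption of this sketch) avoids reporting p = 0."""
    as_good = sum(1 for cer in permuted_cers if cer <= observed_cer)
    return (1 + as_good) / (1 + len(permuted_cers))

# Hypothetical CERs from nine models built on randomly permuted samples.
example_permuted = [0.45, 0.50, 0.38, 0.52, 0.41, 0.47, 0.55, 0.03, 0.49]
example_p = permutation_p_value(0.03, example_permuted)
```

With this definition, the smallest attainable p‐value is bounded by the number of permutations, so a large number of permutations is needed to support a very small p‐value.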

Comparison of SNF and DIABLO

For each study, the CERs of the SNF and DIABLO models were compared directly using paired two‐sample t‐tests (p < 0.05), as identical test and training sets were used in the M‐fold cross‐validations. When both methods showed good performance (validated CER < 0.05), further evaluation of the features (genes and metabolites/lipids) selected by the models was carried out, including network and enrichment analyses. The top 5% most important features for each integrated model were extracted and the strongest connections between selected features (ρ < −0.9 and ρ > 0.9, Spearman's correlation) were represented in networks. Over‐representation analysis (ORA) using Fisher's exact test was used to assess the functional relevance of the top 5% most important genes selected by integrated models. Full details are found in Section S4, Supporting Information.
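For readers unfamiliar with ORA, a one‐sided Fisher's exact test for over‐representation is equivalent to a hypergeometric tail probability. A minimal Python sketch is given below; the function name and the counts are hypothetical, and it is not the implementation used in this study.

```python
from math import comb

def ora_p_value(n_background, n_in_pathway, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn without
    replacement from n_background genes, n_in_pathway of which belong
    to the pathway (one-sided Fisher's exact test)."""
    total = comb(n_background, n_selected)
    upper = min(n_in_pathway, n_selected)
    tail = sum(
        comb(n_in_pathway, k) * comb(n_background - n_in_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 20 background genes, 5 in the pathway,
# 4 selected genes, 3 of which fall in the pathway.
example_ora_p = ora_p_value(20, 5, 4, 3)
```

In practice one such p‐value is computed per pathway and the set of p‐values is then corrected for multiple testing.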

Results

SNF and DIABLO Performed Well to Discriminate the Two Postprandial Dairy Tests (Study 1)

Study 1 sought to characterize the postprandial response to acidified milk and yogurt. The SNF models generated for the transcriptome or metabolome data alone (respectively shown in the form of affinity matrices in Figure ) did not completely distinguish the responses to the dairy products, despite the observation of some grouping of samples, especially for the metabolome. However, the integration of the datasets using the SNF tool revealed two distinct clusters in the integrated SNF affinity matrix, comprised only of samples representing the postprandial response to either acidified milk or yogurt (Figure 2C). The grouping of the samples in the SNF model was statistically validated by bootstrapping permutation tests (p = 0.0002). In line with the strength of the SNF model in separating the samples, the model performed well in prediction tests, with a statistically significantly lower CER than either model created with the separate datasets (Table ).

Figure 2

Table 1

Classification error rates (CER) for SNF models (separate and integrated models) and for PLS‐DA and DIABLO models presented for all studies. CER is validated by M‐fold cross‐validation tests (7‐, 5‐, and 10‐fold for studies 1, 2, and 3, respectively).

Model (CER ± SEM)	Study 1: milk versus yogurt	Study 2: non‐obese versus obese	Study 3: all diets	Study 3: SFA diet versus MED diet	Study 3: SFA diet versus MUFA diet	Study 3: MUFA diet versus MED diet
SNF analysis
Metabolome/lipidome model	0.21^a ± 0.004	0.15^a ± 0.003	0.41^a ± 0.005	0.13^b ± 0.003	0.29^a ± 0.006	0.30^a ± 0.006
Transcriptome model	0.14^b ± 0	0.03^c ± 0.006	0.38^b ± 0.004	0.19^a ± 0.005	0.21^b ± 0.004	0.33^b ± 0.004
SNF integrated model	0.02^c ± 0	0.08^b ± 0.002	0.30^c ± 0.005	0.08^c ± 0.002	0.23^b ± 0.005	0.32^b ± 0.004
DIABLO analysis
Metabolome/lipidome (PLS‐DA)	0.14^a ± 0 ^#	0.13^a ± 0.007 ^#	0.27^a ± 0.003 ^#	0.11^a ± 0.004 ^#	0.18^a ± 0.002 ^#	0.19^a ± 0.004 ^#
Transcriptome (PLS‐DA)	0.14^a ± 0	0^b ± 0 ^#	0.06^c ± 0.002 ^#	0.07^b ± 0.003 ^#	0.01^b ± 0.003 ^#	0.07^b ± 0.005 ^#
DIABLO model	0.03^b ± 0.007 ^#	0^b ± 0 ^#	0.08^b ± 0.003 ^#	0.06^b ± 0.004 ^#	0.02^b ± 0.003 ^#	0.08^b ± 0.005 ^#

Different letters (a–c) indicate significant differences between CERs for comparisons between models for each study (as assessed by linear mixed‐effect models with post hoc pairwise comparisons, p adj < 0.05). # indicates a difference between equivalent models created by the SNF tool and DIABLO (as assessed by paired t‐test, p < 0.05). CER, classification error rate; DIABLO, data integration analysis for biomarker discovery using latent variable approaches for “omics” studies; MED, Mediterranean; PLS‐DA, partial least squares discriminant analysis; SFA, saturated fatty acid; SNF, similarity network fusion.

Visualization of models constructed with SNF for study 1. Affinity matrices for net iAUC show metabolite, gene, and SNF integrated models (respectively, A–C); SNF network showing the top 30% connections between samples using Cytoscape with edge weighted spring embedded layout in panel D. Samples are labeled by subject (S) number and intervention type milk (M, n = 7) or yogurt (Y, n = 7), with colors to indicate test meal: M (blue), Y (red). Diagonal of heatmaps shows median similarity across all samples. Network connections are colored according to whether the connection was identified in the top 30% connections for networks created with the metabolome (turquoise), transcriptome (purple), in both metabolome and transcriptome separate networks (black) or only with the datasets combined (SNF model) (yellow). iAUC, net incremental area under curve; SNF, similarity network fusion.

A network representing the most important connections between samples in the SNF model shows spatial separation of milk and yogurt postprandial samples (Figure 2D).
The network connections were generally specific to the metabolome or the transcriptome rather than both, while some connections were only found when both datasets were combined. The multivariate analyses of each dataset separately (PLS‐DA) successfully discriminated the postprandial dairy responses, as shown by the score plots (Figure S2, Supporting Information; R2 and Q2 values in Table S2, Supporting Information). The integration using DIABLO resulted in a valid model (permutation test, p = 2.2 × 10⁻¹⁶, Figure S3, Supporting Information) and performed better in classifying the samples than when the datasets were analyzed separately (Table 1). A strong association was observed between the first components of the two data types (Pearson correlation, ρ = 0.86, p = 8.0 × 10⁻⁵) and, correspondingly, the relative contributions (“weights”) of the metabolome and transcriptome datasets in the integrated DIABLO model were not significantly different (Table S3, Supporting Information). Integration of the study 1 datasets with either DIABLO or SNF resulted in models showing similar performance in predicting the type of dairy consumed (CER < 0.05), although the CER was slightly better for SNF (Table 1). Comparison of the top 5% most important metabolites and genes selected by each integrated model revealed some similarities despite the different strategies used to construct the models. Of these features, 25% were present in both models, with more common genes (27%) than common metabolites (16%). The similarity between the models was greater when considering only the most correlated genes and metabolites (i.e., those with |ρ| > 0.90), as shown in the gene‐metabolite networks (Figure ); 41% of all selected features were found in both networks. Pathway analysis did not reveal a significant enrichment among the genes identified as discriminant for either model.

Figure 3

Network plots for the top 5% most important genes and metabolites for differentiating blood samples taken after milk intake or yogurt intake in study 1, selected by A) SNF and B) DIABLO. Connections between nodes (metabolites, green; genes, purple) are shown for the strongest associations (ρ < −0.90 or ρ > 0.90, Spearman's correlation) (SNF, n = 199 nodes; DIABLO, n = 209 nodes). Nodes present in both networks are highlighted by a black outline and a larger size (metabolites, n = 12; genes, n = 80). DIABLO, data integration analysis for biomarker discovery using latent variable approaches for “Omics” studies; PLS‐DA, partial least squares discriminant analysis; SNF, similarity network fusion.
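The edge rule described in the caption above (keep gene‐metabolite pairs with Spearman's ρ below −0.90 or above 0.90) can be sketched in Python as follows. The helper names and the toy profiles are hypothetical and serve only to illustrate the thresholding step; the sketch assumes no feature is constant across samples.

```python
from statistics import mean

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def strong_edges(genes, metabolites, threshold=0.9):
    """Gene-metabolite pairs whose |rho| exceeds the threshold."""
    edges = []
    for g_name, g_values in genes.items():
        for m_name, m_values in metabolites.items():
            rho = spearman_rho(g_values, m_values)
            if abs(rho) > threshold:
                edges.append((g_name, m_name, rho))
    return edges

# Toy profiles (hypothetical) measured on the same five samples.
toy_genes = {"gene_A": [1.0, 2.0, 3.0, 4.0, 5.0], "gene_B": [2.0, 1.0, 4.0, 3.0, 5.0]}
toy_metabolites = {"met_X": [10.0, 20.0, 25.0, 40.0, 80.0], "met_Y": [9.0, 7.0, 6.0, 3.0, 1.0]}
toy_edges = strong_edges(toy_genes, toy_metabolites)
```

Here only gene_A is retained, linked positively to met_X and negatively to met_Y; gene_B correlates with both metabolites at |ρ| = 0.8 and is therefore dropped.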

Using Postprandial Transcriptome and Lipid Data to Separate Responses to an OLTT (Study 2)

The MECHE metabolic challenge study investigated the metabolic impact of an OLTT in the postprandial state, focusing on the differences in this response between obese and non‐obese individuals. The affinity matrix for the transcriptome alone grouped together the five OLTT responses of the obese subjects correctly while the lipidome misclassified one obese individual (Figure S4A,B, Supporting Information). Integration of transcriptome and lipidome datasets for MECHE using SNF resulted in a valid model (p = 0.003) that broadly grouped the samples into OLTT responses of obese and non‐obese subjects, with one misclassification of a response from an obese individual (BMI 34.4 kg m⁻²), which was grouped with the responses from non‐obese participants (Figure S4C, Supporting Information). The CER of the integrated model was thus a little lower than that of the separate lipidome model, though higher than that of the transcriptome alone (Table 1). The network derived from the SNF model (Figure S4, Supporting Information) shows that the BMI groups were well separated when only the strongest associations between samples were visualized, with the lipidome showing particular importance in defining connections within the non‐obese group, while associations within the obese group were captured by the integrated datasets. PLS‐DA of the transcriptome dataset alone correctly classified all phenotypic responses to the OLTT in all M‐fold tests, while the lipidome PLS‐DA model showed significantly weaker predictive performance (Table 1). PLS‐DA performance parameters (R2 and Q2 values) are presented in Table S2, Supporting Information. The integrated DIABLO model also correctly predicted all phenotypic responses to the OLTT (permutation tests, p = 2.2 × 10⁻¹⁶, Figure S3, Supporting Information) (Table 1).
Despite a strong association between the first components of the two data types (Pearson correlation, ρ = 0.77, p = 0.002), comparison of the relative “weights” of the datasets in the final DIABLO model showed that the transcriptome dataset was given significantly more importance than the lipidome dataset (Table S3, Supporting Information). The spatial separation of the samples and their groupings for each model are shown in the score plots (Figure S5, Supporting Information). For this study, the integration of datasets using DIABLO resulted in better classification of the phenotype groups, with lower CER for DIABLO than SNF (Table 1). Comparisons of the top features selected for each model were not completed due to the high CER obtained for the SNF model, which implies that extracted features based on this model would not characterize the groups well.

Data Integration for the Long‐Term Dietary Intervention (Study 3)

MARIS was an 8 week dietary fat and dietary quality modification intervention study. When the three dietary interventions were included in the SNF analysis, regardless of whether metabolome, transcriptome, or integrated datasets were used, affinity matrices (Figures S6A, S7A, and S8A, Supporting Information) and CERs (Table 1) did not show good classification of samples. Conversely, when only two diets were included per analysis, sample clustering was improved for the metabolome and transcriptome data (Figures S6B–D and S7B–D, Supporting Information). Moreover, the integration of the datasets using SNF enabled clustering that separated the SFA diet group from that of the MED diet group (p = 0.0005). However, other comparisons using SNF integrated datasets were not significant (MUFA vs MED: p = 0.07; MUFA vs SFA, p = 0.06) (Figure S8B–D, Supporting Information). The comparison between MED and SFA diets also showed the best spatial separation in the sample networks (Figure S9, Supporting Information), with only one sample that was misplaced in both groups. The metabolome was important in defining the similarities within each group (Figure S9A, Supporting Information, turquoise connections). PLS‐DA analyses of study 3 showed that the transcriptome performed significantly better in classifying the samples than the metabolome for all dietary comparisons (Table 1; score plots available in Figure S10, Supporting Information, R2 and Q2 values in Table S2, Supporting Information). Furthermore, the DIABLO integration of the datasets did not significantly improve the CER compared to the transcriptome alone for any of the comparisons although the integrated models were validated (p < 0.0001 for all permuted tests, Figure S3C–F, Supporting Information). The relative importance of the transcriptomics dataset was also shown by the significantly greater weight of the transcriptome dataset in the DIABLO model as compared to the metabolome dataset (Table S3, Supporting Information). 
In addition, the “omics” datasets were not strongly associated except for the SFA vs MED model (Pearson correlations: all diets ρ = 0.50, p = 0.001; MUFA vs SFA ρ = 0.56, p = 0.001; MUFA vs MED ρ = 0.58, p = 0.002; SFA vs MED ρ = 0.81, p < 0.0001), confirming the different contribution of each dataset to the final model. The lowest CER was observed for the SFA versus MUFA comparison using the transcriptome PLS‐DA model (Table 1). For study 3, the integration of datasets using DIABLO resulted in lower CER than the integration using SNF, for all comparisons of the three diets (Table 1). As for study 2, due to the high CERs obtained for the SNF models, comparisons of the top features selected for each model were not completed.

Discussion

In our data integration analyses of three different nutritional studies, we show that the integration of related “omics” datasets, by DIABLO or SNF, can help to differentiate responses to diet. The inherent differences of using a supervised (e.g., DIABLO) or unsupervised (e.g., SNF) data integration approach have implications for their utility for nutritional studies. As expected, DIABLO performed well in extracting common discriminating signals of diet or metabolic phenotype, even where the dietary effects were small. Conversely, SNF performed well in classifying samples where the dietary effects were more marked but also enabled the detection of outliers or novel groups.

DIABLO Could Discriminate Sample Groups for All Three Study Designs

DIABLO applies a powerful supervised method that discriminates predefined groups and thus was expected to be relevant for nutritional datasets as the effects of diet can be subtle. One feature of DIABLO is its ability to handle “omics” datasets that are “unbalanced” (i.e., one dataset carries more discriminatory information than the other(s)). Specifically, the capacity to attribute a “weight” to datasets is critical in allowing DIABLO to combine datasets while maintaining the importance of the discriminatory features. In both studies 2 and 3, the transcriptome was assigned a significantly greater “weight” in the models than the lipidome/metabolome. Consequently, the integration of “omics” datasets using DIABLO gave results very similar to the separate analysis using the transcriptome only and did not further improve sample classification. The lower importance of the lipidome/metabolome in the DIABLO model for studies 2 and 3 could be explained by the reduced number of compounds. In contrast, the untargeted approach used in study 1 captured the broad spectrum of metabolites that could respond to diet, similarly to the approach used for the transcriptome. In the current analyses, DIABLO models were built using a “full weighted design matrix,”[ ] which maximizes the separation between sample groups while taking into account the correlation between “omics” datasets (design matrix parameter set to 0.1). However, by using a “full design matrix” instead (design matrix parameter set to 1), the model would maximize the correlation between “omics” datasets, prioritizing the association between features of the metabolome and features of the transcriptome. Thus, although the integration of data with DIABLO did not improve sample classification in studies 2 and 3 compared to transcriptome data alone, DIABLO remains useful in offering a method to associate features of the metabolome and transcriptome that similarly discriminate diet.
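To make the design‐matrix setting concrete, the Python sketch below builds the square dataset‐by‐dataset matrix for two “omics” datasets, with zeros on the diagonal and a single off‐diagonal value (0.1 for the weighted design used here, 1 for a full design). The function name is hypothetical, and the layout is assumed to mirror the mixOmics convention; the sketch only illustrates the matrix and does not call mixOmics.

```python
def design_matrix(block_names, link):
    """Square matrix of between-dataset links: `link` off the diagonal,
    0 on the diagonal (a dataset is not linked to itself)."""
    return {
        row: {col: (0.0 if row == col else link) for col in block_names}
        for row in block_names
    }

blocks = ["transcriptome", "metabolome"]
weighted_design = design_matrix(blocks, 0.1)  # prioritizes discrimination of sample groups
full_design = design_matrix(blocks, 1.0)      # prioritizes correlation between datasets
```

Intermediate off‐diagonal values trade discrimination of the sample groups against agreement between the datasets, which is why the choice of this parameter shapes how much the weaker dataset can contribute.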

SNF Showed Potential to Reveal Novel Sample Groups

In contrast to DIABLO, SNF models are unsupervised and thus do not use any a priori information to separate sample groupings. For this reason, the use of a classification measure (i.e., CER) as a performance criterion to compare the two methods might be expected to favor DIABLO. Nevertheless, SNF performed slightly better than DIABLO in study 1. Interestingly, while DIABLO gives an estimate of the overall contribution of each dataset (by the relative weights), the SNF networks provide a deeper insight into the importance of each dataset by specifying the dataset(s) used to build each connection between samples. The lack of connections defined by both individual datasets (e.g., black connections in Figure 2D), confirmed the value of integrating the datasets to model the two postprandial dairy responses. The SNF model for the postprandial metabolic challenge in study 2 failed to improve classification compared to the transcriptome alone. One explanation for the weaker performance for the SNF in this study is suggested in the characteristics of the obese outlier for the SNF model, who was the oldest participant in the study. This participant may have a metabolic response profile distinct from both non‐obese and younger obese participants; indeed the broad descriptor “obesity” may comprise multiple subtypes.[ , ] A potential advantage of the SNF tool for nutritional studies is the ability to separate samples by undefined factors (for example environmental or biological factors). This could lead to the identification of new metabolic subgroups, but it would require either a stronger signal or a larger cohort than used in study 2 to clarify the groupings. The results from study 3 suggest two important aspects of study design that can affect the performance of the SNF models: the number of dietary groups studied and the use of fasting blood samples to assess long‐term effects of diet. 
It was noteworthy that a significant improvement was observed for the integrated SFA versus MED SNF model, whereas the evaluation of all three diets together suggested no clear discrimination. The nutritional contents of the SFA and MED diets also differed the most among the three diets, attesting to the tendency of SNF to identify the strongest signals in the data. The discrimination of more than two dietary intervention groups by SNF using fasting, long‐term data was challenging in this study. The effect of the diet might be more visible if the variation between participants was reduced (e.g., crossover study design). However, the study diet was well controlled (all food was provided to participants), and the SFA and MED diets could be differentiated by SNF. Moreover, the strong performance of the DIABLO models for all comparisons of study 3 shows the value of the tool in giving weight to the features that discriminate the intervention groups, even if the diets are relatively similar or if the long‐term, physiological effects of diet are being assessed.

Strengths and Limitations

We assessed two different data integration tools that hold promise for nutritional datasets using three independent, distinct dietary studies. The use of the same M‐fold validation test with identical samples per fold allowed a direct comparison of the CERs for the tools in each study, although it is acknowledged that the methods were not designed for the same purpose. We chose to include a filtering step in the analysis pipeline to limit the noise in the data, as advocated in the previous analysis of Tini et al.,[ ] and this was an essential step in the preparation of our data owing to the limited sample sizes in our studies. In larger nutritional datasets, models could be explored using filtered and unfiltered data to confirm the utility of the filtering step for clarifying the dietary signals. Although the results were validated internally by the M‐fold validation test, an external validation of the results would be useful to further confirm the robustness of these models. While the number of participants in our studies was a limitation of our work, the strength of the models in the internal validation tests despite this limitation underlines the potential of these tools for integrating dietary datasets. Another consideration in our study was the difference in the choice of “omic” platform and the resulting variation in the number of features per dataset. These differences most likely explain why transcriptomic datasets carried most of the information when integrating the datasets in studies 2 and 3. Moreover, such differences, by influencing the performance of the integration tools, might limit the direct comparison of the data integration across different study designs. However, given that these studies reflected the variation in techniques used across existing nutritional studies, it was considered relevant to apply data integration approaches to studies with different “omic” data types.
It is also noteworthy that the choice of “omics” data, as well as the “omics” platform selected, could affect the success of data integration. While we evaluated commonly available blood “omics” datasets, the use of other “omics” approaches (e.g., proteomics) may be considered more relevant depending on the research question. The potential to use SNF and DIABLO to support the biological interpretation of the combined “omics” signal was explored for study 1, in which both models performed well in classifying the samples. Different features were found to be important in defining the models, which may reflect the inherent differences in the methods, although very discriminatory features were found for both models. Discriminatory genes did not represent an enriched metabolic pathway in either model, though the limited identification of metabolites in the untargeted metabolomic dataset prevented an integrated pathway analysis of the two datasets.
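
The shared‐fold comparison described in this section can be illustrated with a minimal sketch. It is not the analysis pipeline used in the studies (which relied on the SNFtool and mixOmics packages); it only shows, under stated assumptions, how a single fold assignment can be reused so that the classification error rates (CERs) of two methods are computed on identical held‐out samples. All function names, the toy data, and the two stand‐in classifiers (a nearest‐centroid rule and a majority‐class baseline) are hypothetical.

```python
import random
from collections import Counter


def make_folds(n_samples, n_folds, seed=0):
    """Assign every sample index to exactly one fold (reused by all methods)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[k::n_folds] for k in range(n_folds)]


def classification_error_rate(true_labels, predicted_labels):
    """Fraction of samples whose predicted label differs from the true label."""
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)


def cross_validated_cer(features, labels, folds, fit_predict):
    """Predict each fold from the remaining samples, then pool into one CER."""
    predictions = [None] * len(labels)
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        fold_predictions = fit_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in test_idx],
        )
        for i, predicted in zip(test_idx, fold_predictions):
            predictions[i] = predicted
    return classification_error_rate(labels, predictions)


def nearest_centroid(train_x, train_y, test_x):
    """Stand-in classifier: assign each test sample to the closest class mean."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def squared_distance(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [
        min(centroids, key=lambda lab: squared_distance(x, centroids[lab]))
        for x in test_x
    ]


def majority_class(train_x, train_y, test_x):
    """Stand-in baseline: always predict the most frequent training label."""
    most_common_label = Counter(train_y).most_common(1)[0][0]
    return [most_common_label] * len(test_x)


# Toy data (hypothetical): eight samples, one feature, two well-separated groups.
features = [[0.1], [0.2], [0.0], [0.3], [5.0], [5.1], [4.9], [5.2]]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]

# The same folds are passed to both methods, so their CERs are directly comparable.
folds = make_folds(len(labels), n_folds=4, seed=1)
cer_centroid = cross_validated_cer(features, labels, folds, nearest_centroid)
cer_baseline = cross_validated_cer(features, labels, folds, majority_class)
```

Fixing the fold assignment once and passing it to every method keeps the comparison paired: any difference in CER then reflects the methods rather than a different split of a small cohort.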

Conclusions and Perspectives

The application of data integration methods to combine related “omics” datasets may help to discriminate responses to different diets and identify related biological signals that are regulated by diet. SNF and DIABLO data integration methods seem to offer different advantages for the analysis of human nutrition intervention/challenge datasets. Generally, DIABLO performed well in our relatively small datasets to identify the features that can differentiate diets or metabolic phenotypes. However, given the complex responses of humans to diet, SNF may be relevant for the identification and the investigation of new metabolic phenotypes.

Conflict of Interest

The authors declare no conflict of interest.

Author Contributions

K.J.B.‐P., G.P., M.H., C.C.J.R.M., A.F., N.V., F.P.P., L.A.A., H.M.R., M.I., and G.V. designed the research (project initiation, project conception, development of overall research plan, and study oversight); K.J.B.‐P., G.P., M.H., C.C.J.R.M., and N.V. conducted the research (hands‐on conduct of the experiments and data collection); K.J.B.‐P., G.P., M.H., and C.C.J.R.M. analyzed data or performed statistical analysis; K.J.B.‐P., G.P., M.H., C.C.J.R.M., H.M.R., and G.V. wrote the paper; K.J.B.‐P., G.P., M.H., C.C.J.R.M., N.V., L.A.A., H.M.R., L.B., M.I., and G.V. critically reviewed the manuscript; K.J.B.‐P. had primary responsibility for final content. All authors read and approved the final manuscript.
