
Comparison of metabolite networks from four German population-based studies.

Khalid Iqbal¹, Stefan Dietrich^2,3, Clemens Wittenbecher^2,3, Jan Krumsiek^3,4, Tilman Kühn⁵, Maria Elena Lacruz⁶, Alexander Kluttig^3,6, Cornelia Prehn⁷, Jerzy Adamski^3,7,8, Martin von Bergen⁹, Rudolf Kaaks⁵, Matthias B Schulze^2,3, Heiner Boeing¹, Anna Floegel^1,10.

Abstract

Background: Metabolite networks are suggested to reflect biological pathways in health and disease. However, it is unknown whether such metabolite networks are reproducible across different populations. Therefore, the current study aimed to investigate similarity of metabolite networks in four German population-based studies.
Methods: One hundred serum metabolites were quantified in European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam (n = 2458), EPIC-Heidelberg (n = 812), KORA (Cooperative Health Research in the Augsburg Region) (n = 3029) and CARLA (Cardiovascular Disease, Living and Ageing in Halle) (n = 1427) with targeted metabolomics. In a cross-sectional analysis, Gaussian graphical models were used to construct similar networks of 100 edges each, based on partial correlations of these metabolites. The four metabolite networks of the top 100 edges were compared based on (i) common features, i.e. number of common edges, Pearson correlation (r) and hamming distance (h); and (ii) meta-analysis of the four networks.
Results: Among the four networks, 57 common edges and 66 common nodes (metabolites) were identified. Pairwise network comparisons showed moderate to high similarity (r = 0.63-0.96, h = 7-72) among the networks. Meta-analysis of the networks showed that, among the 100 edges and 89 nodes of the meta-analytic network, 57 edges and 66 metabolites were present in all the four networks, 58-76 edges and 75-89 nodes were present in at least three networks, and 63-84 edges and 76-87 nodes were present in at least two networks. The meta-analytic network showed clear grouping of 10 sphingolipids, 8 lyso-phosphatidylcholines, 31 acyl-alkyl-phosphatidylcholines, 30 diacyl-phosphatidylcholines, 8 amino acids and 2 acylcarnitines.
Conclusions: We found structural similarity in metabolite networks from four large studies. Using a meta-analytic network, as a new approach for combining metabolite data from different studies, closely related metabolites could be identified, for some of which the biological relationships in metabolic pathways have been previously described. They are candidates for further investigation to explore their potential role in biological processes.

Year: 2018 PMID: 29982629 PMCID: PMC6280930 DOI: 10.1093/ije/dyy119

Source DB: PubMed Journal: Int J Epidemiol ISSN: 0300-5771 Impact factor: 7.196

Metabolite networks constructed with Gaussian graphical models showed similar structures across four population-based studies. We suggest meta-analysis of metabolite networks as a novel approach to identifying biological pathways. The identified associations between metabolites in the meta-analytic network, particularly for phospholipids and amino acids, are candidates for further investigation to explore their role in health and disease.

Introduction

Metabolomic profiling is increasingly used to discover biomarkers that reflect early perturbations linked to disease risk or to objectively measure food intake and other environmental exposures. Thereby, many novel biomarkers have been identified that may improve assessment of various exposures or predict disease risk. One important step in the process of biomarker discovery is usually the replication of results in different study populations to reduce the chance of type I error. High-throughput metabolomics data are often analysed using correlation-based networks to infer biological relationships in the data. This approach has been successfully applied in several single studies to identify novel metabolic pathways. However, little is known about whether these metabolite networks can be replicated across different populations. So, the question arises as to whether the correlation structure of the identified metabolites is similar across different studies that include study participants with different characteristics (e.g. age and lifestyle). This should be a prerequisite to replicating metabolomic results in different populations. Moreover, metabolic profiles from different studies are frequently assessed, but meta-analysis of metabolite networks has not been conducted in the metabolomics field. Partial-correlation-based network comparisons and meta-analysis of such networks can help to identify consistent relationships between metabolites, which may be further investigated for their potential role in biological processes. Probabilistic graphical models such as Gaussian graphical models (GGMs) have been proposed for the analysis of metabolomics data. A GGM is an undirected graph that identifies independence between two variables conditional on all others and has been suggested as an effective tool to recover metabolic pathways from metabolite concentrations.
This approach can be further used to combine metabolomics data of different studies by meta-analysing network edges (partial correlations between two metabolites adjusted for the other metabolites) and constructing a meta-analytic metabolite network to represent the association between metabolites and their underlying metabolic pathways. This meta-analytic network may identify metabolites that are linked in certain metabolic pathways. Against this background, the present study aimed to compare and meta-analyse the metabolite correlation networks to assess their stability and identify closely related metabolites in four large German population studies, including the European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam, EPIC-Heidelberg, KORA (Cooperative Health Research in the Augsburg Region) and CARLA (Cardiovascular Disease, Living and Ageing in Halle).

Methods

This study was based on metabolomic measurements of participants from four German population-based studies (EPIC-Potsdam, EPIC-Heidelberg, KORA and CARLA). Ethical approval for all four studies was obtained from relevant ethical-approval committees. Written informed consent was obtained from all participants in the included studies.

Description of the study populations

EPIC-Potsdam and EPIC-Heidelberg comprise 27 548 and 25 540 study participants, respectively. Study design and methods in EPIC-Potsdam and EPIC-Heidelberg were similar and have been described in detail elsewhere. For measurements of serum metabolites, a random subcohort was established in 2006 in EPIC-Potsdam (n = 2483) and 2009 in EPIC-Heidelberg (n = 843). The KORA study is conducted in Southern Germany and included 3044 participants, who took part in the survey (KORA F4) from 2006 to 2008. The CARLA study included 1779 participants with baseline examinations between 2002 and 2006. Serum metabolites were assessed for 1427 participants. Details of KORA and CARLA have been described elsewhere. After exclusion of participants with missing data on any covariate (n = 23) or metabolites (n = 48), 2458 participants in EPIC-Potsdam, 812 in EPIC-Heidelberg, 3029 in KORA and 1427 in CARLA were available for analysis.

Blood-sample collection and assessment of covariates

Blood samples from all participants were collected at baseline or follow-up (KORA) using standard protocols as described elsewhere. Age, sex, weight and height were collected at baseline in all studies. Body mass index (BMI) was calculated as (weight in kilogrammes)/(height in metres)².

Metabolomic profiling

Metabolites were quantified in serum samples from all four populations using the AbsoluteIDQ™ p150 and p180 Kits (Biocrates Life Sciences AG, Innsbruck, Austria) together with FIA- and LC-ESI-MS/MS (flow injection analysis/liquid chromatography-electrospray ionization-tandem mass spectrometry) as described in detail by Römisch-Margl et al. and Zukunft et al. The AbsoluteIDQ™ p150 Kit was applied for samples of EPIC-Potsdam and CARLA, the AbsoluteIDQ™ p180 Kit for samples of the KORA F4 study and the MetaDisIDQ™ Kit for samples of EPIC-Heidelberg. Metabolite measurements of EPIC-Potsdam, KORA and CARLA samples were performed in the Genome Analysis Center at the Helmholtz Zentrum München and those of EPIC-Heidelberg samples in Leipzig. To ensure comparability, only metabolites that were quantified by all three kits were included in the analysis. In addition, metabolites below the limit of detection and those with very high analytic variance in any of the four studies were excluded, leaving 100 metabolites for the present analysis. The final metabolite set contained hexose (sum of six-carbon monosaccharides without distinction of isomers), 2 acylcarnitines (Cx:y; x = number of carbon atoms, y = number of double bonds), 10 sphingolipids, 12 amino acids, and 35 acyl-alkyl-, 32 diacyl- and 8 lyso-phosphatidylcholines (PCs) (Supplementary Table 1, available as Supplementary data at IJE online).

Statistical analysis

Metabolite concentrations were log-transformed to approximate normality. Distributions of the metabolite concentrations were visually assessed using QQ plots and histograms, which showed approximately normal distributions. However, long tails and potential outliers were detected for some metabolites (QQ plots for the top 20 metabolites from the study sample are shown in Supplementary Figures 1–5, available as Supplementary data at IJE online). Therefore, a non-parametric approach was also used to confirm that major findings do not vary by choice of method. Means, standard deviations and coefficients of variation (CV) were calculated for each metabolite in all studies, adjusted for age, sex and BMI. For comparison of metabolic profiles, each individual study metabolite network was estimated using the GGM approach. In the first step, a partial-correlation matrix of the 100 metabolites was estimated for each study sample. In the second step, the 100 most highly correlated metabolite pairs (edges) were selected to construct networks in the respective samples. The minimum partial correlation was 0.19 in the network of Heidelberg, 0.25 in the network of Potsdam, 0.24 in the network of KORA and 0.26 in the network of CARLA. We selected the 100 edges with the highest correlations so that the identified networks are comparable and interpretable, and the correlations are high enough to have biological relevance, since only highly correlated metabolites have been suggested to be biologically related. Identified networks were exported to Cytoscape for visualization. The same analyses were repeated using Spearman’s rank partial correlation. First, Spearman’s rank correlation for all the metabolites was estimated and then the 100 most highly correlated metabolite pairs (edges) were selected from each sample to construct the respective networks.
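The two GGM steps described above (partial-correlation matrix, then selection of the strongest edges) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names are hypothetical, edges are ranked by absolute partial correlation (an assumption, made because the networks contain inverse correlations), and the adjustment for age, sex and BMI is omitted. One way to include it, assumed here rather than taken from the paper, would be to append the covariates as extra columns and drop their rows and columns from the result.

```python
import numpy as np

def partial_correlations(X):
    """Full-order partial correlations from the inverse covariance matrix.

    X is an (n_samples, n_variables) array of log-transformed concentrations.
    """
    precision = np.linalg.pinv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)   # rho_ij = -p_ij / sqrt(p_ii * p_jj)
    np.fill_diagonal(pcor, 1.0)
    return pcor

def top_edges(pcor, k=100):
    """Return the k pairs (i, j, r) with the largest absolute partial correlation."""
    i, j = np.triu_indices_from(pcor, k=1)          # each unordered pair once
    order = np.argsort(-np.abs(pcor[i, j]))[:k]     # strongest first
    return [(int(i[o]), int(j[o]), float(pcor[i[o], j[o]])) for o in order]
```

For a study matrix with 100 metabolite columns, `top_edges(partial_correlations(X), k=100)` would give an edge list of the kind exported for visualization.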
To assess the similarity between network structures, correlations between each pair of networks were estimated using the gcor function from the R package sna. For this purpose, the networks (edge lists) were converted into adjacency matrices, which in turn were used to estimate the product–moment correlation. To estimate structural similarity between the four networks, the hamming distance was determined from the same adjacency matrices using the hdist function from the same R package. The hamming distance is the number of changes required to transform one network into another, e.g. if the hamming distance between two networks X and Y is 1, then one change (i.e. an addition or deletion of one edge) will result in an identical structure of the two networks. A lower hamming distance reflects greater similarity in network structures. For easier comparison, the numbers of common edges in each combination of the four metabolite networks were visualized using a Venn diagram, which was constructed using the R package VennDiagram. Commonality of the four networks was reflected by visualizing the common edges of the four networks estimated by both the Pearson and Spearman partial correlations. As metabolites in EPIC-Heidelberg were quantified in a different laboratory, though using a standardized approach, a second network of common edges was constructed for EPIC-Potsdam, CARLA and KORA only. For meta-analysis of the four networks, a random-effects meta-analysis of partial-correlation coefficients was conducted using all common edges. For meta-analysis, partial-correlation coefficients were transformed to Fisher Z-scores and back-transformed after analysis. The correlation coefficients from meta-analysis were used to construct a meta-analytic metabolite network by selecting the 100 most highly correlated metabolite pairs, as was done for the individual networks.
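The network-comparison measures described above can be illustrated with a short Python sketch. It is not the sna implementation: the helper names are hypothetical, networks are treated as undirected, and each edge is counted once, so absolute values may differ from what gcor and hdist return under their default settings.

```python
import numpy as np

def adjacency(edges, n_nodes):
    """Binary symmetric adjacency matrix from an undirected edge list."""
    A = np.zeros((n_nodes, n_nodes), dtype=int)
    for i, j in edges:
        A[i, j] = A[j, i] = 1
    return A

def hamming_distance(A, B):
    """Number of edge additions or deletions needed to turn network A into B."""
    iu = np.triu_indices_from(A, k=1)
    return int(np.sum(A[iu] != B[iu]))

def graph_correlation(A, B):
    """Product-moment correlation between the off-diagonal entries of A and B."""
    iu = np.triu_indices_from(A, k=1)
    return float(np.corrcoef(A[iu], B[iu])[0, 1])

def common_edges(*edge_lists):
    """Undirected edges that are present in every edge list."""
    sets = [{tuple(sorted(e)) for e in edges} for edges in edge_lists]
    return set.intersection(*sets)

# Toy example with four nodes: the two networks differ in one edge each way.
net_x = [(0, 1), (1, 2), (2, 3)]
net_y = [(0, 1), (1, 2), (0, 3)]
A, B = adjacency(net_x, 4), adjacency(net_y, 4)
print(hamming_distance(A, B))      # 2: delete (2, 3), add (0, 3)
print(common_edges(net_x, net_y))  # the two shared edges
```

Applied to the four study networks, `common_edges` would yield the edge set of the common network, and the pairwise calls would fill a table like the one summarized in Figure 1b.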
Due to high heterogeneity among studies, a combined metabolite network of all studies was constructed and common edges of all the metabolites observed in each study were visualized over the meta-analytic network in Cytoscape. Network analyses were adjusted for age, sex and BMI, which are related to metabolite differences.
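The random-effects meta-analysis of a single edge, with Fisher transformation and back-transformation, can be sketched as below. Several details are assumptions rather than facts from the paper: the DerSimonian-Laird estimator for the between-study variance, the approximate sampling variance 1/(n − 3 − k) for a partial correlation conditioned on k variables, k = 101 (98 other metabolites plus age, sex and BMI), and the example correlations, which are hypothetical.

```python
import numpy as np

def random_effects_pooled_r(r, n, n_conditioning=0):
    """Pool (partial) correlations across studies; returns (pooled_r, tau2)."""
    r = np.asarray(r, dtype=float)
    n = np.asarray(n, dtype=float)
    z = np.arctanh(r)                         # Fisher z-transform
    v = 1.0 / (n - 3 - n_conditioning)        # approximate sampling variance of z
    w = 1.0 / v                               # fixed-effect (inverse-variance) weights
    z_fixed = np.sum(w * z) / np.sum(w)
    q = np.sum(w * (z - z_fixed) ** 2)        # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(z) - 1)) / c)   # DerSimonian-Laird between-study variance
    w_star = 1.0 / (v + tau2)                 # random-effects weights
    z_pooled = np.sum(w_star * z) / np.sum(w_star)
    return float(np.tanh(z_pooled)), float(tau2)   # back-transform to r

# Sample sizes from the four studies; the correlations are made up for illustration.
study_n = [2458, 812, 3029, 1427]
hypothetical_r = [0.42, 0.25, 0.45, 0.40]
pooled_r, tau2 = random_effects_pooled_r(hypothetical_r, study_n, n_conditioning=101)
print(round(pooled_r, 3), round(tau2, 4))
```

Repeating this for every metabolite pair and keeping the 100 pairs with the strongest pooled correlations would give a network analogous to the meta-analytic network.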

Results

EPIC-Potsdam and EPIC-Heidelberg study populations were similar with respect to age, sex and BMI, whereas the study populations in KORA and CARLA were older and had a lower percentage of women and a higher BMI compared with the two EPIC studies (Table 1).

Table 1

Sample characteristics of the included studies

Characteristics^b	EPIC-Potsdam (n = 2458)	EPIC-Heidelberg (n = 812)	KORA (n = 3029)	CARLA (n = 1427)
Age (years)	50.3 (9.0)	50.7 (7.9)	56 (13.3)	63.3 (9.7)
Sex (women %)	61.2	54.9	51.5	44.9
BMI (kg/m²)	26.1 (4.3)	25.6 (4.2)	27.6 (4.8)	28.1 (4.5)

Shown are mean values and standard deviations.

Blood samples from EPIC-Potsdam, KORA and CARLA were analysed in the same laboratory. Samples from KORA were analysed using a different kit.

Considerable differences were found for metabolite concentrations (mean and CV) between the four studies (Supplementary Table 1, available as Supplementary data at IJE online). Overall, 29 metabolites in Potsdam, 50 metabolites in Heidelberg, 20 metabolites in KORA and 59 metabolites in CARLA showed high variation (≥ 30% CV) in concentration. The metabolite networks of the four studies are shown in Supplementary Figures 5–8, available as Supplementary data at IJE online. All networks identified clusters of sphingolipids, lyso-PCs, diacyl-PCs and acyl-alkyl-PCs, albeit with large variation in network topologies, i.e. connection between metabolites. Amino acids showed the highest variation in network connectivity, although with consistent clustering of tryptophan, tyrosine and phenylalanine in all networks. Hexoses (represented as a single metabolite) were connected with the amino acids valine and tryptophan only in CARLA. Two acylcarnitines were connected as a pair in all the studies except in EPIC-Potsdam. The highest variation in metabolite topology was observed in the network of EPIC-Heidelberg as compared with other networks (Figure 1a and Table 2). Pairwise comparison of the networks showed the greatest similarity, represented by the lowest hamming distance and the highest correlation, between EPIC-Potsdam and KORA. EPIC-Heidelberg’s metabolite network was the most dissimilar from all other networks, as it showed a high hamming distance and lower correlation (Figure 1b).

Figure 1

(a) Edge overlap among the four studies included in the study. Shown are the numbers of edges. (b) Pearson’s correlation and hamming distance between metabolite networks of the studies included in the study. The upper triangle shows the hamming distance and the lower triangle shows correlation among the networks. Lower values of the hamming distance show greater similarity whereas lower values of the correlation show less similarity between the networks.

Overlap of the common edges among the different combinations of the four studies is shown in Figure 1a. The highest overlap of the edges was observed between EPIC-Potsdam and CARLA. The metabolite network of EPIC-Heidelberg showed the smallest overlap of edges with the other networks. Overall, 66 edges were consistently detected in all four networks, interlinking 80 metabolites (Figure 2). The other 20 out of 100 metabolites were unconnected and are not shown. Lyso-PCs, diacyl-PCs and sphingolipids consistently grouped together across all studies (Supplementary Figures 1–4, available as Supplementary data at IJE online).
Among the four networks, in EPIC-Potsdam (nodes = 91), CARLA (nodes = 95) and KORA (nodes = 96), a relatively large number of metabolites were integrated in the networks, whereas 20 metabolites (mainly amino acids and acyl-alkyl PCs) remained unconnected in EPIC-Heidelberg (Table 2).

Figure 2

Common edges of the serum metabolite network of the four studies: EPIC-Heidelberg, EPIC-Potsdam, CARLA and KORA. Nodes represent metabolites and edges are partial correlations between two metabolites adjusted for the other metabolites as well as age, sex and BMI. Continuous black lines represent positive and dashed lines represent inverse partial correlations. The thicknesses of the edges are proportional to the strength of the correlations. Nodes with different border colours represent different metabolite classes: black: amino acids; purple: lyso-phosphatidylcholines; sky-blue: sphingolipids; green: diacyl-phosphatidylcholines; red: acyl-alkyl-phosphatidylcholines.

Table 2

Number of connected nodes (metabolites) in individual and combined metabolite networks in the four studies

Name of study	Metabolites^a (number)
	Hexoses	AC	AA	LysoPC	DiA-PC	AA-PC	SL	Total
	(1)	(2)	(12)	(8)	(32)	(35)	(10)	(100)
Heidelberg (H)	00	02	06	08	30	29	10	85
Potsdam (P)	01	02	09	08	30	32	10	92
CARLA (C)	00	02	06	08	30	33	10	89
KORA (K)	00	00	08	08	29	31	10	86
HP	00	02	05	08	30	27	10	82
HC	00	02	03	08	30	27	10	80
HK	00	00	03	08	29	26	10	76
PK	00	00	07	08	29	30	10	84
PC	00	02	06	08	30	31	10	87
CK	00	00	05	08	29	31	10	83
HPK	00	00	03	08	29	25	10	75
HPC	00	02	03	08	30	26	10	79
HCK	00	00	02	08	29	26	10	75
PCK	00	00	05	08	29	30	10	82
HPCK (Common network)	00	00	02	08	29	17	10	66
Meta-analytic network	00	02	08	08	30	31	10	89

AC, acylcarnitines; AA, amino acids; LysoPC, lyso-phosphatidylcholines, DiA-PC, diacyl-phosphatidylcholines; AA-PC, acyl-alkyl- phosphatidylcholines; SL, sphingolipids.

A structural comparison of the four networks showed 57 common edges and 66 commonly connected nodes (Table 3), which are shown in a common network (Figure 2). The common network showed smaller clustering of similar classes of metabolites (Figure 2). Notably, sphingolipids, lyso-PCs and subgroups of acyl-alkyl-PCs and tryptophan, tyrosine and phenylalanine were clustered together. Due to differences between EPIC-Heidelberg and the other studies, we also constructed a common network of EPIC-Potsdam, CARLA and KORA, which showed higher similarity of the metabolite network structures among the three studies (Figure 3).

Table 3

Number of connected nodes (metabolites) and edges in individual and common metabolite networks in the four studies

Name of study	Metabolites^a (number)
	Hexoses	AC	AA	LysoPC	DiA-PC	AA-PC	SL	Total	No of Edges
	(1)	(2)	(12)	(8)	(32)	(35)	(10)	(100)	(100)
Pearson r-based networks
Heidelberg (H)	00	02	06	08	30	29	10	85	100
Potsdam (P)	00	00	08	08	29	31	10	86	100
CARLA (C)	01	02	09	08	30	32	10	92	100
KORA (K)	00	02	06	08	30	33	10	89	100
Common network	00	00	02	08	29	17	10	66	57
Spearman’s rank-based networks
Heidelberg (H)	00	02	07	08	30	29	10	86	100
Potsdam (P)	00	00	08	08	30	31	10	87	100
CARLA (C)	01	02	08	08	30	32	10	91	100
KORA (K)	00	02	05	08	30	32	10	87	100
Common network	00	00	00	07	26	20	10	65	56

AC, acylcarnitines; AA, amino acids; LysoPC, lyso-phosphatidylcholines, DiA-PC, diacyl-phosphatidylcholines; AA-PC, acyl-alkyl- phosphatidylcholines; SL, sphingolipids.

Figure 3

Common edges of the serum metabolite network of the three studies: EPIC-Potsdam, CARLA and KORA. Nodes represent metabolites and edges are partial correlations between two metabolites adjusted for the other metabolites as well as age, sex and BMI. Continuous black lines represent positive and dashed lines represent inverse partial correlations. The thicknesses of the edges are proportional to the strength of the correlations. Nodes with different border colours represent different metabolite classes: black: amino acids; purple: lyso-phosphatidylcholines; sky-blue: sphingolipids; green: diacyl-phosphatidylcholines; red: acyl-alkyl-phosphatidylcholines.

The meta-analytic network of the partial-correlation coefficients represented by the 100 highly correlated metabolite pairs (edges) across the four studies is shown in Figure 4. Meta-analysis of the networks revealed that, among the 100 edges connecting 89 nodes of the meta-analytic network, 57 edges connecting 66 metabolites were present in all the four networks, 58–76 edges connecting 75–89 nodes were present in at least three networks and 63–84 edges connecting 76–87 nodes were present in at least two networks. The meta-analytic network showed clear clusters of the paired acylcarnitines, sphingolipids, lyso-PCs and three clusters of amino acids. Large but differently connected clusters of acyl-alkyl-PCs and diacyl-PCs formed the dominant structure of the networks. Comparison of this network with the common network of four studies showed dissimilarity in a number of edges (Figure 5). However, it was very similar to the combined network of Potsdam, KORA and CARLA (Supplementary Figure 9, available as Supplementary data at IJE online).

Figure 4

Meta-analytic serum metabolite network of the four studies: EPIC-Heidelberg, EPIC-Potsdam, CARLA and KORA. Nodes represent metabolites and edges are partial correlations between two metabolites adjusted for the other metabolites as well as age, sex and BMI. Continuous black lines represent positive and dashed lines represent inverse partial correlations. The thicknesses of the edges are proportional to the strength of the correlations. Nodes with different border colours represent different metabolite classes: yellow: acylcarnitines; black: amino acids; purple: lyso-phosphatidylcholines; sky-blue: sphingolipids; green: diacyl-phosphatidylcholines; red: acyl-alkyl-phosphatidylcholines.

Figure 5

Comparative network of the common network and the meta-analytic network of the four studies: EPIC-Heidelberg, EPIC-Potsdam, KORA and CARLA. Nodes represent metabolites and edges are partial correlations between two metabolites adjusted for the other metabolites as well as age, sex and BMI. Black edge colours represent common edges in the common network and the meta-analytic network, whereas the grey colour represents edges present only in the meta-analytic network. Similarly, the white colour of nodes represents common nodes in the compared networks, whereas the red colour represents nodes present only in the meta-analytic network. Nodes with different border colours represent different metabolite classes: yellow: acylcarnitines; black: amino acids; purple: lyso-phosphatidylcholines; sky-blue: sphingolipids; green: diacyl-phosphatidylcholines; red: acyl-alkyl-phosphatidylcholines.

The networks constructed using Spearman’s rank partial correlations are shown in Supplementary Figures 10–14, available as Supplementary data at IJE online. All the individual networks and common networks showed high similarity to the corresponding networks constructed using Pearson’s partial correlations (Table 3).

Discussion

In this study, we generated and compared the metabolite networks of four German population-based studies. Moreover, we applied a novel meta-analytic approach to combine metabolite networks to identify potentially stable correlation structures across all studies. Comparison of metabolite networks revealed overall considerable heterogeneity in network topologies. However, specific metabolite subgroups showed high consistency in the networks. Consistent network structures were detected for sphingolipids, lyso-PCs, acyl-alkyl-PCs and diacyl-PCs and among the amino acids tryptophan, tyrosine and phenylalanine. The meta-analytic network also showed clear grouping of the metabolite classes and was, in addition, sensitive for further plausible biological links. Consistent links between metabolites from the same group may reflect the same underlying metabolic pathways as the common determinants of the correlation structure across the study populations. In the identified common as well as meta-analytic networks, we observed connections of sphingolipids with PCs, which could be related to the biosynthesis pathway of the sphingolipids. The synthesis of sphingolipids requires enzymatic transfer of phosphocholines from PCs to ceramide, which in turn is converted to sphingolipids. The linkage between these two classes could also be due to limitation of the measurement kit owing to possible interference in the measurement of different metabolites. In addition, we observed a consistent connection between the aromatic amino acids phenylalanine, tryptophan and tyrosine. Phenylalanine is a substrate for tyrosine biosynthesis and, with tryptophan, the two are also precursors of catecholamines. In these networks, we also observed that the majority of stable edges connected metabolites that are known to be directly related by a single metabolic reaction step. This supports the idea that the reproducible correlation structure of metabolites likely reflects linkage in metabolic pathways.
Our results are supported by an earlier KORA study which showed that the GGM has high sensitivity and specificity in identifying reactions that are one step apart. The same study, which is also included in this analysis, likewise reported that reactions that were two steps apart were reflected by negative correlations in the network. This was also observed in our study, e.g. SM.C16.1 and SM.C18.0 were negatively correlated in the common and meta-analysed networks. The compared networks also showed a clear separation of amino acids and acylcarnitines and separate clustering of sphingolipids and diacyl- and acyl-alkyl-PCs. These findings are in agreement with earlier results observed in KORA and EPIC-Potsdam. The modular structure of the metabolites may reflect metabolic pathways including biosynthesis, degradation and metabolism and interaction between the different classes of metabolites. Such biological interrelations were shown to be detectable in metabolomics data in observational studies, which could be reproduced across different populations. For example, PC.ae.C32:1 and PC.ae.C32:2 reflect stearoyl-CoA desaturase/stearoyl-CoA desaturase 5 desaturation and the pair PC.aa.C38:5 and PC.aa.C40:5 reflects various fatty acid elongations. Likewise, correlation between phenylalanine, tryptophan and tyrosine denotes amino-acid-associated pathways. Some of the consistent relationships between metabolites identified in the networks might hint towards so far unknown links. Such metabolites might be better candidates for further investigation to identify their role in the metabolic pathways. In addition to comparing the four metabolite networks, we constructed a meta-analytic metabolite network that shared higher similarity with the common networks from the three studies including EPIC-Potsdam, KORA and CARLA and less similarity with common networks including EPIC-Heidelberg.
The heterogeneity among the identified networks may partly be attributed to the differences in health/disease status of the four populations, differences in diet, fasting status, medication/supplement use or lifestyle factors. Technical differences related to metabolite-concentration measurements in different laboratories or use of different kits could also have partly resulted in the observed differences. Indeed, it is already known that biochemical assay assessments, sample handling and other factors such as storage etc. are some of the reasons affecting reliability and are often addressed in metabolomic measurements. In addition, EPIC-Potsdam and EPIC-Heidelberg had similar study protocols, sample preparation, storage conditions and relatively similar population characteristics. However, large differences in the networks of the two populations were observed. Therefore, the difference between EPIC-Heidelberg and the other studies may partly be related to smaller sample size and technical issues such as metabolomic measurement in a different laboratory with different kits. We also observed differences between the common network and the meta-analytic network. However, it must be noted that these differences are not unexpected, as the two were created using two different approaches, i.e. (i) by combining the common edges in all the networks in one network and (ii) meta-analysis of the four networks. It is important to underline that the meta-analytic network was created using an inverse variance approach, which gives higher weight to the studies with large sample sizes, i.e. KORA, EPIC-Potsdam and CARLA, respectively. This had a large influence on the effect size (partial correlations). Consequently, it resulted in a network that is more similar to the common network of the three larger studies. 
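As a rough illustration of the weighting described above, the share of total weight each study would receive can be computed from the sample sizes alone. The sketch assumes a sampling variance of 1/(n − 3 − k) for the Fisher-transformed partial correlation with k = 101 conditioning variables (98 other metabolites plus age, sex and BMI); both the formula choice and k are assumptions, and the between-study variance `tau2` is a free, hypothetical input.

```python
# Analysed sample sizes as reported in the Methods.
sample_sizes = {
    "EPIC-Potsdam": 2458,
    "EPIC-Heidelberg": 812,
    "KORA": 3029,
    "CARLA": 1427,
}
N_CONDITIONING = 101  # assumption: 98 other metabolites + age, sex, BMI

def weight_shares(sizes, n_conditioning, tau2=0.0):
    """Relative inverse-variance weight of each study for one edge."""
    weights = {
        study: 1.0 / (1.0 / (n - 3 - n_conditioning) + tau2)
        for study, n in sizes.items()
    }
    total = sum(weights.values())
    return {study: w / total for study, w in weights.items()}

for study, share in weight_shares(sample_sizes, N_CONDITIONING).items():
    print(f"{study}: {share:.1%}")
```

With `tau2 = 0` the weights follow the sample sizes, so KORA receives roughly 40% and EPIC-Heidelberg under 10%; as `tau2` grows, the random-effects weights move towards equal shares.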
Nevertheless, the random-effects meta-analytic approach is advantageous over the approach that was based on simple structural similarity, as the former takes both within- and between-study variation into account. It should also be noted that many of the additional edges that were detected by the meta-analytic approach again corresponded to known metabolic reaction steps. A major strength of this study was that, for the first time, metabolite networks from several large population-based studies were compared using an innovative meta-analytic network approach. The meta-analytic network was based on metabolomic measurements of almost 8000 participants, which represents a very large sample size for the application of these sophisticated metabolomic technologies. Metabolites were measured in different population samples, which might have had slightly different environmental exposures despite living in the same country. However, the aim of this study was to see how similar these metabolite networks are in free-living populations under less restricted conditions. This approach was also chosen to better grasp the feasibility of replicating metabolomic results in different populations, which is often demanded when validating metabolomics data. In addition, relatively similar analytic methods were used to quantify the concentrations of the metabolites, which makes the data more comparable than data from other platforms. Moreover, we reduced technical variation by including only those metabolites that were above the detection level and showed good reliability in any of the four studies. This study also had certain limitations. We found differences in mean metabolite concentrations between the cohorts, which are attributable to technical aspects (e.g. different laboratories and kits, sample processing and storage) as well as biological aspects (e.g. cohort differences in age, sex and BMI).
CVs of metabolite measurements were similar between the cohorts and comparable to other metabolomic studies that measured the same metabolites. However, a general limitation of metabolomic studies is that many metabolites are measured simultaneously and CVs of metabolites are usually higher than CVs of single biomarkers. Metabolites were measured in two different laboratories and using different kits, although the kits were from the same company. Further, metabolites were identified using a targeted approach, which has limited coverage. This might have resulted in missing many metabolites that share metabolic pathways with the investigated metabolites. However, conducting a similar study with untargeted metabolomic measurements may be challenging, as different metabolites may be detected in different populations and a number of metabolites will remain unidentified, which may complicate comparison across several studies. Another limitation of our study is that only one metabolite measurement per sample was available, so metabolite reliability could not be tested. However, two earlier EPIC studies showed moderate to good reliability of the included metabolites over 4 months and over 2 years. In addition, participants with prevalent medical conditions were not excluded, which might have affected the metabolomic profiles. Similarly, we did not account for fasting state because, for logistic reasons in large studies, the majority of samples were non-fasting, which may affect metabolite reliability. The existing methods of network construction employ either regression-based approaches or some thresholding criterion for edge inclusion in the respective networks. Nevertheless, with these approaches, the networks identified in different samples could differ, as the correlation between variables may vary due to a number of factors such as sample size.
Therefore, in order to construct networks of similar sizes for comparison in our study, we retained the 100 edges with the highest correlation in each of the four cohorts. As we did not perform any model selection, this method may result in the inclusion of edges that do not reflect important biological relationships or the exclusion of edges that could be important in some biological pathways. It is also important to note that the GGM works under the assumption of a Gaussian distribution of the study variables. Therefore, we compared QQ-plots against the normal distribution to check the log-normality of the metabolite concentrations. We observed some deviations in the tails for several metabolites. Nevertheless, the results were comparable with those of the non-parametric approach in identifying highly correlated metabolites. In summary, we observed considerable similarities in metabolite sub-networks of sphingolipids, lyso-PCs, acyl-alkyl-PCs, diacyl-PCs and amino acids across the four populations, although large variations were observed in the overall networks. Variation may partly be explained by technical issues, such as different laboratories and measurement kits. These technical difficulties should be investigated further and also be taken into account when replicating metabolomic results in different population-based studies. Stable links observed within groups of biochemically related metabolites likely reflect close interdependency of the connected metabolites in metabolic pathways. Using the meta-analytic network as a new approach for combining metabolic data from different studies, closely related metabolites could be identified, for some of which the biological relationships in metabolic pathways had been previously described. The metabolites with observed relationships in the meta-analytic network may be candidates for further investigation to explore their potential role in biological processes.
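The fixed-size network construction described above (partial correlations from a GGM, keeping a set number of the strongest edges per cohort) can be sketched as follows. This is a minimal illustration on simulated data, not the pipeline used in this study. It assumes that partial correlations are obtained by inverting the sample covariance matrix and that edges are ranked by absolute partial correlation. The function names and the simulated variables are hypothetical. The Hamming distance is taken here as the number of edges present in exactly one of two networks.

```python
import numpy as np


def partial_correlations(data):
    """Full-order partial correlation matrix from an (n_samples, n_vars) array."""
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def top_edges(pcor, k):
    """Return the k variable pairs with the largest absolute partial correlation."""
    rows, cols = np.triu_indices_from(pcor, k=1)  # each pair once, no diagonal
    order = np.argsort(-np.abs(pcor[rows, cols]))[:k]
    return {(int(rows[o]), int(cols[o])) for o in order}


def hamming_distance(edges_a, edges_b):
    """Number of edges that appear in exactly one of the two edge sets."""
    return len(edges_a ^ edges_b)


# Simulated chain x0 -> x1 -> x2 plus an unrelated variable x3.
rng = np.random.default_rng(0)
n = 2000
x0 = rng.normal(size=n)
x1 = x0 + rng.normal(size=n)
x2 = x1 + rng.normal(size=n)
x3 = rng.normal(size=n)
data = np.column_stack([x0, x1, x2, x3])

pcor = partial_correlations(data)
network = top_edges(pcor, k=2)
print(sorted(network))
```

In the simulated chain, x0 and x2 are correlated only through x1, so their partial correlation is close to zero and the two retained edges are the direct links. Because the number of edges is fixed rather than chosen by model selection, a larger `k` would also admit pairs whose partial correlation is near zero, which mirrors the limitation noted above.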
