Monika Piwowar, Wiktor Jurkowski.
Abstract
To date, the massive quantity of data generated by high-throughput techniques has not yet been matched by the bioinformatics treatment required to make full use of it. This is partly due to a mismatch between experimental and analytical study design, but primarily due to a lack of adequate analytical approaches. When integrating multiple data types, e.g. transcriptomics and metabolomics, multidimensional statistical methods are currently the techniques of choice. Typical statistical approaches, such as canonical correlation analysis (CCA), that are applied to find associations between metabolites and genes fail because the number of observations (e.g. conditions, diets) is small compared with the data size (number of genes and metabolites). Modifications designed to cope with this issue are not ideal: they either add simulated data, which precludes p-value computation, or prune variables, thereby losing potentially valid information. Instead, our approach uses verified or putative molecular interactions or functional associations to guide the analysis. The workflow comprises dividing the data sets to reach the expected data structure, statistical analysis within groups, and interpretation of the results. By applying pathway and network analysis, data obtained by various platforms are grouped with moderate stringency to avoid functional bias. As a consequence, CCA and other multivariate models can be applied to calculate robust statistics and provide easy-to-interpret associations between metabolites and genes that deepen understanding of the metabolic response. Effective integration of lipidomics and transcriptomics is demonstrated on publicly available murine nutrigenomics data sets. We demonstrate that our approach improves the detection of genes related to lipid metabolism compared with applying statistics alone. This is measured by an increased percentage of explained variance (95% vs. 75-80%) and by the identification of new metabolite-gene associations related to lipid metabolism.
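The failure of standard CCA when observations are few relative to variables can be seen directly. The minimal NumPy sketch below is an illustration, not the authors' code: it computes sample canonical correlations as the singular values of the product of orthonormal bases for the centred X and Y column spaces (a standard formulation). With random, unrelated data and more genes than observations, every canonical correlation equals 1, whereas a small group of genes gives a value below 1. The sample sizes and random data are arbitrary assumptions.

```python
import numpy as np

def _orth_basis(M, tol=1e-10):
    """Orthonormal basis for the column space of the column-centred matrix M."""
    Mc = M - M.mean(axis=0)
    U, s, _ = np.linalg.svd(Mc, full_matrices=False)
    return U[:, s > tol * s.max()]          # drop numerically null directions

def canonical_correlations(X, Y):
    """Sample canonical correlations between the columns of X and Y (rows = observations)."""
    Ox, Oy = _orth_basis(X), _orth_basis(Y)
    r = np.linalg.svd(Ox.T @ Oy, compute_uv=False)
    return np.clip(r, 0.0, 1.0)

rng = np.random.default_rng(0)
n_obs = 10                                   # e.g. conditions/diets (illustrative)
genes = rng.normal(size=(n_obs, 50))         # 50 random "genes": p >= n - 1
metabolites = rng.normal(size=(n_obs, 2))    # 2 random "metabolites"

r_all = canonical_correlations(genes, metabolites)            # all genes at once
r_small = canonical_correlations(genes[:, :3], metabolites)   # a small group of 3 genes
print("all genes:   ", np.round(r_all, 3))
print("3-gene group:", np.round(r_small, 3))
```

The first result is an artefact: once the number of genes reaches n - 1, the centred gene matrix spans every centred direction, so any combination of metabolites can be reproduced exactly. Working within small groups of variables restores a setting in which the canonical correlations carry information.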
Year: 2015 PMID: 26053255 PMCID: PMC4459700 DOI: 10.1371/journal.pone.0128854
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Workflow of metabolomics and transcriptomics data integration.
After preprocessing, the transcriptomics and metabolomics data are used to divide the data set into functional groups. CCA and PLS statistics are calculated for each group to rank gene–small molecule associations. Subsequent validation and functional analysis (e.g. overrepresentation tests) aid the biological interpretation of the ranked results.
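The divide-then-analyse step of this workflow can be sketched as follows. This is a minimal sketch, assuming the functional groups have already been derived from pathway/network annotation and are supplied as a dictionary mapping a group name to gene and metabolite column indices; the `groups` layout, the group names and the random data are hypothetical. Genes may belong to more than one group, and variables not assigned to any group form a complement group, analogous to Group 4. Only the first canonical correlation is reported; PLS statistics would be computed per group in the same loop.

```python
import numpy as np

def _orth_basis(M, tol=1e-10):
    """Orthonormal basis for the column space of the column-centred matrix M."""
    Mc = M - M.mean(axis=0)
    U, s, _ = np.linalg.svd(Mc, full_matrices=False)
    return U[:, s > tol * s.max()]

def first_canonical_correlation(X, Y):
    Ox, Oy = _orth_basis(X), _orth_basis(Y)
    return float(min(1.0, np.linalg.svd(Ox.T @ Oy, compute_uv=False)[0]))

def groupwise_cca(genes, metabolites, groups):
    """CCA within each functional group plus the complement of unassigned variables.

    groups: hypothetical dict {name: (gene_columns, metabolite_columns)}.
    """
    results, used_g, used_m = {}, set(), set()
    for name, (g_idx, m_idx) in groups.items():
        results[name] = first_canonical_correlation(genes[:, g_idx], metabolites[:, m_idx])
        used_g.update(g_idx)
        used_m.update(m_idx)
    rest_g = [j for j in range(genes.shape[1]) if j not in used_g]
    rest_m = [j for j in range(metabolites.shape[1]) if j not in used_m]
    if rest_g and rest_m:
        results["complement"] = first_canonical_correlation(genes[:, rest_g], metabolites[:, rest_m])
    return results

rng = np.random.default_rng(1)
genes = rng.normal(size=(12, 10))            # 12 observations, 10 genes
metabolites = rng.normal(size=(12, 6))       # 6 metabolites
groups = {
    "group_1": ([0, 1, 2, 3], [0]),
    "group_2": ([3, 4, 5], [1, 2]),          # gene 3 is shared between groups
}
print(groupwise_cca(genes, metabolites, groups))
```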
Results of canonical correlation analysis (CCA) of the murine nutrigenomics data.
| Group 1 | Group 2 | Group 3 | Group 4 | Undivided |
|---|---|---|---|---|
| 20/51 | 22/65 | 22/65 | 19/45 | 120 |
| 2/2 | 6/6 | 2/2 | 7/11 | 21 |
| 0.95 (p = 4e-04) | 0.98 (p = 1.8e-06) | 0.92 (p = 2e-04) | 0.95 (p = 8.2e-05) | - |
| 0.74 (p > 0.5) | 0.95 (p = 5.1e-03) | 0.88 (p = 1.5e-02) | 0.93 (p = 0.007) | - |
| 0.10 | 0.32 | 0.14 | 0.35 | - |
| 0.85 | 0.74 | 0.82 | 0.65 | - |
| 0.97 | 0.986 | 0.98 | 0.984 | 0.89 |
| 0.87 | 0.984 | 0.96 | 0.966 | 0.78 |
| 19.07 | 37.55 | 36.91 | 30.83 | 33.46 |
| 15.02 | 13.15 | 16.74 | 30.37 | 18.33 |
Number of genes (X) and metabolites (Y) in each group, given as the count after removing correlated variables (r > 0.7) / the total count. Correlations of the first and second canonical variables (CV1 and CV2), aggregate redundancies and the percentage of variance explained by PLS latent variables are calculated within the functional groups (Groups 1–3), for the remaining variables (Group 4) and for the undivided data set.
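The aggregate redundancies summarise how much of one block's variance is explained by the canonical variates of the other. Assuming they follow the usual Stewart-Love definition (squared canonical correlation times the mean squared structure correlation, summed over canonical pairs), they can be computed as in the minimal NumPy sketch below; the function names and random data are illustrative, not taken from the paper. A useful check is that, summed over all canonical pairs, the Y|X index equals the mean R² obtained by regressing each Y column on X (and symmetrically for X|Y).

```python
import numpy as np

def _orth_basis(M, tol=1e-10):
    """Orthonormal basis for the column space of the column-centred matrix M."""
    Mc = M - M.mean(axis=0)
    U, s, _ = np.linalg.svd(Mc, full_matrices=False)
    return U[:, s > tol * s.max()]

def cca_variates(X, Y):
    """Canonical correlations and unit-norm canonical variates (as columns) for X and Y."""
    Ox, Oy = _orth_basis(X), _orth_basis(Y)
    U, r, Vt = np.linalg.svd(Ox.T @ Oy, full_matrices=False)
    return np.clip(r, 0.0, 1.0), Ox @ U, Oy @ Vt.T

def structure_correlations(M, variates):
    """Correlation of each original column of M with each centred, unit-norm variate."""
    Mc = M - M.mean(axis=0)
    return (Mc / np.linalg.norm(Mc, axis=0)).T @ variates

def aggregate_redundancy(X, Y):
    """Stewart-Love redundancy indices (Y|X, X|Y), summed over all canonical pairs."""
    r, U, V = cca_variates(X, Y)
    sx = structure_correlations(X, U)   # genes vs. X canonical variates
    sy = structure_correlations(Y, V)   # metabolites vs. Y canonical variates
    red_y_given_x = float(np.sum(r**2 * np.mean(sy**2, axis=0)))
    red_x_given_y = float(np.sum(r**2 * np.mean(sx**2, axis=0)))
    return red_y_given_x, red_x_given_y

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))                                   # 5 "genes"
Y = X[:, :2] @ rng.normal(size=(2, 2)) + rng.normal(size=(30, 2))  # 2 "metabolites"
print(aggregate_redundancy(X, Y))
```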
Top 10% of ranked PLS results in the respective categories/groups.
| Group | In reactions involving detected lipids | In lipid pathways | Not present in Reactome |
|---|---|---|---|
| Undivided | 0 | 3 | 3 |
| Group 1 | 2 (APOA1, APOB) | 7 | 1 |
| Group 2 | 1 (PPARA) | 8 | 2 |
| Group 3 | 0 | 6 | 2 |
| Group 4 | 0 | 2 | 5 |
We compare the number of genes present in the sets of seed biochemical reactions with other genes related to lipid metabolism and function found by CCA. The last column provides lists of pathways significantly enriched within groups of genes.
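The tables rank genes by their PLS results and report the percentage of variance explained by PLS latent variables. The ranking criterion is not spelled out here, so the sketch below assumes genes are ranked by the absolute weight on the first PLS component and the top 10% are kept. It is a hand-written textbook NIPALS PLS2 in NumPy, not a specific library's API; `top_fraction`, the gene names and the simulated data are hypothetical.

```python
import numpy as np

def pls_nipals(X, Y, n_components=2, max_iter=500, tol=1e-10):
    """PLS2 via NIPALS on column-centred data.

    Returns X-weights (p x k) and cumulative fractions of X and Y variance explained.
    """
    X0, Y0 = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Xr, Yr = X0.copy(), Y0.copy()
    weights, ex_x, ex_y = [], [], []
    for _ in range(n_components):
        u = Yr[:, [int(np.argmax(Yr.var(axis=0)))]]   # start from most variable Y column
        for _ in range(max_iter):
            w = Xr.T @ u
            w /= np.linalg.norm(w)                    # X-weights
            t = Xr @ w                                # X-scores
            q = Yr.T @ t / (t.T @ t)                  # Y-loadings
            u_new = Yr @ q / (q.T @ q)                # Y-scores
            converged = np.linalg.norm(u_new - u) < tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = Xr.T @ t / (t.T @ t)                      # X-loadings
        Xr = Xr - t @ p.T                             # deflate X
        Yr = Yr - t @ q.T                             # deflate Y by its regression on t
        weights.append(w)
        ex_x.append(1 - np.sum(Xr**2) / np.sum(X0**2))
        ex_y.append(1 - np.sum(Yr**2) / np.sum(Y0**2))
    return np.hstack(weights), ex_x, ex_y

def top_fraction(names, weights, fraction=0.10):
    """Rank variables by |first-component weight|; keep the (rounded) top fraction."""
    order = np.argsort(-np.abs(weights[:, 0]))
    k = max(1, round(fraction * len(names)))
    return [names[i] for i in order[:k]]

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 50))
Y = np.column_stack([3 * X[:, 0], 1.5 * X[:, 0]]) + 0.1 * rng.normal(size=(60, 2))
W, ex_x, ex_y = pls_nipals(X, Y)
genes = [f"gene_{j}" for j in range(X.shape[1])]
print(top_fraction(genes, W), [round(v, 3) for v in ex_y])
```

Because each deflation step removes the least-squares fit of Y on the current score, the cumulative fraction of Y variance explained can only grow as components are added.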
Pathways significantly enriched (FDR < 0.05) within the genes in the top 10% of ranked PLS results in the respective categories/groups.
| Group | Reactome ID | Pathway name | FDR |
|---|---|---|---|
| Undivided | - | - | - |
| Group 1 | REACT_15525 | Nuclear Receptor transcription pathway | 1.2E-12 |
| | REACT_12627 | Generic Transcription Pathway | 4.1E-5 |
| | REACT_6823 | Lipoprotein metabolism | 4.1E-5 |
| | REACT_13621 | HDL-mediated lipid transport | 3.7E-4 |
| | REACT_163679 | Scavenging by Class B Receptors | 0.0020 |
| | REACT_602 | Lipid digestion, mobilization and transport | 0.0068 |
| | REACT_22258 | Metabolism of lipids and lipoproteins | 0.012 |
| | REACT_6841 | Chylomicron-mediated lipid transport | 0.012 |
| | REACT_163699 | Scavenging by Class A Receptors | 0.019 |
| Group 2 | REACT_22258 | Metabolism of lipids and lipoproteins | 5.4E-4 |
| | REACT_268803 | Defective CYP24A1 causes Hypercalcemia, infantile (HCAI) | 0.0018 |
| | REACT_23947 | GABA synthesis, release, reuptake and degradation | 0.0023 |
| | REACT_602 | Lipid digestion, mobilization and transport | 0.0068 |
| | REACT_11082 | Import of palmitoyl-CoA into the mitochondrial matrix | 0.011 |
| | REACT_22279 | Fatty acid, triacylglycerol, and ketone body metabolism | 0.023 |
| | REACT_116145 | PPARA activates gene expression | 0.029 |
| | REACT_118659 | RORA activates gene expression | 0.029 |
| | REACT_19241 | Regulation of lipid metabolism by Peroxisome proliferator-activated receptor alpha (PPARalpha) | 0.029 |
| | REACT_147904 | Activation of gene expression by SREBF (SREBP) | 0.046 |
| | REACT_264212 | Transcriptional activation of mitochondrial biogenesis | 0.046 |
| | REACT_267785 | Signaling by Retinoic Acid | 0.046 |
| | REACT_1190 | Triglyceride Biosynthesis | 0.049 |
| Group 3 | REACT_268803 | Defective CYP24A1 causes Hypercalcemia, infantile (HCAI) | 0.0016 |
| | REACT_11082 | Import of palmitoyl-CoA into the mitochondrial matrix | 0.012 |
| | REACT_22258 | Metabolism of lipids and lipoproteins | 0.012 |
| | REACT_11042 | Recycling of bile acids and salts | 0.013 |
| | REACT_11040 | Bile acid and bile salt metabolism | 0.049 |
| | REACT_267785 | Signaling by Retinoic Acid | 0.049 |
| Group 4 | REACT_115639 | Sulfur amino acid metabolism | 0.019 |
| | REACT_22279 | Fatty acid, triacylglycerol, and ketone body metabolism | 0.023 |
| | REACT_163862 | Cobalamin (Cbl, vitamin B12) transport and metabolism | 0.024 |
| | REACT_115589 | Cysteine formation from homocysteine | 0.032 |
| | REACT_169149 | Defective MTR causes methylmalonic aciduria and homocystinuria type cblG | 0.032 |
| | REACT_169439 | Defective MTRR causes methylmalonic aciduria and homocystinuria type cblE | 0.032 |
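The enrichment above relies on overrepresentation tests, as mentioned in the Fig 1 caption, with FDR-adjusted significance. The exact test is not specified here; as an assumption, the plain-Python sketch below uses a one-sided hypergeometric test with Benjamini-Hochberg correction, a common choice for this kind of analysis. The pathway names, gene sets and background are hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(overlap >= k) when n genes are drawn from N, of which K belong to the pathway."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):               # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def overrepresentation(selected, pathways, background):
    """Hypergeometric ORA of a selected gene list against pathway gene sets, with BH FDR."""
    selected = set(selected) & background
    names, pvals = [], []
    for name, members in pathways.items():
        members = set(members) & background
        k = len(selected & members)            # observed overlap
        names.append(name)
        pvals.append(hypergeom_sf(k, len(background), len(members), len(selected)))
    return dict(zip(names, benjamini_hochberg(pvals)))

background = {f"g{i}" for i in range(20)}
pathways = {"pathway_A": ["g0", "g1", "g2", "g3", "g4"], "pathway_B": ["g10", "g11", "g12"]}
selected = ["g0", "g1", "g2", "g15"]
print(overrepresentation(selected, pathways, background))
```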
Fig 2. Comparison of randomized groups with (A) CCA and (B) PLS results.
(A) CCA: aggregate variability explained for Y|X and X|Y. (B) PLS: percentage of explained variance. All differences between the respective groups are significant (p < 0.05, one-sample t-test). All: undivided data set; g1, g2, g3: functional groups; g4: complement; Rg1, Rg2, Rg3, Rg4: mean values in randomized groups.
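The comparison with randomized groups can be sketched as follows, assuming each randomized group is a same-sized random draw of genes and the one-sample t-test asks whether the mean of the randomized statistics differs from the functional group's value. In this NumPy sketch, `mean_abs_cross_corr` is only a hypothetical stand-in for the CCA aggregate variability or PLS explained variance actually compared in the figure, and all data are simulated.

```python
import numpy as np

def mean_abs_cross_corr(X, Y):
    """Stand-in group statistic: mean |Pearson r| over all gene-metabolite pairs."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    Yz = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    return float(np.mean(np.abs(Xz.T @ Yz / X.shape[0])))

def compare_with_random_groups(X, Y, group_cols, stat_fn, n_random=200, seed=0):
    """Compare a functional group's statistic with same-sized random gene groups.

    Returns (observed value, mean of random statistics, one-sample t statistic of
    the random statistics tested against the observed value).
    """
    rng = np.random.default_rng(seed)
    observed = stat_fn(X[:, group_cols], Y)
    size = len(group_cols)
    random_stats = np.array([
        stat_fn(X[:, rng.choice(X.shape[1], size=size, replace=False)], Y)
        for _ in range(n_random)
    ])
    se = random_stats.std(ddof=1) / np.sqrt(n_random)
    t = (random_stats.mean() - observed) / se
    return observed, float(random_stats.mean()), float(t)

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 60))                 # 100 observations, 60 genes
signal = X[:, :5].sum(axis=1)                  # genes 0-4 drive both metabolites
Y = np.column_stack([signal, -signal]) + rng.normal(size=(100, 2))
obs, rand_mean, t = compare_with_random_groups(X, Y, list(range(5)), mean_abs_cross_corr)
print(round(obs, 3), round(rand_mean, 3), round(t, 1))
```

If SciPy is available, `scipy.stats.ttest_1samp(random_stats, observed)` computes the same t statistic together with its p-value.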
Fig 3. Graphical presentation of the first canonical correlation for each group (1–4).
The correlation structure is indicated by the length of the bars extending toward the circumference (positive correlations) or toward the centre (negative correlations). The left semicircle lists the transcriptional variables (X) and the right semicircle the fatty acid variables (Y).
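The bars in Fig 3 reflect how strongly each gene or fatty acid is associated with the first canonical pair. Assuming they show structure correlations (the correlation of each original variable with its own block's first canonical variate), the NumPy sketch below computes them; `bar_directions` is a hypothetical helper that maps signs onto the plot convention described in the caption, and the data are simulated.

```python
import numpy as np

def _orth_basis(M, tol=1e-10):
    """Orthonormal basis for the column space of the column-centred matrix M."""
    Mc = M - M.mean(axis=0)
    U, s, _ = np.linalg.svd(Mc, full_matrices=False)
    return U[:, s > tol * s.max()]

def first_pair_loadings(X, Y):
    """First canonical correlation and structure correlations of every X and Y column."""
    Ox, Oy = _orth_basis(X), _orth_basis(Y)
    U, r, Vt = np.linalg.svd(Ox.T @ Oy, full_matrices=False)
    u1, v1 = Ox @ U[:, 0], Oy @ Vt[0]          # first canonical variates (unit norm)
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    load_x = (Xc / np.linalg.norm(Xc, axis=0)).T @ u1
    load_y = (Yc / np.linalg.norm(Yc, axis=0)).T @ v1
    return float(r[0]), load_x, load_y

def bar_directions(loadings):
    """Positive loadings point toward the circumference, negative toward the centre."""
    return ["outward" if v > 0 else "inward" for v in loadings]

rng = np.random.default_rng(5)
X = rng.normal(size=(50, 4))                   # 4 "genes"
Y = rng.normal(size=(50, 3))                   # 3 "fatty acids"
Y[:, 0] = X[:, 0] + 0.05 * rng.normal(size=50) # one strongly linked pair
r1, lx, ly = first_pair_loadings(X, Y)
print(round(r1, 3), bar_directions(lx), bar_directions(ly))
```

Canonical variates are defined only up to a joint sign flip, so all bars may be mirrored between software implementations without any change in the correlation structure; only the relative signs within and across the two semicircles are meaningful.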
CCA and regularized CCA (rCCA) results for Groups 1–4 and the undivided data set.
| Group | Gene symbol | Gene CCA loading | Gene rCCA loading | Metabolite (ChEBI ID) | Metabolite CCA loading | Metabolite rCCA loading |
|---|---|---|---|---|---|---|
| Undivided | SSX2IP | - | 0.77 | 32425 | - | -0.45 |
| | THRSP | - | 0.57 | 35465 | - | -0.43 |
| | LPIN1 | - | 0.46 | 36036 | - | 1.06 |
| | CYP24A1 | - | 0.43 | - | - | - |
| Group 1 | NR1I2 | -0.55 | 0.08 | 16196 | 0.96 | 0.65E-3 |
| | APOA1 | 0.59 | -0.043 | 28716 | 0.87 | -0.17 |
| | RARB | 0.62 | -0.16 | - | - | - |
| | PLTP | 0.45 | -0.023 | - | - | - |
| Group 2 | RARB | -0.61 | -0.16 | 28364 | 0.66 | 0.12 |
| | APOA1 | -0.44 | -0.043 | 28125 | 0.75 | 0.12 |
| | PSMB10 | 0.43 | 0.25 | - | - | - |
| | NR1I2 | 0.64 | 0.080 | - | - | - |
| Group 3 | RARB | -0.58 | -0.16 | 15756 | 0.89 | 0.05 |
| | CPT1A | -0.42 | -0.11 | 28875 | -0.45 | 0.05 |
| | RXRA | 0.40 | 0.12 | - | - | - |
| | PSMB10 | 0.54 | 0.25 | - | - | - |
| | NR1I2 | 0.67 | 0.080 | - | - | - |
| Group 4 | FAT1 | 0.59 | 0.18 | 61204_A | -0.60 | -0.16 |
| | PRG4 | 0.46 | 0.11 | 61204_B | 0.47 | 0.12 |
| | PDK4 | 0.45 | 0.12 | 36036_A | 0.41 | 1.06 |
| | Il2 | -0.55 | -0.19 | - | - | - |
For both metabolites and genes, canonical loadings describe the contribution of individual variables to the correlation between the data sets. Gene symbols in italics are correlated (r > 0.7) with the variable in question and were removed prior to canonical analysis.
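Variables correlated above r = 0.7 with a retained variable were removed before canonical analysis. The exact procedure is not described here; the NumPy sketch below assumes a simple greedy filter in column order, which is one common approach but makes the outcome depend on variable order. It also records which retained variable each removed one was correlated with, mirroring the italicised gene lists. The gene names and data are hypothetical.

```python
import numpy as np

def prune_correlated(X, names, threshold=0.7):
    """Greedily keep variables whose |r| with every already-kept variable is <= threshold.

    Returns the kept names and a map {removed name: kept name it is correlated with}.
    """
    R = np.corrcoef(X, rowvar=False)           # columns are variables
    kept, removed = [], {}
    for j in range(X.shape[1]):
        partner = next((k for k in kept if abs(R[j, k]) > threshold), None)
        if partner is None:
            kept.append(j)
        else:
            removed[names[j]] = names[partner]
    return [names[j] for j in kept], removed

rng = np.random.default_rng(6)
base = rng.normal(size=(50, 1))
X = np.hstack([base,                                  # geneA
               base + 0.01 * rng.normal(size=(50, 1)),  # geneB, nearly identical to geneA
               rng.normal(size=(50, 1))])             # geneC, independent
kept, removed = prune_correlated(X, ["geneA", "geneB", "geneC"])
print(kept, removed)
```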