Tin Nguyen, Diana Diaz, Rebecca Tagett, Sorin Draghici.
Abstract
MicroRNAs (miRNAs) are small non-coding RNA molecules whose primary function is to regulate the expression of gene products via hybridization to mRNA transcripts, resulting in suppression of translation or mRNA degradation. Although miRNAs have been implicated in complex diseases, including cancer, their impact on distinct biological pathways and phenotypes is largely unknown. Current integration approaches require sample-matched miRNA/mRNA datasets, resulting in limited applicability in practice. Since these approaches cannot integrate heterogeneous information available across independent experiments, they neither account for bias inherent in individual studies, nor do they benefit from increased sample size. Here we present a novel framework able to integrate miRNA and mRNA data (vertical data integration) available in independent studies (horizontal meta-analysis) allowing for a comprehensive analysis of the given phenotypes. To demonstrate the utility of our method, we conducted a meta-analysis of pancreatic and colorectal cancer, using 1,471 samples from 15 mRNA and 14 miRNA expression datasets. Our two-dimensional data integration approach greatly increases the power of statistical analysis and correctly identifies pathways known to be implicated in the phenotypes. The proposed framework is sufficiently general to integrate other types of data obtained from high-throughput assays.
Year: 2016 PMID: 27403564 PMCID: PMC4941544 DOI: 10.1038/srep29251
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Overall pipeline of the proposed framework.
The input consists of (i) a pathway database and a miRNA database including known targets (panel a), (ii) multiple mRNA expression datasets (panel b), and (iii) multiple miRNA expression datasets (panel c). Each expression dataset consists of two groups of samples, e.g. disease versus control. The framework first augments the signaling pathways with miRNA molecules and their interactions with coding mRNA genes (panel d). It then calculates the standardized mean difference and its standard error in each expression dataset. The summary effect size across multiple datasets is then estimated for each data type using the REstricted Maximum Likelihood (REML) algorithm (panels e,f). Similarly, the p-value for differential expression is calculated for each dataset, and the p-values are then combined using the additive method (add-CLT). The augmented pathways, the combined p-values, and the estimated effect sizes then serve as input for ImpactAnalysis, which is a topology-aware pathway analysis method (panel g).
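The two per-dataset quantities named in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `standardized_mean_difference` and `add_clt` are hypothetical, the small-sample (Hedges) correction is an assumption because the caption says only "standardized mean difference", and the REML step is omitted. The additive combination is shown as a normal approximation to the mean of independent p-values, which is most reasonable when many datasets are combined.

```python
import math
from statistics import NormalDist, mean, variance


def standardized_mean_difference(disease, control):
    """Return (g, se): bias-corrected standardized mean difference and its standard error."""
    n1, n2 = len(disease), len(control)
    # Pooled variance across the two groups (sample variances, n - 1 denominators).
    pooled_var = ((n1 - 1) * variance(disease) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    d = (mean(disease) - mean(control)) / math.sqrt(pooled_var)
    # Assumed small-sample correction factor (Hedges' J).
    j = 1 - 3 / (4 * (n1 + n2) - 9)
    se_d = math.sqrt((n1 + n2) / (n1 * n2) + d * d / (2 * (n1 + n2)))
    return j * d, j * se_d


def add_clt(p_values):
    """Combine independent p-values via a normal approximation to their mean."""
    n = len(p_values)
    # Under the null, each p-value is Uniform(0, 1): mean 1/2, variance 1/12,
    # so the mean of n p-values is approximately Normal(1/2, 1/(12 n)).
    null = NormalDist(mu=0.5, sigma=math.sqrt(1 / (12 * n)))
    return null.cdf(mean(p_values))


if __name__ == "__main__":
    # Toy expression values for one gene in one dataset (made up for illustration).
    g, se = standardized_mean_difference([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
    print(f"g = {g:.3f}, se = {se:.3f}")
    print(f"combined p = {add_clt([0.01, 0.02, 0.03]):.4f}")
```

In a full pipeline, the per-dataset `(g, se)` pairs would feed a random-effects model (REML in the caption), while `add_clt` would be applied to the per-dataset p-values of each gene or miRNA.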
Description of miRNA and mRNA expression datasets used in the experimental studies.
| Cancer | Data | Accession ID | Control | Disease | Tissue | Platform |
|---|---|---|---|---|---|---|
| Colorectal | mRNA | GSE4107 | 10 | 12 | Colonic mucosa | Affymetrix HG U133 Plus 2.0 |
| Colorectal | mRNA | GSE9348 | 12 | 70 | Colonic mucosa | Affymetrix HG U133 Plus 2.0 |
| Colorectal | mRNA | GSE15781 | 10 | 13 | Colon | ABI HG Survey 2 |
| Colorectal | mRNA | GSE21510 | 25 | 123 | Colon | Affymetrix HG U133 Plus 2.0 |
| Colorectal | mRNA | GSE23878 | 24 | 35 | Colon | Affymetrix HG U133 Plus 2.0 |
| Colorectal | mRNA | GSE41657 | 12 | 25 | Colonic mucosa, epithelial neoplasm | Agilent-014850 HG 4×44K G4112F |
| Colorectal | mRNA | GSE62322 | 18 | 20 | Colon | Affymetrix HG U133A |
| Colorectal | miRNA | GSE33125 | 9 | 9 | Colon | Illumina Human v2 MicroRNA |
| Colorectal | miRNA | GSE35834 | 23 | 55 | Colon & rectum | Affymetrix miRNA 1.0 |
| Colorectal | miRNA | GSE39814 | 9 | 10 | FHC, HCT116, & SW480 cells | Agilent-021827 Human miRNA |
| Colorectal | miRNA | GSE39833 | 11 | 88 | Peripheral blood serum | Agilent-021827 Human miRNA |
| Colorectal | miRNA | GSE41655 | 15 | 33 | Colonic mucosa, epithelial neoplasm | Agilent-021827 Human miRNA |
| Colorectal | miRNA | GSE49246 | 40 | 40 | Colon | Sun Yat-Sen Human microRNA |
| Colorectal | miRNA | GSE54632 | 5 | 5 | Colonic and rectal mucosa | Affymetrix miRNA 1.0 |
| Colorectal | miRNA | GSE73487 | 23 | 90 | Colon | Affymetrix miRNA 1.0 |
| Pancreatic | mRNA | GSE15471 | 39 | 39 | Pancreas | Affymetrix HG U133 Plus 2.0 |
| Pancreatic | mRNA | GSE19279 | 3 | 4 | Pancreas, pancreatic duct | Affymetrix HG U133A |
| Pancreatic | mRNA | GSE27890 | 4 | 4 | Pancreas, ductal epithelia | Affymetrix HG U133 Plus 2.0 |
| Pancreatic | mRNA | GSE32676 | 7 | 25 | Pancreas | Affymetrix HG U133 Plus 2.0 |
| Pancreatic | mRNA | GSE36076 | 10 | 3 | Peripheral blood mononuclear cells | Affymetrix HG U133 Plus 2.0 |
| Pancreatic | mRNA | GSE43288 | 3 | 4 | Pancreas | Affymetrix HG U133A |
| Pancreatic | mRNA | GSE45757 | 9 | 132 | Pancreatic epithelial & cancer cells | Affymetrix HG U133A |
| Pancreatic | mRNA | GSE60601 | 3 | 9 | CD14++ & CD16- cells | Affymetrix HG U133 Plus 2.0 |
| Pancreatic | miRNA | GSE24279 | 22 | 136 | Pancreas | Febit human miRBase v11 |
| Pancreatic | miRNA | GSE25820 | 4 | 5 | Pancreatic duct | Agilent-019118 Human miRNA |
| Pancreatic | miRNA | GSE32678 | 7 | 25 | Pancreas | miRCURY LNA microRNA, v.11.0 |
| Pancreatic | miRNA | GSE34052 | 6 | 6 | Pancreas | Agilent-029297 Human miRNA |
| Pancreatic | miRNA | GSE43796 | 5 | 26 | Pancreas | Agilent-031181 Human miRNA V16 |
| Pancreatic | miRNA | GSE60978 | 6 | 51 | Pancreatic duct | Agilent-031181 Human miRNA V16 |
All of the data were downloaded from Gene Expression Omnibus.
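As a quick consistency check, the control and disease counts in the table can be summed and compared with the totals quoted in the abstract (1,471 samples, 15 mRNA and 14 miRNA datasets). The sketch below copies the counts from the table in row order; the dictionary layout and variable names are illustrative choices, not part of the original work.

```python
# (control, disease) sample counts copied from the dataset table, in row order.
datasets = {
    ("Colorectal", "mRNA"): [(10, 12), (12, 70), (10, 13), (25, 123),
                             (24, 35), (12, 25), (18, 20)],
    ("Colorectal", "miRNA"): [(9, 9), (23, 55), (9, 10), (11, 88),
                              (15, 33), (40, 40), (5, 5), (23, 90)],
    ("Pancreatic", "mRNA"): [(39, 39), (3, 4), (4, 4), (7, 25),
                             (10, 3), (3, 4), (9, 132), (3, 9)],
    ("Pancreatic", "miRNA"): [(22, 136), (4, 5), (7, 25), (6, 6),
                              (5, 26), (6, 51)],
}

# Total number of samples over all datasets.
total_samples = sum(c + d for rows in datasets.values() for c, d in rows)

# Number of datasets per data type, over both cancers.
n_mrna = sum(len(rows) for (_, kind), rows in datasets.items() if kind == "mRNA")
n_mirna = sum(len(rows) for (_, kind), rows in datasets.items() if kind == "miRNA")

print(total_samples, n_mrna, n_mirna)  # 1471 15 14
```

The per-cancer miRNA counts (8 colorectal, 6 pancreatic) also agree with the dataset counts mentioned in the captions of Figures 2 and 3.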
Figure 2. Graphical representation of the augmented pathway Colorectal cancer.
The green rectangle nodes and black arrows show the KEGG genes and their interactions, while the blue nodes and red arrows show the miRNAs and their interactions with the genes. In each miRNA node added, we show the total number of miRNAs (blue circles) that are known to target the gene, and the names of the miRNAs (blue rectangles) that were actually measured in the 8 colorectal miRNA datasets. This is a subset of the total set of miRNAs known to target genes on this pathway.
Figure 3. Graphical representation of the augmented pathway Pancreatic cancer.
The green rectangle nodes and black arrows show the KEGG genes and their interactions, while the blue nodes and red arrows show the miRNAs and their interactions with the genes. In each miRNA node added, we show the total number of miRNAs (blue circles) that are known to target the gene, and the names of the miRNAs (blue rectangles) that were actually measured in the 6 pancreatic miRNA datasets. This is a subset of the total set of miRNAs known to target genes on this pathway.
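The augmentation shown in Figures 2 and 3 can be pictured as adding miRNA-to-gene edges to an existing gene-gene graph and recording, for each targeted gene, how many miRNAs are known to target it and which of those were measured. The following is a minimal sketch of that idea only; the function name `augment_pathway`, the data layout, and every gene and miRNA name in the example are hypothetical placeholders rather than entries from KEGG or any miRNA target database.

```python
def augment_pathway(edges, mirna_targets, measured):
    """Add miRNA -> gene edges for genes that lie on the pathway.

    edges:          list of (source_gene, target_gene) pairs
    mirna_targets:  dict mapping a miRNA name to the set of genes it targets
    measured:       set of miRNA names present in the expression datasets
    Returns (augmented_edges, summary), where summary maps each targeted gene
    to (number of known targeting miRNAs, list of measured targeting miRNAs).
    """
    genes = {g for edge in edges for g in edge}
    augmented = [(s, t, "gene-gene") for s, t in edges]
    summary = {}
    for mirna, targets in sorted(mirna_targets.items()):
        # Only targets that are already nodes of the pathway are added.
        for gene in sorted(targets & genes):
            augmented.append((mirna, gene, "miRNA-gene"))
            total, names = summary.get(gene, (0, []))
            if mirna in measured:
                names = names + [mirna]
            summary[gene] = (total + 1, names)
    return augmented, summary


if __name__ == "__main__":
    # Hypothetical toy pathway and target lists, for illustration only.
    toy_edges = [("geneA", "geneB"), ("geneB", "geneC")]
    toy_targets = {"miR-x": {"geneA", "geneZ"}, "miR-y": {"geneA", "geneC"}}
    toy_measured = {"miR-y"}
    new_edges, info = augment_pathway(toy_edges, toy_targets, toy_measured)
    for edge in new_edges:
        print(edge)
    print(info)
```

Keeping the edge type as a third tuple element lets a downstream topology-aware method treat gene-gene and miRNA-gene interactions differently.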
The 16 top-ranked pathways and FDR-corrected p-values obtained by combining colorectal data using 6 approaches: MetaPath_P, MetaPath_G, MetaPath_I, ImpactAnalysis_P, ImpactAnalysis_G, and ImpactAnalysis_I.
The horizontal lines show the 1% significance threshold. The target pathway Colorectal cancer is highlighted in green. Five of the approaches (MetaPath_P, MetaPath_G, MetaPath_I, ImpactAnalysis_P, and ImpactAnalysis_G) fail to identify the target pathway as significant and rank it at positions 16, 9, 15, 61, and 10, respectively. In contrast, the integrative approach, ImpactAnalysis_I, identifies the target pathway as significant and ranks it first.
The 10 top-ranked pathways and FDR-corrected p-values obtained by combining pancreatic data using 6 approaches: MetaPath_P, MetaPath_G, MetaPath_I, ImpactAnalysis_P, ImpactAnalysis_G, and ImpactAnalysis_I.
The horizontal lines show the 1% significance threshold. The target pathway Pancreatic cancer is highlighted in green. Five of the approaches (MetaPath_P, MetaPath_G, MetaPath_I, ImpactAnalysis_P, and ImpactAnalysis_G) fail to identify the target pathway as significant and rank it at positions 17, 91, 91, 32, and 8, respectively. In contrast, the integrative approach, ImpactAnalysis_I, identifies the target pathway as significant and ranks it first.
Running time of each pathway analysis in minutes (m).
| Method | Input | Colorectal | Pancreatic |
|---|---|---|---|
| ImpactAnalysis_I | mRNA & miRNA | 4 m | 4 m |
| MetaPath | mRNA | 39 m | 47 m |