Lam C Tsoi, Tingting Qin, Elizabeth H Slate, W Jim Zheng.
Abstract
BACKGROUND: Several meta-analysis techniques have been developed to make use of the large volume of gene expression information generated by different microarray experiments. Despite these efforts, significant challenges remain in increasing statistical power and decreasing the Type I error rate when pooling heterogeneous datasets from public resources. The objective of this study is to develop a novel meta-analysis approach, Consistent Differential Expression Pattern (CDEP), to identify genes with common differential expression patterns across different datasets.
Year: 2011 PMID: 22078224 PMCID: PMC3251006 DOI: 10.1186/1471-2105-12-438
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The workflow of CDEP. Each "plate" in the figure represents an instance of a gene (g = 1, 2, ..., G), a dataset (i = 1, 2, ..., D), or a dataset local FDR threshold (l ∈ (0,1)). The workflow diagram only illustrates how to identify consistently up-regulated genes, but the same procedure was applied to identify consistently down-regulated genes.
Figure 2. The distributions of p-values computed by the t-test and the RankProd test. We compared the parametric t-test and the non-parametric rank product approach to determine which p-value calculation (from a one-sided test) is more robust for identifying truly differentially expressed genes (p = q = 0.05; |Δ| = 0.5).
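To make the non-parametric side of this comparison concrete, the following is a minimal sketch of the rank product statistic in its commonly described form: each gene is ranked by fold change within every case/control comparison, and the statistic is the geometric mean of those ranks. The function names and the toy fold changes are hypothetical, and the permutation step that turns the statistic into a p-value is not shown; this is not the RankProd package itself.

```python
import math

def rank_descending(values):
    """Rank values so the largest gets rank 1 (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def rank_product(fold_changes):
    """fold_changes[k][g] is the log fold change of gene g in comparison k.

    Returns, for each gene, the geometric mean of its ranks across all
    comparisons. Small values suggest consistent up-regulation.
    """
    n_comparisons = len(fold_changes)
    n_genes = len(fold_changes[0])
    per_comparison_ranks = [rank_descending(fc) for fc in fold_changes]
    result = []
    for g in range(n_genes):
        log_sum = sum(math.log(r[g]) for r in per_comparison_ranks)
        result.append(math.exp(log_sum / n_comparisons))
    return result

# Toy data (made up): 3 comparisons, 4 genes; gene 0 is always the most up-regulated.
fold_changes = [
    [2.1, 0.1, -0.3, 0.4],
    [1.8, -0.2, 0.2, 0.5],
    [2.5, 0.0, 0.6, -0.1],
]
rp = rank_product(fold_changes)
```

Because the statistic depends only on ranks, it is insensitive to the scale of individual datasets, which is one reason rank-based tests are considered for heterogeneous microarray data.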
Figure 3. Log likelihood and FDR plot. Minus log likelihood versus the FDR threshold (l) for different genes in one of the simulated datasets (proportion of cancer-type-specific and metastasis-related differentially expressed genes: p = q = 0.1; degree of effect: |Δ| = 1). FDR is the proportion of false positives among genes declared to be differentially expressed for each dataset. The dotted line represents genes that are consistently differentially expressed, the solid line represents genes that are differentially expressed only in a specific dataset, and the dashed line represents non-differentially expressed genes. The three lines show the mean, and the vertical bars show the standard deviation, of the Q values for the three types of genes at the given FDR. For clarity, only the upper bars are shown.
Figure 4. Boxplots for the expected likelihood. The boxplots show the expected likelihood (EL) for three categories of genes: genes that are consistently differentially expressed, genes differentially expressed in only a certain dataset, and non-differentially expressed genes. The ranges and the quartiles are shown. The width of each boxplot is proportional to the square root of the number of observations. p = q = 0.1, |Δ| = 1.
The Power and Type I error of CDEP, Meta-Profile and Meta-RankProd from simulation study.
| \|Δ\| | FDR | CDEP Power (%) | CDEP Type I error | Meta-Profile Power (%) | Meta-Profile Type I error | Meta-RankProd Power (%) | Meta-RankProd Type I error |
|---|---|---|---|---|---|---|---|
| | | 28.7 | 1.64 × 10⁻⁴ | 6.40 | 1.92 × 10⁻⁶ | 23.7 | 1.21 × 10⁻² |
| | | 31.0 | 3.52 × 10⁻⁴ | 9.10 | 1.92 × 10⁻⁶ | 24.5 | 1.23 × 10⁻² |
| | | 70.2 | 2.21 × 10⁻³ | 11.9 | 1.35 × 10⁻⁵ | 27.2 | 1.27 × 10⁻² |
| | | 33.4 | 2.02 × 10⁻⁴ | 15.4 | 1.35 × 10⁻⁵ | 33.2 | 1.18 × 10⁻² |
| | | 34.3 | 4.22 × 10⁻⁴ | 15.6 | 1.73 × 10⁻⁵ | 45.0 | 1.24 × 10⁻² |
| | | 74.9 | 2.28 × 10⁻³ | 18.2 | 2.31 × 10⁻⁵ | 56.3 | 1.38 × 10⁻² |
| | | 26.2 | 2.29 × 10⁻⁴ | 8.93 | 2.84 × 10⁻⁶ | 23.6 | 2.29 × 10⁻² |
| | | 27.8 | 5.31 × 10⁻⁴ | 11.4 | 4.88 × 10⁻⁶ | 24.4 | 2.31 × 10⁻² |
| | | 33.1 | 1.84 × 10⁻³ | 13.4 | 1.12 × 10⁻⁴ | 26.7 | 2.36 × 10⁻² |
| | | 32.8 | 2.99 × 10⁻⁴ | 15.6 | 5.90 × 10⁻⁵ | 27.0 | 2.34 × 10⁻² |
| | | 33.1 | 6.02 × 10⁻⁴ | 18.2 | 1.16 × 10⁻⁴ | 32.2 | 2.39 × 10⁻² |
| | | 36.8 | 2.08 × 10⁻³ | 23.6 | 2.32 × 10⁻⁴ | 46.4 | 2.56 × 10⁻² |
| | | 28.2 | 1.61 × 10⁻⁴ | 8.00 | 8.12 × 10⁻⁶ | 24.3 | 1.16 × 10⁻² |
| | | 29.5 | 3.11 × 10⁻⁴ | 10.7 | 1.83 × 10⁻⁵ | 25.7 | 1.18 × 10⁻² |
| | | 66.5 | 1.78 × 10⁻³ | 13.0 | 2.64 × 10⁻⁵ | 29.7 | 1.22 × 10⁻² |
| | | 21.8 | 9.36 × 10⁻⁵ | 10.3 | 4.35 × 10⁻⁵ | 23.3 | 2.33 × 10⁻² |
| | | 26.3 | 4.68 × 10⁻⁴ | 12.5 | 9.14 × 10⁻⁵ | 23.5 | 2.34 × 10⁻² |
| | | 31.7 | 1.61 × 10⁻³ | 14.0 | 1.94 × 10⁻⁴ | 24.3 | 2.36 × 10⁻² |
p is the proportion of genes differentially expressed only in a certain dataset, and q is the proportion of consistently differentially expressed genes; Δ is the simulated mean difference between the expression values in the case and control conditions for the differentially expressed genes. FDR is the proportion of false positives among the genes identified as consistently differentially expressed across all datasets. The results in the table are the mean values over 10 different simulated datasets. Additional simulation results can be found in Additional File 2.
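As an illustration of how the quantities reported in the table relate to one another, the sketch below scores a set of called genes against a known simulated truth. The FDR follows the definition given in the footnote; power and Type I error use their standard definitions, which is an assumption about how they were computed here. The function name `evaluate_calls` and the toy gene lists are hypothetical.

```python
def evaluate_calls(called, truly_consistent, all_genes):
    """Compare called genes against the simulated truth.

    power        = true positives / truly consistent genes
    type_I_error = false positives / genes that are not truly consistent
    fdr          = false positives / called genes
    """
    called = set(called)
    truth = set(truly_consistent)
    universe = set(all_genes)

    true_pos = len(called & truth)
    false_pos = len(called - truth)
    n_negatives = len(universe - truth)

    power = true_pos / len(truth) if truth else float("nan")
    type_i_error = false_pos / n_negatives if n_negatives else float("nan")
    fdr = false_pos / len(called) if called else 0.0
    return {"power": power, "type_I_error": type_i_error, "fdr": fdr}

# Toy example (made up): 100 genes, the first 10 are truly consistent.
all_genes = [f"g{i}" for i in range(1, 101)]
truly_consistent = all_genes[:10]
called = all_genes[:7] + all_genes[50:52]  # 7 correct calls, 2 false calls
metrics = evaluate_calls(called, truly_consistent, all_genes)
```

In a simulation study such as the one summarized above, these values would be averaged over the repeated simulated datasets for each parameter setting.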
Description of the six microarray datasets used.
| Cancer Type | Number of samples | Number of Metastatic samples | Affymetrix Platform | Number of probesets | Number of genes |
|---|---|---|---|---|---|
| Cervical | 33 | 12 | HG-U133 P2 | 54,675 | 20,271 |
| Prostate | 90 | 25 | HG-U95Av2 | 12,625 | 9,000 |
| Gastric | 22 | 15 | Hu6800 | 7,129 | 5,526 |
| Colon | 6 | 3 | HG-U133A | 22,283 | 13,069 |
| OSCC* | 27 | 19 | HG-U133A | 22,283 | 13,069 |
| RCC# | 32 | 10 | HG-U133A | 22,283 | 13,069 |
Raw data were downloaded from the NCBI GEO database. *OSCC = oral squamous cell carcinoma; #RCC = renal cell carcinoma
The five most significant genes identified by CDEP as related to a common metastatic mechanism across different cancer types, using FDR < 0.05 as the threshold.
| Up-regulated genes | Down-regulated genes |
|---|---|
| Glycoprotein (transmembrane) nmb | Serpin peptidase inhibitor, clade B (ovalbumin), member 5 |
| Secreted phosphoprotein 1 | Proteasome (prosome, macropain) subunit, beta type, 9 |
| Transforming growth factor, beta-induced | Myxovirus (influenza virus) resistance 1, interferon-inducible protein p78 |
| Heat shock 27kDa protein 1 | Interferon-induced protein with tetratricopeptide repeats 1 |
| Mesoderm specific transcript homolog | Ubiquitin D |
Gene Set Enrichment Analysis to identify functional groups from CDEP identified genes.
| Functions (source) | Annotated genes | FDR |
|---|---|---|
| ECM-receptor interaction (KEGG) | | 1.84 × 10⁻⁸ |
| Focal adhesion (KEGG) | | 2.02 × 10⁻⁷ |
| Blood vessel development | | 1.10 × 10⁻⁵ |
| Immune response (GO) | | 2.06 × 10⁻⁶ |
| Inflammatory response (GO) | | 5.95 × 10⁻⁵ |
The functional enrichment for the genes identified as consistently differentially expressed between primary and metastatic cancers.
Figure 5. CDEP-identified differentially expressed genes map to biological pathways relevant to cancer metastasis. A) ECM-receptor interaction pathway. B) Focal adhesion pathway. Up-regulated genes are shown in red and down-regulated genes in green.
| Symbol | Range | Annotation |
|---|---|---|
| | 1,2,..., | Number of genes in a dataset with FDR lower than the threshold |
| | (0,Inf) | Expected value of the log likelihood with respect to the FDR threshold |
| | 1,2,..., | Number of false positives using the FDR threshold |
| | (0,1) | Gene-specific false discovery rate in dataset |
| | (0,1) | Gene-specific false discovery rate for having consistently differentially expressed patterns among the datasets studied |
| | 1,2,..., | Index for a gene from the union of gene sets across all datasets |
| | 1,2,..., | Index for fold change comparison between a case and a control from a dataset, where |
| | 1,2,..., | Index for a gene expression microarray dataset (consists of |
| | (0,1) | FDR threshold used to enumerate the number of genes with FDR lower than this threshold in a dataset and to estimate the number of false positives under this threshold |
| | (0,1) | Gene- and FDR-threshold-specific likelihood of observing the differentially expressed pattern among the datasets |
| | 1,2,..., | Number of genes that are not up-regulated (not down-regulated) in dataset |
| | (0,Inf) | Minus log likelihood |
| | (0,1) | False positive rate: the probability of a non-up-regulated (non-down-regulated) gene being falsely called over-expressed (under-expressed) |
| | {0, 1} | Binary variable indicating gene |
| | 1,2,..., | Rank of fold change for gene |
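The glossary lists the ingredients of the likelihood but not the formulas that connect them. The sketch below shows one plausible way these quantities could fit together; it is an assumption for illustration only and should not be read as the authors' CDEP definition. It assumes that the number of false positives under a threshold is the threshold times the number of genes called, that the false positive rate is that number divided by the number of genes that are not truly up-regulated, and that datasets are independent under the null. All function names and the toy FDR values are hypothetical.

```python
import math

def false_positive_rate(fdr_values, threshold):
    """Assumed per-dataset false positive rate at a given FDR threshold."""
    n_called = sum(1 for f in fdr_values if f < threshold)
    n_false = threshold * n_called            # assumed: FDR x number called
    n_not_up = len(fdr_values) - (n_called - n_false)
    return n_false / n_not_up

def minus_log_likelihood(called_flags, rates):
    """Minus log probability of a gene's call pattern across datasets,
    assuming the gene is not differentially expressed anywhere and that
    datasets are independent (an assumption of this sketch)."""
    total = 0.0
    for is_called, rate in zip(called_flags, rates):
        prob = rate if is_called else 1.0 - rate
        total -= math.log(max(prob, 1e-300))  # floor avoids log(0)
    return total

# Toy data (made up): per-gene FDR values in two datasets of 10 genes each.
dataset_fdr = [0.01, 0.05, 0.10, 0.15, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
threshold = 0.2
rate = false_positive_rate(dataset_fdr, threshold)
rates = [rate, rate]

q_called_in_both = minus_log_likelihood([1, 1], rates)
q_called_in_one = minus_log_likelihood([1, 0], rates)
q_called_in_none = minus_log_likelihood([0, 0], rates)
```

Under these assumptions, a gene called in every dataset has a call pattern that is unlikely by chance and therefore a large minus log likelihood, which matches the qualitative ordering of the three gene categories described in the Figure 3 caption.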