Xingbin Wang, Yan Lin, Chi Song, Etienne Sibille, George C Tseng.
Abstract
BACKGROUND: Detecting candidate markers in transcriptomic studies of complex diseases is often difficult, particularly when the overall signal is weak and the sample size is small. Covariates, including demographic, clinical and technical variables, are often confounded with the underlying disease effects, which further hampers accurate biomarker detection. Our motivating example came from an analysis of five microarray studies of major depressive disorder (MDD), a heterogeneous psychiatric illness with mostly uncharacterized genetic mechanisms.
Year: 2012 PMID: 22458711 PMCID: PMC3342232 DOI: 10.1186/1471-2105-13-52
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Direction of covariate effects in the RIM_minP (Table 1A) and RIM_BIC (Table 1B) models for nine MDD-related genes selected from the literature
Table 1A. RIM_minP (each covariate cell lists the signs for studies A, B, C, D and E, in that order)

| Gene | Age | Alcohol | Antidep | pH | PMI | Suicide | Co-appearance T1 | Concordance T2 | Ratio R = T2/T1 |
|---|---|---|---|---|---|---|---|---|---|
| VGF | 0 0 -1 0 -1 | 1 1 0 1 0 | 1 0 0 0 0 | 0 0 0 0 0 | 0 0 0 0 0 | 0 1 1 0 1 | 7 | 7 | |
| SST | 0 0 -1 0 -1 | 1 1 0 1 0 | 0 -1 0 0 0 | 1 0 0 0 0 | 0 0 0 0 0 | 0 0 1 1 1 | 7 | 7 | |
| CNP | 0 1 0 1 0 | -1 0 0 1 0 | -1 0 1 0 -1 | 0 -1 0 0 0 | 0 0 0 0 -1 | 0 0 1 0 0 | 5 | 2 | |
| NPY | 0 -1 -1 0 -1 | 1 0 0 1 0 | 0 0 0 0 0 | 1 0 0 0 0 | 0 0 0 0 0 | 0 -1 1 1 1 | 10 | 7 | |
| TAC1 | 0 -1 -1 0 -1 | 1 0 0 1 0 | 1 0 0 0 0 | 0 0 0 1 0 | 0 0 0 0 0 | 0 -1 1 0 1 | 7 | 5 | |
| MBP | 0 0 0 1 1 | 0 -1 0 1 0 | -1 0 0 0 0 | 0 -1 0 0 0 | 0 0 1 0 0 | -1 0 1 0 -1 | 5 | 2 | |
| MOBP | 0 0 0 0 1 | 0 -1 1 0 0 | -1 0 0 -1 0 | 0 0 0 0 0 | 0 -1 0 0 0 | -1 0 1 -1 -1 | 8 | 4 | |
| RGS4 | -1 0 0 0 -1 | 0 1 -1 1 0 | 1 0 -1 0 0 | 0 1 0 0 0 | 0 0 0 0 0 | 0 0 0 1 1 | 6 | 3 | |
| HTR2A | 0 -1 0 1 0 | 0 1 0 0 -1 | -1 0 -1 0 -1 | 0 0 0 1 0 | 1 0 0 0 0 | 0 0 -1 0 0 | 5 | 3 | |
| Total | | | | | | | 60 | 40 | 0.67 |
| p-value | | | | | | | 0.39 | 0.014 | |
Table 1B. RIM_BIC (each covariate cell lists the signs for studies A, B, C, D and E, in that order)

| Gene | Age | Alcohol | Antidep | pH | PMI | Suicide | Co-appearance T1 | Concordance T2 | Ratio R = T2/T1 |
|---|---|---|---|---|---|---|---|---|---|
| VGF | 0 0 -1 0 0 | 0 1 0 0 0 | 0 0 0 0 0 | 0 0 0 0 0 | 0 0 0 0 0 | 0 1 1 0 0 | 1 | 1 | |
| SST | -1 0 -1 -1 -1 | 0 1 0 0 0 | 0 -1 0 0 0 | 0 0 0 0 0 | 0 0 0 0 0 | 0 0 1 0 1 | 7 | 7 | |
| CNP | 1 0 0 1 1 | 0 0 0 0 0 | -1 0 0 0 0 | 0 0 0 0 0 | 0 0 0 0 -1 | 0 0 0 -1 0 | 3 | 3 | |
| NPY | -1 0 -1 -1 -1 | 0 0 0 0 0 | 0 0 0 0 0 | 0 1 1 0 0 | 0 0 0 0 0 | 0 0 0 0 1 | 7 | 7 | |
| TAC1 | -1 0 -1 -1 -1 | 0 0 0 0 0 | 0 0 0 0 0 | 0 1 0 0 0 | 0 0 0 0 0 | 0 0 0 0 0 | 6 | 6 | |
| MBP | 0 0 0 1 1 | 0 0 0 0 0 | -1 0 0 0 0 | 0 0 0 0 0 | 0 0 0 0 0 | 0 0 0 0 -1 | 1 | 1 | |
| MOBP | 1 0 0 1 1 | 0 0 0 0 0 | -1 0 0 0 0 | 0 0 0 0 0 | 0 0 0 0 0 | 0 0 0 0 -1 | 3 | 3 | |
| RGS4 | -1 0 0 -1 -1 | 0 0 0 0 0 | 0 0 0 0 0 | 0 0 0 0 0 | 0 0 0 0 0 | 0 0 0 0 1 | 3 | 3 | |
| HTR2A | 0 0 0 1 0 | 0 1 0 0 0 | 0 -1 -1 0 0 | 0 0 -1 0 0 | 0 0 0 0 0 | 0 0 0 0 0 | 1 | 1 | |
| Total | | | | | | | 32 | 32 | 1 |
| p-value | | | | | | | 0.011 | 0.005 | |
* 0: variable not included in the model; 1: variable included in the model with a positive effect size; -1: variable included in the model with a negative effect size.
A: MD1_ACC, B: MD2_ACC, C: MD3_ACC, D: MD1_AMY, E: MD3_AMY
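Table 1 does not define T1 and T2 explicitly, but every row of Table 1A is consistent with the following reading: for each covariate, T1 counts the pairs of studies in which that covariate was selected in both models, and T2 counts the pairs among them whose effect signs agree; both are summed over the six covariates. The Python sketch below encodes that reading (the function and variable names are ours, not the authors') and checks it against the CNP row of Table 1A.

```python
from itertools import combinations

# Covariate order used in Table 1; each list holds the signs for studies A to E.
COVARIATES = ["Age", "Alcohol", "Antidep", "pH", "PMI", "Suicide"]

def co_appearance_and_concordance(signs):
    """Return (T1, T2) for one gene under the reading described above.

    `signs` maps each covariate to five values in {-1, 0, 1}
    (0 = not selected in that study's model).
    """
    t1 = t2 = 0
    for covariate in COVARIATES:
        # Keep only the studies whose model selected this covariate.
        selected = [s for s in signs[covariate] if s != 0]
        for a, b in combinations(selected, 2):
            t1 += 1          # the covariate co-appears in both studies
            if a == b:
                t2 += 1      # and its effect direction agrees
    return t1, t2

# CNP row of Table 1A (RIM_minP).
cnp = {
    "Age": [0, 1, 0, 1, 0],
    "Alcohol": [-1, 0, 0, 1, 0],
    "Antidep": [-1, 0, 1, 0, -1],
    "pH": [0, -1, 0, 0, 0],
    "PMI": [0, 0, 0, 0, -1],
    "Suicide": [0, 0, 1, 0, 0],
}
t1, t2 = co_appearance_and_concordance(cnp)
print(t1, t2, round(t2 / t1, 2))  # 5 2 0.4
```

Summing T1 and T2 computed this way over the nine genes of Table 1A gives T1 = 60 and T2 = 40, matching the Total row.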
Number of DE genes detected by the single-study analysis methods (PT, RIM_ALL, RIM_minP and RIM_BIC) in each of the five individual studies and by the two meta-analysis methods (Fisher and maxP)
| Method | FDR | MD1_ACC | MD2_ACC | MD3_ACC | MD1_AMY | MD3_AMY | Fisher (meta-analysis) | maxP (meta-analysis) |
|---|---|---|---|---|---|---|---|---|
| RIM_minP | 0.05 | 0 | 0 | 2 | 0 | 0 | 0 | 0 |
| RIM_minP | 0.10 | 0 | 0 | 2 | 0 | 725 | 109 | 99 |
| RIM_minP | 0.15 | 0 | 0 | 5 | 0 | 1442 | 810 | 683 |
| RIM_BIC | 0.05 | 0 | 0 | 0 | 0 | 101 | 0 | 0 |
| RIM_BIC | 0.10 | 0 | 0 | 1 | 0 | 506 | 0 | 0 |
| RIM_BIC | 0.15 | 0 | 0 | 6 | 0 | 873 | 38 | 0 |
| RIM_ALL | 0.05 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| RIM_ALL | 0.10 | 0 | 3 | 0 | 0 | 1 | 0 | 1 |
| RIM_ALL | 0.15 | 0 | 3 | 1 | 0 | 1 | 0 | 1 |
| PT | 0.05 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| PT | 0.10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| PT | 0.15 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
Three FDR thresholds are used (5%, 10% and 15%)
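The two meta-analysis columns combine each gene's per-study p-values before FDR control. A minimal Python sketch, assuming independent studies and uniformly distributed null p-values, uses the textbook parametric forms of Fisher's method and maxP together with the Benjamini-Hochberg step-up rule for FDR; the original analysis may have assessed significance differently (for example by permutation), and the p-values in the usage lines are made up.

```python
import math

def fisher_pvalue(pvalues):
    """Fisher's method: -2 * sum(log p) follows a chi-square with 2K df under H0."""
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # the Fisher statistic divided by 2
    # For even df = 2K the chi-square survival function has a closed form:
    # P(X > 2h) = exp(-h) * sum_{j=0}^{K-1} h^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total

def maxp_pvalue(pvalues):
    """maxP: the maximum of K independent uniform p-values is Beta(K, 1) under H0."""
    return max(pvalues) ** len(pvalues)

def count_bh_discoveries(pvalues, fdr):
    """Number of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    n_reject = 0
    for i, p in enumerate(sorted(pvalues), start=1):
        if p <= fdr * i / m:
            n_reject = i  # keep the largest rank that passes its threshold
    return n_reject

# Hypothetical p-values for three genes across five studies.
study_pvalues = [
    [0.001, 0.004, 0.02, 0.01, 0.003],
    [0.30, 0.001, 0.60, 0.45, 0.02],
    [0.70, 0.55, 0.90, 0.35, 0.80],
]
fisher = [fisher_pvalue(ps) for ps in study_pvalues]
maxp = [maxp_pvalue(ps) for ps in study_pvalues]
for q in (0.05, 0.10, 0.15):  # the three FDR thresholds used in the table
    print(q, count_bh_discoveries(fisher, q), count_bh_discoveries(maxp, q))
```

Of the two combination rules, maxP is the more stringent, because it yields a small combined p-value only when every study shows a small p-value, whereas Fisher's statistic can be driven by a strong signal in a single study.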
Figure 1. Three correlation structures among the disease variable X, the gene expression variable Y and the putative confounding covariates Z used in the simulation. Scenario I: gene expression depends on both disease state and covariates. Scenario II: gene expression depends only on disease state. Scenario III: gene expression depends on disease state directly and on covariates only indirectly, through disease state.
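The three structures can be made concrete with a small data-generating sketch in Python. Everything in it, including the binary disease state, a single Gaussian covariate, the coefficients `beta` and `gamma`, and the choice to let Z correlate with X in Scenarios I and II, is an illustrative assumption rather than the paper's simulation design.

```python
import random

def simulate(scenario, n=50, beta=1.0, gamma=1.0, noise_sd=1.0, seed=0):
    """Return lists (x, z, y) for n samples under scenario "I", "II" or "III".

    x is a binary disease state, z a single covariate and y the expression
    of one gene. All distributions and coefficients are illustrative.
    """
    if scenario not in ("I", "II", "III"):
        raise ValueError("scenario must be 'I', 'II' or 'III'")
    rng = random.Random(seed)
    xs, zs, ys = [], [], []
    for _ in range(n):
        if scenario == "III":
            # Z -> X -> Y: the covariate only shifts disease risk.
            z = rng.gauss(0.0, 1.0)
            x = 1 if z + rng.gauss(0.0, 1.0) > 0 else 0
        else:
            # Scenarios I and II: covariate correlated with disease (assumed).
            x = rng.randint(0, 1)
            z = 0.8 * x + rng.gauss(0.0, 1.0)
        y = beta * x + rng.gauss(0.0, noise_sd)
        if scenario == "I":
            # Only in Scenario I does the covariate act on expression directly.
            y += gamma * z
        xs.append(x)
        zs.append(z)
        ys.append(y)
    return xs, zs, ys

for s in ("I", "II", "III"):
    x, z, y = simulate(s, n=5, seed=1)
    print(s, [round(v, 2) for v in y])
```

Adjusting for Z is needed to estimate the disease effect correctly only in Scenario I; in Scenarios II and III, adding Z to the model costs degrees of freedom without removing any bias in this sketch.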
Figure 2. An illustrative diagram of the proposed statistical framework.
Figure 3. Comparison of the number of DE genes detected in individual-study analyses by RIM_minP, RIM_BIC, RIM_ALL and PT. RIM_minP detected the largest number of DE genes among the four methods.
Figure 4. Comparison of the number of DE genes detected in individual-study analyses by RIM_minP and FEM_minP. RIM_minP usually detected more DE genes.
Figure 5. Comparison of meta-analysis and individual-study analysis under a pathway-analysis criterion for RIM_minP, RIM_BIC and the paired t-test. Meta-analysis produced DE results that were more strongly associated with the top 100 disease-related surrogate pathways.
Frequency with which each covariate was selected in the RIM_minP models of the 683 DE genes detected by the maxP method at FDR = 15%
| Covariate | MD1_ACC | MD2_ACC | MD3_ACC | MD1_AMY | MD3_AMY | Rank average |
|---|---|---|---|---|---|---|
| Age | 142 (5) | 213 (3) | 205 (4) | 173 (3) | 218 (3) | 3.6 |
| Alcohol | 299 (2) | 279 (2) | 221 (3) | 368 (1) | 195 (4) | 2.4 |
| Antidepressant | 348 (1) | 119 (6) | 271 (2) | 346 (2) | 362 (1) | 2.4 |
| pH | 208 (3) | 150 (4) | 116 (6) | 108 (6) | 86 (6) | 5 |
| PMI | 93 (6) | 120 (5) | 141 (5) | 133 (5) | 120 (5) | 5.2 |
| Suicide | 150 (4) | 325 (1) | 340 (1) | 149 (4) | 322 (2) | 2.4 |
Ranks within each study are shown in parentheses. The rank average of each covariate indicates how frequently, relative to the other covariates, it affects gene expression and is confounded with the disease effect.
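The rank average can be recomputed directly from the counts. In this Python sketch, ranks are assigned within each study from the largest count (rank 1) to the smallest, which is unambiguous here because no study has tied counts; for Age the ranks come out as 5, 3, 4, 3 and 3, whose mean is the reported 3.6. The function name is ours.

```python
# Number of DE genes whose RIM_minP model selected each covariate, per study
# (MD1_ACC, MD2_ACC, MD3_ACC, MD1_AMY, MD3_AMY), taken from the table above.
COUNTS = {
    "Age":            [142, 213, 205, 173, 218],
    "Alcohol":        [299, 279, 221, 368, 195],
    "Antidepressant": [348, 119, 271, 346, 362],
    "pH":             [208, 150, 116, 108, 86],
    "PMI":            [93, 120, 141, 133, 120],
    "Suicide":        [150, 325, 340, 149, 322],
}

def rank_average(counts):
    """Rank covariates within each study (1 = most frequent) and average the ranks."""
    names = list(counts)
    n_studies = len(next(iter(counts.values())))
    ranks = {name: [] for name in names}
    for j in range(n_studies):
        # Sort covariates by their count in study j, largest first.
        ordered = sorted(names, key=lambda name: counts[name][j], reverse=True)
        for rank, name in enumerate(ordered, start=1):
            ranks[name].append(rank)
    averages = {name: sum(r) / len(r) for name, r in ranks.items()}
    return averages, ranks

averages, ranks = rank_average(COUNTS)
print(ranks["Age"], averages["Age"])  # [5, 3, 4, 3, 3] 3.6
```

Averaging ranks rather than raw counts keeps a study with many selections overall from dominating the summary.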
Evaluation of the t-test, FEM_minP, FEM_BIC and FEM_ALL methods by simulation
| Scenario | Type I error A | Type I error B | Type I error C | Type I error D | Power (%) A | Power (%) B | Power (%) C | Power (%) D | No. of DE genes A | No. of DE genes B | No. of DE genes C | No. of DE genes D | Z variables selected, FEM_minP | Z variables selected, FEM_BIC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| I | 0.051 | 0.046 | 0.049 | 0.051 | 67.9 | 72.9 | 74.6 | 69.7 | 12.5 | 20.4 | 23.3 | 17.6 | 0.97/1.97* | 0.78/1.21* |
| II | 0.051 | 0.052 | 0.050 | 0.051 | 93.8 | 92.9 | 92.5 | 85.0 | 73.4 | 73.0 | 69.8 | 49.7 | 1.7 | 0.59 |
| III | 0.051 | 0.053 | 0.051 | 0.051 | 93.8 | 92.5 | 91.6 | 85.1 | 71.8 | 68.3 | 66.5 | 45.8 | 1.8 | 0.6 |
*The denominator shows the average number of variables in Z selected; the numerator shows the average number of those selected variables that are true confounders (z).
For each method, the table shows the average type I error, the average statistical power and the average number of detected DE genes. Standard errors are shown in parentheses. The last two columns give the average numbers of confounding variables selected by FEM_minP and FEM_BIC.
A: t-test, B: FEM_minP, C: FEM_BIC, D: FEM_ALL
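Under one common definition, assumed here (the significance level alpha = 0.05 and the strict inequality are our choices), the type I error and power in the simulation table are the proportions of null and truly DE genes, respectively, whose p-values fall below alpha in a simulated data set, averaged over simulation runs; the Z-selection entries compare the selected covariates with the true confounders. A minimal Python sketch for a single run, with hypothetical gene results and variable names:

```python
def empirical_rates(pvalues, is_de, alpha=0.05):
    """Return (type I error, power) for one simulated data set.

    Type I error: share of non-DE genes with p < alpha.
    Power: share of truly DE genes with p < alpha.
    """
    null = [p for p, de in zip(pvalues, is_de) if not de]
    alt = [p for p, de in zip(pvalues, is_de) if de]
    type1 = sum(p < alpha for p in null) / len(null)
    power = sum(p < alpha for p in alt) / len(alt)
    return type1, power

def confounder_selection(selected, true_confounders):
    """Return (selected variables that are true confounders, all selected variables)."""
    selected = set(selected)
    return len(selected & set(true_confounders)), len(selected)

# Hypothetical results for one simulated data set: genes 1-4 null, genes 5-8 DE.
pvals = [0.40, 0.03, 0.75, 0.20, 0.001, 0.02, 0.30, 0.004]
truth = [False, False, False, False, True, True, True, True]
print(empirical_rates(pvals, truth))  # (0.25, 0.75)

# Hypothetical covariate names; z1 and z2 are the true confounders.
print(confounder_selection(["z1", "z3"], ["z1", "z2"]))  # (1, 2)
```

Averaging the two counts from `confounder_selection` over many runs gives a numerator/denominator pair of the same form as the 0.97/1.97 entry for Scenario I.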