H Robert Frost, Christopher I Amos.
Abstract
BACKGROUND: Gene set testing, or pathway analysis, is a bioinformatics technique that performs statistical testing on biologically meaningful sets of genomic variables. Although originally developed for supervised analyses, i.e., to test the association between gene sets and an outcome variable, gene set testing also has important unsupervised applications, e.g., p-value weighting. For unsupervised testing, however, few effective gene set testing methods are available, with support especially poor for several biologically relevant use cases.
Keywords: Gene set testing; Marčenko-Pastur; Pathway analysis; Random matrix theory; Tracy-Widom
Year: 2016 PMID: 27809777 PMCID: PMC5096314 DOI: 10.1186/s12859-016-1299-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Covariance structure examples
| Name | Model | Self-contained | Competitive |
|---|---|---|---|
| Identity | | Accept | Accept |
| Single block | | Reject | Reject |
| Multi-block | | Reject | Reject |
| Anti-correlated multi-block | | Reject | Reject |
| Inverted single block | | Accept | ∗Accept |
| Repeated single block | | Reject | ∗Reject |
Examples where the self-contained and competitive tests give different answers are in bold. ∗For the inverted single block structure, a two-sided competitive null would be rejected, whereas the one-sided competitive H0 would be accepted. For the repeated block structure, H0 will be rejected since a random sample of g genes from among all p genes will likely include some pairs with 0 covariance.
Simulation designs for type I error
| Type I error design # | Covariance structure | g | p | n | σ² | ρ |
|---|---|---|---|---|---|---|
| MVN-1 | Identity | 10 | 100 | 100 | 1 | 0 |
| MVN-2 | Scaled Identity | 10 | 100 | 100 | 2 | 0 |
| MVN-3 | Compound symmetry | 10 | 100 | 20 | 1 | 0.1 |
| MVN-4 | Compound symmetry | 10 | 100 | 50 | 1 | 0.1 |
| MVN-5 | Compound symmetry | 10 | 100 | 100 | 1 | 0.1 |
| MVN-6 | Compound symmetry | 10 | 100 | 100 | 2 | 0.1 |
| MVN-7 | Compound symmetry | 10 | 100 | 100 | 1 | 0.2 |
| Binomial-1 | Compound symmetry | 10 | 100 | 100 | 0.375 | 0.1 |
Simulation designs for assessing type I error control using a multivariate normal distribution (MVN-1 through MVN-7) or a multivariate binomial (n=2, p=0.25) distribution (Binomial-1) for x
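As an illustration of how data for the compound symmetry designs in the table above could be generated, here is a minimal Python sketch. It assumes the last two numeric columns are a common variance and a common correlation, so that every gene has the same variance and every pair of genes has covariance equal to variance times correlation. That parameterization, and the function names `compound_symmetry` and `sample_compound_symmetry`, are assumptions made for this example and are not taken from the paper. The sampler uses the standard one-factor representation of an equicorrelated normal vector, not necessarily the authors' procedure.

```python
import math
import random


def compound_symmetry(g, variance, rho):
    """Return a g x g covariance matrix with `variance` on the diagonal
    and `variance * rho` everywhere else (assumed parameterization)."""
    return [
        [variance if i == j else variance * rho for j in range(g)]
        for i in range(g)
    ]


def sample_compound_symmetry(n, g, variance, rho, seed=0):
    """Draw n observations of g zero-mean normal variables whose
    covariance matrix is compound_symmetry(g, variance, rho).

    Each observation is built as shared + unique noise:
        x_i = sqrt(variance * rho) * z0 + sqrt(variance * (1 - rho)) * z_i
    which gives Var(x_i) = variance and Cov(x_i, x_j) = variance * rho.
    This construction requires 0 <= rho <= 1.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("one-factor construction needs 0 <= rho <= 1")
    rng = random.Random(seed)
    shared_sd = math.sqrt(variance * rho)
    unique_sd = math.sqrt(variance * (1.0 - rho))
    rows = []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)  # factor common to all g variables
        rows.append(
            [shared_sd * shared + unique_sd * rng.gauss(0.0, 1.0) for _ in range(g)]
        )
    return rows


# Example: 20 observations of a 10-variable set with variance 1 and correlation 0.1
data = sample_compound_symmetry(n=20, g=10, variance=1.0, rho=0.1)
```

The one-factor construction avoids a matrix factorization, which keeps the sketch dependency-free; it does not cover negative correlations such as those in an anti-correlated block structure.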
Simulation designs for statistical power
| Power design # | Covariance structure | g | p | n | σ² | ρ |
|---|---|---|---|---|---|---|
| MVN-1 | Single block | 10 | 100 | 100 | 1 | 0.1 |
| MVN-2 | Single block | 10 | 100 | 50 | 1 | 0.1 |
| MVN-3 | Single block | 10 | 100 | 20 | 1 | 0.1 |
| MVN-4 | Single block | 10 | 100 | 100 | 1 | 0.15 |
| MVN-5 | Single block | 10 | 100 | 100 | 1.1 | 0 |
| MVN-6 | Single block | 10 | 100 | 100 | 1.15 | 0 |
| MVN-7 | Multi-block | 10 | 100 | 100 | 1 | 0.2/0 |
| MVN-8 | Anti-cor. multi-block | 10 | 100 | 100 | 1 | 0.1/-0.1 |
| MVN-9 | Repeated block | 10 | 100 | 100 | 1 | 0.1 |
| Binomial-1 | Single block | 10 | 100 | 100 | 0.375 | 0.1 |
Simulation designs for assessing statistical power using a multivariate normal distribution (MVN-1 through MVN-9) or a multivariate binomial (n=2, p=0.25) distribution (Binomial-1) for x
Average type I error rate
| # | Cov. struct., n, σ², ρ | TWT | MPDT | MLRT | SGSE |
|---|---|---|---|---|---|
| MVN-1 | Identity, 100, 1, 0 | 0.049 | 0.049 | 0.050 | 0.050 |
| MVN-2 | Scaled Identity, 100, 2, 0 | 0.050 | 0.045 | 0.049 | 0.052 |
| MVN-3 | Compound sym., 20, 1, 0.1 | 0.053 | 0.051 | 0.051 | 0.049 |
| MVN-4 | Compound sym., 50, 1, 0.1 | 0.049 | 0.045 | 0.048 | 0.051 |
| MVN-5 | Compound sym., 100, 1, 0.1 | 0.046 | 0.051 | 0.051 | 0.051 |
| MVN-6 | Compound sym., 100, 2, 0.1 | 0.048 | 0.051 | 0.048 | 0.046 |
| MVN-7 | Compound sym., 100, 1, 0.2 | 0.049 | 0.049 | 0.052 | 0.050 |
| Binom-1 | Compound sym., 100, 1, 0.1 | 0.044 | 0.054 | 0.053 | 0.048 |
Average type I error rate for each of the evaluated competitive methods computed on 1000 simulated data sets for the eight simulation designs detailed in Table 2
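To help read the rates in the table above, the following Python sketch shows one way an empirical type I error rate and its Monte Carlo standard error could be computed from simulated p-values. It assumes one p-value per simulated data set and a nominal level of 0.05; the level is not stated in this excerpt and is inferred from the tabulated values clustering around 0.05. The function names are hypothetical.

```python
import math


def empirical_rejection_rate(p_values, alpha=0.05):
    """Fraction of simulated p-values at or below the nominal level alpha."""
    if not p_values:
        raise ValueError("need at least one p-value")
    rejections = sum(1 for p in p_values if p <= alpha)
    return rejections / len(p_values)


def monte_carlo_se(rate, n_sim):
    """Binomial standard error of a rejection rate estimated from n_sim runs."""
    return math.sqrt(rate * (1.0 - rate) / n_sim)


# With 1000 simulations and a true rate of 0.05, the standard error is about 0.0069
se = monte_carlo_se(0.05, 1000)
```

Under these assumptions, an observed rate within roughly two standard errors of 0.05 (about 0.036 to 0.064) is consistent with nominal type I error control.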
Average empirical power
| # | Cov. struct., n, σ², ρ | TWT | MPDT | MLRT | SGSE |
|---|---|---|---|---|---|
| MVN-1 | Single block, 100, 1, 0.1 | | 0.35 | 0.79 | 0.76 |
| MVN-2 | Single block, 50, 1, 0.1 | 0.52 | 0.20 | 0.42 | |
| MVN-3 | Single block, 20, 1, 0.1 | 0.24 | 0.16 | 0.18 | |
| MVN-4 | Single block, 100, 1, 0.15 | | 0.66 | 0.97 | 0.95 |
| MVN-5 | Single block, 100, 1.1, 0 | | 0.38 | 0.16 | 0.10 |
| MVN-6 | Single block, 100, 1.15, 0 | 0.63 | | 0.29 | 0.10 |
| MVN-7 | Multi-block, 100, 1, 0.2/0 | 0.31 | 0.28 | | 0.19 |
| MVN-8 | Anti-cor. multi-block, 100, 1, 0.1/-0.1 | | 0.32 | 0.74 | 0.06 |
| MVN-9 | Repeated block, 100, 1, 0.1 | | 0.11 | 0.23 | 0.10 |
| Binom-1 | Single block, 100, 1, 0.1 | | 0.02 | 0.26 | 0.55 |
Average empirical power for each of the evaluated competitive methods computed on 1000 simulated data sets for the ten simulation designs detailed in Table 3. The largest average power found for each design is listed in bold
Leukemia gene expression results
| Gene set | Direction | GSE | Unweighted | MLRT wFDR | SGSE wFDR | TWT wFDR | MPDT wFDR |
|---|---|---|---|---|---|---|---|
| *GSE10325_BCELL_VS_MYELOID_UP (124) | ALL | 0.00225 | 0.999 | 0.655 | 0.711 | | 1 |
| *GSE29618_BCELL_VS_MONOCYTE_DAY … (130) | ALL | 0.00302 | 0.999 | 0.655 | 0.711 | | 1 |
| GSE29618_BCELL_VS_MDC_DAY7_FLU … (126) | ALL | 0.0046 | 0.999 | 0.655 | 0.729 | 0.574 | 1 |
| GSE10325_CD4_TCELL_VS_BCELL_DN (132) | ALL | 0.005 | 0.999 | 1 | 0.711 | 0.574 | 1 |
| *GSE10325_LUPUS_BCELL_VS_LUPUS_ … (123) | ALL | 0.00563 | 0.999 | 0.431 | 0.711 | | 1 |
| GSE29618_BCELL_VS_MDC_UP (133) | ALL | 0.00719 | 0.999 | 0.655 | 0.955 | 0.574 | 1 |
| GSE29618_BCELL_VS_MONOCYTE_UP (108) | ALL | 0.00776 | 0.999 | 0.655 | 0.711 | 0.574 | 1 |
| GSE29618_BCELL_VS_MONOCYTE_DAY … (143) | AML | 0.0137 | 0.999 | 0.655 | 0.711 | 0.574 | 1 |
| GSE24634_TREG_VS_TCONV_POST_DA … (123) | AML | 0.0162 | 0.999 | 1 | 0.711 | 1 | 1 |
| GSE6269_HEALTHY_VS_STREP_AUREU … (133) | AML | 0.0168 | 0.999 | 0.655 | 0.711 | 0.574 | 1 |
| GSE29618_BCELL_VS_MDC_DAY7_FLU … (126) | AML | 0.0171 | 0.999 | 0.655 | 0.711 | 1 | 1 |
| GSE6269_HEALTHY_VS_STREP_PNEUM … (134) | AML | 0.0174 | 0.999 | 0.655 | 0.711 | 0.742 | 1 |
| GSE15767_MED_VS_SCS_MAC_LN_UP (117) | AML | 0.0229 | 0.999 | 1 | 0.711 | 1 | 1 |
| GSE6269_E_COLI_VS_STREP_AUREUS … (130) | AML | 0.0245 | 0.999 | 1 | 0.711 | 1 | 1 |
| GSE22886_NAIVE_CD8_TCELL_VS_NE … (122) | AML | 0.0295 | 0.999 | 1 | 0.729 | 1 | 1 |
| GSE6269_FLU_VS_E_COLI_INF_PBMC … (128) | AML | 0.0306 | 0.999 | 1 | 0.711 | 1 | 1 |
| GSE29618_MONOCYTE_VS_PDC_UP (126) | AML | 0.0333 | 0.999 | 0.659 | 0.711 | 0.706 | 1 |
| GSE6269_HEALTHY_VS_STREP_AUREU … (109) | ALL | 0.0353 | 0.999 | 0.906 | 0.907 | 0.592 | 1 |
| GSE3982_MEMORY_CD4_TCELL_VS_BC … (73) | ALL | 0.0361 | 0.999 | 1 | 0.995 | 1 | 1 |
| GSE360_CTRL_VS_M_TUBERCULOSIS_ … (71) | AML | 0.0364 | 0.999 | 1 | 0.711 | 1 | 1 |
| GSE11057_EFF_MEM_VS_CENT_MEM_C … (85) | AML | 0.0381 | 0.999 | 1 | 0.729 | 1 | 1 |
| GSE10325_LUPUS_BCELL_VS_LUPUS_ … (88) | AML | 0.0384 | 0.999 | 0.655 | 0.711 | 0.998 | 1 |
| GSE360_CTRL_VS_L_DONOVANI_DC_D … (75) | AML | 0.0403 | 0.999 | 1 | 0.729 | 1 | 1 |
| GSE22886_NAIVE_CD4_TCELL_VS_NE … (92) | AML | 0.0403 | 0.999 | 1 | 0.758 | 1 | 1 |
| GSE3982_CTRL_VS_PMA_STIM_EOSIN … (93) | AML | 0.0427 | 0.999 | 1 | 0.711 | 1 | 1 |
Results for the MSigDB C7 v5.0 collection and the Armstrong et al. [34] leukemia gene expression data (1910 total gene sets after size-based filtering). Significant q-values are marked in bold, with a * before the gene set name
p53 gene expression results
| Gene set | Direction | GSE | Unweighted | MLRT wFDR | SGSE wFDR | TWT wFDR | MPDT wFDR |
|---|---|---|---|---|---|---|---|
| *P53_DN.V1_UP (10) | MUT | 5.21e-09 | | | | | |
| *P53_DN.V1_DN (14) | WT | 8.67e-07 | | | | | |
| RB_P130_DN.V1_DN (123) | MUT | 0.0342 | 0.914 | 1 | 0.505 | 1 | 1 |
| BCAT.100_UP.V1_UP (103) | MUT | 0.0385 | 0.914 | 1 | 0.517 | 1 | 1 |
| *VEGF_A_UP.V1_DN (114) | MUT | 0.0432 | 0.914 | 0.315 | 0.505 | | 1 |
| EGFR_UP.V1_DN (136) | MUT | 0.046 | 0.914 | 0.315 | 0.505 | 1 | 0.989 |
| RB_DN.V1_DN (110) | MUT | 0.056 | 0.914 | 1 | 0.505 | 1 | 1 |
| *CORDENONSI_YAP_CONSERVED_SIGNA… (124) | MUT | 0.0593 | 0.914 | 0.324 | 0.505 | | |
| RAF_UP.V1_UP (98) | MUT | 0.0597 | 0.914 | 0.873 | 0.505 | 1 | 1 |
| SRC_UP.V1_UP (98) | WT | 0.0604 | 0.914 | 1 | 1 | 1 | 1 |
| *RPS14_DN.V1_DN (98) | MUT | 0.0651 | 0.914 | 1 | 0.734 | 1 | |
| HOXA9_DN.V1_DN (105) | MUT | 0.0685 | 0.914 | 1 | 0.505 | 1 | 1 |
| CSR_EARLY_UP.V1_UP (159) | MUT | 0.0719 | 0.914 | 0.958 | 0.505 | 1 | 1 |
| TBK1.DF_DN (150) | MUT | 0.0797 | 0.914 | 1 | 0.56 | 1 | 1 |
| EGFR_UP.V1_UP (17) | MUT | 0.0832 | 0.914 | 0.332 | 0.517 | 1 | 1 |
| MEK_UP.V1_UP (9) | MUT | 0.085 | 0.914 | 0.332 | 0.505 | 1 | 1 |
| *GCNP_SHH_UP_EARLY.V1_UP (170) | MUT | 0.0875 | 0.914 | 0.958 | 1 | | 1 |
| ESC_J1_UP_EARLY.V1_DN (156) | MUT | 0.0899 | 0.914 | 1 | 0.734 | 1 | 1 |
| ERB2_UP.V1_UP (170) | MUT | 0.103 | 0.914 | 1 | 0.56 | 1 | 1 |
| KRAS.300_UP.V1_UP (155) | WT | 0.108 | 0.914 | 1 | 0.994 | 1 | 1 |
| BRCA1_DN.V1_UP (64) | WT | 0.125 | 0.914 | 1 | 1 | 1 | 1 |
| AKT_UP.V1_UP (87) | MUT | 0.127 | 0.914 | 1 | 0.821 | 1 | 1 |
| ERB2_UP.V1_DN (90) | MUT | 0.13 | 0.914 | 0.444 | 0.56 | 1 | 1 |
| STK33_SKM_DN (97) | MUT | 0.139 | 0.914 | 1 | 0.56 | 1 | 1 |
| ALK_DN.V1_UP (100) | WT | 0.146 | 0.914 | 1 | 1 | 1 | 1 |
Results for the MSigDB C6 v5.0 collection and p53 [4] gene expression data (188 total gene sets after size-based filtering). Significant q-values are marked in bold, with a * before the gene set name