Svitlana Tyekucheva, Luigi Marchionni, Rachel Karchin, Giovanni Parmigiani.
Abstract
We introduce and evaluate data analysis methods to interpret simultaneous measurement of multiple genomic features made on the same biological samples. Our tools use gene sets to provide an interpretable common scale for diverse genomic information. We show we can detect genetic effects, although they may act through different mechanisms in different samples, and show we can discover and validate important disease-related gene sets that would not be discovered by analyzing each data type individually.
Year: 2011 PMID: 22018358 PMCID: PMC3333775 DOI: 10.1186/gb-2011-12-10-r105
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
P-values for top 30 gene sets discovered by the integrative method using the competitive gene sets test
| Pathway | E1 | E2 | C1 | C2 | INT |
|---|---|---|---|---|---|
| AMINOSUGARS_METABOLISM* | 0.0132 | 0.4429 | 0.4686 | 0.0085 | 0.0008 |
|  | 0.1180 | 0.0822 | 0.8774 | 0.1301 | 0.0024 |
| STARCH_AND_SUCROSE_METABOLISM* | 0.0163 | 0.0457 | 0.6400 | 0.7045 | 0.0048 |
| TRANSLATION_FACTORS | 0.0006 | 0.0007 | 0.0700 | 0.0505 | 0.0092 |
| HSA00860_PORPHYRIN_AND_CHLOROPHYLL_METABOLISM | 0.0034 | 0.1760 | 0.1306 | 0.1287 | 0.0098 |
|  | 0.1321 | 0.2590 | 0.1119 | 0.0716 | 0.0105 |
| HSA00624_1_AND_2_METHYLNAPHTHALENE_DEGRADATION | 0.0157 | 0.0037 | 0.3096 | 0.4940 | 0.0115 |
| HSA03050_PROTEASOME | < 10⁻⁴ | 0.0021 | 0.7431 | 0.9031 | 0.0123 |
|  | 0.1243 | 0.1364 | 0.9467 | 0.2227 | 0.0124 |
| HSA00530_AMINOSUGARS_METABOLISM* | 0.0123 | 0.2232 | 0.8209 | 0.2205 | 0.0130 |
|  | 0.2724 | 0.6586 | 0.2115 | 0.0618 | 0.0140 |
|  | 0.3783 | 0.0589 | 0.0636 | 0.0533 | 0.0175 |
| HSA00500_STARCH_AND_SUCROSE_METABOLISM* | 0.0328 | 0.3084 | 0.7397 | 0.8215 | 0.0188 |
|  | 0.3108 | 0.1873 | 0.6021 | 0.1767 | 0.0198 |
| PORPHYRIN_AND_CHLOROPHYLL_METABOLISM | 0.0444 | 0.1849 | 0.3903 | 0.0703 | 0.0206 |
|  | 0.3816 | 0.4798 | 0.2583 | 0.1846 | 0.0230 |
| KREBS_TCA_CYCLE | 0.3707 | 0.0663 | 0.3670 | 0.0117 | 0.0246 |
| IL10PATHWAY | 0.0391 | 0.0166 | 0.1396 | 0.0650 | 0.0263 |
| BLOOD_GROUP_GLYCOLIPID_BIOSYNTHESIS_LACTOSERIES | 0.0070 | 0.3230 | 0.4054 | 0.6334 | 0.0268 |
| HSA00642_ETHYLBENZENE_DEGRADATION | 0.0263 | 0.0198 | 0.1641 | 0.4937 | 0.0306 |
|  | 0.1347 | 0.2727 | 0.2331 | 0.4541 | 0.0312 |
|  | 0.1987 | 0.2374 | 0.5055 | 0.0538 | 0.0333 |
| CYTOKINEPATHWAY | 0.0238 | 0.0439 | 0.1226 | 0.2484 | 0.0400 |
|  | 0.8890 | 0.2008 | 0.3227 | 0.0549 | 0.0402 |
| HSA00051_FRUCTOSE_AND_MANNOSE_METABOLISM* | 0.5893 | 0.7157 | 0.3759 | 0.0078 | 0.0408 |
| GLYCOSAMINOGLYCAN_DEGRADATION* | 0.0135 | 0.4210 | 0.0348 | 0.0208 | 0.0464 |
|  | 0.1634 | 0.0849 | 0.7657 | 0.5843 | 0.0488 |
|  | 0.1634 | 0.0849 | 0.7657 | 0.5843 | 0.0488 |
| PROTEASOMEPATHWAY | 0.0015 | 0.0702 | 0.6308 | 0.7039 | 0.0497 |
| HSA00052_GALACTOSE_METABOLISM* | 0.2507 | 0.4202 | 0.5990 | 0.0354 | 0.0497 |
The sets discovered using integration but not using any of the single-data-type analyses are in bold font. An asterisk denotes sets related to sugar metabolic processes. Discoveries are sets whose P-value is < 0.05. E1 and E2, single data type analysis using expression data; C1 and C2, single data type analysis using copy number data; INT, integrative method.
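The discovery rule in the note above (P-value < 0.05, and "discovered by integration only" when no single-data-type P-value passes) can be sketched in a few lines of Python. This is only an illustration: the P-values are copied from four rows of the table, the function names are hypothetical, and `unnamed_row_2` is a placeholder label for the second table row, whose pathway name does not appear in the table.

```python
ALPHA = 0.05  # discovery threshold stated in the table note

# P-values copied from the table, in column order (E1, E2, C1, C2, INT).
# "unnamed_row_2" is a placeholder: the pathway name of that row is not shown.
rows = {
    "AMINOSUGARS_METABOLISM": (0.0132, 0.4429, 0.4686, 0.0085, 0.0008),
    "unnamed_row_2": (0.1180, 0.0822, 0.8774, 0.1301, 0.0024),
    "TRANSLATION_FACTORS": (0.0006, 0.0007, 0.0700, 0.0505, 0.0092),
    "KREBS_TCA_CYCLE": (0.3707, 0.0663, 0.3670, 0.0117, 0.0246),
}


def is_discovery(p, alpha=ALPHA):
    """A set counts as a discovery when its P-value is below alpha."""
    return p < alpha


def exclusive_to_integration(pvals, alpha=ALPHA):
    """True if INT is a discovery and none of E1, E2, C1, C2 is."""
    *single, integrative = pvals
    return is_discovery(integrative, alpha) and not any(
        is_discovery(p, alpha) for p in single
    )


exclusive = [name for name, p in rows.items() if exclusive_to_integration(p)]
print(exclusive)
```

Of the four rows used here, only the unnamed row has all single-data-type P-values at or above 0.05, so it is the only one flagged.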
Figure 1. Heatmaps of the two-sample t-statistic from each data type between long- and short-term survival phenotypes. Color-key values above 1.9 or below -1.9 approximately indicate a statistically significant difference. A positive t-statistic means higher average measurements for short-term survivors.
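As a rough illustration of the statistic behind the heatmaps, the sketch below computes a two-sample t-statistic for one gene with the sign convention of the caption (positive when short-term survivors have the higher mean). The caption does not say which variant of the t-statistic was used, so the unequal-variance (Welch) form is an assumption, the function names are hypothetical, and the measurements are made up.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(short_term, long_term):
    """Welch two-sample t-statistic, short-term mean minus long-term mean.

    Assumption: the unequal-variance form; the caption does not specify it.
    """
    n1, n2 = len(short_term), len(long_term)
    se = sqrt(variance(short_term) / n1 + variance(long_term) / n2)
    return (mean(short_term) - mean(long_term)) / se


def beyond_color_key(t, cutoff=1.9):
    """True when |t| exceeds the approximate significance cutoff of 1.9."""
    return abs(t) > cutoff


# Made-up measurements for a single gene in each survival group.
short_term = [2.0, 3.0, 4.0, 5.0]
long_term = [1.0, 2.0, 3.0, 4.0]

t = two_sample_t(short_term, long_term)
print(round(t, 4), beyond_color_key(t))
```

Swapping the two groups flips the sign of the statistic, which is why the caption has to state the direction explicitly.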
P-values for top 30 gene sets discovered by the integrative method using the self-contained gene sets test
| Pathway | E1 | E2 | C1 | C2 | INT | INT for validation set |
|---|---|---|---|---|---|---|
| HSA04810_REGULATION_OF_ACTIN_CYTOSKELETON | 0.0697 | 0.2250 | < 10⁻⁴ | < 10⁻⁴ | < 10⁻⁴ | 0.0001 |
| HSA04010_MAPK_SIGNALING_PATHWAY | 0.1302 | 0.0746 | < 10⁻⁴ | < 10⁻⁴ | < 10⁻⁴ | < 10⁻⁴ |
| HSA04060_CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION | 0.1422 | 0.0132 | < 10⁻⁴ | < 10⁻⁴ | < 10⁻⁴ | 0.0147 |
| HSA04310_WNT_SIGNALING_PATHWAY | 0.2389 | 0.0039 | < 10⁻⁴ | < 10⁻⁴ | < 10⁻⁴ | 0.0001 |
| HSA04080_NEUROACTIVE_LIGAND_RECEPTOR_INTERACTION | 0.1164 | 0.8695 | 0.0001 | 0.0005 | < 10⁻⁴ | 0.0042 |
| HSA00230_PURINE_METABOLISM | 0.0034 | 0.0503 | 0.0000 | 0.0003 | < 10⁻⁴ | 0.0022 |
| TRANSLATION_FACTORS | 0.0208 | 0.0155 | 0.0022 | 0.0052 | 0.0001 | 0.2711 |
| HSA01030_GLYCAN_STRUCTURES_BIOSYNTHESIS_1 | 0.0781 | 0.0292 | 0.0004 | 0.0003 | 0.0001 | 0.0454 |
| HSA04360_AXON_GUIDANCE | 0.4293 | 0.1803 | < 10⁻⁴ | 0.0106 | 0.0001 | 0.0001 |
| GLUCONEOGENESIS* | 0.0435 | 0.1010 | 0.0613 | 0.0106 | 0.0001 | 0.0409 |
| GLYCOLYSIS* | 0.0435 | 0.1010 | 0.0613 | 0.1065 | 0.0001 | 0.0409 |
| HSA00500_STARCH_AND_SUCROSE_METABOLISM* | 0.0448 | 0.2669 | 0.0140 | 0.0043 | 0.0001 | 0.0364 |
| HSA04630_JAK_STAT_SIGNALING_PATHWAY | 0.2975 | 0.0090 | 0.0022 | 0.0041 | 0.0002 | 0.0141 |
| HSA04510_FOCAL_ADHESION | 0.0423 | 0.0926 | 0.0042 | 0.0374 | 0.0002 | 10⁻⁴ |
| MRNA_PROCESSING_REACTOME | 0.1660 | 0.0062 | 0.0537 | 0.0004 | 0.0003 | < 10⁻⁴ |
| HSA05215_PROSTATE_CANCER | 0.0534 | 0.3618 | 0.0013 | 0.0534 | 0.0004 | 0.0008 |
| HSA01031_GLYCAN_STRUCTURES_BIOSYNTHESIS_2 | 0.0084 | 0.0076 | 0.0312 | 0.0141 | 0.0004 | 0.0926 |
| HSA00240_PYRIMIDINE_METABOLISM | 0.0013 | 0.0307 | 0.0022 | 0.0065 | 0.0004 | 0.0063 |
| HSA04210_APOPTOSIS | 0.1207 | 0.2064 | 0.0003 | 0.0001 | 0.0004 | 0.0021 |
| HSA04620_TOLL_LIKE_RECEPTOR_SIGNALING_PATHWAY | 0.1986 | 0.1228 | 0.0001 | 0.0109 | 0.0004 | 0.0006 |
| GLYCOLYSIS_AND_GLUCONEOGENESIS* | 0.0209 | 0.1407 | 0.0476 | 0.0112 | 0.0005 | 0.1808 |
| HSA04660_T_CELL_RECEPTOR_SIGNALING_PATHWAY | 0.0192 | 0.1176 | 0.0141 | 0.0027 | 0.0006 | 0.0661 |
| HSA00051_FRUCTOSE_AND_MANNOSE_METABOLISM* | 0.1565 | 0.2238 | 0.0314 | 0.0112 | 0.0006 | 0.2631 |
| HSA00010_GLYCOLYSIS_AND_GLUCONEOGENESIS* | 0.0515 | 0.1333 | 0.1276 | 0.0148 | 0.0007 | 0.0647 |
| INTEGRIN_MEDIATED_CELL_ADHESION_KEGG | 0.2779 | 0.1734 | 0.0017 | 0.0038 | 0.0007 | 0.0001 |
| HSA04640_HEMATOPOIETIC_CELL_LINEAGE | 0.0634 | 0.0676 | 0.0140 | 0.0009 | 0.0007 | 0.0026 |
| GPCRDB_CLASS_A_RHODOPSIN_LIKE | 0.3429 | 0.4710 | 0.0003 | 0.0483 | 0.0008 | 0.0449 |
| HSA00350_TYROSINE_METABOLISM | 0.0219 | 0.0421 | 0.0772 | 0.0085 | 0.0008 | 0.0648 |
| HSA05221_ACUTE_MYELOID_LEUKEMIA | 0.0697 | 0.3196 | 0.0213 | 0.0105 | 0.0009 | 0.0310 |
| CALCINEURIN_NF_AT_SIGNALING | 0.1302 | 0.1298 | 0.0205 | 0.0005 | 0.0010 | 0.1128 |
Asterisks indicate sets related to sugar metabolic processes. E1 and E2, single data type analysis using expression data; C1 and C2, single data type analysis using copy number data; INT, integrative method.
Figure 2. Simulation results. (a) Average number of discovered spiked-in sets among the top ten sets that are inferred to be enriched for genes discriminating between two phenotypes, against the expected fraction of the altered genes; β = 0.5. (b) Average ROC curves; β = 0.5, γ = 0.1. INT, integrative method.
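The quantity plotted in panel (a) can be illustrated with a short sketch: rank the gene sets by P-value, keep the top ten, count how many of them are spiked-in, and average that count over simulation replicates. The set names, P-values and the function name below are made up for illustration; the simulation design itself is not reproduced here.

```python
from statistics import mean


def count_spiked_in_top_k(pvalues, spiked_in, k=10):
    """Count spiked-in sets among the k sets with the smallest P-values."""
    ranked = sorted(pvalues, key=pvalues.get)  # set names, smallest P first
    return sum(1 for name in ranked[:k] if name in spiked_in)


# One made-up replicate: 20 sets with P-values 0.01, 0.02, ..., 0.20.
pvalues = {f"set{i}": (i + 1) / 100 for i in range(20)}
spiked_in = {"set0", "set3", "set15"}  # hypothetical spiked-in sets

hits = count_spiked_in_top_k(pvalues, spiked_in)

# Averaging over replicates; the other two counts are made-up placeholders.
average_hits = mean([hits, 3, 1])
print(hits, average_hits)
```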
Areas under the ROC curves for various simulation scenarios
| β | γ | E1 | C1 | INT | AvgP | MinP |
|---|---|---|---|---|---|---|
| Synthetic gene sets | | | | | | |
| 0.166 | 0.1 | 0.59 (0.1) | 0.58 (0.09) | 0.84 (0.07) | 0.65 (0.09) | 0.64 (0.09) |
| 0.166 | 0.5 | 0.83 (0.1) | 0.79 (0.1) | 0.98 (0.03) | 0.91 (0.06) | 0.89 (0.06) |
| 0.166 | 1 | 0.94 (0.07) | 0.92 (0.07) | 0.97 (0.03) | 0.98 (0.03) | 0.97 (0.03) |
| 0.5 | 0.1 | 0.69 (0.09) | 0.7 (0.09) | 0.96 (0.03) | 0.80 (0.07) | 0.77 (0.07) |
| 0.5 | 0.5 | 0.94 (0.05) | 0.94 (0.05) | 1 (0.01) | 0.99 (0.02) | 0.98 (0.02) |
| 0.5 | 1 | 0.99 (0.02) | 0.99 (0.01) | 1 (0.01) | 1 (0.01) | 1 (0.01) |
| 0.84 | 0.1 | 0.73 (0.08) | 0.74 (0.08) | 0.98 (0.02) | 0.85 (0.06) | 0.81 (0.07) |
| 0.84 | 0.5 | 0.97 (0.03) | 0.97 (0.03) | 1 (0) | 1 (0.01) | 0.99 (0.01) |
| 0.84 | 1 | 1 (0) | 1 (0) | 1 (0) | 1 (0) | 1 (0) |
| Chromosome bands | | | | | | |
| 0.166 | 0.1 | 0.65 (0.09) | 0.55 (0.1) | 0.78 (0.07) | 0.63 (0.09) | 0.64 (0.09) |
| 0.166 | 0.5 | 0.9 (0.08) | 0.67 (0.09) | 0.98 (0.02) | 0.83 (0.06) | 0.83 (0.06) |
| 0.166 | 1 | 0.98 (0.05) | 0.72 (0.11) | 0.95 (0.03) | 0.93 (0.04) | 0.93 (0.04) |
| 0.5 | 0.1 | 0.72 (0.08) | 0.60 (0.09) | 0.90 (0.04) | 0.70 (0.08) | 0.70 (0.08) |
| 0.5 | 0.5 | 0.99 (0.02) | 0.88 (0.04) | 1 (0) | 0.95 (0.02) | 0.93 (0.02) |
| 0.5 | 1 | 1 (0.01) | 0.94 (0.06) | 1 (0) | 0.99 (0.01) | 0.99 (0.01) |
| 0.84 | 0.1 | 0.76 (0.08) | 0.62 (0.09) | 0.93 (0.03) | 0.74 (0.07) | 0.74 (0.07) |
| 0.84 | 0.5 | 1 (0.01) | 0.93 (0.03) | 1 (0) | 0.98 (0.01) | 0.96 (0.02) |
| 0.84 | 1 | 1 (0) | 0.99 (0.01) | 1 (0) | 1 (0) | 0.99 (0) |
| Canonical pathways | | | | | | |
| 0.166 | 0.1 | 0.56 (0.08) | 0.55 (0.08) | 0.71 (0.07) | 0.58 (0.08) | 0.57 (0.07) |
| 0.166 | 0.5 | 0.70 (0.09) | 0.70 (0.08) | 0.87 (0.04) | 0.78 (0.06) | 0.76 (0.06) |
| 0.166 | 1 | 0.82 (0.08) | 0.82 (0.07) | 0.87 (0.05) | 0.88 (0.04) | 0.86 (0.05) |
| 0.5 | 0.1 | 0.60 (0.08) | 0.61 (0.08) | 0.80 (0.06) | 0.66 (0.07) | 0.64 (0.07) |
| 0.5 | 0.5 | 0.81 (0.06) | 0.82 (0.06) | 0.92 (0.02) | 0.89 (0.04) | 0.87 (0.04) |
| 0.5 | 1 | 0.91 (0.04) | 0.92 (0.03) | 0.93 (0.04) | 0.94 (0.02) | 0.92 (0.02) |
| 0.84 | 0.1 | 0.62 (0.08) | 0.63 (0.08) | 0.82 (0.05) | 0.70 (0.07) | 0.68 (0.06) |
| 0.84 | 0.5 | 0.84 (0.05) | 0.85 (0.04) | 0.93 (0.02) | 0.91 (0.03) | 0.90 (0.03) |
| 0.84 | 1 | 0.93 (0.02) | 0.94 (0.02) | 0.94 (0.02) | 0.95 (0.02) | 0.93 (0.02) |
Mean (and standard deviation) of the area under the ROC curve for selected values of the signal strength (β) and the expected fraction of altered genes (γ). E1, single data type analysis using expression data; C1, single data type analysis using copy number data; INT, analysis of four data types using the integrative method; AvgP and MinP, analysis of four data types using a meta-analytical approach.
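To make the comparison in the table more concrete, the sketch below computes an area under the ROC curve for two simple ways of combining per-data-type P-values. Two assumptions are made here that this record does not confirm: AvgP is taken to be the plain average of the four P-values and MinP their minimum (with no multiplicity adjustment), and the AUC is computed in its pairwise form, as the probability that a spiked-in set scores above a null set. All P-values and function names are made up.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def avg_p(pvals):
    """Assumed AvgP: arithmetic mean of the per-data-type P-values."""
    return sum(pvals) / len(pvals)


def min_p(pvals):
    """Assumed MinP: smallest per-data-type P-value, unadjusted."""
    return min(pvals)


# Made-up P-values for four data types per gene set.
spiked = [(0.01, 0.20, 0.03, 0.40), (0.30, 0.02, 0.50, 0.04)]
null = [(0.60, 0.70, 0.20, 0.90), (0.45, 0.015, 0.80, 0.55)]

# Smaller combined P-value means stronger evidence, so score = -P.
auc_avg = auc([-avg_p(p) for p in spiked], [-avg_p(p) for p in null])
auc_min = auc([-min_p(p) for p in spiked], [-min_p(p) for p in null])
print(auc_avg, auc_min)
```

In this toy example one null set has a single small P-value by chance, which is enough to cost the minimum-based score one of its four pairwise comparisons while the average-based score is unaffected.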
Figure 3. True positive exclusively discovered sets. (a,b) The average fraction of true positive exclusively discovered sets for (a) each traditional one-dimensional analysis (EF in the Materials and methods section) and (b) the integrative and meta-analytical methods (EF* in the Materials and methods section). β = 0.5, γ = 0.1. INT, integrative method.