Bisakha Ray1, Mikael Henaff2, Sisi Ma1, Efstratios Efstathiadis1, Eric R Peskin1, Marco Picone3, Tito Poli4, Constantin F Aliferis5, Alexander Statnikov6.
Abstract
The spectrum of modern molecular high-throughput assaying includes diverse technologies such as microarray gene expression, miRNA expression, proteomics, and DNA methylation, among many others. Now that these technologies have matured and become increasingly accessible, the next frontier is to collect "multi-modal" data for the same set of subjects and conduct integrative, multi-level analyses. While multi-modal data does contain distinct biological information that can be useful for answering complex biology questions, its value for predicting clinical phenotypes and the contribution of each type of input remain unknown. We obtained 47 datasets/predictive tasks that in total span over 9 data modalities and executed analytic experiments for predicting various clinical phenotypes and outcomes. First, we analyzed each modality separately using uni-modal approaches based on several state-of-the-art supervised classification and feature selection methods. Then, we applied integrative multi-modal classification techniques. We found that gene expression is the most predictively informative modality. Other modalities such as protein expression, miRNA expression, and DNA methylation also provide highly predictive results, which are often statistically comparable but not superior to gene expression data. Integrative multi-modal analyses generally do not increase predictive signal compared to gene expression data.
Year: 2014 PMID: 24651673 PMCID: PMC3961740 DOI: 10.1038/srep04411
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Comparison of predictivity of various modalities for each dataset/task. Predictivity is measured by the area under the ROC curve (AUC). The AUC values listed in the table were optimized over the uni-modal methods given in Table S3. The highlighting of each cell corresponds to relative predictivity for each dataset/task: the more predictive a modality is for a given dataset/task, the darker its highlighting.
| TCGA_BRCA1.R1 | TCGA_BRCA1.R2 | TCGA_BRCA1.R3 | TCGA_BRCA1.R4 | TCGA_BRCA1.R5 | TCGA_BRCA1.R6 | TCGA_BRCA1.R7 | TCGA_BRCA1.R8 | TCGA_BRCA2.R1 | TCGA_BRCA2.R2 | TCGA_BRCA2.R3 | TCGA_BRCA2.R4 | TCGA_BRCA2.R5 | TCGA_BRCA2.R6 | TCGA_BRCA2.R7 | TCGA_BRCA2.R8 | TCGA_OVCA.R1 | TCGA_OVCA.R2 | TCGA_OVCA.R3 | TCGA_OVCA.R4 | TCGA_OVCA.R5 | TCGA_OVCA.R6 | TCGA_OVCA.R7 | TCGA_OVCA.R8 | TCGA_OVCA.R9 | TCGA_OVCA.R10 | TCGA_OVCA.R11 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.590 | 0.656 | 0.646 | 0.646 | 0.633 | 0.630 | 0.725 | 0.594 | 0.578 | 0.587 | 0.697 | 0.689 | 0.604 | 0.806 | 0.759 | 0.721 | 0.667 | 0.690 | 0.657 | 0.543 | 0.650 | 0.577 | 0.582 | 0.549 | 0.614 | 0.639 | 0.652 | |
| 0.661 | 0.637 | 0.612 | 0.979 | 0.969 | 0.836 | 0.606 | 0.764 | 0.661 | 0.672 | 0.721 | 0.935 | 0.910 | 0.860 | 0.860 | 0.750 | 0.744 | 0.768 | 0.811 | 0.734 | 0.670 | 0.618 | 0.608 | 0.566 | 0.609 | 0.714 | 0.763 | |
| 0.650 | 0.743 | 0.733 | 0.983 | 0.953 | 0.812 | 0.713 | 0.733 | 0.584 | 0.618 | 0.705 | 0.936 | 0.881 | 0.845 | 0.644 | 0.716 | 0.695 | 0.687 | 0.688 | 0.704 | 0.665 | 0.592 | 0.610 | 0.581 | 0.602 | 0.738 | 0.757 | |
| 0.589 | 0.618 | 0.668 | 0.845 | 0.801 | 0.748 | 0.636 | 0.691 | 0.651 | 0.659 | 0.627 | 0.709 | 0.716 | 0.796 | 0.744 | 0.759 | - | - | - | - | - | - | - | - | - | - | - | |
| 0.695 | 0.637 | 0.619 | 0.983 | 0.935 | 0.753 | 0.681 | 0.697 | 0.680 | 0.688 | 0.723 | 0.921 | 0.821 | 0.856 | 0.883 | 0.713 | 0.728 | 0.772 | 0.745 | 0.707 | 0.670 | 0.642 | 0.575 | 0.627 | 0.590 | 0.675 | 0.682 | |
| - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 0.761 | 0.723 | 0.741 | 0.803 | 0.698 | 0.611 | 0.584 | 0.593 | 0.601 | 0.658 | 0.742 | |
| - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | |
| - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | |
| - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
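AUC, the predictivity measure used in the table above, can be read as the probability that a randomly chosen class-1 subject receives a higher score than a randomly chosen class-0 subject. A minimal sketch of that pairwise definition follows; the function name `auc` and the toy labels and scores are illustrative assumptions, not values from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition.

    labels: 0/1 class labels; scores: classifier outputs, higher means class 1.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one subject from each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # class-1 subject ranked above class-0 subject
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Toy inputs (illustrative only, not data from the study).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for reading the values in the table.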
Comparison of various modalities with gene expression in terms of mean AUC (computed over datasets/tasks where data from both modalities was available)
| Modality | Number of datasets/tasks where this modality was measured | Mean AUC: this modality | Mean AUC: gene expression modality | p-value | p-value adjusted for multiple comparisons |
|---|---|---|---|---|---|
| | 47 | 0.686 | <0.788 | <10⁻⁵ | |
| | 27 | 0.725 | <0.742 | 0.1910 | 0.2183 |
| | 16 | 0.704 | <0.777 | 0.0023 | |
| | 27 | 0.730 | <0.742 | 0.1290 | 0.1720 |
| | 14 | 0.730 | <0.742 | 0.2585 | 0.2585 |
| | 3 | 0.783 | <0.929 | <10⁻⁵ | |
| | 1 | 0.947 | <0.979 | 0.0502 | 0.0803 |
| | 16 | 0.708 | <0.826 | <10⁻⁵ | |
*Since there is only one dataset for tumor imaging, the significance of the difference between AUCs was not assessed by permutation testing, but instead by the method of DeLong (ref. 43).
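The record does not describe the permutation scheme used for the mean AUC differences. One common choice for paired per-dataset AUCs is a sign-flip permutation test, sketched below under that assumption. The function name and the toy AUC lists are hypothetical and are not taken from the tables.

```python
from itertools import product


def paired_sign_flip_pvalue(auc_a, auc_b):
    """Two-sided exact sign-flip permutation test on the mean paired difference.

    Assumes auc_a[i] and auc_b[i] come from the same dataset/task.
    Enumerates all 2**n sign patterns, so it is only practical for small n.
    """
    diffs = [a - b for a, b in zip(auc_a, auc_b)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    at_least_as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, diffs))) / n
        if stat >= observed - 1e-12:   # small tolerance for float round-off
            at_least_as_extreme += 1
        total += 1
    return at_least_as_extreme / total


# Toy per-dataset AUCs (illustrative only, not values from the study).
toy_modality = [0.70, 0.72, 0.65, 0.80]
toy_gene_expr = [0.75, 0.78, 0.70, 0.86]
toy_p = paired_sign_flip_pvalue(toy_modality, toy_gene_expr)
```

With many datasets, exhaustive enumeration becomes infeasible and a random sample of sign patterns would typically be used instead.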
Comparison of various modalities with gene expression in terms of proportion of datasets/tasks where two modalities achieve ‘statistically optimal’ AUC
| Modality | Number of datasets/tasks where this modality was measured | Number of datasets/tasks where this modality had ‘statistically optimal’ performance | Number of datasets/tasks where gene expression had ‘statistically optimal’ performance | p-value | p-value adjusted for multiple comparisons |
|---|---|---|---|---|---|
| | 47 | 15 | <43 | <10⁻⁵ | |
| | 27 | 22 | <25 | 0.2242 | 0.3587 |
| | 16 | 9 | <14 | 0.0493 | 0.0986 |
| | 27 | 23 | <25 | 0.3865 | 0.5153 |
| | 14 | 14 | = 14 | 1 | 1 |
| | 3 | 0 | <3 | 0.0143 | |
| | 1 | 1 | = 1 | 1 | 1 |
| | 16 | 1 | <14 | <10⁻⁵ | |
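The record does not name the multiple-comparison procedure. The adjusted values that survive in the table above are numerically consistent with a Benjamini-Hochberg adjustment over the eight rows, which the sketch below reproduces. Treating the two entries reported as below 10⁻⁵ as exactly 1e-5 is a placeholder assumption, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Raw p-values in table row order; 1e-5 stands in for "<10^-5" (assumption).
raw_p = [1e-5, 0.2242, 0.0493, 0.3865, 1.0, 0.0143, 1.0, 1e-5]
adj_p = benjamini_hochberg(raw_p)
```

Rounded to four decimals, the adjusted values for the rows with raw p-values 0.2242, 0.0493 and 0.3865 come out as 0.3587, 0.0986 and 0.5153, matching the table.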
Comparison of various modalities with gene expression in terms of proportion of datasets/tasks where one modality performs at least as well as the other
| Modality | Number of datasets/tasks where this modality was measured | Number of datasets/tasks where this modality had performance ≥ gene expression | Number of datasets/tasks where gene expression had performance ≥ this modality | p-value | p-value adjusted for multiple comparisons |
|---|---|---|---|---|---|
| | 47 | 21 | <45 | <10⁻⁵ | |
| | 27 | 24 | <25 | 0.6387 | 0.8516 |
| | 16 | 11 | <16 | 0.0149 | |
| | 27 | 25 | <27 | 0.1495 | 0.2393 |
| | 14 | 14 | = 14 | 1 | 1 |
| | 3 | 0 | <3 | 0.0143 | |
| | 1 | 1 | = 1 | 1 | 1 |
| | 16 | 1 | <16 | <10⁻⁵ | |
Figure 1. Comparison of predictivity of gene expression microarrays (GE) with other modalities.
The results are based on 151 comparisons of gene expression with various modalities for various datasets/tasks. Predictivity is measured by the area under ROC curve (AUC). The results in (a) are obtained using statistical comparison of AUC differences in individual datasets/tasks, while the results in (b) are obtained using nominal comparison of AUC difference in individual datasets/tasks.
Figure 2. Comparison of predictivity of various analytic approaches.
Predictivity is measured by the area under ROC curve (AUC) and averaged over all 47 datasets/tasks.
Figure 3. Comparison of predictivity of uni-modal gene expression-based approach (GE) with multi-modal approaches.
The results are based on 141 comparisons of uni-modal gene expression-based approach with 3 multi-modal approaches for 47 datasets/tasks. Predictivity is measured by the area under ROC curve (AUC). The results in (a) are obtained using statistical comparison of AUC differences in individual datasets/tasks, while the results in (b) are obtained using nominal comparison of AUC difference in individual datasets/tasks.
Characteristics of datasets/tasks used in this study. “N” is number of subjects with complete coverage of data from all available modalities in a given dataset. “N(0)” and “N(1)” denote number of subjects for classes “0” and “1”, respectively. The encoding of classes is given in the second column
| Dataset short name | Phenotypic response variable definition and encoding | N(0) | N(1) | Gene Expression | miRNA Expression | Protein expression | Clinical | Tumor Imaging | GWAS | DNA Methylation | Somatic Mutations | Copy Number |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TCGA_BRCA1.R1 | Neoplasm disease stages I*, II* (0) vs. III*, IV* (1) | 124 | 111 | X | X | X | X | X | ||||
| TCGA_BRCA1.R2 | Lymph node stage N0* (0) vs. N1*, N2*, N3* (1) | 183 | 50 | X | X | X | X | X | ||||
| TCGA_BRCA1.R3 | Tumor stages T1*, T2* (0) vs. T3*, T4* (1) | 203 | 37 | X | X | X | X | X | ||||
| TCGA_BRCA1.R4 | Estrogen receptor negative (0) vs. positive (1) | 57 | 176 | X | X | X | X | X | ||||
| TCGA_BRCA1.R5 | Progesterone receptor negative (0) vs. positive (1) | 89 | 144 | X | X | X | X | X | ||||
| TCGA_BRCA1.R6 | survived 2 years (0) or not (1) | 96 | 7 | X | X | X | X | X | ||||
| TCGA_BRCA1.R7 | survived 3 years (0) or not (1) | 62 | 13 | X | X | X | X | X | ||||
| TCGA_BRCA1.R8 | survived 4 years (0) or not (1) | 45 | 13 | X | X | X | X | X | ||||
| TCGA_BRCA2.R1 | Neoplasm disease stages I*, II* (0) vs. III*, IV* (1) | 57 | 101 | X | X | X | X | X | ||||
| TCGA_BRCA2.R2 | Lymph node stage N0* (0) vs. N1*, N2*, N3* (1) | 107 | 52 | X | X | X | X | X | ||||
| TCGA_BRCA2.R3 | Tumor stages T1*, T2* (0) vs. T3*, T4* (1) | 129 | 30 | X | X | X | X | X | ||||
| TCGA_BRCA2.R4 | Estrogen receptor negative (0) vs. positive (1) | 39 | 121 | X | X | X | X | X | ||||
| TCGA_BRCA2.R5 | Progesterone receptor negative (0) vs. positive (1) | 60 | 100 | X | X | X | X | X | ||||
| TCGA_BRCA2.R6 | survived 2 years (0) or not (1) | 94 | 6 | X | X | X | X | X | ||||
| TCGA_BRCA2.R7 | survived 3 years (0) or not (1) | 71 | 9 | X | X | X | X | X | ||||
| TCGA_BRCA2.R8 | survived 4 years (0) or not (1) | 38 | 14 | X | X | X | X | X | ||||
| TCGA_OVCA.R1 | Lymphatic invasion present (1) vs. absent (0) | 47 | 87 | X | X | X | X | X | ||||
| TCGA_OVCA.R2 | Neoplasm histologic grade G1,G2 (0) vs. G3,G4 (1) | 52 | 325 | X | X | X | X | X | ||||
| TCGA_OVCA.R3 | Tumor stages T1*, T2* (0) vs. T3*, T4* (1) | 30 | 350 | X | X | X | X | X | ||||
| TCGA_OVCA.R4 | Venous invasion present (1) vs. absent (0) | 40 | 53 | X | X | X | X | X | ||||
| TCGA_OVCA.R5 | survived 1 year (1) or not (0) | 271 | 32 | X | X | X | X | X | ||||
| TCGA_OVCA.R6 | survived 2 years (1) or not (0) | 206 | 68 | X | X | X | X | X | ||||
| TCGA_OVCA.R7 | survived 3 years (1) or not (0) | 153 | 98 | X | X | X | X | X | ||||
| TCGA_OVCA.R8 | survived 4 years (1) or not (0) | 85 | 148 | X | X | X | X | X | ||||
| TCGA_OVCA.R9 | survived 5 years (1) or not (0) | 55 | 168 | X | X | X | X | X | ||||
| TCGA_OVCA.R10 | survived 6 years (1) or not (0) | 30 | 182 | X | X | X | X | X | ||||
| TCGA_OVCA.R11 | survived 7 years (1) or not (0) | 19 | 189 | X | X | X | X | X | ||||
| MSKCC_PRCA.R1 | Lymph node stage N0 (0) vs. N1 (1) | 62 | 12 | X | X | X | X | |||||
| MSKCC_PRCA.R2 | Primary (0) vs. metastatic (1) | 79 | 13 | X | X | X | X | |||||
| MSKCC_PRCA.R3 | Tumor stages T1 (0) vs. T2, T3, T4 (1) | 53 | 35 | X | X | X | X | |||||
| NEOMARK.R1 | Recurrence (1) vs. no recurrence (0) of oral squam. cell cancer | 71 | 6 | X | X | X | ||||||
| METABRIC.R1 | ER_Expr positive (1) vs. negative (0) | 463 | 1487 | X | X | X | ||||||
| METABRIC.R2 | HER2_Expr positive (1) vs. negative (0) | 1710 | 240 | X | X | X | ||||||
| METABRIC.R3 | PR_Expr positive (1) vs. negative (0) | 920 | 1030 | X | X | X | ||||||
| METABRIC.R4 | Grade 1 (0) vs. 2,3 (1) | 167 | 1783 | X | X | X | ||||||
| METABRIC.R5 | Grade 1,2 (0) vs. 3 (1) | 1018 | 932 | X | X | X | ||||||
| METABRIC.R6 | Stage 0 (0) vs. Stages 1,2,3,4 (1) | 509 | 1441 | X | X | X | ||||||
| METABRIC.R7 | Stages 0,1 (0) vs. Stages 2,3,4 (1) | 1005 | 945 | X | X | X | ||||||
| METABRIC.R8 | Stages 0,1,2 (0) vs. Stages 3,4 (1) | 1825 | 125 | X | X | X | ||||||
| METABRIC.R9 | Stages 0,1,2,3 (0) vs. Stage 4 (1) | 1940 | 10 | X | X | X | ||||||
| METABRIC.R10 | survived 1 year (1) or not (0) | 27 | 1878 | X | X | X | ||||||
| METABRIC.R11 | survived 2 years (1) or not (0) | 102 | 1767 | X | X | X | ||||||
| METABRIC.R12 | survived 3 years (1) or not (0) | 184 | 1634 | X | X | X | ||||||
| METABRIC.R13 | survived 4 years (1) or not (0) | 279 | 1482 | X | X | X | ||||||
| METABRIC.R14 | survived 5 years (1) or not (0) | 340 | 1328 | X | X | X | ||||||
| METABRIC.R15 | survived 6 years (1) or not (0) | 387 | 1138 | X | X | X | ||||||
| METABRIC.R16 | survived 7 years (1) or not (0) | 424 | 1013 | X | X | X | ||||||
Figure 4. Multi-modal uniform (MMU) predictive analytics approaches.
(a) MMU w/o feature selection, (b) MMU with feature selection performed on all modalities at once, (c) MMU with feature selection performed independently on individual modalities.
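The three variants can be sketched as follows, assuming that "uniform" means features from all modalities are pooled into one matrix before classification. The class-mean-gap ranking in `top_k_features` is a hypothetical stand-in for the feature selection methods used in the study, and all function names and toy values are illustrative.

```python
def concat(modalities):
    """Join per-modality feature rows subject by subject."""
    return [sum(rows, []) for rows in zip(*modalities)]


def top_k_features(X, y, k):
    """Hypothetical selector: keep the k columns with the largest class-mean gap."""
    def gap(j):
        ones = [row[j] for row, label in zip(X, y) if label == 1]
        zeros = [row[j] for row, label in zip(X, y) if label == 0]
        return abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))
    ranked = sorted(range(len(X[0])), key=gap, reverse=True)
    return sorted(ranked[:k])


def select(X, cols):
    """Keep only the given column indices."""
    return [[row[j] for j in cols] for row in X]


def mmu_no_selection(modalities):
    # (a) pool all features, no feature selection
    return concat(modalities)


def mmu_joint_selection(modalities, y, k):
    # (b) pool all features, then select on the pooled matrix
    pooled = concat(modalities)
    return select(pooled, top_k_features(pooled, y, k))


def mmu_per_modality_selection(modalities, y, k):
    # (c) select within each modality, then pool the survivors
    reduced = [select(X, top_k_features(X, y, k)) for X in modalities]
    return concat(reduced)


# Toy data: 4 subjects, two modalities (illustrative only).
toy_ge = [[1.0, 5.0], [1.1, 5.2], [3.0, 5.1], [3.2, 5.0]]
toy_mirna = [[0.2], [0.1], [0.9], [0.8]]
toy_y = [0, 0, 1, 1]

pooled_a = mmu_no_selection([toy_ge, toy_mirna])
pooled_b = mmu_joint_selection([toy_ge, toy_mirna], toy_y, k=2)
pooled_c = mmu_per_modality_selection([toy_ge, toy_mirna], toy_y, k=1)
```

Each function returns a single feature matrix that would then be passed to one classifier. Variant (c) guarantees that every modality contributes features, whereas (b) lets one modality crowd out the others.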
Figure 5. Multi-modal ensemble (MME) predictive analytics approaches.