Sisi Ma, Jiwen Ren, David Fenyö.
Abstract
Breast cancer affects one in eight women in America and is a leading cause of death from cancer worldwide. In the current study, four types of Omics data (copy number variation, gene expression, proteome, and phosphoproteome) were collected from seventy-seven breast cancer patients. Each type of Omics data was used separately to construct models predicting ten-year survival, an important clinical hallmark. The predictive models constructed with proteome data achieved decent predictivity (mean AUC = 0.725) and outperformed the models constructed with the other types of Omics data. This indicates that high-quality, large-scale protein data is more effective for survival prediction than the other types of Omics data. Further, we experimented with ten different data fusion techniques (generic and multi-kernel learning based) to test whether combining multi-Omics data can improve predictive performance. None of the data fusion techniques tested in the current study outperformed the predictive models built with the proteome data.
Year: 2016 PMID: 27570650 PMCID: PMC5001766
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
Classification models and feature selection methods used for the 4 individual Omics data types, and data fusion strategies used to combine 4 or 2 individual Omics data types. For some data fusion strategies, not all feature selection methods were explored, to avoid repetition. For example, (i)_4 with univariate association is the same as (ii)_4 with univariate association.
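| Data Used for Modeling | Feature Set/Data Fusion Strategy Name | Classifier | Feature Selection Method: No Feature Selection | Feature Selection Method: Univariate Association | Feature Selection Method: SVM RFE |
|---|---|---|---|---|---|
| Single Omics Data | Copy Number Variation | | ✓ | ✓ | ✓ |
| | Gene Expression | | ✓ | ✓ | ✓ |
| | Proteome | | ✓ | ✓ | ✓ |
| | Phosphoproteome | | ✓ | ✓ | ✓ |
| Combination of 4 types of Omics Data | (i)_4: All Omics concatenated | | ✓ | | |
| | (ii)_4: All Omics concatenated with feature selection | | | ✓ | ✓ |
| | (iii)_4: Selected features concatenated | | | ✓ | ✓ |
| | (iv)_4: MKL | | ✓ | | |
| | (v)_4: Selected features MKL | | | ✓ | ✓ |
| Combination of 2 types of Omics Data | (i)_2: All Omics concatenated | | ✓ | | |
| | (ii)_2: All Omics concatenated with feature selection | | | ✓ | ✓ |
| | (iii)_2: Selected features concatenated | | | ✓ | ✓ |
| | (iv)_2: MKL | | ✓ | | |
| | (v)_2: Selected features MKL | | | ✓ | ✓ |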
Figure 1. Data fusion strategies employed in the present study.
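The fusion strategies named in the table can be illustrated with a minimal sketch. It reflects one reading of the strategy names, not the authors' implementation: the toy values, the helper names (`concatenate`, `select_top_k`, `combine_kernels`), the class-mean-difference score standing in for univariate association, the top-k cutoff, and the fixed uniform kernel weights are all assumptions. In particular, actual multi-kernel learning (MKL) learns the kernel weights from data, whereas this sketch simply averages one RBF kernel per Omics block.

```python
import math

def concatenate(blocks):
    """Early fusion: join each sample's feature vectors from every Omics block."""
    n = len(blocks[0])
    return [[v for block in blocks for v in block[i]] for i in range(n)]

def univariate_scores(X, y):
    """Stand-in univariate score: absolute difference of class means per feature."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    return [abs(sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg))
            for j in range(len(X[0]))]

def select_top_k(X, y, k):
    """Keep the k highest-scoring features; return the reduced data and kept indices."""
    scores = univariate_scores(X, y)
    keep = sorted(sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k])
    return [[row[j] for j in keep] for row in X], keep

def rbf_kernel(X, gamma=1.0):
    """Gram matrix with K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj))) for xj in X]
            for xi in X]

def combine_kernels(kernels, weights=None):
    """Weighted sum of per-block kernels (uniform weights assumed, not learned)."""
    weights = weights or [1.0 / len(kernels)] * len(kernels)
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]

# Made-up toy data: 4 samples, two Omics blocks, binary survival labels.
cnv = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.2]]
prot = [[5.0, 1.0, 0.3], [4.8, 0.2, 0.4], [1.0, 0.9, 0.35], [1.2, 0.1, 0.3]]
y = [1, 1, 0, 0]

# (i) concatenate everything, no feature selection
fused_all = concatenate([cnv, prot])
# (ii) concatenate first, then select features on the joined matrix
fused_ii, keep_ii = select_top_k(fused_all, y, k=2)
# (iii) select features within each block, then concatenate the survivors
fused_iii = concatenate([select_top_k(cnv, y, k=1)[0], select_top_k(prot, y, k=1)[0]])
# (iv) one kernel per block, combined into a single kernel for a kernel classifier
K_iv = combine_kernels([rbf_kernel(cnv), rbf_kernel(prot)])
# (v) as (iv), but each kernel is built on that block's selected features
K_v = combine_kernels([rbf_kernel(select_top_k(cnv, y, k=1)[0]),
                       rbf_kernel(select_top_k(prot, y, k=1)[0])])
```

In any real evaluation the feature selection step would have to be fitted on training samples only; the sketch applies it to all samples purely to keep the example short.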
Predictive performance of models constructed with a single type of Omics data vs. models constructed with multi-Omics data. Results are shown for the SVM rbf classifier and MKL. Feature selection methods were applied where applicable. Results are reported as mean AUC with the standard deviation in parentheses.
| Data Used for Modeling | Feature Set/Data Fusion Strategy Name | Classifier | Feature Selection Method: No Feature Selection | Feature Selection Method: Univariate Association | Feature Selection Method: SVM RFE |
|---|---|---|---|---|---|
| Single Omics Data | Copy Number Variation | SVM rbf | 0.551(0.187) | 0.463(0.218) | 0.546(0.244) |
| | Gene Expression | SVM rbf | 0.431(0.152) | 0.547(0.207) | 0.431(0.276) |
| | Proteome | SVM rbf | 0.557(0.263) | | 0.536(0.262) |
| | Phosphoproteome | SVM rbf | 0.516(0.257) | 0.671(0.234) | 0.582(0.219) |
| Combination of 4 types of Omics Data | (i)_4: All Omics concatenated | SVM rbf | 0.482(0.216) | | |
| | (ii)_4: All Omics concatenated with feature selection | SVM rbf | | 0.607(0.217) | 0.517(0.219) |
| | (iii)_4: Selected features concatenated | SVM rbf | | 0.607(0.217) | 0.476(0.259) |
| | (iv)_4: MKL | MKL | 0.469(0.118) | | |
| | (v)_4: Selected features MKL | MKL | | 0.561(0.170) | 0.504(0.268) |
| Combination of 2 types of Omics Data | (i)_2: All Omics concatenated | SVM rbf | 0.495(0.235) | | |
| | (ii)_2: All Omics concatenated with feature selection | SVM rbf | | 0.676(0.235) | 0.566(0.222) |
| | (iii)_2: Selected features concatenated | SVM rbf | | 0.676(0.235) | 0.557(0.253) |
| | (iv)_2: MKL | MKL | 0.494(0.109) | | |
| | (v)_2: Selected features MKL | MKL | | 0.588(0.190) | 0.530(0.233) |
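The "mean AUC (standard deviation)" entries in the table can be read with the help of a small sketch. It computes the AUC for each of several held-out splits and summarizes them in the table's format. The splits, labels, and scores below are made up, the helper names `auc` and `summarize` are hypothetical, and the use of the sample standard deviation is an assumption, since this record does not state the resampling scheme or which standard deviation was reported.

```python
from statistics import mean, stdev

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Labels are 1 (positive) or 0 (negative).
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return correct / (len(pos) * len(neg))

def summarize(aucs):
    """Format per-split AUCs as 'mean(sd)', using the sample standard deviation."""
    return f"{mean(aucs):.3f}({stdev(aucs):.3f})"

# Made-up predictions on three hypothetical held-out splits: (labels, scores).
splits = [
    ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),
    ([1, 0, 0, 1], [0.8, 0.3, 0.2, 0.6]),
    ([0, 1, 1, 0], [0.7, 0.6, 0.2, 0.1]),
]
aucs = [auc(labels, scores) for labels, scores in splits]
summary = summarize(aucs)  # same "mean(sd)" layout as the table cells
```

An AUC of 0.5 corresponds to chance-level ranking, which is a useful reference point when comparing a mean AUC against its standard deviation.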
Figure 2. The AUCs for predictive models built with one type of Omics data, using SVM rbf with univariate association for feature selection. Models built with proteome data outperform models built with the other data types.
Figure 3. Predictive performance of models built with proteome data vs. models built with data fusion methods. The figure shows the predictive performance of models built with SVM rbf, with univariate association as the feature selection method where applicable.