Jing Xu1,2, Peng Wu3,4, Yuehui Chen1,2, Qingfang Meng1,2, Hussain Dawood5, Hassan Dawood6.
Abstract
BACKGROUND: Cancer subtype classification is of great importance for accurate diagnosis and personalized treatment of cancer. Recent developments in high-throughput sequencing technologies have rapidly produced multi-omics data for the same cancer sample. Many computational methods have been proposed to classify cancer subtypes; however, most of them build the model using only gene expression data. It has been shown that integration of multi-omics data contributes to cancer subtype classification.
Keywords: Autoencoder; Cancer subtype classification; Cascade forest; Data integration; Deep learning
Year: 2019 PMID: 31660856 PMCID: PMC6819613 DOI: 10.1186/s12859-019-3116-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Statistics of datasets for three cancer types
| Cancer type | DNA methylation | miRNA expression | Gene expression | Patients |
|---|---|---|---|---|
| BRCA | 23094 | 354 | 17814 | 104 |
| GBM | 1305 | 534 | 12042 | 213 |
| OV | 24963 | 539 | 16860 | 102 |
Parameter settings of FNT
| Parameter | Value |
|---|---|
| Population size | 50 |
| Crossover probability | 0.4 |
| Mutation probability | 0.01 |
| C1 | 2 |
| C2 | 2 |
| Vmax | 2 |
Performance comparison of the proposed method with multiple and single dimensional data
| Cancer type | DNA methylation | miRNA expression | Gene expression | Integrative Data |
|---|---|---|---|---|
| BRCA | 0.731 | 0.769 | 0.808 | 0.846 |
| GBM | 0.596 | 0.539 | 0.865 | 0.885 |
| OV | 0.640 | 0.640 | 0.760 | 0.840 |
Fig. 1 Comparison of classification accuracy between different data
Performance comparison of dimensionality reduction methods on BRCA dataset
| Data | PCA | NMF | SAE |
|---|---|---|---|
| DNA methylation | 0.692 | 0.654 | 0.731 |
| miRNA expression | 0.731 | 0.692 | 0.769 |
| Gene expression | 0.769 | 0.731 | 0.808 |
| Integrative Data | 0.808 | 0.769 | 0.846 |
Performance comparison of dimensionality reduction methods on GBM dataset
| Data | PCA | NMF | SAE |
|---|---|---|---|
| DNA methylation | 0.558 | 0.577 | 0.596 |
| miRNA expression | 0.519 | 0.500 | 0.539 |
| Gene expression | 0.808 | 0.781 | 0.865 |
| Integrative Data | 0.827 | 0.808 | 0.885 |
Performance comparison of dimensionality reduction methods on OV dataset
| Data | PCA | NMF | SAE |
|---|---|---|---|
| DNA methylation | 0.600 | 0.560 | 0.640 |
| miRNA expression | 0.560 | 0.520 | 0.640 |
| Gene expression | 0.720 | 0.680 | 0.760 |
| Integrative Data | 0.760 | 0.720 | 0.840 |
Fig. 2 Performance comparison of proposed SAE framework, PCA and NMF using integrative data
Comparison of overall accuracy on BRCA datasets
| Data | KNN | SVM | RF | gcForest | mixOmics | DFNForest |
|---|---|---|---|---|---|---|
| DNA methylation | 0.615 | 0.692 | 0.615 | 0.731 | 0.692 | 0.731 |
| miRNA expression | 0.654 | 0.731 | 0.731 | 0.731 | 0.692 | 0.769 |
| Gene expression | 0.731 | 0.769 | 0.769 | 0.769 | 0.769 | 0.808 |
| Integrative Data | 0.769 | 0.769 | 0.808 | 0.808 | 0.808 | 0.846 |
Comparison of overall accuracy on GBM datasets
| Data | KNN | SVM | RF | gcForest | mixOmics | DFNForest |
|---|---|---|---|---|---|---|
| DNA methylation | 0.404 | 0.558 | 0.558 | 0.577 | 0.558 | 0.596 |
| miRNA expression | 0.539 | 0.442 | 0.462 | 0.558 | 0.539 | 0.539 |
| Gene expression | 0.635 | 0.827 | 0.827 | 0.846 | 0.827 | 0.865 |
| Integrative Data | 0.635 | 0.846 | 0.846 | 0.865 | 0.846 | 0.885 |
Comparison of overall accuracy on OV datasets
| Data | KNN | SVM | RF | gcForest | mixOmics | DFNForest |
|---|---|---|---|---|---|---|
| DNA methylation | 0.440 | 0.520 | 0.560 | 0.560 | 0.520 | 0.640 |
| miRNA expression | 0.480 | 0.520 | 0.480 | 0.640 | 0.560 | 0.640 |
| Gene expression | 0.680 | 0.680 | 0.720 | 0.720 | 0.720 | 0.760 |
| Integrative Data | 0.720 | 0.720 | 0.760 | 0.800 | 0.760 | 0.840 |
Fig. 3 Comparison of overall performance of different classifiers on BRCA, GBM and OV datasets. The average precision, recall and F-1 score were evaluated for each dataset
Fig. 4 Architecture of autoencoder. a Structure of basic autoencoder. b Structure of three-layer stacked autoencoder
Fig. 5 Illustration of the cascade forest structure. Three forests are generated by different grammars: the first forest (black) uses the function set F = {+2, +3, +4}, the second forest (green) uses {+2, +4, +5}, and the last forest (blue) uses {+3, +4, +5}
Fig. 6 Illustration of class vector generation. Each FNT generates an estimated value, and these values are then concatenated
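The class vector generation described in the Fig. 6 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the `Estimator` type, the `class_vector` function and the toy lambda "trees" are hypothetical stand-ins for trained FNTs, and the only behaviour taken from the caption is that each FNT produces one estimated value and the values are concatenated.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a trained FNT: any callable that maps a
# feature vector to a single estimated value.
Estimator = Callable[[Sequence[float]], float]


def class_vector(forests: List[List[Estimator]], x: Sequence[float]) -> List[float]:
    """Concatenate the estimated value of every tree in every forest."""
    return [tree(x) for forest in forests for tree in forest]


# Toy example (assumed, not from the source): two forests with 2 and 1 trees.
forests = [
    [lambda x: x[0] + x[1], lambda x: x[1] + x[2]],  # forest 1
    [lambda x: x[0] * x[2]],                         # forest 2
]
sample = [1.0, 2.0, 3.0]
vector = class_vector(forests, sample)
print(vector)
```

The length of the resulting vector equals the total number of trees across all forests, so adding a tree or a forest widens the class vector by one entry per tree.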
Fig. 7 The hierarchical integration deep flexible neural forest framework