Leihong Wu, Xiangwen Liu, Joshua Xu.
Abstract
BACKGROUND: Researchers today are generating unprecedented amounts of biological data. One trend in current biological research is integrated analysis of multi-platform data. Effectively integrating multi-platform data into the solution of a single-task or multi-task classification problem, however, is both critical and challenging. In this study, we proposed HetEnc, a novel deep learning-based approach for information domain separation.
Year: 2019 PMID: 31395005 PMCID: PMC6686264 DOI: 10.1186/s12864-019-5997-2
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Summary of Neuroblastoma Endpoints
| Endpoint | FAV | OS_All | OS_HR |
|---|---|---|---|
| Full description | Neuroblastoma Favorable Prognosis | Overall Survival | Survival in High Risk patients |
| Sample size (Train/Test) | 136/136 | 249/249 | 86/90 |
| Train set prevalence | 45/91 (0.669) | 51/198 (0.795) | 43/43 (0.500) |
| Test set prevalence | 46/90 (0.662) | 54/195 (0.783) | 49/41 (0.544) |
| Predicting difficulty (Zhang, et al., 2015) | Easy | Medium | Hard |
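The prevalence rows in the table above give two class counts followed by a fraction in parentheses. The source does not define that fraction. The short check below assumes it is the share of the larger class in each set; this reading is an inference from the numbers, and the helper name `majority_share` is hypothetical. The counts are copied from the table.

```python
# Class counts copied from the endpoint table as (first, second) pairs.
counts = {
    ("FAV", "train"): (45, 91),
    ("OS_All", "train"): (51, 198),
    ("OS_HR", "train"): (43, 43),
    ("FAV", "test"): (46, 90),
    ("OS_All", "test"): (54, 195),
    ("OS_HR", "test"): (49, 41),
}

# Sample sizes copied from the "Sample size (Train/Test)" row.
sample_sizes = {
    ("FAV", "train"): 136, ("FAV", "test"): 136,
    ("OS_All", "train"): 249, ("OS_All", "test"): 249,
    ("OS_HR", "train"): 86, ("OS_HR", "test"): 90,
}


def majority_share(a, b):
    """Share of the larger class, rounded to three decimals (assumed reading)."""
    return round(max(a, b) / (a + b), 3)


shares = {key: majority_share(*pair) for key, pair in counts.items()}
totals = {key: sum(pair) for key, pair in counts.items()}

for key in counts:
    print(key, totals[key], shares[key])
```

Under this assumption the two counts in every cell sum to the stated sample size, and the computed shares match the parenthesized values, including the OS_HR test set, where the first count is the larger one.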
Fig. 1 (a) Diagram of CombNet. Microarray and RNA-seq data were mixed before entering the autoencoder. The same feature space was defined for both platforms. (b) Diagram of CrossNet. The first part (the generative part) is an autoencoder, in which an encoder and a decoder are combined to regenerate the microarray gene expression profile. The second part (the discriminative part) is then introduced to reduce the difference between the regenerated microarray data (i.e., the output of the generative part) and the original RNA-seq data. In the current version, we do not build a separate discriminative model but use cross-entropy to simplify the process.
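The CrossNet idea in Fig. 1(b) can be illustrated with a toy forward pass: a microarray profile goes through an encoder and a decoder, and the output is scored against the RNA-seq profile of the same sample with cross-entropy. This is a minimal sketch, not the authors' implementation. The layer sizes, the sigmoid activations, the random weights, the assumption that expression values are scaled to [0, 1], and all function names are illustrative assumptions. No training is performed.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(x, weights, biases):
    """Fully connected layer with sigmoid activation (one output per weight row)."""
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def binary_cross_entropy(pred, target, eps=1e-12):
    """Mean binary cross-entropy between predictions and targets in [0, 1]."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (untrained, for illustration only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
n_genes, n_hidden = 6, 3  # toy sizes; the real gene feature space is far larger
enc_w, enc_b = init_layer(n_hidden, n_genes, rng)
dec_w, dec_b = init_layer(n_genes, n_hidden, rng)

# Toy expression profiles for one sample, assumed scaled to [0, 1].
microarray = [0.20, 0.80, 0.50, 0.10, 0.90, 0.40]
rnaseq = [0.25, 0.75, 0.55, 0.05, 0.95, 0.35]

code = dense(microarray, enc_w, enc_b)    # encoder: compressed representation
regenerated = dense(code, dec_w, dec_b)   # decoder: regenerated profile

# CrossNet-style objective: compare the output with the RNA-seq profile.
crossnet_loss = binary_cross_entropy(regenerated, rnaseq)
# For contrast, a plain autoencoder would compare the output with its own input.
autoencoder_loss = binary_cross_entropy(regenerated, microarray)

print(crossnet_loss, autoencoder_loss)
```

The only difference between the two losses is the target: the regenerated profile is pushed toward the other platform's measurement rather than toward the input itself. After training, `code` is the kind of hidden representation that could serve as extracted features.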
Fig. 2 HetEnc overview. (a) Feature representation model architecture and the three encoding networks (AE, CombNet and CrossNet) used in the study; (b) feature extraction and the 6-DNN structure in the modeling step
Fig. 3 (a) Principal component analysis (PCA) of features extracted by HetEnc and its three encoding networks: AE, CombNet and CrossNet. RNA-seq and microarray samples were combined for the PCA. Green and red dots represent RNA-seq and microarray samples, respectively. (b) A sample-wise scatter plot of the PC2 correlation analysis between the microarray and RNA-seq platforms
Predictive performance (AUC) for the neuroblastoma dataset
| Evaluation | Model | RNA-seq: FAV | RNA-seq: OS_All | RNA-seq: OS_HR | Microarray: FAV | Microarray: OS_All | Microarray: OS_HR |
|---|---|---|---|---|---|---|---|
| Cross-validation | HetEnc | 0.964 (0.009) | 0.830 (0.019) | 0.520 (0.044) | 0.962 (0.011) | 0.849 (0.024) | 0.651 (0.044) |
| | HetEnc | | | | | | |
| | Raw-DNN* | 0.926 (0.043) | 0.698 (0.058) | 0.578 (0.03) | 0.906 (0.054) | 0.721 (0.035) | 0.568 (0.031) |
| | FS-DNN* | 0.923 (0.052) | 0.704 (0.046) | 0.558 (0.028) | 0.919 (0.056) | 0.722 (0.047) | 0.559 (0.025) |
| External Testing (on same testing set) | KNN | 0.896 (0.032) | 0.641 (0.032) | 0.495 (0.048) | 0.907 (0.035) | 0.662 (0.031) | 0.515 (0.041) |
| | NSC | 0.901 (0.036) | 0.700 (0.048) | 0.499 (0.036) | 0.921 (0.032) | 0.713 (0.067) | 0.510 (0.035) |
| | SVM | 0.894 (0.043) | 0.631 (0.024) | 0.512 (0.050) | 0.914 (0.035) | 0.620 (0.034) | 0.525 (0.047) |
| | RandomForest | 0.905 (0.014) | 0.740 (0.019) | 0.563 (0.030) | 0.912 (0.012) | 0.727 (0.020) | 0.560 (0.030) |
| | XGBoost | 0.883 | 0.742 | 0.517 | 0.874 | 0.749 | 0.611 |
| | Avg. of Best 60 SEQC Models | 0.931 (0.02) | 0.735 (0.072) | 0.544 (0.052) | 0.929 (0.02) | 0.756 (0.082) | 0.563 (0.038) |
*Raw-DNN used the raw 10,042 gene features as input to the DNN model; FS-DNN additionally applied a feature selection threshold (p < 0.05 for each endpoint) before the features entered the DNN model. The structure of the DNN model used in Raw-DNN and FS-DNN is the same as that of the DNN used in the HetEnc supervised learning step
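The performance table reports AUC. As a reminder of what that number measures, AUC equals the fraction of positive-negative sample pairs in which the positive sample receives the higher score, with ties counted as half. The sketch below computes it directly from that definition. The labels and scores are made-up toy values, not data from the study, and the function name `auc` is illustrative.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: three positives and three negatives.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
toy_auc = auc(labels, scores)
print(round(toy_auc, 3))  # 7 of 9 pairs are ordered correctly -> 0.778
```

An AUC near 0.5, as seen for several models on the OS_HR endpoint, means the scores order positive and negative samples little better than chance.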