Hai Yang1, Rui Chen2,3, Dongdong Li1, Zhe Wang1. 1. Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, PR China. 2. Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, Tennessee, United States of America. 3. Vanderbilt Genetics Institute, Vanderbilt University, Nashville, Tennessee, United States of America.
Abstract
MOTIVATION: The discovery of cancer subtyping can help explore cancer pathogenesis, determine clinical actionability in treatment, and improve patients' survival rates. However, due to the diversity and complexity of multi-omics data, it is still challenging to develop integrated clustering algorithms for tumor molecular subtyping. RESULTS: We propose Subtype-GAN, a deep adversarial learning approach based on the multiple-input multiple-output neural network to model the complex omics data accurately. With the latent variables extracted from the neural network, Subtype-GAN uses consensus clustering and the Gaussian Mixture model to identify tumor samples' molecular subtypes. Compared with other state-of-the-art subtyping approaches, Subtype-GAN achieved outstanding performance on the benchmark data sets consisting of ∼4,000 TCGA tumors from 10 types of cancer. We found that on the comparison data set, the clustering scheme of Subtype-GAN is not always similar to that of the deep learning method AE but is identical to that of NEMO, MCCA, VAE, and other excellent approaches. Finally, we applied Subtype-GAN to the BRCA data set and automatically obtained the number of subtypes and the subtype labels of 1031 BRCA tumors. Through the detailed analysis, we found that the identified subtypes are clinically meaningful and show distinct patterns in the feature space, demonstrating the practicality of Subtype-GAN. AVAILABILITY: The source codes, the clustering results of Subtype-GAN across the benchmark data sets are available at https://github.com/haiyang1986/Subtype-GAN. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
MOTIVATION: The discovery of cancer subtyping can help explore cancer pathogenesis, determine clinical actionability in treatment, and improve patients' survival rates. However, due to the diversity and complexity of multi-omics data, it is still challenging to develop integrated clustering algorithms for tumor molecular subtyping. RESULTS: We propose Subtype-GAN, a deep adversarial learning approach based on the multiple-input multiple-output neural network to model the complex omics data accurately. With the latent variables extracted from the neural network, Subtype-GAN uses consensus clustering and the Gaussian Mixture model to identify tumor samples' molecular subtypes. Compared with other state-of-the-art subtyping approaches, Subtype-GAN achieved outstanding performance on the benchmark data sets consisting of ∼4,000 TCGA tumors from 10 types of cancer. We found that on the comparison data set, the clustering scheme of Subtype-GAN is not always similar to that of the deep learning method AE but is identical to that of NEMO, MCCA, VAE, and other excellent approaches. Finally, we applied Subtype-GAN to the BRCA data set and automatically obtained the number of subtypes and the subtype labels of 1031 BRCA tumors. Through the detailed analysis, we found that the identified subtypes are clinically meaningful and show distinct patterns in the feature space, demonstrating the practicality of Subtype-GAN. AVAILABILITY: The source codes, the clustering results of Subtype-GAN across the benchmark data sets are available at https://github.com/haiyang1986/Subtype-GAN. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.