Martin Palazzo (1,2,3), Pierre Beauseroy (2), Patricio Yankilevich (4).
Abstract
BACKGROUND: Next-generation sequencing instruments are providing new opportunities for comprehensive analyses of cancer genomes. The increasing availability of tumor data allows researchers to study the complexity of cancer with machine learning methods. The large repositories of high-dimensional tumor samples characterised by germline and somatic mutation data require advanced computational modelling for interpretation. In this work, we propose to analyze these complex data with neural network learning, a methodology that has made impressive advances in image and natural language processing.
Keywords: Autoencoder; Cancer genomics; Kernel learning
Year: 2019 PMID: 31829157 PMCID: PMC6907172 DOI: 10.1186/s12859-019-3298-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Model architecture. Scheme of the multi-modal autoencoder architecture for deleterious and non-deleterious mutational profiles. The input and output layers each have 12,424 dimensions, one per gene. The encoder and decoder each contain one hidden layer of 400 neurons, and the latent layer of each autoencoder has 50 neurons. Highlighted in red is the latent space L, which contains signal from both types of mutational profiles.
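The Fig. 1 architecture can be sketched as a NumPy forward pass. This is a minimal sketch under stated assumptions, not the authors' implementation: the layer sizes come from the caption, but the activations (ReLU in the hidden and latent layers, sigmoid at the output), the weight initialisation, each decoder reconstructing from its own 50-unit code, and L being the concatenation of both codes are all assumptions. Every function name is hypothetical.

```python
import numpy as np

# Layer sizes from the Fig. 1 caption.
N_GENES, N_HIDDEN, N_LATENT = 12424, 400, 50


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases for one dense layer (assumed init)."""
    w = rng.normal(0.0, 0.01, size=(n_in, n_out)).astype(np.float32)
    return w, np.zeros(n_out, dtype=np.float32)


def init_autoencoder(rng, n_in=N_GENES, n_hidden=N_HIDDEN, n_latent=N_LATENT):
    """One branch: encoder n_in -> n_hidden -> n_latent, mirrored decoder."""
    return {
        "enc1": init_layer(rng, n_in, n_hidden),
        "enc2": init_layer(rng, n_hidden, n_latent),
        "dec1": init_layer(rng, n_latent, n_hidden),
        "dec2": init_layer(rng, n_hidden, n_in),
    }


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def dense(params, x):
    w, b = params
    return x @ w + b


def encode(ae, x):
    # Assumed: ReLU on both the hidden layer and the latent layer.
    return relu(dense(ae["enc2"], relu(dense(ae["enc1"], x))))


def decode(ae, z):
    # Assumed: sigmoid output, treating each gene as mutated / not mutated.
    return sigmoid(dense(ae["dec2"], relu(dense(ae["dec1"], z))))


def forward(ae_del, ae_nondel, x_del, x_nondel):
    """Encode both profiles, build the joint latent space L, reconstruct each input."""
    z_del = encode(ae_del, x_del)
    z_nondel = encode(ae_nondel, x_nondel)
    latent_L = np.concatenate([z_del, z_nondel], axis=1)  # assumed: L = [z_del, z_nondel]
    return latent_L, decode(ae_del, z_del), decode(ae_nondel, z_nondel)


def n_params(n_in=N_GENES, n_hidden=N_HIDDEN, n_latent=N_LATENT):
    """Trainable weights plus biases in one branch (encoder + decoder)."""
    enc = n_in * n_hidden + n_hidden + n_hidden * n_latent + n_latent
    dec = n_latent * n_hidden + n_hidden + n_hidden * n_in + n_in
    return enc + dec
```

With the caption's sizes, `n_params()` gives 9,992,474 parameters per branch, almost all of them in the two 12,424 × 400 weight matrices; passing smaller sizes to `init_autoencoder` keeps a quick test cheap.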
Fig. 2 t-SNE scatter plot. Projection of the latent space onto t-SNE dimensions, with the 14 tumor types (by primary site) shown in different colors.
Fig. 3 Validation loss. Autoencoder training and validation loss across training epochs, after cross-validation.
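The training and validation curves in Fig. 3 come from cross-validation. A minimal sketch of that bookkeeping, assuming binary mutational profiles, a binary cross-entropy reconstruction loss, and shuffled k-fold splits; the loss, the number of folds, and all function names are assumptions, not taken from the paper:

```python
import numpy as np


def binary_cross_entropy(x, x_hat, eps=1e-7):
    """Mean per-entry BCE between binary profiles x and reconstructions x_hat (assumed loss)."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat)))


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for a shuffled k-fold split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]


def mean_curve(per_fold_curves):
    """Average per-epoch loss curves across folds (all folds run for the same number of epochs)."""
    return np.mean(np.asarray(per_fold_curves, dtype=float), axis=0)
```

Each fold would record one training and one validation value per epoch; averaging the per-fold curves with `mean_curve` gives one line per loss type, as plotted in the figure.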
Fig. 4 Latent space evaluation. Left: kernel target alignment score for different values of the σ parameter. Right: mutual information score for different numbers of clusters.
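Both Fig. 4 scores can be computed from latent vectors and tumor labels. The sketch below assumes a Gaussian kernel K_ij = exp(−‖x_i − x_j‖² / (2σ²)), an ideal target kernel that is 1 for pairs of samples with the same tumor type and 0 otherwise, and mutual information (in nats) between given cluster assignments and labels. The clustering algorithm itself is not shown, and the function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(X, sigma):
    """K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) (assumed RBF parameterisation)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # clip round-off
    return np.exp(-d2 / (2.0 * sigma ** 2))


def kernel_target_alignment(K, labels):
    """<K, T>_F / sqrt(<K, K>_F <T, T>_F), with T_ij = 1 if same label, else 0."""
    labels = np.asarray(labels)
    T = (labels[:, None] == labels[None, :]).astype(float)
    return float(np.sum(K * T) / np.sqrt(np.sum(K * K) * np.sum(T * T)))


def sweep_sigma(X, labels, sigmas):
    """Alignment score for each candidate sigma (the left panel's x-axis)."""
    return {s: kernel_target_alignment(gaussian_kernel(X, s), labels) for s in sigmas}


def mutual_information(cluster_ids, labels):
    """MI in nats between cluster assignments and class labels, via a contingency table."""
    _, c = np.unique(cluster_ids, return_inverse=True)
    _, y = np.unique(labels, return_inverse=True)
    joint = np.zeros((c.max() + 1, y.max() + 1))
    np.add.at(joint, (c, y), 1.0)  # count co-occurrences
    joint /= joint.sum()
    pc = joint.sum(axis=1, keepdims=True)  # marginal over clusters
    py = joint.sum(axis=0, keepdims=True)  # marginal over labels
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pc @ py)[nz])))
```

A very large σ drives every kernel entry towards 1, so the alignment then reflects only class proportions; a σ matched to the spread of each tumor type pushes the kernel towards the target and the score towards 1.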
Classification results for 40 cancer subtypes
| Primary site | Project name | Test samples | AUC (latent space) | AUC (deleterious) | AUC (non-deleterious) |
|---|---|---|---|---|---|
| Head and neck | | 36 | 0.75 | 0.50 | 0.55 |
| Brain | | 56 | 0.81 | 0.62 | 0.80 |
| Blood | CLLE-ES | 102 | 0.83 | 0.84 | 0.76 |
| Head and neck | | 28 | 0.82 | 0.66 | 0.81 |
| Liver | | 79 | 0.54 | 0.50 | 0.50 |
| Lung | | 34 | 0.77 | 0.57 | 0.50 |
| Skin | | 24 | 0.79 | 0.50 | 0.50 |
| Stomach | | 49 | 0.67 | 0.50 | 0.51 |
| Blood | | 65 | 0.77 | 0.50 | 0.61 |
| Lung | | 40 | 0.81 | 0.74 | 0.55 |
| Colorectal | | 63 | 0.60 | 0.50 | 0.51 |
| Skin | | 67 | 0.84 | 0.50 | 0.50 |
| Liver | | 79 | 0.61 | 0.50 | 0.50 |
| Blood | ALL-US | 15 | 0.83 | 0.91 | 0.94 |
| Skin | | 20 | 0.84 | 0.50 | 0.50 |
| Brain | | 55 | 0.87 | 0.50 | 0.66 |
| Nervous system | NBL-US | 18 | 0.96 | 0.95 | 0.96 |
| Blood | | 41 | 0.97 | 0.73 | 0.64 |
| Prostate | | 28 | 0.65 | 0.50 | 0.50 |
| Prostate | | 51 | 0.83 | 0.58 | 0.74 |
| Blood | | 49 | 0.82 | 0.50 | 0.50 |
| Kidney | | 82 | 0.90 | 0.50 | 0.58 |
| Brain | | 90 | 0.79 | 0.77 | 0.59 |
| Kidney | | 49 | 0.84 | 0.50 | 0.50 |
| Blood | AML-US | 18 | 0.94 | 0.96 | 0.96 |
| Breast | | 28 | 0.59 | 0.50 | 0.50 |
| Kidney | | 33 | 0.90 | 0.50 | 0.57 |
| Prostate | | 58 | 0.88 | 0.71 | 0.50 |
| Stomach | | 58 | 0.84 | 0.50 | 0.50 |
| Stomach | | 115 | 0.80 | 0.50 | 0.50 |
| Liver | | 32 | 0.88 | 0.50 | 0.75 |
| Breast | | 189 | 0.75 | 0.60 | 0.50 |
| Lung | | 39 | 0.91 | 0.50 | 0.50 |
| Esophagus | | 61 | 0.90 | 0.50 | 0.50 |
| Colorectal | | 51 | 0.83 | 0.50 | 0.50 |
| Breast | | 15 | 0.82 | 0.50 | 0.56 |
| Pancreas | | 73 | 0.72 | 0.60 | 0.50 |
| Pancreas | PACA-CA | 54 | 0.87 | 0.50 | 0.89 |
| Head and neck | THCA-US | 76 | 0.85 | 0.85 | 0.50 |
| Breast | | 114 | 0.79 | 0.56 | 0.50 |
The number of test samples for each class is given. The area under the ROC curve (AUC) is reported for classifiers trained on the latent space, on the deleterious input data, and on the non-deleterious input data. Tumor subtypes where classification performance is improved in the latent space are highlighted in bold.
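Each AUC in the table is a one-vs-rest score for a single subtype. A minimal sketch of that computation, using the Mann–Whitney formulation (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half); the function names and the row-per-sample, column-per-class layout of the score matrix are assumptions:

```python
def roc_auc(scores, is_positive):
    """P(score of a random positive > score of a random negative), ties counted as 1/2."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:  # O(n_pos * n_neg): fine for a few hundred test samples
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(score_matrix, labels, classes):
    """AUC per class, using column k of score_matrix as the score for classes[k]."""
    return {
        c: roc_auc([row[k] for row in score_matrix], [lab == c for lab in labels])
        for k, c in enumerate(classes)
    }
```

Because ties count as one half, a classifier that assigns the same score to every sample gets an AUC of exactly 0.50.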