Haleema Attique1, Sajid Shah1,2, Saima Jabeen3, Fiaz Gul Khan1, Ahmad Khan1, Mohammed ELAffendi2.
Abstract
DNA copy number variation (CNV) is a type of DNA variation associated with various human diseases. CNVs range in size from 1 kilobase to several megabases on a chromosome. Most computational research on cancer classification relies on traditional machine learning, which depends on handcrafted feature extraction and selection. To the best of our knowledge, existing deep learning-based research also includes a separate feature extraction and selection step. To understand the differences between multiple human cancers, we developed three end-to-end deep learning models, i.e., a DNN (fully connected network), a CNN (convolutional neural network), and an RNN (recurrent neural network), to classify six cancer types using the CNV data of 24,174 genes. The strength of an end-to-end deep learning model lies in representation learning (automatic feature extraction). We propose more than one model in order to find which architecture performs best on CNV data. Our best model achieved 92% accuracy with an ROC area of 0.99, and we compared the performance of our proposed models with state-of-the-art techniques. Our models outperformed the state-of-the-art techniques in terms of accuracy, precision, and ROC area. In the future, we aim to extend this work to other types of cancer.
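To make the end-to-end idea concrete, the following is a minimal sketch of a forward pass through a fully connected classifier with three hidden layers and a softmax over the six cancer types. It is not the authors' implementation: the layer widths, the ReLU activation, the random initialization, and the toy input size of 20 features (standing in for the 24,174 genes) are all assumptions chosen for illustration.

```python
import math
import random

CANCER_TYPES = ["BRCA", "BLCA", "COAD/READ", "GBM", "KIRC", "HNSC"]

def dense(x, weights, bias):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (hypothetical initialization)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(x, layers):
    """Hidden layers use ReLU (an assumption); the last layer feeds softmax."""
    *hidden, output = layers
    for w, b in hidden:
        x = relu(dense(x, w, b))
    w, b = output
    return softmax(dense(x, w, b))

rng = random.Random(0)
n_genes = 20  # toy stand-in; the paper uses CNV values for 24,174 genes
sizes = [n_genes, 16, 8, 4, len(CANCER_TYPES)]  # hypothetical layer widths
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

sample = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]  # synthetic CNV vector
probs = forward(sample, layers)
predicted = CANCER_TYPES[probs.index(max(probs))]
```

Here the raw per-gene values go straight into the network with no separate feature-selection stage, which is the sense in which the approach is end to end. With untrained weights the predicted class is arbitrary.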
Year: 2022 PMID: 35720914 PMCID: PMC9203194 DOI: 10.1155/2022/4742986
Source DB: PubMed Journal: Comput Intell Neurosci
The average performances of different models along with the state of the art.
| S. no | Models | Train Acc (%) | Val Acc (%) | ROC area | Precision | Recall |
|---|---|---|---|---|---|---|
| 1 | DNN3 | 95 | 91 | 0.99 | 0.88 | 0.87 |
| 2 | DNN5 | 96 | | 0.99 | 0.89 | 0.88 |
| 3 | LSTM | 95 | 91 | 0.98 | 0.89 | 0.85 |
| 4 | 1D-CNN | 88 | 90 | 0.98 | 0.88 | 0.85 |
| 5 | Sana Fekry et al. [ | — | 85.9 | 0.965 | 0.852 | 0.862 |
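The source does not state how the average precision and recall were computed. As a sketch, assuming an unweighted (macro) mean over the six classes, the classwise values listed for the NN block of the classwise table give 0.88 and 0.87, which matches the DNN3 row above. The variable names are illustrative.

```python
# Classwise values copied from the NN block of the classwise table,
# in the column order GBM, KIRC, HNSC, COAD/READ, BLCA, BRCA.
nn_precision = [0.77, 0.90, 0.92, 0.93, 0.81, 0.97]
nn_recall = [0.68, 0.96, 0.82, 0.98, 0.83, 0.93]

def macro_average(values):
    """Unweighted mean: every class counts equally, whatever its size."""
    return sum(values) / len(values)

avg_precision = round(macro_average(nn_precision), 2)
avg_recall = round(macro_average(nn_recall), 2)
```

A macro mean treats the small BLCA class the same as the large BRCA class, so weak results on a rare class lower the average more than they would under a sample-weighted mean.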
The distribution of samples with respect to each cancer type in our dataset.
| Sr. | Cancer type | No. of samples |
|---|---|---|
| 0 | BRCA (breast carcinoma) | 847 |
| 1 | BLCA (bladder urothelial) | 135 |
| 2 | COAD/READ (colon and rectal adenocarcinoma) | 575 |
| 3 | GBM (glioblastoma multiforme) | 563 |
| 4 | KIRC (kidney renal cell carcinoma) | 306 |
| 5 | HNSC (head and neck squamous cell) | 490 |
| | Total | 2916 |
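The class counts are uneven (BRCA has more than six times as many samples as BLCA), which matters when reading the accuracy figures. The sketch below computes each class's share, the accuracy of a trivial majority-class predictor, and inverse-frequency class weights. The weighting formula is a common heuristic and an assumption here; the source does not say whether the authors reweighted classes.

```python
# Sample counts copied from the dataset table.
samples = {
    "BRCA": 847,
    "BLCA": 135,
    "COAD/READ": 575,
    "GBM": 563,
    "KIRC": 306,
    "HNSC": 490,
}

total = sum(samples.values())

# Fraction of the dataset belonging to each cancer type.
shares = {name: count / total for name, count in samples.items()}

# Accuracy obtained by always predicting the most frequent class.
majority_baseline = max(shares.values())

# Inverse-frequency weights: total / (n_classes * count).
# Rare classes receive weights above 1, frequent classes below 1.
class_weights = {
    name: total / (len(samples) * count) for name, count in samples.items()
}
```

Always predicting BRCA would score about 29% accuracy, so the reported validation accuracies of roughly 90% are well above that floor.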
Algorithm 1: Batch normalization.
Figure 1: The architecture of the fully connected model with three hidden layers.
Figure 2: 1D convolution-based architecture.
Figure 3: LSTM architecture.
Figure 4: Our methodology.
Figure 5: Classification accuracy of different models: (a) DNN3, (b) DNN5, (c) LSTM, and (d) 1D-CNN.
Figure 6: ROC of different models on various cancer types: (a) DNN5, (b) LSTM, and (c) 1D-CNN.
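As a minimal sketch of what batch normalization (Algorithm 1) computes at training time, assuming the standard formulation: each feature is centered by the mini-batch mean, scaled by the mini-batch standard deviation, and then passed through a learnable scale `gamma` and shift `beta`. The function name, the `eps` value, and the toy batch are illustrative and not taken from the source.

```python
import math

def batch_norm(batch, gamma, beta, eps=1e-5):
    """Normalize each feature (column) over the mini-batch (rows).

    batch: list of samples, each a list of feature values
    gamma, beta: per-feature learnable scale and shift
    eps: small constant for numerical stability
    """
    n = len(batch)
    n_features = len(batch[0])

    # Per-feature mini-batch mean and (biased) variance.
    means = [sum(row[j] for row in batch) / n for j in range(n_features)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in batch) / n
        for j in range(n_features)
    ]

    # Normalize, then apply the learnable scale and shift.
    return [
        [
            gamma[j] * (row[j] - means[j]) / math.sqrt(variances[j] + eps)
            + beta[j]
            for j in range(n_features)
        ]
        for row in batch
    ]

toy_batch = [[1.0, 2.0], [3.0, 6.0]]
normalized = batch_norm(toy_batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

With `gamma` set to 1 and `beta` to 0, each column of `normalized` has approximately zero mean and unit variance. At inference time, implementations typically replace the mini-batch statistics with running averages collected during training.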
The classwise performances of all networks.
| Model / metric | GBM (3) | KIRC (4) | HNSC (5) | COAD/READ (2) | BLCA (1) | BRCA (0) |
|---|---|---|---|---|---|---|
| NN | | | | | | |
| TP rate | 0.68 | 0.96 | 0.82 | 0.98 | 0.83 | 0.93 |
| ROC area | 0.97 | 0.99 | 0.97 | 1.00 | 0.98 | 0.99 |
| Precision | 0.77 | 0.90 | 0.92 | 0.93 | 0.81 | 0.97 |
| F-measure | 0.72 | 0.93 | 0.87 | 0.96 | 0.82 | 0.95 |
| Recall | 0.68 | 0.96 | 0.82 | 0.98 | 0.83 | 0.93 |
| FP rate | 0.00 | 0.01 | 0.04 | 0.01 | 0.02 | 0.00 |
| DNN | | | | | | |
| TP rate | 0.72 | 0.96 | 0.85 | 0.98 | 0.85 | 0.94 |
| ROC area | 0.97 | 0.98 | 0.99 | 1.00 | 0.98 | 0.99 |
| Precision | 0.75 | 0.93 | 0.94 | 0.94 | 0.85 | 0.93 |
| F-measure | 0.73 | 0.94 | 0.89 | 0.96 | 0.85 | 0.94 |
| Recall | 0.72 | 0.96 | 0.85 | 0.98 | 0.85 | 0.94 |
| FP rate | 0.01 | 0.02 | 0.01 | 0.01 | 0.01 | 0.01 |
| LSTM | | | | | | |
| TP rate | 0.52 | 0.95 | 0.85 | 0.98 | 0.88 | 0.92 |
| ROC area | 0.96 | 0.99 | 0.98 | 1.00 | 0.97 | 1.00 |
| Precision | 0.87 | 0.91 | 0.93 | 0.92 | 0.79 | 0.95 |
| F-measure | 0.65 | 0.93 | 0.88 | 0.95 | 0.83 | 0.94 |
| Recall | 0.68 | 0.94 | 0.84 | 0.96 | 0.79 | 0.91 |
| FP rate | 0.52 | 0.95 | 0.85 | 0.98 | 0.88 | 0.92 |
| 1D-CNN | | | | | | |
| TP rate | 0.64 | 0.93 | 0.92 | 0.96 | 0.77 | 0.91 |
| ROC area | 0.97 | 0.99 | 0.97 | 1.00 | 0.97 | 0.99 |
| Precision | 0.84 | 0.93 | 0.81 | 0.93 | 0.86 | 0.94 |
| F-measure | 0.73 | 0.93 | 0.86 | 0.94 | 0.82 | 0.92 |
| Recall | 0.64 | 0.93 | 0.92 | 0.96 | 0.77 | 0.91 |
| FP rate | 0.00 | 0.02 | 0.04 | 0.01 | 0.01 | 0.01 |
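The row of values between Precision and Recall in each block is numerically consistent with the F-measure, the harmonic mean of precision and recall. The sketch below recomputes it for two NN cells as a check; the function name is illustrative.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1 score)."""
    if precision + recall == 0:
        return 0.0  # convention when both are zero
    return 2 * precision * recall / (precision + recall)

# NN block, GBM column: precision 0.77, recall 0.68.
nn_gbm = round(f_measure(0.77, 0.68), 2)

# NN block, KIRC column: precision 0.90, recall 0.96.
nn_kirc = round(f_measure(0.90, 0.96), 2)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, GBM's low recall keeps its F-measure near 0.72 even though its precision is higher.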
Confusion matrix for training data.
| | BRCA(0) | BLCA(1) | COAD/READ(2) | GBM (3) | KIRC(4) | HNSC(5) |
|---|---|---|---|---|---|---|
| BRCA(0) | | 0 | 1 | 0 | 0 | 0 |
| BLCA(1) | 0 | | 6 | 0 | 0 | 0 |
| COAD/READ(2) | 0 | 1 | | 0 | 0 | 0 |
| GBM (3) | 0 | 0 | 2 | | 0 | 0 |
| KIRC(4) | 0 | 0 | 7 | 0 | | 0 |
| HNSC(5) | 0 | 0 | 3 | 0 | 0 | |
The confusion matrix for testing data.
| | BRCA(0) | BLCA(1) | COAD/READ(2) | GBM (3) | KIRC(4) | HNSC(5) |
|---|---|---|---|---|---|---|
| BRCA(0) | | 1 | 2 | 2 | 3 | 2 |
| BLCA(1) | 0 | | 3 | 3 | 2 | 2 |
| COAD/READ(2) | 0 | 3 | | 1 | 4 | 2 |
| GBM (3) | 0 | 0 | 1 | | 1 | 0 |
| KIRC(4) | 1 | 5 | 1 | 0 | | 4 |
| HNSC(5) | 1 | 0 | 3 | 1 | 1 | |
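The TP rate and FP rate reported per class can be derived from a confusion matrix in a one-vs-rest fashion. The sketch below assumes rows are true classes and columns are predicted classes (the source does not state the orientation) and uses a hypothetical 3-class matrix, not the paper's data.

```python
def one_vs_rest_rates(matrix, k):
    """Return (TP rate, FP rate) for class k.

    matrix[i][j] = number of samples of true class i predicted as class j
    (assumed orientation).
    """
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                  # class k predicted as something else
    fp = sum(row[k] for row in matrix) - tp   # other classes predicted as k
    tn = sum(sum(row) for row in matrix) - tp - fn - fp
    tp_rate = tp / (tp + fn)                  # identical to recall
    fp_rate = fp / (fp + tn)
    return tp_rate, fp_rate

# Hypothetical 3-class confusion matrix, for illustration only.
toy_matrix = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]

tp_rate_0, fp_rate_0 = one_vs_rest_rates(toy_matrix, 0)
```

For class 0 of the toy matrix this gives a TP rate of 0.8 (8 of 10 true samples recovered) and an FP rate of 0.1 (2 of the 20 samples from other classes mislabeled as class 0).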