Ranjan Kumar Barman1,2, Anirban Mukhopadhyay3, Ujjwal Maulik2, Santasabuj Das4,5.
Abstract
BACKGROUND: With the global spread of multidrug resistance in pathogenic microbes, infectious diseases have emerged as a key public health concern in recent times. Identifying host genes associated with infectious diseases will improve our understanding of the mechanisms behind their development and help to identify novel therapeutic targets.
Keywords: Classification; Deep neural networks; Functional annotations; Infectious disease-associated host genes; Sequence and interaction network features
Year: 2019 PMID: 31881961 PMCID: PMC6935192 DOI: 10.1186/s12859-019-3317-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Feature-wise performance measures on the disease- and non-disease-associated protein dataset using a deep neural network classifier
| Features set | Vector length | Ratio | Sensitivity (%) | Specificity (%) | Accuracy (%) | PPV (%) | MCC | F1 score (%) | AUC |
|---|---|---|---|---|---|---|---|---|---|
| Primary sequence features | | | | | | | | | |
| AAC | 20 | 1: 1 | 86.32 | 53.31 | 70.09 | 66.04 | 0.43 | 74.34 | 0.755 |
| PAAC | 50 | 1: 1 | 86.32 | 53.31 | 70.09 | 66.04 | 0.43 | 74.34 | 0.755 |
| CTD | 343 | 1: 1 | 91.09 | 37.87 | 64.52 | 59.52 | 0.34 | 71.86 | 0.692 |
| DC | 400 | 1: 1 | 88.59 | 44.63 | 66.83 | 62.96 | 0.38 | 72.89 | 0.715 |
| AAC_PAAC | 70 | 1: 1 | 85.15 | 59.93 | 72.98 | 69.02 | 0.47 | 75.92 | 0.766 |
| AAC_CTD | 363 | 1: 1 | 87.45 | 47.18 | 67.74 | 62.83 | 0.39 | 72.81 | 0.709 |
| AAC_DC | 420 | 1: 1 | 83.55 | 52.72 | 68.73 | 64.66 | 0.39 | 72.69 | 0.708 |
| PAAC_CTD | 393 | 1: 1 | 88.52 | 45.23 | 67.02 | 62.46 | 0.39 | 72.78 | 0.720 |
| PAAC_DC | 450 | 1: 1 | 88.08 | 50.40 | 69.73 | 65.24 | 0.43 | 74.40 | 0.732 |
| CTD_DC | 743 | 1: 1 | 87.15 | 48.30 | 67.94 | 64.59 | 0.40 | 73.08 | 0.733 |
| AAC_PAAC_CTD | 413 | 1: 1 | 83.72 | 53.77 | 68.96 | 64.93 | 0.40 | 72.72 | 0.730 |
| AAC_PAAC_DC | 470 | 1: 1 | 86.32 | 52.49 | 69.86 | 65.64 | 0.43 | 74.09 | 0.729 |
| AAC_CTD_DC | 763 | 1: 1 | 90.22 | 45.17 | 67.88 | 62.69 | 0.40 | 73.72 | 0.729 |
| PAAC_CTD_DC | 793 | 1: 1 | 90.30 | 45.27 | 67.80 | 63.62 | 0.40 | 73.94 | 0.743 |
| AAC_PAAC_CTD_DC | 813 | 1: 1 | 87.50 | 49.44 | 68.50 | 64.00 | 0.41 | 73.50 | 0.739 |
| Network Analyzer properties | |||||||||
| Network properties | | 1: 1 | | | | | | | |
| Normalized And Filtered Network properties | | 1: 1 | | | | | | | |
The notable performances are indicated by bold
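The vector lengths in the table follow from how the sequence features are built and concatenated. The sketch below assumes that AAC and DC denote the usual amino acid composition (20 values) and dipeptide composition (400 values); the function names and the example sequence are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(seq: str) -> list[float]:
    """Fraction of each standard amino acid in the sequence (20 values)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    counts = Counter(residues)
    total = len(residues)
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def dc(seq: str) -> list[float]:
    """Fraction of each ordered pair of adjacent residues (400 values)."""
    s = seq.upper()
    pairs = [s[i:i + 2] for i in range(len(s) - 1)]
    # Skip pairs containing non-standard symbols (an assumption of this sketch).
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    counts = Counter(pairs)
    total = len(pairs)
    return [counts[a + b] / total if total else 0.0
            for a, b in product(AMINO_ACIDS, repeat=2)]


example = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up sequence for illustration
aac_dc = aac(example) + dc(example)  # concatenated feature set, as in AAC_DC
print(len(aac(example)), len(dc(example)), len(aac_dc))
```

Concatenating the two vectors gives the length of 420 listed for AAC_DC; the other combined sets add up in the same way.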
Mixed-feature performance on the disease- and non-disease-associated protein dataset
| Features set | Methods | Vector length | Ratio | Sensitivity (%) | Specificity (%) | Accuracy (%) | PPV (%) | MCC | F1 score (%) | AUC |
|---|---|---|---|---|---|---|---|---|---|---|
| Mixed features | | | | | | | | | | |
| AAC_Network properties | DNN | 29 | 1: 1 | 82.23 | 88.30 | 85.41 | 88.10 | 0.71 | 84.91 | 0.900 |
| PAAC_Network properties | DNN | |||||||||
| AAC_PAAC_Network properties | DNN | |||||||||
| Normalized And Filtered AAC_Network properties | DNN | 26 | 1: 1 | 83.78 | 86.90 | 85.51 | 86.95 | 0.71 | 85.21 | 0.904 |
| Normalized And Filtered PAAC_Network properties | DNN | |||||||||
| Normalized And Filtered AAC_PAAC_Network properties | DNN | |||||||||
The notable performances are indicated by bold
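For reference, the threshold-based measures reported in these tables can all be derived from the four confusion-matrix counts. The following is a minimal sketch using the standard definitions; the function name and the example counts are illustrative assumptions, not values from the study. AUC is omitted because it needs ranked prediction scores rather than counts.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard binary classification measures from confusion-matrix counts."""

    def safe_div(num: float, den: float) -> float:
        # Return 0.0 for an empty denominator instead of raising.
        return num / den if den else 0.0

    sensitivity = safe_div(tp, tp + fn)          # true positive rate
    specificity = safe_div(tn, tn + fp)          # true negative rate
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    ppv = safe_div(tp, tp + fp)                  # positive predictive value
    f1 = safe_div(2 * tp, 2 * tp + fp + fn)      # harmonic mean of PPV and sensitivity
    mcc = safe_div(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "ppv": ppv,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts for a balanced (1:1) test set of 100 + 100 proteins.
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that the tables report the percentage measures on a 0 to 100 scale, while MCC is given on its native scale from -1 to 1.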
Performance measures of different classifiers on the selected feature sets
| Features set | Methods | Vector length | Ratio | Sensitivity (%) | Specificity (%) | Accuracy (%) | PPV (%) | MCC | F1 score (%) | AUC |
|---|---|---|---|---|---|---|---|---|---|---|
| Selected Features For PAAC_Network properties | DNN | |||||||||
| Selected Features For PAAC_Network properties | SVM | 16 | 1: 1 | 78.03 | 87.87 | 82.95 | 86.40 | 0.66 | 81.81 | 0.862 |
| Selected Features For PAAC_Network properties | RF | 16 | 1: 1 | 83.93 | 88.03 | 85.98 | 87.52 | 0.72 | 85.69 | 0.916 |
| Selected Features For PAAC_Network properties | NB | 16 | 1: 1 | 78.03 | 88.03 | 83.03 | 86.70 | 0.66 | 82.14 | 0.904 |
| Selected Features For AAC_PAAC_Network properties | DNN | |||||||||
| Selected Features For AAC_PAAC_Network properties | SVM | 24 | 1: 1 | 80.00 | 87.87 | 83.93 | 86.64 | 0.68 | 83.01 | 0.881 |
| Selected Features For AAC_PAAC_Network properties | RF | 24 | 1: 1 | 82.62 | 87.70 | 85.16 | 87.05 | 0.70 | 84.78 | 0.918 |
| Selected Features For AAC_PAAC_Network properties | NB | 24 | 1: 1 | 78.52 | 88.36 | 83.44 | 87.09 | 0.67 | 82.59 | 0.911 |
| Selected Features For Normalized And Filtered PAAC_Network properties | DNN | |||||||||
| Selected Features For Normalized And Filtered PAAC_Network properties | SVM | 10 | 1: 1 | 77.54 | 87.70 | 82.62 | 86.34 | 0.66 | 81.48 | 0.880 |
| Selected Features For Normalized And Filtered PAAC_Network properties | RF | 10 | 1: 1 | 81.15 | 86.39 | 83.77 | 85.64 | 0.68 | 83.33 | 0.910 |
| Selected Features For Normalized And Filtered PAAC_Network properties | NB | 10 | 1: 1 | 76.23 | 91.31 | 83.77 | 89.77 | 0.68 | 82.45 | 0.896 |
| Selected Features For Normalized And Filtered AAC_PAAC_Network properties | DNN | |||||||||
| Selected Features For Normalized And Filtered AAC_PAAC_Network properties | SVM | 25 | 1: 1 | 78.85 | 88.52 | 83.69 | 87.07 | 0.68 | 82.56 | 0.889 |
| Selected Features For Normalized And Filtered AAC_PAAC_Network properties | RF | 25 | 1: 1 | 81.64 | 86.72 | 84.18 | 86.01 | 0.68 | 83.77 | 0.911 |
| Selected Features For Normalized And Filtered AAC_PAAC_Network properties | NB | 25 | 1: 1 | 77.38 | 89.67 | 83.52 | 88.22 | 0.68 | 82.45 | 0.908 |
The notable performances are indicated by bold
Fig. 1 Performance measures of different classifiers based on 16 selected features from pseudo-amino acid composition (PAAC) and network properties
Performance on imbalanced datasets using deep neural network classifier
| Features set | Vector length | Ratio | Sensitivity (%) | Specificity (%) | Accuracy (%) | PPV (%) | MCC | F1 score (%) | AUC |
|---|---|---|---|---|---|---|---|---|---|
| Selected Features For PAAC _Network properties | |||||||||
| Selected Features For PAAC _Network properties | 16 | 1: 2 | 77.89 | 92.56 | 87.81 | 84.64 | 0.72 | 80.72 | 0.900 |
| Selected Features For PAAC _Network properties | 16 | 1: 3 | 72.34 | 94.54 | 89.03 | 81.70 | 0.70 | 76.53 | 0.902 |
| Selected Features For PAAC _Network properties | 16 | 1: 4 | 68.89 | 95.46 | 90.20 | 79.20 | 0.68 | 73.52 | 0.897 |
| Selected Features For PAAC _Network properties | 16 | 1: 5 | 69.00 | 95.13 | 90.85 | 74.44 | 0.66 | 71.25 | 0.895 |
| Selected Features For Normalized And Filtered PAAC_ Network properties | |||||||||
| Selected Features For Normalized And Filtered PAAC_ Network properties | 10 | 1: 2 | 76.76 | 92.94 | 87.62 | 84.41 | 0.72 | 80.25 | 0.895 |
| Selected Features For Normalized And Filtered PAAC_ Network properties | 10 | 1: 3 | 74.35 | 93.52 | 88.91 | 80.40 | 0.70 | 76.88 | 0.895 |
| Selected Features For Normalized And Filtered PAAC_ Network properties | 10 | 1: 4 | 67.39 | 96.27 | 90.57 | 82.68 | 0.69 | 73.66 | 0.897 |
| Selected Features For Normalized And Filtered PAAC_ Network properties | 10 | 1: 5 | 67.52 | 96.01 | 91.31 | 77.95 | 0.67 | 71.97 | 0.895 |
The notable performances are indicated by bold
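The rising accuracy alongside falling sensitivity in this table is what one would expect on imbalanced data: accuracy is a class-size-weighted average of sensitivity and specificity, so the majority class dominates it. The sketch below illustrates this with two rows from the table, assuming the ratio column gives disease to non-disease proteins (1: k); the function name is hypothetical. Small gaps between the computed and reported accuracies are expected if the reported figures are averages over several runs, which is an assumption here.

```python
def expected_accuracy(sensitivity: float, specificity: float,
                      negatives_per_positive: float) -> float:
    """Accuracy implied by sensitivity and specificity at a 1:k class ratio."""
    k = negatives_per_positive
    return (sensitivity + k * specificity) / (1 + k)


# (k, sensitivity %, specificity %, reported accuracy %) for the 16-feature
# PAAC_Network properties set at ratios 1:2 and 1:5.
rows = [
    (2, 77.89, 92.56, 87.81),
    (5, 69.00, 95.13, 90.85),
]
for k, sens, spec, reported in rows:
    implied = expected_accuracy(sens, spec, k)
    print(f"1:{k}  implied accuracy {implied:.2f}  reported {reported:.2f}")
```

This is why MCC and F1, which fall as the ratio grows, are more informative than accuracy for the imbalanced settings.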
Performance on blind dataset using best deep neural network classifier
| Best Model Features set | Vector length | Ratio | Sensitivity (%) | Specificity (%) | Accuracy (%) | PPV (%) | MCC | F1 score (%) | AUC |
|---|---|---|---|---|---|---|---|---|---|
| PAAC_Network properties | 59 | 1: 1 | 85.09 | 76.32 | 80.70 | 78.23 | 0.62 | 81.51 | 0.872 |
| Selected Features For PAAC _Network properties | |||||||||
| Selected Features For Normalized And Filtered PAAC_ Network properties | |||||||||
The notable performances are indicated by bold
Fig. 2 Histogram representation of different disease terms based on GAD
Fig. 3 Scatter plot of significantly enriched GO biological process terms, visualized by REVIGO, which summarizes and visualizes long lists of gene ontology terms [21]
Fig. 4 The architecture of simple Deep Neural Networks