Binh P Nguyen, Quang H Nguyen, Giang-Nam Doan-Ngoc, Thanh-Hoang Nguyen-Vo, Susanto Rahardja.
Abstract
BACKGROUND: Protein-DNA interactions are essential to diverse biological events, so accurately locating DNA-binding residues is necessary. This remains a challenging task in the post-genomic era, in which the volume of protein sequence data has expanded rapidly. In this study, we propose iProDNA-CapsNet, a new prediction model that identifies protein-DNA binding residues using an ensemble of capsule neural networks (CapsNets) on position-specific scoring matrix (PSSM) profiles. The use of CapsNets promises an innovative approach to determining the location of DNA-binding residues. The benchmark datasets introduced by Hu et al. (2017), i.e., PDNA-543 and PDNA-TEST, were used to train and evaluate the model, respectively. To assess model performance fairly, iProDNA-CapsNet was compared with existing state-of-the-art methods.
Keywords: Capsule neural network; Deep learning; PSSM; Prediction; Protein-DNA interaction; Residue
Year: 2019 PMID: 31881828 PMCID: PMC6933727 DOI: 10.1186/s12859-019-3295-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Data distribution in the training set (PDNA-543) and the independent testing set (PDNA-TEST)
| Dataset | No. of Sequences | No. of Positive Samples | No. of Negative Samples | Ratio (Negative/Positive) |
|---|---|---|---|---|
| PDNA-543 | 543 | 9,549 | 134,995 | 14.137 |
| PDNA-TEST | 41 | 734 | 14,021 | 19.102 |
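The ratio column is consistent with dividing the number of negative samples by the number of positive samples. The short check below reproduces it from the counts in the table; the helper name `imbalance_ratio` and the dictionary layout are illustrative choices, not part of the original study.

```python
# Sample counts copied from the data distribution table.
datasets = {
    "PDNA-543": {"positive": 9549, "negative": 134995},
    "PDNA-TEST": {"positive": 734, "negative": 14021},
}


def imbalance_ratio(positive: int, negative: int) -> float:
    """Negative-to-positive sample ratio, rounded to three decimals."""
    return round(negative / positive, 3)


# Hypothetical helper applied to both datasets.
ratios = {name: imbalance_ratio(**counts) for name, counts in datasets.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio}")
```

Both datasets are strongly imbalanced, which is why accuracy alone says little here and the later tables also report SN, SP, PR, and MCC.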
Fig. 1 Architecture of the proposed CapsNet model
Fig. 2 Computation between two layers: PrimaryCaps and BindCaps
Fig. 3 Diagram of training and testing the CapsNet models
Fig. 4 10-fold cross-validation process
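The 10-fold cross-validation process named in the caption above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `k_fold_indices`, the shuffling seed, and the choice to split at the level of the 543 training sequences are all made up for the example.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# Assumption: one index per training sequence in PDNA-543.
splits = list(k_fold_indices(543, k=10))
for fold_id, (train, validation) in enumerate(splits, start=1):
    print(f"fold {fold_id}: {len(train)} train / {len(validation)} validation")
```

Each index lands in exactly one validation fold, so every sample is scored once by a model that did not see it during training.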
10-fold cross-validation performances of iProDNA-CapsNet on the training dataset (PDNA-543) under various decision thresholds
| Setting | ACC (%) | SN (%) | SP (%) | PR (%) | MCC | AUC |
|---|---|---|---|---|---|---|
| Threshold = 0.5 | 74.73 | | 74.55 | 17.32 | 0.282 | 0.832 |
| FPR ≈ 5% | | 36.31 | | | 0.301 | 0.832 |
| FPR ≈ 15% | 83.66 | 64.21 | 85.00 | 22.78 | | 0.832 |
| SP ≈ SN | 76.02 | 76.02 | 76.02 | 17.93 | 0.287 | 0.832 |
Values which are significantly higher than the others are in bold
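The column abbreviations are not expanded in this record. Assuming they carry their usual meanings (ACC: accuracy, SN: sensitivity, SP: specificity, PR: precision, MCC: Matthews correlation coefficient), the sketch below computes them from confusion-matrix counts using the standard definitions. The function name `binary_metrics` and the toy counts are hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)  # accuracy
    sn = tp / (tp + fn)                    # sensitivity (recall)
    sp = tn / (tn + fp)                    # specificity
    pr = tp / (tp + fp)                    # precision
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"ACC": acc, "SN": sn, "SP": sp, "PR": pr, "MCC": mcc}


# Toy counts (not data from the study): 10 binding and 90 non-binding residues.
metrics = binary_metrics(tp=8, fp=10, tn=80, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On imbalanced data such as PDNA-543, a modest number of false positives is enough to pull PR far below SN, which matches the pattern seen in the PR column.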
Performances of iProDNA-CapsNet on the test dataset (PDNA-TEST) under various decision thresholds
| Setting | ACC (%) | SN (%) | SP (%) | PR (%) | MCC | AUC |
|---|---|---|---|---|---|---|
| Threshold = 0.5 | 75.72 | 74.79 | 75.77 | 13.59 | 0.245 | 0.833 |
| FPR ≈ 5% | | 42.17 | **94.93** | | | 0.833 |
| FPR ≈ 8% | 91.13 | 45.73 | 93.45 | 26.23 | 0.302 | 0.833 |
| FPR ≈ 15% | 84.05 | 65.38 | 85.00 | 18.17 | 0.285 | 0.833 |
| SP ≈ SN | 75.34 | | 75.34 | 13.47 | 0.245 | 0.833 |
Values which are significantly higher than the others are in bold
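The settings "FPR ≈ 5%" and "FPR ≈ 15%" fix the false positive rate (FPR = 1 − SP) and report the remaining metrics at the resulting decision threshold. One way to pick such a threshold is sketched below. It is an illustration only: the helper names `threshold_for_fpr` and `rate_above`, the strict "score > threshold" rule, and the toy scores are assumptions, and the record does not say how the authors selected their cutoffs.

```python
import math


def threshold_for_fpr(negative_scores, target_fpr):
    """Return a cutoff t such that at most target_fpr of negatives score above t.

    A residue is predicted as binding when its score is strictly greater than t.
    """
    ranked = sorted(negative_scores, reverse=True)
    # Largest number of false positives that the target rate allows.
    allowed = math.floor(target_fpr * len(ranked) + 1e-9)
    if allowed >= len(ranked):
        return float("-inf")  # every negative may be called positive
    return ranked[allowed]


def rate_above(scores, threshold):
    """Fraction of scores strictly above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


# Toy scores (hypothetical, not data from the study).
negatives = [i / 100 for i in range(100)]  # 0.00, 0.01, ..., 0.99
positives = [0.20, 0.60, 0.96, 0.99]

cutoff = threshold_for_fpr(negatives, target_fpr=0.05)
fpr = rate_above(negatives, cutoff)          # false positive rate, 1 - SP
sensitivity = rate_above(positives, cutoff)  # SN at the same cutoff
print(cutoff, fpr, sensitivity)
```

Lowering the target FPR raises the cutoff, which trades sensitivity for specificity, the same trade-off visible between the FPR ≈ 5% and FPR ≈ 15% rows.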
Fig. 5 ROC curves for iProDNA-CapsNet on PDNA-543 (blue dashed line) in 10-fold cross-validation and on PDNA-TEST (orange solid line) in model testing
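The AUC values in the tables summarize ROC curves like these in a single number. The area under a ROC curve equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way; the function name `roc_auc` and the toy scores are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positive_scores) * len(negative_scores))


# Toy scores (not data from the study).
auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])
print(f"AUC = {auc:.3f}")
```

Because it depends only on the ranking of scores, AUC does not change with the decision threshold, which is why a single value is repeated in every row of each table.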
Performance comparison between iProDNA-CapsNet and other state-of-the-art methods
| Method | Setting | ACC (%) | SN (%) | SP (%) | PR (%) | MCC |
|---|---|---|---|---|---|---|
| BindN | Unknown | 79.15 | 45.64 | 80.90 | 11.12 | 0.143 |
| ProteDNA | Unknown | 95.11 | 4.77 | 99.84 | 60.30 | 0.160 |
| MetaDBSite | Unknown | 90.41 | 34.20 | 93.35 | 21.22 | 0.221 |
| DP-Bind | Unknown | 81.40 | 61.72 | 82.43 | 15.53 | 0.241 |
| DNABind | Unknown | 79.78 | 70.16 | 80.28 | 15.70 | 0.264 |
| BindN+ | FPR ≈ 5% | 91.58 | 24.11 | | 20.51 | 0.178 |
| | FPR ≈ 15% | 83.69 | 50.81 | 85.41 | 15.42 | 0.213 |
| TargetDNA | FPR ≈ 5% | 90.89 | | 93.27 | 26.13 | 0.300 |
| | FPR ≈ 15% | | 60.22 | | 18.16 | 0.269 |
| iProDNA-CapsNet | FPR ≈ 5% | | 42.17 | 94.93 | | |
| | FPR ≈ 15% | 84.05 | **65.38** | 85.00 | **18.17** | **0.285** |
For the FPR ≈ 5% and FPR ≈ 15% settings, values that are significantly higher than the others are in bold