Wessam Elhefnawy, Min Li, Jianxin Wang, Yaohang Li.
Abstract
BACKGROUND: Protein fold recognition is one of the most essential problems in structural bioinformatics. In this paper, we design a novel deep learning architecture, called DeepFrag-k, which identifies fold-discriminative features at the fragment level to improve the accuracy of protein fold recognition. DeepFrag-k is composed of two stages: the first stage employs a multi-modal Deep Belief Network (DBN) to predict the potential structural fragments for a given sequence, represented as a fragment vector, and the second stage uses a deep convolutional neural network (CNN) to classify the fragment vector into the corresponding fold.
Keywords: Deep learning; Fold recognition; Protein fragments
Year: 2020 PMID: 33203392 PMCID: PMC7672895 DOI: 10.1186/s12859-020-3504-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Fold recognition architecture. Two-stage protein fold recognition architecture
Fig. 2 Phase I. Multi-modal DBN architecture for fragment prediction
Fig. 3 Phase II. Protein fold classification 1D-CNN model
Hyperparameters for Fold Classification Architecture
| Layer | Layer type | # of Units | Unit Type | Size | Stride |
|---|---|---|---|---|---|
| Input | # of fragments | | | | |
| COV_1 | Convolution | 10 | ReLU | 1,10 | 1,1 |
| MP_1 | Max Pool | | | 1,10 | 1,1 |
| ST | Stacking | | | | |
| COV_2 | Convolution | 100 | ReLU | 10,10 | 1,1 |
| MP_2 | Max Pool | | | 5,5 | 5,5 |
| FC_1 | Fully Connected | 100 | ReLU | | |
| FC_2 | Fully Connected | 100 | ReLU | | |
| Output | SoftMax | # of folds | Logistics | | |
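
The layer stack in the table can be checked with simple shape arithmetic. The sketch below is only one plausible reading, not the authors' implementation: it assumes "same" padding in every convolution and pooling layer (so the output length along an axis is ceil(n / stride) and the kernel size does not change it), and it assumes that the stacking layer arranges the 10 feature maps of COV_1 as rows of a single 2-D map. The function names and the value of 400 fragments are hypothetical; 27 folds matches the DD-dataset table.

```python
import math


def same_len(n, stride):
    """Output length along one axis under 'same' padding: ceil(n / stride)."""
    return math.ceil(n / stride)


def trace_shapes(n_fragments, n_folds):
    """Trace (height, width, channels) through the layers listed in the table.

    Assumptions not stated in the table: 'same' padding everywhere, and the
    stacking layer turns the COV_1 channels into rows of one 2-D map.
    """
    shapes = []
    h, w, c = 1, n_fragments, 1
    shapes.append(("Input", (h, w, c)))

    c = 10                                   # COV_1: 10 filters, size 1x10, stride 1x1
    h, w = same_len(h, 1), same_len(w, 1)
    shapes.append(("COV_1", (h, w, c)))

    h, w = same_len(h, 1), same_len(w, 1)    # MP_1: size 1x10, stride 1x1
    shapes.append(("MP_1", (h, w, c)))

    h, c = h * c, 1                          # ST: channels become rows (assumed)
    shapes.append(("ST", (h, w, c)))

    c = 100                                  # COV_2: 100 filters, size 10x10, stride 1x1
    h, w = same_len(h, 1), same_len(w, 1)
    shapes.append(("COV_2", (h, w, c)))

    h, w = same_len(h, 5), same_len(w, 5)    # MP_2: size 5x5, stride 5x5
    shapes.append(("MP_2", (h, w, c)))

    shapes.append(("FC_1", (100,)))          # two fully connected layers of 100 units
    shapes.append(("FC_2", (100,)))
    shapes.append(("Output", (n_folds,)))    # softmax over the folds
    return shapes


if __name__ == "__main__":
    # 400 fragments is a hypothetical input size chosen for illustration.
    for name, shape in trace_shapes(n_fragments=400, n_folds=27):
        print(f"{name:7s} {shape}")
```

Under these assumptions a 400-element fragment vector reaches MP_2 as a 2 x 80 map with 100 channels before the fully connected layers. With unpadded ("valid") convolutions the 10,10 kernel would reduce the 10 stacked rows to a single row, which the 5,5 pool could no longer cover, which is why padding is assumed here.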
Protein sequence features
| Feature | Type | Dimension |
|---|---|---|
| Sequence Composition | Frequency of Function Group | 10 |
| Sequence Composition | Information Entropy | 2 |
| Sequence Composition | Distribution | 20 |
| Sequence Composition | Transition | 45 |
| Physicochemical properties | Pseudo Amino Acid Composition | 40 |
| Physicochemical properties | Discrete Wavelet Transformation | 42 |
| Evolutionary Information | P-PSSM | 400 |
| Evolutionary Information | PSSM-DC | 400 |
| Evolutionary Information | Bi-Gram PSSM | 400 |
| Evolutionary Information | ED-PSSM | 400 |
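
To make the relative size of the feature groups concrete, the following sketch totals the dimensions listed in the table. It reads the table as three feature groups (sequence composition, physicochemical properties, evolutionary information); the dictionary layout and names are illustrative only, and the grand total is meaningful only if all groups were concatenated into one vector, which the table itself does not state.

```python
# Feature dimensions copied from the table, grouped by feature family.
FEATURE_DIMS = {
    "Sequence Composition": {
        "Frequency of Function Group": 10,
        "Information Entropy": 2,
        "Distribution": 20,
        "Transition": 45,
    },
    "Physicochemical properties": {
        "Pseudo Amino Acid Composition": 40,
        "Discrete Wavelet Transformation": 42,
    },
    "Evolutionary Information": {
        "P-PSSM": 400,
        "PSSM-DC": 400,
        "Bi-Gram PSSM": 400,
        "ED-PSSM": 400,
    },
}


def group_totals(features):
    """Sum the dimensions of every feature type within each group."""
    return {group: sum(types.values()) for group, types in features.items()}


totals = group_totals(FEATURE_DIMS)
grand_total = sum(totals.values())

for group, dim in totals.items():
    print(f"{group}: {dim}")
print(f"All groups: {grand_total}")
```

The evolutionary (PSSM-derived) features account for 1600 of the 1759 listed dimensions, so they dominate the input by size.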
Fig. 4 Accuracy of variable-length Frag-k fragment prediction when different feature groups and their combinations are applied
Fig. 5 Accuracy, specificity, and sensitivity of fragment library models
Fig. 6 Comparison with existing fold recognition methods on the DD-dataset
DeepFrag-k and ProFold fold classification accuracies on the DD-dataset
| # | Fold ID | Fold Name | DeepFrag-k Accuracy (%) | ProFold Accuracy (%) |
|---|---|---|---|---|
| 1 | a.1 | Globin-like | 98.0 | 100.0 |
| 2 | a.3 | Cytochrome c | 95.0 | 100.0 |
| 3 | a.4 | DNA/RNA-binding 3-helical bundle | 85.9 | 60.0 |
| 4 | a.24 | 4-Helical up-and-down bundle | 91.5 | 87.5 |
| 5 | a.26 | 4-Helical cytokines | 98.9 | 88.9 |
| 6 | a.39 | EF hand-like | 90.8 | 77.8 |
| 7 | b.1 | Immunoglobulin-like | 91.1 | 84.1 |
| 8 | b.6 | Cupredoxin-like | 78.7 | 66.7 |
| 9 | b.121 | Nucleoplasmin-like/VP | 91.3 | 92.3 |
| 10 | b.29 | ConA-like lectins/glucanases | 76.7 | 66.7 |
| 11 | b.34 | SH3-like barrel | 78.0 | 50.0 |
| 12 | b.40 | OB-Fold | 80.4 | 68.4 |
| 13 | b.42 | β-Trefoil | 89.0 | 100.0 |
| 14 | b.47 | Trypsin-like serine proteases | 75.0 | 50.0 |
| 15 | b.60 | Lipocalins | 90.5 | 100.0 |
| 16 | c.1 | TIM | 93.8 | 93.8 |
| 17 | c.2 | FAD/NAD(P)-binding domain | 89.7 | 91.7 |
| 18 | c.3 | Flavodoxin-like | 60.2 | 46.2 |
| 19 | c.23 | NAD(P)-binding Rossmann | 90.2 | 85.2 |
| 20 | c.37 | P-loop containing NTH | 79.5 | 50.0 |
| 21 | c.47 | Thioredoxin-fold | 97.5 | 87.5 |
| 22 | c.55 | Ribonuclease H-like motif | 75.3 | 58.3 |
| 23 | c.69 | α/β-Hydrolases | 78.4 | 71.4 |
| 24 | c.93 | Periplasmic binding protein-like | 92.0 | 100.0 |
| 25 | d.15 | β-Grasp | 69.4 | 25.0 |
| 26 | d.58 | Ferredoxin-like | 76.8 | 59.3 |
| 27 | g.3 | Knottins (small inhibitors, toxins, lectins) | 88.2 | 96.3 |
| | | Accuracy | 85.3 | 76.2 |
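
The summary row can be reproduced from the per-fold values. The sketch below copies the 27 rows of the table and computes the unweighted mean over folds, together with a per-fold win/tie/loss count. That the summary row is an unweighted per-fold mean is an inference from the arithmetic, not something the table states; the variable names are illustrative.

```python
# (fold ID, DeepFrag-k accuracy, ProFold accuracy), copied from the table.
ROWS = [
    ("a.1", 98.0, 100.0), ("a.3", 95.0, 100.0), ("a.4", 85.9, 60.0),
    ("a.24", 91.5, 87.5), ("a.26", 98.9, 88.9), ("a.39", 90.8, 77.8),
    ("b.1", 91.1, 84.1), ("b.6", 78.7, 66.7), ("b.121", 91.3, 92.3),
    ("b.29", 76.7, 66.7), ("b.34", 78.0, 50.0), ("b.40", 80.4, 68.4),
    ("b.42", 89.0, 100.0), ("b.47", 75.0, 50.0), ("b.60", 90.5, 100.0),
    ("c.1", 93.8, 93.8), ("c.2", 89.7, 91.7), ("c.3", 60.2, 46.2),
    ("c.23", 90.2, 85.2), ("c.37", 79.5, 50.0), ("c.47", 97.5, 87.5),
    ("c.55", 75.3, 58.3), ("c.69", 78.4, 71.4), ("c.93", 92.0, 100.0),
    ("d.15", 69.4, 25.0), ("d.58", 76.8, 59.3), ("g.3", 88.2, 96.3),
]

# Unweighted mean over folds, rounded to one decimal like the table.
deepfrag_mean = round(sum(d for _, d, _ in ROWS) / len(ROWS), 1)
profold_mean = round(sum(p for _, _, p in ROWS) / len(ROWS), 1)

# Per-fold comparison of the two methods.
wins = sum(1 for _, d, p in ROWS if d > p)
ties = sum(1 for _, d, p in ROWS if d == p)
losses = sum(1 for _, d, p in ROWS if d < p)

print(f"DeepFrag-k mean: {deepfrag_mean}, ProFold mean: {profold_mean}")
print(f"DeepFrag-k higher on {wins} folds, tied on {ties}, lower on {losses}")
```

The means come out to 85.3 and 76.2, matching the summary row. DeepFrag-k is higher on 18 of the 27 folds, tied on one (c.1), and lower on 8, and those 8 include every fold where ProFold reports 100.0.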
Fig. 7 Comparing DeepFrag-k with other fold recognition methods on the TG and EDD datasets
Fig. 8 Class activation map for EDD fold classification in DeepFrag-k