Ruibo Gao, Mengmeng Wang, Jiaoyan Zhou, Yuhang Fu, Meng Liang, Dongliang Guo, Junlan Nie.
Abstract
Over the past decade, the number of proteins in the PDB database has grown steadily, and traditional methods struggle to characterize the function of newly discovered enzymes in chemical reactions. Computational models and protein feature representations for predicting enzymatic function have therefore become more important. Most existing methods for predicting enzymatic function use either protein geometric structure or protein sequence alone. In this paper, enzyme function is predicted from several kinds of biological information, including both sequence information and structure information. First, we extract mutation information from the amino acid sequence using the position-specific scoring matrix and represent structure information by amino acid distances and angles. Then, we represent the extracted sequence and structural features as histograms. We build a network model of three parallel Deep Convolutional Neural Networks (DCNN) that learn the three enzyme features simultaneously for function prediction, and the outputs are fused through two different architectures. Finally, the proposed model was evaluated on a large dataset of 43,843 enzymes from the PDB and achieved 92.34% correct classification when sequence information is considered, an improvement over the previous result.
Keywords: DCNN; amino acid sequence; enzyme function prediction; mutation information
Year: 2019 PMID: 31212665 PMCID: PMC6600291 DOI: 10.3390/ijms20112845
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
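The abstract describes turning structural measurements (amino acid distances and angles) into histograms before they are fed to the networks. A minimal sketch of that idea follows, using only the Python standard library. The function names, bin counts, and value ranges are assumptions made for illustration; the record above does not state the binning the authors used, and the position-specific scoring matrix step is not covered here.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Euclidean distance between every pair of residue positions."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def histogram(values, lo, hi, bins):
    """Count values into equal-width bins over [lo, hi); others are dropped."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v < hi:
            # min() guards against a rounding error pushing the index to `bins`.
            counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


def angle_histogram_2d(phi_psi, bins):
    """2D histogram of (phi, psi) torsion pairs, in degrees within [-180, 180]."""
    grid = [[0] * bins for _ in range(bins)]
    width = 360.0 / bins
    for phi, psi in phi_psi:
        # The modulo wraps +180 degrees onto the same bin as -180 degrees.
        i = int((phi + 180.0) / width) % bins
        j = int((psi + 180.0) / width) % bins
        grid[i][j] += 1
    return grid


# Toy inputs (hypothetical coordinates and angles, not data from the paper).
toy_coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 5.0)]
toy_angles = [(-60.0, -45.0), (-120.0, 130.0)]

distance_hist = histogram(pairwise_distances(toy_coords), 0.0, 10.0, 5)
angle_hist = angle_histogram_2d(toy_angles, 4)
```

Each histogram has a fixed size regardless of chain length, which is what lets proteins of different lengths share one network input shape.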
Prediction accuracy (%) of enzymatic function in different models with two architectures.

| Class | Sample | Architecture 1: AD-CNN | Architecture 1: ADL-DCNN | Architecture 2: AD-CNN | Architecture 2: ADL-DCNN |
|---|---|---|---|---|---|
| EC1 | 16,669 | | | | |
| EC2 | 1,893 | | | | |
| EC3 | 1,757 | | | | |
| EC4 | 3,102 | | | | |
| EC5 | 7,968 | | | | |
| EC6 | 7,968 | | | | |
| Total | 43,843 | 85.97 | 90.01 | 90.83 | 92.34 |
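The Total row can be read as a comparison along two axes: adding the third feature (AD-CNN to ADL-DCNN) and switching the fusion scheme (Architecture 1 to Architecture 2). The short sketch below works through that arithmetic on the four total accuracies only. The dictionary layout and the helper name `relative_error_reduction` are illustrative assumptions, not part of the paper.

```python
# Total accuracies (%) copied from the Total row of the table above.
totals = {
    ("Architecture 1", "AD-CNN"): 85.97,
    ("Architecture 1", "ADL-DCNN"): 90.01,
    ("Architecture 2", "AD-CNN"): 90.83,
    ("Architecture 2", "ADL-DCNN"): 92.34,
}


def relative_error_reduction(baseline_acc, improved_acc):
    """Share of the baseline error (100 - accuracy) that the improved model removes, in %."""
    baseline_err = 100.0 - baseline_acc
    improved_err = 100.0 - improved_acc
    return 100.0 * (baseline_err - improved_err) / baseline_err


for arch in ("Architecture 1", "Architecture 2"):
    base = totals[(arch, "AD-CNN")]
    full = totals[(arch, "ADL-DCNN")]
    gain = round(full - base, 2)
    reduction = round(relative_error_reduction(base, full), 1)
    print(f"{arch}: +{gain} points, {reduction}% of the error removed")
```

Under this reading, ADL-DCNN gains 4.04 points over AD-CNN in Architecture 1 and 1.51 points in Architecture 2.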
Matrices for each fusion scheme and feature maps. Rows are enzyme classes; columns are the classes predicted by Architecture 1 (A1) and Architecture 2 (A2).

| Feature Maps | Class | A1: EC1 | A1: EC2 | A1: EC3 | A1: EC4 | A1: EC5 | A1: EC6 | A2: EC1 | A2: EC2 | A2: EC3 | A2: EC4 | A2: EC5 | A2: EC6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AD-CNN | EC1 | | 0.33 | 0.54 | 0.39 | 1.60 | 5.08 | | 0.03 | 0.15 | 0.18 | 1.30 | 4.02 |
| AD-CNN | EC2 | 10.46 | | 1.79 | 2.30 | 6.63 | 17.09 | 9.95 | | 0.26 | 1.02 | 4.34 | 9.44 |
| AD-CNN | EC3 | 14.57 | 0.84 | | 0.28 | 6.16 | 20.73 | 10.92 | 0.00 | | 0.56 | 3.92 | 12.04 |
| AD-CNN | EC4 | 8.77 | 0.80 | 0.64 | | 4.94 | 9.41 | 5.26 | 0.16 | 0.32 | | 4.94 | 7.34 |
| AD-CNN | EC5 | 5.53 | 0.57 | 0.63 | 0.69 | | 4.97 | 4.15 | 0.00 | 0.06 | 0.19 | | 3.58 |
| AD-CNN | EC6 | 8.06 | 0.44 | 0.44 | 1.08 | 2.57 | | 4.81 | 0.16 | 0.20 | 0.28 | 1.76 | |
| ADL-DCNN | EC1 | | 0.21 | 0.36 | 0.45 | 1.12 | 4.69 | | 0.00 | 0.09 | 0.03 | 0.73 | 3.30 |
| ADL-DCNN | EC2 | 7.65 | | 2.81 | 4.85 | 1.60 | 8.67 | 9.95 | | 0.77 | 0.77 | 4.08 | 10.20 |
| ADL-DCNN | EC3 | 7.84 | 0.28 | | 1.12 | 1.96 | 14.85 | 10.08 | 0.00 | | 0.00 | 2.52 | 13.00 |
| ADL-DCNN | EC4 | 6.86 | 1.28 | 0.48 | | 3.03 | 7.18 | 4.63 | 0.16 | 0.16 | | 3.83 | 7.97 |
| ADL-DCNN | EC5 | 3.71 | 0.53 | 0.19 | 0.50 | | 2.83 | 3.40 | 0.00 | 0.00 | 0.06 | | 2.70 |
| ADL-DCNN | EC6 | 6.75 | 0.40 | 0.64 | 0.60 | 1.68 | | 4.53 | 0.04 | 0.08 | 0.12 | 0.84 | |
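A class-by-prediction matrix of the kind tabulated above is typically built by counting (true class, predicted class) pairs and then normalising each row. The sketch below shows one way to do that in plain Python. It assumes rows are true classes and that each row is scaled to sum to 100%, which the record does not state explicitly. The function name and the toy labels are hypothetical and are not results from the paper.

```python
def confusion_matrix_percent(true_labels, predicted_labels, classes):
    """Row-normalised confusion matrix: entry [i][j] is the percentage of
    samples of true class i that were predicted as class j."""
    index = {c: i for i, c in enumerate(classes)}
    n = len(classes)
    counts = [[0] * n for _ in range(n)]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A class with no samples yields a row of zeros rather than a division error.
        matrix.append([100.0 * c / total if total else 0.0 for c in row])
    return matrix


# Toy labels for three classes (illustrative only).
classes = ["EC1", "EC2", "EC3"]
y_true = ["EC1", "EC1", "EC1", "EC1", "EC2", "EC2", "EC3", "EC3"]
y_pred = ["EC1", "EC1", "EC1", "EC2", "EC2", "EC1", "EC3", "EC3"]

cm = confusion_matrix_percent(y_true, y_pred, classes)
for name, row in zip(classes, cm):
    print(name, row)
```

With this normalisation, the diagonal entry of a row is the per-class accuracy (recall), and the off-diagonal entries show which classes absorb the errors.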
Figure 1. Change in accuracy of the two models. The two red curves show the training and test accuracy of the AD-CNN model as the number of iterations varies. Likewise, the two blue curves show the training and test accuracy of the model proposed in this paper as the number of iterations varies.
Figure 2. Change in error with the number of iterations.
Figure 3. ROC curves for each enzymatic class for Architecture 2.
Figure 4. Feature maps of peptide chain mutation information.
Figure 5. Feature maps of torsion angles.
Figure 6. Feature maps of pairwise amino acid distances.
Figure 7. Framework of the DCNN.