Zhenzhen Zou¹, Shuye Tian², Xin Gao¹, Yu Li¹.
Abstract
As a great challenge in bioinformatics, enzyme function prediction is a significant step toward designing novel enzymes and diagnosing enzyme-related diseases. Existing studies mainly focus on mono-functional enzyme function prediction. However, the number of multi-functional enzymes is growing rapidly, which calls for new computational methods. In this paper, following our previous work, DEEPre, which uses deep learning to annotate the functions of mono-functional enzymes, we propose a novel method, mlDEEPre, designed specifically for predicting the functionalities of multi-functional enzymes. By adopting a novel loss function that accounts for the relationships between different labels, together with a self-adapted label assigning threshold, mlDEEPre can accurately and efficiently perform multi-functional enzyme prediction. Extensive experiments also show that mlDEEPre can outperform the other methods both in predicting whether an enzyme is mono-functional or multi-functional (mono-functional vs. multi-functional) and in main class prediction, across different criteria. Furthermore, owing to the flexibility of mlDEEPre and DEEPre, mlDEEPre can be incorporated into DEEPre seamlessly, which enables the updated DEEPre to handle both mono-functional and multi-functional predictions without human intervention.
Keywords: EC number; deep learning; function prediction; hierarchical classification; multi-functional enzyme; multi-label learning
Year: 2019 PMID: 30723495 PMCID: PMC6349967 DOI: 10.3389/fgene.2018.00714
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. The hierarchical classification strategy combining DEEPre and mlDEEPre. When we input a protein sequence to the system, we first use DEEPre level 0 to predict whether the sequence is an enzyme. If it is, we use mlDEEPre level 1 to predict whether the enzyme is mono-functional or multi-functional. If it is mono-functional, DEEPre takes over. If not, we use mlDEEPre to predict the main classes of that multi-functional enzyme. By feeding these main classes together with the sequence into DEEPre, we obtain the full annotation for each function of the enzyme.
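The routing in Figure 1 can be sketched as plain control flow. This is a minimal sketch, not the authors' implementation: the four callables (`is_enzyme`, `is_multi_functional`, `predict_main_classes`, `predict_full_ec`) and the `Annotation` record are hypothetical stand-ins for the trained DEEPre and mlDEEPre models, whose programmatic interfaces are not described here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Annotation:
    is_enzyme: bool
    multi_functional: bool = False
    ec_numbers: List[str] = field(default_factory=list)


def annotate(
    seq: str,
    is_enzyme: Callable[[str], bool],                      # hypothetical stand-in for DEEPre level 0
    is_multi_functional: Callable[[str], bool],            # hypothetical stand-in for mlDEEPre level 1
    predict_main_classes: Callable[[str], List[int]],      # hypothetical stand-in for mlDEEPre main classes
    predict_full_ec: Callable[[str, Optional[int]], str],  # hypothetical stand-in for DEEPre (optional main class)
) -> Annotation:
    """Route one sequence through the hierarchy described in Figure 1."""
    if not is_enzyme(seq):
        return Annotation(is_enzyme=False)
    if not is_multi_functional(seq):
        # Mono-functional: DEEPre handles the rest of the annotation on its own
        return Annotation(is_enzyme=True, ec_numbers=[predict_full_ec(seq, None)])
    # Multi-functional: one DEEPre call per main class predicted by mlDEEPre
    ecs = [predict_full_ec(seq, c) for c in predict_main_classes(seq)]
    return Annotation(is_enzyme=True, multi_functional=True, ec_numbers=ecs)


# Toy usage with dummy predictors (no real models involved)
result = annotate(
    "MKTAYIAKQR",
    is_enzyme=lambda s: True,
    is_multi_functional=lambda s: True,
    predict_main_classes=lambda s: [2, 3],
    predict_full_ec=lambda s, c: f"{c}.-.-.-",
)
print(result.ec_numbers)  # ['2.-.-.-', '3.-.-.-']
```

Keeping each model behind a callable mirrors the caption's point that mlDEEPre slots into DEEPre without human intervention: only the multi-functional branch issues one DEEPre call per predicted main class.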
Dataset I: 22,168 single-labeled enzymes.
| Name | Oxidoreductase | Transferase | Hydrolase | Lyase | Isomerase | Ligase |
| Number | 3343 | 8517 | 5917 | 1532 | 1193 | 1666 |
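As a quick sanity check on Dataset I, the six class counts sum to the stated 22,168 enzymes, and the classes are noticeably imbalanced. The snippet below only re-derives numbers from the table; the variable names are illustrative.

```python
# Class counts copied from the Dataset I table above
dataset_i = {
    "Oxidoreductase": 3343,
    "Transferase": 8517,
    "Hydrolase": 5917,
    "Lyase": 1532,
    "Isomerase": 1193,
    "Ligase": 1666,
}

total = sum(dataset_i.values())
print(f"total: {total}")  # total: 22168

# Share of each main class in the dataset
for name, n in dataset_i.items():
    print(f"{name:15s} {n:5d} {n / total:6.1%}")

# Ratio between the largest and the smallest class
ratio = max(dataset_i.values()) / min(dataset_i.values())
print(f"largest/smallest class: {ratio:.2f}")
```

Transferases make up about 38% of Dataset I and isomerases about 5%, roughly a 7:1 ratio between the largest and smallest classes.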
Dataset II: 1,085 multi-labeled enzymes after redundancy removal with CD-HIT at a 65% sequence similarity cut-off.
| Name | Oxidoreductase | Transferase | Hydrolase | Lyase | Isomerase | Ligase | Total enzymes |
| Before redundancy | 1534 | 1924 | 2657 | 1698 | 616 | 179 | 4076 |
| After CD-HIT | 386 | 503 | 689 | 473 | 137 | 52 | 1085 |
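Because each multi-labeled enzyme is counted once under every main class it belongs to, the per-class counts in Dataset II add up to more than the enzyme totals. The sketch below, which simply reuses the table's numbers, makes that explicit and computes how much of the data CD-HIT filtering retained.

```python
# Per-class label counts and enzyme totals copied from the Dataset II table above
classes = ["Oxidoreductase", "Transferase", "Hydrolase", "Lyase", "Isomerase", "Ligase"]
before = [1534, 1924, 2657, 1698, 616, 179]
after = [386, 503, 689, 473, 137, 52]
enzymes_before, enzymes_after = 4076, 1085

# Labels per enzyme: > 1 because every enzyme here has at least two main classes
for tag, counts, n_enz in (("before", before, enzymes_before), ("after", after, enzymes_after)):
    n_labels = sum(counts)
    print(f"{tag:6s}: {n_labels} labels / {n_enz} enzymes = {n_labels / n_enz:.2f} per enzyme")

# Fraction of enzymes (and of each class's labels) kept after CD-HIT
print(f"enzymes retained: {enzymes_after / enzymes_before:.1%}")
for name, b, a in zip(classes, before, after):
    print(f"{name:15s} {a / b:6.1%} retained")
```

Before and after filtering, an enzyme carries slightly more than two main classes on average (about 2.11 and 2.06), and the 65% cut-off keeps roughly 27% of the enzymes.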
Figure 2. The deep learning model architecture. We use a convolutional neural network component to handle sequence-length-dependent features, such as the PSSM and the one-hot encoding of the sequence, and a fully connected neural network component to handle the functional domain encoding. We then concatenate the outputs of these components into one vector, which is fed to a fully connected classifier. Finally, we apply a threshold function to the output of the model to obtain the labels of the input sequence.
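The caption says a threshold function turns the model's outputs into labels, and the abstract describes the threshold as self-adapted, but this excerpt does not give the rule. The sketch below assumes one sigmoid score per EC main class and swaps in a common multi-label strategy: a per-class threshold chosen on validation data to maximize F1. The grid `GRID`, the functions `tune_threshold` and `assign_labels`, and the `min_labels` fallback are all hypothetical, not mlDEEPre's published procedure.

```python
from typing import List, Sequence

# Candidate thresholds 0.05, 0.10, ..., 0.95 (an arbitrary grid chosen for illustration)
GRID = tuple(i / 20 for i in range(1, 20))


def tune_threshold(scores: Sequence[float], truth: Sequence[int],
                   candidates: Sequence[float] = GRID) -> float:
    """Return the candidate threshold that maximizes F1 for one class on validation data."""
    best_t, best_f1 = 0.5, -1.0
    for t in candidates:
        tp = sum(1 for s, y in zip(scores, truth) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, truth) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, truth) if s < t and y == 1)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:  # strict '>' keeps the first (on GRID, lowest) threshold among ties
            best_t, best_f1 = t, f1
    return best_t


def assign_labels(probs: Sequence[float], thresholds: Sequence[float],
                  min_labels: int = 1) -> List[int]:
    """Map per-class scores to 1-based EC main classes using per-class thresholds."""
    chosen = {i for i, (p, t) in enumerate(zip(probs, thresholds)) if p >= t}
    # Fallback (assumption): top up with the highest-scoring classes
    # until at least `min_labels` classes are assigned.
    for i in sorted(range(len(probs)), key=lambda k: probs[k], reverse=True):
        if len(chosen) >= min_labels:
            break
        chosen.add(i)
    return sorted(i + 1 for i in chosen)


probs = [0.91, 0.40, 0.08, 0.30, 0.05, 0.02]  # hypothetical sigmoid outputs for EC 1-6
print(assign_labels(probs, [0.5] * 6))                # [1]
print(assign_labels(probs, [0.5] * 6, min_labels=2))  # [1, 2]
```

Setting `min_labels=2` would reflect that enzymes reaching this step have already been classified as multi-functional (Figure 1), although whether mlDEEPre enforces such a minimum is not stated here.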
Figure 3. Testing performance of different models on mono-functional vs. multi-functional enzyme classification. Values lower than 0.6 are not shown in the figure.
The multi-functional classification performance of mlDEEPre on Dataset II is shown in Table 3.
| 3.3 ± 0.4% | 82.6 ± 2.7% | 96.7 ± 0.3% | 96.4 ± 0.6% |
| 96.5 ± 0.5% | 96.7 ± 0.4% | 95.1 ± 1.6% | 96.2 ± 0.8% |
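The column headers of Table 3 are missing from this excerpt, so the exact criteria behind these numbers are unknown. For orientation, the sketch below computes several common example-based multi-label criteria on sets of EC main classes, plus a formatter for `mean ± std` strings. Both the choice of metrics and the assumption that `±` denotes a sample standard deviation over repeated runs or folds are ours, not the paper's.

```python
import statistics
from typing import Dict, List, Set


def multilabel_metrics(y_true: List[Set[int]], y_pred: List[Set[int]],
                       n_classes: int = 6) -> Dict[str, float]:
    """Example-based multi-label criteria averaged over enzymes."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    return {
        # fraction of enzymes whose predicted class set matches the true set exactly
        "subset_accuracy": sum(t == p for t, p in pairs) / n,
        # fraction of (enzyme, class) decisions that are wrong
        "hamming_loss": sum(len(t ^ p) for t, p in pairs) / (n * n_classes),
        "precision": sum((len(t & p) / len(p)) if p else 0.0 for t, p in pairs) / n,
        "recall": sum((len(t & p) / len(t)) if t else 0.0 for t, p in pairs) / n,
        "f1": sum((2 * len(t & p) / (len(t) + len(p))) if (t or p) else 1.0
                  for t, p in pairs) / n,
    }


def mean_std_percent(values: List[float]) -> str:
    """Format repeated-run scores as 'mean ± sample std' in percent (assumed convention)."""
    return f"{100 * statistics.mean(values):.1f} ± {100 * statistics.stdev(values):.1f}%"


y_true = [{1, 2}, {2, 3}]
y_pred = [{1, 2}, {2}]
print(multilabel_metrics(y_true, y_pred))
print(mean_std_percent([0.96, 0.97, 0.965]))
```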
Figure 4. Testing performance of different models on multi-functional classification. Values lower than 0.65 are not shown in the figure.
Dataset II before redundancy removal: 4,076 multi-labeled enzymes.
| EC main classes | 1, 2 | 1, 3 | 1, 4 | 1, 5 | 2, 3 | 2, 4 | 2, 5 | 2, 6 | 3, 4 | 3, 5 | 3, 6 | 4, 5 | 4, 6 | 1, 2, 4 | 1, 3, 6 | 1, 4, 5 | 1, 2, 3, 4 |
| Number | 147 | 841 | 63 | 37 | 1148 | 235 | 38 | 131 | 622 | 22 | 4 | 308 | 34 | 215 | 10 | 211 | 10 |
This table shows the number of multi-functional enzymes in the dataset with different EC main class combinations.
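Reading the table as one column per combination of EC main classes (for example, `1, 2, 4` for an enzyme that is simultaneously an oxidoreductase, a transferase and a lyase) can be checked against the Dataset II table: expanding each combination into its individual classes should reproduce the "Before redundancy" counts. The sketch below does exactly that; the dictionary `combos` is just the table transcribed by hand.

```python
from collections import Counter

# Combination counts from the table above (EC main classes -> number of enzymes)
combos = {
    (1, 2): 147, (1, 3): 841, (1, 4): 63, (1, 5): 37,
    (2, 3): 1148, (2, 4): 235, (2, 5): 38, (2, 6): 131,
    (3, 4): 622, (3, 5): 22, (3, 6): 4,
    (4, 5): 308, (4, 6): 34,
    (1, 2, 4): 215, (1, 3, 6): 10, (1, 4, 5): 211, (1, 2, 3, 4): 10,
}

# Expand every combination into per-class label counts
per_class = Counter()
for ec_classes, n in combos.items():
    for c in ec_classes:
        per_class[c] += n

print(sum(combos.values()))             # 4076
print(dict(sorted(per_class.items())))  # {1: 1534, 2: 1924, 3: 2657, 4: 1698, 5: 616, 6: 179}
```

The expansion yields 1534, 1924, 2657, 1698, 616 and 179 labels for EC 1 to 6 and 4,076 enzymes in total, matching the "Before redundancy" row exactly.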