| Literature DB >> 29069344 |
Yu Li, Sheng Wang, Ramzan Umarov, Bingqing Xie, Ming Fan, Lihua Li, Xin Gao.
Abstract
Motivation: Annotation of enzyme function has a broad range of applications, such as metagenomics, industrial biotechnology, and diagnosis of enzyme deficiency-caused diseases. However, the time and resources required make it prohibitively expensive to experimentally determine the function of every enzyme. Therefore, computational enzyme function prediction has become increasingly important. In this paper, we develop such an approach, determining the enzyme function by predicting the Enzyme Commission number.
Entities:
Mesh:
Substances:
Year: 2018 PMID: 29069344 PMCID: PMC6030869 DOI: 10.1093/bioinformatics/btx680
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Dataset summary
| Dataset | KNN | NEW | COFACTOR |
|---|---|---|---|
| Source | Self-constructed | | |
| Enzymes | 9832 | 22,168 | 284 |
| Non-enzymes | 9850 | 22,168 | — |
Note: The KNN dataset and NEW dataset are used for cross-fold validation. The COFACTOR dataset is used for cross-dataset validation.
Fig. 1.(A) Strategy for predicting detailed function. Following the structure of the EC number system, we use this level-by-level classification approach. (B) Overview of the model. We use the CNN component to extract convolutional features and the RNN component to extract sequential features from each sequence-length-dependent raw feature encoding, followed by a fully connected component, which concatenates all extracted features together, serving as the classifier. Here we show the procedure of predicting the main class digit of three enzymes with different sequence lengths
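The level-by-level strategy in Fig. 1(A) can be sketched in plain Python. The function names and the stub classifiers below are hypothetical stand-ins; in the paper, each level is a trained CNN+RNN model, and Level 2 is conditioned on the Level 1 prediction.

```python
# Illustrative sketch of the level-by-level EC number prediction strategy.
# The three predict_level* functions are placeholder stubs, NOT the real models.

def predict_level0(seq):
    """Stub for Level 0: is the input sequence an enzyme at all?"""
    return "E" in seq  # placeholder rule for illustration only

def predict_level1(seq):
    """Stub for Level 1: the enzyme's main EC class (digits 1-6)."""
    return 1  # placeholder

def predict_level2(seq, main_class):
    """Stub for Level 2: the subclass, conditioned on the predicted main class."""
    return 1  # placeholder

def predict_ec_prefix(seq):
    """Chain the per-level classifiers, mirroring the EC number hierarchy."""
    if not predict_level0(seq):
        return None                      # non-enzyme: no EC number assigned
    main = predict_level1(seq)           # main class
    sub = predict_level2(seq, main)      # subclass given the main class
    return f"EC {main}.{sub}.-.-"
```

The key design point mirrored here is that each level's classifier only has to discriminate among the children of the previously predicted node, rather than among all EC numbers at once.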
Fig. 2.Cross-fold validation results. (A) Performance comparison of Level 0 prediction (predicting whether the input is an enzyme or not) on the KNN dataset. (B) Performance comparison of Level 1 prediction (predicting the input enzyme’s main class) on the KNN dataset. (C) Performance comparison of Level 2 prediction (predicting the input enzyme’s subclass given the main class) on the KNN dataset. (D) Performance comparison of Level 0 prediction on the NEW dataset. (E) Performance comparison of Level 1 prediction on the NEW dataset. (F) Performance comparison of Level 2 prediction on the NEW dataset
Fig. 3.(A) Feature contribution investigation considering sequence one-hot encoding (sequence), PSSM and FunD. (B) The performance change of the model before and after adding more local feature encodings. Macro-precision, Macro-recall and Macro-F1 score are improved by at least 11% by adding solvent accessibility and secondary structure information
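The sequence one-hot encoding examined in Fig. 3(A) is the simplest of these feature encodings: each residue becomes a binary 20-dimensional row. A minimal sketch, assuming the standard 20-letter amino-acid alphabet in alphabetical order (the actual ordering and handling of non-standard residues in the paper may differ):

```python
# One-hot encoding of a protein sequence: an L x 20 binary matrix,
# one row per residue, with a single 1 marking that residue's identity.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet ordering

def one_hot_encode(seq):
    idx = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    return [[1 if j == idx[aa] else 0 for j in range(20)]
            for aa in seq]
```

Note that the matrix height varies with sequence length, which is why the model in Fig. 1(B) treats this as a sequence-length-dependent raw feature encoding.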
Fig. 4.The performance comparison of different servers on predicting the main class of the COFACTOR dataset. DEEPre improves the prediction accuracy over the other servers by at least 6%