Riccardo Concu, M. Natália D. S. Cordeiro.
Abstract
The Enzyme Commission (EC) number is a numerical classification scheme for enzymes, based on the chemical reactions they catalyze. This classification follows the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology. Six enzyme classes were recognised in the first Enzyme Classification and Nomenclature List, reported by the International Union of Biochemistry in 1961. However, a new enzyme group was recently added because the six existing EC classes could not describe enzymes involved in the movement of ions or molecules across membranes. Such enzymes are now classified in the new EC class of translocases (EC 7). Several computational methods have been developed to predict the EC number. However, because of this change, all such methods are now outdated and need updating. In this work, we developed a new multi-task quantitative structure-activity relationship (QSAR) method aimed at predicting all seven EC classes and their subclasses. In so doing, we developed an alignment-free model based on artificial neural networks that proved to be very successful.
Keywords: QSAR; alignment-free; artificial neural network; enzyme; enzyme classification; machine learning
Year: 2019 PMID: 31671806 PMCID: PMC6862210 DOI: 10.3390/ijms20215389
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Accuracy for the linear discriminant analysis (LDA) model.
| Class | Training: All (%) | Training: −1 = Sn | Training: 1 = Sp | Validation: All (%) | Validation: −1 = Sn | Validation: 1 = Sp | Overall: All (%) | Overall: −1 = Sn | Overall: 1 = Sp |
|---|---|---|---|---|---|---|---|---|---|
| −1 | 98.13 | 40,781 | 778 | 98.27 | 13,613 | 240 | 98.16 | 54,394 | 1018 |
| 1 | 99.7 | 57 | 19,498 | 99.71 | 19 | 6498 | 99.71 | 76 | 25,996 |
| Total | 98.63 | 40,838 | 20,276 | 98.73 | 13,632 | 6738 | 98.66 | 54,470 | 27,014 |
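The percentages in the table above can be recomputed from the counts. The following is a minimal sketch, assuming each row is an observed class and each count column a predicted class, and using the table's own labelling (Sn for class −1, Sp for class 1). The function and argument names are illustrative and do not come from the source.

```python
def rates(neg_as_neg, neg_as_pos, pos_as_neg, pos_as_pos):
    """Return (Sn, Sp, Ac) in percent, using the table's labelling.

    Sn: share of observed class -1 cases predicted as -1.
    Sp: share of observed class 1 cases predicted as 1.
    Ac: share of all cases predicted correctly.
    """
    sn = 100.0 * neg_as_neg / (neg_as_neg + neg_as_pos)
    sp = 100.0 * pos_as_pos / (pos_as_neg + pos_as_pos)
    total = neg_as_neg + neg_as_pos + pos_as_neg + pos_as_pos
    ac = 100.0 * (neg_as_neg + pos_as_pos) / total
    return sn, sp, ac


# Counts from the "Overall" columns of the LDA accuracy table.
sn, sp, ac = rates(54394, 1018, 76, 25996)
print(f"Sn = {sn:.2f}%, Sp = {sp:.2f}%, Ac = {ac:.2f}%")
```

Run on the overall counts, this gives values that round to the 98.16, 99.71 and 98.66 reported in the table.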
Relevant statistics for the LDA model.
| Eigenvalue | Canonical R | Wilks' Lambda | Chi-Sqr. | df | p-level | MCC |
|---|---|---|---|---|---|---|
| 1.241879 | 0.744275 | 0.446054 | 49334.99 | 4 | 0.00 | 0.97 |
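The statistics in this table are linked by standard relations for a two-group discriminant analysis with a single canonical function. The sketch below checks them under stated assumptions: four predictor variables (suggested by df = 4), a training set of 61,114 cases (40,838 + 20,276 from the accuracy table), and Bartlett's chi-square approximation. The source does not state which formulas or which counts were used, so this is a consistency check rather than the authors' procedure.

```python
import math

eigenvalue = 1.241879      # from the statistics table
n_train = 61114            # assumption: 40,838 + 20,276 training cases
n_vars, n_groups = 4, 2    # assumption: four predictors, two classes

# With one canonical function, Wilks' lambda and the canonical
# correlation both follow directly from the eigenvalue.
wilks_lambda = 1.0 / (1.0 + eigenvalue)
canonical_r = math.sqrt(eigenvalue / (1.0 + eigenvalue))

# Bartlett's chi-square approximation (assumed, not stated in the source).
chi_square = -(n_train - 1 - (n_vars + n_groups) / 2.0) * math.log(wilks_lambda)
df = n_vars * (n_groups - 1)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom


# Overall counts from the LDA accuracy table, with class 1 as positive.
mcc_overall = mcc(tp=25996, tn=54394, fp=1018, fn=76)

print(wilks_lambda, canonical_r, chi_square, df, mcc_overall)
```

Under these assumptions the computed values agree with the reported Wilks' lambda, canonical R, chi-square, df and MCC to the precision shown in the table.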
Performance of the best multi-layer perceptron (MLP) model found.
| Set | Obs. Sets a | Stat. Param. a | Pred. Stat. a | Predicted 1 | Predicted −1 | nj |
|---|---|---|---|---|---|---|
| Training | 1 | Sp a | 100 | 17,500 | 0 | 57,039 |
| | −1 | Sn a | 100 | 0 | 39,539 | 0 |
| | total | Ac a | 100 | 17,500 | 39,539 | 57,039 |
| Validation | 1 | Sp a | 100 | 8572 | 0 | 24,445 |
| | −1 | Sn a | 100 | 0 | 15,873 | 0 |
| | total | Ac a | 100 | 8572 | 15,873 | 24,445 |
| Overall | 1 | Sp a | 100 | 26,072 | 0 | 81,484 |
| | −1 | Sn a | 100 | 0 | 55,412 | 0 |
| | total | Ac a | 100 | 26,072 | 55,412 | 81,484 |
a Obs. Sets = Observed sets, Stat. Param. = Statistical parameter, Pred. Stat. = Predicted statistics, Sp = Specificity, Sn = Sensitivity, Ac = Accuracy.
Résumé of the 10 best MLP and radial basis function (RBF) models.
| Model | Statistic | Training: −1 = Sn | Training: 1 = Sp | Training: All | Validation: −1 = Sn | Validation: 1 = Sp | Validation: All | Overall: −1 = Sn | Overall: 1 = Sp | Overall: All |
|---|---|---|---|---|---|---|---|---|---|---|
| BEST | Total | 55,412 | 26,072 | 81,484 | 55,412 | 26,072 | 81,484 | 55,412 | 26,072 | 81,484 |
| | Correct | 55,412 | 26,072 | 81,484 | 55,412 | 26,072 | 81,484 | 55,412 | 26,072 | 81,484 |
| | Incorrect | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | Correct (%) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| | Incorrect (%) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 1. MLP 4-7-2 | Total | 39,448 | 17,591 | 57,039 | 15,873 | 8572 | 24,445 | 55,412 | 26,072 | 81,484 |
| | Correct | 39,448 | 17,567 | 57,015 | 15,873 | 8562 | 24,435 | 55,412 | 26,034 | 81,446 |
| | Incorrect | 0 | 24 | 24 | 0 | 10 | 10 | 0 | 38 | 38 |
| | Correct (%) | 100 | 99.86 | 99.96 | 100.00 | 99.88 | 99.96 | 100.00 | 99.85 | 99.95 |
| | Incorrect (%) | 0 | 0.14 | 0.04 | 0.00 | 0.12 | 0.04 | 0.00 | 0.15 | 0.05 |
| 2. MLP 4-8-2 | Total | 39,448 | 17,591 | 57,039 | 15,873 | 8572 | 24,445 | 55,412 | 26,072 | 81,484 |
| | Correct | 39,448 | 17,565 | 57,013 | 15,873 | 8563 | 24,436 | 55,412 | 26,037 | 81,449 |
| | Incorrect | 0 | 26 | 26 | 0 | 9 | 9 | 0 | 35 | 35 |
| | Correct (%) | 100 | 99.85 | 99.95 | 100.00 | 99.90 | 99.96 | 100.00 | 99.87 | 99.96 |
| | Incorrect (%) | 0 | 0.15 | 0.05 | 0.00 | 0.10 | 0.04 | 0.00 | 0.13 | 0.04 |
| 3. MLP 4-10-2 | Total | 39,448 | 17,591 | 57,039 | 15,873 | 8572 | 24,445 | 55,412 | 26,072 | 81,484 |
| | Correct | 39,448 | 17,565 | 57,013 | 15,873 | 8563 | 24,436 | 55,412 | 26,037 | 81,449 |
| | Incorrect | 0 | 26 | 26 | 0 | 9 | 9 | 0 | 35 | 35 |
| | Correct (%) | 100 | 99.85 | 99.95 | 100.00 | 99.90 | 99.96 | 100.00 | 99.87 | 99.96 |
| | Incorrect (%) | 0 | 0.15 | 0.05 | 0.00 | 0.10 | 0.04 | 0.00 | 0.13 | 0.04 |
| 4. MLP 4-11-2 | Total | 39,448 | 17,591 | 57,039 | 15,873 | 8572 | 24,445 | 55,412 | 26,072 | 81,484 |
| | Correct | 39,448 | 17,566 | 57,014 | 15,873 | 8563 | 24,436 | 55,412 | 26,037 | 81,449 |
| | Incorrect | 0 | 25 | 25 | 0 | 9 | 9 | 0 | 35 | 35 |
| | Correct (%) | 100 | 99.86 | 99.96 | 100.00 | 99.90 | 99.96 | 100.00 | 99.87 | 99.96 |
| | Incorrect (%) | 0 | 0.14 | 0.04 | 0.00 | 0.10 | 0.04 | 0.00 | 0.13 | 0.04 |
| 5. MLP 4-16-2 | Total | 39,448 | 17,591 | 57,039 | 15,873 | 8572 | 24,445 | 55,321 | 26,163 | 81,484 |
| | Correct | 39,448 | 17,567 | 57,015 | 15,873 | 8572 | 24,445 | 55,321 | 26,139 | 81,460 |
| | Incorrect | 0 | 24 | 24 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Correct (%) | 100 | 99.86 | 99.96 | 100.00 | 100.00 | 100.00 | 100.00 | 99.91 | 99.97 |
| | Incorrect (%) | 0 | 0.14 | 0.04 | 0.00 | 0.00 | 0.00 | 0.00 | 0.09 | 0.03 |
| 6. RBF 4-21-2 | Total | 39,539 | 17,500 | 57,039 | 15,873 | 8572 | 24,445 | 55,412 | 26,072 | 81,484 |
| | Correct | 39,520 | 16,426 | 55,946 | 15,855 | 8059 | 23,914 | 55,375 | 24,485 | 79,860 |
| | Incorrect | 19 | 1074 | 1093 | 18 | 513 | 531 | 37 | 1587 | 1624 |
| | Correct (%) | 99.95 | 93.86 | 98.08 | 99.89 | 94.02 | 97.83 | 99.93 | 93.91 | 98.01 |
| | Incorrect (%) | 0.05 | 6.14 | 1.92 | 0.11 | 5.98 | 2.17 | 0.07 | 6.09 | 1.99 |
| 7. RBF 4-29-2 | Total | 39,539 | 17,500 | 57,039 | 15,873 | 8572 | 24,445 | 55,412 | 26,072 | 81,484 |
| | Correct | 39,165 | 17,475 | 56,640 | 15,714 | 8561 | 24,275 | 54,879 | 26,036 | 80,915 |
| | Incorrect | 374 | 25 | 399 | 159 | 11 | 170 | 533 | 36 | 569 |
| | Correct (%) | 99.05 | 99.86 | 99.3 | 99.00 | 99.87 | 99.30 | 99.04 | 99.86 | 99.30 |
| | Incorrect (%) | 0.95 | 0.14 | 0.7 | 1.00 | 0.13 | 0.70 | 0.96 | 0.14 | 0.70 |
| 8. RBF 4-21-2 | Total | 39,539 | 17,500 | 57,039 | 15,873 | 8572 | 24,445 | 55,412 | 26,072 | 81,484 |
| | Correct | 39,526 | 16,138 | 55,664 | 15,868 | 7873 | 23,741 | 55,394 | 24,011 | 79,405 |
| | Incorrect | 13 | 1362 | 1375 | 5 | 699 | 704 | 18 | 2061 | 2079 |
| | Correct (%) | 99.97 | 92.22 | 97.59 | 99.97 | 91.85 | 97.12 | 99.97 | 92.09 | 97.45 |
| | Incorrect (%) | 0.03 | 7.78 | 2.41 | 0.03 | 8.15 | 2.88 | 0.03 | 7.91 | 2.55 |
| 9. RBF 4-28-2 | Total | 39,539 | 17,500 | 57,039 | 15,197 | 8571 | 23,768 | 53,008 | 26,060 | 81,484 |
| | Correct | 39,489 | 16,000 | 23,489 | 15,197 | 8448 | 23,645 | 53,008 | 25,674 | 78,682 |
| | Incorrect | 50 | 1500 | 1450 | 0 | 123 | 123 | 0 | 386 | 386 |
| | Correct (%) | 99.87 | 91.43 | 95.65 | 100.00 | 98.56 | 99.48 | 100.00 | 98.52 | 99.51 |
| | Incorrect (%) | 0.03 | 7.78 | 4.35 | 0.00 | 1.44 | 0.52 | 0.00 | 1.48 | 0.49 |
| 10. RBF 4-26-2 | Total | 39,539 | 17,500 | 57,039 | 15,873 | 8572 | 24,445 | 55,412 | 26,072 | 81,484 |
| | Correct | 11,880 | 6629 | 18,509 | 4748 | 3170 | 7918 | 16,628 | 9799 | 26,427 |
| | Incorrect | 27,659 | 10,871 | 38,530 | 11,125 | 5402 | 16,527 | 38,784 | 16,273 | 55,057 |
| | Correct (%) | 30.05 | 37.88 | 32.45 | 29.91 | 36.98 | 32.39 | 30.01 | 37.58 | 32.43 |
| | Incorrect (%) | 69.95 | 62.12 | 67.55 | 70.09 | 63.02 | 67.61 | 69.99 | 62.42 | 67.57 |
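Model labels such as "MLP 4-7-2" are read here as inputs-hidden-outputs, that is, 4 input variables, 7 hidden neurons and 2 output neurons. That reading is an assumption about the labelling convention, and the excerpt gives no details on activations or training. The sketch below builds an untrained network of that shape with hypothetical tanh hidden units and a softmax output, only to illustrate the topology and its parameter count.

```python
import math
import random


def make_mlp(n_in, n_hidden, n_out, seed=0):
    """Create random weights for an n_in-n_hidden-n_out perceptron."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
    b2 = [0.0] * n_out
    return w1, b1, w2, b2


def forward(x, params):
    """One forward pass: tanh hidden layer, softmax output (both assumed)."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w2, b2)]
    peak = max(logits)                       # subtract max for stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def n_parameters(n_in, n_hidden, n_out):
    """Weights plus biases of a single-hidden-layer perceptron."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out


params = make_mlp(4, 7, 2)
probs = forward([0.1, -0.3, 0.5, 0.2], params)   # made-up input vector
print(probs, n_parameters(4, 7, 2))
```

Under this reading, a 4-7-2 network has 51 adjustable parameters, and the two outputs correspond to the two classes (1 and −1) in the tables.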
Quantitative analysis of the non-optimal MLP models.
| Model | Class | Fail | Total Class |
|---|---|---|---|
| 1. MLP 4-7-2 | 6.4 | 1 | 104 |
| | 6.5 | 34 | 36 |
| 2. MLP 4-8-2 | 1.6 | 3 | 4 |
| | 6.4 | 1 | 104 |
| | 6.5 | 34 | 36 |
| 3. MLP 4-10-2 | 1.6 | 3 | 4 |
| | 6.4 | 1 | 104 |
| | 6.5 | 33 | 36 |
| 4. MLP 4-11-2 | 1.6 | 3 | 4 |
| | 6.4 | 1 | 104 |
| | 6.5 | 32 | 36 |
| 5. MLP 4-16-2 | 6.4 | 1 | 104 |
| | 6.5 | 33 | 36 |
Sensitivity analysis for the artificial neural network (ANN) model.
| Input Variable | Variable Sensitivity | Variable Name/Details |
|---|---|---|
| <Tr5(srn)> | 15,896,991 | Expected value of Trace of order 5 of the srn for the sequence |
| D Tr5(srn) | 1,288,626 | Deviation of Trace of order 5 of the srn with respect to the mean value of the class |
| <Tr3(srn)> | 591,331.9 | Expected value of Trace of order 3 of the srn for the sequence |
| D Tr3(srn) | 108.7591 | Deviation of Trace of order 3 of the srn with respect to the mean value of the class |
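The input variables above are traces of order 3 and 5 and their deviations from a class mean. A minimal sketch follows, assuming that "trace of order k" means the trace of the k-th power of a square matrix associated with the sequence's srn, and that the deviation is the value minus the mean over the class. Neither definition is given in this excerpt, and the small matrix used here is a toy example, not data from the paper.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace_of_order(matrix, order):
    """Trace of matrix**order (assumed meaning of 'trace of order k')."""
    n = len(matrix)
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(order):
        power = mat_mul(power, matrix)
    return sum(power[i][i] for i in range(n))


def deviation_from_class_mean(value, class_values):
    """Value minus the mean of the same descriptor over a class (assumed)."""
    return value - sum(class_values) / len(class_values)


# Toy adjacency matrix of a three-node network in which every pair is linked.
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]
tr3 = trace_of_order(triangle, 3)
tr5 = trace_of_order(triangle, 5)
dev = deviation_from_class_mean(tr5, [30, 10, 20])   # made-up class values
print(tr3, tr5, dev)
```

For an adjacency matrix, the trace of the k-th power counts closed walks of length k, which is one reason such traces are used as alignment-free descriptors of a network.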
Number of entries for each subclass.
| EC Subclass | Number of Sequences | EC Subclass | Number of Sequences | EC Subclass | Number of Sequences |
|---|---|---|---|---|---|
| 1.1 | 555 | 2.3 | 722 | 4.6 | 120 |
| 1.2 | 250 | 2.4 | 424 | 4.99 | 95 |
| 1.3 | 172 | 2.5 | 291 | 5.1 | 176 |
| 1.4 | 108 | 2.6 | 19 | 5.2 | 74 |
| 1.5 | 5 | 2.7 | 3112 | 5.3 | 247 |
| 1.6 | 4 | 2.8 | 71 | 5.4 | 160 |
| 1.7 | 91 | 2.9 | 10 | 5.5 | 115 |
| 1.8 | 165 | 3.1 | 1559 | 5.6 | 159 |
| 1.9 | 73 | 3.11 | 7 | 5.99 | 3 |
| 1.10 | 555 | 3.13 | 3 | 6.1 | 277 |
| 1.11 | 136 | 3.2 | 700 | 6.2 | 38 |
| 1.12 | 32 | 3.3 | 164 | 6.3 | 291 |
| 1.13 | 123 | 3.4 | 1481 | 6.4 | 104 |
| 1.14 | 244 | 3.5 | 561 | 6.5 | 36 |
| 1.15 | 162 | 3.6 | 417 | 7.1 | 8827 |
| 1.16 | 173 | 3.7 | 69 | 7.2 | 927 |
| 1.17 | 121 | 3.8 | 77 | 7.4 | 189 |
| 1.18 | 45 | 3.9 | 3 | 7.5 | 187 |
| 1.20 | 250 | 4.1 | 486 | 7.6 | 197 |
| 1.21 | 28 | 4.2 | 460 | | |
| 1.23 | 3 | 4.3 | 97 | | |
| 2.1 | 522 | 4.4 | 39 | | |
| 2.2 | 107 | 4.5 | 25 | | |