Serene A K Ong, Hong Huang Lin, Yu Zong Chen, Ze Rong Li, Zhiwei Cao.
Abstract
BACKGROUND: Sequence-derived structural and physicochemical descriptors have frequently been used in machine learning prediction of protein functional families. There is therefore a need to compare the effectiveness of these descriptor-sets using the same method and parameter optimization algorithm, and to examine whether combining descriptor-sets helps to improve predictive performance. Six individual descriptor-sets and four combination-sets were evaluated in support vector machine (SVM) prediction of six protein functional families.
Year: 2007 PMID: 17705863 PMCID: PMC1997217 DOI: 10.1186/1471-2105-8-300
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Protein descriptors commonly used for predicting protein functional families.
| Set | Descriptor-set | No. of descriptors | No. of components | Descriptor class | Physicochemical properties | Refs |
| D1 | Amino acid composition | 1 | 20 | Sequence composition | | [23] |
| D2 | Dipeptide composition | 1 | 400 | Sequence composition | | [24] |
| D3 | Normalized Moreau-Broto autocorrelation | 8 | 240 | Correlation of physicochemical properties | Hydrophobicity scale, average flexibility index, polarizability parameter, free energy of amino acid solution in water, residue accessible surface area, amino acid residue volume, steric parameters, relative mutability | [25, 26] |
| D4 | Moran autocorrelation | 8 | 240 | Correlation of physicochemical properties | Hydrophobicity scale, average flexibility index, polarizability parameter, free energy of amino acid solution in water, residue accessible surface area, amino acid residue volume, steric parameters, relative mutability | [27] |
| D5 | Geary autocorrelation | 8 | 240 | Square correlation of physicochemical properties | Hydrophobicity scale, average flexibility index, polarizability parameter, free energy of amino acid solution in water, residue accessible surface area, amino acid residue volume, steric parameters, relative mutability | [28] |
| D6 | Descriptors of composition, transition and distribution | 21 | 147 | Distribution and variation of physicochemical properties | Hydrophobicity, Van der Waals volume, polarity, polarizability, charge, secondary structures, solvent accessibility | [2-6, 8, 17, 29, 30] |
| D7 | Quasi sequence order | 4 | 160 | Combination of sequence composition and correlation of physicochemical properties | Hydrophobicity, hydrophilicity, polarity, side-chain volume | [10, 11, 18, 31] |
| D8 | Pseudo amino acid composition | 3 | 298 | Combination of sequence composition and square correlation of physicochemical properties | Hydrophobicity, hydrophilicity, side chain mass | [23, 32] |
| D9 | Combination of amino acid and dipeptide composition | 2 | 420 | Combination of sequence compositions | | |
| D10 | Combination of all eight sets of descriptors | 54 | 1745 | Combination of all sets | | |
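The two sequence-composition sets in the table (D1 with 20 components, D2 with 400) and their combination D9 (420 components) can be illustrated with a minimal sketch. It assumes the usual definitions, namely the fraction of each residue type and the fraction of each ordered residue pair; the normalization actually used by the authors is not given in this record, and the function names below are hypothetical.

```python
from collections import Counter
from itertools import product

# Assumption: the 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def amino_acid_composition(seq):
    """D1-style vector: fraction of each standard residue (20 components)."""
    counts = Counter(r for r in seq.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """D2-style vector: fraction of each adjacent residue pair (400 components)."""
    seq = seq.upper()
    # Only pairs in which both residues are standard are counted.
    counts = Counter(
        a + b
        for a, b in zip(seq, seq[1:])
        if a in AMINO_ACIDS and b in AMINO_ACIDS
    )
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def combined_composition(seq):
    """D9-style vector: D1 and D2 concatenated (20 + 400 = 420 components)."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


if __name__ == "__main__":
    example = "MKTAYIAKQR"  # arbitrary illustrative sequence, not from the datasets
    print(len(amino_acid_composition(example)),
          len(dipeptide_composition(example)),
          len(combined_composition(example)))
```

Concatenation is the simplest way to form a combination-set; any per-set scaling applied before SVM training would be an additional step not shown here.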
Summary of datasets statistics, including size of training, testing and independent evaluation sets, and average sequence length.
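| Protein family | Total P | Total N | Training P | Training N | Testing P | Testing N | Independent evaluation P | Independent evaluation N | Average sequence length |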
| EC2.4 | 3304 | 14373 | 1382 | 5068 | 1022 | 5859 | 900 | 3446 | 460 |
| GPCR | 2819 | 21515 | 1580 | 7389 | 717 | 7333 | 522 | 6793 | 498 |
| TC8.A | 229 | 23096 | 94 | 7962 | 72 | 7962 | 63 | 7172 | 483 |
| Chlorophyll | 999 | 22997 | 356 | 7928 | 333 | 7928 | 310 | 7141 | 480 |
| Lipid | 2192 | 11537 | 850 | 5779 | 707 | 4483 | 635 | 1275 | 312 |
| rRNA | 5855 | 13770 | 2004 | 5246 | 1940 | 4953 | 1911 | 3571 | 376 |
Dataset training statistics and prediction accuracies of six protein functional families. DS refers to descriptor set, where D1 = amino acid composition; D2 = dipeptide composition; D3 = Moreau-Broto autocorrelation; D4 = Moran autocorrelation; D5 = Geary autocorrelation; D6 = composition, transition and distribution descriptors; D7 = quasi sequence order; D8 = pseudo amino acid composition; D9 = combination of D1+D2; and D10 = combination of D1-D8. Predicted results given as TP (true positive), FN (false negative), TN (true negative), FP (false positive), Sen (sensitivity), Spec (specificity), Q (overall accuracy) and MCC (Matthews correlation coefficient).
| Protein family | DS | Training P | Training N | Testing TP | Testing FN | Testing TN | Testing FP | Independent TP | Independent FN | Sen(%) | Independent TN | Independent FP | Spec(%) | Q(%) | MCC |
| EC2.4 | D1 | 1249 | 2120 | 1154 | 1 | 9065 | 12 | 724 | 176 | 80.4 | 3244 | 202 | 94.1 | 91.3 | 0.74 |
| D2 | 1319 | 2120 | 1080 | 5 | 8806 | 1 | 746 | 154 | 82.9 | 3349 | 97 | 97.2 | 94.1 | 0.80 | |
| D3 | 1105 | 1756 | 1295 | 4 | 9166 | 5 | 768 | 132 | 85.3 | 3394 | 52 | 98.5 | 95.8 | 0.87 | |
| D4 | 1239 | 2221 | 1161 | 4 | 8701 | 5 | 756 | 144 | 84.0 | 3365 | 81 | 97.7 | 94.8 | 0.84 | |
| D5 | 1242 | 2223 | 1160 | 2 | 8690 | 14 | 753 | 147 | 83.6 | 3391 | 55 | 98.4 | 95.4 | 0.85 | |
| D6 | 1214 | 2077 | 1145 | 45 | 8846 | 4 | 741 | 159 | 82.3 | 3383 | 63 | 98.2 | 94.9 | 0.84 | |
| D7 | 1293 | 2624 | 1072 | 39 | 8295 | 8 | 696 | 204 | 77.3 | 3270 | 176 | 94.9 | 91.3 | 0.73 | |
| D8 | 1226 | 3008 | 1177 | 1 | 7918 | 1 | 794 | 106 | 88.2 | 3387 | 59 | 98.3 | 96.2 | 0.88 | |
| D9 | 1275 | 2747 | 1129 | 0 | 8177 | 3 | 782 | 118 | 86.9 | 3367 | 79 | 97.7 | 95.5 | 0.86 | |
| D10 | 1228 | 3254 | 1176 | 0 | 7672 | 1 | 798 | 102 | 88.7 | 3397 | 49 | 98.6 | 96.5 | 0.89 | |
| GPCR | D1 | 1590 | 7458 | 1847 | 1 | 14166 | 3 | 505 | 17 | 96.7 | 6735 | 58 | 99.1 | 99.0 | 0.93 |
| D2 | 564 | 711 | 1728 | 3 | 14121 | 5 | 510 | 12 | 97.7 | 6737 | 56 | 99.2 | 99.1 | 0.93 | |
| D3 | 1169 | 4628 | 1122 | 4 | 10208 | 1 | 507 | 15 | 97.1 | 6737 | 56 | 99.2 | 99.0 | 0.93 | |
| D4 | 1257 | 4474 | 1037 | 1 | 10363 | 0 | 499 | 23 | 95.6 | 6745 | 48 | 99.3 | 99.0 | 0.93 | |
| D5 | 1290 | 4724 | 997 | 8 | 10113 | 0 | 494 | 28 | 94.6 | 6734 | 59 | 99.1 | 98.8 | 0.91 | |
| D6 | 757 | 2060 | 1536 | 2 | 12777 | 0 | 503 | 19 | 96.3 | 6742 | 51 | 99.2 | 99.0 | 0.93 | |
| D7 | 812 | 2950 | 1482 | 1 | 11887 | 0 | 495 | 27 | 94.8 | 6696 | 97 | 98.6 | 98.3 | 0.88 | |
| D8 | 653 | 2171 | 1644 | 0 | 12550 | 1 | 501 | 21 | 96.0 | 6769 | 24 | 99.7 | 99.4 | 0.95 | |
| D9 | 1590 | 7458 | 693 | 12 | 7322 | 57 | 512 | 10 | 98.1 | 6735 | 58 | 99.1 | 99.1 | 0.93 | |
| D10 | 672 | 2454 | 1625 | 0 | 12268 | 0 | 502 | 20 | 96.2 | 6757 | 36 | 99.5 | 99.2 | 0.94 | |
| TC8.A | D1 | 118 | 2858 | 49 | 0 | 13121 | 0 | 36 | 27 | 57.1 | 1843 | 2 | 99.9 | 98.5 | 0.73 |
| D2 | 116 | 1100 | 50 | 0 | 14824 | 0 | 41 | 22 | 65.1 | 1843 | 2 | 99.9 | 98.7 | 0.78 | |
| D3 | 94 | 7962 | 53 | 0 | 14501 | 0 | 42 | 21 | 66.7 | 1842 | 3 | 99.8 | 98.7 | 0.78 | |
| D4 | 94 | 7962 | 47 | 0 | 11250 | 0 | 37 | 26 | 58.7 | 1843 | 2 | 99.9 | 98.5 | 0.74 | |
| D5 | 94 | 7962 | 47 | 0 | 11137 | 0 | 37 | 26 | 58.7 | 1843 | 2 | 99.9 | 98.5 | 0.74 | |
| D6 | 94 | 7962 | 64 | 0 | 15283 | 0 | 44 | 19 | 69.8 | 1843 | 2 | 99.9 | 98.9 | 0.81 | |
| D7 | 94 | 7962 | 59 | 0 | 15045 | 0 | 43 | 20 | 68.3 | 1843 | 2 | 99.9 | 98.9 | 0.80 | |
| D8 | 103 | 943 | 63 | 0 | 14981 | 0 | 48 | 15 | 76.2 | 1843 | 2 | 99.9 | 99.1 | 0.85 | |
| D9 | 114 | 810 | 52 | 0 | 15114 | 0 | 41 | 22 | 65.1 | 1843 | 2 | 99.9 | 98.7 | 0.78 | |
| D10 | 102 | 1068 | 64 | 0 | 14856 | 0 | 48 | 15 | 76.2 | 1843 | 2 | 99.9 | 99.1 | 0.85 | |
| Chlorophyll | D1 | 356 | 7928 | 166 | 0 | 14297 | 0 | 182 | 128 | 58.7 | 1587 | 11 | 99.3 | 92.7 | 0.71 |
| D2 | 440 | 934 | 248 | 1 | 7927 | 1 | 228 | 82 | 73.6 | 1595 | 3 | 99.8 | 95.6 | 0.83 | |
| D3 | 425 | 603 | 264 | 0 | 15253 | 0 | 246 | 64 | 79.4 | 1594 | 4 | 99.8 | 96.4 | 0.86 | |
| D4 | 415 | 574 | 273 | 1 | 15282 | 0 | 247 | 63 | 79.7 | 1597 | 1 | 99.9 | 96.6 | 0.87 | |
| D5 | 429 | 615 | 259 | 1 | 15240 | 1 | 233 | 77 | 75.2 | 1597 | 1 | 99.9 | 95.9 | 0.84 | |
| D6 | 482 | 946 | 202 | 5 | 14910 | 0 | 205 | 105 | 66.1 | 1597 | 1 | 99.9 | 94.4 | 0.79 | |
| D7 | 394 | 3337 | 210 | 85 | 12517 | 2 | 178 | 132 | 57.4 | 1597 | 1 | 99.9 | 93.0 | 0.73 | |
| D8 | 371 | 1421 | 317 | 1 | 14435 | 0 | 255 | 55 | 82.3 | 1593 | 5 | 99.7 | 96.9 | 0.88 | |
| D9 | 399 | 1273 | 289 | 1 | 14582 | 1 | 249 | 61 | 80.3 | 1591 | 7 | 99.6 | 96.4 | 0.86 | |
| D10 | 381 | 1753 | 307 | 1 | 14102 | 1 | 251 | 59 | 81.0 | 1594 | 4 | 99.8 | 96.7 | 0.88 | |
| Lipid synthesis | D1 | 849 | 2026 | 705 | 3 | 8229 | 7 | 470 | 165 | 74.0 | 1218 | 57 | 95.5 | 88.4 | 0.73 |
| D2 | 927 | 2037 | 629 | 1 | 8225 | 0 | 512 | 123 | 80.6 | 1259 | 16 | 98.6 | 92.7 | 0.84 | |
| D3 | 898 | 2968 | 659 | 0 | 7294 | 0 | 509 | 126 | 80.2 | 1271 | 4 | 99.7 | 93.2 | 0.84 | |
| D4 | 968 | 3227 | 588 | 1 | 7035 | 0 | 493 | 142 | 77.6 | 1273 | 2 | 99.8 | 92.5 | 0.83 | |
| D5 | 970 | 3280 | 586 | 1 | 6982 | 0 | 491 | 144 | 77.3 | 1260 | 15 | 98.8 | 91.7 | 0.81 | |
| D6 | 874 | 2112 | 681 | 2 | 8149 | 1 | 525 | 110 | 82.7 | 1268 | 7 | 99.5 | 93.9 | 0.86 | |
| D7 | 863 | 2415 | 692 | 2 | 7845 | 2 | 512 | 123 | 80.6 | 1271 | 4 | 99.7 | 93.4 | 0.85 | |
| D8 | 907 | 1608 | 615 | 0 | 4488 | 0 | 498 | 137 | 78.4 | 1268 | 7 | 99.5 | 92.5 | 0.83 | |
| D9 | 815 | 1613 | 740 | 2 | 8638 | 11 | 525 | 110 | 82.7 | 1248 | 27 | 97.9 | 92.8 | 0.84 | |
| D10 | 865 | 1640 | 657 | 0 | 4456 | 0 | 531 | 104 | 83.6 | 1268 | 7 | 99.5 | 94.2 | 0.87 | |
| rRNA binding | D1 | 548 | 579 | 3390 | 6 | 9598 | 22 | 1824 | 87 | 95.5 | 3511 | 60 | 98.3 | 97.3 | 0.94 |
| D2 | 1133 | 1225 | 2811 | 0 | 8974 | 0 | 1844 | 67 | 96.5 | 3519 | 52 | 98.5 | 97.8 | 0.95 | |
| D3 | 1126 | 1638 | 2816 | 2 | 8560 | 1 | 1812 | 99 | 94.8 | 3535 | 36 | 99.0 | 97.5 | 0.95 | |
| D4 | 1337 | 1958 | 2697 | 0 | 8241 | 0 | 1783 | 128 | 93.3 | 3484 | 87 | 97.6 | 96.1 | 0.91 | |
| D5 | 1372 | 1976 | 2572 | 0 | 8223 | 0 | 1784 | 127 | 93.4 | 3479 | 92 | 97.4 | 96.0 | 0.91 | |
| D6 | 921 | 1208 | 2971 | 52 | 8991 | 0 | 1824 | 87 | 95.5 | 3541 | 30 | 99.2 | 97.9 | 0.95 | |
| D7 | 878 | 2743 | 3040 | 26 | 7442 | 14 | 1808 | 103 | 94.6 | 3481 | 90 | 97.5 | 96.5 | 0.92 | |
| D8 | 810 | 2245 | 3143 | 0 | 7954 | 0 | 1849 | 62 | 96.8 | 3541 | 30 | 99.2 | 98.3 | 0.96 | |
| D9 | 810 | 972 | 3075 | 3 | 9182 | 2 | 1848 | 63 | 96.7 | 3526 | 45 | 98.7 | 98.0 | 0.96 | |
| D10 | 900 | 2600 | 3044 | 0 | 7599 | 0 | 1858 | 53 | 97.2 | 3547 | 24 | 99.3 | 98.6 | 0.97 | |
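The Sen, Spec, Q and MCC columns can be recomputed from the TP, FN, TN and FP counts of the independent evaluation set. The sketch below assumes the standard definitions of these measures (the formulas themselves are not spelled out in this record) and uses a hypothetical helper name, `binary_metrics`. It is checked against the EC2.4 / D1 row.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return (Sen %, Spec %, Q %, MCC) from confusion-matrix counts.

    Assumes the standard definitions:
      Sen  = TP / (TP + FN)
      Spec = TN / (TN + FP)
      Q    = (TP + TN) / (TP + FN + TN + FP)
      MCC  = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    """
    sen = 100.0 * tp / (tp + fn)
    spec = 100.0 * tn / (tn + fp)
    q = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a marginal total is zero; 0.0 is a common convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sen, spec, q, mcc


if __name__ == "__main__":
    # EC2.4, descriptor set D1, independent evaluation set (values from the table).
    sen, spec, q, mcc = binary_metrics(tp=724, fn=176, tn=3244, fp=202)
    print(f"Sen={sen:.1f} Spec={spec:.1f} Q={q:.1f} MCC={mcc:.2f}")
    # prints: Sen=80.4 Spec=94.1 Q=91.3 MCC=0.74
```

Because negatives far outnumber positives in every family, Q stays high even when sensitivity is low (for example TC8.A), which is why MCC is the more informative column for comparing descriptor sets.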
Dataset statistics and prediction accuracies after homologous sequences removal (HSR) at 90% and 70% identity. DS refers to descriptor set, where D1 = amino acid composition; D2 = dipeptide composition; D3 = Moreau-Broto autocorrelation; D4 = Moran autocorrelation; D5 = Geary autocorrelation; D6 = composition, transition and distribution descriptors; D7 = quasi sequence order; D8 = pseudo amino acid composition; D9 = combination of D1+D2; and D10 = combination of D1-D8. Predicted results given as TP (true positive), FN (false negative), TN (true negative), FP (false positive), Sen (sensitivity), Spec (specificity), Q (overall accuracy) and MCC (Matthews correlation coefficient).
| Protein family | HSR (%) | DS | TP | FN | Sen(%) | TN | FP | Spec(%) | Q(%) | MCC |
| EC2.4 | 90 | D1 | 552 | 250 | 68.8 | 3235 | 201 | 94.2 | 89.4 | 0.65 |
| D2 | 626 | 176 | 78.1 | 3339 | 97 | 97.2 | 93.6 | 0.78 | ||
| D3 | 609 | 193 | 75.9 | 3384 | 52 | 98.5 | 94.2 | 0.80 | ||
| D4 | 603 | 199 | 75.2 | 3355 | 81 | 97.6 | 93.4 | 0.78 | ||
| D5 | 591 | 211 | 73.7 | 3381 | 55 | 98.4 | 93.7 | 0.79 | ||
| D6 | 501 | 301 | 62.5 | 3374 | 62 | 98.2 | 91.4 | 0.70 | ||
| D7 | 545 | 257 | 68.0 | 3261 | 175 | 94.9 | 89.8 | 0.66 | ||
| D8 | 666 | 136 | 83.0 | 3375 | 61 | 98.2 | 95.4 | 0.84 | ||
| D9 | 630 | 172 | 78.6 | 3357 | 79 | 97.7 | 94.1 | 0.80 | ||
| D10 | 670 | 132 | 83.5 | 3388 | 48 | 98.6 | 95.8 | 0.86 | ||
| 70 | D1 | 459 | 223 | 67.3 | 3193 | 199 | 94.1 | 89.6 | 0.62 | |
| D2 | 516 | 166 | 75.7 | 3296 | 96 | 97.2 | 93.6 | 0.76 | ||
| D3 | 503 | 179 | 73.8 | 3341 | 51 | 98.5 | 94.4 | 0.78 | ||
| D4 | 495 | 187 | 72.6 | 3311 | 81 | 97.6 | 93.4 | 0.75 | ||
| D5 | 484 | 198 | 71.0 | 3339 | 53 | 98.4 | 93.8 | 0.77 | ||
| D6 | 399 | 283 | 58.5 | 3330 | 62 | 98.2 | 91.5 | 0.67 | ||
| D7 | 452 | 230 | 66.3 | 3218 | 174 | 94.9 | 90.1 | 0.63 | ||
| D8 | 551 | 131 | 80.8 | 3331 | 61 | 98.2 | 95.3 | 0.83 | ||
| D9 | 520 | 162 | 76.3 | 3314 | 78 | 97.7 | 94.1 | 0.78 | ||
| D10 | 554 | 128 | 81.2 | 3344 | 48 | 98.6 | 95.7 | 0.84 | ||
| GPCR | 90 | D1 | 391 | 13 | 96.8 | 6724 | 58 | 99.1 | 99.0 | 0.91 |
| D2 | 395 | 9 | 97.8 | 6744 | 38 | 99.4 | 99.4 | 0.94 | ||
| D3 | 393 | 11 | 97.3 | 6726 | 56 | 99.2 | 99.1 | 0.92 | ||
| D4 | 386 | 18 | 95.5 | 6734 | 48 | 99.3 | 99.1 | 0.92 | ||
| D5 | 381 | 23 | 94.3 | 6723 | 59 | 99.1 | 98.9 | 0.90 | ||
| D6 | 391 | 13 | 96.8 | 6731 | 51 | 99.3 | 99.1 | 0.92 | ||
| D7 | 382 | 22 | 94.6 | 6685 | 97 | 98.6 | 98.3 | 0.86 | ||
| D8 | 387 | 17 | 95.8 | 6758 | 24 | 99.7 | 99.4 | 0.95 | ||
| D9 | 391 | 13 | 96.8 | 6752 | 30 | 99.6 | 99.4 | 0.94 | ||
| D10 | 388 | 16 | 96.0 | 6762 | 20 | 99.7 | 99.5 | 0.95 | ||
| 70 | D1 | 307 | 8 | 97.5 | 6695 | 58 | 99.1 | 99.1 | 0.90 | |
| D2 | 309 | 6 | 98.1 | 6715 | 38 | 99.4 | 99.4 | 0.93 | ||
| D3 | 306 | 9 | 97.1 | 6697 | 56 | 99.2 | 99.1 | 0.90 | ||
| D4 | 301 | 14 | 95.6 | 6705 | 48 | 99.3 | 99.1 | 0.90 | ||
| D5 | 298 | 17 | 94.6 | 6694 | 59 | 99.1 | 98.9 | 0.88 | ||
| D6 | 307 | 8 | 97.5 | 6702 | 51 | 99.2 | 99.2 | 0.91 | ||
| D7 | 296 | 19 | 94.0 | 6656 | 97 | 98.6 | 98.4 | 0.83 | ||
| D8 | 301 | 14 | 95.6 | 6729 | 24 | 99.6 | 99.5 | 0.94 | ||
| D9 | 307 | 8 | 97.5 | 6723 | 30 | 99.6 | 99.5 | 0.94 | ||
| D10 | 302 | 13 | 95.9 | 6733 | 20 | 99.7 | 99.5 | 0.95 | ||
| TC8.A | 90 | D1 | 28 | 27 | 50.9 | 1846 | 2 | 99.9 | 98.5 | 0.68 |
| D2 | 33 | 22 | 60.0 | 1846 | 2 | 99.9 | 98.7 | 0.75 | ||
| D3 | 34 | 21 | 61.8 | 1845 | 3 | 99.8 | 98.7 | 0.75 | ||
| D4 | 29 | 26 | 52.7 | 1845 | 3 | 99.8 | 98.8 | 0.75 | ||
| D5 | 29 | 26 | 52.7 | 1845 | 3 | 99.8 | 98.8 | 0.75 | ||
| D6 | 36 | 19 | 65.5 | 1846 | 2 | 99.9 | 98.9 | 0.78 | ||
| D7 | 35 | 20 | 63.6 | 1845 | 3 | 99.8 | 98.8 | 0.76 | ||
| D8 | 40 | 15 | 72.7 | 1845 | 3 | 99.8 | 99.2 | 0.82 | ||
| D9 | 33 | 22 | 60.0 | 1846 | 2 | 99.9 | 98.7 | 0.75 | ||
| D10 | 40 | 15 | 72.7 | 1845 | 3 | 99.8 | 99.2 | 0.82 | ||
| 70 | D1 | 25 | 24 | 51.0 | 1828 | 2 | 99.9 | 98.6 | 0.68 | |
| D2 | 29 | 20 | 59.2 | 1828 | 2 | 99.9 | 98.8 | 0.74 | ||
| D3 | 29 | 20 | 59.2 | 1827 | 3 | 99.8 | 98.8 | 0.73 | ||
| D4 | 26 | 23 | 53.1 | 1828 | 2 | 99.9 | 98.7 | 0.70 | ||
| D5 | 26 | 23 | 53.1 | 1828 | 2 | 99.9 | 98.7 | 0.70 | ||
| D6 | 33 | 16 | 67.3 | 1828 | 2 | 99.9 | 99.0 | 0.79 | ||
| D7 | 30 | 19 | 61.2 | 1827 | 3 | 99.8 | 98.8 | 0.74 | ||
| D8 | 36 | 13 | 73.5 | 1827 | 3 | 99.8 | 99.2 | 0.82 | ||
| D9 | 29 | 20 | 59.2 | 1828 | 2 | 99.9 | 98.8 | 0.74 | ||
| D10 | 36 | 13 | 73.5 | 1827 | 3 | 99.8 | 99.2 | 0.82 | ||
| Chlorophyll | 90 | D1 | 159 | 127 | 55.6 | 1594 | 8 | 99.5 | 92.9 | 0.70 |
| D2 | 205 | 81 | 71.7 | 1598 | 4 | 99.8 | 95.5 | 0.82 | ||
| D3 | 224 | 62 | 78.3 | 1599 | 3 | 99.8 | 96.6 | 0.86 | ||
| D4 | 222 | 64 | 77.6 | 1599 | 3 | 99.8 | 96.5 | 0.86 | ||
| D5 | 211 | 75 | 73.8 | 1598 | 4 | 99.8 | 95.8 | 0.83 | ||
| D6 | 182 | 104 | 63.6 | 1594 | 8 | 99.5 | 94.1 | 0.75 | ||
| D7 | 159 | 127 | 55.6 | 1595 | 9 | 99.4 | 92.8 | 0.69 | ||
| D8 | 233 | 53 | 81.5 | 1595 | 7 | 99.6 | 96.8 | 0.87 | ||
| D9 | 224 | 62 | 78.3 | 1594 | 8 | 99.5 | 96.3 | 0.85 | ||
| D10 | 229 | 57 | 80.1 | 1597 | 5 | 99.7 | 96.7 | 0.87 | ||
| 70 | D1 | 113 | 118 | 48.9 | 1578 | 8 | 99.5 | 93.1 | 0.65 | |
| D2 | 155 | 76 | 67.1 | 1582 | 4 | 99.8 | 95.6 | 0.79 | ||
| D3 | 171 | 60 | 74.0 | 1583 | 3 | 99.8 | 96.5 | 0.84 | ||
| D4 | 171 | 60 | 74.0 | 1583 | 3 | 99.8 | 96.5 | 0.84 | ||
| D5 | 161 | 70 | 69.7 | 1582 | 4 | 99.8 | 95.9 | 0.81 | ||
| D6 | 137 | 94 | 59.3 | 1578 | 8 | 99.5 | 94.4 | 0.72 | ||
| D7 | 114 | 117 | 49.4 | 1575 | 11 | 99.3 | 93.0 | 0.64 | ||
| D8 | 182 | 49 | 78.8 | 1579 | 7 | 99.6 | 96.9 | 0.85 | ||
| D9 | 172 | 59 | 74.5 | 1578 | 8 | 99.5 | 96.3 | 0.82 | ||
| D10 | 178 | 53 | 77.1 | 1581 | 5 | 99.7 | 96.8 | 0.85 | ||
| Lipid synthesis | 90 | D1 | 403 | 149 | 73.0 | 1213 | 59 | 95.4 | 88.6 | 0.72 |
| D2 | 431 | 121 | 78.1 | 1256 | 16 | 98.7 | 92.5 | 0.81 | ||
| D3 | 436 | 116 | 79.0 | 1268 | 4 | 99.7 | 93.4 | 0.84 | ||
| D4 | 421 | 131 | 76.3 | 1270 | 2 | 99.8 | 92.7 | 0.83 | ||
| D5 | 416 | 136 | 75.4 | 1270 | 2 | 99.8 | 92.4 | 0.82 | ||
| D6 | 449 | 103 | 81.3 | 1270 | 2 | 99.8 | 94.2 | 0.86 | ||
| D7 | 435 | 117 | 78.8 | 1269 | 3 | 99.8 | 93.4 | 0.84 | ||
| D8 | 423 | 129 | 76.6 | 1265 | 7 | 99.5 | 92.5 | 0.82 | ||
| D9 | 449 | 103 | 81.3 | 1245 | 27 | 97.9 | 92.9 | 0.83 | ||
| D10 | 454 | 98 | 82.3 | 1265 | 7 | 99.5 | 94.2 | 0.86 | ||
| 70 | D1 | 316 | 138 | 69.6 | 1205 | 59 | 95.3 | 88.5 | 0.69 | |
| D2 | 343 | 111 | 75.6 | 1248 | 16 | 98.7 | 92.6 | 0.81 | ||
| D3 | 340 | 114 | 74.9 | 1260 | 4 | 99.7 | 93.1 | 0.82 | ||
| D4 | 330 | 124 | 72.7 | 1262 | 2 | 99.8 | 92.7 | 0.81 | ||
| D5 | 328 | 126 | 72.3 | 1260 | 4 | 99.7 | 92.4 | 0.80 | ||
| D6 | 358 | 96 | 78.9 | 1244 | 20 | 98.4 | 93.3 | 0.82 | ||
| D7 | 342 | 112 | 75.3 | 1257 | 7 | 99.5 | 93.1 | 0.82 | ||
| D8 | 331 | 123 | 72.9 | 1257 | 7 | 99.4 | 92.4 | 0.80 | ||
| D9 | 360 | 94 | 79.3 | 1237 | 27 | 97.9 | 93.0 | 0.81 | ||
| D10 | 360 | 94 | 79.3 | 1257 | 7 | 99.5 | 94.1 | 0.85 | ||
| rRNA binding | 90 | D1 | 1407 | 91 | 93.9 | 3502 | 59 | 98.3 | 97.0 | 0.93 |
| D2 | 1437 | 61 | 95.9 | 3510 | 51 | 98.6 | 97.8 | 0.95 | ||
| D3 | 1403 | 95 | 93.7 | 3529 | 32 | 99.1 | 97.5 | 0.93 | ||
| D4 | 1347 | 151 | 89.9 | 3491 | 70 | 98.0 | 95.6 | 0.89 | ||
| D5 | 1347 | 151 | 89.9 | 3533 | 28 | 99.2 | 96.5 | 0.91 | ||
| D6 | 1451 | 47 | 96.9 | 3537 | 24 | 99.3 | 98.6 | 0.97 | ||
| D7 | 1358 | 140 | 90.7 | 3429 | 132 | 96.3 | 94.6 | 0.87 | ||
| D8 | 1442 | 56 | 96.3 | 3531 | 30 | 99.2 | 98.3 | 0.96 | ||
| D9 | 1436 | 62 | 95.9 | 3518 | 43 | 98.8 | 97.9 | 0.95 | ||
| D10 | 1449 | 49 | 96.7 | 3537 | 24 | 99.3 | 98.6 | 0.97 | ||
| 70 | D1 | 924 | 83 | 91.8 | 3454 | 59 | 98.3 | 96.9 | 0.91 | |
| D2 | 952 | 55 | 94.5 | 3463 | 50 | 98.6 | 97.7 | 0.93 | ||
| D3 | 920 | 87 | 91.4 | 3483 | 30 | 99.2 | 97.4 | 0.92 | ||
| D4 | 907 | 100 | 90.1 | 3444 | 69 | 98.0 | 96.3 | 0.89 | ||
| D5 | 908 | 99 | 90.2 | 3485 | 28 | 99.2 | 97.2 | 0.92 | ||
| D6 | 963 | 44 | 95.6 | 3493 | 20 | 99.4 | 98.6 | 0.96 | ||
| D7 | 917 | 90 | 91.1 | 3382 | 131 | 96.3 | 95.1 | 0.86 | ||
| D8 | 954 | 53 | 94.7 | 3484 | 29 | 99.2 | 98.2 | 0.95 | ||
| D9 | 950 | 57 | 94.3 | 3471 | 42 | 98.8 | 97.8 | 0.94 | ||
| D10 | 960 | 47 | 95.3 | 3490 | 23 | 99.4 | 98.5 | 0.96 | ||
Descriptor sets ranked and grouped by MCC (Matthews correlation coefficient), before and after removal of homologous sequences at 90% and 70% identity, respectively.
| Protein family | HSR* | Prediction performance |
| EC2.4 | NR | D10 > D8 > D9 > D3 | D5 > D4 = D6 > D2 > D1 > D7 |
| 90% | D10 | D8 > D3 = D9 > D5 > D2 = D4 > D6 > D7 > D1 | |
| 70% | D10 > D8 > D3 = D9 > D5 > D2 > D4 > D6 > D7 > D1 | ||
| GPCR | NR | D8 > D10 > D1 = D2 = D3 = D4 = D6 = D9 > D5 > D7 | |
| 90% | D8 = D10 > D2 = D9 > D3 = D4 = D6 > D1 > D5 > D7 | ||
| 70% | D10 > D8 = D9 > D2 > D6 > D1 = D3 = D4 > D5 | D7 | |
| TC8.A | NR | D8 = D10 > D6 > D7 > D2 = D3 = D9 > D4 = D5 > D1 | |
| 90% | D8 = D10 > D6 > D7 > D2 = D3 = D4 = D5 = D9 > D1 | ||
| 70% | D8 = D10 > D6 > D2 = D7 = D9 > D3 > D4 = D5 > D1 | ||
| Chlorophyll | NR | D8 = D10 > D4 > D3 = D9 | D5 > D2 > D6 > D7 > D1 |
| 90% | D8 = D10 > D3 = D4 | D9 > D5 > D2 > D6 > D1 > D7 | |
| 70% | D8 = D10 > D3 = D4 > D9 > D5 > D2 > D6 > D1 > D7 | ||
| Lipid synthesis | NR | D10 > D6 | D7 > D2 = D3 = D9 > D4 = D8 > D5 > D1 |
| 90% | D6 = D10 | D3 = D7 > D4 = D9 > D5 = D8 > D2 > D1 | |
| 70% | D10 > D3 = D6 = D7 > D2 = D4 = D9 > D5 = D8 > D1 | ||
| rRNA binding | NR | D10 > D8 = D9 > D2 = D3 = D6 > D1 > D7 > D4 = D5 | |
| 90% | D6 = D10 > D8 > D2 = D9 > D1 = D3 > D5 > D4 > D7 | ||
| 70% | D6 = D10 > D8 > D9 > D2 > D3 = D5 > D1 > D4 > D7 | ||
*HSR: homologous sequences removed
NR: (homologous sequences) Not Removed
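The ranking strings in the table above can be reproduced from the MCC columns by sorting descriptor sets in descending order and joining ties with "=". The following is a minimal sketch with a hypothetical function name, `rank_by_mcc`; it assumes ties are decided on the two-decimal MCC values as printed, and it does not attempt to reproduce the performance grouping of the original layout.

```python
from itertools import groupby


def rank_by_mcc(mcc_by_set):
    """Build a string such as 'D10 > D8 > D3 = D9' from {set name: MCC}."""
    # Sort by descending MCC; sets with equal MCC end up adjacent.
    ordered = sorted(mcc_by_set.items(), key=lambda kv: -kv[1])
    groups = []
    for _, members in groupby(ordered, key=lambda kv: kv[1]):
        # Within a tie, list sets by their numeric index (D2 before D10).
        names = sorted((name for name, _ in members), key=lambda s: int(s[1:]))
        groups.append(" = ".join(names))
    return " > ".join(groups)


if __name__ == "__main__":
    # MCC values for EC2.4 after homologous sequence removal at 90% identity.
    ec24_hsr90 = {
        "D1": 0.65, "D2": 0.78, "D3": 0.80, "D4": 0.78, "D5": 0.79,
        "D6": 0.70, "D7": 0.66, "D8": 0.84, "D9": 0.80, "D10": 0.86,
    }
    print(rank_by_mcc(ec24_hsr90))
    # prints: D10 > D8 > D3 = D9 > D5 > D2 = D4 > D6 > D7 > D1
```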