Chiou-Yi Hor, Chang-Biau Yang, Zih-Jie Yang, Chiou-Ting Tseng.
Abstract
Essential proteins constitute the minimum set of proteins required to support cell life. Identifying them is important for understanding the cellular processes of an organism, but doing so experimentally is extremely time-consuming and labor-intensive, so alternative methods are needed. This study had two goals: identifying the important features and building learning machines for discriminating essential proteins. Data for Saccharomyces cerevisiae and Escherichia coli were used. We first collected information from a variety of sources. We then proposed a modified backward feature selection method and built support vector machine (SVM) predictors based on the selected features. To evaluate performance, we conducted cross-validation on both the original imbalanced data set and a down-sampled balanced data set. Statistical tests were applied to the performance of the obtained feature subsets to confirm their significance. For the first data set, our best F-measure and Matthews correlation coefficient (MCC) were 0.549 and 0.495 in the imbalanced experiments and 0.770 and 0.545 in the balanced experiment. For the second data set, our best F-measure and MCC were 0.421 and 0.407 in the imbalanced experiments and 0.718 and 0.448 in the balanced experiment. The experimental results show that our selected features are compact and that performance improved. Prediction can also be conducted by users at the following internet address: http://bio2.cse.nsysu.edu.tw/esspredict.aspx.
Keywords: essential protein; feature selection; protein-protein interaction; statistical test; support vector machine
Year: 2013 PMID: 24250217 PMCID: PMC3795531 DOI: 10.4137/EBO.S11975
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
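The F-measure and MCC values quoted in the abstract are standard functions of the confusion-matrix counts (true/false positives and negatives). The following is a minimal sketch of both formulas; the counts used are hypothetical and are not taken from the study.

```python
import math

def f_measure(tp, fp, fn):
    """F-measure (F1): harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

# Hypothetical counts for illustration only (not taken from the study).
tp, fp, tn, fn = 40, 10, 140, 60
f1_value = f_measure(tp, fp, fn)
mcc_value = mcc(tp, fp, tn, fn)
print(f"F-measure = {f1_value:.3f}, MCC = {mcc_value:.3f}")
```

With few positives among many negatives, recall can stay low while precision is high, which is one reason both measures are reported alongside AUC.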
Protein features.
| ID | Property name | Type | Size | Sub-names | S. cere | E. coli |
|---|---|---|---|---|---|---|
| 1 | Amino acid occurrence | S | 20 | | • | • |
| 2 | Average amino acid PSSM | S | 20 | | • | • |
| 3 | Average cysteine position | S | 1 | | • | • |
| 4 | Average distance of every two cysteines | S | 1 | | • | • |
| 5 | Average hydrophobic | S | 1 | | • | • |
| 6 | Average hydrophobicity around cysteine | S | 4 | 1 … 4 | • | • |
| 7 | Cysteine count | S | 1 | | • | • |
| 8 | Cysteine location | S | 5 | 1 … 5 | • | • |
| 9 | Cysteine odd-even index | S | 1 | | • | • |
| 10 | Protein length | S | 1 | | • | • |
| 11 | Cell cycle | P | 1 | | • | |
| 12 | Cytoplasm | P | 1 | | • | |
| 13 | Endoplasmic reticulum | P | 1 | | • | |
| 14 | Metabolic process | P | 1 | | • | |
| 15 | Mitochondrion | P | 1 | | • | |
| 16 | Nucleus | P | 1 | | • | |
| 17 | Other process | P | 1 | | • | |
| 18 | Other localization | P | 1 | | • | |
| 19 | Signal transduction | P | 1 | | • | |
| 20 | Transport | P | 1 | | • | |
| 21 | Transcription | P | 1 | | • | |
| 22 | Betweenness centrality related to all interactions | T | 1 | | • | • |
| 23 | Betweenness centrality related to metabolic interactions | T | 1 | | • | |
| 24 | Betweenness centrality related to physical interactions | T | 1 | | • | • |
| 25 | Betweenness centrality related to transcriptional regulation interactions | T | 1 | | • | |
| 26 | Bit string of double screening scheme [this paper] | T | 1 | | • | • |
| 27 | Bottleneck | T | 1 | | • | • |
| 28 | Clique level | T | 1 | | • | • |
| 29 | Closeness centrality | T | 1 | | • | • |
| 30 | Clustering coefficient | T | 1 | | • | • |
| 31 | Degree related to all interactions | T | 1 | | • | • |
| 32 | Degree related to physical interactions | T | 1 | | • | • |
| 33 | Density of maximum neighborhood component | T | 1 | | • | • |
| 34 | Edge percolated component | T | 1 | | • | • |
| 35 | Indegree related to metabolic interaction | T | 1 | | • | |
| 36 | Indegree related to transcriptional regulation | T | 1 | | • | |
| 37 | Maximum neighborhood component | T | 1 | | • | • |
| 38 | Neighbors’ intra-degree | T | 1 | | • | • |
| 39 | Outdegree related to metabolic interaction | T | 1 | | • | |
| 40 | Outdegree related to transcriptional regulation interaction | T | 1 | | • | |
| 41 | Betweenness centrality related to integrated functional interaction | T | 1 | | | • |
| 42 | Betweenness centrality related to integrated PI and GC network | T | 1 | | | • |
| 43 | Degree related to integrated functional interaction | T | 1 | | | • |
| 44 | Degree related to integrated PI and GC network | T | 1 | | | • |
| 45 | Common function degree | O | 1 | | • | |
| 46 | Essential index | O | 1 | | • | |
| 47 | Identicalness | O | 1 | | • | |
| 48 | Open reading frame length | O | 1 | | • | • |
| 49 | Phyletic retention | O | 1 | | • | • |
| 50 | Number of paralogous genes | O | 1 | | | • |
| 51 | Codon Adaptation Index (CAI) | O | 1 | | | • |
| 52 | Codon Bias Index (CBI) | O | 1 | | | • |
| 53 | Frequency of optimal codons | O | 1 | | | • |
| 54 | Aromaticity score | O | 1 | | | • |
| 55 | Leading strand of the circular chromosome | O | 1 | | | • |
| Total | | | 100 | | 90 | 80 |
Notes: S. cere and E. coli denote the Saccharomyces cerevisiae and Escherichia coli data sets, respectively. Unless otherwise mentioned, topological features are related to physical interactions. Because of coverage or availability issues, we adopt different features for the S. cere and E. coli data sets. For example, interactions in the E. coli data set contain integrated functional, PI, and GC network information, while those in the S. cere data set include metabolic, transcriptional regulation, and PI network information.
Abbreviations: GC, genomic context; PI, physical interactions.
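Several of the topological (type T) features in the table are standard graph measures. As an illustration, the sketch below computes two of them, degree and clustering coefficient, on a small undirected network. The network and its edges are hypothetical, and the source does not state which software was used to compute these features.

```python
from itertools import combinations

# Hypothetical toy interaction network (undirected edges between proteins).
edges = [("W", "X"), ("W", "Y"), ("X", "Y"), ("Y", "Z")]

def build_adjacency(edge_list):
    """Map each protein to the set of its interaction partners."""
    adj = {}
    for a, b in edge_list:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def degree(adj, node):
    """Number of interaction partners of a protein."""
    return len(adj[node])

def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbors that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbors; 0 by convention
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

adjacency = build_adjacency(edges)
for protein in sorted(adjacency):
    print(protein, degree(adjacency, protein),
          round(clustering_coefficient(adjacency, protein), 3))
```

Each such measure yields one number per protein, which matches the size of 1 listed for most topological features.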
Ranking by two different methods, where smaller numbers indicate higher ranks.
| Protein name | A (DMNC) | B (MNC) |
|---|---|---|
| W | 1 | 4 |
| X | 2 | 2 |
| Y | 3 | 1 |
| Z | 4 | 3 |
Bit strings by the double screening method.
| Protein name | Sum of bit string | Sum | |||
|---|---|---|---|---|---|
|
| |||||
| 1st | 2nd | ||||
| W | 0 | 0 | 0 | 0 | 0 |
| X | 1 | 1 | 2 | 2 | 4 |
| Y | 0 | 1 | 1 | 3 | 4 |
| Z | 0 | 0 | 0 | 1 | 1 |
Figure 1. Flowchart for the construction of SVM models and performance comparison.
Selected features for S. cerevisiae data set.
| No. | Feature | N5 | N6 | N7 | N8 | N9 | N10 | N11 | N12 | N13 | N14 | N15 | N16 | N17 | N18 | m31 | C32 | TOT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | PR (phyletic retention) | • | • | • | • | • | • | • | • | • | • | • | • | • | • | • | • | 16 |
| 2 | EI (essentiality index) | • | • | • | • | • | • | • | • | • | • | • | • | • | • | • | • | 16 |
| 3 | Cytoplasm | • | • | • | • | • | • | • | • | • | • | • | • | • | • | • | 15 | |
| 4 | Nucleus | • | • | • | • | • | • | • | • | • | • | • | • | • | • | • | 15 | |
| 5 | Occurrence of A.A. I | • | • | • | • | • | • | • | • | • | • | • | • | • | 13 | |||
| 6 | Bit string of DSS | • | • | • | • | • | • | • | • | • | • | • | • | 12 | ||||
| 7 | Occurrence of A.A. W | • | • | • | • | • | • | • | • | • | • | • | • | 12 | ||||
| 8 | Endoplasmic reticulum | • | • | • | • | • | • | • | • | • | • | • | 11 | |||||
| 9 | Other process | • | • | • | • | • | • | • | 7 | |||||||||
| 10 | Occurrence of A.A. S | • | • | • | • | • | • | • | 7 | |||||||||
| 11 | Occurrence of A.A. G | • | • | • | • | • | • | 6 | ||||||||||
| 12 | KLV (clique level) | • | • | • | • | • | • | 6 | ||||||||||
| 13 | Cell cycle | • | • | • | • | • | 5 | |||||||||||
| 14 | Average hydrophobic | • | • | • | • | • | 5 | |||||||||||
| 15 | Average PSSM of A.A. R | • | • | • | • | • | 5 | |||||||||||
| 16 | B.C. related to PI | • | • | • | • | 4 | ||||||||||||
| 17 | Occurrence of A.A. E | • | • | • | • | 4 | ||||||||||||
| 18 | Average PSSM of A.A. P | • | • | • | • | 4 | ||||||||||||
| 19 | ID related to T.R. | • | • | • | 3 | |||||||||||||
| 20 | B.C. T.R. interactions | • | • | • | 3 | |||||||||||||
| 21 | Other localization | • | • | • | 3 | |||||||||||||
| 22 | DMNC | • | • | • | 3 | |||||||||||||
| 23 | Average HYD around C-2 | • | • | • | 3 | |||||||||||||
| 24 | Signal transduction | • | • | 2 | ||||||||||||||
| 25 | Edge percolated component | • | • | 2 | ||||||||||||||
| 26 | Occurrence of A.A. P | • | • | 2 | ||||||||||||||
| 27 | Occurrence of A.A. T | • | • | 2 | ||||||||||||||
| 28 | Occurrence of A.A. Y | • | • | 2 | ||||||||||||||
| 29 | Average PSSM of A.A. Q | • | • | 2 | ||||||||||||||
| 30 | Average PSSM of A.A. E | • | • | 2 | ||||||||||||||
| 31 | CLC (clustering coefficient) | • | • | 2 | ||||||||||||||
| 32 | FunK (common function degree) | • | • | 2 | ||||||||||||||
| 33 | OD related to T.R. interaction | • | 1 | |||||||||||||||
| 34 | OD related to M.I. | • | 1 | |||||||||||||||
| 35 | ID related to M.I. | • | 1 | |||||||||||||||
| 36 | B.C. related to M.I. | • | 1 | |||||||||||||||
| 37 | Degree related to PI | • | 1 | |||||||||||||||
| 38 | Metabolic process | • | 1 | |||||||||||||||
| 39 | Bottleneck | • | 1 | |||||||||||||||
| 40 | MNC | • | 1 | |||||||||||||||
| 41 | Occurrence of A.A. A | • | 1 | |||||||||||||||
| 42 | Occurrence of A.A. C | • | 1 | |||||||||||||||
| 43 | Occurrence of A.A. D | • | 1 | |||||||||||||||
| 44 | Occurrence of A.A. H | • | 1 | |||||||||||||||
| 45 | Occurrence of A.A. K | • | 1 | |||||||||||||||
| 46 | Occurrence of A.A. M | • | 1 | |||||||||||||||
| 47 | Average C position | • | 1 | |||||||||||||||
| 48 | Protein length | • | 1 | |||||||||||||||
| 49 | Cysteine count | 1 | ||||||||||||||||
| 50 | Cysteine odd-even index | • | • | 1 | ||||||||||||||
| 51 | Average HYD around C-1 | • | 1 | |||||||||||||||
| 52 | Cysteine location-1 | • | 1 | |||||||||||||||
| 53 | Average PSSM of A.A. A | • | 1 | |||||||||||||||
| 54 | Average PSSM of A.A. D | • | 1 | |||||||||||||||
| 55 | Average PSSM of A.A. S | • | 1 | |||||||||||||||
| 56 | Average PSSM of A.A. W | • | 1 | |||||||||||||||
| 57 | Average PSSM of A.A. Y | • | 1 | |||||||||||||||
| 58 | ORFL (ORF length) | • | 1 | |||||||||||||||
| 59 | CC (closeness centrality) | • | 1 | |||||||||||||||
| 60 | BC (B.C.) | • | 1 |
Abbreviations: DSS, double screening scheme; A.A., amino acid; B.C., betweenness centrality; T.R., transcriptional regulation; HYD, hydrophobicity; PI, physical interaction; A … Y, amino acid abbreviation; M.I., metabolic interaction; OD, outdegree; ID, indegree; m31, mRMR31; C32, CMIM32; FunK, Common function degree; TOT, total.
Selected features for E. coli data set.
| No. | Feature | N4 | N5 | N6 | N7 | N8 | N9 | N10 | N11 | N12 | N13 | C9 | m13 | TOT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | PR (phyletic retention) | • | • | • | • | • | • | • | • | • | • | • | • | 12 |
| 2 | Open reading frame length | • | • | • | • | • | • | • | • | 8 | ||||
| 3 | Average PSSM of A.A. C | • | • | • | • | • | • | 6 | ||||||
| 4 | Degree related to F.I. | • | • | • | • | • | • | 6 | ||||||
| 5 | Degree related to A.I. | • | • | • | • | • | 5 | |||||||
| 6 | Degree related to PI | • | • | • | • | • | 5 | |||||||
| 7 | Average PSSM of A.A. A | • | • | • | • | 4 | ||||||||
| 8 | Average PSSM of A.A. R | • | • | • | • | 4 | ||||||||
| 9 | Average hydrophobic | • | • | • | • | 4 | ||||||||
| 10 | Bit string of DSS for PI | • | • | • | • | 4 | ||||||||
| 11 | Paralog count | • | • | • | • | 4 | ||||||||
| 12 | Occurrence of A.A. M | • | • | • | 3 | |||||||||
| 13 | Occurrence of A.A. W | • | • | • | 3 | |||||||||
| 14 | Occurrence of A.A. E | • | • | 2 | ||||||||||
| 15 | Occurrence of A.A. F | • | • | 2 | ||||||||||
| 16 | Occurrence of A.A. G | • | • | 2 | ||||||||||
| 17 | Occurrence of A.A. I | • | • | 2 | ||||||||||
| 18 | Average PSSM of A.A. Y | • | • | 2 | ||||||||||
| 19 | Cysteine location-4 | • | • | 2 | ||||||||||
| 20 | KLV (clique level) for PI | • | • | 2 | ||||||||||
| 21 | Degree related to PI and GC | • | • | 2 | ||||||||||
| 22 | Strand bias | • | • | 2 | ||||||||||
| 23 | Occurrence of A.A. A | • | 1 | |||||||||||
| 24 | Occurrence of A.A. C | • | 1 | |||||||||||
| 25 | Occurrence of A.A. H | • | 1 | |||||||||||
| 26 | Occurrence of A.A. P | • | 1 | |||||||||||
| 27 | Occurrence of A.A. S | • | 1 | |||||||||||
| 28 | Average PSSM of A.A. N | • | 1 | |||||||||||
| 29 | Average PSSM of A.A. G | • | 1 | |||||||||||
| 30 | Average PSSM of A.A. K | • | 1 | |||||||||||
| 31 | Average PSSM of A.A. F | • | 1 | |||||||||||
| 32 | Average PSSM of A.A. T | • | 1 | |||||||||||
| 33 | Average PSSM of A.A. V | • | 1 | |||||||||||
| 34 | Average distance of every two Cs | • | 1 | |||||||||||
| 35 | Average HYD around C-2 | • | 1 | |||||||||||
| 36 | Cysteine location-1 | • | 1 | |||||||||||
| 37 | Cysteine location-5 | • | 1 | |||||||||||
| 38 | Cysteine odd-even index | • | 1 | |||||||||||
| 39 | Protein length | • | 1 | |||||||||||
| 40 | Bottleneck for PI | • | 1 | |||||||||||
| 41 | CC (closeness centrality) for PI | • | 1 | |||||||||||
| 42 | MNC for PI | • | 1 | |||||||||||
| 43 | B.C. related to all F.I. | • | 1 |
Abbreviations: C9, CMIM09; m13, mRMR13; TOT, total; DSS, double screening scheme; F.I., integrated functional interaction; A.I., all interactions; PI, physical interaction; HYD, hydrophobicity; A.A., amino acid; A … Y, amino acid abbreviation.
Figure 2. The P-value of the normality test in S. cerevisiae data set.
Figure 3. The P-value of the normality test in S. cerevisiae data set.
Performance comparison for the imbalanced S. cerevisiae data set.
| AUC | Precision | Recall | F-measure | MCC | |||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CMIM32 | 0.825 | * | (+) | (+) | 0.744 | * | (+) | 0.369 | * | (+) | 0.493 | * | (+) | 0.450 | * | (+) | |||||||||
| mRMR31 | 0.821 | (−) | * | (+) | 0.738 | * | 0.372 | * | (+) | 0.495 | * | (+) | 0.449 | * | (+) | ||||||||||
| Hwang(10) | 0.775 | (−) | (−) | * | 0.743 | (−) | * | 0.343 | (−) | (−) | * | 0.469 | (−) | (−) | * | 0.432 | (−) | (−) | * | ||||||
| Acencio(23) | 0.707 | (−) | (−) | (−) | 0.675 | (−) | (−) | (−) | 0.121 | (−) | (−) | (−) | 0.204 | (−) | (−) | (−) | 0.228 | (−) | (−) | (−) | |||||
| N4 | 0.744 | (−) | (−) | (−) | 0.327 | (−) | (−) | (−) | 0.461 | (−) | (−) | (−) | 0.439 | (−) | (−) | ||||||||||
| N5 | 0.727 | (−) | (−) | (−) | 0.741 | (−) | (−) | 0.387 | (−) | (−) | (+) | 0.509 | (−) | (−) | (+) | 0.461 | (−) | (−) | (+) | ||||||
| N6 | 0.730 | (−) | (−) | (−) | 0.752 | (−) | 0.395 | (−) | (−) | (+) | 0.518 | (−) | (−) | 0.472 | (−) | (−) | |||||||||
| N7 | 0.761 | (−) | (−) | (+) | 0.767 | (−) | 0.386 | (−) | (−) | (+) | 0.513 | (−) | (−) | (+) | 0.473 | (−) | (−) | ||||||||
| N8 | 0.772 | (−) | (−) | 0.755 | 0.371 | (−) | (−) | (−) | 0.498 | (−) | (−) | (+) | (−) | 0.457 | (−) | (−) | (−) | ||||||||
| N9 | 0.782 | (−) | (−) | (+) | 0.749 | 0.382 | (−) | (+) | (+) | 0.506 | (−) | (+) | 0.462 | (−) | (+) | ||||||||||
| N10 | 0.781 | (−) | (−) | (+) | 0.751 | 0.399 | (+) | 0.521 | (+) | 0.474 | (+) | ||||||||||||||
| N11 | 0.786 | (−) | (−) | (+) | 0.752 | 0.402 | (+) | 0.524 | (+) | 0.476 | (+) | ||||||||||||||
| N12 | 0.798 | (−) | (−) | (+) | 0.759 | 0.409 | (+) | 0.532 | (+) | 0.485 | (+) | ||||||||||||||
| N13 | 0.789 | (−) | (−) | (+) | 0.748 | (+) | (+) | (+) | (+) | ||||||||||||||||
| N14 | 0.802 | (−) | (+) | 0.749 | 0.397 | (+) | (−) | 0.519 | (+) | (−) | 0.471 | (+) | (−) | ||||||||||||
| N15 | 0.801 | (−) | (+) | 0.763 | 0.406 | (+) | 0.530 | (+) | 0.485 | (+) | (+) | ||||||||||||||
| N16 | 0.814 | (−) | (+) | (+) | 0.762 | 0.401 | (+) | 0.525 | (+) | 0.480 | (+) | ||||||||||||||
| N17 | 0.814 | (−) | (+) | 0.761 | 0.407 | (+) | 0.530 | (+) | 0.484 | (+) | |||||||||||||||
| N18 | 0.811 | (−) | (+) | 0.751 | 0.411 | (+) | 0.531 | (+) | 0.482 | (+) | |||||||||||||||
| N90 | (+) | (+) | (+) | (+) | 0.738 | (+) | (+) | 0.355 | (+) | (+) | (−) | 0.479 | (+) | (+) | (+) | (−) | 0.438 | (+) | (+) | (+) | (−) | ||||
Note: With the polynomial kernel function, the values of precision, recall, and MCC are reported as 0.77, 0.23, and 0.36, respectively, in the original paper of Hwang et al. (ref. 7).
Performance comparison for the balanced S. cerevisiae data set.
| AUC | Precision | Recall | F-measure | MCC | |||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CMIM32 | 0.842 | * | (+) | 0.772 | * | * | (+) | 0.769 | * | (+) | 0.540 | * | (+) | ||||||||||||
| mRMR31 | 0.836 | * | (+) | 0.765 | * | 0.741 | * | (+) | 0.752 | * | (+) | 0.513 | * | (+) | |||||||||||
| Hwang(10) | 0.822 | (−) | (−) | * | 0.778 | * | 0.720 | (−) | (−) | * | 0.748 | (−) | (−) | * | 0.516 | (−) | (−) | * | |||||||
| Acencio(23) | 0.768 | (−) | (−) | (−) | 0.696 | (−) | (−) | (−) | 0.734 | (−) | (−) | 0.714 | (−) | (−) | (−) | 0.414 | (−) | (−) | (−) | ||||||
| N4 | 0.811 | (−) | (−) | (−) | (+) | 0.777 | (−) | (+) | 0.716 | (−) | (−) | 0.745 | (−) | (−) | (+) | 0.512 | (−) | (−) | (+) | ||||||
| N5 | 0.824 | (−) | (−) | (+) | 0.778 | (−) | 0.735 | (−) | (−) | (+) | 0.756 | (−) | (−) | 0.527 | (−) | (−) | |||||||||
| N6 | 0.827 | (−) | (−) | (+) | 0.778 | (−) | 0.739 | (−) | (−) | 0.758 | (−) | (−) | 0.530 | (−) | (−) | ||||||||||
| N7 | 0.831 | (−) | (−) | 0.779 | 0.733 | (−) | (−) | 0.755 | (−) | (−) | 0.526 | (−) | (−) | ||||||||||||
| N8 | 0.826 | (−) | (−) | 0.786 | 0.721 | (−) | (−) | 0.752 | (−) | (−) | 0.527 | (−) | (−) | ||||||||||||
| N9 | 0.833 | (−) | (−) | (+) | 0.735 | (−) | (−) | (+) | 0.762 | (−) | (−) | 0.541 | (−) | (−) | |||||||||||
| N10 | 0.834 | (−) | (−) | 0.789 | 0.736 | (−) | (−) | 0.761 | (−) | (−) | 0.540 | (−) | (−) | ||||||||||||
| N11 | 0.831 | (−) | (−) | 0.784 | 0.737 | (−) | (−) | 0.760 | (−) | (−) | 0.535 | (−) | (−) | ||||||||||||
| N12 | 0.829 | (−) | (−) | 0.779 | 0.732 | 0.755 | (+) | 0.526 | (+) | ||||||||||||||||
| N13 | 0.834 | (−) | (−) | 0.788 | 0.730 | (−) | (−) | 0.758 | (−) | (−) | 0.535 | (−) | (−) | ||||||||||||
| N14 | 0.836 | (−) | (−) | (+) | 0.777 | 0.743 | (−) | (−) | (+) | 0.759 | (−) | (−) | (+) | 0.530 | (−) | (−) | |||||||||
| N15 | 0.843 | (−) | (−) | (+) | 0.784 | 0.748 | (−) | (−) | (+) | 0.766 | (−) | (+) | 0.542 | (−) | (+) | ||||||||||
| N16 | 0.842 | (+) | 0.777 | 0.756 | (+) | 0.767 | (+) | 0.540 | (+) | ||||||||||||||||
| N17 | (+) | 0.778 | 0.763 | (+) | (+) | (+) | |||||||||||||||||||
| N18 | 0.840 | (−) | (−) | (+) | (−) | 0.779 | (−) | 0.740 | (−) | (−) | (+) | (−) | 0.759 | (−) | (−) | 0.531 | (−) | (−) | |||||||
| N90 | 0.839 | (+) | (+) | 0.760 | 0.753 | (+) | 0.757 | (+) | 0.516 | (+) | |||||||||||||||
Note: In the original paper of Hwang et al. (ref. 7), the values of precision, recall, F-measure, and MCC are reported as 0.763, 0.713, 0.737, and 0.492, respectively, with the polynomial kernel function.
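The balanced experiments rely on down-sampling the imbalanced data. The exact procedure is not described in this section, so the following is only a minimal sketch under an assumption: the majority class is randomly reduced to the size of the minority class (a 1:1 ratio). The function name, the seed, and the toy data are all hypothetical.

```python
import random

def down_sample(samples, labels, seed=0):
    """Randomly drop majority-class items until both classes have equal size.

    Assumes binary labels: 1 (essential) and 0 (non-essential).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority item plus an equally sized random majority subset.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [samples[i] for i in kept], [labels[i] for i in kept]

# Hypothetical toy data: 3 essential and 9 non-essential proteins.
toy_samples = [f"protein_{i}" for i in range(12)]
toy_labels = [1] * 3 + [0] * 9
balanced_samples, balanced_labels = down_sample(toy_samples, toy_labels)
print(len(balanced_labels), sum(balanced_labels))
```

Because the retained majority subset is random, repeating the procedure with different seeds and averaging the results is a common way to reduce sampling noise.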
Performance comparison for imbalanced E. coli data set.
| AUC | Precision | Recall | F1 | MCC | ||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CMIM09 | 0.701 | * | (−) | 0.720 | * | 0.271 | * | (−) | 0.394 | * | (−) | 0.382 | * | (−) | ||||||||||
| mRMR13 | 0.715 | * | (−) | 0.713 | * | 0.250 | * | (−) | 0.370 | * | (−) | 0.360 | * | (−) | ||||||||||
| Gustafson(29) | 0.711 | (+) | (+) | * | 0.720 | * | 0.290 | (+) | (+) | * | 0.420 | (+) | (+) | * | 0.413 | (+) | (+) | * | ||||||
| N4 | 0.691 | (−) | (−) | (−) | 0.725 | 0.280 | (+) | (−) | (−) | 0.404 | (+) | (−) | 0.391 | (−) | ||||||||||
| N5 | 0.690 | (−) | (−) | (−) | 0.737 | (+) | (+) | (+) | (+) | (+) | (−) | (+) | (+) | (−) | (+) | |||||||||
| N6 | 0.701 | (−) | 0.287 | (+) | (+) | 0.414 | (+) | (+) | 0.403 | (+) | (+) | |||||||||||||
| N7 | 0.714 | (−) | 0.735 | 0.275 | (+) | (+) | (−) | 0.400 | (+) | (+) | (−) | 0.392 | (+) | |||||||||||
| N8 | 0.705 | (−) | 0.742 | 0.288 | (+) | (+) | (−) | (+) | 0.415 | (+) | (+) | (−) | (+) | 0.405 | (+) | (−) | ||||||||
| N9 | 0.707 | (−) | 0.726 | (−) | 0.293 | (+) | (+) | 0.417 | (+) | (+) | 0.401 | (+) | (+) | |||||||||||
| N10 | 0.711 | (−) | 0.724 | 0.294 | (+) | (+) | 0.418 | (+) | (+) | 0.401 | (+) | (+) | ||||||||||||
| N11 | 0.714 | (−) | 0.732 | 0.278 | (+) | (+) | (−) | 0.403 | (+) | (+) | 0.393 | (+) | (+) | |||||||||||
| N12 | 0.712 | (+) | 0.725 | 0.292 | (+) | (+) | 0.416 | (+) | (+) | 0.400 | (+) | (+) | ||||||||||||
| N13 | 0.714 | (+) | 0.733 | 0.287 | (+) | (+) | 0.413 | (+) | (+) | 0.400 | (+) | (+) | ||||||||||||
| N80 | (+) | (+) | (+) | 0.677 | (+) | (+) | (+) | (−) | 0.237 | (+) | (+) | (−) | 0.352 | (+) | (+) | (−) | 0.339 | (+) | (+) | (+) | (−) | |||
Performance comparison for balanced E. coli data set.
| AUC | Precision | Recall | F1 | MCC | ||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CMIM09 | 0.767 | * | (+) | 0.720 | * | 0.700 | * | (+) | 0.710 | * | (+) | 0.421 | * | |||||||||
| mRMR13 | 0.762 | (−) | * | (−) | 0.728 | * | 0.654 | (−) | * | (−) | 0.689 | (−) | * | (−) | 0.396 | * | (−) | |||||
| Gustafson(29) | 0.777 | (+) | * | 0.722 | * | (+) | * | (+) | * | 0.440 | (+) | * | ||||||||||
| N4 | 0.780 | (+) | 0.733 | 0.701 | (+) | 0.717 | (+) | 0.446 | (+) | |||||||||||||
| N5 | 0.779 | (+) | 0.730 | 0.706 | (+) | 0.718 | (+) | 0.445 | (+) | |||||||||||||
| N6 | 0.762 | (−) | (−) | 0.735 | 0.663 | (−) | (−) | 0.696 | 0.425 | |||||||||||||
| N7 | (+) | (+) | 0.696 | (+) | 0.716 | (+) | (+) | (+) | ||||||||||||||
| N8 | 0.781 | (+) | 0.723 | 0.711 | (+) | 0.717 | (+) | 0.439 | (+) | |||||||||||||
| N9 | 0.782 | (+) | 0.715 | 0.703 | (+) | 0.709 | (+) | 0.423 | ||||||||||||||
| N10 | 0.781 | (+) | 0.725 | 0.702 | (+) | 0.713 | (+) | 0.436 | (+) | |||||||||||||
| N11 | 0.777 | (+) | 0.719 | 0.700 | (+) | 0.709 | (+) | 0.426 | ||||||||||||||
| N12 | 0.776 | (+) | 0.715 | 0.695 | (+) | 0.705 | (+) | 0.418 | ||||||||||||||
| N13 | 0.776 | (+) | 0.731 | 0.695 | (+) | 0.712 | (+) | 0.439 | (+) | |||||||||||||
| N80 | 0.769 | 0.711 | 0.715 | (+) | (+) | 0.713 | (+) | 0.424 | ||||||||||||||
Figure 4. The average ROC curves and AUCs for the imbalanced S. cerevisiae data set.
Figure 5. The average ROC curves and AUCs for the balanced S. cerevisiae data set.
Figure 6. The average ROC curves and AUCs for the imbalanced E. coli data set.
Figure 7. The average ROC curves and AUCs for the balanced E. coli data set.
Percentage of essential proteins in the imbalanced S. cerevisiae data.
| Top 5% | Top 10% | Top 15% | Top 20% | Top 25% | Top 30% | Top 50% | Top 75% | Top 100% | |||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CMIM32 | 0.939* | – | – | 0.918* | 0.910* | 0.892* | 0.839* | 0.743* | 0.645* | 0.582* | |||||||||||||||||
| mRMR31 | 0.955 | * | – | 0.905– | * | – | 0.884–* | 0.862–* | 0.834–* | – | 0.820–* | 0.740–* | 0.641–* | 0.572–* | |||||||||||||
| Hwang(10) | 0.959 | * | 0.918 | * | 0.871– | –* | 0.853– | –* | 0.843– | * | 0.816– | –* | 0.720– | –* | 0.637– | –* | 0.563– | –* | |||||||||
| Acencio(23) | 0.800– | – | – | 0.741– | – | – | 0.693– | – | – | 0.661– | – | – | 0.646– | – | – | 0.625– | – | – | 0.578– | – | – | 0.519– | – | – | 0.457– | – | – |
| N4 | 0.930 | 0.905– | 0.877– | 0.865– | 0.850 | 0.727– | – | 0.632– | – | – | 0.559– | – | – | ||||||||||||||
| N5 | 0.843– | – | – | 0.861– | – | – | 0.859– | – | – | 0.852– | – | – | 0.841– | – | 0.827– | 0.751 | 0.641– | 0.530– | – | – | |||||||
| N6 | 0.908– | – | – | 0.894– | – | – | 0.875– | – | 0.857– | – | 0.850– | 0.834– | 0.763 | 0.635– | – | – | 0.526– | – | – | ||||||||
| N7 | 0.861– | – | – | 0.892– | – | – | 0.897– | 0.885– | 0.854– | 0.832– | 0.770 | 0.645– | 0.570– | – | |||||||||||||
| N8 | 0.892– | – | – | 0.904– | – | – | 0.895– | 0.877– | 0.868– | 0.850 | 0.751 | 0.657 | 0.574– | ||||||||||||||
| N9 | 0.880– | – | – | 0.911– | – | 0.895– | 0.875– | 0.860– | 0.832– | 0.753 | 0.665 | 0.585 | |||||||||||||||
| N10 | 0.882– | – | – | 0.896– | – | – | 0.893– | 0.882– | 0.858– | 0.846 | 0.762 | 0.665 | 0.581– | ||||||||||||||
| N11 | 0.900– | – | – | 0.900– | – | – | 0.888– | 0.875– | 0.861– | 0.769 | 0.667 | 0.580– | |||||||||||||||
| N12 | 0.941 | – | – | 0.924 | 0.899– | 0.872– | 0.866– | 0.854 | 0.776 | 0.664 | 0.588 | ||||||||||||||||
| N13 | 0.941 | – | – | 0.910– | – | 0.886– | 0.870– | 0.853– | 0.840 | 0.672 | 0.578– | ||||||||||||||||
| N14 | 0.949 | – | – | 0.867– | 0.845 | 0.759 | 0.667 | 0.587 | |||||||||||||||||||
| N15 | 0.906– | – | – | 0.894– | – | – | 0.897– | 0.884– | 0.866– | 0.851 | 0.776 | 0.672 | 0.584 | ||||||||||||||
| N16 | 0.933– | – | – | 0.901– | – | – | 0.895– | 0.886– | 0.864– | 0.851 | 0.771 | 0.677 | |||||||||||||||
| N17 | 0.943 | – | – | 0.903– | – | – | 0.879– | – | 0.871– | 0.866– | 0.856 | 0.777 | 0.679 | 0.595 | |||||||||||||
| N18 | 0.937– | – | – | 0.892– | – | – | 0.880– | – | 0.870– | 0.864– | 0.854 | 0.778 | 0.595 | ||||||||||||||
| N90 | 0.939 | – | – | 0.911– | – | 0.884– | 0.869– | 0.856– | 0.835– | 0.728– | – | 0.639– | – | 0.572– | |||||||||||||
Figure 8. The average top percentage curves for the imbalanced S. cerevisiae data set.
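A top-percentage measure of this kind can be sketched as follows: proteins are ranked by predicted score, and the fraction of essential proteins among the top-ranked ones is reported. The pool that the percentage refers to is not defined in this section, and the Top 100% values in the table differ between models, which suggests it is not simply the whole set of proteins. The sketch therefore leaves the pool size as a parameter (`base`, a hypothetical name); the scores and labels are made up for illustration.

```python
def top_fraction_essential(scores, labels, top_percent, base=None):
    """Fraction of essential proteins (label 1) among the top-ranked ones.

    base: size of the pool that top_percent refers to. This is an assumption;
    it defaults to the number of ranked proteins.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    if base is None:
        base = len(ranked)
    # Number of top-ranked proteins to inspect, clipped to a valid range.
    k = max(1, min(len(ranked), round(base * top_percent / 100)))
    return sum(label for _, label in ranked[:k]) / k

# Hypothetical prediction scores and true labels (1 = essential).
demo_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
demo_labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

for pct in (20, 50, 100):
    print(pct, top_fraction_essential(demo_scores, demo_labels, pct))

# Same ranking, but percentages taken relative to the number of essential proteins.
print(top_fraction_essential(demo_scores, demo_labels, 100, base=sum(demo_labels)))
```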
Percentage of essential proteins in the imbalanced E. coli experiment.
| Top 5% | Top 10% | Top 15% | Top 20% | Top 25% | Top 30% | Top 50% | Top 75% | Top 100% | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CMIM09 | 0.745* | 0.775* | 0.730* | 0.730* | 0.737* | 0.727* | 0.644* | 0.542* | 0.440* | – | ||||||||||||
| mRMR13 | 0.719–* | 0.725–* | 0.714–* | 0.701–* | – | 0.690–* | – | 0.679–* | – | 0.614–* | – | 0.531–* | 0.446* | |||||||||
| Gustafson(29) | 0.706– | –* | 0.692– | –* | 0.705– | –* | 0.707– | * | 0.695– | * | 0.685– | * | 0.624– | * | 0.522– | –* | 0.436– | – | * | |||
| N4 | 0.610– | – | – | 0.689– | – | – | 0.765 | 0.760 | 0.752 | 0.649 | 0.534– | 0.449 | ||||||||||
| N5 | 0.655– | – | – | 0.705– | – | 0.747 | 0.747 | 0.745 | 0.743 | 0.653 | 0.535– | 0.443 | – | |||||||||
| N6 | 0.719– | 0.723– | – | 0.717– | 0.736 | 0.744 | 0.748 | 0.658 | 0.525– | – | 0.435– | – | – | |||||||||
| N7 | 0.568– | – | – | 0.700– | – | 0.713– | – | 0.730 | 0.748 | 0.655 | 0.540– | |||||||||||
| N8 | 0.671– | – | – | 0.703– | – | 0.723– | 0.736 | 0.728– | 0.731 | 0.652 | 0.535– | 0.449 | ||||||||||
| N9 | 0.754 | 0.734– | 0.726– | 0.548 | 0.459 | |||||||||||||||||
| N10 | 0.794 | 0.751– | 0.728– | 0.732 | 0.741 | 0.745 | 0.668 | 0.463 | ||||||||||||||
| N11 | 0.719– | 0.721– | – | 0.734 | 0.742 | 0.748 | 0.748 | 0.668 | 0.539– | 0.457 | ||||||||||||
| N12 | 0.735– | 0.738– | 0.745 | 0.750 | 0.743 | 0.740 | 0.667 | 0.548 | 0.458 | |||||||||||||
| N13 | 0.655– | – | – | 0.672– | – | – | 0.713– | – | 0.739 | 0.736– | 0.732 | 0.668 | 0.548 | 0.457 | ||||||||
| N80 | 0.674– | – | – | 0.690– | – | – | 0.703– | – | – | 0.705– | – | 0.703– | 0.691– | 0.632– | 0.529– | – | 0.452 | |||||
Figure 9. The average top percentage curves for the imbalanced E. coli data set.
Confidence intervals of performance measures (×100) and informational odds ratios for models produced by the imbalanced S. cerevisiae data set.
| Model | AUC | Precision | Recall | F1 | MCC | IOR |
|---|---|---|---|---|---|---|
| CMIM32 | 82.5 ± 1.2 | 74.4 ± 3.1 | 36.9 ± 4.5 | 49.3 ± 3.8 | 45.0 ± 3.6 | 5.2 ± 0.5 |
| mRMR31 | 82.1 ± 1.6 | 73.8 ± 3.2 | 37.2 ± 4.3 | 49.5 ± 3.6 | 44.9 ± 3.4 | 5.2 ± 0.5 |
| Hwang | 77.5 ± 2.2 | 74.3 ± 3.7 | 34.3 ± 4.3 | 46.9 ± 4.0 | 43.2 ± 3.6 | 5.1 ± 0.4 |
| Acencio | 70.7 ± 3.4 | 67.5 ± 6.3 | 12.1 ± 5.5 | 20.4 ± 7.6 | 22.8 ± 6.0 | 3.7 ± 0.4 |
| N4 | 74.4 ± 2.7 | 78.2 ± 3.7 | 32.7 ± 4.1 | 46.1 ± 4.1 | 43.9 ± 3.5 | 5.3 ± 0.4 |
| N5 | 72.7 ± 3.6 | 74.1 ± 4.1 | 38.7 ± 4.7 | 50.9 ± 4.1 | 46.1 ± 3.8 | 5.3 ± 0.5 |
| N6 | 73.0 ± 3.2 | 75.2 ± 4.2 | 39.5 ± 4.4 | 51.8 ± 3.8 | 47.2 ± 3.6 | 5.5 ± 0.5 |
| N7 | 76.1 ± 2.4 | 76.7 ± 3.7 | 38.6 ± 4.4 | 51.3 ± 3.9 | 47.3 ± 3.6 | 5.5 ± 0.5 |
| N8 | 77.2 ± 2.4 | 75.5 ± 3.4 | 37.1 ± 4.9 | 49.8 ± 4.3 | 45.7 ± 3.9 | 5.3 ± 0.5 |
| N9 | 78.2 ± 2.4 | 74.9 ± 3.4 | 38.2 ± 4.5 | 50.6 ± 3.9 | 46.2 ± 3.6 | 5.4 ± 0.5 |
| N10 | 78.1 ± 2.2 | 75.1 ± 3.5 | 39.9 ± 4.1 | 52.1 ± 3.6 | 47.4 ± 3.5 | 5.5 ± 0.5 |
| N11 | 78.6 ± 2.1 | 75.2 ± 3.2 | 40.2 ± 4.2 | 52.4 ± 3.6 | 47.6 ± 3.4 | 5.5 ± 0.5 |
| N12 | 79.8 ± 2.0 | 75.9 ± 3.2 | 40.9 ± 4.2 | 53.2 ± 3.6 | 48.5 ± 3.4 | 5.7 ± 0.5 |
| N13 | 78.9 ± 1.9 | 74.8 ± 3.2 | 43.3 ± 4.3 | 54.9 ± 3.4 | 49.5 ± 3.4 | 5.8 ± 0.5 |
| N14 | 80.2 ± 1.8 | 74.9 ± 3.2 | 39.7 ± 4.3 | 51.9 ± 3.5 | 47.1 ± 3.4 | 5.5 ± 0.5 |
| N15 | 80.1 ± 1.9 | 76.3 ± 3.3 | 40.6 ± 4.2 | 53.0 ± 3.5 | 48.5 ± 3.5 | 5.7 ± 0.5 |
| N16 | 81.4 ± 1.7 | 76.2 ± 3.2 | 40.1 ± 4.6 | 52.5 ± 3.8 | 48.0 ± 3.6 | 5.6 ± 0.5 |
| N17 | 81.4 ± 1.7 | 76.1 ± 3.3 | 40.7 ± 4.5 | 53.0 ± 3.8 | 48.4 ± 3.6 | 5.7 ± 0.5 |
| N18 | 81.1 ± 1.8 | 75.1 ± 3.2 | 41.1 ± 4.3 | 53.1 ± 3.6 | 48.2 ± 3.5 | 5.6 ± 0.5 |
| N90 | 82.9 ± 1.0 | 73.8 ± 2.8 | 35.5 ± 4.5 | 47.9 ± 3.6 | 43.8 ± 3.4 | 5.1 ± 0.4 |
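Each entry in the table above has the form mean ± half-width. How the intervals were derived is not stated in this section. As a rough illustration only, the sketch below assumes a normal-approximation 95% interval computed over repeated cross-validation runs; the run values are hypothetical.

```python
import math
import statistics

def mean_with_ci(values, z=1.96):
    """Return (mean, half_width) of a normal-approximation confidence interval.

    z = 1.96 corresponds to a 95% interval; this choice is an assumption.
    """
    mean = statistics.mean(values)
    # Standard error of the mean from the sample standard deviation.
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean, half_width

# Hypothetical AUC values from five cross-validation runs.
auc_runs = [0.81, 0.83, 0.82, 0.84, 0.80]
auc_mean, auc_half_width = mean_with_ci(auc_runs)
# Reported on the same x100 scale as the table.
print(f"{100 * auc_mean:.1f} ± {100 * auc_half_width:.1f}")
```

With only a handful of runs, a Student t quantile would give a wider and more cautious interval than the fixed z value used here.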
Confidence intervals of performance measures (×100) and informational odds ratios for models produced by the balanced S. cerevisiae data set.
| Model | AUC | Precision | Recall | F1 | MCC | IOR |
|---|---|---|---|---|---|---|
| CMIM32 | 84.2 ± 1.5 | 77.2 ± 2.2 | 76.6 ± 2.9 | 76.9 ± 2.1 | 54.0 ± 3.9 | 3.3 ± 0.4 |
| mRMR31 | 83.6 ± 1.6 | 76.5 ± 2.4 | 74.1 ± 3.0 | 75.2 ± 2.1 | 51.3 ± 4.1 | 3.0 ± 0.3 |
| Hwang | 82.2 ± 1.7 | 77.8 ± 2.6 | 72.0 ± 3.7 | 74.8 ± 2.3 | 51.6 ± 3.9 | 3.0 ± 0.3 |
| Acencio | 76.8 ± 2.2 | 69.6 ± 2.4 | 73.4 ± 4.0 | 71.4 ± 2.3 | 41.4 ± 4.3 | 2.5 ± 0.3 |
| N4 | 81.1 ± 1.8 | 77.7 ± 2.5 | 71.6 ± 3.6 | 74.5 ± 2.3 | 51.2 ± 3.9 | 3.0 ± 0.3 |
| N5 | 82.4 ± 1.8 | 77.8 ± 2.6 | 73.5 ± 3.5 | 75.6 ± 2.2 | 52.7 ± 4.1 | 3.1 ± 0.3 |
| N6 | 82.7 ± 1.8 | 77.8 ± 2.6 | 73.9 ± 3.6 | 75.8 ± 2.3 | 53.0 ± 4.1 | 3.1 ± 0.3 |
| N7 | 83.1 ± 1.8 | 77.9 ± 2.5 | 73.3 ± 3.5 | 75.5 ± 2.2 | 52.6 ± 4.0 | 3.1 ± 0.3 |
| N8 | 82.6 ± 1.8 | 78.6 ± 2.5 | 72.1 ± 3.5 | 75.2 ± 2.3 | 52.7 ± 4.0 | 3.1 ± 0.3 |
| N9 | 83.3 ± 1.8 | 79.1 ± 2.4 | 73.5 ± 3.4 | 76.2 ± 2.2 | 54.1 ± 3.8 | 3.2 ± 0.3 |
| N10 | 83.4 ± 1.7 | 78.9 ± 2.4 | 73.6 ± 3.3 | 76.1 ± 2.1 | 54.0 ± 3.8 | 3.2 ± 0.3 |
| N11 | 83.1 ± 1.7 | 78.4 ± 2.4 | 73.7 ± 3.5 | 76.0 ± 2.2 | 53.5 ± 3.8 | 3.2 ± 0.3 |
| N12 | 82.9 ± 1.8 | 77.9 ± 2.5 | 73.2 ± 3.1 | 75.5 ± 2.2 | 52.6 ± 4.2 | 3.1 ± 0.3 |
| N13 | 83.4 ± 1.7 | 78.8 ± 2.4 | 73.0 ± 3.5 | 75.8 ± 2.2 | 53.5 ± 3.9 | 3.1 ± 0.3 |
| N14 | 83.6 ± 1.6 | 77.7 ± 2.3 | 74.3 ± 3.4 | 75.9 ± 2.1 | 53.0 ± 3.8 | 3.2 ± 0.3 |
| N15 | 84.3 ± 1.7 | 78.4 ± 2.4 | 74.8 ± 3.3 | 76.6 ± 2.1 | 54.2 ± 3.9 | 3.3 ± 0.4 |
| N16 | 84.2 ± 1.6 | 77.7 ± 2.2 | 75.6 ± 3.1 | 76.7 ± 2.0 | 54.0 ± 3.7 | 3.3 ± 0.4 |
| N17 | 84.7 ± 1.6 | 77.8 ± 2.3 | 76.3 ± 3.0 | 77.0 ± 2.0 | 54.5 ± 3.8 | 3.3 ± 0.4 |
| N18 | 84.0 ± 1.6 | 77.9 ± 2.4 | 74.0 ± 3.3 | 75.9 ± 2.0 | 53.1 ± 3.8 | 3.1 ± 0.3 |
| N90 | 83.9 ± 1.4 | 76.0 ± 2.0 | 75.3 ± 2.7 | 75.7 ± 1.8 | 51.6 ± 3.5 | 3.1 ± 0.3 |
Confidence intervals of performance measures (×100) and informational odds ratios for models produced by the imbalanced E. coli data set.
| Model | AUC | Precision | Recall | F1 | MCC | IOR |
|---|---|---|---|---|---|---|
| CMIM09 | 70.1 ± 0.9 | 72.0 ± 1.4 | 27.1 ± 0.7 | 39.4 ± 0.9 | 38.2 ± 0.9 | 5.2 ± 0.6 |
| mRMR13 | 71.5 ± 2.4 | 71.3 ± 5.4 | 25.0 ± 5.8 | 37.0 ± 6.8 | 36.0 ± 5.8 | 4.7 ± 0.6 |
| Gustafson | 71.1 ± 2.3 | 66.5 ± 4.6 | 25.5 ± 5.0 | 36.8 ± 5.2 | 34.7 ± 4.8 | 4.9 ± 0.6 |
| N4 | 69.1 ± 1.8 | 72.5 ± 1.0 | 28.0 ± 0.6 | 40.4 ± 0.7 | 39.1 ± 0.7 | 5.5 ± 0.6 |
| N5 | 69.0 ± 2.0 | 73.7 ± 1.4 | 29.5 ± 0.8 | 42.1 ± 0.9 | 40.7 ± 1.0 | 5.7 ± 0.6 |
| N6 | 70.1 ± 1.7 | 74.2 ± 1.4 | 28.7 ± 0.9 | 41.4 ± 1.1 | 40.3 ± 1.0 | 5.7 ± 0.6 |
| N7 | 71.4 ± 1.4 | 73.5 ± 1.3 | 27.5 ± 0.7 | 40.0 ± 0.9 | 39.2 ± 0.9 | 5.5 ± 0.6 |
| N8 | 70.5 ± 1.3 | 74.2 ± 1.1 | 28.8 ± 0.8 | 41.5 ± 0.9 | 40.5 ± 0.9 | 5.7 ± 0.6 |
| N9 | 70.7 ± 1.5 | 72.6 ± 1.4 | 29.3 ± 1.0 | 41.7 ± 1.2 | 40.1 ± 1.2 | 5.6 ± 0.6 |
| N10 | 71.1 ± 1.5 | 72.4 ± 1.6 | 29.4 ± 1.0 | 41.8 ± 1.1 | 40.1 ± 1.2 | 5.6 ± 0.6 |
| N11 | 71.4 ± 1.4 | 73.2 ± 1.3 | 27.8 ± 0.8 | 40.3 ± 1.0 | 39.3 ± 1.0 | 5.5 ± 0.6 |
| N12 | 71.2 ± 1.4 | 72.5 ± 1.9 | 29.2 ± 1.1 | 41.6 ± 1.2 | 40.0 ± 1.3 | 5.6 ± 0.6 |
| N13 | 71.4 ± 1.3 | 73.3 ± 1.8 | 28.7 ± 1.2 | 41.3 ± 1.4 | 40.0 ± 1.4 | 5.6 ± 0.6 |
| N80 | 71.6 ± 0.9 | 67.7 ± 2.1 | 23.7 ± 1.3 | 35.2 ± 1.6 | 33.9 ± 1.6 | 4.9 ± 0.6 |
Confidence intervals of performance measures (×100) and informational odds ratios for models produced by the balanced E. coli data set.
| Model | AUC | Precision | Recall | F1 | MCC | IOR |
|---|---|---|---|---|---|---|
| CMIM09 | 76.7 ± 1.7 | 72.0 ± 2.4 | 70.0 ± 3.7 | 71.0 ± 2.2 | 42.1 ± 4.0 | 2.4 ± 0.3 |
| mRMR13 | 76.2 ± 2.0 | 72.8 ± 3.1 | 65.4 ± 6.4 | 68.9 ± 3.3 | 39.6 ± 3.9 | 2.2 ± 0.2 |
| Gustafson | 77.7 ± 2.6 | 72.2 ± 3.3 | 71.5 ± 4.0 | 71.9 ± 2.8 | 44.0 ± 5.5 | 2.6 ± 0.3 |
| N4 | 78.0 ± 1.6 | 73.3 ± 2.5 | 70.1 ± 2.7 | 71.7 ± 1.9 | 44.6 ± 3.8 | 2.6 ± 0.3 |
| N5 | 77.9 ± 1.7 | 73.0 ± 2.5 | 70.6 ± 2.8 | 71.8 ± 1.8 | 44.5 ± 3.8 | 2.6 ± 0.3 |
| N6 | 76.2 ± 1.7 | 73.5 ± 2.6 | 66.3 ± 4.3 | 69.6 ± 2.6 | 42.5 ± 4.1 | 2.4 ± 0.3 |
| N7 | 78.3 ± 1.7 | 73.7 ± 2.5 | 69.6 ± 2.7 | 71.6 ± 1.8 | 44.8 ± 3.6 | 2.6 ± 0.3 |
| N8 | 78.1 ± 1.7 | 72.3 ± 2.3 | 71.1 ± 3.3 | 71.7 ± 2.1 | 43.9 ± 3.8 | 2.5 ± 0.3 |
| N9 | 78.2 ± 1.6 | 71.5 ± 2.2 | 70.3 ± 4.1 | 70.9 ± 2.3 | 42.3 ± 3.8 | 2.4 ± 0.3 |
| N10 | 78.1 ± 1.6 | 72.5 ± 2.4 | 70.2 ± 3.3 | 71.3 ± 2.1 | 43.6 ± 3.9 | 2.5 ± 0.3 |
| N11 | 77.7 ± 1.7 | 71.9 ± 2.2 | 70.0 ± 3.3 | 70.9 ± 2.0 | 42.6 ± 3.6 | 2.5 ± 0.3 |
| N12 | 77.6 ± 1.9 | 71.5 ± 2.3 | 69.5 ± 4.5 | 70.5 ± 2.5 | 41.8 ± 4.0 | 2.4 ± 0.3 |
| N13 | 77.6 ± 1.7 | 73.1 ± 2.4 | 69.5 ± 3.0 | 71.2 ± 2.1 | 43.9 ± 3.9 | 2.5 ± 0.3 |
| N80 | 76.9 ± 1.8 | 71.1 ± 2.4 | 71.5 ± 2.4 | 71.3 ± 1.8 | 42.4 ± 3.8 | 2.5 ± 0.3 |
Performance comparison of our method vs. mRMR for the imbalanced S. cerevisiae data set with the same sizes of feature subsets, where the > symbol represents that the values are significantly higher.
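| Feature subset | AUC (ours) | AUC (mRMR) | Precision (ours) | Precision (mRMR) | Recall (ours) | Recall (mRMR) | F-measure (ours) | F-measure (mRMR) | MCC (ours) | MCC (mRMR) |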
|---|---|---|---|---|---|---|---|---|---|---|
| N4 | 0.744 | 0.762 | 0.782 | 0.756 | 0.327 | 0.331 | 0.461 | 0.461 | 0.439 | 0.430 |
| N5 | 0.727 | 0.718 | 0.741 | 0.740 | 0.387 | 0.359 | 0.509 | 0.484 | 0.461 | 0.442 |
| N6 | 0.730 | 0.753 | 0.752 | 0.753 | 0.395 > | 0.333 | 0.518 > | 0.462 | 0.472 > | 0.430 |
| N7 | 0.761 | 0.763 | 0.767 | 0.761 | 0.386 > | 0.330 | 0.513 > | 0.460 | 0.473 > | 0.431 |
| N8 | 0.772 > | 0.771 | 0.755 | 0.757 | 0.371 > | 0.326 | 0.498 > | 0.456 | 0.457 > | 0.427 |
| N9 | 0.782 | 0.776 | 0.749 | 0.749 | 0.382 > | 0.341 | 0.506 > | 0.469 | 0.462 > | 0.434 |
| N10 | 0.781 > | 0.778 | 0.751 | 0.752 | 0.399 > | 0.340 | 0.521 > | 0.469 | 0.474 > | 0.434 |
| N11 | 0.786 > | 0.774 | 0.752 | 0.750 | 0.402 > | 0.341 | 0.524 > | 0.469 | 0.476 > | 0.434 |
| N12 | 0.798 > | 0.781 | 0.759 | 0.757 | 0.409 > | 0.334 | 0.532 > | 0.463 | 0.485 > | 0.432 |
| N13 | 0.789 > | 0.774 | 0.748 | 0.746 | 0.433 > | 0.342 | 0.549 > | 0.469 | 0.495 > | 0.432 |
| N14 | 0.802 > | 0.775 | 0.749 | 0.750 | 0.397 > | 0.340 | 0.519 > | 0.468 | 0.471 > | 0.433 |
| N15 | 0.801 > | 0.798 | 0.763 | 0.764 | 0.406 > | 0.318 | 0.530 > | 0.449 | 0.485 > | 0.424 |
| N16 | 0.814 > | 0.799 | 0.762 | 0.762 | 0.401 > | 0.318 | 0.525 > | 0.449 | 0.480 > | 0.423 |
| N17 | 0.814 > | 0.799 | 0.761 | 0.759 | 0.407 > | 0.326 | 0.530 > | 0.456 | 0.484 > | 0.427 |
| N18 | 0.811 > | 0.797 | 0.751 | 0.749 | 0.411 > | 0.342 | 0.531 > | 0.469 | 0.482 > | 0.434 |
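Performance comparison of our new method vs. CMIM for the imbalanced S. cerevisiae data set when identical numbers of features are selected.
| Feature subset | AUC (ours) | AUC (CMIM) | Precision (ours) | Precision (CMIM) | Recall (ours) | Recall (CMIM) | F1 (ours) | F1 (CMIM) | MCC (ours) | MCC (CMIM) |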
|---|---|---|---|---|---|---|---|---|---|---|
| N4 | 0.744 | 0.761 | 0.782 | 0.762 | 0.327 | 0.344 | 0.461 | 0.474 | 0.439 | 0.442 |
| N5 | 0.727 | 0.735 | 0.741 | 0.738 | 0.387 | 0.371 | 0.509 | 0.494 | 0.461 | 0.449 |
| N6 | 0.730 | 0.757 | 0.752 | 0.749 | 0.395 | 0.359 | 0.518 | 0.485 | 0.472 | 0.446 |
| N7 | 0.761 | 0.779 | 0.767 | 0.763 | 0.386 > | 0.339 | 0.513 > | 0.470 | 0.473 > | 0.439 |
| N8 | 0.772 | 0.779 | 0.755 | 0.754 | 0.371 > | 0.350 | 0.498 > | 0.478 | 0.457 > | 0.442 |
| N9 | 0.782 | 0.776 | 0.749 | 0.750 | 0.382 > | 0.357 | 0.506 > | 0.483 | 0.462 > | 0.445 |
| N10 | 0.781 | 0.782 | 0.751 | 0.751 | 0.399 > | 0.353 | 0.521 > | 0.480 | 0.474 > | 0.443 |
| N11 | 0.786 | 0.786 | 0.752 | 0.752 | 0.402 > | 0.363 | 0.524 > | 0.490 | 0.476 > | 0.450 |
| N12 | 0.798 | 0.799 | 0.759 | 0.758 | 0.409 > | 0.354 | 0.532 > | 0.483 | 0.485 > | 0.447 |
| N13 | 0.789 | 0.797 | 0.748 | 0.750 | 0.433 > | 0.360 | 0.549 > | 0.487 | 0.495 > | 0.447 |
| N14 | 0.802 > | 0.801 | 0.749 | 0.749 | 0.397 > | 0.348 | 0.519 > | 0.475 | 0.471 > | 0.438 |
| N15 | 0.801 > | 0.797 | 0.763 | 0.760 | 0.406 > | 0.330 | 0.530 > | 0.460 | 0.485 > | 0.430 |
| N16 | 0.814 > | 0.796 | 0.762 | 0.759 | 0.401 > | 0.338 | 0.525 > | 0.468 | 0.480 > | 0.436 |
| N17 | 0.814 > | 0.795 | 0.761 | 0.756 | 0.407 > | 0.339 | 0.530 > | 0.469 | 0.484 > | 0.435 |
| N18 | 0.811 | 0.799 | 0.751 | 0.756 | 0.411 > | 0.338 | 0.531 > | 0.467 | 0.482 > | 0.435 |
Performance comparison of our method vs. mRMR for the balanced S. cerevisiae data set with the same sizes of feature subsets, where the > symbol indicates that the values are significantly higher.
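| Feature subset | AUC (ours) | AUC (mRMR) | Precision (ours) | Precision (mRMR) | Recall (ours) | Recall (mRMR) | F-measure (ours) | F-measure (mRMR) | MCC (ours) | MCC (mRMR) |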
|---|---|---|---|---|---|---|---|---|---|---|
| N4 | 0.811 | 0.815 | 0.777 | 0.770 | 0.716 | 0.725 | 0.745 | 0.747 | 0.512 | 0.510 |
| N5 | 0.824 | 0.818 | 0.778 | 0.771 | 0.735 | 0.722 | 0.756 | 0.745 | 0.527 | 0.508 |
| N6 | 0.827 | 0.814 | 0.778 | 0.775 | 0.739 | 0.709 | 0.758 | 0.740 | 0.530 | 0.504 |
| N7 | 0.831 | 0.824 | 0.779 | 0.779 | 0.733 | 0.718 | 0.755 | 0.747 | 0.526 | 0.516 |
| N8 | 0.826 | 0.827 | 0.786 | 0.781 | 0.721 | 0.721 | 0.752 | 0.750 | 0.527 | 0.521 |
| N9 | 0.833 | 0.834 | 0.791 | 0.783 | 0.735 | 0.734 | 0.762 | 0.758 | 0.541 | 0.531 |
| N10 | 0.834 | 0.835 | 0.789 | 0.783 | 0.736 | 0.733 | 0.761 | 0.757 | 0.540 | 0.531 |
| N11 | 0.831 | 0.834 | 0.784 | 0.780 | 0.737 | 0.730 | 0.760 | 0.754 | 0.535 | 0.525 |
| N12 | 0.829 | 0.834 | 0.779 | 0.778 | 0.732 | 0.734 | 0.755 | 0.755 | 0.526 > | 0.525 |
| N13 | 0.834 | 0.834 | 0.788 | 0.779 | 0.730 | 0.732 | 0.758 | 0.754 | 0.535 | 0.525 |
| N14 | 0.836 | 0.832 | 0.777 | 0.777 | 0.743 | 0.731 | 0.759 | 0.753 | 0.530 | 0.522 |
| N15 | 0.843 | 0.835 | 0.784 | 0.778 | 0.748 | 0.734 | 0.766 | 0.756 | 0.542 | 0.526 |
| N16 | 0.842 > | 0.836 | 0.777 | 0.777 | 0.756 > | 0.735 | 0.767 > | 0.755 | 0.540 > | 0.525 |
| N17 | 0.847 > | 0.834 | 0.778 | 0.777 | 0.763 | 0.733 | 0.770 > | 0.754 | 0.545 > | 0.523 |
| N18 | 0.840 | 0.835 | 0.779 | 0.778 | 0.740 | 0.735 | 0.759 | 0.756 | 0.531 | 0.526 |
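Performance comparison of our method vs. CMIM for the balanced S. cerevisiae data set when identical numbers of features are selected.
| Feature subset | AUC (ours) | AUC (CMIM) | Precision (ours) | Precision (CMIM) | Recall (ours) | Recall (CMIM) | F1 (ours) | F1 (CMIM) | MCC (ours) | MCC (CMIM) |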
|---|---|---|---|---|---|---|---|---|---|---|
| N4 | 0.811 | 0.813 | 0.777 | 0.777 | 0.716 | 0.724 | 0.745 | 0.749 | 0.512 | 0.517 |
| N5 | 0.824 | 0.817 | 0.778 | 0.775 | 0.735 | 0.740 | 0.756 | 0.757 | 0.527 | 0.526 |
| N6 | 0.827 | 0.821 | 0.778 | 0.777 | 0.739 | 0.742 | 0.758 | 0.759 | 0.530 | 0.529 |
| N7 | 0.831 | 0.830 | 0.779 | 0.772 | 0.733 | 0.744 | 0.755 | 0.758 | 0.526 | 0.524 |
| N8 | 0.826 | 0.833 | 0.786 | 0.776 | 0.721 | 0.738 | 0.752 | 0.756 | 0.527 | 0.525 |
| N9 | 0.833 | 0.834 | 0.791 | 0.775 | 0.735 | 0.740 | 0.762 | 0.757 | 0.541 | 0.526 |
| N10 | 0.834 | 0.835 | 0.789 | 0.776 | 0.736 | 0.739 | 0.761 | 0.757 | 0.540 | 0.527 |
| N11 | 0.831 | 0.836 | 0.784 | 0.778 | 0.737 | 0.739 | 0.760 | 0.758 | 0.535 | 0.528 |
| N12 | 0.829 | 0.838 | 0.779 | 0.779 | 0.732 | 0.742 | 0.755 | 0.760 | 0.526 | 0.532 |
| N13 | 0.834 | 0.837 | 0.788 | 0.778 | 0.730 | 0.741 | 0.758 | 0.759 | 0.535 | 0.530 |
| N14 | 0.836 | 0.836 | 0.777 | 0.777 | 0.743 | 0.743 | 0.759 | 0.759 | 0.530 | 0.530 |
| N15 | 0.843 | 0.836 | 0.784 | 0.777 | 0.748 | 0.739 | 0.766 | 0.758 | 0.542 | 0.528 |
| N16 | 0.842 > | 0.837 | 0.777 | 0.776 | 0.756 | 0.741 | 0.767 > | 0.758 | 0.540 > | 0.528 |
| N17 | 0.847 > | 0.838 | 0.778 | 0.777 | 0.763 > | 0.744 | 0.770 > | 0.760 | 0.545 > | 0.531 |
| N18 | 0.840 | 0.837 | 0.779 | 0.778 | 0.740 | 0.746 | 0.759 | 0.762 | 0.531 | 0.533 |
Performance comparison of our method vs. mRMR for the imbalanced E. coli data set when identical numbers of features are selected.
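| Feature subset | AUC (ours) | AUC (mRMR) | Precision (ours) | Precision (mRMR) | Recall (ours) | Recall (mRMR) | F1 (ours) | F1 (mRMR) | MCC (ours) | MCC (mRMR) |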
|---|---|---|---|---|---|---|---|---|---|---|
| N4 | 0.691 | 0.651 | 0.725 | 0.678 | 0.280 | 0.269 | 0.404 | 0.385 | 0.391 | 0.363 |
| N5 | 0.690 | 0.675 | 0.737 | 0.687 | 0.295 | 0.254 | 0.421 | 0.371 | 0.407 | 0.356 |
| N6 | 0.701 | 0.681 | 0.742 | 0.708 | 0.287 | 0.220 | 0.414 | 0.336 | 0.403 | 0.338 |
| N7 | 0.714 | 0.686 | 0.735 | 0.712 | 0.275 | 0.212 | 0.400 | 0.326 | 0.392 | 0.333 |
| N8 | 0.705 | 0.692 | 0.742 | 0.713 | 0.288 | 0.209 | 0.415 | 0.323 | 0.405 | 0.330 |
| N9 | 0.707 | 0.692 | 0.726 | 0.713 | 0.293 > | 0.199 | 0.417 > | 0.312 | 0.401 | 0.322 |
| N10 | 0.711 | 0.697 | 0.724 | 0.703 | 0.294 > | 0.193 | 0.418 > | 0.302 | 0.401 | 0.313 |
| N11 | 0.714 < | 0.702 | 0.732 | 0.697 | 0.278 > | 0.187 | 0.403 | 0.295 | 0.393 | 0.306 |
| N12 | 0.712 < | 0.704 | 0.725 | 0.683 | 0.292 | 0.192 | 0.416 | 0.300 | 0.400 | 0.305 |
| N13 | 0.714 < | 0.715 | 0.733 | 0.713 | 0.287 | 0.250 | 0.413 | 0.370 | 0.400 | 0.360 |
Performance comparison of our method vs. CMIM for the imbalanced E. coli data set when identical numbers of features are selected.
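| Feature subset | AUC (ours) | AUC (CMIM) | Precision (ours) | Precision (CMIM) | Recall (ours) | Recall (CMIM) | F1 (ours) | F1 (CMIM) | MCC (ours) | MCC (CMIM) |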
|---|---|---|---|---|---|---|---|---|---|---|
| N4 | 0.691 > | 0.663 | 0.725 | 0.717 | 0.280 | 0.271 | 0.404 | 0.393 | 0.391 | 0.381 |
| N5 | 0.690 | 0.686 | 0.737 | 0.710 | 0.295 > | 0.264 | 0.421 > | 0.385 | 0.407 > | 0.373 |
| N6 | 0.701 | 0.697 | 0.742 | 0.715 | 0.287 > | 0.265 | 0.414 > | 0.387 | 0.403 > | 0.376 |
| N7 | 0.714 | 0.693 | 0.735 | 0.711 | 0.275 > | 0.261 | 0.400 > | 0.382 | 0.392 > | 0.371 |
| N8 | 0.705 | 0.690 | 0.742 | 0.709 | 0.288 > | 0.254 | 0.415 > | 0.373 | 0.405 > | 0.364 |
| N9 | 0.707 | 0.701 | 0.726 | 0.720 | 0.293 > | 0.271 | 0.417 > | 0.394 | 0.401 > | 0.382 |
| N10 | 0.711 | 0.702 | 0.724 | 0.692 | 0.294 > | 0.248 | 0.418 > | 0.364 | 0.401 > | 0.353 |
| N11 | 0.714 | 0.698 | 0.732 | 0.690 | 0.278 > | 0.247 | 0.403 > | 0.363 | 0.393 > | 0.351 |
| N12 | 0.712 | 0.690 | 0.725 | 0.683 | 0.292 > | 0.239 | 0.416 > | 0.353 | 0.400 > | 0.342 |
| N13 | 0.714 | 0.688 | 0.733 | 0.678 | 0.287 > | 0.236 | 0.413 > | 0.349 | 0.400 > | 0.337 |
Performance comparison of our method vs. mRMR for the balanced E. coli data set when identical numbers of features are selected.
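| Feature subset | AUC (ours) | AUC (mRMR) | Precision (ours) | Precision (mRMR) | Recall (ours) | Recall (mRMR) | F1 (ours) | F1 (mRMR) | MCC (ours) | MCC (mRMR) |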
|---|---|---|---|---|---|---|---|---|---|---|
| N4 | 0.780 > | 0.773 | 0.733 | 0.726 | 0.701 > | 0.651 | 0.717 > | 0.686 | 0.446 > | 0.407 |
| N5 | 0.779 | 0.772 | 0.730 | 0.720 | 0.706 > | 0.654 | 0.718 > | 0.684 | 0.445 > | 0.401 |
| N6 | 0.762 | 0.771 | 0.735 | 0.717 | 0.663 | 0.649 | 0.696 | 0.680 | 0.425 | 0.394 |
| N7 | 0.783 > | 0.768 | 0.737 | 0.716 | 0.696 | 0.649 | 0.716 > | 0.680 | 0.448 > | 0.394 |
| N8 | 0.781 > | 0.764 | 0.723 | 0.715 | 0.711 > | 0.641 | 0.717 > | 0.675 | 0.439 > | 0.387 |
| N9 | 0.782 > | 0.764 | 0.715 | 0.713 | 0.703 | 0.643 | 0.709 > | 0.675 | 0.423 > | 0.386 |
| N10 | 0.781 > | 0.765 | 0.725 | 0.716 | 0.702 > | 0.636 | 0.713 > | 0.673 | 0.436 > | 0.386 |
| N11 | 0.777 > | 0.766 | 0.719 | 0.720 | 0.700 | 0.643 | 0.709 > | 0.678 | 0.426 | 0.394 |
| N12 | 0.776 | 0.765 | 0.715 | 0.714 | 0.695 | 0.643 | 0.705 | 0.676 | 0.418 | 0.388 |
| N13 | 0.776 > | 0.762 | 0.731 | 0.728 | 0.695 > | 0.654 | 0.712 > | 0.689 | 0.439 > | 0.396 |
Performance comparison of our method vs. CMIM for the balanced E. coli data set when identical numbers of features are selected.
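| Feature subset | AUC (ours) | AUC (CMIM) | Precision (ours) | Precision (CMIM) | Recall (ours) | Recall (CMIM) | F1 (ours) | F1 (CMIM) | MCC (ours) | MCC (CMIM) |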
|---|---|---|---|---|---|---|---|---|---|---|
| N4 | 0.780 | 0.769 | 0.733 | 0.719 | 0.701 | 0.696 | 0.717 | 0.707 | 0.446 | 0.424 |
| N5 | 0.779 | 0.771 | 0.730 | 0.715 | 0.706 | 0.696 | 0.718 | 0.705 | 0.445 | 0.419 |
| N6 | 0.762 < | 0.771 | 0.735 | 0.716 | 0.663 | 0.684 | 0.696 | 0.699 | 0.425 | 0.413 |
| N7 | 0.783 | 0.769 | 0.737 | 0.711 | 0.696 | 0.696 | 0.716 | 0.703 | 0.448 | 0.413 |
| N8 | 0.781 | 0.767 | 0.723 | 0.709 | 0.711 | 0.697 | 0.717 | 0.703 | 0.439 | 0.412 |
| N9 | 0.782 | 0.767 | 0.715 | 0.720 | 0.703 | 0.700 | 0.709 | 0.710 | 0.423 | 0.421 |
| N10 | 0.781 | 0.767 | 0.725 | 0.705 | 0.702 | 0.702 | 0.713 | 0.704 | 0.436 | 0.409 |
| N11 | 0.777 | 0.765 | 0.719 | 0.706 | 0.700 | 0.700 | 0.709 | 0.703 | 0.426 | 0.408 |
| N12 | 0.776 | 0.764 | 0.715 | 0.703 | 0.695 | 0.698 | 0.705 | 0.700 | 0.418 | 0.404 |
| N13 | 0.776 | 0.765 | 0.731 | 0.704 | 0.695 | 0.700 | 0.712 | 0.702 | 0.439 | 0.406 |
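
The F1 (F-measure) columns in the comparison tables above can be sanity-checked against the precision and recall columns, since F1 is the harmonic mean of the two. The following is a minimal sketch, not the authors' code: the helper names `f1_score` and `max_abs_gap` are hypothetical, and the four rows are copied from the first column of each pair (taken here to be the proposed method) for the best-F1 feature subset in each table. Because the tabulated values are presumably averaged over cross-validation runs and rounded to three decimals, the recomputed F1 is expected to agree only approximately with the reported one.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (label, precision, recall, reported F1), copied from the tables above.
rows = [
    ("S. cerevisiae imbalanced, N13", 0.748, 0.433, 0.549),
    ("S. cerevisiae balanced, N17", 0.778, 0.763, 0.770),
    ("E. coli imbalanced, N5", 0.737, 0.295, 0.421),
    ("E. coli balanced, N5", 0.730, 0.706, 0.718),
]


def max_abs_gap(table_rows) -> float:
    """Largest absolute difference between recomputed and reported F1."""
    return max(abs(f1_score(p, r) - f) for _, p, r, f in table_rows)


for label, p, r, reported in rows:
    print(f"{label}: recomputed {f1_score(p, r):.3f}, reported {reported:.3f}")
print(f"largest gap: {max_abs_gap(rows):.4f}")
```

A small gap indicates only that the three columns are internally consistent; it says nothing about the significance markers, which come from the statistical tests described in the paper.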