Julio Cesar Dias Lopes¹, Fábio Mendes Dos Santos¹, Andrelly Martins-José¹, Koen Augustyns², Hans De Winter².
Abstract
A new metric for evaluating model performance in virtual screening and quantitative structure-activity relationship applications is described. This metric, termed the power metric, is defined as the ratio of the true positive rate to the sum of the true positive and false positive rates at a given cutoff threshold. Its performance is compared with that of alternative metrics such as the enrichment factor, the relative enrichment factor, the receiver operating curve enrichment factor, the correct classification rate, Matthews correlation coefficient and Cohen's kappa coefficient. The new metric is found to be quite robust with respect to variations in the applied cutoff threshold and in the ratio of the number of active compounds to the total number of compounds, while at the same time being sensitive to variations in model quality. It possesses the characteristics required for application in early-recognition virtual screening problems.
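As a minimal sketch of the definition above, the power metric can be computed from the four confusion-matrix counts at a single cutoff. The function name `power_metric`, its arguments, and the choice to raise an error when TPR + FPR = 0 are illustrative assumptions, not taken from the paper.

```python
def power_metric(tp: int, fp: int, tn: int, fn: int) -> float:
    """Power metric PM = TPR / (TPR + FPR) at a single cutoff threshold."""
    tpr = tp / (tp + fn)  # true positive rate (sensitivity)
    fpr = fp / (fp + tn)  # false positive rate (1 - specificity)
    if tpr + fpr == 0:
        # Assumed convention: an empty selection has no defined PM.
        raise ValueError("PM is undefined when TPR + FPR = 0")
    return tpr / (tpr + fpr)


# Random-like selection of 1,000 of 10,000 compounds containing 100 actives:
# TPR = 10/100 and FPR = 990/9900 are both 0.1, so PM = 0.5.
print(power_metric(tp=10, fp=990, tn=8910, fn=90))
# Perfect selection of the 100 actives: no false positives, so PM = 1.0.
print(power_metric(tp=100, fp=0, tn=9900, fn=0))
```

A model that ranks no better than random has TPR close to FPR and therefore a PM close to 0.5, while a selection without false positives reaches the maximum of 1.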
Keywords: Area under the curve (AUC); Cohen’s kappa coefficient (CKC); Correct classification rate (CCR); Enrichment factor; Matthews correlation coefficient (MCC); Metric; Model performance; Power metric (PM); Receiver operating curve enrichment factor (ROCE); Relative enrichment factor (REF); Virtual screening
Year: 2017 PMID: 28203291 PMCID: PMC5289935 DOI: 10.1186/s13321-016-0189-4
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Fig. 1 Illustration of the relation between cutoff χ and the number of predicted actives/non-actives. Assuming a list of compounds ranked according to their predicted activity values, all compounds located on the left-hand side of χ on this ranked list are predicted to be active, while all compounds located on the right-hand side of χ are predicted to be non-active. All compounds on the left-hand side of χ define the 'selection set'; in this example it includes five compounds. The total number of compounds in the selection set is $N_s$ (here: 5), while the total number of compounds in the entire collection is $N$ (here: 15). The number of true actives in the selection set is $n_s$ (here: 3) and the number of true actives in the entire collection is $n$ (here: 4). Using these abbreviations, the number of true positives is $TP = n_s$, the number of true negatives $TN = N - N_s - n + n_s$, the number of false positives $FP = N_s - n_s$, and the number of false negatives $FN = n - n_s$.
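The bookkeeping in the Fig. 1 caption can be sketched as follows. The helper `confusion_counts` is hypothetical, and the ranking below is an assumed arrangement chosen only to match the caption's numbers (15 compounds, 5 selected, 4 actives, 3 of them selected); the actual order shown in the figure may differ.

```python
def confusion_counts(ranked_is_active: list[bool], n_selected: int) -> dict[str, int]:
    """TP/TN/FP/FN when the top n_selected compounds form the selection set."""
    N = len(ranked_is_active)                    # compounds in the whole collection
    n = sum(ranked_is_active)                    # true actives in the whole collection
    N_s = n_selected                             # compounds in the selection set
    n_s = sum(ranked_is_active[:n_selected])     # true actives in the selection set
    return {
        "TP": n_s,
        "TN": N - N_s - n + n_s,
        "FP": N_s - n_s,
        "FN": n - n_s,
    }


# Hypothetical ranking consistent with Fig. 1: N = 15, N_s = 5, n = 4, n_s = 3.
ranking = [True, True, False, True, False] + [False] * 2 + [True] + [False] * 7
print(confusion_counts(ranking, n_selected=5))
```

For this ranking the sketch gives TP = 3, TN = 9, FP = 2 and FN = 1, which together account for all 15 compounds.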
Fig. 2 Distribution curves of the null and alternative hypotheses. The red area is the area defined by the alternative hypothesis minus the area defined by the null hypothesis or, put differently, the true positive rate (TPR, or 1 − β) minus the false positive rate (FPR, α, or type I error rate); hence 'net power' = 1 − β − α. The cutoff point is defined as the crossing point of the two distributions.
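To make the 'net power' quantity concrete, the sketch below assumes, purely for illustration, that the null and alternative distributions are unit-variance normal distributions centred at 0 and d, so that their densities cross at d/2. The function names and the Gaussian assumption are not from the paper.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def net_power(d: float) -> float:
    """Net power 1 - beta - alpha, assuming null ~ N(0, 1) and alternative ~ N(d, 1).

    The cutoff is placed at the crossing point of the two densities, d / 2.
    """
    cutoff = d / 2.0
    alpha = 1.0 - normal_cdf(cutoff)        # FPR: null scores above the cutoff
    power = 1.0 - normal_cdf(cutoff - d)    # TPR = 1 - beta: alternative scores above the cutoff
    return power - alpha


for d in (0.0, 1.0, 2.0, 4.0):
    print(f"d = {d}: net power = {net_power(d):.3f}")
```

When d = 0 the two distributions coincide and the net power is 0; it approaches 1 as the distributions separate (about 0.68 for d = 2).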
Fig. 3 Visualization of the quality of the test models. In each case, 100 active compounds were distributed in a total set of 1000 compounds according to the distribution defined by Eq. 17. The term 'quality' corresponds to the λ value of Eq. 17. The 'ideal' case was generated by placing all 100 actives in the top 100 positions of the dataset. For each model, the AUC was calculated by numerical integration using the composite trapezoidal rule.
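A minimal sketch of an AUC computed with the composite trapezoidal rule, assuming the model output is already a ranked list of active/inactive labels without tied scores (the function name is hypothetical, and lists without actives or without inactives are not handled). Each step down the list moves the cutoff by one compound and adds one trapezoid under the ROC curve.

```python
def roc_auc_trapezoid(ranked_is_active: list[bool]) -> float:
    """ROC AUC of a ranked list, integrated with the composite trapezoidal rule."""
    n_act = sum(ranked_is_active)
    n_inact = len(ranked_is_active) - n_act
    tp = fp = 0
    tpr_prev = fpr_prev = 0.0
    auc = 0.0
    for is_active in ranked_is_active:
        # Move the cutoff one position down the ranked list.
        if is_active:
            tp += 1
        else:
            fp += 1
        tpr, fpr = tp / n_act, fp / n_inact
        auc += (fpr - fpr_prev) * (tpr + tpr_prev) / 2.0  # area of one trapezoid
        tpr_prev, fpr_prev = tpr, fpr
    return auc


# 'Ideal' case as in Fig. 3: 100 actives at the top of a 1000-compound list.
ideal = [True] * 100 + [False] * 900
print(roc_auc_trapezoid(ideal))
```

The ideal arrangement gives an AUC of 1 (up to floating-point rounding), and placing all actives at the bottom of the list gives 0.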
Dependence of the metrics on the model quality parameter λ, using models generated from datasets with 100 actives (n) among 10,000 compounds in total (N)
| Metric | λ = 2 | λ = 5 | λ = 10 | λ = 20 | λ = 40 | χ (%) |
|---|---|---|---|---|---|---|
| PM | 0.51 ± 0.35 | 0.74 ± 0.24 | 0.89 ± 0.09 | 0.95 ± 0.02 | 0.98 ± 0.01 | 0.5 |
| ROCE | 2.35 ± 2.18 | 5.13 ± 3.39 | 10.46 ± 4.99 | 22.34 ± 7.86 | 49.96 ± 14.35 | |
| EF | 2.28 ± 2.06 | 4.83 ± 3.03 | 9.38 ± 4.04 | 18.08 ± 5.17 | 32.94 ± 6.22 | |
| REF | 2.28 ± 2.06 | 4.83 ± 3.03 | 9.38 ± 4.04 | 18.08 ± 5.17 | 32.94 ± 6.22 | |
| CCR | 0.50 ± 0.01 | 0.51 ± 0.01 | 0.52 ± 0.01 | 0.54 ± 0.01 | 0.58 ± 0.02 | |
| MCC | 0.01 ± 0.01 | 0.03 ± 0.02 | 0.06 ± 0.03 | 0.12 ± 0.04 | 0.23 ± 0.04 | |
| CKC | 0.01 ± 0.01 | 0.03 ± 0.02 | 0.06 ± 0.03 | 0.11 ± 0.03 | 0.21 ± 0.04 | |
| SEN | 0.01 ± 0.01 | 0.02 ± 0.02 | 0.05 ± 0.02 | 0.09 ± 0.03 | 0.16 ± 0.03 | |
| SPE | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | |
| PRE | 0.02 ± 0.02 | 0.05 ± 0.03 | 0.09 ± 0.04 | 0.18 ± 0.05 | 0.33 ± 0.06 | |
| ACC | 0.99 ± 0.00 | 0.99 ± 0.00 | 0.99 ± 0.00 | 0.99 ± 0.00 | 0.99 ± 0.00 | |
| PM | 0.61 ± 0.23 | 0.80 ± 0.11 | 0.90 ± 0.04 | 0.95 ± 0.01 | 0.98 ± 0.00 | 1 |
| ROCE | 2.32 ± 1.55 | 5.07 ± 2.31 | 10.19 ± 3.34 | 20.97 ± 5.08 | 44.00 ± 8.22 | |
| EF | 2.26 ± 1.48 | 4.83 ± 2.09 | 9.25 ± 2.75 | 17.33 ± 3.46 | 30.54 ± 3.95 | |
| REF | 2.26 ± 1.48 | 4.83 ± 2.09 | 9.25 ± 2.75 | 17.33 ± 3.46 | 30.54 ± 3.95 | |
| CCR | 0.51 ± 0.01 | 0.52 ± 0.01 | 0.54 ± 0.01 | 0.58 ± 0.02 | 0.65 ± 0.02 | |
| MCC | 0.01 ± 0.02 | 0.04 ± 0.02 | 0.08 ± 0.03 | 0.17 ± 0.03 | 0.30 ± 0.04 | |
| CKC | 0.01 ± 0.02 | 0.04 ± 0.02 | 0.08 ± 0.03 | 0.17 ± 0.03 | 0.30 ± 0.04 | |
| SEN | 0.02 ± 0.01 | 0.05 ± 0.02 | 0.09 ± 0.03 | 0.17 ± 0.03 | 0.31 ± 0.04 | |
| SPE | 0.99 ± 0.00 | 0.99 ± 0.00 | 0.99 ± 0.00 | 0.99 ± 0.00 | 0.99 ± 0.00 | |
| PRE | 0.02 ± 0.01 | 0.05 ± 0.02 | 0.09 ± 0.03 | 0.17 ± 0.03 | 0.31 ± 0.04 | |
| ACC | 0.98 ± 0.00 | 0.98 ± 0.00 | 0.98 ± 0.00 | 0.98 ± 0.00 | 0.99 ± 0.00 | |
| PM | 0.66 ± 0.13 | 0.82 ± 0.06 | 0.90 ± 0.02 | 0.95 ± 0.01 | 0.97 ± 0.00 | 2 |
| ROCE | 2.30 ± 1.08 | 4.91 ± 1.56 | 9.69 ± 2.18 | 18.75 ± 3.08 | 35.21 ± 4.06 | |
| EF | 2.26 ± 1.03 | 4.70 ± 1.43 | 8.88 ± 1.82 | 15.87 ± 2.19 | 26.17 ± 2.23 | |
| REF | 4.52 ± 2.07 | 9.40 ± 2.85 | 17.76 ± 3.65 | 31.74 ± 4.38 | 52.34 ± 4.45 | |
| CCR | 0.51 ± 0.01 | 0.54 ± 0.01 | 0.58 ± 0.02 | 0.65 ± 0.02 | 0.75 ± 0.02 | |
| MCC | 0.02 ± 0.01 | 0.05 ± 0.02 | 0.11 ± 0.03 | 0.21 ± 0.03 | 0.36 ± 0.03 | |
| CKC | 0.02 ± 0.01 | 0.05 ± 0.02 | 0.11 ± 0.02 | 0.20 ± 0.03 | 0.34 ± 0.03 | |
| SEN | 0.05 ± 0.02 | 0.09 ± 0.03 | 0.18 ± 0.04 | 0.32 ± 0.04 | 0.52 ± 0.04 | |
| SPE | 0.98 ± 0.00 | 0.98 ± 0.00 | 0.98 ± 0.00 | 0.98 ± 0.00 | 0.99 ± 0.00 | |
| PRE | 0.02 ± 0.01 | 0.05 ± 0.01 | 0.09 ± 0.02 | 0.16 ± 0.02 | 0.26 ± 0.02 | |
| ACC | 0.97 ± 0.00 | 0.97 ± 0.00 | 0.97 ± 0.00 | 0.98 ± 0.00 | 0.98 ± 0.00 | |
Metric abbreviations are given in the Methods section. Nearly all metrics depend on model quality: for the ROCE, EF, REF, MCC, CKC, SEN and PRE metrics there is at least a tenfold increase when moving from a bad model (λ = 2) to a good model (λ = 40), whereas the PM value roughly doubles. The accuracy (ACC) and specificity (SPE) metrics do not depend on model quality, while the correct classification rate (CCR) shifts from 0.5 for a bad model to a maximum of 0.75 for the best model. Good models have a PM > 0.9, and for good models this value is largely independent of the applied cutoff value χ (see Table 3 as well)
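The Methods section with the formal definitions is not part of this excerpt, so the sketch below assumes commonly used forms: EF = $(n_s/N_s)/(n/N)$, REF = $100 \cdot n_s / \min(n, N_s)$ and ROCE = TPR/FPR, where $N_s$ is the number of compounds in the selection set and $n_s$ the number of actives in it. These forms reproduce the pattern in the table above: with n = 100 and N = 10,000, EF and REF coincide for χ ≤ 1% ($N_s \le n$), while REF = 2 × EF at χ = 2% ($N_s$ = 200). The function name and the example counts are hypothetical.

```python
import math


def enrichment_metrics(N: int, N_s: int, n: int, n_s: int) -> dict[str, float]:
    """EF, REF and ROCE at one cutoff (assumed definitions, see lead-in).

    N: compounds in the collection, N_s: compounds in the selection set,
    n: actives in the collection, n_s: actives in the selection set.
    """
    ef = (n_s / N_s) / (n / N)          # hit rate in the selection / overall hit rate
    ref = 100.0 * n_s / min(n, N_s)     # % of the best achievable number of selected actives
    if N_s == n_s:
        roce = math.nan                 # no false positives, so TPR / FPR is undefined
    else:
        roce = (n_s / n) / ((N_s - n_s) / (N - n))  # TPR / FPR
    return {"EF": ef, "REF": ref, "ROCE": roce}


# Hypothetical selection: chi = 2% of N = 10,000 (N_s = 200), n = 100, 10 actives retrieved.
print(enrichment_metrics(N=10_000, N_s=200, n=100, n_s=10))
```

Returning NaN for ROCE when every selected compound is active is a design choice of this sketch; it mirrors the case where ROCE cannot be computed because there are no false positives.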
Dependence of the metrics on the cutoff value χ, using models generated from datasets with 250 actives (n) among 10,000 compounds in total (N)
| Metric | χ = 0.5% | χ = 1% | χ = 2.5% | χ = 5% | χ = 10% | λ |
|---|---|---|---|---|---|---|
| PM | 0.52 ± 0.25 | 0.57 ± 0.15 | 0.60 ± 0.08 | 0.60 ± 0.06 | 0.60 ± 0.04 | 1 |
| ROCE | 1.60 ± 1.19 | 1.59 ± 0.81 | 1.58 ± 0.51 | 1.55 ± 0.35 | 1.52 ± 0.23 | |
| EF | 1.54 ± 1.10 | 1.56 ± 0.76 | 1.55 ± 0.48 | 1.53 ± 0.33 | 1.50 ± 0.22 | |
| REF | 3.86 ± 2.75 | 3.89 ± 1.90 | 3.88 ± 1.20 | 7.63 ± 1.65 | 14.97 ± 2.22 | |
| CCR | 0.50 ± 0.00 | 0.50 ± 0.00 | 0.51 ± 0.01 | 0.51 ± 0.01 | 0.53 ± 0.01 | |
| MCC | 0.01 ± 0.01 | 0.01 ± 0.01 | 0.01 ± 0.01 | 0.02 ± 0.01 | 0.03 ± 0.01 | |
| CKC | 0.00 ± 0.01 | 0.01 ± 0.01 | 0.01 ± 0.01 | 0.02 ± 0.01 | 0.02 ± 0.01 | |
| SEN | 0.01 ± 0.01 | 0.02 ± 0.01 | 0.04 ± 0.01 | 0.08 ± 0.02 | 0.15 ± 0.02 | |
| SPE | 1.00 ± 0.00 | 0.99 ± 0.00 | 0.98 ± 0.00 | 0.95 ± 0.00 | 0.90 ± 0.00 | |
| PRE | 0.04 ± 0.03 | 0.04 ± 0.02 | 0.04 ± 0.01 | 0.04 ± 0.01 | 0.04 ± 0.01 | |
| ACC | 0.97 ± 0.00 | 0.97 ± 0.00 | 0.95 ± 0.00 | 0.93 ± 0.00 | 0.88 ± 0.00 | |
| PM | 0.96 ± 0.01 | 0.96 ± 0.01 | 0.96 ± 0.00 | 0.94 ± 0.00 | 0.91 ± 0.00 | 20 |
| ROCE | 28.80 ± 8.24 | 26.73 ± 5.20 | 22.13 ± 2.46 | 16.72 ± 1.11 | 10.49 ± 0.34 | |
| EF | 16.67 ± 2.70 | 16.12 ± 1.85 | 14.44 ± 1.03 | 11.99 ± 0.56 | 8.48 ± 0.22 | |
| REF | 41.68 ± 6.74 | 40.30 ± 4.62 | 36.09 ± 2.56 | 59.97 ± 2.79 | 84.78 ± 2.18 | |
| CCR | 0.54 ± 0.01 | 0.58 ± 0.01 | 0.67 ± 0.01 | 0.78 ± 0.01 | 0.88 ± 0.01 | |
| MCC | 0.18 ± 0.03 | 0.24 ± 0.03 | 0.34 ± 0.03 | 0.40 ± 0.02 | 0.40 ± 0.01 | |
| CKC | 0.13 ± 0.02 | 0.22 ± 0.03 | 0.34 ± 0.03 | 0.38 ± 0.02 | 0.31 ± 0.01 | |
| SEN | 0.08 ± 0.01 | 0.16 ± 0.02 | 0.36 ± 0.03 | 0.60 ± 0.03 | 0.85 ± 0.02 | |
| SPE | 1.00 ± 0.00 | 0.99 ± 0.00 | 0.98 ± 0.00 | 0.96 ± 0.00 | 0.92 ± 0.00 | |
| PRE | 0.42 ± 0.07 | 0.40 ± 0.05 | 0.36 ± 0.03 | 0.30 ± 0.01 | 0.21 ± 0.01 | |
| ACC | 0.97 ± 0.00 | 0.97 ± 0.00 | 0.97 ± 0.00 | 0.96 ± 0.00 | 0.92 ± 0.00 | |
The PM depends only weakly on the applied cutoff value. For good models, the EF and ROCE metrics decrease as the cutoff is increased, while the REF and CCR values increase when the cutoff is raised from 2.5% to 10%; the MCC and CKC values also increase from 2.5% to 5%, but for the good model (λ = 20) they level off (MCC) or decrease again (CKC) at 10%
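The threshold-based metrics in these tables can likewise be sketched from one confusion matrix, assuming the standard definitions (CCR as the mean of sensitivity and specificity, and the usual formulas for MCC and Cohen's kappa). With CCR defined this way, for example, the λ = 20, χ = 10% column gives CCR ≈ (0.85 + 0.92)/2 ≈ 0.88, in line with the table. The function name is hypothetical, and degenerate matrices (for instance an empty selection) are not handled.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Threshold-based metrics from one confusion matrix (standard definitions assumed)."""
    total = tp + fp + tn + fn
    sen = tp / (tp + fn)                     # sensitivity (TPR)
    spe = tn / (tn + fp)                     # specificity (1 - FPR)
    pre = tp / (tp + fp)                     # precision
    acc = (tp + tn) / total                  # accuracy
    ccr = 0.5 * (sen + spe)                  # correct classification rate
    mcc = (tp * tn - fp * fn) / math.sqrt(   # Matthews correlation coefficient
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's kappa: observed agreement (acc) corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total**2
    ckc = (acc - p_chance) / (1.0 - p_chance)
    return {"SEN": sen, "SPE": spe, "PRE": pre, "ACC": acc,
            "CCR": ccr, "MCC": mcc, "CKC": ckc}


# Counts of the Fig. 1 example: TP = 3, FP = 2, TN = 9, FN = 1.
print(classification_metrics(tp=3, fp=2, tn=9, fn=1))
```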
Dependence of the metrics on the R value (the ratio of the number of active compounds to the total number of compounds)
| Metric | R = 0.01 | R = 0.05 | R = 0.2 | χ (%) | λ |
|---|---|---|---|---|---|
| PM | 0.39 ± 0.36 | 0.57 ± 0.15 | 0.62 ± 0.07 | 1 | 1 |
| ROCE | 1.59 ± 1.83 | 1.62 ± 0.85 | 1.73 ± 0.54 | ||
| EF | 1.55 ± 1.75 | 1.54 ± 0.74 | 1.48 ± 0.32 | ||
| REF | 1.55 ± 1.75 | 7.69 ± 3.71 | 29.58 ± 6.38 | ||
| CCR | 0.50 ± 0.01 | 0.50 ± 0.00 | 0.50 ± 0.00 | ||
| MCC | 0.01 ± 0.02 | 0.01 ± 0.02 | 0.02 ± 0.02 | ||
| CKC | 0.01 ± 0.02 | 0.01 ± 0.01 | 0.01 ± 0.01 | ||
| SEN | 0.02 ± 0.02 | 0.02 ± 0.01 | 0.01 ± 0.00 | ||
| SPE | 0.99 ± 0.00 | 0.99 ± 0.00 | 0.99 ± 0.00 | ||
| PRE | 0.02 ± 0.02 | 0.08 ± 0.04 | 0.30 ± 0.06 | ||
| ACC | 0.98 ± 0.00 | 0.94 ± 0.00 | 0.80 ± 0.00 | ||
| PM | 0.58 ± 0.09 | 0.60 ± 0.04 | 0.62 ± 0.02 | 10 | 1 |
| ROCE | 1.50 ± 0.51 | 1.53 ± 0.24 | 1.62 ± 0.15 | ||
| EF | 1.49 ± 0.49 | 1.49 ± 0.22 | 1.44 ± 0.09 | ||
| REF | 14.88 ± 4.95 | 14.88 ± 2.16 | 28.73 ± 1.87 | ||
| CCR | 0.52 ± 0.03 | 0.53 ± 0.01 | 0.53 ± 0.01 | ||
| MCC | 0.02 ± 0.02 | 0.04 ± 0.02 | 0.07 ± 0.02 | ||
| CKC | 0.01 ± 0.01 | 0.03 ± 0.02 | 0.07 ± 0.01 | ||
| SEN | 0.15 ± 0.05 | 0.15 ± 0.02 | 0.14 ± 0.01 | ||
| SPE | 0.90 ± 0.00 | 0.90 ± 0.00 | 0.91 ± 0.00 | ||
| PRE | 0.01 ± 0.00 | 0.07 ± 0.01 | 0.29 ± 0.02 | ||
| ACC | 0.89 ± 0.00 | 0.86 ± 0.00 | 0.76 ± 0.00 | ||
| PM | 0.95 ± 0.02 | 0.98 ± 0.01 | 1.00 ± 0.00 | 1 | 20 |
| ROCE | 21.06 ± 7.29 | 46.82 ± 15.58 | n/aᵃ | ||
| EF | 17.24 ± 4.92 | 13.94 ± 1.27 | 5.00 ± 0.00 | ||
| REF | 17.24 ± 4.92 | 69.71 ± 6.35 | 100.00 ± 0.00 | ||
| CCR | 0.58 ± 0.02 | 0.57 ± 0.01 | 0.53 ± 0.00 | ||
| MCC | 0.16 ± 0.05 | 0.30 ± 0.03 | 0.20 ± 0.00 | ||
| CKC | 0.16 ± 0.05 | 0.22 ± 0.02 | 0.08 ± 0.00 | ||
| SEN | 0.17 ± 0.05 | 0.14 ± 0.01 | 0.05 ± 0.00 | ||
| SPE | 0.99 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | ||
| PRE | 0.17 ± 0.05 | 0.70 ± 0.06 | 1.00 ± 0.00 | ||
| ACC | 0.98 ± 0.00 | 0.95 ± 0.00 | 0.81 ± 0.00 | ||
| PM | 0.90 ± 0.01 | 0.93 ± 0.00 | 1.00 ± 0.00 | 10 | 20 |
| ROCE | 9.30 ± 0.57 | 13.38 ± 0.59 | 1612.74 ± 529.71 | ||
| EF | 8.58 ± 0.49 | 8.26 ± 0.22 | 4.99 ± 0.01 | ||
| REF | 85.82 ± 4.86 | 82.60 ± 2.15 | 99.84 ± 0.18 | ||
| CCR | 0.88 ± 0.02 | 0.88 ± 0.01 | 0.75 ± 0.00 | ||
| MCC | 0.25 ± 0.02 | 0.56 ± 0.02 | 0.67 ± 0.00 | ||
| CKC | 0.14 ± 0.01 | 0.52 ± 0.02 | 0.61 ± 0.00 | ||
| SEN | 0.86 ± 0.05 | 0.83 ± 0.02 | 0.50 ± 0.00 | ||
| SPE | 0.91 ± 0.00 | 0.94 ± 0.00 | 1.00 ± 0.00 | ||
| PRE | 0.09 ± 0.00 | 0.41 ± 0.01 | 1.00 ± 0.00 | ||
| ACC | 0.91 ± 0.00 | 0.93 ± 0.00 | 0.90 ± 0.00 | ||
For bad model quality (λ = 1), the metrics most sensitive to variations in the R value are REF, PRE and ACC, and also CKC at the large cutoff value of χ = 10%. This dependency is less pronounced for the PM metric, except when a very bad model is combined with a low cutoff value (χ = 1%). For better model quality (λ = 20), significant dependencies are observed for the ROCE, EF, REF, MCC, CKC, SEN, PRE and ACC metrics, while the PM, CCR and SPE metrics are more stable. The metric least sensitive to variations in the R value, irrespective of the underlying model quality or cutoff threshold, is the CCR metric
ᵃ In this case the ROCE metric could not be calculated from Eq. 10, since $(N_s - n_s)$ is equal to 0
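The saturation of EF at about 5 for R = 0.2 in the table above follows from a simple upper bound: since $n_s$ can exceed neither $n$ nor $N_s$, $EF = (n_s/N_s)/(n/N) \le \min(N/n, N/N_s) = \min(1/R, 1/\chi)$. The sketch below assumes N = 10,000 as in the other tables, so that R = 0.01, 0.05 and 0.2 correspond to n = 100, 500 and 2,000; the helper name is hypothetical.

```python
def max_enrichment_factor(N: int, N_s: int, n: int) -> float:
    """Largest attainable EF: the selection holds as many actives as possible."""
    best_n_s = min(n, N_s)   # cannot select more actives than exist or than fit
    return (best_n_s / N_s) / (n / N)


N = 10_000
for n in (100, 500, 2_000):          # assumed: R = n / N = 0.01, 0.05, 0.2
    for chi in (0.01, 0.10):         # cutoffs used in the R-value table
        N_s = round(chi * N)
        print(f"R = {n / N:.2f}, chi = {chi:.0%}: "
              f"max EF = {max_enrichment_factor(N, N_s, n):.2f}")
```

For R = 0.2 the bound is 5 at both cutoffs, which is why even the good model cannot push EF higher, whereas REF, which is normalized by $\min(n, N_s)$, still reaches 100.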
Fig. 4 Comparison of the power metric with the five other main metrics (CCR, ROCE, MCC, REF and CKC) using a model dataset of 250 active compounds among a total of 10,000. The logarithm of the quality parameter λ is varied along the abscissa [a log(λ) of 2 corresponds to a quality λ of 100], while the applied cutoff threshold χ is varied along the ordinate. The black dotted line at a cutoff value χ of 2.5% marks the boundary of 250 compounds out of 10,000. In a perfect model, all 250 active compounds would be located on the top side of this boundary