Felipe Restrepo¹, Namrata Mali², Alan Abrahams³, Peter Ractham⁴.
Abstract
Conventional binary classification performance metrics evaluate either general measures (accuracy, F score) or specific aspects (precision, recall) of a model's classifying ability. As such, these metrics, derived from the model's confusion matrix, provide crucial insight into classifier-data interactions. However, modern-day computational capabilities have allowed for the creation of increasingly complex models that share nearly identical classification performance. While traditional performance metrics remain essential indicators of a classifier's individual capabilities, their ability to differentiate between models is limited. In this paper, we present the methodology for MARS (Method for Assessing Relative Sensitivity/Specificity) ShineThrough and MARS Occlusion scores, two novel binary classification performance metrics designed to quantify the distinctiveness of a classifier's predictive successes and failures relative to alternative classifiers. Being able to quantitatively express classifier uniqueness adds a novel classifier-classifier layer to the process of model evaluation and could improve decision making in ensemble model selection. By calculating both conventional performance measures and the proposed MARS metrics for a simple classifier prediction dataset, we demonstrate that the proposed metrics' informational strengths synergize well with those of traditional metrics, delivering insight complementary to that of conventional metrics.
Keywords: Binary classification; Classifier comparative uniqueness; Classifier performance evaluation; Classifier selection optimization; Machine learning
Year: 2022 PMID: 35967970 PMCID: PMC9350436 DOI: 10.12688/f1000research.110567.2
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Figure 1. Format of a conventional classifier confusion matrix.
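The conventional metrics named in the abstract all derive from the four cells of this confusion matrix. As a minimal Python sketch (not the authors' code), the helpers below tally true/false positives and negatives for 0/1 labels and derive accuracy, precision, recall and F1. The function names are illustrative, and the example data assume that the first and last rows of the sample prediction matrix hold C1's predictions and the true labels, respectively.

```python
from typing import NamedTuple, Sequence


class ConfusionCounts(NamedTuple):
    tp: int
    fp: int
    fn: int
    tn: int


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionCounts:
    """Tally the four cells of a binary confusion matrix (1 = target class)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return ConfusionCounts(tp, fp, fn, tn)


def conventional_metrics(c: ConfusionCounts) -> dict:
    """Accuracy, precision, recall and F1 derived from the confusion-matrix cells."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Assumed: true labels and C1 predictions from the sample prediction matrix.
y_true = [0, 1, 0, 1, 0, 1, 1, 1, 0, 1]
c1 = [1, 0, 0, 0, 1, 1, 1, 1, 0, 0]
counts = confusion_counts(y_true, c1)
print(counts)  # ConfusionCounts(tp=3, fp=2, fn=3, tn=2)
print(conventional_metrics(counts))
```

For C1 this yields accuracy 0.50, precision 0.60 and recall 0.50, matching the first data row of the Traditional vs MARS Metrics table.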
Glossary of symbols used.
| Symbol | Definition |
|---|---|
| | Observation number |
| | Classifier number |
| | Total number of observations |
| | Predicted class label for observation |
| | True class label for observation |
| | Set of classifiers |
| | Classifier of interest |
| | Classifier |
| | Constant defined in |
| | Constant defined in |
| | Total number of unique true positives across all classifiers |
| | Exclusive true positives found by classifier |
| | Exclusive false negatives for classifier |
Sample classifier prediction matrix.
| Observation ID | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| C1 prediction | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 |
| C2 prediction | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| C3 prediction | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| C4 prediction | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
| True class | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 |
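The exclusivity notions behind the glossary entries ("exclusive true positives", "exclusive false negatives") can be checked directly on this matrix. The sketch below assumes that the rows, top to bottom, are the predictions of C1-C4 followed by the true labels. It reads an exclusive true positive of classifier k as a positive flagged by k and by no other classifier, and an exclusive false negative as a positive missed by k but flagged by every other classifier; both readings are our interpretation, and the helper names are hypothetical.

```python
# Sample prediction matrix (assumed row order: C1-C4, then the true labels).
PREDICTIONS = {
    "C1": [1, 0, 0, 0, 1, 1, 1, 1, 0, 0],
    "C2": [1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    "C3": [0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
    "C4": [0, 1, 1, 1, 0, 0, 1, 0, 0, 1],
}
Y_TRUE = [0, 1, 0, 1, 0, 1, 1, 1, 0, 1]


def exclusive_true_positives(preds: dict, y_true: list, k: str) -> list:
    """1-based IDs of positives that classifier k flags and no other classifier does."""
    others = [p for name, p in preds.items() if name != k]
    return [
        i + 1
        for i, y in enumerate(y_true)
        if y == 1 and preds[k][i] == 1 and all(o[i] == 0 for o in others)
    ]


def exclusive_false_negatives(preds: dict, y_true: list, k: str) -> list:
    """1-based IDs of positives that k misses while every other classifier flags them."""
    others = [p for name, p in preds.items() if name != k]
    return [
        i + 1
        for i, y in enumerate(y_true)
        if y == 1 and preds[k][i] == 0 and all(o[i] == 1 for o in others)
    ]


for name in PREDICTIONS:
    print(name,
          exclusive_true_positives(PREDICTIONS, Y_TRUE, name),
          exclusive_false_negatives(PREDICTIONS, Y_TRUE, name))
```

Under these readings, C1 alone finds observations 6 and 8 and alone misses observation 2, C4 alone finds observation 10, and C2 and C3 have no exclusive hits or misses.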
Sample ShineThrough calculations for C1. $Z_i$: constant defined for observation $i$.
| Observation ($i$) | Pred. class ($\hat{y}_{i,1}$) | True class ($y_i$) | $Z_i$ | Inner sum |
|---|---|---|---|---|
| 1 | 1 | 0 | 0 | (1 × 0) − max (1 × 0, 0 × 0, 0 × 0) × 0 = 0 |
| 2 | 0 | 1 | 0 | (0 × 1) − max (1 × 1, 1 × 1, 1 × 1) × 0 = 0 |
| 3 | 0 | 0 | 0 | (0 × 0) − max (1 × 0, 0 × 0, 1 × 0) × 0 = 0 |
| 4 | 0 | 1 | 0 | (0 × 1) − max (1 × 1, 0 × 1, 1 × 1) × 0 = 0 |
| 5 | 1 | 0 | 0 | (1 × 0) − max (0 × 0, 1 × 0, 0 × 0) × 0 = 0 |
| 6 | 1 | 1 | 1 | (1 × 1) − max (0 × 1, 0 × 1, 0 × 1) × 1 = 1 |
| 7 | 1 | 1 | 1 | (1 × 1) − max (0 × 1, 0 × 1, 1 × 1) × 1 = 0 |
| 8 | 1 | 1 | 1 | (1 × 1) − max (0 × 1, 0 × 1, 0 × 1) × 1 = 1 |
| 9 | 0 | 0 | 0 | (0 × 0) − max (1 × 0, 1 × 0, 0 × 0) × 0 = 0 |
| 10 | 0 | 1 | 0 | (0 × 1) − max (0 × 1, 0 × 1, 1 × 1) × 0 = 0 |
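The inner-sum column of the worked ShineThrough table can be reproduced programmatically. This is a sketch under two assumptions inferred from the worked table rather than stated in this section: $Z_i$ equals $\hat{y}_{i,k} \times y_i$ (in the table it is 1 exactly in the rows where C1 has a true positive), and the ShineThrough score divides the summed inner terms by the total number of unique true positives across all classifiers, a quantity listed in the glossary. The helper names are hypothetical, and the matrix rows are assumed to be C1-C4 followed by the true labels.

```python
# Sample prediction matrix (assumed row order: C1-C4, then the true labels).
PREDICTIONS = {
    "C1": [1, 0, 0, 0, 1, 1, 1, 1, 0, 0],
    "C2": [1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    "C3": [0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
    "C4": [0, 1, 1, 1, 0, 0, 1, 0, 0, 1],
}
Y_TRUE = [0, 1, 0, 1, 0, 1, 1, 1, 0, 1]


def shinethrough_terms(preds: dict, y_true: list, k: str) -> list:
    """Per-observation inner-sum terms for classifier k, mirroring the worked table."""
    others = [p for name, p in preds.items() if name != k]
    terms = []
    for i, y in enumerate(y_true):
        own = preds[k][i] * y                       # y_hat_{i,k} x y_i
        best_other = max(o[i] * y for o in others)  # max over j != k of y_hat_{i,j} x y_i
        z = own                                     # ASSUMPTION: Z_i = y_hat_{i,k} x y_i
        terms.append(own - best_other * z)
    return terms


def unique_true_positives(preds: dict, y_true: list) -> int:
    """Number of positives flagged by at least one classifier."""
    return sum(
        1 for i, y in enumerate(y_true)
        if y == 1 and any(p[i] == 1 for p in preds.values())
    )


def shinethrough_score(preds: dict, y_true: list, k: str) -> float:
    """ASSUMED normalisation: summed inner terms / unique true positives."""
    total = unique_true_positives(preds, y_true)
    return sum(shinethrough_terms(preds, y_true, k)) / total if total else 0.0


print(shinethrough_terms(PREDICTIONS, Y_TRUE, "C1"))  # [0, 0, 0, 0, 0, 1, 0, 1, 0, 0]
for name in PREDICTIONS:
    print(name, round(shinethrough_score(PREDICTIONS, Y_TRUE, name), 2))
```

With these assumptions the C1 terms sum to 2 (observations 6 and 8) out of 6 unique true positives, giving 2/6 ≈ 0.33, while C4 scores 1/6 ≈ 0.17 and C2 and C3 score 0.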
Figure 2. MARS ShineThrough Chart, comparing the count (represented by bubble radius) of target-class observations (true positives) exclusively spotted by classifiers C1-4 and the pairwise classifier combinations.
Bubble size is proportional to ShineThrough score: the larger the bubble, the higher the ShineThrough score of the classifier(s).
Figure 3. MARS Occlusion Chart, comparing the count (represented by bubble radius) of target-class observations (false negatives) exclusively missed by classifiers C1-4 and the pairwise classifier combinations.
Bubble size is proportional to Occlusion score: the larger the bubble, the higher the Occlusion score of the classifier(s).
Figure 4. MARS ShineThrough Bar Chart, comparing the count of target-class observations exclusively found by classifiers C1-4.
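These charts extend exclusivity from single classifiers to pairwise combinations. One way to compute such counts, assumed here rather than taken from the paper, is to record for each target-class observation the exact set of classifiers that flag it: a group's ShineThrough-style count is then the number of positives flagged by exactly that group, and its Occlusion-style count is the number of positives missed by exactly that group. The sketch assumes the rows of the sample prediction matrix are C1-C4 followed by the true labels; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Sample prediction matrix (assumed row order: C1-C4, then the true labels).
PREDICTIONS = {
    "C1": [1, 0, 0, 0, 1, 1, 1, 1, 0, 0],
    "C2": [1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    "C3": [0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
    "C4": [0, 1, 1, 1, 0, 0, 1, 0, 0, 1],
}
Y_TRUE = [0, 1, 0, 1, 0, 1, 1, 1, 0, 1]


def finder_sets(preds: dict, y_true: list) -> dict:
    """Map each positive observation (1-based ID) to the exact set of classifiers flagging it."""
    return {
        i + 1: frozenset(name for name, p in preds.items() if p[i] == 1)
        for i, y in enumerate(y_true)
        if y == 1
    }


def group_exclusive_counts(preds: dict, y_true: list, max_size: int = 2) -> list:
    """(group, exclusively found, exclusively missed) for all groups of up to max_size classifiers."""
    names = sorted(preds)
    finders = list(finder_sets(preds, y_true).values())
    found_by = Counter(finders)                                  # flagged by exactly this group
    missed_by = Counter(frozenset(names) - s for s in finders)  # missed by exactly this group
    rows = []
    for size in range(1, max_size + 1):
        for group in combinations(names, size):
            key = frozenset(group)
            rows.append(("+".join(group), found_by[key], missed_by[key]))
    return rows


for group, found, missed in group_exclusive_counts(PREDICTIONS, Y_TRUE):
    print(f"{group:6} found={found} missed={missed}")
```

Because every positive has exactly one finder set, these counts partition the positives like the regions of a Venn diagram. For the sample data, C1 has 2 exclusive hits and 1 exclusive miss, and the pair C1+C4 jointly and exclusively finds one positive (observation 7).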
Traditional vs MARS Metrics for the worked example (ST: MARS ShineThrough; OCC: MARS Occlusion).
| Classifier | Accuracy | Precision | Recall | ST | OCC |
|---|---|---|---|---|---|
| | 0.50 | 0.60 | 0.50 | | |
| | 0.20 | 0.40 | 0.33 | 0.0 | 0.0 |
| | 0.30 | 0.33 | 0.16 | 0.0 | 0.0 |
| | | | | 0.16 | 0.0 |
For brevity, we show only arbitrarily selected classifier combinations here rather than all possible combinations. For each metric, the best-performing individual classifier and the best-performing classifier combination are shown in bold.