André M Carrington, Paul W Fieguth, Hammad Qazi, Andreas Holzinger, Helen H Chen, Franz Mayr, Douglas G Manuel.
Abstract
BACKGROUND: In classification and diagnostic testing, the receiver operating characteristic (ROC) plot and the area under the ROC curve (AUC) describe how an adjustable threshold causes changes in two types of error: false positives and false negatives. With imbalanced data, however, only part of the ROC curve and AUC is informative. Hence, alternatives to the AUC have been proposed, such as the partial AUC and the area under the precision-recall curve. However, these alternatives cannot be interpreted as fully as the AUC, in part because they ignore some information about actual negatives.
Keywords: Area under the ROC curve; C statistic; Classification; Concordance; Diagnostic testing; Explainable artificial intelligence; Imbalanced data; Partial area index; Prevalence; Receiver operating characteristic
Year: 2020 PMID: 31906931 PMCID: PMC6945414 DOI: 10.1186/s12911-019-1014-6
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 The partial AUC versus our proposed concordant partial AUC. a The partial AUC (pAUC) provides a vertical perspective that represents the average TPR for part of the ROC curve (thick line) multiplied by the horizontal width. b The concordant partial AUC (pAUCc) combines vertical and horizontal perspectives and equals the partial c statistic
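As a rough illustration of panel a, the sketch below computes a partial AUC by trapezoidal integration over an FPR range and then recovers the average TPR by dividing by the range width. The function name `partial_auc` and the ROC points are hypothetical, chosen only for illustration; they are not taken from the paper or from any library.

```python
def partial_auc(fpr, tpr, lo, hi):
    """Area under a piecewise-linear ROC curve for FPR in [lo, hi].

    fpr and tpr list the ROC points in curve order (both non-decreasing).
    """
    area = 0.0
    for x0, y0, x1, y1 in zip(fpr, tpr, fpr[1:], tpr[1:]):
        a, b = max(x0, lo), min(x1, hi)   # overlap of this segment with [lo, hi]
        if b <= a:
            continue                      # vertical segment or no overlap
        slope = (y1 - y0) / (x1 - x0)
        ya = y0 + slope * (a - x0)        # interpolated TPR at the clipped ends
        yb = y0 + slope * (b - x0)
        area += 0.5 * (ya + yb) * (b - a)
    return area


# Hypothetical ROC points, for illustration only.
fpr = [0.0, 0.0, 0.25, 0.5, 1.0]
tpr = [0.0, 0.5, 0.75, 1.0, 1.0]

lo, hi = 0.0, 0.25
pauc = partial_auc(fpr, tpr, lo, hi)
avg_tpr = pauc / (hi - lo)                # average TPR over the FPR range
print(pauc, avg_tpr)                      # prints 0.15625 0.625
```

Clipping each segment to the requested range means boundaries that fall between ROC points are handled by linear interpolation, and vertical segments contribute no area.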
Fig. 2 An overview of our proposed measures and concepts (red). For a set of partial ROC curves which span the whole curve, without overlap, the sum ∑ of partial measures/concepts equals the whole measure; and the continuous ROC/AUC concepts equal their discrete c statistic and concordance matrix counterparts
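The additivity described in this caption can be checked numerically. The sketch below splits a made-up ROC curve into two non-overlapping parts and sums three partial measures: the vertical partial area, a horizontal partial area (taken here as the area between the curve and the right edge FPR = 1, measured along TPR), and their simple average. Treating the concordant partial AUC as that average is an assumption of this sketch; the captions say only that it "combines" the two perspectives. All names and data are hypothetical.

```python
def area_under(xs, ys, lo, hi):
    """Trapezoidal area under a piecewise-linear curve for x in [lo, hi]."""
    area = 0.0
    for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]):
        a, b = max(x0, lo), min(x1, hi)
        if b <= a:
            continue                      # zero-width segment or no overlap
        slope = (y1 - y0) / (x1 - x0)
        area += 0.5 * (2 * y0 + slope * (a + b - 2 * x0)) * (b - a)
    return area


# Hypothetical ROC points, for illustration only.
fpr = [0.0, 0.0, 0.25, 0.5, 1.0]
tpr = [0.0, 0.5, 0.75, 1.0, 1.0]

# Two parts that span the curve without overlap, split at the point (0.25, 0.75).
parts = [
    {"fpr": (0.0, 0.25), "tpr": (0.0, 0.75)},
    {"fpr": (0.25, 1.0), "tpr": (0.75, 1.0)},
]

flipped = [1.0 - x for x in fpr]          # horizontal view: 1 - FPR as a function of TPR

whole = area_under(fpr, tpr, 0.0, 1.0)
vertical = [area_under(fpr, tpr, *p["fpr"]) for p in parts]
horizontal = [area_under(tpr, flipped, *p["tpr"]) for p in parts]
# ASSUMPTION: the "combined" measure is the plain average of the two areas.
concordant = [0.5 * (v + h) for v, h in zip(vertical, horizontal)]

print(whole, sum(vertical), sum(horizontal), sum(concordant))
```

With these points all four printed values agree, which is the behaviour the caption describes for partial measures that tile the whole curve.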
An overview of definitions for proposed measures and concepts, each detailed in the section of the same name that follows
1. The horizontal partial area under the curve (a section that follows). This partial area denoted
2. The concordant partial area under the curve (a section that follows). This partial area denoted
3. The concordance matrix for ROC data (a section that follows). A matrix that depicts the exact relationship between the unique scores of positives and negatives in data and their corresponding points along a matrix border that exactly matches the (empirical) ROC curve. It geometrically and procedurally equates area measures
4. The partial c statistic (a section that follows). This statistic denoted
Fig. 4 The concordance matrix and ROC plot. a The proposed concordance matrix visualizes how the c statistic is computed: as the proportion of correctly ranked pairs (green) out of all pairs. b The empirical ROC plot (above) equals the border in the concordance matrix (left), visualizing the known equivalence between the c statistic and the AUC
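The equivalence in this caption can be illustrated with a small sketch: build a matrix with one row per positive and one column per negative, mark each correctly ranked pair, and compare the resulting proportion with the trapezoidal area under the empirical ROC curve. The scores are made up, the function names are hypothetical, and counting a tied pair as half is an assumption of this sketch (a common convention), not something stated in the caption.

```python
def concordance_matrix(pos_scores, neg_scores):
    """One row per positive, one column per negative.

    Cell is 1.0 if the positive scores higher, 0.5 on a tie, 0.0 otherwise.
    """
    return [[1.0 if p > n else 0.5 if p == n else 0.0 for n in neg_scores]
            for p in pos_scores]


def c_statistic(pos_scores, neg_scores):
    """Proportion of correctly ranked (positive, negative) pairs."""
    matrix = concordance_matrix(pos_scores, neg_scores)
    return sum(map(sum, matrix)) / (len(pos_scores) * len(neg_scores))


def empirical_auc(pos_scores, neg_scores):
    """Trapezoidal area under the empirical ROC curve."""
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; each unique score yields one ROC point.
    for t in sorted(set(pos_scores) | set(neg_scores), reverse=True):
        fpr = sum(n >= t for n in neg_scores) / len(neg_scores)
        tpr = sum(p >= t for p in pos_scores) / len(pos_scores)
        points.append((fpr, tpr))
    return sum(0.5 * (y0 + y1) * (x1 - x0)
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical classifier scores, including one tie at 0.6.
pos = [0.9, 0.8, 0.6, 0.4]
neg = [0.7, 0.6, 0.3, 0.2, 0.1]

c = c_statistic(pos, neg)
auc = empirical_auc(pos, neg)
print(c, auc)
```

Here 16.5 of the 20 pairs are concordant (the tie contributes 0.5), so both quantities come out at about 0.825.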
Fig. 3 To integrate horizontally, perform two simple transformations (swap the axes, flip the new vertical) and then integrate normally (vertically). The two transformations have the same effect as a 90 degree clockwise rotation
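A minimal sketch of the two transformations follows. It maps each ROC point (FPR, TPR) to (TPR, 1 − FPR), integrates the transformed curve with an ordinary vertical trapezoid rule, and compares the result with a direct strip-by-strip sum along the TPR axis. Interpreting the horizontal area as the region between the curve and the right edge FPR = 1 is an assumption of this sketch, and the function names and data are hypothetical.

```python
def integrate_vertically(xs, ys):
    """Ordinary trapezoidal area under a piecewise-linear curve (whole range)."""
    return sum(0.5 * (y0 + y1) * (x1 - x0)
               for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]))


def swap_and_flip(fpr, tpr):
    """Swap the axes, then flip the new vertical axis: (x, y) -> (y, 1 - x)."""
    new_x = list(tpr)                     # TPR becomes the horizontal axis
    new_y = [1.0 - x for x in fpr]        # FPR, flipped, becomes the vertical axis
    return new_x, new_y


def integrate_horizontally_direct(fpr, tpr):
    """Area between the curve and the edge FPR = 1, summed in strips along TPR."""
    return sum(0.5 * ((1.0 - x0) + (1.0 - x1)) * (y1 - y0)
               for x0, y0, x1, y1 in zip(fpr, tpr, fpr[1:], tpr[1:]))


# Hypothetical ROC points, for illustration only.
fpr = [0.0, 0.0, 0.25, 0.5, 1.0]
tpr = [0.0, 0.5, 0.75, 1.0, 1.0]

new_x, new_y = swap_and_flip(fpr, tpr)
via_transform = integrate_vertically(new_x, new_y)
direct = integrate_horizontally_direct(fpr, tpr)
print(via_transform, direct, integrate_vertically(fpr, tpr))
```

For the whole curve the horizontal and vertical integrals coincide; they differ only when restricted to part of the curve, which is where a separate horizontal perspective becomes useful.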
Fig. 5 Local concordance for one versus all parts of the border. a Local concordance for the middle part of the concordance matrix border split into three disjoint parts. b Local concordance for all three parts of the concordance matrix border does not use all of the cells in the matrix
Fig. 6 Partial concordance versus concordant partial AUC. a The partial c statistic for part of the concordance matrix border (or ROC curve). b The concordant partial AUC in green corresponds to the green (positive) cells highlighted in the matrix at left
Fig. 7 Imbalanced data and partial ROC curves without a horizontal area component. a Our measures work with imbalanced data, e.g., five positives to fifteen negatives. b Our measures also work with partial ROC curves that have no horizontal area component (or no vertical area component)
Fig. 8 Interpolation and ties in the concordance matrix. At left, ties in score exist along both axes. At right, the partial curve’s right boundary has a height of 0.85, hence interpolation is required to compute the partial c statistic
Area measures and c statistics are shown for 3 parts of an ROC curve, i ∈ {1, …, 3}, as well as for the whole curve, for a support vector machine classifier applied to Ljubljana breast cancer remission data. Best values per column are shown in bold font
| 1 | [0 | [0 | 21 | | | |
| 2 | [0 | [0 | 29 | 17 | 5 | 17 |
| 3 | [0 | [0 | 17 | 1 | 17 | |
| sum | – | – | 84 | 84 | 84 | 84 |
| whole | | | | | | |
Normalized area measures, including sPA, are shown for 3 parts of an ROC curve using a support vector machine classifier on Ljubljana breast cancer remission data. Best values per column are shown in bold font
| 1 | [0 | [0 | 64 | 84 | 78 |
| 2 | [0 | [0 | 89 | 79 | 89 |
| 3 | [0 | [0 | | | |
We report the performance of four classifiers in one experiment, with best values per row shown in bold font
| Measures | LDA | LogR | SVM | NN | NN-SVM |
|---|---|---|---|---|---|
| Whole Area | | | | | |
| | 82 | 77 | 84.8% | 1 | |
| | 60 | 53 | 71.0% | −1.2% | |
| | 54.5% | 53 | 53 | −0.4% | |
| Partial Area | | | | | |
| | 75 | 69 | 78.8% | 0 | |
| | 19 | 16 | 21.3% | 0 | |
| | 47 | 37 | 48.0% | | |
| Partial Area | | | | | |
| | 90.0% | 82 | 89 | 2 | |
| | 29.7% | 27 | 29 | 0 | |
| | 18 | 17 | 21.0% | 3 | |
| Partial Area | | | | | |
| | 99.7% | 0 | | | |
| | 0% | | | | |
| | 17.0% | 17.0% | 17.0% | | |
| 3 | | | | | |
| 1 | | | | | |
| 1 | | | | | |
Fig. 9 A comparison of the leftmost partial curve and area between two classifiers applied to Ljubljana breast cancer remission data. a Neural network (NN) ROC plot. b Support vector machine (SVM) ROC plot