Xin Tong, Yang Feng, Jingyi Jessica Li.
Abstract
In many binary classification applications, such as disease diagnosis and spam detection, practitioners commonly face the need to limit type I error (that is, the conditional probability of misclassifying a class 0 observation as class 1) so that it remains below a desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural choice; it minimizes type II error (that is, the conditional probability of misclassifying a class 1 observation as class 0) while enforcing an upper bound, α, on the type I error. Despite its century-long history in hypothesis testing, the NP paradigm has not been well recognized and implemented in classification schemes. Common practices that directly limit the empirical type I error to no more than α do not satisfy the type I error control objective because the resulting classifiers are likely to have type I errors much larger than α, and the NP paradigm has not been properly implemented in practice. We develop the first umbrella algorithm that implements the NP paradigm for all scoring-type classification methods, such as logistic regression, support vector machines, and random forests. Powered by this algorithm, we propose a novel graphical tool for NP classification methods: NP receiver operating characteristic (NP-ROC) bands, motivated by the popular ROC curves. NP-ROC bands will help choose α in a data-adaptive way and compare different NP classifiers. We demonstrate the use and properties of the NP umbrella algorithm and NP-ROC bands, available in the R package nproc, through simulation and real data studies.
Year: 2018 PMID: 29423442 PMCID: PMC5804623 DOI: 10.1126/sciadv.aao1659
Source DB: PubMed Journal: Sci Adv ISSN: 2375-2548 Impact factor: 14.136
Fig. 1. Classical versus NP oracle classifiers in a binary classification example.
In this toy example, the two classes have equal marginal probabilities, that is, P(Y = 0) = P(Y = 1) = 1/2. Suppose that a user prefers a type I error of ≤0.05. The classical classifier I(X > 1) that minimizes the risk would result in a type I error of 0.16. On the other hand, the NP classifier I(X > 1.65) that minimizes the type II error under the type I error constraint (α = 0.05) delivers a desirable type I error. This figure is adapted from Tong et al. () and Li and Tong (), with permissions.
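The reported error rates can be reproduced under a concrete (assumed) setup. A minimal Python sketch, taking X | Y=0 ~ N(0,1) and X | Y=1 ~ N(2,1), which is consistent with the thresholds and errors quoted above (the paper's exact specification may differ):

```python
import math

# Assumed class-conditional distributions for this sketch:
# X | Y=0 ~ N(0, 1) and X | Y=1 ~ N(2, 1).

def normal_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def type_I_error(threshold):
    # P(X > threshold | Y = 0) for the classifier I(X > threshold)
    return 1 - normal_cdf(threshold, 0.0, 1.0)

def type_II_error(threshold):
    # P(X <= threshold | Y = 1)
    return normal_cdf(threshold, 2.0, 1.0)

print(round(type_I_error(1.0), 2))    # classical classifier I(X > 1): 0.16
print(round(type_I_error(1.65), 2))   # NP classifier I(X > 1.65): 0.05
print(round(type_II_error(1.65), 2))  # the price paid: a larger type II error
```

The NP classifier trades a higher type II error for the guaranteed type I error control, which is exactly the asymmetry the paradigm is designed for.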
Fig. 2. Choose a threshold such that the type I error is below α with high probability.
(A) The naïve approach. (B) The fivefold CV–based approach. (C) The NP approach. Results are based on 1000 data sets; the type I errors of the resulting classifiers are shown as overlapping red and teal “x” (A and B, respectively) and blue “+” (C) on the curves.
Fig. 3. Illustration of choosing NP classifiers from empirical ROC curves and NP-ROC lower curves in Simulation S1 (see the Supplementary Materials).
(A) Distributions of empirical type I errors and population type I errors of 1000 classifiers, with each classifier chosen from one empirical ROC curve corresponding to the largest empirical type I error no greater than 0.05. (B) Distributions of empirical type I errors and population type I errors of 1000 classifiers, with each classifier chosen from one NP-ROC lower curve (δ = 0.05) corresponding to α = 0.05.
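The contrast in this figure can be sketched with a small simulation. Assuming, for illustration only, that the class 0 scores are standard normal, the naive choice (threshold at the largest empirical type I error no greater than α) overshoots the population type I error about half the time, while the order-statistic rank threshold used by the NP approach keeps the violation rate below δ. The sample size n = 100 and the distributional choices here are hypothetical:

```python
import math
import random

random.seed(0)
alpha, delta, n, n_datasets = 0.05, 0.05, 100, 1000

def violation_bound(n, k, alpha):
    # P(population type I error of the k-th order-statistic threshold > alpha)
    # = P(Binomial(n, 1 - alpha) >= k)
    p = 1 - alpha
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# NP rank threshold: smallest k whose violation-rate bound is <= delta
k_star = min(k for k in range(1, n + 1) if violation_bound(n, k, alpha) <= delta)

def true_type_I(t):
    # 1 - Phi(t) for N(0, 1) class 0 scores
    return 1 - 0.5 * (1 + math.erf(t / math.sqrt(2)))

naive_viol = np_viol = 0
for _ in range(n_datasets):
    scores = sorted(random.gauss(0, 1) for _ in range(n))
    naive_t = scores[math.ceil((1 - alpha) * n) - 1]  # empirical 95% quantile
    np_t = scores[k_star - 1]                         # k*-th order statistic
    naive_viol += true_type_I(naive_t) > alpha
    np_viol += true_type_I(np_t) > alpha

print(naive_viol / n_datasets)  # roughly 1/2: empirical control fails often
print(np_viol / n_datasets)     # bounded by about delta
```

The naive threshold sits at the empirical α-quantile, which falls on either side of the population quantile with roughly equal probability; the NP rank threshold deliberately backs off until the binomial tail bound drops below δ.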
Fig. 4. Illustration of NP-ROC bands.
(A) How to draw an NP-ROC band. Each blue dashed line represents one NP classifier, with horizontal coordinate α and vertical coordinates given by the lower and upper bounds on 1 − β (one minus the type II error). Right-continuous and left-continuous step functions are used to interpolate points on the upper and lower ends, respectively. (B) Use of NP-ROC bands to compare the two LDA methods in Simulation 2.
Fig. 5. NP-ROC bands and ROC-CV curves (generated by fivefold CV) of three classification methods in real data application 1.
(A) NP-ROC bands of RFs versus penLR versus SVMs. RF dominates the other two methods for a wide range of α values. (B) ROC-CV curves of RF, penLR, and SVM. Among the three methods, RF has the largest area under the curve. (C) NP-ROC band and ROC-CV curve of RF. The black dashed vertical line marks α = 0.21, the smallest α such that the conditional type II error is no greater than 0.4. The green dashed vertical line marks α = 0.249, the value that maximizes the vertical distance from the lower curve to the diagonal line, a criterion motivated by the Youden index (). (D) NP-ROC band and ROC-CV curve of SVM.
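The Youden-motivated criterion in (C) selects the α that maximizes the vertical distance from the NP-ROC lower curve to the diagonal. A minimal sketch with a made-up lower curve (in practice the curve would come from the nproc output):

```python
# Hypothetical NP-ROC lower curve sampled on a grid of alpha values;
# the functional form below is invented purely for illustration.
alphas = [i / 100 for i in range(1, 100)]
lower_curve = [min(1.0, 1.5 * a ** 0.5 - 0.5 * a) for a in alphas]

# Youden-motivated choice: the alpha maximizing the vertical distance
# from the lower curve to the diagonal (1 - beta) = alpha.
best_alpha = max(zip(alphas, lower_curve), key=lambda p: p[1] - p[0])[0]
print(best_alpha)
```

Using the lower curve rather than a point estimate makes the selected α conservative: the distance to the diagonal is guaranteed (with high probability) rather than merely estimated.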
Fig. 6. NP-ROC bands of three classification methods in real data application 2.
(A) penLR versus RFs. Neither method dominates the other at any α value. (B) penLR versus NB. The black bar at the bottom indicates the α values where penLR is better than NB with high probability.
Algorithm 1. The NP umbrella algorithm.
 1: input: α (type I error upper bound), δ (violation rate tolerance), M (number of random splits), class 0 sample S0, class 1 sample S1, a scoring-type base classification method
 2: output: an NP classifier φ̂
 3: procedure RankThreshold(m, α, δ): for k = 1, …, m do   ◃ For each rank threshold candidate
 4:     v(k) ← Σ_{j=k}^{m} C(m, j) (1 − α)^j α^{m−j}        ◃ Calculate the violation rate upper bound
 5: return k* ← min{k : v(k) ≤ δ}                           ◃ Pick the rank threshold
 6: end procedure
 7: main procedure:
 8: m ← ⌊|S0| / 2⌋                                          ◃ Denote half of the size of S0 by m
 9: k* ← RankThreshold(m, α, δ)                             ◃ Find the rank threshold
10: for i = 1, …, M do                                      ◃ Randomly split S0 M times
11:     S0 → (S0^(i,1), S0^(i,2))                           ◃ Each time randomly split S0 into two halves of equal size
12:     S^(i) ← S0^(i,1) ∪ S1                               ◃ Combine S0^(i,1) with S1 as training data
13:     S0^(i,2) = {x_1, …, x_m}                            ◃ Write the left-out class 0 half elementwise
14:     f^(i) ← base method trained on S^(i)                ◃ Train a scoring function on S^(i)
15:     T^(i) ← {f^(i)(x_1), …, f^(i)(x_m)}                 ◃ Apply the scoring function to S0^(i,2)
16:     t_(1)^(i) ≤ … ≤ t_(m)^(i)                           ◃ Sort elements of T^(i) in increasing order
17:     t*^(i) ← t_(k*)^(i)                                 ◃ Find the score threshold corresponding to the rank threshold k*
18:     φ^(i)(x) ← 1{f^(i)(x) > t*^(i)}                     ◃ Construct an NP classifier based on the scoring function
19: φ̂(x) ← 1{Σ_{i=1}^{M} φ^(i)(x) > M/2}                   ◃ By majority vote
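The umbrella algorithm above can be sketched in a few lines of Python. This is an illustrative one-dimensional version in which the identity scoring function f(x) = x stands in for a trained base method (the paper pairs the algorithm with scoring methods such as logistic regression, SVMs, and RFs; the authors' implementation is the R package nproc):

```python
import math
import random

def rank_threshold(n, alpha, delta):
    """Smallest order-statistic rank k whose violation-rate bound
    sum_{j=k}^{n} C(n, j) (1 - alpha)^j alpha^(n - j) is <= delta."""
    for k in range(1, n + 1):
        bound = sum(math.comb(n, j) * (1 - alpha)**j * alpha**(n - j)
                    for j in range(k, n + 1))
        if bound <= delta:
            return k
    raise ValueError("class 0 sample too small for the requested alpha, delta")

def np_classifier(x0, x1, alpha=0.05, delta=0.05, n_splits=11, seed=0):
    """Umbrella-algorithm sketch for 1-D data with f(x) = x as scoring function."""
    rng = random.Random(seed)
    thresholds = []
    for _ in range(n_splits):
        s0 = x0[:]
        rng.shuffle(s0)
        m = len(s0) // 2
        # s0[:m] would be pooled with x1 to train the scoring function;
        # with the identity score f(x) = x, only the left-out half matters.
        left_out = sorted(s0[m:])
        k = rank_threshold(len(left_out), alpha, delta)
        thresholds.append(left_out[k - 1])  # k*-th order statistic
    # Ensemble the n_splits classifiers I(x > t_i) by majority vote
    return lambda x: 2 * sum(x > t for t in thresholds) > n_splits

random.seed(1)
x0 = [random.gauss(0, 1) for _ in range(400)]  # class 0 sample (hypothetical)
x1 = [random.gauss(2, 1) for _ in range(400)]  # class 1 sample (hypothetical)
clf = np_classifier(x0, x1)

test0 = [random.gauss(0, 1) for _ in range(100000)]
type_I = sum(clf(x) for x in test0) / len(test0)
print(type_I)  # empirical type I error; below alpha with high probability
```

Note that the left-out class 0 half never touches the scoring function; this independence is what makes the order-statistic violation-rate bound valid for any scoring-type base method.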