Rashika Ramola, Shantanu Jain, Predrag Radivojac.
Abstract
Accurately estimating the performance of machine learning classifiers is of fundamental importance in biomedical research, with potential societal consequences once the best-performing tools are deployed in everyday life. Although classification has been studied extensively over the past decades, understudied problems remain when the training data violate the main statistical assumptions relied upon for accurate learning and model characterization. This is particularly true in the open-world setting, where observing a phenomenon generally guarantees its presence but the absence of such evidence cannot be interpreted as evidence of its absence. Learning from such data is often referred to as positive-unlabeled learning, a form of semi-supervised learning in which all labeled data belong to one (say, positive) class. To improve best practices in the field, we study the quality of estimated performance in positive-unlabeled learning in the biomedical domain. We provide evidence that such estimates can be wildly inaccurate, depending on the fraction of positive examples in the unlabeled data and the fraction of negative examples mislabeled as positives in the labeled data. We then present correction methods for four performance measures (accuracy, balanced accuracy, F-measure, and the Matthews correlation coefficient) and demonstrate that knowledge, or accurate estimates, of the class priors in the unlabeled data and of the noise in the labeled data is sufficient to recover the true classification performance. We provide theoretical support as well as empirical evidence for the efficacy of the new performance estimation methods.
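The claim that known class priors and label noise suffice to undo the bias can be illustrated with a small population-level model. This is a hedged sketch of our own, not the paper's correction formulas (its Eqs. (12) to (14) are not reproduced here). It assumes a classifier at a fixed threshold with true positive rate TPR and false positive rate FPR, a fraction $\alpha$ of positives among the unlabeled examples, and a fraction $\beta$ of negatives among the labeled examples. The expected fraction of predicted positives is then $(1-\beta)\,\mathrm{TPR} + \beta\,\mathrm{FPR}$ on the labeled set and $\alpha\,\mathrm{TPR} + (1-\alpha)\,\mathrm{FPR}$ on the unlabeled set, and when $\alpha + \beta \neq 1$ this linear system can be inverted for TPR and FPR. The function names, the choice of the unlabeled population (prior $\alpha$) as the target for "true" accuracy, and all numbers are hypothetical.

```python
# Hedged, population-level sketch (our own model, not the paper's Eqs. 12-14).
# Assumptions: alpha = fraction of positives among unlabeled examples,
# beta = fraction of negatives among labeled examples, and "true" accuracy is
# measured on a population whose positive-class prior is alpha.


def predicted_positive_rates(tpr, fpr, alpha, beta):
    """Expected fraction of predicted positives on the labeled and unlabeled sets."""
    rate_l = (1 - beta) * tpr + beta * fpr
    rate_u = alpha * tpr + (1 - alpha) * fpr
    return rate_l, rate_u


def recover_tpr_fpr(rate_l, rate_u, alpha, beta):
    """Solve the 2x2 linear system above for (tpr, fpr); needs alpha + beta != 1."""
    det = 1 - alpha - beta
    if abs(det) < 1e-12:
        raise ValueError("alpha + beta == 1: TPR and FPR are not identifiable")
    tpr = ((1 - alpha) * rate_l - beta * rate_u) / det
    fpr = ((1 - beta) * rate_u - alpha * rate_l) / det
    return tpr, fpr


def accuracy_at_prior(tpr, fpr, prior):
    """Accuracy on a population whose positive-class prior is `prior`."""
    return prior * tpr + (1 - prior) * (1 - fpr)


def naive_pu_accuracy(rate_l, rate_u, frac_labeled):
    """Accuracy when labeled examples count as positives and unlabeled ones as negatives."""
    return frac_labeled * rate_l + (1 - frac_labeled) * (1 - rate_u)


# Hypothetical values chosen only for illustration.
tpr, fpr, alpha, beta, frac_labeled = 0.8, 0.1, 0.4, 0.1, 0.2
rate_l, rate_u = predicted_positive_rates(tpr, fpr, alpha, beta)
naive = naive_pu_accuracy(rate_l, rate_u, frac_labeled)
true_acc = accuracy_at_prior(tpr, fpr, alpha)
corrected = accuracy_at_prior(*recover_tpr_fpr(rate_l, rate_u, alpha, beta), alpha)
print(f"naive={naive:.3f} true={true_acc:.3f} corrected={corrected:.3f}")
# prints: naive=0.642 true=0.860 corrected=0.860
```

With these made-up values the naive estimate understates accuracy by more than 0.2, while inverting the system recovers the true value. In practice $\alpha$ and $\beta$ would themselves have to be estimated, which corresponds to the CE setting in Fig. 2.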
Year: 2019 PMID: 30864316 PMCID: PMC6417800
Source DB: PubMed Journal: Pac Symp Biocomput ISSN: 2335-6928
(a) Confusion matrix on a labeled data set. (b) Standard estimation of $\gamma$, $\eta$, $\pi$ and $\theta$.
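The symbols $\gamma$, $\eta$, $\pi$ and $\theta$ are not defined in this excerpt, so the sketch below does not attempt to reproduce that table. As a hedged illustration, it computes the four measures shown in Figs. 1 and 2 (accuracy, balanced accuracy, F-measure, and MCC) from the cells of a binary confusion matrix. The function name, the counts, and the convention of returning 0.0 when a denominator vanishes are our assumptions.

```python
import math


def confusion_measures(tp, fn, fp, tn):
    """Accuracy, balanced accuracy, F-measure and MCC from confusion-matrix counts.

    Returns 0.0 for any ratio whose denominator is zero (an assumed convention).
    """
    def ratio(num, den):
        return num / den if den else 0.0

    tpr = ratio(tp, tp + fn)  # sensitivity (recall)
    tnr = ratio(tn, tn + fp)  # specificity
    return {
        "acc": ratio(tp + tn, tp + fn + fp + tn),
        "bacc": (tpr + tnr) / 2,
        # 2TP / (2TP + FP + FN), equivalent to the harmonic mean of precision and recall
        "F": ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn,
                     math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))),
    }


# Hypothetical counts: 50 positives, 50 negatives.
m = confusion_measures(tp=40, fn=10, fp=5, tn=45)
print({k: round(v, 3) for k, v in m.items()})
# prints: {'acc': 0.85, 'bacc': 0.85, 'F': 0.842, 'mcc': 0.704}
```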
Fig. 1: Traditional vs. non-traditional performance as a function of the decision threshold $\tau$ (traditional: evaluated against the true class labels; non-traditional: evaluated on the positive-unlabeled data). The circles and vertical lines in all four panels mark the threshold values and the corresponding best performances in both the traditional and non-traditional settings. (Upper left) Classification accuracy: the top traditional performance, $\mathrm{acc}_{\max} = 0.86$, is reached at $\tau = 0.42$, whereas the top non-traditional performance is reached at $\tau = 5$; (Upper right) Balanced accuracy: the top traditional performance, $\mathrm{bacc}_{\max} = 0.84$, and the top non-traditional performance are both reached at $\tau = 0$; (Lower left) F-measure: the top traditional performance, $F_{\max} = 0.77$, is reached at $\tau = 0.19$, whereas the top non-traditional performance is reached at $\tau = 0.50$; (Lower right) Matthews correlation coefficient: the top traditional performance, $\mathrm{mcc}_{\max} = 0.66$, and the top non-traditional performance are both reached at $\tau = 0.29$.
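Fig. 1 shows that each measure can peak at a different threshold. A minimal sketch of such a threshold sweep follows: it evaluates a measure at every candidate $\tau$ and returns the maximizer. Assumptions: a score $\geq \tau$ is predicted positive, the observed scores serve as the candidate thresholds, and the toy scores and labels are hypothetical.

```python
def counts_at(scores, labels, tau):
    """Confusion counts (tp, fn, fp, tn) when a score >= tau is predicted positive."""
    tp = fn = fp = tn = 0
    for s, y in zip(scores, labels):
        if s >= tau:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def accuracy(tp, fn, fp, tn):
    return (tp + tn) / (tp + fn + fp + tn)


def f_measure(tp, fn, fp, tn):
    # 2TP / (2TP + FP + FN); defined as 0.0 when there are no true positives.
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def best_threshold(scores, labels, measure):
    """Return (tau, value) maximizing `measure`, using the observed scores as candidates."""
    candidates = sorted(set(scores))
    return max(((tau, measure(*counts_at(scores, labels, tau))) for tau in candidates),
               key=lambda pair: pair[1])


# Hypothetical toy data: classifier scores and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 0, 0, 1, 0, 1, 0, 0]
print(best_threshold(scores, labels, accuracy))   # (0.9, 0.75)
print(best_threshold(scores, labels, f_measure))  # (0.4, 0.6666666666666666)
```

On this toy data accuracy peaks at a higher threshold than the F-measure, the same qualitative ordering as the traditional curves in Fig. 1 (0.42 vs. 0.19); it is an illustration, not a reproduction of the figure.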
Fig. 2: Error in the non-traditionally evaluated performance measures before and after correction for 14 biomedical data sets. PU denotes estimates on the Positive-Unlabeled data without bias correction; CR and CE denote bias-Corrected estimates using the Real and Estimated values, respectively, of $\alpha$ (the fraction of positives in the unlabeled data) and $\beta$ (the fraction of negatives in the labeled data). In each run, the decision threshold was first selected to maximize performance, and the resulting performance was then compared with the true performance at that same threshold. (Upper left) Classification accuracy: Eq. (12) was used for correction, and all estimates were clipped to $[0, 1]$; (Upper right) Balanced accuracy: Eq. (13) was used for correction, and all estimates were clipped to $[0, 1]$; (Lower left) F-measure: Eq. (14) was used for correction, and all estimates were clipped to $[0, 1]$; (Lower right) Matthews correlation coefficient: the formula from Theorem 2.1 was used to correct the $\mathrm{mcc}_{\mathrm{pu}}$ estimate directly, and all estimates were clipped to $[-1, 1]$. The x-axis shows the true value of $\beta$, by which the box plots are grouped.
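The evaluation protocol in this caption (select the threshold that maximizes the estimated measure, clip the estimate to the measure's valid range, then compare it with the true value at that same threshold) can be sketched for the uncorrected PU estimate as follows. Assumptions: unlabeled examples are scored as negatives when computing the PU estimate, the threshold is chosen by maximizing that same estimate, the observed scores serve as candidate thresholds, the error is reported as estimate minus true value, and the toy data are hypothetical. For MCC the bounds would be `lo=-1.0, hi=1.0`, as the caption states.

```python
import numpy as np


def accuracy(pred, labels):
    """Fraction of examples whose predicted label matches the given label."""
    return float(np.mean(pred == labels))


def pu_protocol_error(scores, pu_labels, true_labels, measure, lo=0.0, hi=1.0):
    """Select tau on PU labels, then compare the clipped PU estimate with the truth.

    Returns (tau_star, estimate - true_value); the error sign is an assumed convention.
    """
    scores = np.asarray(scores)
    pu_labels, true_labels = np.asarray(pu_labels), np.asarray(true_labels)
    taus = np.unique(scores)  # candidate thresholds, sorted ascending
    pu_values = [measure((scores >= t).astype(int), pu_labels) for t in taus]
    best = int(np.argmax(pu_values))  # first maximizer on ties
    tau_star = float(taus[best])
    estimate = float(np.clip(pu_values[best], lo, hi))
    truth = measure((scores >= tau_star).astype(int), true_labels)
    return tau_star, estimate - truth


# Hypothetical toy data: the positive scored 0.4 is left unlabeled in the PU labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
true_labels = [1, 0, 0, 1, 0, 1, 0, 0]
pu_labels = [1, 0, 0, 1, 0, 0, 0, 0]
print(pu_protocol_error(scores, pu_labels, true_labels, accuracy))  # (0.9, 0.125)
```

Here the PU estimate (0.875) overstates the true accuracy at the selected threshold (0.75); a bias-corrected estimate would be plugged into the same protocol in place of the PU value.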