Xiaoqian Jiang, Melanie Osl, Jihoon Kim, Lucila Ohno-Machado.
Abstract
Predictive models are critical for risk adjustment in clinical research. Evaluation of supervised learning models often focuses on discrimination and sometimes neglects the assessment of calibration. Recent research in machine learning has shown the benefits of calibrating predictive models, which is especially important when probability estimates are used for clinical decision making. By extending the isotonic regression method for recalibration to obtain a smoother fit in reliability diagrams, we introduce a novel method that combines parametric and non-parametric approaches. The method calibrates probabilistic outputs smoothly and generalizes better than its predecessors on both simulated and real-world biomedical data sets.
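The abstract does not spell out how the parametric and non-parametric parts are combined. As a hedged illustration of the general idea only, the sketch below fits isotonic regression with the standard pool-adjacent-violators (PAV) algorithm and then replaces the resulting step function with a piecewise-linear, still non-decreasing curve through the center of each pooled block. The interpolation choice and the names `pav` and `smooth_isotonic` are assumptions for illustration, not the authors' method.

```python
from bisect import bisect_right


def pav(scores, labels):
    """Isotonic regression by pool-adjacent-violators.

    Returns one (mean_score, fitted_probability, count) tuple per pooled block,
    ordered by score; fitted probabilities are non-decreasing.
    """
    # Pool tied scores first so each distinct score gets a single fitted value.
    agg = {}
    for s, y in zip(scores, labels):
        tot = agg.setdefault(s, [0.0, 0])
        tot[0] += y
        tot[1] += 1
    blocks = []  # each block: [sum_of_scores, sum_of_labels, count]
    for s in sorted(agg):
        y_sum, n = agg[s]
        blocks.append([s * n, y_sum, n])
        # Pool backwards while the previous block's mean exceeds the current one.
        while len(blocks) > 1 and blocks[-2][1] / blocks[-2][2] > blocks[-1][1] / blocks[-1][2]:
            s2, y2, n2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += y2
            blocks[-1][2] += n2
    return [(bs / n, by / n, n) for bs, by, n in blocks]


def smooth_isotonic(scores, labels):
    """Return a monotone calibration function built on top of the PAV fit.

    Assumption (illustrative only): the step function is smoothed by linear
    interpolation between block centers, not by the paper's exact procedure.
    """
    blocks = pav(scores, labels)
    xs = [b[0] for b in blocks]
    ys = [b[1] for b in blocks]

    def calibrate(p):
        if p <= xs[0]:
            return ys[0]  # clamp below the first block center
        if p >= xs[-1]:
            return ys[-1]  # clamp above the last block center
        i = bisect_right(xs, p)  # guarantees xs[i - 1] <= p < xs[i]
        t = (p - xs[i - 1]) / (xs[i] - xs[i - 1])
        return ys[i - 1] + t * (ys[i] - ys[i - 1])

    return calibrate
```

Plain PAV returns a step function whose flat segments map many different scores to the same probability; interpolating between block centers keeps monotonicity while removing those plateaus, which is the kind of smoother reliability-diagram fit the abstract describes.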
Year: 2011 PMID: 22211175 PMCID: PMC3248752
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
Figure 1: Calibration plots with probabilities fitted by (a) sigmoid fitting and (b) isotonic regression.
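Sigmoid fitting (the approach often called Platt scaling) fits a two-parameter logistic curve p = 1 / (1 + exp(-(a·s + b))) that maps an original score s to a recalibrated probability. The minimal pure-Python sketch below is not the authors' code: the names `fit_sigmoid` and `calibrate_sigmoid` are hypothetical, it fits directly on the predicted probability s (fitting on its log-odds is a common alternative), and it omits the target smoothing that Platt proposed.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_sigmoid(scores, labels, iters=100, tol=1e-10):
    """Fit p = sigmoid(a * s + b) to binary labels by Newton's method on the log-loss."""
    a, b = 1.0, 0.0
    for _ in range(iters):
        g_a = g_b = 0.0           # gradient of the negative log-likelihood
        h_aa = h_ab = h_bb = 0.0  # entries of the 2x2 Hessian
        for s, y in zip(scores, labels):
            p = _sigmoid(a * s + b)
            w = p * (1.0 - p)
            g_a += (p - y) * s
            g_b += (p - y)
            h_aa += w * s * s
            h_ab += w * s
            h_bb += w
        det = h_aa * h_bb - h_ab * h_ab
        if det <= 1e-15:
            break  # Hessian (numerically) singular, e.g. perfectly separable data
        # Newton step: solve H @ [d_a, d_b] = [g_a, g_b] with the 2x2 inverse.
        d_a = (h_bb * g_a - h_ab * g_b) / det
        d_b = (h_aa * g_b - h_ab * g_a) / det
        a, b = a - d_a, b - d_b
        if abs(d_a) + abs(d_b) < tol:
            break
    return a, b


def calibrate_sigmoid(a, b, scores):
    """Map original scores to recalibrated probabilities."""
    return [_sigmoid(a * s + b) for s in scores]
```

With only two parameters the fitted curve is smooth but cannot follow calibration curves that are not sigmoid-shaped; isotonic regression has the opposite profile, flexible but stepwise.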
Figure 2: Comparison of different calibration methods on synthetic data. Row one shows histograms of the original probabilities predicted by logistic regression (LR) (blue bars for class c = 0 and red bars for class c = 1). Rows two to five show calibration plots for the original LR predictions and for the probabilities recalibrated by sigmoid fitting, isotonic regression, and smooth isotonic regression. The caption of each panel gives the discriminatory ability as the area under the ROC curve (AUC) and the p-value of the Hosmer-Lemeshow (HL) test for the plotted probabilities.
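The panel captions report the p-value of the HL test. The sketch below computes the Hosmer-Lemeshow statistic assuming the common convention of ten equal-size groups ordered by predicted risk (the grouping used in the paper is not stated here) and a chi-square reference distribution with groups − 2 degrees of freedom. To stay dependency-free it uses the closed-form chi-square upper tail that holds for an even number of degrees of freedom; `hosmer_lemeshow` and `chi2_sf_even_df` are hypothetical names.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for X ~ chi-square with an even number of degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i  # term = half**i / i!
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(probs, labels, groups=10):
    """Return (HL statistic, p-value) using equal-size groups ordered by predicted risk."""
    # Stable sort on the probability only, so tied predictions keep their input order.
    pairs = sorted(zip(probs, labels), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        expected = sum(p for p, _ in chunk)  # E_g: sum of predicted risks
        observed = sum(y for _, y in chunk)  # O_g: number of positives
        mean_p = expected / n_g
        denom = n_g * mean_p * (1.0 - mean_p)  # equals E_g * (1 - E_g / n_g)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat, chi2_sf_even_df(stat, groups - 2)
```

A small p-value signals a significant mismatch between predicted and observed event rates across risk groups, i.e. poor calibration; a large one means the test found no evidence of miscalibration.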
Real-world data sets used.
| Data set | # Attributes | Train size | Test size | % POS |
|---|---|---|---|---|
| GSE2034 | 15 | 125 | 84 | 54 |
| GSE2990 | 15 | 54 | 36 | 67 |
| ADULT | 14 | 4,000 | 41,222 | 25 |
| BANKRUPTCY | 2 | 40 | 26 | 48 |
| HEIGHT WEIGHT | 2 | 126 | 84 | 64 |
| HOSPITAL | 22 | 2,891 | 1,927 | 8 |
| MNISTALL | 784 | 42,000 | 28,000 | 9.8 |
| PIMATR | 8 | 120 | 80 | 33 |
% POS indicates the percentage of positive cases.
Figure 3: Comparison of different calibration methods on real-world data. Row one shows histograms of the original values predicted by LR (classes are not distinguished by color). Rows two to five show calibration plots for the original LR predictions and for the probabilities recalibrated by sigmoid fitting, isotonic regression, and smooth isotonic regression. The caption of each panel gives the discriminatory ability as the area under the ROC curve (AUC) and the p-value of the HL test for the plotted probabilities.
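The calibration plots in rows two to five and the AUC values in the panel captions can be reproduced in outline with two small helpers. This sketch assumes ten equal-width probability bins for the reliability diagram (the paper's binning is not given here) and computes AUC from the Mann-Whitney rank statistic, assigning average ranks to tied predictions; `reliability_points` and `auc` are hypothetical names.

```python
def reliability_points(probs, labels, bins=10):
    """Reliability-diagram points: (mean predicted, observed fraction, count) per non-empty bin."""
    sums = [[0.0, 0.0, 0] for _ in range(bins)]
    for p, y in zip(probs, labels):
        k = min(int(p * bins), bins - 1)  # equal-width bins; p == 1.0 goes into the last bin
        sums[k][0] += p
        sums[k][1] += y
        sums[k][2] += 1
    return [(sp / n, sy / n, n) for sp, sy, n in sums if n]


def auc(probs, labels):
    """Area under the ROC curve via the Mann-Whitney rank statistic."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    ranks = [0.0] * len(probs)
    i = 0
    while i < len(order):
        # Find the run of tied predictions starting at position i.
        j = i
        while j + 1 < len(order) and probs[order[j + 1]] == probs[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
```

Plotting the first two coordinates of each point against the diagonal gives a reliability diagram; points below the diagonal indicate overestimated probabilities. Because all three recalibration maps are monotone non-decreasing, they leave the ranking of cases, and therefore the AUC, essentially unchanged, apart from ties introduced by flat isotonic segments.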