Barbara Więckowska¹, Katarzyna B Kubiak¹, Paulina Jóźwiak², Wacław Moryson³, Barbara Stawińska-Witoszyńska³.
Abstract
The need for new measures of the classification performance of a logistic regression model stems from the difficulty of finding previously unknown factors that predict the occurrence of a disease. Classification quality can be assessed by testing the change in the area under the receiver operating characteristic curve (AUC). Another approach is the Net Reclassification Improvement (NRI), which compares the risk predicted by the basic model with the risk predicted by the model enriched with an additional factor. In this paper, we draw attention to Cohen's Kappa coefficient, which measures observed agreement corrected for chance agreement. We propose an extension of this coefficient that allows it to assess the quality of reclassification by a logistic regression model. The results of the Kappa-based reclassification measure were compared with those obtained using NRI. We also present how adding random variables with different distributions to the model affects the change in classification as measured by NRI, Kappa, and AUC. A simulation study was conducted on a cohort of 3971 Poles obtained during the implementation of a lower limb atherosclerosis prevention program.
Keywords: AUC; Cohen’s Kappa coefficient; NRI net reclassification ratio; logistic regression model; reclassification
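The abstract relies on two standard quantities. Their usual textbook definitions are shown below. The notation is ours and is not necessarily the paper's. $p_o$ is the observed proportion of agreement and $p_e$ is the proportion expected by chance. "Up" and "down" mean an increase or a decrease in predicted risk when the new factor is added. $D$ is disease status.

$$
\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad
\mathrm{NRI} = \bigl[P(\text{up} \mid D = 1) - P(\text{down} \mid D = 1)\bigr] + \bigl[P(\text{down} \mid D = 0) - P(\text{up} \mid D = 0)\bigr]
$$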
Year: 2022 PMID: 36011844 PMCID: PMC9407914 DOI: 10.3390/ijerph191610213
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 4.614
Observed reclassification of people diagnosed with the disease and of disease-free individuals, and the numbers expected under random reclassification.
| observed | disease-free | diseased | total |
|---|---|---|---|
| reclassification down | | | |
| no changes | | | |
| up | | | |
| total | | | |

| expected * | disease-free | diseased |
|---|---|---|
| reclassification down | | |
| no changes | | |
| up | | |
* Expected frequencies were calculated in the standard way, i.e., by multiplying the marginal row and column sums and dividing by the sample size; # denotes "the number of".
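The footnote's rule for expected frequencies can be sketched in Python as follows. This is a minimal illustration of the standard independence calculation. The function name `expected_frequencies` and the counts in the usage line are hypothetical and do not come from the study.

```python
from typing import List


def expected_frequencies(observed: List[List[float]]) -> List[List[float]]:
    """Expected cell counts under random (independent) reclassification.

    Each expected count equals row total * column total / sample size.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical counts: rows = down, no changes, up; columns = disease-free, diseased.
observed = [[30, 10], [5, 5], [15, 35]]
print(expected_frequencies(observed))  # [[20.0, 20.0], [5.0, 5.0], [25.0, 25.0]]
```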
Position of the hidden category in the table of observed reclassification of people diagnosed with the disease and of disease-free individuals, and in the table of numbers expected under random reclassification.
| observed | disease-free | hidden category | diseased | total |
|---|---|---|---|---|
| reclassification down | | 0 | | |
| no changes | | 0 | | |
| up | | 0 | | |
| total | | 0 | | |

| expected * | disease-free | hidden category | diseased |
|---|---|---|---|
| reclassification down | | 0 | |
| no changes | | 0 | |
| up | | 0 | |
* Expected frequencies were calculated in the standard way, i.e., by multiplying the marginal row and column sums and dividing by the sample size; # denotes "the number of".
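One way to read the hidden-category layout is that it turns the 3 × 2 reclassification table into a square 3 × 3 table, which lets Cohen's kappa be applied. The sketch below follows that reading. It assumes that the agreement (diagonal) cells pair "down" with disease-free, "no changes" with the all-zero hidden category, and "up" with diseased. This is our interpretation of the table and may not match the authors' exact procedure. The function name and the input counts are hypothetical.

```python
from typing import Tuple

Pair = Tuple[int, int]  # (disease-free count, diseased count)


def reclassification_kappa(down: Pair, no_change: Pair, up: Pair) -> Tuple[float, float]:
    """Cohen's kappa on a 3x3 table with an all-zero hidden category.

    Rows: down, no change, up. Columns: disease-free, hidden, diseased.
    Assumed agreement cells: down/disease-free, no change/hidden, up/diseased.
    Returns (x, kappa), where x = observed minus chance-expected agreement.
    """
    table = [
        [down[0], 0, down[1]],
        [no_change[0], 0, no_change[1]],
        [up[0], 0, up[1]],
    ]
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    observed_agree = sum(table[i][i] for i in range(3))
    # Expected diagonal counts use the footnote's row * column / n rule.
    expected_agree = sum(row_totals[i] * col_totals[i] / n for i in range(3))
    x = observed_agree - expected_agree      # correct minus chance-correct
    kappa = x / (n - expected_agree)         # equals (p_o - p_e) / (1 - p_e)
    return x, kappa


x, kappa = reclassification_kappa(down=(30, 10), no_change=(5, 5), up=(15, 35))
print(x, round(kappa, 3))  # 20.0 0.364
```

The hidden column contributes nothing to either observed or expected agreement. As a result, subjects whose risk does not change can never count as correct reclassifications.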
Observed reclassification of people diagnosed with the disease and of disease-free individuals, and the numbers expected under random reclassification, for a 50-element sample.
| observed | disease-free | hidden category | diseased | total |
|---|---|---|---|---|
| reclassification down | | 0 | | |
| no changes | | 0 | | |
| up | | 0 | | |
| total | | 0 | | |

| expected * | disease-free | hidden category | diseased |
|---|---|---|---|
| reclassification down | | 0 | |
| no changes | | 0 | |
| up | | 0 | |
* Expected frequencies were calculated in the standard way, i.e., by multiplying the marginal row and column sums and dividing by the sample size.
Figure 1. Method for presenting variable x, i.e., the number of correct reclassifications minus the number of correct reclassifications expected by chance, after a new variable is added to the logistic regression model.
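The sketch below shows how x could be computed directly from the predicted risks of the basic and extended models. A diseased subject is counted as correctly reclassified if their risk moves up, and a disease-free subject if their risk moves down. The chance-expected count uses the row × column / n rule. The `min_change` parameter is a hypothetical threshold: 0 gives the continuous variant, and 0.01 corresponds to our reading of "a change exceeding 1%". All names and numbers here are illustrative assumptions.

```python
from collections import Counter
from typing import Sequence


def movement(p_base: float, p_ext: float, min_change: float = 0.0) -> str:
    """Classify the change in predicted risk as 'up', 'down' or 'none'."""
    diff = p_ext - p_base
    if diff > min_change:
        return "up"
    if diff < -min_change:
        return "down"
    return "none"


def x_number(p_base: Sequence[float], p_ext: Sequence[float],
             y: Sequence[int], min_change: float = 0.0) -> float:
    """Correct reclassifications minus those expected by chance.

    Correct = diseased (y == 1) moved up, or disease-free (y == 0) moved down.
    """
    counts = Counter((movement(b, e, min_change), yi)
                     for b, e, yi in zip(p_base, p_ext, y))
    n = len(y)
    n_up = counts[("up", 0)] + counts[("up", 1)]
    n_down = counts[("down", 0)] + counts[("down", 1)]
    n_diseased = sum(1 for yi in y if yi == 1)
    n_free = n - n_diseased
    correct = counts[("down", 0)] + counts[("up", 1)]
    expected = (n_down * n_free + n_up * n_diseased) / n
    return correct - expected


# Hypothetical predicted risks for four subjects.
p_base = [0.2, 0.2, 0.5, 0.5]
p_ext = [0.1, 0.3, 0.6, 0.5]
y = [0, 0, 1, 1]
print(x_number(p_base, p_ext, y))                   # 0.5
print(x_number(p_base, p_ext, y, min_change=0.15))  # 0.0
```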
Summary of results for univariable logistic regression models describing CVD risk as a function of BMI, place of residence, marital status, income, daily activity, education, SCORE result, and random variables with prespecified distributions.
| Independent Variables | Frequency (%) | p-value | OR [95%CI] | R² # | Basic Model * |
|---|---|---|---|---|---|
| BMI | | | | 0.02 | yes |
| underweight | 21 (0.5) | 0.2287 | 1.7 [0.72, 4.05] | | |
| standard | 931 (23.5) | | reference | | |
| overweight | 1780 (44.8) | <0.0001 | 1.52 [1.29, 1.79] | | |
| obesity | 1239 (31.2) | <0.0001 | 1.97 [1.65, 2.35] | | |
| place of residence | | | | 0.0003 | |
| rural area | 1496 (37.7) | 0.3207 | 0.94 [0.82, 1.07] | | |
| urban area | 2475 (62.3) | | reference | | |
| marital status | | | | 0.0004 | |
| single | 1143 (28.8) | 0.1202 | 0.9 [0.78, 1.03] | | |
| in a relationship | 2828 (71.2) | | reference | | |
| income | | | | 0.007 | yes |
| low | 1034 (26.0) | 0.0007 | 0.77 [0.67, 0.9] | | |
| average | 2226 (56.1) | | reference | | |
| high | 711 (17.9) | 0.0001 | 0.7 [0.59, 0.83] | | |
| daily activity | | | | 0.003 | yes |
| passive | 1335 (33.6) | 0.0035 | 1.29 [1.09, 1.53] | | |
| mixed | 1739 (43.8) | 0.6809 | 0.97 [0.82, 1.14] | | |
| active | 897 (22.6) | | reference | | |
| education | | | | 0.04 | |
| basic | 830 (20.9) | | reference | | |
| professional | 1065 (26.8) | <0.0001 | 0.63 [0.52, 0.76] | | |
| medium | 1408 (35.5) | <0.0001 | 0.45 [0.38, 0.53] | | |
| higher | 668 (16.8) | <0.0001 | 0.40 [0.33, 0.49] | | |
| SCORE | | | | 0.39 | |
| high | 2573 (64.8) | <0.0001 | 22.66 [18.27, 28.12] | | |
| low | 1398 (35.2) | | reference | | |
| random variables | assumed parameters | | | | |
| uniform | interval: [0, 100] | 0.3049 | 1.00 [1.00, 1.00] | 0.0003 | |
| normal | mean (sd) = 0 (1) | 0.4043 | 1.03 [0.96, 1.09] | 0.0003 | |
| Poisson | λ = 4 | 0.0443 | 1.03 [1.00, 1.07] | 0.001 | |
| exponential | λ = 1 | 0.5114 | 1.02 [0.96, 1.09] | 0.0001 | |
| binomial | p = 0.1 | 0.7362 | 0.96 [0.78, 1.19] | 0.00003 | |
| binomial | p = 0.5 | 0.6574 | 0.97 [0.86, 1.10] | 0.00009 | |
OR [95%CI]: odds ratio with 95% confidence interval. p-value: Wald test. * Variables retained in the basic model by forward stepwise regression. # R² (Nagelkerke): a measure of model fit quality.
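The Nagelkerke R² in the table can be computed from two log-likelihoods: that of the intercept-only model and that of the fitted model. The sketch below uses the standard formula, which rescales the Cox–Snell R² by its maximum attainable value. The function names are hypothetical. The log-likelihoods are assumed to come from whatever software fitted the logistic regressions.

```python
import math
from typing import Sequence


def null_log_likelihood(y: Sequence[int]) -> float:
    """Log-likelihood of the intercept-only logistic model (predicts the event rate).

    Requires at least one event and one non-event.
    """
    n, k = len(y), sum(y)
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1 - p)


def nagelkerke_r2(ll_null: float, ll_model: float, n: int) -> float:
    """Nagelkerke R^2 = Cox-Snell R^2 divided by its maximum possible value."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell
```

A model that is no better than the intercept gives 0, and a model with a log-likelihood of 0 (perfect fit) gives 1.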
Results of logistic regression models describing CVD risk: basic model (variables in the model: BMI, income, and daily activity) and models expanded by education, SCORE, and random variables with uniform, normal, Poisson, exponential, binomial (p = 0.1), and binomial (p = 0.5) distributions.
| Model | Wald Test p-value | Likelihood Ratio Test p-value | AUC [95%CI] | AUC Change after Adding Marker p-value |
|---|---|---|---|---|
| basic | | | 0.59 [0.57, 0.60] | |
| basic + education | | <0.0001 | 0.63 [0.61, 0.65] | <0.0001 |
| basic + SCORE | <0.0001 | <0.0001 | 0.79 [0.78, 0.81] | <0.0001 |
| basic + uniform | 0.2532 | 0.2532 | 0.59 [0.57, 0.61] | 0.3989 |
| basic + normal | 0.5251 | 0.5251 | 0.59 [0.57, 0.60] | 0.7074 |
| basic + Poisson | 0.0550 | 0.0549 | 0.59 [0.57, 0.61] | 0.3206 |
| basic + exponential | 0.4761 | 0.4764 | 0.59 [0.57, 0.61] | 0.4795 |
| basic + binomial (p = 0.1) | 0.7848 | 0.7847 | 0.59 [0.57, 0.60] | 0.4742 |
| basic + binomial (p = 0.5) | 0.8866 | 0.8866 | 0.59 [0.57, 0.60] | 0.6523 |
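The AUC values above can be read as the probability that a randomly chosen diseased person receives a higher predicted risk than a randomly chosen disease-free person. The sketch below computes the AUC this way, which is the Mann–Whitney formulation, with ties counted as one half. It uses a simple O(n²) pairwise loop for clarity. It does not reproduce the confidence intervals or the test for AUC change used in the paper, and the function name is hypothetical.

```python
from typing import Sequence


def auc(scores: Sequence[float], y: Sequence[int]) -> float:
    """Concordance probability between diseased (y == 1) and disease-free (y == 0)."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(pos) * len(neg))


print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```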
Reclassification quality based on Cohen Kappa and Net Reclassification Improvement (NRI) for a continuous change of CVD risk between the base and extended logistic regression models.
| Model | x-number | p-value * | κ | p-value # | NRI |
|---|---|---|---|---|---|
| basic + education | 311 | <0.0001 | 0.16 | <0.0001 | 0.32 |
| basic + SCORE | 1035 | <0.0001 | 0.50 | <0.0001 | 1.06 |
| basic + uniform | 30 | 0.3470 | 0.01 | 0.3470 | 0.03 |
| basic + normal | 17 | 0.6068 | 0.01 | 0.6068 | 0.02 |
| basic + Poisson | 54 | 0.0876 | 0.03 | 0.0874 | 0.05 |
| basic + exponential | −22 | 0.4733 | −0.01 | 0.4736 | 0.02 |
| basic + binomial (p = 0.1) | 0 | 0.6690 | 0.00 | 0.6684 | 0.01 |
| basic + binomial (p = 0.5) | 14 | 0.6574 | 0.01 | 0.6574 | 0.01 |
x-number: the number of correct reclassifications minus the number of correct reclassifications expected by chance. κ [95%CI]: Kappa coefficient of agreement with 95% confidence interval. * Testing the significance of the Kappa coefficient. NRI [95%CI]: Net Reclassification Improvement with 95% confidence interval. # Testing the significance of the NRI.
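For comparison with κ, the continuous NRI can be sketched from the same predicted risks. For diseased subjects, NRI adds the net proportion whose risk moved up. For disease-free subjects, it adds the net proportion whose risk moved down. The optional `min_change` parameter is a hypothetical threshold: setting it to 0.01 mimics our reading of the 1% variant in the next table. The names and data are illustrative only.

```python
from typing import Sequence


def nri(p_base: Sequence[float], p_ext: Sequence[float],
        y: Sequence[int], min_change: float = 0.0) -> float:
    """Net Reclassification Improvement for a change in predicted risk."""
    up_e = down_e = up_ne = down_ne = 0
    n_e = n_ne = 0
    for b, e, yi in zip(p_base, p_ext, y):
        diff = e - b
        moved_up = diff > min_change
        moved_down = diff < -min_change
        if yi == 1:  # diseased: moving up is an improvement
            n_e += 1
            up_e += int(moved_up)
            down_e += int(moved_down)
        else:        # disease-free: moving down is an improvement
            n_ne += 1
            up_ne += int(moved_up)
            down_ne += int(moved_down)
    return (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne


print(nri([0.2, 0.2, 0.5, 0.5], [0.1, 0.3, 0.6, 0.5], [0, 0, 1, 1]))  # 0.5
```

Unlike κ, this NRI is not corrected for chance, and its range is [−2, 2]. This may explain why the NRI values in the table are roughly twice the corresponding κ values.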
Reclassification quality based on Cohen Kappa and Net Reclassification Improvement (NRI) for a change of CVD risk (between the base and extended logistic regression models) exceeding 1%.
| Model | x-number | p-value * | unit-κ | p-value # | unit-NRI |
|---|---|---|---|---|---|
| basic + education | 310 | <0.0001 | 0.15 | <0.0001 | 0.31 |
| basic + SCORE | 1035 | <0.0001 | 0.50 | <0.0001 | 1.05 |
| basic + uniform | 24 | 0.1965 | 0.007 | 0.1984 | 0.02 |
| basic + normal | −6 | 0.3595 | −0.002 | 0.3599 | −0.006 |
| basic + Poisson | 30 | 0.1588 | 0.010 | 0.1590 | 0.030 |
| basic + exponential | 8 | 0.2918 | 0.002 | 0.2950 | 0.008 |
| basic + binomial (p = 0.1) | 0 | 1.0000 | 0.000 | NA | 0.000 |
| basic + binomial (p = 0.5) | 0 | 1.0000 | 0.000 | NA | 0.000 |
x-number: the number of correct reclassifications minus the number of correct reclassifications expected by chance, determined for a probability change exceeding 1%. unit-κ [95%CI]: Kappa coefficient of agreement with 95% confidence interval for a probability change exceeding 1%. * Testing the significance of the Kappa coefficient. unit-NRI [95%CI]: Net Reclassification Improvement with 95% confidence interval for a probability change exceeding 1%. # Testing the significance of the NRI.
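One common way to test an NRI is an asymptotic z-test whose variance is built from the proportions of subjects moving up and down in each outcome group (Pencina-style). The sketch below implements that test. We assume it is comparable to, but not necessarily identical with, the NRI test used for the tables. When nobody moves, the variance is zero and the statistic is undefined, so the function returns NaN. This situation is analogous to the NA entries above. The function name and counts are hypothetical.

```python
import math
from typing import Tuple


def nri_z_test(up_e: int, down_e: int, n_e: int,
               up_ne: int, down_ne: int, n_ne: int) -> Tuple[float, float, float]:
    """NRI with an asymptotic z-test (Pencina-style variance, assumed here).

    up_e/down_e: diseased subjects moved up/down; n_e: number of diseased.
    up_ne/down_ne: disease-free subjects moved up/down; n_ne: number disease-free.
    Returns (NRI, z, two-sided p); z and p are NaN when nobody moved.
    """
    nri = (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne
    variance = (up_e + down_e) / n_e ** 2 + (up_ne + down_ne) / n_ne ** 2
    if variance == 0:
        return nri, math.nan, math.nan
    z = nri / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|) for Z ~ N(0, 1)
    return nri, z, p_value


print(nri_z_test(30, 10, 100, 10, 30, 100))
```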
Reclassification quality based on Cohen Kappa and Net Reclassification Improvement (NRI) for a categorical risk of CVD (between the base and extended logistic regression models). The two risk categories were defined by a cut-off value of p = 0.4444.
| Model | x-number | p-value * | κ | p-value # | NRI |
|---|---|---|---|---|---|
| basic + education | 52 | 0.0012 | 0.01 | <0.0001 | 0.06 |
| basic + SCORE | 397 | <0.0001 | 0.13 | <0.0001 | 0.40 |
| basic + uniform | −6 | 0.3936 | −0.002 | 0.3934 | −0.007 |
| basic + normal | 2 | 0.3332 | 0.001 | 0.2749 | 0.004 |
| basic + Poisson | 9 | 0.2882 | 0.002 | 0.2879 | 0.009 |
| basic + exponential | 4 | 0.1987 | 0.001 | 0.1999 | 0.004 |
| basic + binomial (p = 0.1) | 0 | 0.9092 | 0.000 | 0.9998 | 0.000 |
| basic + binomial (p = 0.5) | 0 | 1.0000 | 0.000 | NA | 0.000 |
x-number: the number of correct reclassifications minus the number of correct reclassifications expected by chance, determined for the probability cut-off. κ [95%CI]: Kappa coefficient of agreement with 95% confidence interval for the probability cut-off. * Testing the significance of the Kappa coefficient. NRI [95%CI]: Net Reclassification Improvement with 95% confidence interval for the probability cut-off. # Testing the significance of the NRI. NA: not available.
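In the categorical version, a subject counts as reclassified only when their predicted risk crosses the cut-off (0.4444 in this table). The sketch below computes this two-category NRI. It assumes that a risk of at least the cut-off is "high" and anything below is "low". The paper does not state how ties at the cut-off are handled, so that choice is ours. The function name and data are hypothetical.

```python
from typing import Sequence


def categorical_nri(p_base: Sequence[float], p_ext: Sequence[float],
                    y: Sequence[int], cutoff: float = 0.4444) -> float:
    """Two-category NRI: a subject moves only when the risk crosses the cut-off."""
    up_e = down_e = up_ne = down_ne = n_e = n_ne = 0
    for b, e, yi in zip(p_base, p_ext, y):
        before, after = int(b >= cutoff), int(e >= cutoff)  # 1 = high-risk category
        moved_up, moved_down = after > before, after < before
        if yi == 1:
            n_e += 1
            up_e += int(moved_up)
            down_e += int(moved_down)
        else:
            n_ne += 1
            up_ne += int(moved_up)
            down_ne += int(moved_down)
    return (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne


# A shift from 0.40 to 0.42 stays below the cut-off and is not counted.
print(categorical_nri([0.30, 0.50, 0.40, 0.60], [0.50, 0.30, 0.42, 0.60], [1, 0, 0, 1]))  # 1.0
```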