Katya L Masconi, Tandi E Matsha, Rajiv T Erasmus, Andre P Kengne.
Abstract
BACKGROUND: Imputation techniques used to handle missing data are based on the principle of replacement. Multiple imputation is widely advocated as superior to other imputation methods; however, studies have suggested that simple methods for filling in missing data can be just as accurate as complex ones. The objective of this study was to implement a number of simple and more complex imputation methods and to assess their effect on the performance of undiagnosed diabetes risk prediction models during external validation.
Year: 2015 PMID: 26406594 PMCID: PMC4583496 DOI: 10.1371/journal.pone.0139210
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Overview of the performance of the undiagnosed diabetes risk prediction models across the five imputation methods before (original) and after intercept adjustment (adjusted).
| Model | Performance measure | Deletion | Simple | Conditional | Stochastic | Multiple |
|---|---|---|---|---|---|---|
| Cambridge Diabetes Risk model | E/O, original (95% CI) | 1.81 (1.09–2.52) | 2.07 (1.40–2.75) | 2.01 (1.28–2.75) | 2.17 (1.41–2.93) | 2.16 (1.40–2.92) |
| Cambridge Diabetes Risk model | E/O, adjusted (95% CI) | 1.22 (0.61–1.83) | 1.28 (0.69–1.87) | 1.27 (0.64–1.90) | 1.27 (0.64–1.90) | 1.30 (0.66–1.94) |
| Cambridge Diabetes Risk model | Brier score | 0.193 | 0.181 | 0.185 | 0.186 | 0.189 |
| Cambridge Diabetes Risk model | Yates slope | 0.379 | -1.401 | -1.374 | -1.399 | -1.441 |
| Cambridge Diabetes Risk model | C-statistic (95% CI) | 0.67 (0.62–0.72) | 0.69 (0.65–0.73) | 0.68 (0.63–0.72) | 0.68 (0.64–0.73) | 0.68 (0.64–0.72) |
| Kuwaiti Risk model | E/O, original (95% CI) | 0.72 (0.40–1.12) | 0.79 (0.44–1.14) | 0.79 (0.39–1.20) | 0.82 (0.44–1.20) | 0.82 (0.42–1.22) |
| Kuwaiti Risk model | E/O, adjusted (95% CI) | 0.94 (0.47–1.41) | 0.96 (0.51–1.41) | 0.96 (0.45–1.47) | 0.96 (0.45–1.47) | 0.96 (0.55–1.37) |
| Kuwaiti Risk model | Brier score | 0.141 | 0.122 | 0.126 | 0.125 | 0.123 |
| Kuwaiti Risk model | Yates slope | 0.496 | -0.459 | -0.514 | -0.473 | -0.534 |
| Kuwaiti Risk model | C-statistic (95% CI) | 0.68 (0.63–0.73) | 0.70 (0.66–0.74) | 0.69 (0.65–0.73) | 0.69 (0.65–0.74) | 0.69 (0.65–0.73) |
| Omani Diabetes Risk model | E/O, original (95% CI) | 1.28 (0.63–1.93) | 1.40 (0.82–1.98) | 1.40 (0.75–2.05) | 1.56 (0.81–2.30) | 1.54 (0.77–2.31) |
| Omani Diabetes Risk model | E/O, adjusted (95% CI) | 1.06 (0.47–1.66) | 1.08 (0.56–1.60) | 1.08 (0.50–1.66) | 1.08 (0.50–1.66) | 1.11 (0.51–1.71) |
| Omani Diabetes Risk model | Brier score | 0.164 | 0.141 | 0.149 | 0.142 | 0.153 |
| Omani Diabetes Risk model | Yates slope | 0.392 | -1.065 | -1.104 | -1.049 | -1.196 |
| Omani Diabetes Risk model | C-statistic (95% CI) | 0.66 (0.61–0.70) | 0.67 (0.63–0.71) | 0.65 (0.61–0.70) | 0.67 (0.63–0.72) | 0.65 (0.61–0.69) |
| Rotterdam Predictive model | E/O, original (95% CI) | 0.54 (0.50–1.04) | 0.65 (0.56–0.74) | 0.59 (0.48–0.71) | 0.65 (0.57–0.74) | 0.65 (0.57–0.73) |
| Rotterdam Predictive model | E/O, adjusted (95% CI) | 0.98 (0.91–1.05) | 0.99 (0.83–1.14) | 0.99 (0.93–1.04) | 0.99 (0.93–1.04) | 0.99 (0.87–1.11) |
| Rotterdam Predictive model | Brier score | 0.147 | 0.126 | 0.130 | 0.129 | 0.127 |
| Rotterdam Predictive model | Yates slope | 0.971 | 0.558 | 0.539 | 0.535 | 0.498 |
| Rotterdam Predictive model | C-statistic (95% CI) | 0.64 (0.59–0.69) | 0.65 (0.61–0.70) | 0.65 (0.60–0.69) | 0.65 (0.60–0.70) | 0.65 (0.61–0.70) |
| Simplified Finnish Diabetes Risk model | E/O, original (95% CI) | 0.26 (0.13–0.39) | 0.34 (0.17–0.52) | 0.34 (0.18–0.50) | 0.35 (0.17–0.52) | 0.35 (0.17–0.53) |
| Simplified Finnish Diabetes Risk model | E/O, adjusted (95% CI) | 0.89 (0.51–1.26) | 0.92 (0.53–1.31) | 0.92 (0.56–1.28) | 0.92 (0.56–1.28) | 0.92 (0.53–1.32) |
| Simplified Finnish Diabetes Risk model | Brier score | 0.157 | 0.133 | 0.136 | 0.136 | 0.133 |
| Simplified Finnish Diabetes Risk model | Yates slope | 0.491 | -0.021 | 0.080 | -0.053 | -0.045 |
| Simplified Finnish Diabetes Risk model | C-statistic (95% CI) | 0.67 (0.62–0.71) | 0.66 (0.62–0.70) | 0.67 (0.63–0.72) | 0.66 (0.62–0.70) | 0.66 (0.62–0.70) |
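
To make the performance measures in this table concrete, the following is a minimal Python sketch of how the E/O ratio, Brier score and C-statistic are commonly computed from observed outcomes and predicted risks, together with one simple form of intercept adjustment. The function names and the toy data are hypothetical, and the adjustment shown (a single constant shift on the logit scale, derived from the observed prevalence and the mean predicted risk) is an assumption about a common approach, not a confirmed description of the authors' procedure. The Yates slope is left out because the record does not define it.

```python
import math

def expected_observed_ratio(y, p):
    """E/O: expected events (sum of predicted risks) divided by observed events."""
    return sum(p) / sum(y)

def brier_score(y, p):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def c_statistic(y, p):
    """Share of case/non-case pairs in which the case has the higher risk (ties count 0.5)."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    non_cases = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for pc in cases:
        for pn in non_cases:
            if pc > pn:
                score += 1.0
            elif pc == pn:
                score += 0.5
    return score / (len(cases) * len(non_cases))

def adjust_intercept(y, p):
    """Assumed adjustment: shift every prediction by one constant on the logit scale."""
    def logit(q):
        return math.log(q / (1 - q))
    correction = logit(sum(y) / len(y)) - logit(sum(p) / len(p))
    return [1 / (1 + math.exp(-(logit(pi) + correction))) for pi in p]

# Toy data (hypothetical): 1 = undiagnosed diabetes, 0 = no diabetes
y = [1, 0, 0, 1, 0]
p = [0.8, 0.4, 0.2, 0.6, 0.5]
p_adjusted = adjust_intercept(y, p)

print(expected_observed_ratio(y, p))           # over-prediction before adjustment
print(expected_observed_ratio(y, p_adjusted))  # closer to 1 after adjustment
print(brier_score(y, p), c_statistic(y, p))
```

Because a constant logit shift preserves the ordering of the predicted risks, this kind of adjustment changes calibration (E/O) but leaves the C-statistic unchanged.
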
Missingness analysis.
| Variable | Missing (%) |
|---|---|
| Outcome (prevalent diabetes) | 0.7 |
| Age | 1.4 |
| Gender | 1.6 |
| Body mass index | 3.4 |
| Waist circumference | 2.1 |
| Systolic blood pressure | 1.9 |
| Diastolic blood pressure | 1.9 |
| Mother family history | 25.1 |
| Father family history | 24.9 |
| Sister family history | 25.0 |
| Brother family history | 25.1 |
| Corticosteroid use | 4.3 |
| Hypertensive drugs | 2.5 |
| Smoking status | 6.1 |
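
As an illustration of how a missingness table like this one is produced, and of how the two simplest strategies differ, here is a short Python sketch. The records, column names and helper functions are hypothetical, and the "simple" rule used here (mean for numeric variables, most frequent category otherwise) is an assumption; the record does not spell out the exact rules the authors applied.

```python
from statistics import mean, mode

# Hypothetical records; None marks a missing value
records = [
    {"age": 52, "bmi": 29.1, "mother_fh": "no"},
    {"age": None, "bmi": 31.4, "mother_fh": None},
    {"age": 47, "bmi": None, "mother_fh": "yes"},
    {"age": 60, "bmi": 27.5, "mother_fh": "no"},
]

def percent_missing(rows, column):
    """Percentage of rows in which the given column is missing."""
    return 100 * sum(r[column] is None for r in rows) / len(rows)

def complete_cases(rows):
    """Deletion: keep only rows with no missing value in any column."""
    return [r for r in rows if all(v is not None for v in r.values())]

def simple_impute(rows):
    """Simple imputation (assumed rule): mean for numeric, mode for categorical."""
    filled = [dict(r) for r in rows]  # copy so the input is not modified
    for column in rows[0]:
        observed = [r[column] for r in rows if r[column] is not None]
        if all(isinstance(v, (int, float)) for v in observed):
            fill = mean(observed)
        else:
            fill = mode(observed)
        for r in filled:
            if r[column] is None:
                r[column] = fill
    return filled

for column in records[0]:
    print(column, percent_missing(records, column))

print(len(complete_cases(records)), "rows kept by deletion")
print(len(simple_impute(records)), "rows kept by simple imputation")
```

Deletion shrinks the sample, whereas simple imputation keeps every row at the cost of reducing variability in the filled variables.
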
Fig 1. Histogram showing the proportion of missing values for each variable.
*BMI, Body Mass Index; WC, Waist Circumference; SBP, Systolic Blood Pressure; DBP, Diastolic Blood Pressure; FH, Family History; Cort, Corticosteroids; med, medication; Hpt, Hypertensive.
Fig 2. Aggregation plot showing all combinations of missing (red) and non-missing (blue) values in the variables, from the highest to lowest frequency.
*BMI, Body Mass Index; WC, Waist Circumference; SBP, Systolic Blood Pressure; DBP, Diastolic Blood Pressure; FH, Family History; Cort, Corticosteroids; med, medication; Hpt, Hypertensive.
Characteristics comparison of participants for the original database and five imputation methods.
| Characteristic | Original | Pairwise deletion (n = 754) | Simple (n = 1083) | Conditional (n varied) | Stochastic (n = 1083) | Multiple (n = 1083) |
|---|---|---|---|---|---|---|
| Prevalent undiagnosed diabetes (Yes/No) | 162/913 | 132/622 | 162/921 | 162/916 | 163/920 | 162/921 |
| Age (years) | 51.9 (15.0) | 52.5 (14.6) | 51.9 (14.9) | 51.9 (15.0) | 51.8 (15.0) | 51.8 (15.1) |
| Body mass index (kg/m2) | 29.7 (7.2) | 29.6 (7.1) | 29.7 (7.0) | 29.7 (7.1) | 29.8 (7.2) | 29.8 (7.2) |
| Gender (Male/Female) | 249/810 | 160/594 | 251/832 | 251/826 | 254/829 | 257/826 |
| Systolic blood pressure (mmHg) | 124.3 (20.2) | 122.0 (18.7) | 124.3 (20.0) | 124.3 (20.2) | 124.3 (20.2) | 124.4 (20.4) |
| Diastolic blood pressure (mmHg) | 76.0 (12.9) | 74.7 (12.0) | 76.0 (12.7) | 76.0 (12.8) | 76.0 (12.9) | 76.1 (14.1) |
| Waist circumference (cm) | 95.8 (15.5) | 95.9 (14.9) | 95.8 (15.3) | 95.8 (15.4) | 95.8 (15.5) | 95.7 (16.9) |
| Hypertensive medication (Yes/No) | 374/682 | 262/492 | 374/709 | 383/688 | 387/696 | 382/701 |
| Using corticosteroids (Yes/No) | 12/1025 | 5/749 | 12/1071 | 12/1050 | 12/1071 | 13/1070 |
| Mother having diabetes (Yes/No) | 124/687 | 114/640 | 124/959 | 124/880 | 182/901 | 165/918 |
| Father having diabetes (Yes/No) | 61/752 | 60/694 | 61/1022 | 61/944 | 73/1010 | 78/1005 |
| Sister having diabetes (Yes/No) | 103/709 | 98/656 | 103/980 | 107/897 | 143/940 | 128/955 |
| Brother having diabetes (Yes/No) | 67/744 | 64/690 | 67/1016 | 67/936 | 79/1004 | 87/996 |
| Smoking status (Current/Past/No) | 433/105/479 | 327/89/338 | 433/105/545 | 437/105/496 | 456/114/513 | 458/113/512 |
Characteristics comparison of participants for five multiple imputation datasets.
| Characteristic | Dataset 1 | Dataset 2 | Dataset 3 | Dataset 4 | Dataset 5 |
|---|---|---|---|---|---|
| Prevalent undiagnosed diabetes (Yes/No) | 162/921 | 163/920 | 162/921 | 162/921 | 163/920 |
| Age (years) | 51.9 (15.1) | 51.9 (15.0) | 51.8 (15.0) | 51.9 (15.1) | 51.8 (15.0) |
| Body mass index (kg/m2) | 29.8 (7.2) | 29.8 (7.2) | 29.8 (7.2) | 29.8 (7.2) | 29.7 (7.2) |
| Gender (Male/Female) | 258/825 | 257/826 | 257/826 | 256/827 | 258/825 |
| Systolic blood pressure (mmHg) | 124.5 (20.4) | 124.5 (20.3) | 124.4 (20.3) | 124.5 (20.4) | 124.3 (20.3) |
| Diastolic blood pressure (mmHg) | 76.1 (12.8) | 76.1 (12.8) | 76.2 (13.3) | 76.1 (12.9) | 76.1 (12.9) |
| Waist circumference (cm) | 95.8 (15.9) | 95.7 (15.5) | 95.8 (15.5) | 95.8 (15.4) | 95.7 (15.5) |
| Hypertensive medication (Yes/No) | 383/700 | 378/705 | 381/702 | 382/701 | 384/699 |
| Using corticosteroids (Yes/No) | 13/1070 | 12/1071 | 12/1071 | 13/1070 | 13/1070 |
| Mother having diabetes (Yes/No) | 157/926 | 168/915 | 155/928 | 179/904 | 164/919 |
| Father having diabetes (Yes/No) | 71/1012 | 72/1011 | 83/1000 | 78/1005 | 84/999 |
| Sister having diabetes (Yes/No) | 132/951 | 130/953 | 121/962 | 125/958 | 134/949 |
| Brother having diabetes (Yes/No) | 87/996 | 88/995 | 83/1000 | 88/997 | 88/995 |
| Smoking status (Current/Ex/No) | 464/110/509 | 455/118/510 | 459/115/509 | 452/113/518 | 460/111/512 |
Overview of the performance of the undiagnosed diabetes risk prediction models across the five multiple imputation datasets.
| Model | Performance measure | Dataset 1 | Dataset 2 | Dataset 3 | Dataset 4 | Dataset 5 |
|---|---|---|---|---|---|---|
| Cambridge Diabetes Risk model | E/O (95% CI) | 2.17 (1.35–2.99) | 2.13 (1.40–2.87) | 2.15 (1.49–2.81) | 2.18 (1.34–3.01) | 2.16 (1.46–3.86) |
| Cambridge Diabetes Risk model | Brier score | 0.190 | 0.188 | 0.186 | 0.190 | 0.190 |
| Cambridge Diabetes Risk model | Yates slope | -1.451 | -1.435 | -1.433 | -1.454 | -1.434 |
| Cambridge Diabetes Risk model | C-statistic (95% CI) | 0.68 (0.64–0.72) | 0.68 (0.64–0.72) | 0.69 (0.65–0.73) | 0.68 (0.64–0.73) | 0.69 (0.64–0.73) |
| Kuwaiti Risk model | E/O (95% CI) | 0.83 (0.42–1.24) | 0.82 (0.40–1.23) | 0.82 (0.45–1.19) | 0.83 (0.41–1.24) | 0.82 (0.44–1.19) |
| Kuwaiti Risk model | Brier score | 0.124 | 0.124 | 0.122 | 0.123 | 0.123 |
| Kuwaiti Risk model | Yates slope | -0.563 | -0.558 | -0.496 | -0.542 | -0.509 |
| Kuwaiti Risk model | C-statistic (95% CI) | 0.69 (0.65–0.73) | 0.69 (0.64–0.73) | 0.70 (0.66–0.74) | 0.69 (0.65–0.73) | 0.69 (0.65–0.74) |
| Omani Diabetes Risk model | E/O (95% CI) | 1.55 (0.76–2.33) | 1.54 (0.72–2.37) | 1.52 (0.87–2.17) | 1.57 (0.78–2.37) | 1.54 (0.80–2.29) |
| Omani Diabetes Risk model | Brier score | 0.154 | 0.156 | 0.149 | 0.155 | 0.153 |
| Omani Diabetes Risk model | Yates slope | -1.211 | -1.232 | -1.151 | -1.214 | -1.174 |
| Omani Diabetes Risk model | C-statistic (95% CI) | 0.65 (0.61–0.69) | 0.64 (0.60–0.68) | 0.66 (0.62–0.70) | 0.65 (0.61–0.70) | 0.66 (0.61–0.70) |
| Rotterdam Predictive model | E/O (95% CI) | 0.66 (0.57–0.75) | 0.65 (0.58–0.72) | 0.65 (0.57–0.74) | 0.66 (0.57–0.75) | 0.65 (0.57–0.74) |
| Rotterdam Predictive model | Brier score | 0.126 | 0.127 | 0.126 | 0.127 | 0.127 |
| Rotterdam Predictive model | Yates slope | 0.486 | 0.539 | 0.526 | 0.479 | 0.461 |
| Rotterdam Predictive model | C-statistic (95% CI) | 0.65 (0.60–0.69) | 0.65 (0.61–0.70) | 0.65 (0.60–0.70) | 0.65 (0.60–0.69) | 0.65 (0.60–0.69) |
| Simplified Finnish Diabetes Risk model | E/O (95% CI) | 0.35 (0.17–0.52) | 0.34 (0.16–0.52) | 0.35 (0.17–0.53) | 0.35 (0.17–0.52) | 0.34 (0.16–0.52) |
| Simplified Finnish Diabetes Risk model | Brier score | 0.133 | 0.134 | 0.133 | 0.133 | 0.134 |
| Simplified Finnish Diabetes Risk model | Yates slope | -0.032 | -0.068 | -0.048 | -0.026 | -0.050 |
| Simplified Finnish Diabetes Risk model | C-statistic (95% CI) | 0.66 (0.62–0.71) | 0.66 (0.62–0.70) | 0.66 (0.62–0.70) | 0.66 (0.62–0.71) | 0.66 (0.62–0.70) |
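
The per-dataset estimates above can be combined into a single value per model. The sketch below averages the five E/O point estimates copied from this table; the averages round to 2.16, 0.82, 1.54, 0.65 and 0.35, which match the original (unadjusted) E/O values reported for multiple imputation in the overview table. Whether the authors pooled by simple averaging is an assumption, and the variable names are hypothetical. Pooling the confidence intervals would additionally require the within- and between-dataset variances, which are not shown here.

```python
from statistics import mean

# E/O point estimates for the five imputed datasets, copied from the table above
eo_by_dataset = {
    "Cambridge": [2.17, 2.13, 2.15, 2.18, 2.16],
    "Kuwaiti": [0.83, 0.82, 0.82, 0.83, 0.82],
    "Omani": [1.55, 1.54, 1.52, 1.57, 1.54],
    "Rotterdam": [0.66, 0.65, 0.65, 0.66, 0.65],
    "Simplified Finnish": [0.35, 0.34, 0.35, 0.35, 0.34],
}

# Assumed pooling rule: arithmetic mean of the per-dataset point estimates
pooled = {model: round(mean(values), 2) for model, values in eo_by_dataset.items()}

for model, value in pooled.items():
    print(f"{model}: pooled E/O = {value:.2f}")
```
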