Gregor Stiglic, Majda Pajnkihar.
Abstract
Classical paper-and-pencil risk assessment questionnaires are often accompanied by online versions to reach a wider population. This study focuses on the loss, especially in risk estimation performance, that can result from directly transforming paper questionnaires into online risk estimation calculators while ignoring the more complex and accurate calculations that online calculators make possible. We empirically compare the risk estimation performance of four major diabetes risk calculators and two more advanced predictive models. National Health and Nutrition Examination Survey (NHANES) data from 1999–2012 were used to evaluate performance in detecting diabetes and pre-diabetes. Among the classical paper-and-pencil tests, the American Diabetes Association (ADA) risk test achieved the best predictive performance, with an Area Under the ROC Curve (AUC) of 0.699 for undiagnosed diabetes (0.662 for pre-diabetes) and 47% (47% for pre-diabetes) of persons selected for screening. Our results demonstrate a significant difference in performance when statistical methods are used, with the additional benefit of fewer persons selected for screening. The best overall AUC was obtained for diabetes risk prediction using logistic regression, with an AUC of 0.775 (0.734 for pre-diabetes) and an average of 34% (48%) of persons selected for screening. However, generalized boosted regression models might be a better option from an economic point of view: the percentage of persons selected for screening, 30% (47%), is significantly lower than with logistic regression for diabetes risk assessment (p < 0.001), and their AUC of 0.774 (0.740) is significantly higher than that of logistic regression for the pre-diabetes group (p < 0.001). Our results demonstrate a serious lack of predictive performance in four major online diabetes risk calculators.
Therefore, great care should be taken to optimize online versions of questionnaires that were originally developed as classical paper questionnaires.
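The abstract contrasts integer points scores with statistical models such as logistic regression (GLM), which can use inputs like age or BMI on their continuous scale instead of binning them into point categories. The following is a minimal, dependency-free sketch of fitting a logistic regression by batch gradient descent on the mean log-loss. It is not the authors' implementation; the function names, learning rate, epoch count and the single hypothetical feature in the usage example are assumptions for illustration.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Predicted risk for one person; w[0] is the intercept, w[1:] the coefficients."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit [intercept, w1, ..., wk] by batch gradient descent on the mean log-loss."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # d(log-loss)/d(linear score)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Hypothetical single feature (e.g. a rescaled BMI) with diabetes labels.
    X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
    y = [0, 0, 1, 0, 1, 1]
    w = fit_logistic(X, y, lr=0.5, epochs=5000)
    print([round(predict_proba(w, x), 3) for x in X])
```

Unlike a points table, the fitted model returns a graded probability for every input value, so the screening threshold can be chosen afterwards rather than being fixed by the questionnaire design.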
Year: 2015 PMID: 26560153 PMCID: PMC4641713 DOI: 10.1371/journal.pone.0142827
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Presence of questions in the four compared online diabetes risk calculators, with the corresponding score intervals used in this study.
Variables in bold represent the set of common variables used in the majority of the compared online calculators.
| Question | ADA | eCANRISK | LRA | AUSDRISK |
|---|---|---|---|---|
| **Age** | + | + | + | + |
| **Sex** | + | + | + | + |
| **Diabetes in family** | + | + | + | + |
| **High blood pressure history** | + | + | + | |
| Blood pressure medication history | + | + | | |
| Taking blood pressure medication | + | | | |
| **Physical activity** | + | + | + | |
| **Obesity / BMI** | + | + | + | |
| Gestational diabetes | + | + | | |
| **Waist measurement** | + | + | + | |
| Eats vegetables and fruit | + | + | | |
| High blood sugar history | + | + | | |
| Ethnic group/Country of birth | + | + | +¹ | |
| Level of education | + | | | |
| Smoking | + | | | |
| Score interval | 0–10 | 0–87 | 0–46 | 0–27 |

¹ Not used in the experiments due to limited availability in NHANES 1999–2012.
² Used to remove individuals with already diagnosed diabetes.
Summary of questionnaires used in this study.
Comparison is provided for the maximum number of points a user can score, the cutpoint value for high risk, and the number of questions in each questionnaire. The datasets used for development and validation of each model are also provided, along with the corresponding reported predictive performance values.
| Questionnaire | Max points | Cutpoint | Num. questions | Datasets used | Reported predictive performance (AUC) |
|---|---|---|---|---|---|
| ADA | 11 | ≥ 5 | 7 | Development: NHANES 1999–2004; Validation: NHANES 2005–2006, ARIC and CHS | 0.83 (NHANES); 0.72–0.74 (ARIC/CHS) |
| eCANRISK | 86 | ≥ 21 (MR); ≥ 33 (HR) | 13 | CANRISK study (6223 participants) | 0.75 |
| LRA | 47 | ≥ 14 | 8 | LRA study (6390 participants) | 0.72 |
| AUSDRISK | 35 | ≥ 12 | | Development: AusDiab 1999–2005 (6060 participants); Validation: BMES (1993 participants), NWAHS (1465 participants) | 0.78 |

MR, moderate risk; HR, high risk.
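The paper-and-pencil calculators reduce each respondent to an integer points total and compare it with a cutpoint. A minimal sketch of that step, using the cutpoints listed in the summary table, is shown below. The function names are hypothetical, and because eCANRISK has both a moderate-risk (≥ 21) and a high-risk (≥ 33) cutpoint, whether moderate risk also counts as "selected for screening" is left as a parameter rather than asserted.

```python
from typing import Dict, Sequence

# Cutpoints taken from the summary table; only eCANRISK has a moderate-risk level.
CUTPOINTS: Dict[str, Dict[str, int]] = {
    "ADA": {"high": 5},
    "eCANRISK": {"moderate": 21, "high": 33},
    "LRA": {"high": 14},
    "AUSDRISK": {"high": 12},
}


def risk_category(calculator: str, points: int) -> str:
    """Map a questionnaire points total to 'low', 'moderate' or 'high' risk."""
    cuts = CUTPOINTS[calculator]
    if points >= cuts["high"]:
        return "high"
    if "moderate" in cuts and points >= cuts["moderate"]:
        return "moderate"
    return "low"


def proportion_selected(calculator: str, scores: Sequence[int],
                        include_moderate: bool = False) -> float:
    """Share of respondents flagged for screening (what the PSP columns report)."""
    flagged = {"high", "moderate"} if include_moderate else {"high"}
    hits = sum(risk_category(calculator, s) in flagged for s in scores)
    return hits / len(scores)


if __name__ == "__main__":
    # Hypothetical ADA points totals for five respondents.
    print(proportion_selected("ADA", [2, 5, 7, 4, 9]))  # 0.6 (3 of 5 at or above 5)
```

Because the decision depends only on whether the total crosses a fixed threshold, two respondents who differ greatly in, say, BMI can receive the same classification, which is the information loss the abstract refers to.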
Overview of variables and their mappings to variables and questions from seven waves of 1999–2012 NHANES data.
| Question / Variable | NHANES Variable | NHANES Questions with additional notes |
|---|---|---|
| Age | RIDAGEYR | Age in years (capped at 85) |
| Sex | RIAGENDR | Gender (Male, Female) |
| Diabetes in family | MCQ260Ax (Waves 1–3) | Which biological family member? (Mother, Father, Grandmother, Grandfather, Brother, Sister, Other family member) |
| | MCQ300C (Waves 4–7) | Including living and deceased, were any of your close biological (blood) relatives including father, mother, sisters or brothers, ever told by a health professional that they had diabetes? |
| High blood pressure history | BPQ020 | Have you ever been told by a doctor or other health professional that you had hypertension, also called high blood pressure? |
| Blood pressure medication history | BPQ040A | Because of your (high blood pressure/hypertension), have you ever been told to take prescribed medicine? |
| Taking blood pressure medication | BPQ050A | Are you now taking prescribed medicine? |
| Physical activity | PAD200 (Waves 1–4) | Over the past 30 days, did you do any vigorous activities for at least 10 minutes that caused heavy sweating, or large increases in breathing or heart rate? Some examples are running, lap swimming, aerobics classes or fast bicycling. |
| | PAQ650 (Waves 5–7) | Do you do any vigorous-intensity sports, fitness, or recreational activities that cause large increases in breathing or heart rate, like running or basketball for at least 10 minutes continuously? |
| Obesity / BMI | BMXBMI | Body Mass Index (kg/m²) |
| Gestational diabetes | RHQ162 (Waves 6–7) | During any pregnancy, were you ever told by a doctor or other health professional that you had diabetes, sugar diabetes or gestational diabetes? Please do not include diabetes that you may have known about before the pregnancy. Note: only available in NHANES 2009–2010 and 2011–2012. |
| Waist measurement | BMXWAIST | Waist circumference (cm) |
| Eat vegetables and fruit | FFQ0016–FFQ0019, FFQ0022, FFQ0032–FFQ0034, FFQ0036, FFQ0039 (Waves 3–4) | How often did you eat fruit {apples, pears, bananas, pineapples, grapes}? AND How often did you eat vegetables {carrots, string beans, peas, broccoli, onions}? Note: only available in waves 3 and 4 of NHANES data; therefore, we did not include this variable in the final set of variables for this study. |
| Ethnic group / Country of birth | RIDRETH1 | Race/Ethnicity (Mexican American, Other Hispanic, Non-Hispanic White, Non-Hispanic Black and Other Race) |
| Level of education | DMDEDUC2 | What is the highest grade or level of school you have completed? (Less than 9th Grade, 9–11th Grade, High School Grad/GED or Equivalent, Some College or AA degree, College Graduate or above) |
| Smoking | SMQ040 | Do you now smoke cigarettes? (Every day, Some days, Not at all) Note: any answer other than “Not at all” (i.e., Every day or Some days) was treated as a positive answer. |

¹ Not used in the experiments due to limited availability in NHANES 1999–2012.
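The mapping table implies some recoding before any calculator can be scored: switching between wave-specific variables and collapsing multi-level answers into yes/no. The sketch below assumes the usual NHANES response coding (1 = Yes, 2 = No, with 7/9 for Refused/Don't know, and SMQ040 coded 1 = Every day, 2 = Some days, 3 = Not at all); these codes should be checked against each wave's codebook. The helper names and the one-dict-per-respondent layout are hypothetical, and the waves 1–3 family-history items (MCQ260Ax) are deliberately not handled.

```python
from typing import Mapping, Optional

# Assumed NHANES response codes; 7 (Refused), 9 (Don't know) and blanks become None.
YES, NO = 1, 2


def yes_no(code: Optional[int]) -> Optional[bool]:
    """Recode a 1 = Yes / 2 = No item to True/False, anything else to None."""
    if code == YES:
        return True
    if code == NO:
        return False
    return None


def current_smoker(smq040: Optional[int]) -> Optional[bool]:
    """SMQ040: 'Every day' (1) or 'Some days' (2) is positive, 'Not at all' (3) negative."""
    if smq040 in (1, 2):
        return True
    if smq040 == 3:
        return False
    return None


def vigorous_activity(record: Mapping[str, int], wave: int) -> Optional[bool]:
    """Use PAD200 in waves 1-4 and PAQ650 in waves 5-7, as in the mapping table."""
    variable = "PAD200" if wave <= 4 else "PAQ650"
    return yes_no(record.get(variable))


def family_history(record: Mapping[str, int], wave: int) -> Optional[bool]:
    """MCQ300C (waves 4-7) only; MCQ260Ax for waves 1-3 needs separate handling."""
    if wave < 4:
        return None
    return yes_no(record.get("MCQ300C"))
```

Returning `None` instead of guessing keeps missing or refused answers visible, so they can be excluded or imputed explicitly before scoring.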
Fig 1. Comparison of AUC for six risk estimation approaches using all available variables.
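Fig 1 compares the approaches by AUC. For a scoring rule, the AUC equals the probability that a randomly chosen person with the condition receives a higher score than a randomly chosen person without it, with ties counted as one half. Ties matter here because points-based calculators assign many people the same integer score. A minimal pairwise sketch follows; it is quadratic in sample size, so a rank-based formulation would be preferable for full NHANES samples, and the function name is illustrative.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney estimate of the area under the ROC curve (1 = case, 0 = non-case)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5  # tied scores, common with integer points totals
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Hypothetical points totals; the case/non-case tie at 5 contributes 1/2.
    print(auc([3, 5, 5, 8], [0, 0, 1, 1]))  # 0.875
```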
Mean sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and percentage of selected persons (PSP) over 1000 holdout evaluations, with corresponding 95% confidence intervals, using all available variables.
| Model | Sensitivity | Specificity | PPV | NPV | Percentage selected (PSP) |
|---|---|---|---|---|---|
| Pre-diabetes | | | | | |
| ADA | .667 [.655, .679] | .657 [.647, .667] | .567 [.559, .575] | .745 [.738, .753] | .474 [.466, .481] |
| eCANRISK | .660 [.595, .756] | .630 [.530, .691] | .547 [.519, .568] | .734 [.715, .764] | .487 [.424, .585] |
| LRA | .690 [.582, .781] | .554 [.462, .655] | .512 [.492, .533] | .728 [.698, .762] | .544 [.440, .635] |
| AUSDRISK | | .590 [.558, .693] | .539 [.528, .569] | .752 [.717, .768] | .531 [.423, .562] |
| GLM | .687 [.648, .719] | .661 [.633, .693] | .578 [.566, .591] | .758 [.744, .772] | .479 [.445, .508] |
| GBM | .685 [.653, .717] | | | | |
| Type 2 Diabetes | | | | | |
| ADA | | .546 [.536, .553] | .079 [.076, .082] | | .471 [.464, .481] |
| eCANRISK | .756 [.639, .819] | .596 [.548, .682] | .079 [.074, .085] | .982 [.977, .986] | .420 [.333, .467] |
| LRA | .769 [.684, .829] | .559 [.530, .619] | .074 [.070, .079] | .982 [.977, .986] | .455 [.395, .484] |
| AUSDRISK | .748 [.642, .874] | .631 [.518, .715] | .086 [.074, .098] | .982 [.977, .989] | .386 [.302, .498] |
| GLM | .727 [.639, .803] | .677 [.635, .727] | .093 [.088, .099] | .982 [.978, .986] | .340 [.289, .383] |
| GBM | .668 [.584, .745] | | | .979 [.976, .984] | |
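All five columns of the results tables follow from a 2×2 table of screening decision versus condition status. A minimal sketch of computing them for one holdout set is shown below; the function and field names are illustrative, and returning NaN for an empty denominator is a choice made here, not something the paper specifies.

```python
from dataclasses import dataclass
from typing import Sequence


def _ratio(num: int, den: int) -> float:
    """num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


@dataclass
class ScreeningMetrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    psp: float  # proportion of persons selected for screening


def screening_metrics(selected: Sequence[bool],
                      has_condition: Sequence[bool]) -> ScreeningMetrics:
    """Confusion-matrix summaries for one holdout set."""
    pairs = list(zip(selected, has_condition))
    tp = sum(1 for s, c in pairs if s and c)
    fp = sum(1 for s, c in pairs if s and not c)
    fn = sum(1 for s, c in pairs if not s and c)
    tn = sum(1 for s, c in pairs if not s and not c)
    return ScreeningMetrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
        psp=_ratio(tp + fp, len(pairs)),
    )
```

The columns are linked: PPV × PSP and sensitivity × prevalence both equal TP/N, so with a low prevalence of undiagnosed diabetes a PPV below .10 is consistent with sensitivities above .70, while NPV stays near .98.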
Mean sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and percentage of selected persons (PSP) over 1000 holdout evaluations, with corresponding 95% confidence intervals, using a set of common variables.
| Model | Sensitivity | Specificity | PPV | NPV | Percentage selected (PSP) |
|---|---|---|---|---|---|
| Pre-diabetes | | | | | |
| ADA | .667 [.655, .679] | .657 [.647, .667] | .567 [.559, .575] | .745 [.738, .753] | .474 [.466, .481] |
| eCANRISK | | .614 [.571, .671] | .547 [.532, .566] | .747 [.728, .766] | .509 [.452, .550] |
| LRA | .640 [.625, .681] | .630 [.576, .643] | .539 [.521, .548] | .722 [.714, .731] | .479 [.469, .529] |
| AUSDRISK | .661 [.549, .750] | .636 [.544, .742] | .552 [.524, .592] | .737 [.708, .764] | .484 [.377, .574] |
| GLM | | .657 [.631, .683] | .576 [.565, .587] | | .483 [.454, .509] |
| GBM | .673 [.639, .706] | | | .754 [.741, .766] | |
| Type 2 Diabetes | | | | | |
| ADA | | .546 [.536, .553] | .079 [.076, .082] | | .471 [.464, .481] |
| eCANRISK | .710 [.561, .871] | .613 [.454, .737] | .079 [.067, .094] | .979 [.973, .988] | .401 [.277, .561] |
| LRA | .700 [.529, .816] | .612 [.483, .765] | .077 [.067, .092] | .978 [.972, .984] | .402 [.247, .530] |
| AUSDRISK | .760 [.645, .865] | .630 [.509, .721] | .086 [.074, .094] | .983 [.978, .988] | .387 [.295, .507] |
| GLM | .736 [.648, .806] | .679 [.637, .722] | .095 [.089, .101] | .983 [.978, .987] | .339 [.296, .381] |
| GBM | .666 [.587, .739] | | | .979 [.975, .983] | |
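Both results tables report means with 95% confidence intervals over 1000 holdout evaluations. One common way to obtain such intervals, assumed here rather than taken from the paper, is to repeat a random train/test split, recompute the metric each time, and report the mean together with the 2.5th and 97.5th percentiles. The split fraction, seed and the `evaluate` callback are hypothetical placeholders.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Linearly interpolated q-quantile (0 <= q <= 1) of already sorted values."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def repeated_holdout(data: Sequence[T],
                     evaluate: Callable[[List[T], List[T]], float],
                     n_repeats: int = 1000,
                     test_fraction: float = 0.3,
                     seed: int = 0) -> Tuple[float, float, float]:
    """Return (mean, 2.5th percentile, 97.5th percentile) of a metric over random splits."""
    rng = random.Random(seed)
    indices = list(range(len(data)))
    n_test = max(1, round(len(data) * test_fraction))
    results = []
    for _ in range(n_repeats):
        rng.shuffle(indices)
        test = [data[i] for i in indices[:n_test]]
        train = [data[i] for i in indices[n_test:]]
        results.append(evaluate(train, test))  # fit on train, score on test
    results.sort()
    return statistics.fmean(results), percentile(results, 0.025), percentile(results, 0.975)
```

Running every approach with the same seed evaluates them on the same sequence of splits, which keeps the comparison between calculators and models paired.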
Fig 2. Comparison of AUC for six risk estimation approaches using a set of common variables.
Fig 3. Agreement of predictive models by BMI for two classification problems.
Fig 4. Agreement of predictive models by age for two classification problems.
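Figs 3 and 4 show how well the predictive models agree across BMI and age groups. The agreement measure is not specified in this excerpt, so the sketch below assumes simple percent agreement between two models' screening decisions within each stratum. The BMI bands (cut-offs at 25 and 30 kg/m²) and the function names are illustrative assumptions, not the strata used in the figures.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def bmi_band(bmi: float) -> str:
    """Illustrative BMI strata (kg/m^2); the bands used in the figures may differ."""
    if bmi < 25:
        return "<25"
    if bmi < 30:
        return "25-29.9"
    return ">=30"


def agreement_by_group(decisions_a: Sequence[bool], decisions_b: Sequence[bool],
                       groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Share of persons in each group for whom both models make the same decision."""
    agree: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for a, b, g in zip(decisions_a, decisions_b, groups):
        total[g] += 1
        agree[g] += int(a == b)
    return {g: agree[g] / total[g] for g in total}


if __name__ == "__main__":
    bmis = [22.0, 27.5, 31.0, 35.2]
    model_a = [False, True, True, True]   # hypothetical screening decisions
    model_b = [False, False, True, True]  # hypothetical screening decisions
    print(agreement_by_group(model_a, model_b, [bmi_band(b) for b in bmis]))
```

Percent agreement is easy to read but does not correct for chance; a chance-corrected statistic such as Cohen's kappa could be substituted in the same loop.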