Simon Kocbek1, Primož Kocbek2, Lucija Gosak2, Nino Fijačko2, Gregor Štiglic1,2,3.
Abstract
Type 2 diabetes mellitus (T2DM) often results in high morbidity and mortality. In addition, T2DM presents a substantial financial burden for individuals and their families, health systems, and societies. According to studies and reports, the incidence and prevalence of T2DM are increasing rapidly worldwide. Several models have been built to predict future T2DM onset or to detect undiagnosed T2DM in patients. In addition to the performance of such models, their interpretability is crucial for health experts, especially in personalized clinical prediction models. Data collected over 42 months from health check-up examinations and prescribed drugs data repositories of four primary healthcare providers were used in this study. We propose a framework consisting of LogicRegression based feature extraction and Least Absolute Shrinkage and Selection Operator (LASSO) based prediction modeling for undiagnosed T2DM prediction. Performance of the models was measured using the area under the ROC curve (AUC) with corresponding confidence intervals. Results show that using LogicRegression based feature extraction resulted in simpler models, which are easier for healthcare experts to interpret, especially in cases with many binary features. Models developed using the proposed framework resulted in an AUC of 0.818 (95% Confidence Interval (CI): 0.812-0.823), which was comparable to more complex models (i.e., models with a larger number of features), where all features were included in prediction model development, with an AUC of 0.816 (95% CI: 0.810-0.822). However, the difference in the number of features used was significant. This study proposes a framework for building interpretable models in healthcare that can contribute to higher trust in prediction models from healthcare experts.
Keywords: LogicRegression; diabetes mellitus type 2; interpretability; prediction model
Year: 2022 PMID: 35330368 PMCID: PMC8950921 DOI: 10.3390/jpm12030368
Source DB: PubMed Journal: J Pers Med ISSN: 2075-4426
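The abstract reports model performance as AUC with a 95% confidence interval. The record does not state how those intervals were obtained, so the following is only a minimal sketch that assumes a percentile bootstrap; the function names `auc` and `bootstrap_ci` and the toy data are hypothetical and not taken from the study.

```python
import random


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data: 1 = FPGL above the threshold, 0 = at or below it (illustrative only).
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.2, 0.8, 0.7, 0.5, 0.9]
point = auc(labels, scores)
lower, upper = bootstrap_ci(labels, scores)
```

On real data the interval narrows as the sample grows, which is consistent with the tight intervals reported for a multi-provider dataset.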
Summary table of basic predictive and target features for healthcare centers.
| Original Feature Name | Description | FPGL ≤ 6.1 mmol/L | FPGL > 6.1 mmol/L |
|---|---|---|---|
| Age [mean (standard deviation − SD)] | Age in years | 56.07 (SD = 13.2) | 61.77 (SD = 10.98) |
| Gender_M [%] | Percentage of males | 37.16 | 54.47 |
| BMI [mean (SD)] | Body mass index | 28.89 (SD = 5.39) | 32.16 (SD = 13.21) |
| WC [mean (SD)] | Waist circumference in cm | 96.25 (SD = 13.89) | 103.48 (SD = 13.8) |
| Active_30_min (Q2) [%] | Active at least 30 minutes a day? | 64.88 | 52.27 |
| Medication (Q3) [%] | Blood pressure medication? | 40.19 | 60.18 |
| High_BS [%] | Ever measured high blood sugar? | 7.32 | 47.47 |
| Grocer [%] | Eat vegetable/fruit daily? | 90.59 | 78.99 |
| Diab_fam [%] | Diabetes in family? | 69.65 | 61.74 |
| FPGL [mean (SD)] | Fasting plasma glucose level | 5.26 (SD = 0.44) | 6.74 (SD = 0.8) |
List of original features with descriptions and their possible values. Note that each nominal feature was expanded into one new feature per possible value. For example, the feature Q43 resulted in three features, one for each possible value (the new features were named Q431, Q432 and Q433). Drug features are identified by their Anatomical Therapeutic Chemical (ATC) classification codes. The final set contained 170 features.
| Name | Description | Value |
|---|---|---|
| Age | Age of the patient | Numeric |
| Gender | Gender of the patient | Male, Female |
| BMI | Body Mass Index of the patient | Numeric |
| Blood_pressure | Blood pressure of the patient | Numeric |
| WC | Waist circumference of the patient | Numeric |
| Heart_beat | Heart beat of the patient | Numeric |
| Body_weight | Body weight of the patient | Numeric |
| Body_height | Body height of the patient | Numeric |
| Smoking_status | Smoking status of the patient | Non-smoker, Smoker, Ex-smoker, Passive smoker |
| Eating_habits | Assessment of eating habits | Adequate, Satisfactory, Inadequate |
| Drinking_status | Drinking status | Abstinent, Less risky drinking, Risky, Harmful, Addictive |
| SDH | Social determinants of health | Not threatened, Medium threatened, Threatened |
| PAS | Physical activity status | Sufficient, Borderline, Insufficient |
| Stress | Level of stress | Not threatened, Threatened |
| RD | Risk of depression | No significant risk of depression, Risk of depression |
| Q18 | How often do you usually eat vegetables? | Never Points, 4-6 times a week, 1x a day, More than 1x a day |
| Q16 | How many meals do you eat on average per day? | 2 or less, 3 to 5, 6 or more |
| Q2 | Are you physically active for at least 30 min/day? | Yes, No |
| Q3 | Do you take medication to lower your blood pressure? | Yes, No |
| Q30 | Do you have a habit of salting dishes at the table? | Yes, No |
| Q32 | On average, which type of fat do you use most in food preparation or as a spread? | Vegetable oils, Cream, Butter, Lard, Hard margarines, Soft margarines, High-fat spreads, Low-fat spreads, Chocolate spread, Peanut butter, Pate, Cream Spread, Mayonnaise |
| Q4 | Have you ever had your blood sugar measured? | Yes, No |
| Q43 | How many times in a typical week do you engage in vigorous physical activity for at least 25 minutes each time to the point where you are breathing and sweating? | 0 or 1 times per week, 2 times per week, 3 or more times per week |
| Q44 | How many times in a typical week do you engage in moderate physical activity for at least 30 minutes each time, to the extent that you breathe a little faster and warm up? | 0 or 1 times per week, 2 to 4 times per week, 5 or more times per week |
| Q47 | How often have you drunk drinks containing alcohol in the last 12 months? | Never, Once a month or less, 2 to 4 times a month, 2 to 3 times a week, 4 or more times a week |
| Q48 | In the last 12 months, how many measures of a drink containing alcohol did you usually have when you were drinking? | Zero to 1 measure, 2 measures, 3 or 4 measures, 5 or 6 measures, 7 or more |
| Q49 | In the last 12 months, how often have you had 6 or more sips on one occasion for men and 4 or more sips on one occasion for women? | Never, Less than once a month, 1 to 3 times a month, 1 to 3 times a week, Daily or almost daily |
| Q51 | In the last 12 months, how often have you needed an alcoholic drink in the morning to recover from excessive drinking the day before? | Never, Less than once a month, 1 to 3 times a month, 1 to 3 times a week, Daily or almost daily |
| Q57 | How often do you feel tense, stressed or under a lot of pressure? | Never, Rarely, Occasionally, Often, Every day |
| Q58 | How do you manage the tensions, stresses and pressures you experience in your life? | Easily, Able to, Able to with more efforts, Very difficult, Can’t |
| Q59 | How often in the past 2 weeks have you felt little interest and satisfaction in the things you do? | Not at all, A few days, More than half the days, Almost every day |
| Q6 | Does anyone in your family have diabetes? | No, Outer family, Inner family |
| Q60 | How often have you felt depressed or despairing in the past 2 weeks? | Not at all, A few days, More than half the days, Almost every day |
| Q69 | Please indicate the last school you attended. | Primary school incomplete, Primary school, 2 or 3-year vocational school, 4-year secondary school or gymnasium, Graduate, Postgraduate |
| Q70 | What is your current employment status? | Employed, Self-employed, Unemployed, Student, Retired, Disabled pensioner, Permanently disabled, Housewife |
| Q71 | How do you get through the month based on your income? | Good, Occasional problems, I have problems |
| ATC_A02BC01 | Omeprazole | Binary (0,1) |
| ATC_A02BC02 | Pantoprazole | Binary (0,1) |
| ATC_A11CC05 | Colecalciferol | Binary (0,1) |
| ATC_B01AC06 | Acetylsalicylic acid | Binary (0,1) |
| ATC_C03BA11 | Indapamide | Binary (0,1) |
| ATC_C07AB07 | Bisoprolol | Binary (0,1) |
| ATC_C09AA04 | Perindopril | Binary (0,1) |
| ATC_D01AC01 | Clotrimazole | Binary (0,1) |
| ATC_D01AE15 | Terbinafine | Binary (0,1) |
| ATC_D07AC13 | Mometasone | Binary (0,1) |
| ATC_G04BD09 | Trospium | Binary (0,1) |
| ATC_J01CA04 | Amoxicillin | Binary (0,1) |
| ATC_J01CE10 | Benzathine phenoxymethylpenicillin | Binary (0,1) |
| ATC_J01CR02 | Amoxicillin and beta-lactamase inhibitor | Binary (0,1) |
| ATC_J01EE01 | Sulfadiazine /trimethoprim | Binary (0,1) |
| ATC_J01FA10 | Azithromycin | Binary (0,1) |
| ATC_M01AB05 | Diclofenac | Binary (0,1) |
| ATC_M01AE01 | Ibuprofen | Binary (0,1) |
| ATC_M01AE02 | Naproxen | Binary (0,1) |
| ATC_N02AJ13 | Tramadol and paracetamol | Binary (0,1) |
| ATC_N02BB02 | Metamizole sodium | Binary (0,1) |
| ATC_N02BE01 | Paracetamol | Binary (0,1) |
| ATC_N05BA08 | Bromazepam | Binary (0,1) |
| ATC_N05BA12 | Alprazolam | Binary (0,1) |
| ATC_N05CF02 | Zolpidem | Binary (0,1) |
| ATC_R01AD09 | Mometasone | Binary (0,1) |
| ATC_R03AC02 | Salbutamol | Binary (0,1) |
| ATC_R03AL01 | Fenoterol and ipratropium bromide | Binary (0,1) |
| ATC_R06AE07 | Cetirizine | Binary (0,1) |
| ATC_R06AX13 | Loratadine | Binary (0,1) |
| ATC_S01AA12 | Tobramycin | Binary (0,1) |
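The expansion of nominal features described in the table caption (Q43 becoming Q431, Q432 and Q433) can be sketched as follows. This is an illustration only: the helper `expand_nominal` is hypothetical, and the mapping of each value to its numeric suffix is assumed to follow the order in which the values are listed in the table.

```python
def expand_nominal(records, feature, values):
    """Replace one nominal column with one 0/1 column per possible value.

    New columns are named <feature><k>, where k is the 1-based position of
    the value in `values` (an assumed ordering, see lead-in).
    """
    expanded = []
    for rec in records:
        new = {k: v for k, v in rec.items() if k != feature}
        for k, value in enumerate(values, start=1):
            new[f"{feature}{k}"] = int(rec[feature] == value)
        expanded.append(new)
    return expanded


Q43_VALUES = [
    "0 or 1 times per week",
    "2 times per week",
    "3 or more times per week",
]

# Two made-up patients, used only to show the shape of the output.
patients = [
    {"Age": 61, "Q43": "2 times per week"},
    {"Age": 48, "Q43": "0 or 1 times per week"},
]
encoded = expand_nominal(patients, "Q43", Q43_VALUES)
```

Each patient ends up with exactly one of the three new indicators set to 1, which is the binary form that logic rules over features can work with.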
Extracted logic features with corresponding LogicRegression rules and descriptions.
| Feature | Rule | Description |
|---|---|---|
| L1 | (ATC_J01EE01 or (not Q41)) | Prescribed sulfadiazine and trimethoprim, or never measured high blood sugar. |
| L2 | Q51 | Seldom eats fruit and vegetables. |
| L3 | ((ATC_M01AE02 and ATC_J01CE10) or (not SE)) | Prescribed naproxen and benzathine phenoxymethylpenicillin or not socially endangered. |
| L4 | Q494 | Daily consumption of alcohol in the last 12 months. |
| L5 | (MSE or ATC_D01AE15) | Medium socially endangered or prescribed antifungals for dermatological use. |
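Each logic feature in the table is a Boolean combination of binary indicators, so it can be evaluated directly for a single patient. The sketch below transcribes the five rules as written; the function `logic_features`, the treatment of a missing indicator as 0, and the example patient are assumptions made for illustration.

```python
def logic_features(patient):
    """Evaluate rules L1 to L5 on one patient's 0/1 indicators.

    Indicators that are absent from `patient` are treated as 0 (assumption).
    """
    def has(name):
        return bool(patient.get(name, 0))

    return {
        "L1": int(has("ATC_J01EE01") or not has("Q41")),
        "L2": int(has("Q51")),
        "L3": int((has("ATC_M01AE02") and has("ATC_J01CE10")) or not has("SE")),
        "L4": int(has("Q494")),
        "L5": int(has("MSE") or has("ATC_D01AE15")),
    }


# Made-up patient: Q41 set, medium socially endangered, no listed prescriptions.
patient = {"Q41": 1, "SE": 0, "MSE": 1, "Q494": 0}
features = logic_features(patient)
```

The five resulting 0/1 columns are what the later tables list as L1 to L5 alongside the numeric features.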
Selected features with the Least Absolute Shrinkage and Selection Operator (LASSO) on the dataset with numeric and logic features.
| Feature | Freq | Description |
|---|---|---|
| −Gender | 100 | Gender |
| +Blood_pressure | 100 | Blood pressure |
| +Heart_beat | 100 | Heart beat |
| +Age | 100 | Age |
| +BMI | 100 | Body mass index |
| +WC | 100 | Waist circumference |
| −L1 | 100 | Logic feature 1 |
| −L2 | 100 | Logic feature 2 |
| −L3 | 100 | Logic feature 3 |
| −Body_height | 99 | Body height |
| +Body_weight | 83 | Body weight |
Selected features with LASSO on the dataset with binary, numeric, and logic features.
| Feature | Freq | Description |
|---|---|---|
| −L3 | 100 | Logic feature 3 |
| −L4 | 100 | Logic feature 4 |
| +L5 | 100 | Logic feature 5 |
| +Blood_pressure | 100 | Blood pressure |
| +WC | 100 | Waist circumference in cm |
| +Heart_beat | 100 | Heart beat |
| +Age | 100 | Age in years |
| +Q45 | 100 | Ever measured high blood sugar? Yes |
| −Gender | 100 | Gender |
| +Q32 | 93 | Using drug(s) for lowering blood pressure |
| +Body_weight | 87 | Body weight |
| −Non_smoker | 87 | Non-smoker |
| +L2 | 79 | Logic feature 2 |
| −Q321 | 78 | Most often used oil is vegetable oil |
| −Non_drinker | 77 | No alcohol consumption |
| −Q583 | 75 | Handle stress with hardship |
| +Q62 | 74 | Parent, brother, or sister have diabetes |
| +BMI | 69 | Body Mass Index |
| −Q161 | 63 | 2 meals per day on average |
| −Q301 | 51 | No habit of using salt at the table |
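The record does not define the Freq column or the sign in front of each feature name. One plausible reading, assumed here, is that Freq counts how many of 100 repeated LASSO fits kept the feature with a non-zero coefficient, and that the sign gives the direction of that coefficient. Under that assumption, the tally could be sketched as follows; `selection_frequency` and the toy coefficients are hypothetical.

```python
from collections import Counter


def selection_frequency(runs):
    """Tally how often each feature has a non-zero coefficient across fits.

    `runs` is a list of dicts mapping feature name to fitted coefficient,
    one dict per LASSO fit. Returns (signed name, frequency) pairs sorted
    by frequency, using the majority sign of the non-zero coefficients.
    """
    counts = Counter()
    positive = Counter()
    for coefs in runs:
        for name, beta in coefs.items():
            if beta != 0:
                counts[name] += 1
                positive[name] += beta > 0
    rows = []
    for name, freq in counts.most_common():
        sign = "+" if 2 * positive[name] >= freq else "−"
        rows.append((sign + name, freq))
    return rows


# Three made-up fits instead of 100, with invented coefficients.
runs = [
    {"Age": 0.30, "Gender": -0.20, "BMI": 0.00},
    {"Age": 0.25, "Gender": -0.10, "BMI": 0.05},
    {"Age": 0.40, "Gender": 0.00, "BMI": 0.00},
]
table = selection_frequency(runs)
```

Features that survive in every fit, such as those listed with a frequency of 100, would be the most stable candidates for an interpretable final model.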
Figure 1. Selected features with the Least Absolute Shrinkage and Selection Operator (LASSO) on the dataset with numeric and logic features.