Hao Sen Andrew Fang, Ngiap Chuan Tan, Wei Ying Tan, Ronald Wihal Oei, Mong Li Lee, Wynne Hsu.
Abstract
BACKGROUND: Clinical risk prediction models (CRPMs) use patient characteristics to estimate the probability of having or developing a particular disease and/or outcome. While CRPMs are gaining in popularity, they have yet to be widely adopted in clinical practice. The lack of explainability and interpretability has limited their utility. Explainability is the extent to which a model's prediction process can be described. Interpretability is the degree to which a user can understand the predictions made by a model.
Keywords: Clinical decision support tool; Explainable artificial intelligence; Interpretable; Patient similarity; Prediction models
Year: 2021 PMID: 34210320 PMCID: PMC8247104 DOI: 10.1186/s12911-021-01566-y
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
List of variables (and their description) included in computing degree of similarity
| No | Variables | Description |
|---|---|---|
| Demographic | ||
| 1 | Age | Age at base visit date |
| Duration of disease (years) | ||
| 2 | Duration of diabetes | Duration of diabetes at base visit date |
| 3 | Duration of hypertension | Duration of hypertension at base visit date |
| 4 | Duration of hyperlipidemia | Duration of hyperlipidemia at base visit date |
| Biomarkers | ||
| 5 | Body mass index | Body mass index at base visit |
| 6 | HbA1c<sup>a</sup> level (%) | Hemoglobin A1c level at base visit date |
| 7 | Systolic BP<sup>b</sup> (mmHg) | Systolic blood pressure at base visit date |
| 8 | Diastolic BP<sup>b</sup> (mmHg) | Diastolic blood pressure at base visit date |
| 9 | LDL<sup>c</sup> level (mmol/L) | Low-density lipoprotein level at base visit date |
| 10 | HDL<sup>d</sup> level (mmol/L) | High-density lipoprotein level at base visit date |
| 11 | TG<sup>e</sup> level (mmol/L) | Triglyceride level at base visit date |
| Anti-diabetic medications: daily dose | ||
| 12 | Metformin | Total daily dose of each anti-diabetic medication at base visit |
| 13 | Glipizide | |
| 14 | Gliclazide | |
| 15 | Tolbutamide | |
| 16 | Acarbose | |
| 17 | Sitagliptin | |
| 18 | Linagliptin | |
| 19 | Dapagliflozin | |
| 20 | Empagliflozin | |
| 21 | Rapid-acting insulin | |
| 22 | Isophane insulin | |
| 23 | Insulin glargine | |
| 24 | Insulin detemir | |
| 25 | Pre-mixed insulin | |
| Anti-hypertensive medications: daily dose | ||
| 26 | Candesartan | Total daily dose of each anti-hypertensive medication at base visit |
| 27 | Captopril | |
| 28 | Enalapril | |
| 29 | Lisinopril | |
| 30 | Losartan | |
| 31 | Perindopril | |
| 32 | Telmisartan | |
| 33 | Valsartan | |
| 34 | Atenolol | |
| 35 | Bisoprolol | |
| 36 | Propranolol | |
| 37 | Amlodipine | |
| 38 | Nifedipine | |
| 39 | Hydrochlorothiazide | |
| 40 | Indapamide | |
| 41 | Spironolactone | |
| 42 | Hydralazine | |
| 43 | Methyldopa | |
| 44 | Amiloride | |
| Lipid-lowering medications: daily dose | ||
| 45 | Lovastatin | Total daily dose of each lipid-lowering medication at base visit |
| 46 | Pravastatin | |
| 47 | Simvastatin | |
| 48 | Atorvastatin | |
| 49 | Rosuvastatin | |
| 50 | Fenofibrate | |
| 51 | Gemfibrozil | |
| 52 | Ezetimibe | |
| 53 | Cholestyramine | |
| Anti-diabetic medication class (count) | ||
| 54 | Biguanides | Count of medications in each class at base visit<sup>f</sup> |
| 55 | Sulphonylureas | |
| 56 | Alpha-glucosidase inhibitors | |
| 57 | Dipeptidyl peptidase 4 inhibitors | |
| 58 | Sodium-glucose co-transporter 2 inhibitors | |
| 59 | Insulin | |
| Anti-hypertensive medication class (count) | ||
| 60 | Angiotensin-converting enzyme inhibitors and angiotensin II receptor blockers | Count of medications in each class at base visit<sup>f</sup> |
| 61 | Beta blockers | |
| 62 | Calcium channel blockers | |
| 63 | Diuretics | |
| 64 | Other anti-hypertensive classes | |
| Lipid-lowering medication class (count) | ||
| 65 | Statins | Count of medications in each class at base visit<sup>f</sup> |
| 66 | Other lipid-lowering medications | |
| Medication purpose (count) | ||
| 67 | Anti-diabetic medications | Count of number of medications for each condition at base visit |
| 68 | Anti-hypertensive medications | |
| 69 | Lipid-lowering medications | |
a HbA1c: Hemoglobin A1c
b BP: Blood pressure
c LDL: Low-density lipoprotein
d HDL: High-density lipoprotein
e TG: Triglyceride
f For these variables, the count is either 0 or 1
International Classification of Diseases 10 codes for eye, foot, kidney and macrovascular complications
| Complication | ICD codes |
|---|---|
| Eye | E1431, 3620 |
| Foot | E1140, E1473, I739, 4439 |
| Kidney | E1122, 25040, N183, N184, N185, 5859, 585 |
| Macrovascular | I249, I259, 4149, I500, 4280, G459, I64, 4349 |
ICD: International Classification of Diseases
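
The code lists above can be turned into a simple lookup. The following is a minimal sketch, not the authors' implementation: it assumes exact matching on codes written without dots or other separators, and the function names `normalize` and `complications_for` are hypothetical.

```python
# Code lists copied from the table; the lookup logic is illustrative only.
COMPLICATION_CODES = {
    "Eye": {"E1431", "3620"},
    "Foot": {"E1140", "E1473", "I739", "4439"},
    "Kidney": {"E1122", "25040", "N183", "N184", "N185", "5859", "585"},
    "Macrovascular": {"I249", "I259", "4149", "I500", "4280", "G459", "I64", "4349"},
}


def normalize(code: str) -> str:
    """Uppercase and keep only letters and digits (e.g. 'e11.40' -> 'E1140')."""
    return "".join(ch for ch in code.upper() if ch.isalnum())


def complications_for(codes):
    """Return the sorted complication categories matched by a patient's codes."""
    seen = {normalize(c) for c in codes}
    # Assumption: a category is flagged when any of its codes matches exactly.
    return sorted(name for name, members in COMPLICATION_CODES.items() if seen & members)


print(complications_for(["E11.40", "I10", "N18.3"]))  # ['Foot', 'Kidney']
```

Whether the study matched codes exactly or by prefix is not stated here, so the exact-match rule should be treated as a placeholder.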
Normal values imputed for the missing data
| Variable | Imputed value |
|---|---|
| Age (years) | 63.2* |
| Body mass index (kg/m2) | 25.2* |
| Systolic blood pressure (mmHg) | 129.8* |
| Diastolic blood pressure (mmHg) | 70.6* |
| HbA1c<sup>a</sup> (%) | 6.0 |
| LDL<sup>b</sup> (mmol/L) | 3.0 |
| HDL<sup>c</sup> (mmol/L) | 1.0 |
| TG<sup>d</sup> (mmol/L) | 1.7 |
a HbA1c: Hemoglobin A1c
b LDL: Low-density lipoprotein
c HDL: High-density lipoprotein
d TG: Triglyceride
* Mean value imputed
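
As an illustration of how such a table might be applied, the sketch below fills missing fields with the listed values. The field names and the `impute` helper are hypothetical, and treating both absent keys and `None` as missing is an assumption rather than something the table specifies.

```python
# Fill values copied from the table; field names are hypothetical.
IMPUTED_VALUES = {
    "age": 63.2,
    "bmi": 25.2,
    "systolic_bp": 129.8,
    "diastolic_bp": 70.6,
    "hba1c": 6.0,
    "ldl": 3.0,
    "hdl": 1.0,
    "tg": 1.7,
}


def impute(record: dict) -> dict:
    """Return a copy of record with missing (absent or None) fields filled in."""
    filled = dict(record)
    for field, value in IMPUTED_VALUES.items():
        if filled.get(field) is None:
            filled[field] = value
    return filled


patient = {"age": 58.0, "bmi": None, "hba1c": 7.4}
print(impute(patient))
```

Observed values are left untouched; only the gaps receive the tabulated defaults.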
Variable importance weights derived from expert consensus
| Variable | Importance weight (1-least important, to 10-most important) |
|---|---|
| Age | 5 |
| Number of years with condition (Diabetes, Hypertension, Hyperlipidemia) | 10 |
| Body mass index | 2 |
| HbA1c<sup>a</sup> | 5 |
| Blood pressure values (Systolic and diastolic) | 2.5 |
| Cholesterol biomarkers (LDL<sup>b</sup>, HDL<sup>c</sup>, TG<sup>d</sup>) | 1.5 |
| Individual medication daily dose | 1 |
| Count of medications in each medication class | 2 |
| Count of medications for each condition | 5 |
a HbA1c: Hemoglobin A1c
b LDL: Low-density lipoprotein
c HDL: High-density lipoprotein
d TG: Triglyceride
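
The table gives the weights but not how they enter the similarity calculation. One common option, shown here purely as an assumption, is to multiply each squared feature difference by its weight before taking the square root. The feature names are hypothetical, only a few of the 69 variables are shown, and a group weight (for example 10 for the disease durations) is assumed to apply to each variable in the group.

```python
import math

# Weights copied from the table; feature names are hypothetical.
WEIGHTS = {
    "age": 5.0,
    "diabetes_years": 10.0,
    "bmi": 2.0,
    "hba1c": 5.0,
    "systolic_bp": 2.5,
    "ldl": 1.5,
}


def weighted_distance(a: dict, b: dict, weights: dict = WEIGHTS) -> float:
    """Euclidean distance with each squared difference multiplied by its weight.

    Assumes both records are already scaled to comparable ranges.
    """
    return math.sqrt(sum(w * (a[f] - b[f]) ** 2 for f, w in weights.items()))


base = {name: 0.0 for name in WEIGHTS}
older = dict(base, age=1.0)
longer_diabetes = dict(base, diabetes_years=1.0)
# A one-unit gap in disease duration separates patients more than one in age.
print(weighted_distance(base, older) < weighted_distance(base, longer_diabetes))  # True
```

Under this scheme a heavily weighted variable such as disease duration dominates the distance, which matches the intent of ranking it most important.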
Hyperparameters used in the final patient similarity model
| Hyperparameter | Value |
|---|---|
| Nearest neighbours | 10 |
| Weights | Uniform |
| Metric | Euclidean distance |
| Search algorithm | Ball tree |
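
These settings describe a k-nearest-neighbours model. The sketch below is a plain-Python approximation under stated assumptions: it uses an exhaustive search instead of a ball tree (a ball tree is an exact index, so it speeds up the search without changing which neighbours are found), and the function names and toy data are hypothetical. In scikit-learn the same settings would correspond to `KNeighborsClassifier(n_neighbors=10, weights="uniform", metric="euclidean", algorithm="ball_tree")`, although the library used in the study is not named here.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbours(query, database, k=10):
    """Return indices of the k database rows closest to query (exhaustive search)."""
    order = sorted(range(len(database)), key=lambda i: euclidean(query, database[i]))
    return order[:k]


def predict_risk(query, database, outcomes, k=10):
    """Uniform weighting: every neighbour counts equally toward the estimate."""
    idx = nearest_neighbours(query, database, k)
    return sum(outcomes[i] for i in idx) / len(idx)


# Toy data: 12 one-dimensional patients; outcome 1 means a complication occurred.
database = [[float(i)] for i in range(12)]
outcomes = [1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1]
print(predict_risk([0.0], database, outcomes))  # 0.4
```

With uniform weights the estimate is simply the share of the ten nearest patients who had the outcome, which keeps the prediction easy to explain.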
Baseline characteristics of study patients
| Characteristics | n = 10,059 | Missing, n (%) |
|---|---|---|
| Age (years), mean (SD) | 63.2 (11.3) | 0 (0.0) |
| Sex, males, n (%) | 4131 (41.1) | 0 (0.0) |
| Race, n (%) | | 0 (0.0) |
| Chinese | 8455 (84.1) | |
| Malay | 635 (6.3) | |
| Indian | 532 (5.3) | |
| Others | 437 (4.3) | |
| Body mass index (kg/m2), mean (SD) | 25.2 (4.5) | 1433 (14.2) |
| Systolic BP (mmHg), mean (SD) | 129.8 (17.7) | 60 (0.6) |
| Diastolic BP (mmHg), mean (SD) | 70.6 (10.8) | 60 (0.6) |
| HbA1c<sup>a</sup> (%) | 7.1 (1.4) | 7712 (76.6)* |
| LDL<sup>b</sup> (mmol/L) | 3.1 (0.9) | 2175 (21.6)# |
| HDL<sup>c</sup> (mmol/L) | 1.5 (0.4) | 2124 (21.1)# |
| TG<sup>d</sup> (mmol/L) | 1.4 (0.9) | 2124 (21.1) |
| Diagnosis, n (%) | | 0 (0.0) |
| Diabetes only | 150 (1.5) | |
| Hypertension only | 1501 (14.9) | |
| Hyperlipidemia only | 2223 (22.1) | |
| Diabetes & Hypertension | 149 (1.5) | |
| Diabetes & Hyperlipidemia | 315 (3.1) | |
| Hypertension & Hyperlipidemia | 4133 (41.1) | |
| Diabetes, Hypertension & Hyperlipidemia | 1588 (15.8) | |
| Complications, n (%) | | Not applicable@ |
| Eye, n (%) | 1180 (11.7) | |
| Foot, n (%) | 117 (1.2) | |
| Kidney, n (%) | 811 (8.1) | |
| Macrovascular, n (%) | 1119 (11.1) | |
| Any of the above, n (%) | 2590 (25.7) | |
| Look-back duration (years), mean (SD) | 4.1 (1.3) | Not applicable@ |
| Look-forward duration (years), mean (SD) | 4.1 (1.5) | Not applicable@ |
a HbA1c: Hemoglobin A1c
b LDL: Low-density lipoprotein
c HDL: High-density lipoprotein
d TG: Triglyceride
* High number of missing values because patients without diabetes did not require a Hemoglobin A1c test
# Missing counts differ between LDL and HDL because some patients had TG levels high enough to invalidate the calculated LDL
@ Not applicable as these were derived data
Comparison of patient similarity model performance with other models
| Model | AUROC (95% CI) |
|---|---|
| Patient similarity (K = 10)—weighted | 0.718 (0.697 to 0.739) |
| Patient similarity (K = 10)—unweighted | 0.688 (0.667 to 0.709) |
| Logistic regression | 0.695 (0.672 to 0.718) |
| Random forest | 0.764 (0.744 to 0.784) |
| Support vector machine (kernel = linear) | 0.766 (0.746 to 0.785) |
Fig. 1 The landing page (zoomed in at 175%) of the prototype web application using the patient similarity model. Users can enter demographic, biomarker and medication inputs to identify similar patients from the database
Fig. 2 Data input into the prototype web application. The attending doctor enters the details of Patient X into the web application. Fields are non-mandatory. After entering the details, the attending doctor clicks the “Search” button, which triggers the patient similarity model to identify the top-10 most similar patients in the database
Fig. 3 An anonymized list of the top-10 most similar patients to Patient X is presented. An aggregate prognostic value is calculated based on the proportion of the top-10 patients who encountered a DHL (diabetes, hypertension, hyperlipidemia) complication. The green/orange/red indicators represent the outcomes of each patient over the subsequent 5 years from base visit. Green indicates that the patient did well (i.e. no complications). Orange indicates that the patient had some complications or a worsening biomarker, while red indicates that the patient did poorly, with multiple complications. In this case, four of the ten patients had either orange or red indicators
Fig. 4 Timeline of a similar patient (Patient #10845). A particular similar patient can be selected to produce a timeline. In this case, Patient #10845 was selected to show Patient X a patient like himself who did well, and what Patient #10845 did to achieve the good results