Sajida Perveen1, Muhammad Shahbaz2,3, Karim Keshavjee3,4, Aziz Guergachi3,5,6.
Abstract
Stratifying individuals at risk for developing diabetes could enable targeted delivery of interventional programs to those at highest risk, while avoiding the effort and costs of prevention and treatment in those at low risk. The objective of this study was to explore the potential role of a Hidden Markov Model (HMM), a machine learning technique, in validating the performance of the Framingham Diabetes Risk Scoring Model (FDRSM), a well-respected prognostic model. Can an HMM effectively predict an individual's 8-year risk of developing diabetes? To our knowledge, no study has attempted to use an HMM to validate the performance of the FDRSM. We used Electronic Medical Record (EMR) data of 172,168 primary care patients to derive the 8-year risk of developing diabetes in an individual using an HMM. The Area Under Receiver Operating Characteristic Curve (AROC) in our study sample of 911 individuals, for whom all risk factors and follow-up data were available, was 86.9%, compared with AROCs of 78.6% and 85% reported in a previously conducted validation study of the FDRSM in the same Canadian population and in the Framingham study, respectively. These results demonstrate that the discrimination capability of our proposed HMM is superior to that reported in the validation study of the FDRSM in a Canadian population and in the Framingham population. We conclude that the HMM is capable of identifying patients at increased risk of developing diabetes within the next 8 years.
Year: 2019 PMID: 31551457 PMCID: PMC6760163 DOI: 10.1038/s41598-019-49563-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
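The AROC quoted in the abstract can be read as the probability that a randomly chosen patient who developed diabetes was assigned a higher predicted risk than a randomly chosen patient who did not. The sketch below illustrates that pairwise definition only. The function name `auc_from_scores` and the labels and scores are hypothetical illustrations, not study data, and this is not the authors' HMM or their evaluation code.

```python
def auc_from_scores(labels, scores):
    """AROC as the share of (case, non-case) pairs ranked correctly.

    Ties between a case and a non-case count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted 8-year risks (NOT taken from the study).
labels = [1, 1, 1, 0, 0, 0, 0]  # 1 = developed diabetes, 0 = did not
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2, 0.4]
toy_auc = auc_from_scores(labels, scores)
print(f"toy AROC = {toy_auc:.3f}")
```

An AROC of 0.5 corresponds to ranking cases and non-cases no better than chance, which is the null hypothesis used for the significance column in the AROC summary table.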
Characteristics of the population in the CPCSSN database.
| Predictors | Findings |
|---|---|
| Female, sample size (%) | 100,566 (57) |
| Female age, mean ± SD, years | 49.5 ± 24.8 |
| Male age, mean ± SD, years | 48.2 ± 24.1 |
| Systolic BP, mean ± SD, mm Hg | 129.34 ± 17.183 |
| Chronic obstructive pulmonary disease, N (%) | 9939 (2.4) |
| Dementia, N (%) | 12007 (1.8) |
| Depression, N (%) | 32672 (10) |
| Diabetes Mellitus, N (%) | 26077 (6) |
| Epilepsy, N (%) | 5553 (0.8) |
| Hypertension, N (%) | 61370 (13) |
| Osteoarthritis, N (%) | 37274 (7) |
| Parkinson’s Disease, N (%) | 1825 (0.2) |
| Fasting blood glucose, mean ± SD, mmol/L | 5.54 ± 1.91 |
| TG, mean ± SD, mmol/L | 1.523 ± 0.962 |
| LDL, mean ± SD, mmol/L | 2.83 ± 0.99 |
| High density lipoprotein, mean ± SD, mmol/L | 1.3893 ± 0.416 |
| BMI, mean ± SD, kg/m² | 37.113 ± 1528.71 |
| A1C, mean ± SD, mmol/L | 6.268 ± 0.976 |
| Cholesterol, mean ± SD, mmol/L | 4.893 ± 1.159 |
SD, Standard Deviation; Yr, Year; BP, Blood Pressure; LDL, Low Density Lipoprotein; A1C, Glycated Hemoglobin; TG, Triglycerides; BMI, Body Mass Index; HDL, High Density Lipoprotein.
*Some patients have more than 1 disease in the database.
Characteristics of the derived study sample.
| Predictors | Findings |
|---|---|
| Sample size without duplicates | 911 |
| Female, sample size (%) | 556 (61.03) |
| Male age, mean ± SD, years | 58.97 ± 11.96 |
| Female age, mean ± SD, years | 58.03 ± 11.02 |
| Systolic BP, mean ± SD, mm Hg | 127.611 ± 15.86 |
| Diabetes Mellitus, N (%) | 214 (23.49) |
| Fasting blood glucose, mean ± SD, mmol/L | 5.573 ± 1.93 |
| Triglycerides, mean ± SD, mmol/L | 1.705 ± 1.027 |
| HDL, mean ± SD, mmol/L | 1.313 ± 0.366 |
| BMI, mean ± SD, kg/m² | 28.76 ± 5.818 |
SD, Standard Deviation; BP, Blood Pressure; BMI, Body Mass Index; HDL, High Density Lipoprotein.
Figure 1. The receiver operating characteristic (ROC) curve of our proposed model over the derived study sample.
Association between individual risk factors and T2DM in the derived dataset.
| Explanatory variables | OR (95% C.I.) | P value |
|---|---|---|
| Age | 1.006 (0.993–1.020) | 0.000 |
| Male | 0.552 (0.472–0.701) | 0.030 |
| Systolic blood pressure | 0.998 (0.988–1.008) | 0.00 |
| BMI | 1.011 (0.985–1.038) | 0.022 |
| HDL | 0.601 (0.312–0.803) | 0.004 |
| Triglycerides | 1.076 (0.862–1.343) | 0.002 |
| Fasting blood glucose | 9.936 (7.638–12.925) | 0.000 |
| Intercept | 0.000 | |
Nagelkerke R2 = 0.546.
Hosmer and Lemeshow Test = 0.360 (Significantly greater than 0.0005).
OR, Odds Ratio; C.I., Confidence Interval; BMI, Body Mass Index; HDL, High Density Lipoprotein.
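An odds ratio and its confidence interval can be converted back to the log-odds scale on which a logistic regression is fitted. The sketch below does this for the fasting blood glucose row, assuming the reported interval is a symmetric 95% Wald interval on the log scale and that the predictor entered the model per unit (mmol/L). Both are assumptions rather than statements from the source, and the helper name `log_odds_summary` is hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def log_odds_summary(odds_ratio, ci_low, ci_high, z=Z_95):
    """Return (coefficient, standard error) implied by an OR and its CI.

    Assumes a Wald interval, i.e. exp(beta - z*se) to exp(beta + z*se).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


# Fasting blood glucose row: OR 9.936 (7.638-12.925)
beta_fbg, se_fbg = log_odds_summary(9.936, 7.638, 12.925)
print(f"beta = {beta_fbg:.3f}, SE = {se_fbg:.3f}")
```

Under these assumptions, the midpoint of the log-transformed bounds should sit close to the log of the odds ratio, which gives a quick consistency check on any row of the table.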
Comparative analysis of our derived research sample with the Framingham study and validation study of FDRSM in Canadian population research samples.
| | Research sample in our study | Framingham simple clinical model | Research sample of validation study of FDRSM in Canadian population |
|---|---|---|---|
| Sample size | 911 | 3140 | 1970 |
| Female, % | 61.03 | 53.9 | 60.6 |
| Age, mean ± SD, years | 58.97 ± 11.965 | 54.0 ± 9.8 | 56.60 (5.29) |
| Systolic BP >130/85 mm Hg, % | 49 | 44.2 | 20.1 |
| Triglyceride levels ≥1.7 mmol/L, % | 53 | 31.8 | 27.9 |
| HDL levels <0.9 mmol/L in males and <1.2 mmol/L in females, % | 17 | 36.9 | 18.9 |
| Fasting blood glucose levels 5.5 to 6.9 mmol/L, % | 47 | 27.0 | 30.3 |
| BMI, mean ± SD, kg/m² | 28.76 ± 5.818 | 27.1 ± 4.7 | 28.28 (6.07) |
SD, Standard Deviation; BP, Blood Pressure; BMI, body mass index; HDL, high-density lipoprotein.
Summary of Area Under Receiver Operating Characteristic Curve (AROC) in our derived research dataset.
| | AROC | Std. Error (a) | Asymptotic Sig. (b) | Asymptotic 95% Confidence Interval, Lower Bound | Asymptotic 95% Confidence Interval, Upper Bound |
|---|---|---|---|---|---|
| Over the derived study dataset | 0.869 | 0.054 | 0.000 | 0.763 | 0.975 |
| Over the derived dataset, excluding age | 0.828 | 0.060 | 0.000 | 0.710 | 0.946 |
The test result variable(s): cal has at least one tie between the positive actual state group and the negative actual state group. Statistics may be biased.
a. Under the nonparametric assumption.
b. Null hypothesis: true area = 0.5.
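The confidence bounds in the AROC summary can be checked against the reported standard errors. The sketch below assumes the bounds are a normal-approximation (Wald) interval, AROC ± 1.96 × SE, which is an assumption about how they were produced. For the row excluding age, the standard error is taken as 0.060, the value consistent with that row's bounds. The helper name `wald_interval` is hypothetical.

```python
Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def wald_interval(estimate, std_error, z=Z_95):
    """Normal-approximation confidence interval: estimate +/- z * SE."""
    half_width = z * std_error
    return (estimate - half_width, estimate + half_width)


# Values from the AROC summary table.
full_ci = wald_interval(0.869, 0.054)    # all risk factors
no_age_ci = wald_interval(0.828, 0.060)  # excluding age

print(tuple(round(b, 3) for b in full_ci))
print(tuple(round(b, 3) for b in no_age_ci))
```

Both intervals sit well above 0.5, in line with the reported asymptotic significance against the null hypothesis of a true area of 0.5.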
The comparative analysis of AROCs and 8-year risk for developing diabetes among our research sample, the Framingham research sample (simple clinical model) and FDRSM validation study in Canadian population.
| | Proposed HMM-based risk model | Framingham simple clinical model | Validation study of FDRSM in Canadian population |
|---|---|---|---|
| AROC, % | 86.9 | 85.0 | 78.6 |
| <3, % | 42.2 | 63.8 | 70.1 |
| 3 to 10, % | 44.4 (between 3 to 9) | 20.7 | 16.3 |
| >10, % | 13.3 (equal to 10) | 15.6 | 13.6 |
AROC, Area Under Receiver Operating Characteristic Curve.
Figure 2. The receiver operating characteristic (ROC) curve of our proposed model over the derived study sample, excluding age as one of the contributing risk factors.