Bassam Farran, Arshad Mohamed Channanath, Kazem Behbehani, Thangavel Alphonse Thanaraj.
Abstract
OBJECTIVE: We build classification models and risk assessment tools for diabetes, hypertension and comorbidity using machine-learning algorithms on data from Kuwait. We model the increased proneness in diabetic patients to develop hypertension and vice versa. We ascertain the importance of ethnicity (and natives vs expatriate migrants) and of using regional data in risk assessment.
Keywords: Kuwait; Machine learning; Risk assessment; predictive models
Year: 2013 PMID: 23676796 PMCID: PMC3657675 DOI: 10.1136/bmjopen-2012-002457
Source DB: PubMed Journal: BMJ Open ISSN: 2044-6055 Impact factor: 2.692
Performance of various classification models built for modelling diabetes and hypertension
| Type of classification | N for case/control | Classification accuracy of the best random classifier (%) | LR accuracy (%) | SVM accuracy (%) | k-NN accuracy (%) | MDR accuracy (%) |
|---|---|---|---|---|---|---|
| (i) Diabetes in general population | 2853/7779 | 73.2 | 80.7 | 81.3±1.3 | 78.6±0.85 | 78.30 |
| (ii) Diabetes in hypertensive population | 1322/1382 | 51.1 | 70.9 | 87.4±1.1 | 75.6±2.7 | 72.1 |
| (iii) Two-stage aggregate of (i) + (ii) − diabetes | 2853/7779 | 73.2 | N/A | 84.9 | 88.2 | N/A |
| (iv) Hypertension in general population | 6759/3873 | 63.6 | 82.4 | 82.4±0.6 | 80.0±0.8 | 80.9 |
| (v) Hypertension in diabetic population | 2427/5994 | 71.2 | 80.1 | 80.8±1.3 | 76.0±1.4 | 67.3 |
| (vi) Two-stage aggregate of (iv) + (v) − hypertension | 1322/1382 | 51.1 | N/A | 95.3 | 90.3 | N/A |
| Kuwait-specific data sets | ||||||
| (i) Diabetes in general population | 1334/4179 | 75.8 | 79.4 | 79.4 | 77.6 | 75.9 |
| (ii) Hypertension in general population | 3451/2062 | 62.6 | 80 | 79.9 | 76.8 | 77.9 |
| Asian-specific data sets | ||||||
| (i) Diabetes in general population | 976/2061 | 67.9 | 84.3 | 84.3 | 81.4 | 83.6 |
| (ii) Hypertension in general population | 1933/1104 | 63.7 | 86.8 | 86.8 | 83.3 | 83.8 |
LR, logistic regression; SVM, support vector machine; k-NN, k-nearest neighbours; MDR, Multifactor Dimensionality Reduction.
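The record does not define the "best random classifier". One reading that agrees with the case/control counts is a majority-class baseline, that is, always predicting the larger class. The sketch below checks that reading on three rows of the table; the function name `majority_baseline` is hypothetical, and the interpretation is an inference from the numbers rather than something the source states.

```python
def majority_baseline(n_case: int, n_control: int) -> float:
    """Accuracy (%) obtained by always predicting the larger class."""
    total = n_case + n_control
    return round(100 * max(n_case, n_control) / total, 1)


# Case/control counts copied from three rows of the table above.
rows = {
    "Diabetes in general population": (2853, 7779),
    "Diabetes in hypertensive population": (1322, 1382),
    "Hypertension in general population": (6759, 3873),
}

for label, (n_case, n_control) in rows.items():
    print(f"{label}: {majority_baseline(n_case, n_control)}%")
```

For these three rows the computed values are 73.2, 51.1 and 63.6, matching the baseline column.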
Figure 1. Illustration of the methodology and flow of data for the two-stage aggregate classification model and the two-stage aggregate risk assessment tool for diabetes.

(A) Two-stage aggregate classification model for diabetes. A data set is passed through the classification model for diabetes in the general population (ie, irrespective of hypertension status), and the output is classified as TP1, TN1, FP1 and FN1. Of the false positives and false negatives, those who also have hypertension are passed through the classification model for diabetes in the hypertensive population; the output of this second model is classified as TP2, TN2, FP2 and FN2. The combined classification accuracy of the aggregate model is then defined as (TP1+TP2+TN1+TN2)/(TP1+TN1+FP1+FN1).

(B) Two-stage aggregate risk assessment tool for diabetes. A data set is passed through the classification model for diabetes in the general population (ie, irrespective of hypertension status), and the output is classified as TP1, TN1, FP1 and FN1. Of the false positives and false negatives, those who also have hypertension are passed through the risk assessment tool for diabetes in the hypertensive population; the non-hypertensive false positives and false negatives, along with the true positives and true negatives, are passed through the risk assessment tool for diabetes in the general population. The combined risk assignment is the aggregate of the risk assignments from the two component tools.

FP, false positives; TP, true positives; FN, false negatives; TN, true negatives; HT, hypertension. (FP1 and FN1)HT indicates those patients who tested as false positives or false negatives and are hypertensive.
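The flow in panel (A) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the `Record` fields and the function name `aggregate_accuracy` are hypothetical, the per-stage predictions are assumed to be available already, and the denominator is taken to be the whole stage-one data set (TP1+TN1+FP1+FN1), which is an assumption about the intended definition.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    has_diabetes: bool     # true label
    is_hypertensive: bool  # decides whether a stage-one error is re-examined
    stage1_pred: bool      # prediction of the general-population model
    stage2_pred: bool      # prediction of the hypertensive-population model


def aggregate_accuracy(records: List[Record]) -> float:
    """Fraction of records classified correctly after both stages."""
    if not records:
        raise ValueError("records must not be empty")
    correct = 0
    for r in records:
        if r.stage1_pred == r.has_diabetes:
            correct += 1  # counts towards TP1 or TN1
        elif r.is_hypertensive and r.stage2_pred == r.has_diabetes:
            correct += 1  # stage-one error recovered: TP2 or TN2
    # Denominator: every record seen by stage one (assumed definition).
    return correct / len(records)


example = [
    Record(True, False, True, False),   # TP1
    Record(False, True, False, True),   # TN1
    Record(True, True, False, True),    # FN1, hypertensive, corrected (TP2)
    Record(False, True, True, True),    # FP1, hypertensive, still wrong (FP2)
    Record(True, False, False, True),   # FN1, not hypertensive, stays wrong
]
print(aggregate_accuracy(example))
```

Because routing to the second stage depends on knowing which stage-one outputs were wrong, the sketch applies only to labelled evaluation data.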
Performance of the IHBI risk assessment tools (as built in this study) and ADA assessment tool for diabetes on Kuwaiti natives and Asian expatriates
| Data set | ADA tool: diabetic patients (%) | ADA tool: non-diabetic patients (%) | IHBI (k-NN) tool: diabetic patients (%) | IHBI (k-NN) tool: non-diabetic patients (%) | IHBI_Aggregate (k-NN) tool: diabetic patients (%) | IHBI_Aggregate (k-NN) tool: non-diabetic patients (%) |
|---|---|---|---|---|---|---|
| All ethnicities (k=7,N=10632) | ||||||
| ‘Low’ risk | 23.4 | 16.7 | 12.4 | 70.7 | 6.6 | 71.4 |
| ‘Borderline’ risk | 32.7 | 32.2 | 28.4 | 20.0 | 18.9 | 23.7 |
| ‘High’ risk | 43.9 | 51.1 | 59.2 | 9.3 | 74.5 | 4.9 |
| Kuwaiti natives (k=8,N=5513)* | ||||||
| ‘Low’ risk | 15.3 | 9.7 | 11.4 | 64.6 | 4.9 | 64.4 |
| ‘Borderline’ risk | 38.1 | 44.2 | 31.6 | 24.4 | 30 | 25.5 |
| ‘High’ risk | 46.6 | 46.1 | 57.0 | 10.9 | 65.2 | 10.2 |
| Asian expatriates (k=7,N=3036)* | ||||||
| ‘Low’ risk | 23.5 | 45.9 | 9.6 | 73.6 | 2.0 | 68.8 |
| ‘Borderline’ risk | 16.8 | 11.3 | 14.9 | 16.6 | 9.6 | 19.1 |
| ‘High’ risk | 59.6 | 42.8 | 75.5 | 9.8 | 88.4 | 12.2 |
*Split schema used is (0–1): ‘low’ risk; (2–3): ‘borderline’ risk; (4–8): ‘high’ risk.
ADA, American Diabetes Association; k-NN, k-nearest neighbours.
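The split schema in the table footnotes maps a score from 0 to 8 onto three risk bands (0–1 low, 2–3 borderline, 4–8 high). A minimal sketch of that mapping, and of tabulating the percentage of a patient group falling in each band, is given below. How the tools derive the score is not described in this record, so the sketch takes integer scores as given; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

CATEGORIES = ("low", "borderline", "high")


def risk_category(score: int) -> str:
    """Map a 0-8 score to a risk band using the footnote's split schema."""
    if not 0 <= score <= 8:
        raise ValueError("score must be between 0 and 8")
    if score <= 1:
        return "low"
    if score <= 3:
        return "borderline"
    return "high"


def risk_distribution(scores: Iterable[int]) -> Dict[str, float]:
    """Percentage of a patient group assigned to each risk band."""
    scores = list(scores)
    if not scores:
        raise ValueError("scores must not be empty")
    counts = Counter(risk_category(s) for s in scores)
    return {c: round(100 * counts[c] / len(scores), 1) for c in CATEGORIES}


# Made-up scores for illustration only, not data from the study.
print(risk_distribution([0, 1, 2, 5]))
```

Applying `risk_distribution` separately to the scores of patients with and without the condition would give one column pair of the kind shown in the table.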
Performance of the IHBI risk assessment tools for hypertension (as built in this study) on Kuwaiti natives and Asian expatriates
| Data set | IHBI (k-NN) tool: hypertensive patients (%) | IHBI (k-NN) tool: non-hypertensive patients (%) | IHBI_Aggregate (k-NN) tool: hypertensive patients (%) | IHBI_Aggregate (k-NN) tool: non-hypertensive patients (%) |
|---|---|---|---|---|
| All Ethnicities (k=8,N=10632) | ||||
| ‘Low’ risk | 1.1 | 37.6 | 0.28 | 37.6 |
| ‘Borderline’ risk | 6.5 | 31.8 | 4.9 | 31.9 |
| ‘High’ risk | 92.4 | 30.6 | 94.8 | 30.5 |
| Kuwaiti natives (k=8,N=5513)* | ||||
| ‘Low’ risk | 0.43 | 22.8 | 0.26 | 27 |
| ‘Borderline’ risk | 4.9 | 34.7 | 5.6 | 38.1 |
| ‘High’ risk | 94.6 | 42.5 | 94.2 | 34.9 |
| Asian expatriates (k=8,N=3036)* | ||||
| ‘Low’ risk | 1.2 | 43.6 | 0.1 | 48.5 |
| ‘Borderline’ risk | 4.1 | 27.4 | 2.6 | 26.1 |
| ‘High’ risk | 94.7 | 29.1 | 97.3 | 25.5 |
*Split schema used is (0–1)—‘low’ risk; (2–3)—‘borderline’ risk; (4–8)—‘high’ risk.
k-NN, k-nearest neighbours.