Rahman Ali, Jamil Hussain, Muhammad Hameed Siddiqi, Maqbool Hussain, Sungyoung Lee.
Abstract
Diabetes is a chronic disease characterized by a high blood glucose level that results either from a deficiency of insulin produced by the body or from the body's resistance to the effects of insulin. Accurate and precise reasoning and prediction models greatly help physicians improve the diagnosis, prognosis and treatment of different diseases. Though numerous models have been proposed to solve issues of diagnosis and management of diabetes, they have the following drawbacks: (1) they are restricted to one type of diabetes; (2) they lack understandability and explanatory power in their techniques and decisions; (3) they are limited either to prediction or to management over structured content; and (4) they cannot cope with the dimensionality and vagueness of patient data. To overcome these issues, this paper proposes a novel hybrid rough set reasoning model (H2RM) that resolves problems of inaccurate prediction and management of type-1 diabetes mellitus (T1DM) and type-2 diabetes mellitus (T2DM). For verification of the proposed model, experimental data from fifty patients, acquired from a local hospital in semi-structured format, are used. First, the data are transformed into a structured format and then used for mining prediction rules. Rough set theory (RST) based techniques and algorithms are used to mine the prediction rules. During the online execution phase of the model, these rules are used to predict T1DM and T2DM for new patients. Furthermore, the proposed model assists physicians in managing diabetes using knowledge extracted from online diabetes guidelines. Correlation-based trend analysis techniques are used to manage diabetic observations. Experimental results demonstrate that the proposed model outperforms the existing methods with 95.9% average and balanced accuracies.
Keywords: H2RM; RBR; classification; diabetes mellitus; prediction; reasoning; regression; rough set theory; rules mining; trend analysis
Year: 2015 PMID: 26151207 PMCID: PMC4541861 DOI: 10.3390/s150715921
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Hybrid rough set reasoning model for prediction and management of diabetes mellitus.
Figure 2. An encounter of a type-2 diabetes mellitus (T2DM) patient, following the subjective, objective, assessment, and plan (SOAP)-based protocol.
List of guidelines used for managing diabetes mellitus.
| S.No | Predictor | Guidelines | References |
|---|---|---|---|
| 1 | BMI | WHO: BMI classification | WHO [ |
| 2 | BP: SBP, DBP | JNC 7 report, AHA | JNC [ |
| 3 | FBS | American Diabetes Association. Diabetes Care | ADA [ |
| 4 | HBA1c | American Diabetes Association, NICE | ADA [ |
| 5 | Lipids: TC, TG, HDL, LDL | NCEP, ADA | NCEP [ |
| 6 | LFT: ALT, AST | Liver disease (LD), Mayo Clinic | LD [ |
Figure 3. Distribution of a diabetes patient's observations in the subjective, objective, assessment, and plan (SOAP)-based clinical chart.
Rules defined for the management of diabetic observations, based on reference ranges extracted from the guidelines (Table 1).
| Predictor | Range | Interpretation |
|---|---|---|
| BMI | (−∞, 18.5) | underweight |
| BMI | [18.5, 24.9] | normal |
| BMI | [25, 30) | overweight |
| BMI | [30, ∞) | obese |
| SBP | (−∞, 120) | normal |
| SBP | [120, 139] | prehypertension |
| SBP | [140, 159] | hypertension stage 1 |
| SBP | [160, 180] | hypertension stage 2 |
| SBP | [181, ∞) | hypertensive crisis |
| DBP | (−∞, 80) | normal |
| DBP | [80, 89] | prehypertension |
| DBP | [90, 99] | hypertension stage 1 |
| DBP | [100, 110] | hypertension stage 2 |
| DBP | (110, ∞) | hypertensive crisis |
| FBS | (−∞, 70) | hypoglycemia |
| FBS | [70, 99] | normal |
| FBS | (99, 126] | pre-diabetic |
| FBS | (126, ∞) | diabetic |
| Hba1c | [4, 5.9] | hypoglycemia |
| Hba1c | (5.9, 6.4] | prediabetes |
| Hba1c | (6.4, 7.4] | diabetes |
| Hba1c | (7.4, ∞) | diabetes with higher risk |
| TC | (−∞, 200) | desirable |
| TC | [200, 239] | borderline high |
| TC | [240, ∞) | high |
| TG | (−∞, 150) | normal |
| TG | [150, 199] | borderline-high |
| TG | [200, 499] | high |
| TG | [500, ∞) | very high |
| LDL | (−∞, 100) | optimal |
| LDL | [100, 129] | near or above optimal |
| LDL | (129, 159] | borderline high |
| LDL | (159, 189] | high |
| LDL | (189, ∞) | very high |
| HDL | (−∞, 40) | low |
| HDL | [40, 60) | normal |
| HDL | [60, ∞) | high |
| AST (SGOT) | (−∞, 5) | low |
| AST (SGOT) | [5, 40] | normal |
| AST (SGOT) | (40, ∞) | high |
| ALT (SGPT) | (−∞, 7) | low |
| ALT (SGPT) | [7, 56] | normal |
| ALT (SGPT) | [57, ∞) | high |
Legend: “[” or “]” means inclusive, “(” or “)” means exclusive, “∞” means ± infinity.
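As a minimal sketch of how such reference-range rules could be evaluated, the snippet below encodes the four fasting blood sugar (FBS) ranges with cut-points 70, 99 and 126, keeping track of which boundaries are inclusive. The names `FBS_RULES` and `interpret` and the tuple layout are illustrative assumptions, not part of the published model.

```python
import math

# Each rule: (lower, lower_inclusive, upper, upper_inclusive, label).
# Ranges and labels follow the FBS reference ranges; the layout is an assumption.
FBS_RULES = [
    (-math.inf, False, 70, False, "hypoglycemia"),
    (70, True, 99, True, "normal"),
    (99, False, 126, True, "pre-diabetic"),
    (126, False, math.inf, False, "diabetic"),
]


def interpret(value, rules):
    """Return the label of the first range that contains value, or None."""
    for lower, lower_inc, upper, upper_inc, label in rules:
        above = value >= lower if lower_inc else value > lower
        below = value <= upper if upper_inc else value < upper
        if above and below:
            return label
    return None


# Example: interpret a single FBS observation.
print(interpret(99, FBS_RULES))
```

Storing the inclusive/exclusive flags explicitly avoids silently misclassifying boundary values such as 99 or 126, which sit on opposite sides of their cut-points in the legend's notation.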
Missing value treatment, criteria and strategies, applied to the diabetes mellitus dataset.
| Scope | Criteria | Strategy |
|---|---|---|
| Dataset level (whole population) | An attribute has missing values in 20% or more of the records of the whole dataset | Drop the attribute from the dataset, as it may lead to incorrect results |
| Patient level (whole encounters of one patient) | An attribute has missing values in 2 or fewer encounters of a patient | Use the immediate previous/next encounter's value of the same patient: the immediate previous/next encounter value if the missing values are non-consecutive; the immediate previous encounter value for the first missing value and the immediate next value for the second if the missing values are consecutive |
| Patient level (whole encounters of one patient) | An attribute has missing values in less than 20% of the encounters of a patient | Use the average/frequent value strategy within encounters of the same patient: the average of all values of that attribute for the same patient if the attribute is numeric; the most frequent value within all encounters of the same patient if the attribute is nominal |
| Patient level (whole encounters of one patient) | An attribute has missing values in more than 20% of the encounters of a patient | Use the average/frequent value strategy within patients of the same class: the average of all values of all patients in the same class if the attribute is numeric; the most frequent value within all patients of the same class if the attribute is nominal |
Encounters of a single patient before and after applying the missing value completion strategies.
Before applying the strategies:
| 50 | e1 | 21.1 | 41 | 109 | no | yes | NULL | NULL | 44 | 107 | 36 | T2DM |
| 50 | e2 | 21.1 | 41 | NULL | no | NULL | 144 | 11.6 | 44 | 107 | 46 | T2DM |
| 50 | e3 | 21.3 | 42 | NULL | NULL | NULL | 116 | NULL | 44 | 110 | 36 | T2DM |
| 50 | e4 | 22.2 | 42 | 104 | NULL | yes | 155 | 6.6 | 42 | 150 | 64 | T2DM |
| 50 | e5 | NULL | 42 | 123 | NULL | NULL | NULL | NULL | 42 | NULL | 64 | T2DM |
| 50 | e6 | 22.1 | 42 | 123 | NULL | NULL | 246 | 8.9 | 52 | 165 | 39 | T2DM |
| 50 | e7 | 22.1 | 42 | 114.0 | no | NULL | 240 | 7.2 | 50 | 130 | 40 | T2DM |
| 50 | e8 | 22.2 | 42 | 191.0 | NULL | NULL | 230 | 9 | 51 | 162 | 45 | T2DM |
After applying the strategies:
| 50 | e1 | 21.1 | 41 | 109 | 144 | 8.66 | 44 | 107 | 36 | T2DM |
| 50 | e2 | 21.1 | 41 | 109 | 144 | 11.6 | 44 | 107 | 46 | T2DM |
| 50 | e3 | 21.3 | 42 | 104 | 116 | 8.66 | 44 | 110 | 36 | T2DM |
| 50 | e4 | 22.2 | 42 | 104 | 155 | 6.6 | 42 | 150 | 64 | T2DM |
| 50 | e5 | 22.2 | 42 | 123 | 155 | 8.66 | 42 | 150 | 64 | T2DM |
| 50 | e6 | 22.1 | 42 | 123 | 246 | 8.9 | 52 | 165 | 39 | T2DM |
| 50 | e7 | 22.1 | 42 | 114.0 | 240 | 7.2 | 50 | 130 | 40 | T2DM |
| 50 | e8 | 22.2 | 42 | 191.0 | 230 | 9 | 51 | 162 | 45 | T2DM |
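The completed values in this example can be reproduced with a short sketch of the patient-level strategies for numeric attributes. It assumes that a column with at most two gaps is filled from neighbouring encounters, and that a column with more gaps is filled with the patient's own average, which is what the completed rows show. The function names, the two-decimal rounding, and the omission of nominal attributes and of the class-level fallback are simplifications made here, not details taken from the paper.

```python
from statistics import mean


def fill_from_neighbours(values):
    """Fill each gap with the previous encounter's value, else the next observed one."""
    filled = list(values)
    for i, value in enumerate(values):
        if value is not None:
            continue
        if i > 0 and values[i - 1] is not None:
            filled[i] = values[i - 1]  # immediate previous encounter
            continue
        # First encounter, or second gap of a consecutive pair: look forward.
        candidate = next((v for v in values[i + 1:] if v is not None), None)
        if candidate is None:  # nothing later: fall back to the last observed value
            candidate = next((v for v in reversed(values[:i]) if v is not None), None)
        filled[i] = candidate
    return filled


def fill_with_mean(values):
    """Fill every gap with the average of the observed values (rounded, by assumption)."""
    average = round(mean(v for v in values if v is not None), 2)
    return [average if v is None else v for v in values]


def impute(values, max_neighbour_gaps=2):
    """Pick a strategy from the number of missing encounters in one column."""
    missing = sum(v is None for v in values)
    if missing == 0:
        return list(values)
    if missing <= max_neighbour_gaps:
        return fill_from_neighbours(values)
    return fill_with_mean(values)


# One column of the patient above, encounters e1..e8 (None stands for NULL).
print(impute([None, 144, 116, 155, None, 246, 240, 230]))
```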
Clinical characteristics of the diabetes patients.
| Characteristic | Average | Min. Value | Max. Value | Std. Deviation |
|---|---|---|---|---|
| BMI | 23.0 | 16.2 | 32.0 | 3.2 |
| Gender | M (256), F (135) | |||
| Age | 48.8 | 20.0 | 85.0 | 15.4 |
| SBP | 120.8 | 89.0 | 190.0 | 14.9 |
| DBP | 74.5 | 45.0 | 115.0 | 10.2 |
| FBS | 137.6 | 49.0 | 394.0 | 43.9 |
| Hba1c | 8.0 | 4.2 | 14.6 | 2.0 |
| TC | 169.5 | 0.0 | 371.0 | 37.7 |
| TG | 101.0 | 18.0 | 634.0 | 80.9 |
| HDL | 64.5 | 31.0 | 196.0 | 23.7 |
| LDL | 82.2 | 15.0 | 180.0 | 29.4 |
| AST (SGOT) | 22.0 | 11.0 | 65.0 | 7.8 |
| ALT (SGPT) | 26.6 | 8.0 | 120.0 | 18.0 |
| TDM | T2DM (278), T1DM (113) | |||
Set of cut-points and corresponding intervals for discretization of the Diabetes Mellitus Information System (DMIS).
| Attributes | # Cut-Points: Cut-Points Description | # Intervals: Interval Description | Discrete Value for Interval | Guidelines |
|---|---|---|---|---|
| BMI | 3: 18.5; 25; 30 | 4: (−∞, 18.5), [18.5, 24.9], [25, 30), [30, ∞) | 0, 1, 2, 3 | WHO [ |
| Gender | NA | NA | NA | - |
| Age | 2: 30; 50 | 3: (−∞, 30), [30, 50], (50, ∞) | 0, 1, 2 | - |
| SBP | 4: 120; 140; 160; 181 | 5: (−∞, 120), [120, 139], [140, 159], [160, 180], [181, ∞) | 0, 1, 2, 3, 4 | JNC 7 report, AHA [ |
| DBP | 4: 80; 90; 100; 110 | 5: (−∞, 80), [80, 89], [90, 99], [100, 110], (110, ∞) | 0, 1, 2, 3, 4 | JNC 7 report, AHA [ |
| FBS | 3: 70; 99; 126 | 4: (−∞, 70), [70, 99], (99, 126], (126, ∞) | 0, 1, 2, 3 | ADA [ |
| Hba1c | 3: 5.9; 6.4; 7.4 | 4: [4, 5.9], (5.9, 6.4], (6.4, 7.4], (7.4, ∞) | 0, 1, 2, 3 | ADA [ |
| TC | 2: 200; 240 | 3: (−∞, 200), [200, 239], [240, ∞) | 0, 1, 2 | NCEP [ |
| TG | 3: 150; 200; 500 | 4: (−∞, 150), [150, 199], [200, 499], [500, ∞) | 0, 1, 2, 3 | NCEP [ |
| HDL | 2: 40; 60 | 3: (−∞, 40), [40, 60), [60, ∞) | 0, 1, 2 | NCEP [ |
| LDL | 4: 100; 129; 159; 189 | 5: (−∞, 100), [100, 129], (129, 159], (159, 189], (189, ∞) | 0, 1, 2, 3, 4 | NCEP [ |
| AST (SGOT) | 2: 5; 40 | 3: (−∞, 5), [5, 40], (40, ∞) | 0, 1, 2 | LD [ |
| ALT (SGPT) | 2: 7; 57 | 3: (−∞, 7), [7, 56], [57, ∞) | 0, 1, 2 | LD [ |
Legend: “[” or “]” means inclusive, “(” or “)” means exclusive, “∞” means ± infinity.
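For attributes whose cut-points all open a left-closed interval (BMI, SBP, TC, TG and HDL in the table above), the mapping from a raw value to its discrete code can be sketched with a binary search over the cut-points. The dictionary and function names are assumptions for illustration. Values that fall into the small gaps between listed intervals (for example a BMI of 24.95) are assigned to the lower interval here, which is also an assumption.

```python
from bisect import bisect_right

# Cut-points copied from the discretization table; attributes with
# right-closed boundaries (e.g. FBS, Hba1c, LDL) are deliberately left out.
CUT_POINTS = {
    "BMI": [18.5, 25, 30],
    "SBP": [120, 140, 160, 181],
    "TC": [200, 240],
    "TG": [150, 200, 500],
    "HDL": [40, 60],
}


def discretize(attribute, value):
    """Return the discrete code (0, 1, 2, ...) of the interval containing value."""
    # bisect_right counts the cut-points <= value, so a value equal to a
    # cut-point lands in the interval that starts at that cut-point.
    return bisect_right(CUT_POINTS[attribute], value)


# Example: the average BMI and SBP from the clinical characteristics table.
print(discretize("BMI", 23.0), discretize("SBP", 120.8))
```

Attributes such as FBS, where a cut-point like 126 closes the lower interval instead, would need `bisect_left` for those cut-points or explicit boundary flags.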
Partial data of the Diabetes Mellitus Information System (DMIS) in interval format after discretization.
| DiscBMI | Gender | DiscAge | DiscSBP | DiscDBP | DiscFBS | DiscHba1c | DiscTC | DiscTG | DiscHDL | DiscLDL | DiscAST | DiscALT | TDM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [18.5, 24.9] | M | (50, ∞) | [120, 139] | (−∞, 80) | (99, 126] | (7.4, ∞) | (−∞, 200) | (−∞, 150) | (−∞, 40) | (−∞, 100) | [5, 40] | [7, 56] | T2DM |
| [25, 30) | M | [30, 50] | [140, 159] | [100, 110] | [70, 99] | (7.4, ∞) | (−∞, 200) | [150, 199] | [40, 60) | (−∞, 100) | [5, 40] | [7, 56] | T1DM |
| [18.5, 24.9] | F | (50, ∞) | (−∞, 120) | (−∞, 80) | (126, ∞) | (6.4, 7.4] | [200, 239] | (−∞, 150) | [40, 60) | [100, 129] | [5, 40] | [7, 56] | T2DM |
| . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| (−∞, 18.5) | F | [30, 50] | (−∞, 120) | (−∞, 80) | (126, ∞) | (7.4, ∞) | (−∞, 200) | (−∞, 150) | [60, ∞) | (−∞, 100) | [5, 40] | [7, 56] | T1DM |
| [25, 30) | F | (50, ∞) | (−∞, 120) | [80, 89] | (99, 126] | (5.9, 6.4] | (−∞, 200) | (−∞, 150) | [60, ∞) | (−∞, 100) | [5, 40] | [57, ∞) | T2DM |
Legend: “[” or “]” means inclusive, “(” or “)” means exclusive, “∞” means ± infinity.
List of all possible reducts for the Discretized Information System after applying the Lattice Reduct Search method.
| Reduct # | # Attributes | Reduct (Attributes) |
|---|---|---|
| 1 | 10 | {BMI, Gender, Age, SBP, DBP, FBS, Hba1c, HDL, LDL, PT} |
| 2 | 10 | {BMI, Age, SBP, DBP, FBS, Hba1c, TG, HDL, LDL, PT} |
| 3 | 10 | {BMI, Gender, Age, SBP, FBS, Hba1c, HDL, LDL, OT, PT} |
| 4 | 10 | {BMI, Age, SBP, FBS, Hba1c, TG, HDL, LDL, OT, PT} |
A partial list of rules extracted from the discretized information system (DIS) using the rough set (RS) learning from examples module, version 2 (LEM2) algorithm.
| Rule # | Prediction for TDM | Prediction Rule | Significance |
|---|---|---|---|
| 1 | (T1DM) | (BMI = [18.5, 24.9]) and (Age = (50, ∞)) and (SBP = [120, 139]) and (Hba1c = (7.4, ∞)) and (TC = (−∞, 200)) and (SGPT = [7, 56]) | 20 (17.70%) |
| 2 | (T2DM) | (Gender = M) and (SBP = (−∞, 120)) and (Hba1c = (6.4, 7.4]) and (LDL = [100, 129]) | 17 (6.12%) |
| 3 | (T2DM) | (BMI = [18.5, 24.9]) and (Age = [30, 50]) and (SBP = (−∞, 120)) and (TG = (−∞, 150)) and (HDL = [40, 60)) | 23 (8.27%) |
| 4 | (T1DM) | (SBP = [120, 139]) and (DBP = [80, 89]) and (Hba1c = (5.9, 6.4]) and (HDL = [40, 60)) and (SGPT = [7, 56]) | 7 (6.19%) |
| 5 (approximate rule) | (T1DM) OR (T2DM) | (BMI = [18.5, 24.9]) and (Age = (50, ∞)) and (FBS = 3) and (Hba1c = (126, ∞)) and (TG = (−∞, 150)) and (LDL = (−∞, 100)) and (SGPT = [7, 56]) | [5, 5] [2, 3] |
Legend: “[” or “]” means inclusive, “(” or “)” means exclusive, “∞” means ± infinity.
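A rule of this form can be applied to a discretized record by checking every condition, and the percentages in the Significance column are consistent with the rule's support divided by the size of its class (113 T1DM and 278 T2DM records). The sketch below illustrates both. The function names and the sample record are hypothetical, and reading the percentage as relative strength is an inference from the arithmetic, not a definition stated in the table.

```python
CLASS_SIZES = {"T1DM": 113, "T2DM": 278}  # from the clinical characteristics table

# Rule 2 from the table, written as attribute -> required interval.
RULE_2 = {
    "decision": "T2DM",
    "conditions": {
        "Gender": "M",
        "SBP": "(-inf, 120)",
        "Hba1c": "(6.4, 7.4]",
        "LDL": "[100, 129]",
    },
}


def rule_matches(record, rule):
    """True if the record satisfies every condition of the rule."""
    return all(record.get(attr) == wanted for attr, wanted in rule["conditions"].items())


def relative_strength(support, decision):
    """Support as a percentage of the records in the rule's decision class."""
    return 100.0 * support / CLASS_SIZES[decision]


# Hypothetical discretized record, used only to exercise the matcher.
record = {"Gender": "M", "SBP": "(-inf, 120)", "Hba1c": "(6.4, 7.4]",
          "LDL": "[100, 129]", "HDL": "[40, 60)"}
if rule_matches(record, RULE_2):
    print(RULE_2["decision"], round(relative_strength(17, "T2DM"), 2))
```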
Figure 4. Correlation-based trend analysis for prognosis of diabetes mellitus. The bold blue line is a scatter line graph of the current observations, the dotted black line shows the polynomial trendline used for future prediction, and the light orange band represents the normal ranges of the observations.
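The idea behind such a trend analysis can be sketched in a few lines. The example below is a deliberate simplification: it fits a straight line (a degree-1 polynomial) by least squares instead of the higher-order trendline shown in the figure, and reports the Pearson correlation between encounter index and observation. The function names, the sample series (one completed column of the single-patient encounter table, whose label is not given there) and the normal band of 70 to 99 are assumptions for illustration.

```python
def linear_trend(values):
    """Least-squares slope, intercept and Pearson r over encounter indices 1..n."""
    n = len(values)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5  # undefined for a constant series
    return slope, intercept, r


def forecast(values, steps_ahead=1):
    """Extrapolate the fitted line to a future encounter."""
    slope, intercept, _ = linear_trend(values)
    return intercept + slope * (len(values) + steps_ahead)


observations = [144, 144, 116, 155, 155, 246, 240, 230]
normal_low, normal_high = 70, 99  # assumed normal band for this illustration

slope, intercept, r = linear_trend(observations)
next_value = forecast(observations)
print(f"slope={slope:.2f}, r={r:.2f}, next={next_value:.1f}, "
      f"in normal range={normal_low <= next_value <= normal_high}")
```

A positive slope with a strong correlation suggests the observation is drifting away from the normal band, which is the kind of signal the figure's trendline is meant to surface.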
Experimental setup used for validation of prediction rules in the ROSE 2 system.
| S.No | Parameters | Values |
|---|---|---|
| 1 | Test | k-fold cross validation |
| 2 | Number of passes | 10 |
| 3 | Majority threshold | 21% |
| 4 | Minimum similarity | 50% |
| 5 | Partially matched rules | All |
| 6 | Rule support | strength × similarity |
Confusion matrix (sum over 10 passes) describing overall output of the validation process.
| Type of DM | T1DM | T2DM | None |
|---|---|---|---|
| T1DM | 106 (TP) | 7 (FN) | 0 |
| T2DM | 9 (FP) | 269 (TN) | 0 |
Average accuracy (%) of the model for each individual class and for the overall model.
| Type of DM | Correct | Incorrect | None |
|---|---|---|---|
| T1DM | 94.59 ± 6.16 | 5.41 ± 6.16 | 0.00 ± 0.00 |
| T2DM | 96.85 ± 4.11 | 3.15 ± 4.11 | 0.00 ± 0.00 |
| Total | 95.91 ± 2.61 | 4.09 ± 2.61 | 0.00 ± 0.00 |
Figure 5. Test results of each pass of the 10-fold cross-validation process.
Percent accuracy and percent error for each test of the 10-fold cross-validation process, along with the average accuracy and standard error over all 10 folds.
| Pass | No. Instances | Incorrect Examples | Correct Examples | Accuracy (%) | Error (%) |
|---|---|---|---|---|---|
| Pass 1 | 40 | 1 | 39 | 97.5 | 2.5 |
| Pass 2 | 39 | 3 | 36 | 92.30769231 | 7.6923077 |
| Pass 3 | 39 | 2 | 37 | 94.87179487 | 5.1282051 |
| Pass 4 | 39 | 3 | 36 | 92.30769231 | 7.6923077 |
| Pass 5 | 39 | 2 | 37 | 94.87179487 | 5.1282051 |
| Pass 6 | 39 | 0 | 39 | 100 | 0 |
| Pass 7 | 39 | 1 | 38 | 97.43589744 | 2.5641026 |
| Pass 8 | 39 | 2 | 37 | 94.87179487 | 5.1282051 |
| Pass 9 | 39 | 1 | 38 | 97.43589744 | 2.5641026 |
| Pass 10 | 39 | 1 | 38 | 97.43589744 | 2.5641026 |
| No. Instances | 391 | | | | |
| Total Number of Incorrect Examples | 16 | | | | |
| Total Number of Correct Examples | 375 | | | | |
| Average Accuracy | 95.90384615 | | | | |
| Average Error | 4.096153846 | | | | |
| Standard Error based on Percent Error of each Fold | 2.61660764 | | | | |
| Average Accuracy ± Standard Errors | 95.9 ± 2.6 | | | | |
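The per-pass percentages and the reported average accuracy follow directly from the instance counts, as the sketch below shows. The variable names are illustrative. The dispersion figure is not recomputed, because the table does not state the exact formula behind it.

```python
# (instances tested, incorrect examples) for passes 1..10, from the table above.
folds = [(40, 1), (39, 3), (39, 2), (39, 3), (39, 2),
         (39, 0), (39, 1), (39, 2), (39, 1), (39, 1)]


def fold_accuracy(instances, incorrect):
    """Percent of correctly classified examples in one pass."""
    return 100.0 * (instances - incorrect) / instances


per_fold = [fold_accuracy(n, wrong) for n, wrong in folds]
average_accuracy = sum(per_fold) / len(per_fold)  # mean of the per-pass values

total = sum(n for n, _ in folds)
total_incorrect = sum(wrong for _, wrong in folds)
pooled_accuracy = 100.0 * (total - total_incorrect) / total  # all examples pooled

print(round(average_accuracy, 2), round(pooled_accuracy, 2))
```

The mean of the per-pass accuracies and the pooled accuracy differ slightly because the first pass contains 40 instances while the others contain 39.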
Evaluation parameters for computing balanced accuracy.
| True Positive (TP) | False Positive (FP) | True Negative (TN) | False Negative (FN) |
|---|---|---|---|
| 106 | 9 | 269 | 7 |
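From these four counts the usual binary metrics can be derived. The sketch below treats T1DM as the positive class and uses the common definition of balanced accuracy as the mean of sensitivity and specificity. That definition is an assumption here and may differ from the computation used by the authors. The function name is illustrative.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and balanced accuracy from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)  # share of positive-class cases correctly predicted
    specificity = tn / (tn + fp)  # share of negative-class cases correctly predicted
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


metrics = binary_metrics(tp=106, fp=9, tn=269, fn=7)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```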