Muhammad Musa Uba, Ren Jiadong, Muhammad Noman Sohail, Muhammad Irshad, Kaifei Yu.
Abstract
This work predicts diabetes mellitus using data mining (DM) based approaches on a dataset collected from the seven northwestern states of Nigeria. Data were collected from both primary and secondary sources, through questionnaires and verbal interviews with patients with diabetes mellitus and other chronic diseases. Hospital records of the patients involved in this work were also used. The dataset comprises 281 instances with 8 attributes. R programming software (version 5.3.1) was used in the experiments. The DM techniques used in this research were binomial logistic regression, classification, confusion matrix and correlation coefficient. The data were partitioned into training and testing sets: the training data were used to build the model, and the testing data were used to validate it. The best-fitted model converged with a null deviance of 281.951, a residual deviance of 16.476 and an AIC of 30.476. The significant variables are AGE, GLU, DBP and KDYP, with P values of 0.025, 0.01, 0.05 and 0.025, respectively. The model achieved a prediction accuracy of ∼97.1%. The correlation analysis revealed that diabetic patients are more likely to be hypertensive than patients with the other chronic diseases considered in the research.
Keywords: DM techniques; Nigeria; best-fitted model converges; biomedical measurement; chronic diseases; confusion matrix; correlation coefficient; data mining; data mining process; dataset; diabetes mellitus based model; diabetes mellitus model data mining based approaches; diabetic mellitus; diabetic patients; diseases; hospital data; medical computing; medical diagnostic computing; medical disorders; northwestern part; patient diagnosis; patient treatment; pattern classification; predicted model; primary sources; regression analysis; secondary sources; seven northwestern states; testing sets; training data
Year: 2019 PMID: 31531223 PMCID: PMC6718069 DOI: 10.1049/htl.2018.5111
Source DB: PubMed Journal: Healthc Technol Lett ISSN: 2053-3713
Fig. 1 Proposed analytical platform of the predicted model
Legend: SCRS: Systematic cluster random sampling, QUET: Questionnaire, VINT: Verbal interview, HREC: Hospital record, DTRS: Data transformation, SMET: Stepwise method, DSEL: Data selection, CLSF: Classification, LREG: Logistic regression and CMTX: Confusion matrix
Fig. 2 Details of the attributes used in the study, shown in graph format
Logistic regression model result
| Variables | Coefficients | Std error | z value | P value |
|---|---|---|---|---|
| constant | −48.12040 | 21.61366 | −2.226 | 0.0260* |
| AGE | 0.30854 | 0.19877 | 1.552 | 0.0246* |
| GLU | 6.52985 | 2.70314 | 2.415 | 0.0157* |
| BMI | −0.13504 | 0.09778 | −1.381 | 0.1673 |
| DBP | −0.06805 | 0.17965 | −0.379 | 0.7048 |
| KDYP | 7.97545 | 4.15712 | 1.919 | 0.0550. |
| HETP | 3.28995 | 3.06506 | 1.073 | 0.2831 |
| EYEP | −1.68871 | 2.02458 | −0.834 | 0.4042 |
Legend: AGE: Patient's age, GLU: Patient's glucose level, BMI: Patient's body mass index, DBP: Patient's diastolic blood pressure, KDYP: Symptoms related to kidney problems, HETP: Symptoms related to heart/cardiovascular problems and EYEP: Symptoms related to eye problems.
Null deviance: 281.951.
Residual deviance (RD): 16.302.
AIC: 32.302.
Fisher scoring iterations: 12.
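For a logistic regression fitted to ungrouped binary responses, the residual deviance equals −2 log L, so the AIC can be recovered as the residual deviance plus twice the number of estimated coefficients. The sketch below checks the figures reported above under the assumption that the model has eight coefficients (the constant plus the seven predictors in the table); the function name is illustrative and not taken from the paper.

```python
def aic_from_deviance(residual_deviance: float, n_params: int) -> float:
    """AIC = -2 log L + 2k.

    Assumes ungrouped binary data, where -2 log L equals the residual deviance.
    """
    return residual_deviance + 2 * n_params


# Assumption: k = 8 (constant + AGE, GLU, BMI, DBP, KDYP, HETP, EYEP).
full_model_aic = aic_from_deviance(16.302, 8)
print(round(full_model_aic, 3))  # 32.302
```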
Logistic regression model result
| Variables | Coefficients | Std error | z value | P value |
|---|---|---|---|---|
| constant | −38.26928 | 13.32100 | −2.872 | 0.00407** |
| AGE | 0.18394 | 0.08125 | 2.264 | 0.02358* |
| GLU | 5.43762 | 1.89178 | 2.874 | 0.00405** |
| BMI | −0.12266 | 0.14977 | −0.819 | 0.41280 |
| DBP | −0.07408 | 0.05872 | −1.262 | 0.05708. |
| KDYP | 10.04632 | 4.23233 | 2.374 | 0.01761* |
Legend: AGE: Patient's age, GLU: Patient's glucose level, BMI: Patient's body mass index, DBP: Patient's diastolic blood pressure and KDYP: Symptoms related to kidney problems.
Null deviance: 281.951.
Residual deviance (RD): 16.476.
AIC: 30.476.
Fisher scoring iterations: 11.
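The last two columns of the table are consistent with a Wald test: the z value is the coefficient divided by its standard error, and the P value is the two-sided tail probability under a standard normal distribution. The sketch below recomputes both for a few rows of the table, assuming this is how they were obtained; the function name is hypothetical, and small differences in the last digit can arise because the tabulated inputs are rounded.

```python
import math


def wald_test(coefficient: float, std_error: float) -> tuple[float, float]:
    """Return the Wald z statistic and its two-sided normal P value."""
    z = coefficient / std_error
    # Two-sided tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Coefficients and standard errors copied from selected rows of the table above.
rows = {
    "constant": (-38.26928, 13.32100),
    "AGE": (0.18394, 0.08125),
    "GLU": (5.43762, 1.89178),
    "KDYP": (10.04632, 4.23233),
}

results = {name: wald_test(coef, se) for name, (coef, se) in rows.items()}
for name, (z, p) in results.items():
    print(f"{name}: z = {z:.3f}, P = {p:.5f}")
```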
Confusion matrix
| Actual | Predicted: False | Predicted: True |
|---|---|---|
| 0 | 17 | 2 |
| 1 | 0 | 51 |
Accuracy = sum of the main-diagonal entries divided by the total number of observations = (17 + 51)/70 = 0.971 = 97.1%.
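The accuracy calculation can be reproduced directly from the four cells of the confusion matrix. The sketch below also derives sensitivity and specificity, which the source does not report; it assumes that class 1 is the positive class and that the columns "False" and "True" correspond to predicted classes 0 and 1.

```python
# Rows are actual classes (0, 1); columns are predicted classes (False, True).
confusion = [
    [17, 2],   # actual 0: true negatives, false positives
    [0, 51],   # actual 1: false negatives, true positives
]

tn, fp = confusion[0]
fn, tp = confusion[1]
total = tn + fp + fn + tp

accuracy = (tn + tp) / total      # correct predictions over all observations
sensitivity = tp / (tp + fn)      # true-positive rate (assumes class 1 is positive)
specificity = tn / (tn + fp)      # true-negative rate

print(f"accuracy    = {accuracy:.3f}")     # 0.971
print(f"sensitivity = {sensitivity:.3f}")  # 1.000
print(f"specificity = {specificity:.3f}")  # 0.895
```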
Correlation matrix
| | TYPE | KDYP | HETP | EYEP | HBPK |
|---|---|---|---|---|---|
| TYPE | 1.00 | 0.09 | 0.08 | 0.13 | 0.25 |
| KDYP | 0.09 | 1.00 | 0.32 | −0.01 | 0.09 |
| HETP | 0.08 | 0.32 | 1.00 | 0.11 | 0.02 |
| EYEP | 0.13 | −0.01 | 0.11 | 1.00 | 0.09 |
| HBPK | 0.25 | 0.09 | 0.02 | 0.09 | 1.00 |
Legend: TYPE: Diabetes patients, KDYP: Symptoms related to kidney problems, HETP: Symptoms related to heart/cardiovascular problems and EYEP: Symptoms related to eye problems.
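Each entry of the matrix above is a pairwise correlation coefficient between two variables. The source does not state which coefficient was used; the sketch below assumes the Pearson coefficient, and the two 0/1 indicator lists are hypothetical values made up for illustration, not the study data.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical indicators for eight patients (NOT the study data).
type_flag = [1, 1, 1, 0, 0, 1, 0, 1]
hbpk_flag = [1, 0, 1, 0, 0, 1, 1, 1]

r = pearson(type_flag, hbpk_flag)
print(f"r = {r:.3f}")  # 0.467
```

A variable correlated with itself gives 1.00, which is why the diagonal of the matrix is all ones, and the matrix is symmetric because the coefficient does not depend on the order of its arguments.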
Fig. 3 ‘ROCR’ curve for the TP and FP rate values
Fig. 4 Correlation plot showing the correlations between the variables ‘TYPE’, ‘KDYP’, ‘HETP’, ‘EYEP’ and ‘HBPK’