Habib Jafari1, Shamarina Shohaimi2, Nader Salari3,4, Ali Akbar Kiaei5, Farid Najafi6, Soleiman Khazaei1, Mehrdad Niaparast1, Anita Abdollahi1, Masoud Mohammadi7.
Abstract
Anthropometry derives from Greek: "anthropo" means human and "metry" means measurement. It is the science of body size, including the dimensions of different body parts, the range of motion, and the strength of the muscles. Specific individual dimensions such as heights, widths, depths, distances, circumferences, and curvatures are usually measured. In this article, we investigate the anthropometric characteristics of patients with chronic diseases (diabetes, hypertension, cardiovascular disease, heart attacks, and strokes) and identify the factors affecting these diseases and the extent of each factor's impact, to support the necessary planning. We focus on a cohort study of 10047 eligible participants from Ravansar County. Machine learning provides opportunities to improve discrimination through the analysis of complex interactions among a broad set of variables. Among the chronic diseases in this cohort study, we use three deep neural network models for diagnosis and prognosis of the risk of type 2 diabetes mellitus (T2DM) as a case study. In artificial intelligence tasks for medicine, imbalanced data is an important issue, and ignoring it leads to misleading evaluation results. Accuracy is not an appropriate evaluation criterion for this task, because a trivial model that labels all samples negative still achieves high accuracy. Therefore, precision, recall, AUC, and AUPRC were used as evaluation criteria. Next, overall feature importance was examined to determine which features matter most for the risk of T2DM. Finally, a personalization feature was added, in which feature importance is examined for each individual. Using Shapley values, the model is tuned for each patient so that it can be used for prognosis of T2DM risk for that patient.
In this paper, we have designed and implemented a full pipeline of data creation, data preprocessing, handling of imbalanced data, a deep learning model, an appropriate evaluation method, feature importance, and individual feature importance. The results show that the pipeline improves the diagnosis and prognosis of T2DM risk, with personalization capability.
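The argument against accuracy can be checked against the class counts reported for this cohort (969 participants with T2DM and 9078 without). The sketch below assumes a hypothetical trivial classifier that labels every participant negative; the function name `all_negative_metrics` and the convention of reporting undefined precision as 0.0 are illustrative choices, not part of the paper.

```python
def all_negative_metrics(n_pos: int, n_neg: int) -> dict:
    """Metrics for a trivial classifier that predicts 'no T2DM' for everyone."""
    tp, fp = 0, 0          # nothing is ever predicted positive
    fn, tn = n_pos, n_neg  # every positive is missed, every negative is correct
    accuracy = (tp + tn) / (n_pos + n_neg)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision is undefined when no positives are predicted; 0.0 is a convention here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Class counts taken from the descriptive table of the cohort.
metrics = all_negative_metrics(n_pos=969, n_neg=9078)
print(metrics)
```

With these counts the trivial model is correct for roughly 90% of participants while detecting no T2DM case at all, which is why recall, precision, AUC, and AUPRC are more informative here.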
Year: 2022 PMID: 35051240 PMCID: PMC8775210 DOI: 10.1371/journal.pone.0262701
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Methodology.
Fig 2. Imputing the data to handle missing values.
Fig 3. The architecture of the deep learning method for diagnosing T2DM.
Mean and standard deviation of anthropometric indexes and risk factors of chronic diseases.
| Variable | count | mean | Std. | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Age_At_Interview | 10047 | 47.33 | 8.29 | 34 | 40 | 46 | 54 | 70 |
| HeightCm | 10047 | 162.77 | 9.3 | 114.4 | 155.4 | 162.1 | 170 | 197.7 |
| WeightKg | 10047 | 72.89 | 13.64 | 31.5 | 63.5 | 72.2 | 81.3 | 145.5 |
| WaistCircumference | 10047 | 97.34 | 10.51 | 57 | 90.5 | 97 | 104 | 148.5 |
| HipCircumference | 10047 | 102.63 | 8.87 | 64.7 | 97 | 102 | 107.6 | 150 |
| BMI | 10047 | 27.51 | 4.64 | 12.51 | 24.42 | 27.25 | 30.26 | 52.8 |
| Age_T2DM | 969 | 51.88 | 7.61 | 35 | 46 | 52 | 58 | 67 |
| Age_Not_T2DM | 9078 | 46.84 | 8.21 | 34 | 40 | 45.5 | 53 | 70 |
| HeightCm_T2DM | 969 | 160.96 | 9.09 | 137.6 | 153.9 | 160.2 | 167.8 | 185.7 |
| HeightCm_Not_T2DM | 9078 | 162.96 | 9.3 | 114.4 | 155.6 | 162.4 | 170.2 | 197.7 |
| WeightKg_T2DM | 969 | 73.3 | 12.7 | 37.5 | 64.8 | 72.8 | 81 | 127 |
| WeightKg_Not_T2DM | 9078 | 72.85 | 13.73 | 31.5 | 63.5 | 72.2 | 81.3 | 145.5 |
| WaistCircumference_T2DM | 969 | 99.65 | 9.89 | 65.2 | 93 | 99.5 | 106 | 132.5 |
| WaistCircumference_Not_T2DM | 9078 | 97.09 | 10.54 | 57 | 90 | 97 | 104 | 148.5 |
| HipCircumference_T2DM | 969 | 102.69 | 8.63 | 79.2 | 97 | 102 | 107.4 | 140.5 |
| HipCircumference_Not_T2DM | 9078 | 102.62 | 8.9 | 64.7 | 97 | 102 | 107.6 | 150 |
| BMI_T2DM | 969 | 28.29 | 4.33 | 15.59 | 25.37 | 28.09 | 30.79 | 48.84 |
| BMI_Not_T2DM | 9078 | 27.43 | 4.67 | 12.51 | 24.34 | 27.16 | 30.18 | 52.8 |
Fig 4. Data distribution for the positive and negative classes.
Fig 5. Evaluating baseline model performance.
a) Advantage of bias initialization; b) confusion matrix; c-f) evaluation metrics.
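The record names bias initialization (Fig 5a) without giving the formula. A common choice for a sigmoid output unit, assumed here rather than confirmed by the source, is to set the initial output bias to the log-odds of the positive class, so that the untrained network already predicts the base rate. The function names are illustrative.

```python
import math


def initial_output_bias(n_pos: int, n_neg: int) -> float:
    """Log-odds of the positive class (assumed initialization, not stated in the source)."""
    return math.log(n_pos / n_neg)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Class counts from the descriptive table: 969 T2DM, 9078 non-T2DM.
b0 = initial_output_bias(969, 9078)
base_rate = sigmoid(b0)  # equals n_pos / (n_pos + n_neg) by construction
print(round(b0, 3), round(base_rate, 4))
```

Starting from the base rate spares the first training epochs from merely learning that positives are rare, which is the usual motivation for this trick.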
Fig 6. Evaluating weighted model performance.
a-d) evaluation metrics; e) confusion matrix.
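The weighting scheme behind the weighted model is not given in this record. One widely used heuristic, shown here purely as an assumption, scales each class by `total / (n_classes * class_count)` so that both classes contribute equally to the loss. The helper name `balanced_class_weights` is hypothetical.

```python
def balanced_class_weights(counts: dict) -> dict:
    """Inverse-frequency class weights (assumed heuristic, not confirmed by the source)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# 0 = no T2DM, 1 = T2DM; counts from the descriptive table.
weights = balanced_class_weights({0: 9078, 1: 969})
print(weights)
```

A dictionary of this shape is what the `class_weight` argument of Keras `Model.fit` accepts, should the model be trained with that framework.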
Fig 7. Evaluating resampled model performance.
a-d) evaluation metrics; e) confusion matrix.
Fig 8. (left) AUC and (right) AUPRC of the baseline, weighted, and resampled models.
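For readers unfamiliar with AUC, the sketch below computes it from its ranking definition: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and scores are made-up toy values, not results from the paper, and AUPRC is not covered here.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data for illustration only.
toy_labels = [0, 0, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.9]
auc = roc_auc(toy_labels, toy_scores)
print(auc)
```

Because it depends only on the ordering of scores, AUC is unaffected by the decision threshold, unlike accuracy.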
An individual sample A against median of dataset features.
| Sample | Gender ID | Age_At_Interview | Height Cm | Weight Kg | Waist Circumference | Hip Circumference | BMI |
|---|---|---|---|---|---|---|---|
| Patient A | male | 60 | 159.7 | 121.2 | 133.0 | 150.0 | 47.52 |
| Median (Total) | -- | 46 | 162.1 | 72.2 | 97.0 | 102.0 | 27.25 |
| Median (T2DM) | -- | 52 | 160.2 | 72.8 | 99.5 | 102.0 | 28.09 |
| Median (Not T2DM) | -- | 45 | 162.4 | 72.2 | 97.0 | 102.0 | 27.16 |
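The BMI reported for Patient A can be reproduced from the weight and height in the same row, using the standard definition BMI = weight (kg) / height (m)². The function name `bmi` is illustrative.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight in kilograms divided by squared height in metres."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


# Patient A: 121.2 kg and 159.7 cm, as listed in the table.
patient_a_bmi = bmi(121.2, 159.7)
print(round(patient_a_bmi, 2))  # 47.52, matching the table
```

This value is far above the cohort median BMI of 27.25, which is why Patient A serves as a clear illustrative case for individual feature contributions.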
Fig 9. Heat-map of features with annotations and cluster-map of features.
Feature sets including Hip Circumference (left column) vs. the corresponding feature sets with Hip Circumference removed (right column).
| With Hip Circumference | | Without Hip Circumference |
|---|---|---|
| BMI, Waist Circumference, Weight (Kg), Height (Cm), Age (At Interview), Gender ID, and Hip Circumference | vs. | BMI, Waist Circumference, Weight (Kg), Height (Cm), Age (At Interview), Gender ID |
| Waist Circumference, Weight (Kg), Height (Cm), Age (At Interview), Gender ID, and Hip Circumference | vs. | Waist Circumference, Weight (Kg), Height (Cm), Age (At Interview), Gender ID |
| BMI, Weight (Kg), Height (Cm), Age (At Interview), Gender ID, and Hip Circumference | vs. | BMI, Weight (Kg), Height (Cm), Age (At Interview), Gender ID |
| BMI, Waist Circumference, Weight (Kg), Age (At Interview), Gender ID, and Hip Circumference | vs. | BMI, Waist Circumference, Weight (Kg), Age (At Interview), Gender ID |
| BMI, Waist Circumference, Weight (Kg), Height (Cm), Gender ID, and Hip Circumference | vs. | BMI, Waist Circumference, Weight (Kg), Height (Cm), Gender ID |
| BMI, Waist Circumference, Weight (Kg), Height (Cm), Age (At Interview), and Hip Circumference | vs. | BMI, Waist Circumference, Weight (Kg), Height (Cm), Age (At Interview) |
| BMI, Height (Cm), Age (At Interview), Gender ID, and Hip Circumference | vs. | BMI, Height (Cm), Age (At Interview), Gender ID |
| BMI, Weight (Kg), Age (At Interview), Gender ID, and Hip Circumference | vs. | BMI, Weight (Kg), Age (At Interview), Gender ID |
| BMI, Weight (Kg), Height (Cm), Gender ID, and Hip Circumference | vs. | BMI, Weight (Kg), Height (Cm), Gender ID |
| BMI, Weight (Kg), Height (Cm), Age (At Interview), and Hip Circumference | vs. | BMI, Weight (Kg), Height (Cm), Age (At Interview) |
| ... | | ... |
| Gender ID and Hip Circumference | vs. | Gender ID |
| Hip Circumference | vs. | {} |
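The table above pairs every feature subset that includes Hip Circumference with the same subset without it. Averaging the difference in model output over all such pairs, with the classical Shapley weights, gives the Shapley value of Hip Circumference. The sketch below implements that exact definition; the additive `toy_value` function and its per-feature numbers are hypothetical stand-ins for the trained model, and practical SHAP tooling typically approximates this sum rather than enumerating it.

```python
from itertools import combinations
from math import factorial

FEATURES = [
    "BMI", "Waist Circumference", "Weight (Kg)", "Height (Cm)",
    "Age (At Interview)", "Gender ID", "Hip Circumference",
]

# Hypothetical per-feature contributions, for illustration only.
TOY_CONTRIB = {
    "BMI": 0.5, "Waist Circumference": 0.4, "Weight (Kg)": 0.2, "Height (Cm)": -0.1,
    "Age (At Interview)": 0.6, "Gender ID": 0.05, "Hip Circumference": 0.3,
}


def toy_value(subset: frozenset) -> float:
    """Stand-in for 'model output when only these features are known'."""
    return sum(TOY_CONTRIB[f] for f in subset)


def shapley_value(feature: str, features: list, value) -> float:
    """Exact Shapley value: weighted average of value(S + feature) - value(S)."""
    others = [f for f in features if f != feature]
    n = len(features)
    total = 0.0
    for k in range(len(others) + 1):
        # Weight of any subset S of size k: |S|! (n - |S| - 1)! / n!
        weight = factorial(k) * factorial(n - k - 1) / factorial(n)
        for subset in combinations(others, k):
            s = frozenset(subset)
            total += weight * (value(s | {feature}) - value(s))
    return total


hip_shapley = shapley_value("Hip Circumference", FEATURES, toy_value)
print(round(hip_shapley, 6))
```

With seven features, the six remaining ones form 2^6 = 64 subsets, so the full version of the table has 64 rows. For an additive toy function like this one, the Shapley value of a feature is simply its own contribution, which makes the result easy to sanity-check.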
Fig 10. Feature contribution for predicting the risk of patient A.
Fig 11. Force plot of feature contributions for predicting the diabetes risk of patient A.
Fig 12. Summary plot of Shapley values for all patients, for predicting diabetes risk.
Fig 13. Bar summary plot of the absolute Shapley values, showing overall feature importance.
Fig 14. Influence of the BMI and waist circumference features on model output.
Fig 15. Shapley interaction values for different socio-demographic features.
Fig 16. Shapley interaction values for all features.