Oleg Metsker1, Kirill Magoev2,3, Alexey Yakovlev1,2, Stanislav Yanishevskiy1, Georgy Kopanitsa4, Sergey Kovalchuk2, Valeria V Krzhizhanovskaya2,3.
Abstract
BACKGROUND: Methods of data mining and analytics can be efficiently applied in medicine to develop models that use patient-specific data to predict the development of diabetic polyneuropathy. However, there is room for improvement in the accuracy of predictive models. Existing studies of diabetic polyneuropathy each considered only a limited number of predictors, which prevents a comparison of the efficiency of different machine learning methods across different predictors to find the most efficient one. The purpose of this study is the implementation of machine learning methods for identifying the risk of diabetic polyneuropathy based on structured electronic medical records collected in the databases of medical information systems.
Keywords: Clinical decision support; Machine learning; Polyneuropathy; Risk factors
Year: 2020 PMID: 32831065 PMCID: PMC7444272 DOI: 10.1186/s12911-020-01215-w
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Workflow of predictive model development and training
T-SNE parameters
| Parameter | Value | Parameter | Value |
|---|---|---|---|
| perplexity | 30.0 | metric | Euclidean |
| early_exaggeration | 12.0 | init | random |
| learning_rate | 200.0 | verbose | 0 |
| n_iter | 1000 | random_state | None |
| n_iter_without_progress | 300 | method | barnes_hut |
| min_grad_norm | 1e-07 | angle | 0.5 |
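The parameter names in the table above match scikit-learn's `TSNE` API, so the configuration can be sketched as follows (an assumption: the paper does not name its implementation; note that in recent scikit-learn releases the `n_iter` parameter from the table was renamed `max_iter`, so it is omitted here for compatibility). The random data stands in for the patients' clinical parameters.

```python
# Sketch of the t-SNE configuration from the parameter table above,
# assuming scikit-learn's TSNE implementation.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # stand-in for the patient feature matrix

tsne = TSNE(
    n_components=2,           # 2-D embedding for cluster visualization
    perplexity=30.0,
    early_exaggeration=12.0,
    learning_rate=200.0,
    metric="euclidean",
    init="random",
    method="barnes_hut",
    angle=0.5,
    verbose=0,
    random_state=None,        # as in the table; set an int to reproduce runs
)
embedding = tsne.fit_transform(X)
print(embedding.shape)  # (100, 2)
```

Because `random_state` is `None`, as in the table, each run produces a different embedding; the cluster structure (Fig. 2) is what remains stable across runs.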
Classification models
| Model | Parameter | Value |
|---|---|---|
| Artificial Neural Network (ANN) (Multilayer Perceptron) | Number of nodes in the hidden layer | From 5 to 20 features |
| Support Vector Machine (SVM) | Kernel function | Linear |
| Decision tree | Max tree depth | 2, 4, 8, 16 |
| Linear regression | Normalize; n_jobs | False; none |
| Logistic regression | Criterion; Min_samples_split | Gini; 2 |
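The model/parameter pairs in the table map directly onto scikit-learn estimators; a minimal sketch follows (an assumption, since the paper does not name its library; the specific hidden-layer size and tree depth below are single values picked from the ranges the table says were tried):

```python
# Hypothetical instantiation of the five classifiers from the table above,
# assuming scikit-learn.
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression

models = {
    # Table: 5 to 20 nodes in the hidden layer were tried; 10 chosen here
    "ANN": MLPClassifier(hidden_layer_sizes=(10,)),
    "SVM": SVC(kernel="linear"),
    # Table: max depths 2, 4, 8, 16 were tried; 8 chosen here
    "Decision tree": DecisionTreeClassifier(max_depth=8),
    "Linear regression": LinearRegression(n_jobs=None),
    "Logistic regression": LogisticRegression(),
}

# Toy fit to show the shared fit/predict interface (not the study's data)
X = [[0, 0], [1, 1], [0, 1], [1, 0]]
y = [0, 1, 0, 1]
models["Decision tree"].fit(X, y)
print(models["Decision tree"].score(X, y))  # 1.0 on this separable toy set
```

In this setup each model exposes the same `fit`/`predict` interface, which makes the head-to-head comparison in the performance tables below straightforward to run in a loop.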
Properties of datasets after preprocessing
| Missing-data handling | Time-series replacement: last values | Time-series replacement: stats | Time-series replacement: maximums/minimums |
|---|---|---|---|
| Fill in the missing data | 5425 patients, 31 parameters, 43.17% developed polyneuropathy | 5425 patients, 31 parameters, 43.17% developed polyneuropathy | 5425 patients, 31 parameters, 43.17% developed polyneuropathy |
| Filter out the missing data | 5094 patients, 19 parameters, 43.62% developed polyneuropathy | 5094 patients, 31 parameters, 43.62% developed polyneuropathy | 5094 patients, 19 parameters, 43.62% developed polyneuropathy |
Fig. 2 Clustering patients with diabetes using the t-SNE machine learning method
Cluster description
| Cluster № | Count | Polyneuropathy, % | Age, years (median) | Age, years (SD) | Males, % | Females, % |
|---|---|---|---|---|---|---|
| Cluster #0 | 1877 | 59 | 65 | 9.79 | 0 | 100 |
| Cluster #1 | 1628 | 43 | 60 | 13.74 | 100 | 0 |
| Cluster #2 | 101 | 40 | 61 | 9.14 | 100 | 0 |
| Cluster #3 | 930 | 0 | 31 | 6.53 | 0 | 100 |
| Cluster #4 | 119 | 0 | 21 | 6.10 | 100 | 0 |
| Cluster #5 | 111 | 23 | 39 | 20.29 | 99 | 1 |
Fig. 3 The feature importance of the decision tree (a). The correlation matrix of features and complication (b)
Performance of the classifiers without comorbidities
| Model | Precision | Recall | F1 score | Accuracy |
|---|---|---|---|---|
| ANN | 0.6736 | 0.7342 | 0.7471 | |
| SVM | 0.6817 | 0.7655 | 0.7210 | 0.7443 |
| Decision tree | 0.6526 | 0.7302 | 0.6865 | 0.7039 |
| Linear Regression | 0.6777 | 0.7911 | ||
| Logistic Regression | 0.7693 | 0.7232 | 0.7384 | |
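The four metrics reported in the table above can be reproduced with `sklearn.metrics` on any prediction vector; the toy labels below are illustrative only, not the study's data:

```python
# Illustration of the reported metrics, assuming scikit-learn's
# sklearn.metrics functions (toy labels, not the study's data).
from sklearn.metrics import (
    precision_score, recall_score, f1_score, accuracy_score,
)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # 1 = developed polyneuropathy
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]  # classifier output

print(precision_score(y_true, y_pred))  # 1.0   (no false positives)
print(recall_score(y_true, y_pred))     # 0.75  (one positive missed)
print(f1_score(y_true, y_pred))         # ~0.857 (harmonic mean of the two)
print(accuracy_score(y_true, y_pred))   # 0.875 (7 of 8 correct)
```

F1 balances precision and recall, which matters here because roughly 43% of patients developed polyneuropathy, so accuracy alone can overstate a classifier that favors the majority class.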
Performance evaluation with comorbidities
| Model | Precision | Recall | F1 score | Accuracy | AUC |
|---|---|---|---|---|---|
| ANN | 0.7221 | 0.7734 | 0.8120 | 0.8922 | |
| SVM | 0.7795 | 0.7864 | 0.7823 | 0.8054 | 0.8644 |
| Decision tree | 0.7982 | ||||
| Linear Regression | 0.7934 | 0.8134 | 0.8031 | 0.8228 | 0.8926 |
| Logistic Regression | 0.7961 | 0.8115 | 0.8036 | 0.8238 | 0.8941 |
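The AUC column added in this table is threshold-independent: it is computed from the classifiers' continuous risk scores rather than from hard 0/1 predictions. A minimal sketch, assuming scikit-learn's `roc_auc_score` (toy scores, not the study's data):

```python
# AUC from continuous risk scores, assuming scikit-learn
# (toy values, not the study's data).
from sklearn.metrics import roc_auc_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]           # 1 = developed polyneuropathy
scores = [0.9, 0.2, 0.8, 0.4, 0.3, 0.7, 0.6, 0.1]  # predicted risk

print(roc_auc_score(y_true, scores))  # 0.9375
```

An AUC of 0.9375 here means that 15 of the 16 positive/negative patient pairs are ranked correctly by the score; the ~0.89 AUC values in the table are read the same way.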
Fig. 4 Decision tree for polyneuropathy prediction
Fig. 5 SVM interpretation
Fig. 6 DT interpretation
Fig. 7 ANN interpretation
Fig. 8 Logistic regression interpretation
Fig. 9 Interpretation summary
Feature legend: 1. Hemoglobin (HGB), 2. Leukocytes (LEU), 3. Platelets (PLT), 4. pH, 5. Mean platelet volume (MPW), 6. Creatinine, 7. Mean cell hemoglobin (MCH), 8. Neutrophils (NEUT), 9. Mean corpuscular volume (MCV), 10. Cholesterol, 11. Glucose, 12. Procalcitonin (PCT), 13. Red blood cell distribution width (RDW), 14. Alanine transaminase (ALT), 15. Bilirubin, 16. Platelet distribution width (PDW), 17. High-density lipoprotein (HDL), 18. Aspartate aminotransferase (AST), 19. White blood count (WBC), 20. Troponin, 21. Monocytes, 22. Bilirubin, 23. Red blood cell count (RBC), 24. Triglycerides, 25. Hematocrit (HCT), 26. Low-density lipoproteins (LDL), 27. Blood in urine (BLD).