Marina Sánchez-Rico, Jesús M Alvarado.
Abstract
The study of diagnostic associations poses many methodological problems for machine learning algorithms, with collinearity and wide variability among the most prominent. To overcome these, we propose and test the use of uniform manifold approximation and projection (UMAP), a recent and popular dimensionality reduction technique. We assess its effectiveness on a large Spanish clinical database of patients diagnosed with depression, applying UMAP to the data before grouping the patients with a hierarchical agglomerative cluster analysis. After extensively studying its behavior and validating the results with purely unsupervised metrics, we show that the resulting clusters are consistent with well-known relationships, which supports the applicability of UMAP to the study of comorbidities.
Keywords: UMAP; comorbidities; depression; hierarchical clustering
Year: 2019 PMID: 31766665 PMCID: PMC6960661 DOI: 10.3390/bs9120122
Source DB: PubMed Journal: Behav Sci (Basel) ISSN: 2076-328X
Figure 1. Sample and variable selection.
Figure 2. Statistical procedure. (a) Application of 24 uniform manifold approximation and projection (UMAP) configurations, varying the number of dimensions (2–5), the minimum embedding distance (0.1, 0.5), and the number of neighbors (15, 50, 100). (b) Application of agglomerative hierarchical clustering to each UMAP projection, varying the clustering method (average, centroid, Ward, and complete) and the number of clusters (2–20), for 1824 combinations in total. (c) Average silhouette coefficient for each computed model.
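The counts in the Figure 2 caption can be checked by enumerating the grid. The sketch below only builds the list of configurations; it does not run UMAP or any clustering. The dictionary keys borrow the argument names used by the umap-learn package (`n_components`, `min_dist`, `n_neighbors`), which is an assumption about tooling that the record does not state.

```python
from itertools import product

# UMAP settings listed in the Figure 2 caption.
# Key names mirror umap-learn's arguments; this mapping is an assumption.
umap_grid = [
    {"n_components": d, "min_dist": m, "n_neighbors": n}
    for d, m, n in product(range(2, 6), (0.1, 0.5), (15, 50, 100))
]

# Clustering settings listed in the caption: four methods, k = 2..20.
linkage_methods = ("average", "centroid", "ward", "complete")
cluster_counts = range(2, 21)

# One model per (UMAP configuration, clustering method, number of clusters).
model_grid = [
    {**u, "method": method, "k": k}
    for u, method, k in product(umap_grid, linkage_methods, cluster_counts)
]

print(len(umap_grid))   # 4 * 2 * 3 = 24
print(len(model_grid))  # 24 * 4 * 19 = 1824
```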
Figure 3. Average silhouette coefficient (SC) by number of dimensions produced by UMAP.
Figure 4. Average silhouette coefficient (SC) by number of clusters (k) for the average and Ward clustering methods. For each k-value and clustering method, there is one point per generated model, that is, per combination of minimum embedding distance and number of neighbors in the UMAP projection.
Figure 5. UMAP two-dimensional projection with the distribution of Ward’s clusters.
Figure 6. Silhouette coefficient index for each cluster of the selected model.
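For reference, the silhouette coefficient of a point is s = (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its mean distance to the nearest other cluster. The following is a minimal NumPy sketch of that standard definition, assuming Euclidean distance and at least two clusters; the toy points and the function name `silhouette_values` are illustrative and are not taken from the study.

```python
import numpy as np

def silhouette_values(points, labels):
    """Per-point silhouette s = (b - a) / max(a, b), Euclidean distance."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Full pairwise distance matrix: fine for a toy example, costly for large n.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    clusters = np.unique(labels)
    scores = np.zeros(len(points))
    for i in range(len(points)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # usual convention: a singleton cluster scores 0
        # The point's zero distance to itself is in the sum, so divide by n_own - 1.
        a = dist[i, own].sum() / (n_own - 1)
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores

# Hypothetical toy embedding: two tight, well-separated pairs of points.
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
labels = np.array([0, 0, 1, 1])

scores = silhouette_values(points, labels)
average_sc = float(scores.mean())  # one value per model, as in Figures 3 and 4
per_cluster_sc = {int(c): float(scores[labels == c].mean()) for c in np.unique(labels)}
print(average_sc, per_cluster_sc)
```

The overall mean corresponds to the per-model average plotted against dimensions and k, while the per-cluster means correspond to the kind of breakdown shown in Figure 6.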
Percentage of ICD-10-coded diagnoses with an appearance of 25% or more in each cluster. The first column (Chapter Name) indicates the ICD-10 chapter to which the corresponding diagnosis belongs.
| Chapter Name | 3D Code | 3D Name | Cluster percentages (Cl 1 to Cl 11, in column order) |
|---|---|---|---|
| Endocrine, Nutritional and Metabolic Diseases | E03 | Other hypothyroidism | 31.3 |
| | E11 | Type 2 diabetes mellitus | 39.2, 39.7 |
| | E78 | Disorders of lipoprotein metabolism and lipidemias | 42.0, 33.8, […], 39.1 |
| | E87 | Other disorders of fluid, electrolyte and acid-base | 30.0 |
| Mental, Behavioral and Neurodevelopmental disorders | F10 | Alcohol related disorders | 35.5 |
| | F17 | Nicotine dependence | […] |
| | F32 | Major depressive disorder, single episode | […] |
| | F33 | Major depressive disorder, recurrent | […] |
| Diseases of the Circulatory System | I10 | Primary hypertension | […], 31.3, […], 49.1 |
| | I12 | Hypertensive chronic kidney disease | 55.5 |
| | I48 | Atrial fibrillation and flutter | 37.9 |
| | I50 | Heart failure | 43.5 |
| Diseases of the Respiratory System | J96 | Respiratory failure, not elsewhere classified | 46.1 |
| Diseases of the Genitourinary System | N17 | Acute kidney failure | 39.8 |
| | N18 | Chronic kidney disease | 66.2 |
| Pregnancy, Childbirth and the Puerperium | O99 | Other maternal diseases classifiable elsewhere (...) | […] |
| Symptoms, signs and abnormal clinical and laboratory findings | R05 | Cough | 55.4, 32.1, 38.4, […] |
| Factors influencing health status and contact with health services | Z85 | Personal history of malignant neoplasm | 58.3 |
| | Z88 | Allergy status to drugs, medicaments (...) | […] |
| | Z90 | Acquired absence of organs, not elsewhere classified | 32.3 |
| | Z92 | Personal history of medical treatment | 28.7 |
| | Z99 | Dependence on enabling machines and devices (...) | 31.0 |
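A table of this kind can be derived from a patient-by-diagnosis indicator matrix and a vector of cluster labels. The sketch below shows one way to do it with NumPy, keeping only cells at or above the 25% threshold named in the caption. All data here are made up for illustration, and the variable names (`has_dx`, `cluster`, `THRESHOLD`) are hypothetical; the record does not describe how the authors computed the table.

```python
import numpy as np

codes = ["E11", "I10"]  # hypothetical subset of 3-character ICD-10 codes
cluster = np.array([1, 1, 1, 1, 2, 2, 2, 2])  # toy cluster label per patient

# Rows are patients, columns follow `codes`; 1 means the diagnosis is recorded.
has_dx = np.array(
    [[1, 1], [1, 0], [0, 0], [0, 0],   # patients in cluster 1
     [0, 1], [0, 1], [0, 1], [0, 0]]   # patients in cluster 2
)

THRESHOLD = 25.0  # report a cell only if at least 25% of the cluster has the code

table = {}
for c in np.unique(cluster):
    # Share of this cluster's patients carrying each code, as a percentage.
    pct = 100.0 * has_dx[cluster == c].mean(axis=0)
    for code, value in zip(codes, pct):
        if value >= THRESHOLD:
            table[(code, int(c))] = round(float(value), 1)

print(table)
```

Cells below the threshold are simply absent from `table`, which matches the blank cells of the published layout.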