Vitor P Bezzan¹, Cleber D Rocco².
Abstract
Blood tests play an essential role in everyday medicine and are used by doctors in several diagnostic procedures. Moreover, these data are multivariate, and some diseases, such as COVID-19, can have different symptom manifestations and outcomes. This study proposes a method for extracting useful information from blood tests using UMAP (Uniform Manifold Approximation and Projection for Dimension Reduction) combined with DBSCAN clustering and statistical tests. The analysis identifies several clusters with infection prevalence ranging from 2% to 37%, showing that our procedure is indeed capable of finding different patterns. A possible explanation is that COVID-19 is not just a respiratory infection but a systemic disease with critical hematological implications, primarily in the white-cell fractions, as indicated by relevant statistical test p-values in the range of 0.03-0.1. The proposed analysis procedure could be applied to data sets of other illnesses to help researchers discover new data patterns in various diseases and contexts.
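The clustering step of the method can be illustrated with a minimal, dependency-free DBSCAN sketch. It assumes the UMAP embedding has already been computed and is available as a list of 2-D points (in practice, `umap.UMAP` from umap-learn and `sklearn.cluster.DBSCAN` from scikit-learn would typically be used instead). The function names, the `min_samples` value and the toy points are illustrative assumptions, not taken from the study.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1  # label for points that belong to no cluster


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within Euclidean distance eps of points[i], including i."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_samples: int) -> List[int]:
    """Assign each point a cluster id (0, 1, ...) or NOISE, following classic DBSCAN."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already visited
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later become a border point of some cluster
            continue
        # i is a core point: grow a new cluster from it
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster += 1
    return [lab if lab is not None else NOISE for lab in labels]


# Two tight groups and one isolated point in a toy 2-D embedding
toy = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (5.0, 5.0), (5.0, 5.1), (5.1, 5.0), (10.0, 10.0)]
print(dbscan(toy, eps=0.5, min_samples=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Here `eps` plays the role described in the parameter table (maximum neighborhood distance), and the isolated point is labeled as noise rather than forced into a cluster.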
Keywords: Applied AI; Blood exam; COVID-19; Dimensionality reduction; Machine learning; Unsupervised learning
Year: 2021 PMID: 34981033 PMCID: PMC8716149 DOI: 10.1016/j.imu.2021.100828
Source DB: PubMed Journal: Inform Med Unlocked ISSN: 2352-9148
Selected references for UMAP usage in medicine and biology.
| Publication year | Reference | Application/Usage |
|---|---|---|
| 2019 | Single-cell visualization using UMAP | |
| 2019 | Population patterns in genomic cohorts | |
| 2021 | UMAP in population genetics | |
| 2021 | Artifacts in microbiome data | |
| 2021 | Transfer learning on molecular fingerprints | |
| 2021 | Molecular dynamics simulations | |
Parameter grid and intervals used in the clustering procedure.
| Parameter | Interval | Description |
|---|---|---|
| neighbors | | Balance between local and global data representation |
| spread | | Minimum distance allowed between points in representation |
| eps | | Maximum neighborhood distance in DBSCAN |
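The parameter grid above can be enumerated exhaustively, with each combination then used to run UMAP followed by DBSCAN. The intervals are not reproduced in this table, so the candidate values below are hypothetical placeholders. The key names follow umap-learn (`n_neighbors`, `spread`) and scikit-learn (`eps`); note that in umap-learn the minimum distance between embedded points is set by a separate `min_dist` parameter, while `spread` sets the overall scale of the embedding.

```python
from itertools import product
from typing import Dict, Iterator, List

# Hypothetical candidate values; the study's actual intervals are not listed here.
PARAM_GRID: Dict[str, List[float]] = {
    "n_neighbors": [5, 15, 30],  # UMAP: local vs. global structure
    "spread": [0.5, 1.0, 2.0],   # UMAP: scale of the embedding
    "eps": [0.3, 0.5, 1.0],      # DBSCAN: neighborhood radius
}


def iter_grid(grid: Dict[str, List[float]]) -> Iterator[Dict[str, float]]:
    """Yield one dict per combination of parameter values (Cartesian product)."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


configs = list(iter_grid(PARAM_GRID))
print(len(configs))  # 27
print(configs[0])    # {'n_neighbors': 5, 'spread': 0.5, 'eps': 0.3}
```

Each configuration would then be split between the two stages, e.g. `umap.UMAP(n_neighbors=..., spread=...)` for the embedding and `DBSCAN(eps=...)` for the clustering; how configurations were scored and selected is not described in this section.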
Fig. 1. Steps summarizing our method for both proposed experiments.
Variables used in the study.
| Fraction | Components |
|---|---|
| Red Cell | Hematocrit, Hemoglobin, Red Cells, MCHC, MCH, MCV, RDW |
| White Cell | Platelets, MPV, Lymphocytes, Leukocytes, Basophils, Eosinophils, Monocytes |
Fig. 2. White cell blood count distributions, normalized for patients.
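Normalizing each blood-count variable across patients can be sketched as a column-wise z-score: subtract the column mean and divide by the column standard deviation. This is an assumption, since the exact normalization behind Fig. 2 is not specified here; the function name and the record layout (one dict per patient) are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def zscore_columns(rows: List[Dict[str, float]], columns: List[str]) -> List[Dict[str, float]]:
    """Return new records where each listed column has mean 0 and population std 1."""
    params = {}
    for col in columns:
        values = [row[col] for row in rows]
        sigma = pstdev(values)
        params[col] = (mean(values), sigma if sigma > 0 else 1.0)  # avoid division by zero
    return [
        {**row, **{col: (row[col] - params[col][0]) / params[col][1] for col in columns}}
        for row in rows
    ]


patients = [{"Leukocytes": 4.0}, {"Leukocytes": 6.0}, {"Leukocytes": 8.0}]
print([round(p["Leukocytes"], 3) for p in zscore_columns(patients, ["Leukocytes"])])
# [-1.225, 0.0, 1.225]
```

Constant columns are mapped to zero instead of raising an error, which keeps the sketch usable on small or degenerate subsets.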
Fig. 3. DBSCAN cluster results for Experiment I. Right: all COVID-19 patients with their associated clusters.
Mean values of variables in the clusters found in Experiment I (red components; extreme values in bold).
| Cluster | Hematocrit | Hemoglobin | Red Cells | MCHC | MCH | MCV | RDW | COVID-19 (%) | Patients |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 0.449555 | 0.360825 | 0.403754 | −0.219273 | −0.129423 | −0.025629 | −0.192997 | 34.6 | 26 |
| 4 | 0.331591 | 0.353596 | 0.177950 | 0.187976 | 0.259933 | 0.197007 | −0.155573 | 23.1 | 39 |
| 6 | −0.046087 | −0.249906 | | | | | | 19.4 | 31 |
| 0 | −0.123704 | −0.160615 | −0.269449 | −0.167212 | 0.249553 | 0.363449 | 0.330257 | 17.9 | 145 |
| 5 | −0.369784 | | | | | | | 16.0 | 25 |
| 1 | −0.015429 | 0.021216 | 0.056257 | 0.141979 | −0.079685 | −0.156664 | −0.216660 | 7.4 | 269 |
| 3 | −0.285210 | −0.324563 | −0.212740 | −0.133359 | | | | 2.9 | 34 |
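Tables like the one above can be produced by grouping patients by their DBSCAN label, averaging each variable within a cluster, and computing the share of positive patients per cluster, with rows sorted by that share in descending order (as the rows above are). This is a minimal sketch assuming a 0/1 outcome per patient and that DBSCAN noise points (label -1) are excluded; whether the study kept or dropped noise points is not stated here, and all names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence


def summarize_clusters(
    labels: Sequence[int],
    rows: Sequence[Dict[str, float]],
    outcome: Sequence[int],
    columns: List[str],
) -> List[Dict[str, float]]:
    """Per-cluster variable means, % positive outcome and size, highest % first."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for idx, lab in enumerate(labels):
        if lab != -1:  # skip DBSCAN noise
            groups[lab].append(idx)
    summary = []
    for lab, members in groups.items():
        entry: Dict[str, float] = {"cluster": lab}
        for col in columns:
            entry[col] = mean(rows[i][col] for i in members)
        entry["positive_pct"] = 100.0 * sum(outcome[i] for i in members) / len(members)
        entry["patients"] = len(members)
        summary.append(entry)
    summary.sort(key=lambda e: e["positive_pct"], reverse=True)
    return summary


labels = [0, 0, 1, 1, 1, -1]
rows = [{"RDW": v} for v in (1.0, 3.0, 0.0, 0.0, 3.0, 100.0)]
outcome = [1, 0, 0, 0, 1, 1]
for entry in summarize_clusters(labels, rows, outcome, ["RDW"]):
    print(entry)
```

The same function covers both the red-component and white-component tables by passing the corresponding list of column names.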
Mean values of variables in the clusters found in Experiment I (white components; extreme values in bold).
| Cluster | Platelets | MPV | Lymphocytes | Leukocytes | Basophils | Eosinophils | Monocytes | COVID-19 (%) | Patients |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 0.092664 | −0.018652 | | | | | | 34.6 | 26 |
| 4 | −0.327375 | −0.262615 | −0.127550 | −0.291689 | −0.231599 | 0.100975 | | 23.1 | 39 |
| 6 | 0.244400 | −0.376571 | 0.014347 | 0.068407 | 0.130960 | −0.105025 | | 19.4 | 31 |
| 0 | −0.129383 | −0.154026 | −0.031201 | 0.037454 | 0.068019 | 0.019385 | | 17.9 | 145 |
| 5 | −0.108903 | −0.016250 | −0.301180 | −0.465017 | | | | 16.0 | 25 |
| 1 | 0.115441 | −0.031031 | 0.160436 | 0.065219 | −0.026183 | 0.058192 | −0.004671 | 7.4 | 269 |
| 3 | −0.118372 | 0.242435 | −0.133926 | 0.023392 | | | | 2.9 | 34 |
Fig. 4. DBSCAN cluster results for Experiment II. Right: all special-care patients.
Means of variables and p-values of the t and KS tests for the clusters found in Experiment II (red components; no p-value is significant, so none is in bold).
| | Hematocrit | Hemoglobin | Red Cells | MCHC | MCH | MCV | RDW | Special Care (%) | Patients |
|---|---|---|---|---|---|---|---|---|---|
| Mean - Cluster 1 | 0.192373 | 0.228284 | 0.124672 | 0.187246 | 0.152039 | 0.078920 | −0.227019 | 7.0 | 14 |
| Mean - Cluster 2 | 0.276826 | 0.302162 | 0.261730 | 0.166864 | 0.034623 | −0.037691 | −0.194673 | 61.0 | 67 |
| t-test | 0.638796 | 0.619572 | 0.701361 | 0.466539 | 0.285301 | 0.295333 | 0.562766 | – | – |
| KS-test | 0.440488 | 0.675420 | 0.788581 | 0.458707 | 0.284728 | 0.348343 | 0.863925 | – | – |
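Comparing one variable between the two clusters, as in the t-test and KS-test rows above, can be sketched without third-party libraries. The code below computes Welch's t statistic with its Welch-Satterthwaite degrees of freedom, and the two-sample Kolmogorov-Smirnov statistic D with a standard asymptotic p-value (which is only a rough approximation for very small samples). Whether the study used Welch's or Student's t-test is an assumption here; in practice `scipy.stats.ttest_ind` and `scipy.stats.ks_2samp` return statistics and p-values directly. Function names are hypothetical.

```python
import math
from bisect import bisect_right
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def ks_2sample(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sample KS statistic D and its asymptotic p-value."""
    sa, sb = sorted(a), sorted(b)
    # D = largest gap between the two empirical CDFs, checked at every sample value
    d = max(abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb)) for x in sa + sb)
    n_eff = math.sqrt(len(sa) * len(sb) / (len(sa) + len(sb)))
    lam = (n_eff + 0.12 + 0.11 / n_eff) * d
    if lam < 0.3:
        return d, 1.0  # the Kolmogorov tail probability exceeds 0.99999 here
    # Q_KS(lam) = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 lam^2)
    p = 2 * sum((-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam) for j in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


t, df = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
d, p = ks_2sample([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), round(df, 3), d, round(p, 3))
```

The t-test compares cluster means, while the KS test compares the whole distributions, which is why the two rows can disagree for the same variable.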
Means of variables and p-values of the t and KS tests for the clusters found in Experiment II (white components; significant p-values in bold).
| | Platelets | MPV | Lymphocytes | Leukocytes | Basophils | Eosinophils | Monocytes | Special Care (%) | Patients |
|---|---|---|---|---|---|---|---|---|---|
| Mean - Cluster 1 | −0.445631 | 0.331228 | 0.063713 | −0.537869 | 0.016237 | −0.305755 | 0.858424 | 7.0 | 14 |
| Mean - Cluster 2 | −0.734901 | 0.263530 | −0.049911 | −0.741464 | −0.205530 | −0.516632 | 0.406545 | 61.0 | 67 |
| t-test | 0.399595 | 0.331979 | 0.150230 | 0.156617 | | | | – | – |
| KS-test | 0.689187 | 0.272100 | 0.564482 | 0.284728 | 0.105875 | | | – | – |