Simulation-derived best practices for clustering clinical data.

Caitlin E Coombes¹, Xin Liu², Zachary B Abrams³, Kevin R Coombes⁴, Guy Brock⁵.

Abstract

INTRODUCTION: Clustering analyses in clinical contexts hold promise for improving the understanding of patient phenotype and disease course in chronic and acute clinical medicine. However, work remains to ensure that solutions are rigorous, valid, and reproducible. In this paper, we evaluate best practices for dissimilarity matrix calculation and clustering on mixed-type clinical data.
METHODS: We simulate clinical data to represent problems in clinical trials, cohort studies, and electronic health record (EHR) data, including single-type datasets (binary, continuous, categorical) and 4 data mixtures. We test 5 single distance metrics (Jaccard, Hamming, Gower, Manhattan, Euclidean) and 3 mixed distance metrics (DAISY, Supersom, and Mercator) with 3 clustering algorithms: hierarchical clustering (HC), k-medoids, and self-organizing maps (SOM). We validate the results quantitatively and visually using the Adjusted Rand Index (ARI) and silhouette width (SW). We apply our best methods to two real-world data sets: (1) 21 features collected on 247 patients with chronic lymphocytic leukemia, and (2) 40 features collected on 6000 patients admitted to an intensive care unit.
RESULTS: HC outperformed k-medoids and SOM by ARI across data types. DAISY produced the highest mean ARI for mixed data types for all mixtures except unbalanced mixtures dominated by continuous data. Compared to other methods, DAISY with HC uncovered superior, separable clusters in both real-world data sets.
DISCUSSION: Selecting an appropriate mixed-type metric allows the investigator to obtain optimal separation of patient clusters and get maximum use of their data. Superior metrics for mixed-type data handle multiple data types using multiple, type-focused distances. Better subclassification of disease opens avenues for targeted treatments, precision medicine, clinical decision support, and improved patient outcomes.
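
The workflow the abstract describes (compute a dissimilarity matrix on mixed-type data, cluster it hierarchically, score the result by ARI) can be illustrated with a minimal Python sketch. It is not the authors' pipeline and does not reproduce the DAISY, Supersom, or Mercator implementations: it assumes a simple Gower-style dissimilarity (range-scaled absolute difference for continuous features, simple mismatch for binary and categorical ones), a naive average-linkage agglomeration, and a six-patient toy table invented purely for illustration. All function and variable names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def gower_matrix(rows, kinds):
    """Pairwise Gower-style dissimilarity for mixed-type rows.

    kinds[j] is "num" for a continuous feature and "cat" for a binary or
    categorical feature (assumed encoding, chosen for this sketch).
    """
    n, p = len(rows), len(kinds)
    # Range of each numeric column, used to scale differences into [0, 1].
    ranges = []
    for j, kind in enumerate(kinds):
        col = [r[j] for r in rows]
        ranges.append(max(col) - min(col) if kind == "num" else None)
    d = [[0.0] * n for _ in range(n)]
    for a, b in combinations(range(n), 2):
        total = 0.0
        for j, kind in enumerate(kinds):
            if kind == "num":
                # A constant column contributes no dissimilarity.
                total += abs(rows[a][j] - rows[b][j]) / ranges[j] if ranges[j] else 0.0
            else:
                total += 0.0 if rows[a][j] == rows[b][j] else 1.0
        d[a][b] = d[b][a] = total / p  # unweighted mean over features
    return d


def average_linkage(d, k):
    """Naive agglomerative clustering: repeatedly merge the two clusters
    with the smallest mean pairwise dissimilarity until k remain."""
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > k:
        def mean_dist(pair):
            x, y = clusters[pair[0]], clusters[pair[1]]
            return sum(d[i][j] for i in x for j in y) / (len(x) * len(y))
        a, b = min(combinations(range(len(clusters)), 2), key=mean_dist)
        clusters[a] += clusters[b]  # a < b, so deleting b keeps a valid
        del clusters[b]
    labels = [0] * len(d)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def adjusted_rand_index(truth, pred):
    """Adjusted Rand Index computed from the contingency table."""
    sum_ij = sum(comb(v, 2) for v in Counter(zip(truth, pred)).values())
    sum_a = sum(comb(v, 2) for v in Counter(truth).values())
    sum_b = sum(comb(v, 2) for v in Counter(pred).values())
    expected = sum_a * sum_b / comb(len(truth), 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster on both sides
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Toy, invented data: (age, sex, smoker flag, stage). Not from the paper.
rows = [
    (34, "F", 0, "I"), (38, "F", 0, "I"), (41, "M", 0, "I"),
    (67, "M", 1, "III"), (72, "M", 1, "III"), (70, "F", 1, "II"),
]
kinds = ["num", "cat", "cat", "cat"]
truth = [0, 0, 0, 1, 1, 1]

d = gower_matrix(rows, kinds)
labels = average_linkage(d, k=2)
print(labels, adjusted_rand_index(truth, labels))
```

On this toy table the two simulated groups are recovered exactly, so the ARI is 1. For real analyses, established implementations (for example the `daisy` function in the R `cluster` package, which the DAISY metric is named after) are preferable to this quadratic-memory, cubic-time sketch.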

Keywords: Clinical informatics; Clinical trial; Clustering; Electronic health record; Unsupervised machine learning

Year: 2021 PMID: 33862229 PMCID: PMC9017600 DOI: 10.1016/j.jbi.2021.103788

Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 8.000

34 in total

1. Clustering algorithms in biomedical research: a review. (Review)

Authors: Rui Xu; Donald C Wunsch
Journal: IEEE Rev Biomed Eng Date: 2010

2. Do COPD subtypes really exist? COPD heterogeneity and clustering in 10 independent cohorts.

Authors: Peter J Castaldi; Marta Benet; Hans Petersen; Nicholas Rafaels; James Finigan; Matteo Paoletti; H Marike Boezen; Judith M Vonk; Russell Bowler; Massimo Pistolesi; Milo A Puhan; Josep Anto; Els Wauters; Diether Lambrechts; Wim Janssens; Francesca Bigazzi; Gianna Camiciottoli; Michael H Cho; Craig P Hersh; Kathleen Barnes; Stephen Rennard; Meher Preethi Boorgula; Jennifer Dy; Nadia N Hansel; James D Crapo; Yohannes Tesfaigzi; Alvar Agusti; Edwin K Silverman; Judith Garcia-Aymerich
Journal: Thorax Date: 2017-06-21 Impact factor: 9.139

3. Using Unsupervised Machine Learning to Identify Subgroups Among Home Health Patients With Heart Failure Using Telehealth.

Authors: Eliezer Bose; Kavita Radhakrishnan
Journal: Comput Inform Nurs Date: 2018-05 Impact factor: 1.985

4. LDOC1 mRNA is differentially expressed in chronic lymphocytic leukemia and predicts overall survival in untreated patients.

Authors: Hatice Duzkale; Carmen D Schweighofer; Kevin R Coombes; Lynn L Barron; Alessandra Ferrajoli; Susan O'Brien; William G Wierda; John Pfeifer; Tadeusz Majewski; Bogdan A Czerniak; Jeffrey L Jorgensen; L Jeffrey Medeiros; Emil J Freireich; Michael J Keating; Lynne V Abruzzo
Journal: Blood Date: 2011-02-10 Impact factor: 22.113

5. Identification of subtypes in subjects with mild-to-moderate airflow limitation and its clinical and socioeconomic implications.

Authors: Jin Hwa Lee; Chin Kook Rhee; Kyungjoo Kim; Jee-Ae Kim; Sang Hyun Kim; Kwang Ha Yoo; Woo Jin Kim; Yong Bum Park; Hye Yun Park; Ki-Suck Jung
Journal: Int J Chron Obstruct Pulmon Dis Date: 2017-04-12

6. Identifying clinically important COPD sub-types using data-driven approaches in primary care population based electronic health records.

Authors: Maria Pikoula; Jennifer Kathleen Quint; Francis Nissen; Harry Hemingway; Liam Smeeth; Spiros Denaxas
Journal: BMC Med Inform Decis Mak Date: 2019-04-18 Impact factor: 2.796

7. A cluster-based approach for integrating clinical management of Medicare beneficiaries with multiple chronic conditions.

Authors: Brent M Egan; Susan E Sutherland; Peter L Tilkemeier; Robert A Davis; Valinda Rutledge; Angelo Sinopoli
Journal: PLoS One Date: 2019-06-19 Impact factor: 3.240

8. Detecting Systemic Data Quality Issues in Electronic Health Records.

9. Mercator: A Pipeline For Multi-Method, Unsupervised Visualization And Distance Generation.

10. Unsupervised machine learning and prognostic factors of survival in chronic lymphocytic leukemia.

1. A cohesin-associated gene score may predict immune checkpoint blockade in hepatocellular carcinoma.