
Head-to-head comparison of clustering methods for heterogeneous data: a simulation-driven benchmark.

Gregoire Preud'homme, Kevin Duarte, Kevin Dalleau, Claire Lacomblez, Emmanuel Bresso, Malika Smaïl-Tabbone, Miguel Couceiro, Marie-Dominique Devignes, Masatake Kobayashi, Olivier Huttin, João Pedro Ferreira, Faiez Zannad, Patrick Rossignol, Nicolas Girerd.

Abstract

Choosing the most appropriate unsupervised machine-learning method for "heterogeneous" or "mixed" data, i.e. data with both continuous and categorical variables, can be challenging. Our aim was to examine the performance of various clustering strategies for mixed data using both simulated and real-life data. We conducted a benchmark analysis of "ready-to-use" tools in R comparing 4 model-based (Kamila algorithm, Latent Class Analysis, Latent Class Model [LCM] and Clustering by Mixture Modeling) and 5 distance/dissimilarity-based (Gower distance or Unsupervised Extra Trees dissimilarity followed by hierarchical clustering or Partitioning Around Medoids, K-prototypes) clustering methods. Clustering performance was assessed by Adjusted Rand Index (ARI) on 1000 generated virtual populations consisting of mixed variables, using 7 scenarios with varying population sizes, numbers of clusters, numbers of continuous and categorical variables, proportions of relevant (non-noisy) variables and degrees of variable relevance (low, mild, high). The clustering methods were then applied to the EPHESUS randomized clinical trial data (a heart failure trial evaluating the effect of eplerenone) to illustrate the differences between clustering techniques. The simulations revealed the dominance of K-prototypes, Kamila and LCM over all other methods. Overall, methods using dissimilarity matrices in classical algorithms such as Partitioning Around Medoids and Hierarchical Clustering had a lower ARI than model-based methods in all scenarios. When the clustering methods were applied to a real-life clinical dataset, LCM showed promising results with regard to differences in (1) clinical profiles across clusters, (2) prognostic performance (highest C-index) and (3) identification of patient subgroups with substantial treatment benefit.
The present findings suggest key differences in clustering performance between the tested algorithms (limited to tools readily available in R). In most of the tested scenarios, model-based methods (in particular the Kamila and LCM packages) and K-prototypes performed best in the setting of heterogeneous data.
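
The abstract reports clustering performance as the Adjusted Rand Index (ARI), which measures agreement between a recovered partition and the simulated ground-truth partition, corrected for chance. The study itself used R tools whose code is not shown here; the following is only a minimal illustrative sketch in Python of the standard pair-counting ARI formula. The function name `adjusted_rand_index` and the toy label vectors are assumptions made for illustration and do not come from the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two partitions of the same items.

    Illustrative helper (hypothetical name); uses the standard
    pair-counting formula based on the contingency table.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two items: the partitions trivially agree.
        return 1.0

    # Contingency counts n_ij and the marginal cluster sizes a_i, b_j.
    contingency = Counter(zip(labels_true, labels_pred))
    row_sizes = Counter(labels_true)
    col_sizes = Counter(labels_pred)

    # Number of item pairs placed together in both partitions.
    index = sum(comb(c, 2) for c in contingency.values())
    sum_rows = sum(comb(c, 2) for c in row_sizes.values())
    sum_cols = sum(comb(c, 2) for c in col_sizes.values())

    # Expected value of the index under random labelling, and its maximum.
    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2

    if max_index == expected:
        # Degenerate case (e.g. a single cluster in both partitions).
        return 1.0
    return (index - expected) / (max_index - expected)


if __name__ == "__main__":
    # Toy example: simulated "true" clusters vs. clusters found by a method.
    truth = [0, 0, 0, 1, 1, 1]
    found = [1, 1, 1, 0, 0, 0]  # same partition, different label names
    print(adjusted_rand_index(truth, found))
```

An ARI of 1 indicates identical partitions regardless of how clusters are labelled, values near 0 indicate agreement no better than chance, and negative values indicate agreement worse than chance. This is why the index suits a simulation benchmark in which the true cluster membership is known.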


Year:  2021        PMID: 33603019      PMCID: PMC7892576          DOI: 10.1038/s41598-021-83340-8

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


  3 in total

1.  VarSelLCM: an R/C++ package for variable selection in model-based clustering of mixed-data with missing values.

Authors:  Matthieu Marbac; Mohammed Sedki
Journal:  Bioinformatics       Date:  2019-04-01       Impact factor: 6.937

2.  Eplerenone, a selective aldosterone blocker, in patients with left ventricular dysfunction after myocardial infarction.

Authors:  Bertram Pitt; Willem Remme; Faiez Zannad; James Neaton; Felipe Martinez; Barbara Roniker; Richard Bittman; Steve Hurley; Jay Kleiman; Marjorie Gatlin
Journal:  N Engl J Med       Date:  2003-03-31       Impact factor: 91.245

3.  Data-Driven Approach to Identify Subgroups of Heart Failure With Reduced Ejection Fraction Patients With Different Prognoses and Aldosterone Antagonist Response Patterns.

Authors:  João Pedro Ferreira; Kevin Duarte; John J V McMurray; Bertram Pitt; Dirk J van Veldhuisen; John Vincent; Tariq Ahmad; Jasper Tromp; Patrick Rossignol; Faiez Zannad
Journal:  Circ Heart Fail       Date:  2018-07       Impact factor: 8.790

  7 in total

1.  A foresight whole systems obesity classification for the English UK biobank cohort.

Authors:  Stephen Clark; Nik Lomax; Mark Birkin; Michelle Morris
Journal:  BMC Public Health       Date:  2022-02-18       Impact factor: 4.135

2.  DIVIS: a semantic DIstance to improve the VISualisation of heterogeneous phenotypic datasets.

Authors:  Rayan Eid; Claudine Landès; Alix Pernet; Emmanuel Benoît; Pierre Santagostini; Angelina El Ghaziri; Julie Bourbeillon
Journal:  BioData Min       Date:  2022-04-04       Impact factor: 2.522

3.  Comparison of Unsupervised Machine Learning Approaches for Cluster Analysis to Define Subgroups of Heart Failure with Preserved Ejection Fraction with Different Outcomes.

Authors:  Hirmand Nouraei; Hooman Nouraei; Simon W Rabkin
Journal:  Bioengineering (Basel)       Date:  2022-04-16

4.  A machine learning-based approach to determine infection status in recipients of BBV152 (Covaxin) whole-virion inactivated SARS-CoV-2 vaccine for serological surveys.

Authors:  Prateek Singh; Rajat Ujjainiya; Satyartha Prakash; Salwa Naushin; Viren Sardana; Nitin Bhatheja; Ajay Pratap Singh; Joydeb Barman; Kartik Kumar; Saurabh Gayali; Raju Khan; Birendra Singh Rawat; Karthik Bharadwaj Tallapaka; Mahesh Anumalla; Amit Lahiri; Susanta Kar; Vivek Bhosale; Mrigank Srivastava; Madhav Nilakanth Mugale; C P Pandey; Shaziya Khan; Shivani Katiyar; Desh Raj; Sharmeen Ishteyaque; Sonu Khanka; Ankita Rani; Jyotsna Sharma; Anuradha Seth; Mukul Dutta; Nishant Saurabh; Murugan Veerapandian; Ganesh Venkatachalam; Deepak Bansal; Dinesh Gupta; Prakash M Halami; Muthukumar Serva Peddha; Ravindra P Veeranna; Anirban Pal; Ranvijay Kumar Singh; Suresh Kumar Anandasadagopan; Parimala Karuppanan; Syed Nasar Rahman; Gopika Selvakumar; Subramanian Venkatesan; Malay Kumar Karmakar; Harish Kumar Sardana; Anamika Kothari; Devendra Singh Parihar; Anupma Thakur; Anas Saifi; Naman Gupta; Yogita Singh; Ritu Reddu; Rizul Gautam; Anuj Mishra; Avinash Mishra; Iranna Gogeri; Geethavani Rayasam; Yogendra Padwad; Vikram Patial; Vipin Hallan; Damanpreet Singh; Narendra Tirpude; Partha Chakrabarti; Sujay Krishna Maity; Dipyaman Ganguly; Ramakrishna Sistla; Narender Kumar Balthu; Kiran Kumar A; Siva Ranjith; B Vijay Kumar; Piyush Singh Jamwal; Anshu Wali; Sajad Ahmed; Rekha Chouhan; Sumit G Gandhi; Nancy Sharma; Garima Rai; Faisal Irshad; Vijay Lakshmi Jamwal; Masroor Ahmad Paddar; Sameer Ullah Khan; Fayaz Malik; Debashish Ghosh; Ghanshyam Thakkar; S K Barik; Prabhanshu Tripathi; Yatendra Kumar Satija; Sneha Mohanty; Md Tauseef Khan; Umakanta Subudhi; Pradip Sen; Rashmi Kumar; Anshu Bhardwaj; Pawan Gupta; Deepak Sharma; Amit Tuli; Saumya Ray Chaudhuri; Srinivasan Krishnamurthi; L Prakash; Ch V Rao; B N Singh; Arvindkumar Chaurasiya; Meera Chaurasiyar; Mayuri Bhadange; Bhagyashree Likhitkar; Sharada Mohite; Yogita Patil; Mahesh Kulkarni; Rakesh Joshi; Vaibhav Pandya; Sachin Mahajan; Amita Patil; Rachel Samson; Tejas Vare; Mahesh Dharne; Ashok Giri; Sachin Mahajan; 
Shilpa Paranjape; G Narahari Sastry; Jatin Kalita; Tridip Phukan; Prasenjit Manna; Wahengbam Romi; Pankaj Bharali; Dibyajyoti Ozah; Ravi Kumar Sahu; Prachurjya Dutta; Moirangthem Goutam Singh; Gayatri Gogoi; Yasmin Begam Tapadar; Elapavalooru Vssk Babu; Rajeev K Sukumaran; Aishwarya R Nair; Anoop Puthiyamadam; Prajeesh Kooloth Valappil; Adrash Velayudhan Pillai Prasannakumari; Kalpana Chodankar; Samir Damare; Ved Varun Agrawal; Kumardeep Chaudhary; Anurag Agrawal; Shantanu Sengupta; Debasis Dash
Journal:  Comput Biol Med       Date:  2022-04-25       Impact factor: 6.698

5.  Very Preterm Children Gut Microbiota Comparison at the Neonatal Period of 1 Month and 3.5 Years of Life.

Authors:  Gaël Toubon; Marie-José Butel; Jean-Christophe Rozé; Patricia Lepage; Johanne Delannoy; Pierre-Yves Ancel; Marie-Aline Charles; Julio Aires
Journal:  Front Microbiol       Date:  2022-07-22       Impact factor: 6.064

6.  How heterogeneous is the dengue transmission profile in Brazil? A study in six Brazilian states.

Authors:  Iasmim Ferreira de Almeida; Raquel Martins Lana; Cláudia Torres Codeço
Journal:  PLoS Negl Trop Dis       Date:  2022-09-12

7.  Phenotype clustering in health care: A narrative review for clinicians.

Authors:  Tyler J Loftus; Benjamin Shickel; Jeremy A Balch; Patrick J Tighe; Kenneth L Abbott; Brian Fazzone; Erik M Anderson; Jared Rozowsky; Tezcan Ozrazgat-Baslanti; Yuanfang Ren; Scott A Berceli; William R Hogan; Philip A Efron; J Randall Moorman; Parisa Rashidi; Gilbert R Upchurch; Azra Bihorac
Journal:  Front Artif Intell       Date:  2022-08-12
