Avoiding common pitfalls when clustering biological data.

Tom Ronan, Zhijie Qi, Kristen M Naegle.

Abstract

Clustering is an unsupervised learning method, which groups data points based on similarity, and is used to reveal the underlying structure of data. This computational approach is essential to understanding and visualizing the complex data that are acquired in high-throughput multidimensional biological experiments. Clustering enables researchers to make biological inferences for further experiments. Although a powerful technique, inappropriate application can lead biological researchers to waste resources and time in experimental follow-up. We review common pitfalls identified from the published molecular biology literature and present methods to avoid them. Commonly encountered pitfalls relate to the high-dimensional nature of biological data from high-throughput experiments, the failure to consider more than one clustering method for a given problem, and the difficulty in determining whether clustering has produced meaningful results. We present concrete examples of problems and solutions (clustering results) in the form of toy problems and real biological data for these issues. We also discuss ensemble clustering as an easy-to-implement method that enables the exploration of multiple clustering solutions and improves robustness of clustering solutions. Increased awareness of common clustering pitfalls will help researchers avoid overinterpreting or misinterpreting the results and missing valuable insights when clustering biological data.
Copyright © 2016, American Association for the Advancement of Science.

Year:  2016        PMID: 27303057     DOI: 10.1126/scisignal.aad1932

Source DB:  PubMed          Journal:  Sci Signal        ISSN: 1945-0877            Impact factor:   8.192


  32 in total

Review 1.  Using Large Datasets to Understand Nanotechnology.

Authors:  Kalina Paunovska; David Loughrey; Cory D Sago; Robert Langer; James E Dahlman
Journal:  Adv Mater       Date:  2019-08-20       Impact factor: 30.849

2.  Spatial Segmentation of Mass Spectrometry Imaging Data by Combining Multivariate Clustering and Univariate Thresholding.

Authors:  Hang Hu; Ruichuan Yin; Hilary M Brown; Julia Laskin
Journal:  Anal Chem       Date:  2021-02-11       Impact factor: 6.986

Review 3.  Machine learning applications in cell image analysis.

Authors:  Andrey Kan
Journal:  Immunol Cell Biol       Date:  2017-03-15       Impact factor: 5.126

4.  Ensemblator v3: Robust atom-level comparative analyses and classification of protein structure ensembles.

Authors:  Andrew E Brereton; P Andrew Karplus
Journal:  Protein Sci       Date:  2017-08-11       Impact factor: 6.725

Review 5.  Using Gene Expression to Study Specialized Metabolism-A Practical Guide.

Authors:  Riccardo Delli-Ponti; Devendra Shivhare; Marek Mutwil
Journal:  Front Plant Sci       Date:  2021-01-12       Impact factor: 5.753

6.  Statistical power for cluster analysis.

Authors:  Edwin S Dalmaijer; Camilla L Nord; Duncan E Astle
Journal:  BMC Bioinformatics       Date:  2022-05-31       Impact factor: 3.307

7.  Analyzing 2000 in Vivo Drug Delivery Data Points Reveals Cholesterol Structure Impacts Nanoparticle Delivery.

Authors:  Kalina Paunovska; Carmen J Gil; Melissa P Lokugamage; Cory D Sago; Manaka Sato; Gwyn N Lando; Marielena Gamboa Castro; Anton V Bryksin; James E Dahlman
Journal:  ACS Nano       Date:  2018-07-20       Impact factor: 15.881

Review 8.  A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications.

Authors:  Ashraful Haque; Jessica Engel; Sarah A Teichmann; Tapio Lönnberg
Journal:  Genome Med       Date:  2017-08-18       Impact factor: 11.117

9.  Rigid geometry solves "curse of dimensionality" effects in clustering methods: An application to omics data.

Authors:  Shun Adachi
Journal:  PLoS One       Date:  2017-06-14       Impact factor: 3.240

10.  CIDR: Ultrafast and accurate clustering through imputation for single-cell RNA-seq data.

Authors:  Peijie Lin; Michael Troup; Joshua W K Ho
Journal:  Genome Biol       Date:  2017-03-28       Impact factor: 13.583
