Avoiding common pitfalls when clustering biological data.

Tom Ronan¹, Zhijie Qi¹, Kristen M Naegle².

Abstract

Clustering is an unsupervised learning method, which groups data points based on similarity, and is used to reveal the underlying structure of data. This computational approach is essential to understanding and visualizing the complex data that are acquired in high-throughput multidimensional biological experiments. Clustering enables researchers to make biological inferences for further experiments. Although a powerful technique, inappropriate application can lead biological researchers to waste resources and time in experimental follow-up. We review common pitfalls identified from the published molecular biology literature and present methods to avoid them. Commonly encountered pitfalls relate to the high-dimensional nature of biological data from high-throughput experiments, the failure to consider more than one clustering method for a given problem, and the difficulty in determining whether clustering has produced meaningful results. We present concrete examples of problems and solutions (clustering results) in the form of toy problems and real biological data for these issues. We also discuss ensemble clustering as an easy-to-implement method that enables the exploration of multiple clustering solutions and improves robustness of clustering solutions. Increased awareness of common clustering pitfalls will help researchers avoid overinterpreting or misinterpreting the results and missing valuable insights when clustering biological data.
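The abstract mentions ensemble clustering only by name. As a rough illustration of the general idea, and not the authors' implementation, the sketch below runs a simple k-means many times with different random starts and different numbers of clusters, then records how often each pair of points lands in the same cluster (a co-association matrix). The function names, the toy data, the choice of k values, and the 0.5 threshold are all assumptions made for this example.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns one integer label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def co_association(X, ks=(2, 3, 4), n_runs=10, seed=0):
    """Fraction of runs in which each pair of points shares a cluster."""
    rng = np.random.default_rng(seed)
    n = len(X)
    C = np.zeros((n, n))
    total = 0
    for k in ks:  # hypothetical choice of cluster counts
        for _ in range(n_runs):
            labels = kmeans(X, k, rng)
            C += labels[:, None] == labels[None, :]
            total += 1
    return C / total


def consensus_labels(C, threshold=0.5):
    """Connected components of the graph linking pairs with C > threshold."""
    n = len(C)
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero((C[i] > threshold) & (labels < 0)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels


# Toy data (an assumption for illustration): two well-separated 2-D blobs.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0, 0.3, (20, 2)),
               data_rng.normal(5, 0.3, (20, 2))])

C = co_association(X)
labels = consensus_labels(C)
```

Pairs with a co-association near 1 are grouped together regardless of the run, while pairs with intermediate values are the ones whose assignment depends on the method or parameters, which is where extra caution in interpretation seems warranted.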

Year: 2016 PMID: 27303057 DOI: 10.1126/scisignal.aad1932

Source DB: PubMed Journal: Sci Signal ISSN: 1945-0877 Impact factor: 8.192

Cited: 32 in total

Review 1. Using Large Datasets to Understand Nanotechnology.

Authors: Kalina Paunovska; David Loughrey; Cory D Sago; Robert Langer; James E Dahlman
Journal: Adv Mater Date: 2019-08-20 Impact factor: 30.849

2. Spatial Segmentation of Mass Spectrometry Imaging Data by Combining Multivariate Clustering and Univariate Thresholding.

Authors: Hang Hu; Ruichuan Yin; Hilary M Brown; Julia Laskin
Journal: Anal Chem Date: 2021-02-11 Impact factor: 6.986

Review 3. Machine learning applications in cell image analysis.

4. Ensemblator v3: Robust atom-level comparative analyses and classification of protein structure ensembles.

Review 5. Using Gene Expression to Study Specialized Metabolism-A Practical Guide.

6. Statistical power for cluster analysis.

7. Analyzing 2000 in Vivo Drug Delivery Data Points Reveals Cholesterol Structure Impacts Nanoparticle Delivery.

Review 8. A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications.

9. Rigid geometry solves "curse of dimensionality" effects in clustering methods: An application to omics data.

10. CIDR: Ultrafast and accurate clustering through imputation for single-cell RNA-seq data.