Impact of similarity metrics on single-cell RNA-seq data clustering.


Taiyun Kim¹, Irene Rui Chen¹, Yingxin Lin¹, Andy Yi-Yang Wang², Jean Yee Hwa Yang¹, Pengyi Yang¹.

Abstract

Advances in high-throughput sequencing on single-cell gene expressions [single-cell RNA sequencing (scRNA-seq)] have enabled transcriptome profiling on individual cells from complex samples. A common goal in scRNA-seq data analysis is to discover and characterise cell types, typically through clustering methods. The quality of the clustering therefore plays a critical role in biological discovery. While numerous clustering algorithms have been proposed for scRNA-seq data, fundamentally they all rely on a similarity metric for categorising individual cells. Although several studies have compared the performance of various clustering algorithms for scRNA-seq data, currently there is no benchmark of different similarity metrics and their influence on scRNA-seq data clustering. Here, we compared a panel of similarity metrics on clustering a collection of annotated scRNA-seq datasets. Within each dataset, a stratified subsampling procedure was applied and an array of evaluation measures was employed to assess the similarity metrics. This produced a highly reliable and reproducible consensus on their performance assessment. Overall, we found that correlation-based metrics (e.g. Pearson's correlation) outperformed distance-based metrics (e.g. Euclidean distance). To test if the use of correlation-based metrics can benefit the recently published clustering techniques for scRNA-seq data, we modified a state-of-the-art kernel-based clustering algorithm (SIMLR) using Pearson's correlation as a similarity measure and found significant performance improvement over Euclidean distance on scRNA-seq data clustering. These findings demonstrate the importance of similarity metrics in clustering scRNA-seq data and highlight Pearson's correlation as a favourable choice. Further comparison on different scRNA-seq library preparation protocols suggests that they may also affect clustering performance. 
Finally, the benchmarking framework is available at http://www.maths.usyd.edu.au/u/SMS/bioinformatics/software.html.
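
The contrast between correlation-based and distance-based metrics can be illustrated with a small sketch. This is not the authors' benchmarking framework: the toy expression profiles, the 0.2 scaling factor, and the helper names (euclidean, pearson_dissimilarity, nearest) are all assumptions made for illustration. It shows one well-known property separating the two families: Pearson's correlation is unchanged when a profile is rescaled (for example by lower sequencing depth), whereas Euclidean distance is not. The abstract does not say this property explains the reported results.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_dissimilarity(x, y):
    """1 - Pearson correlation, so that 0 means perfectly correlated."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (sd_x * sd_y)


def nearest(query, references, metric):
    """Return the label of the reference profile closest to the query."""
    return min(references, key=lambda label: metric(query, references[label]))


# Hypothetical reference profiles over five genes (illustrative values only).
references = {
    "A": [10.0, 8.0, 1.0, 0.0, 2.0],
    "B": [1.0, 1.0, 3.0, 3.0, 1.0],
}

# A cell with the type A expression pattern, observed at one fifth the depth.
query = [0.2 * value for value in references["A"]]

print("Euclidean assigns:", nearest(query, references, euclidean))
print("Pearson assigns:  ", nearest(query, references, pearson_dissimilarity))
```

In this toy case the rescaled cell lies closer to profile B under Euclidean distance, because B has low overall expression, while Pearson's correlation still matches it to profile A. Real scRNA-seq data are normalised before clustering, so the effect in practice is expected to be less extreme than in this sketch.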

Entities: Gene

Keywords: clustering; correlation; distance; scRNA-seq; similarity metric; single-cell RNA-seq

Year: 2019 PMID: 30137247 DOI: 10.1093/bib/bby076

Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622

Cited

27 in total

1. scMerge leverages factor analysis, stable expression, and pseudoreplication to merge multiple single-cell RNA-seq datasets.

Authors: Yingxin Lin; Shila Ghazanfar; Kevin Y X Wang; Johann A Gagnon-Bartsch; Kitty K Lo; Xianbin Su; Ze-Guang Han; John T Ormerod; Terence P Speed; Pengyi Yang; Jean Yee Hwa Yang
Journal: Proc Natl Acad Sci U S A Date: 2019-04-26 Impact factor: 11.205

2. Single-Cell RNA Sequencing Analysis: A Step-by-Step Overview.

Authors: Shaked Slovin; Annamaria Carissimo; Francesco Panariello; Antonio Grimaldi; Valentina Bouché; Gennaro Gambardella; Davide Cacchiarelli
Journal: Methods Mol Biol Date: 2021

3. A single-cell atlas of mouse lung development.

Authors: Nicholas M Negretti; Erin J Plosa; John T Benjamin; Bryce A Schuler; A Christian Habermann; Christopher S Jetter; Peter Gulleman; Claire Bunn; Alice N Hackett; Meaghan Ransom; Chase J Taylor; David Nichols; Brittany K Matlock; Susan H Guttentag; Timothy S Blackwell; Nicholas E Banovich; Jonathan A Kropski; Jennifer M S Sucre
Journal: Development Date: 2021-12-20 Impact factor: 6.868

4. Shared Differential Expression-Based Distance Reflects Global Cell Type Relationships in Single-Cell RNA Sequencing Data.

Authors: Aidan Mcloughlin; Haiyan Huang
Journal: J Comput Biol Date: 2022-07-06 Impact factor: 1.549

5. Heterogeneous data integration methods for patient similarity networks. (Review)

Authors: Jessica Gliozzo; Marco Mesiti; Marco Notaro; Alessandro Petrini; Alex Patak; Antonio Puertas-Gallardo; Alberto Paccanaro; Giorgio Valentini; Elena Casiraghi
Journal: Brief Bioinform Date: 2022-07-18 Impact factor: 13.994

6. Euclidean distance-optimized data transformation for cluster analysis in biomedical data (EDOtrans).

Authors: Alfred Ultsch; Jörn Lötsch
Journal: BMC Bioinformatics Date: 2022-06-16 Impact factor: 3.307

7. Occult polyclonality of preclinical pancreatic cancer models drives in vitro evolution.

Authors: Maria E Monberg; Heather Geiger; Jaewon J Lee; Roshan Sharma; Alexander Semaan; Vincent Bernard; Justin Wong; Fang Wang; Shaoheng Liang; Daniel B Swartzlander; Bret M Stephens; Matthew H G Katz; Ken Chen; Nicolas Robine; Paola A Guerrero; Anirban Maitra
Journal: Nat Commun Date: 2022-06-25 Impact factor: 17.694