| Literature DB >> 31778143 |
Shian Su1,2, Luyi Tian1,2, Xueyi Dong1,2, Peter F Hickey1,2, Saskia Freytag3, Matthew E Ritchie1,2,4.
Abstract
MOTIVATION: Bioinformatic analysis of single-cell gene expression data is a rapidly evolving field. Hundreds of bespoke methods have been developed in the past few years to deal with various aspects of single-cell analysis and consensus on the most appropriate methods to use under different settings is still emerging. Benchmarking the many methods is therefore of critical importance and since analysis of single-cell data usually involves multi-step pipelines, effective evaluation of pipelines involving different combinations of methods is required. Current benchmarks of single-cell methods are mostly implemented with ad-hoc code that is often difficult to reproduce or extend, and exhaustive manual coding of many combinations is infeasible in most instances. Therefore, new software is needed to manage pipeline benchmarking.Mesh:
Year: 2020 PMID: 31778143 PMCID: PMC7141847 DOI: 10.1093/bioinformatics/btz889
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1.Schematic of a CellBench analysis. (A) Inputs to a benchmark analysis include data with known labels and collections of method wrappers that receive input and produce output in a consistent format. Methods that correspond to the same step in a pipeline are grouped into blocks, and the CellBench framework implicitly generates results for all combinations of methods. The code required reflects the diagram model and supports piping syntax. Results are returned in a tibble structure appropriate for manipulation with popular tidyverse packages (B) Using the tibble structure, the adjusted Rand index metric for combinations of normalization, imputation and clustering is filtered to plot results for the top 4 performing pipelines for RaceID, TSCAN and Seurat using ggplot2 (Wickham, 2016). The bar plot shows the relative performance of each clustering method and its sensitivity to upstream methods.