Lu Zhao¹, Zhimin Liu²,³, Sasha F Levy²,³, Song Wu¹
¹ Department of Applied Mathematics and Statistics; ² Laufer Center for Physical and Quantitative Biology; ³ Department of Biochemistry and Cell Biology, Stony Brook University, Stony Brook, NY 11794, USA.
Abstract
Motivation: Barcode sequencing (bar-seq) is a high-throughput, cost-effective method to assay large numbers of cell lineages or genotypes in complex cell pools. Because of these advantages, applications for bar-seq are growing quickly, from using neutral random barcodes to study the evolution of microbes or cancer, to using pseudo-barcodes, such as shRNAs or sgRNAs, to simultaneously screen large numbers of cell perturbations. However, the computational pipelines for bar-seq clustering are not well developed. Available methods often yield a high frequency of under-clustering artifacts that result in spurious barcodes, or over-clustering artifacts that group distinct barcodes together. Here, we developed Bartender, an accurate clustering algorithm to detect barcodes and their abundances from raw next-generation sequencing data.

Results: In contrast with existing methods that cluster based on sequence similarity alone, Bartender uses a modified two-sample proportion test that also considers cluster size. This modification results in higher accuracy and lower rates of under- and over-clustering artifacts. Additionally, Bartender includes unique molecular identifier handling and a 'multiple time point' mode that matches barcode clusters between different clustering runs for seamless handling of time course data. Bartender is a set of simple-to-use command line tools that can be run on a laptop with run times comparable to those of existing methods.

Availability and implementation: Bartender is available at no charge for non-commercial use at https://github.com/LaoZZZZZ/bartender-1.1.

Contact: sasha.levy@stonybrook.edu or song.wu@stonybrook.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.
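To illustrate why a proportion test brings cluster size into the decision, the following minimal sketch computes the standard pooled two-sample z-test for proportions. It is a textbook statistic, not Bartender's modified test, whose exact form is not given in this abstract. The function names and the interpretation of the counts (reads in each cluster carrying a given base at one position) are assumptions made for illustration only.

```python
import math


def two_sample_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample z statistic for H0: p1 == p2.

    x1, x2: number of "successes" in each sample (hypothetically, reads in a
            cluster that carry a given base at one barcode position).
    n1, n2: sample sizes (hypothetically, the cluster sizes in reads).
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        # Both samples are all successes or all failures: no evidence of a difference.
        return 0.0
    return (p1 - p2) / se


def two_sided_p_value(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Same observed proportions (0.9 vs 0.5), different sample sizes.
z_small = two_sample_proportion_z(9, 10, 5, 10)
z_large = two_sample_proportion_z(900, 1000, 500, 1000)
p_small = two_sided_p_value(z_small)
p_large = two_sided_p_value(z_large)
```

With ten reads per sample, the difference between 0.9 and 0.5 does not reach the conventional 5% significance level, whereas with a thousand reads per sample it is overwhelming. A rule based on sequence similarity alone cannot make this distinction, which is the general motivation for letting sample size enter the merge decision.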