Zhanlin Chen1,2, Jing Zhang3, Jason Liu, Zixuan Zhang4, Jiangqi Zhu4, Donghoon Lee5,6, Min Xu7, Mark Gerstein1,2.
Abstract
SUMMARY: scATAC-seq is a powerful approach for characterizing cell-type-specific regulatory landscapes. However, it is difficult to benchmark the performance of scATAC-seq analysis techniques (such as clustering and deconvolution) without a set of gold-standard cell types known a priori. To simulate scATAC-seq experiments with known cell-type labels, we introduce an efficient and scalable scATAC-seq simulation method (SCAN-ATAC-Sim) that down-samples bulk ATAC-seq data (e.g., from representative cell lines or tissues). To integrate bulk experiments with different levels of background noise, our protocol applies a consistent but tunable signal-to-noise ratio across all cell types in a simulation, and it samples twice, independently and without replacement, to account for the diploid genome. Because it uses an efficient weighted reservoir sampling algorithm and is highly parallelizable with OpenMP, our C++ implementation can simulate millions of cells in less than an hour on a laptop computer.
AVAILABILITY: SCAN-ATAC-Sim is available at scan-atac-sim.gersteinlab.org.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Year: 2021 PMID: 33471102 PMCID: PMC8289380 DOI: 10.1093/bioinformatics/btaa1039
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937