Collin Giguere, Harsh Vardhan Dubey, Vishal Kumar Sarsani, Hachem Saddiki, Shai He, Patrick Flaherty.
Abstract
BACKGROUND: Recently, it has become possible to collect next-generation DNA sequencing data sets that are composed of multiple samples from multiple biological units, where each of these samples may be from a single cell or bulk tissue. However, no tool yet exists for simulating DNA sequencing data from such a nested sampling arrangement with single-cell and bulk samples, so developers of analysis methods cannot readily assess accuracy and precision.
Keywords: DNA sequencing; Hierarchical Dirichlet; Single-cell DNA sequencing; simulator
Year: 2020 PMID: 32456609 PMCID: PMC7249349 DOI: 10.1186/s12859-020-03550-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 SCSIM simulation workflow. Inputs (shown in rounded boxes with green text) are the reference sequence and experiment design. Outputs (shown in cornered boxes with orange text) are the bulk and single-cell FASTQ read files. Intermediate objects are shown in purple with no boxes
Fig. 2 Graphical model representation of the SCSIM hierarchical model. To generate data from the model, first sample a distribution over the K mutant genomes. Then sample a distribution for each biological unit (individual) i = 1, …, N, and a distribution for each sample (biopsy) j within each unit. Next, sample the variant locations and types that define the K mutant synthetic genomes. Finally, sample the bulk and single-cell reads for each sample
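The generative steps in the Fig. 2 caption can be sketched in a few lines of NumPy. This is a minimal illustration, not SCSIM's implementation: the record only names a "Hierarchical Dirichlet" model, so the choice of Dirichlet draws centred on the parent level, the concentration values (`conc_top`, `conc_unit`, `conc_sample`), the uniform choice of variant sites, and all function names are assumptions made for the example.

```python
import numpy as np


def sample_hierarchy(n_genomes, samples_per_unit, conc_top=1.0,
                     conc_unit=10.0, conc_sample=10.0, seed=0):
    """Draw nested distributions over K mutant genomes (assumed parameterization)."""
    rng = np.random.default_rng(seed)
    eps = 1e-3  # keeps every Dirichlet parameter strictly positive
    # Top level: one distribution over the K mutant genomes.
    top = rng.dirichlet(np.full(n_genomes, conc_top))
    units, samples = [], []
    for n_samples in samples_per_unit:
        # Unit level: each biological unit is centred on the top-level distribution.
        unit = rng.dirichlet(conc_unit * top + eps)
        units.append(unit)
        # Sample level: each biopsy is centred on its unit's distribution.
        samples.append([rng.dirichlet(conc_sample * unit + eps)
                        for _ in range(n_samples)])
    return top, units, samples


def sample_variant_sites(n_genomes, genome_length, n_variants, seed=0):
    """Pick distinct variant positions for each mutant genome (uniform, an assumption)."""
    rng = np.random.default_rng(seed)
    return [np.sort(rng.choice(genome_length, size=n_variants, replace=False))
            for _ in range(n_genomes)]


def assign_genomes(sample_dist, n_draws, seed=0):
    """Assign each single cell (or each bulk read) to one of the K genomes."""
    rng = np.random.default_rng(seed)
    return rng.choice(len(sample_dist), size=n_draws, p=sample_dist)


# Two biological units with 2 and 3 samples, K = 4 mutant genomes.
top, units, samples = sample_hierarchy(4, [2, 3])
sites = sample_variant_sites(4, genome_length=1000, n_variants=5)
cells = assign_genomes(samples[0][0], n_draws=8)
```

Under this sketch a single-cell sample draws one genome index per cell, while a bulk sample draws one index per read, so bulk reads reflect the mixture proportions of the sample.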
Fig. 3 Simulated true SNV locations across mutated synthetic prototype genomes
Fig. 4 IGV visualization of reads generated by SCSIM for 4 single-cell and 4 bulk samples: (a) at a genomic location where each sample has a true SNV, (b) at a genomic location where half of the samples have a true SNV, and (c) at a genomic location where a random fraction of the samples have a true SNV
Fig. 5 Venn diagram showing concordance of called and true variants
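The concordance summarized by a Venn diagram like Fig. 5 reduces to set operations on variant identifiers. The sketch below is illustrative only: the `(chromosome, position, alternate allele)` key, the function name `concordance`, and the example variants are hypothetical and are not taken from the paper's results.

```python
def concordance(true_variants, called_variants):
    """Compare called variants against simulated ground truth using set overlap."""
    true_set, called_set = set(true_variants), set(called_variants)
    shared = true_set & called_set        # called and true (Venn intersection)
    only_called = called_set - true_set   # called but not simulated
    only_true = true_set - called_set     # simulated but not called
    precision = len(shared) / len(called_set) if called_set else 0.0
    recall = len(shared) / len(true_set) if true_set else 0.0
    return {
        "shared": shared,
        "only_called": only_called,
        "only_true": only_true,
        "precision": precision,
        "recall": recall,
    }


# Hypothetical variants keyed by (chromosome, position, alternate allele).
true_variants = [("chr20", 100, "A"), ("chr20", 250, "T"), ("chr20", 400, "G")]
called_variants = [("chr20", 100, "A"), ("chr20", 400, "G"), ("chr20", 900, "C")]
result = concordance(true_variants, called_variants)
```

Because a simulator records the true variant set, both regions outside the intersection can be counted exactly, which is what lets a method developer report precision and recall.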