David Posada.
Abstract
Our capacity to study individual cells has enabled a new level of resolution for understanding complex biological systems such as multicellular organisms or microbial communities. Not surprisingly, several methods have been developed in recent years with a formidable potential to investigate the somatic evolution of single cells in both healthy and pathological tissues. However, single-cell sequencing data can be quite noisy due to different technical biases, so inferences resulting from these new methods need to be carefully contrasted. Here, I introduce CellCoal, a software tool for the coalescent simulation of single-cell sequencing genotypes. CellCoal simulates the history of single-cell samples obtained from somatic cell populations with different demographic histories and produces single-nucleotide variants under a variety of mutation models, sequencing read counts, and genotype likelihoods, considering allelic imbalance, allelic dropout, amplification, and sequencing errors, typical of this type of data. CellCoal is a flexible tool that can be used to understand the implications of different somatic evolutionary processes at the single-cell level, and to benchmark dedicated bioinformatic tools for the analysis of single-cell sequencing data. CellCoal is available at https://github.com/dapogon/cellcoal.
Keywords: allele dropout; amplification error; single-cell genomics; somatic evolution
Year: 2020 PMID: 32027371 PMCID: PMC7182211 DOI: 10.1093/molbev/msaa025
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. 1. Main flow of CellCoal. First, a sample genealogy is simulated. Then, cell genotypes are evolved along this genealogy by introducing somatic mutations, deletions, and copy-neutral LOH. Finally, sequencing reads are produced considering the specific biases of single-cell sequencing.
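The first step of this flow, simulating a sample genealogy, can be illustrated with a minimal sketch. It assumes the standard coalescent for a constant-size population, in which the waiting time while k lineages remain is exponential with rate k(k-1)/2 in coalescent time units. This is not CellCoal's code: the function name `simulate_coalescent` is hypothetical, and the demographic histories mentioned in the abstract are not modeled here.

```python
import random

def simulate_coalescent(n_cells, rng):
    """Return the merge events of a coalescent genealogy for n_cells samples.

    Assumption: constant population size (standard coalescent).
    Each event is a tuple (time, child_a, child_b, parent_id).
    """
    lineages = list(range(n_cells))  # sampled cells are nodes 0..n_cells-1
    next_id = n_cells                # ids for ancestral (internal) nodes
    time = 0.0
    events = []
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time to the next coalescence with k active lineages.
        time += rng.expovariate(k * (k - 1) / 2)
        # Any pair of active lineages is equally likely to coalesce.
        a, b = rng.sample(lineages, 2)
        lineages.remove(a)
        lineages.remove(b)
        lineages.append(next_id)
        events.append((time, a, b, next_id))
        next_id += 1
    return events

if __name__ == "__main__":
    for event in simulate_coalescent(5, random.Random(1)):
        print(event)
```

A sample of n cells always yields n - 1 coalescent events, and the last event creates the most recent common ancestor of the sample. Mutations would then be placed along the branches of this genealogy.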
Fig. 2. Amplification error models. Four-template (A) and two-template (B) model for amplification (γ) and sequencing (ε) error.
Fig. 3. Effect of sequencing coverage heterogeneity on single-cell genotypes. (A) Probability that the maximum likelihood genotype is wrong. (B) Proportion of genotypes called. (C) Total number of single-nucleotide variants (SNVs) called. GATK and true (GATK+ADO) are the likelihood models used for calling genotypes. Coverage dispersion corresponds to the negative binomial dispersion parameter. The smaller this parameter is, the more heterogeneity there is. At the top, 1×, 5×, and 25× are different overall sequencing depths. In the boxplots, the central line indicates the median, whereas the box limits correspond to the Q1 and Q3 quartiles and the asterisk to the mean.
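The relationship between the negative binomial dispersion parameter and coverage heterogeneity can be made concrete with a small sketch. It assumes the common mean/dispersion parameterization, in which the variance is mean + mean²/dispersion, and draws counts as a gamma-Poisson mixture. The function names are hypothetical, and the parameterization used internally by CellCoal may differ.

```python
import random
import statistics

def poisson(lam, rng):
    """Draw a Poisson(lam) count by counting unit-rate arrivals in [0, lam]."""
    count = 0
    total = rng.expovariate(1.0)
    while total < lam:
        count += 1
        total += rng.expovariate(1.0)
    return count

def nb_coverage(mean, dispersion, rng):
    """Draw one per-site read depth from a negative binomial distribution.

    Assumption: gamma-Poisson mixture with shape = dispersion and
    scale = mean / dispersion, so variance = mean + mean**2 / dispersion.
    """
    site_rate = rng.gammavariate(dispersion, mean / dispersion)
    return poisson(site_rate, rng)

def coverage_cv(mean, dispersion, n_sites, rng):
    """Coefficient of variation of simulated read depth across n_sites."""
    depths = [nb_coverage(mean, dispersion, rng) for _ in range(n_sites)]
    return statistics.pstdev(depths) / statistics.fmean(depths)

if __name__ == "__main__":
    rng = random.Random(7)
    for dispersion in (0.5, 5.0, 50.0):
        cv = coverage_cv(5.0, dispersion, 5000, rng)
        print(f"dispersion={dispersion}: CV of depth = {cv:.2f}")
```

At a fixed overall depth, lowering the dispersion parameter inflates the variance term mean²/dispersion, so more sites end up with very low or zero coverage while a few receive many reads.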