Vivekananda Sarangi, Alexandre Jourdon, Taejeong Bae, Arijit Panda, Flora Vaccarino, Alexej Abyzov.
Abstract
BACKGROUND: The study of mosaic mutation is important since it has been linked to cancer and various disorders. Single cell sequencing has become a powerful tool to study the genome of individual cells for the detection of mosaic mutations. The DNA in a single cell needs to be amplified before sequencing, and multiple displacement amplification (MDA) is widely used owing to its low error rate and the long fragment length of the amplified DNA. However, the phi29 polymerase used in MDA is sensitive to template fragmentation and to the presence of sites with DNA damage, which can lead to biases such as allelic imbalance, uneven coverage and overrepresentation of C to T mutations. It is therefore important to select cells with uniform amplification to decrease false positives and increase sensitivity for mosaic mutation detection.
Keywords: MDA; Single cell; Whole genome amplification
Year: 2020 PMID: 33183232 PMCID: PMC7663899 DOI: 10.1186/s12859-020-03858-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Concept and workflow of the approach. a VAF distribution of HETs at 30× sequencing coverage in three cases: bulk sample, uniformly amplified cell, and non-uniformly amplified cell. The distribution from the bulk sample shows a peak around 50%, as expected. The second case is a single cell sequenced at 30× with good amplification: the allele frequency plot still has a peak around 50%, but it is not as sharp as in the bulk sample. The last case is a single cell also sequenced at 30× but with non-uniform amplification. b Conceptual description of the approach. First, SNPs are phased. The reads supporting the SNPs are divided into two haplotypes, named maternal and paternal, although the exact origin of each haplotype is unknown. With fewer than one read supporting each SNP (coverage < 1×), multiple SNPs are merged to form a SNP unit. Reads supporting SNPs within the SNP unit from only one haplotype are used to calculate the allele frequency over that SNP unit. The allele frequency plot for high coverage data closely resembles the one from shallow coverage data
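The SNP-unit idea in panel b can be illustrated with a minimal Python sketch. It is not the authors' implementation: the input layout (one pair of per-haplotype read counts per phased SNP), the function name `snp_unit_vafs`, and the rule of closing a unit once it has accumulated `min_reads` reads are all assumptions made for illustration, since the caption does not state how many SNPs go into a unit.

```python
from typing import List, Tuple


def snp_unit_vafs(snps: List[Tuple[int, int]], min_reads: int = 10) -> List[float]:
    """Merge consecutive phased SNPs into units and return one VAF per unit.

    snps      -- hypothetical input: (reads on haplotype A, reads on haplotype B)
                 for each phased SNP, in genomic order.
    min_reads -- assumed merging rule: a unit is closed as soon as it holds
                 at least this many reads in total.
    """
    vafs = []
    hap_a = hap_b = 0
    for a, b in snps:
        # Accumulate reads per haplotype across neighbouring SNPs.
        hap_a += a
        hap_b += b
        total = hap_a + hap_b
        if total >= min_reads:
            # Allele frequency of the unit: reads from one haplotype over all reads.
            vafs.append(hap_a / total)
            hap_a = hap_b = 0
    # SNPs left over at the end never reached min_reads and are dropped.
    return vafs


if __name__ == "__main__":
    # Shallow data: most SNPs carry zero or one read.
    example = [(1, 0), (0, 1), (1, 1), (0, 0), (2, 0), (1, 1)]
    print(snp_unit_vafs(example, min_reads=4))
```

Pooling reads across neighbouring SNPs on the same haplotype is what lets a sub-1× sample yield a usable frequency estimate; a fixed number of SNPs per unit would be an equally plausible merging rule.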
Fig. 2 Flowchart of method implementation. Scripts 1 through 3 are meant to be executed in sequence; however, they are independent of each other, and as long as the inputs are correct, the user can start with any script. The final script, Script-3, produces a VAF plot for each sample. Two examples are shown: a uniformly amplified cell (left) and a non-uniformly amplified cell (right)
Fig. 3 Validation of Scellector by high coverage sequencing of 9 cells that were also subjected to shallow sequencing. a Side by side comparison of the allele frequency plots from shallow coverage and high coverage. b Scatter plot showing high correlation between the shallow and high coverage data. Three comparisons, using allele dropout, standard deviation, and a combination of standard deviation and allele dropout (AD), show similar results
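The two summary quantities named in the caption, standard deviation and allele dropout, could be computed from a list of per-unit allele frequencies roughly as follows. This is a hedged sketch only: the function names are hypothetical, and the definition of allele dropout used here (the fraction of units whose frequency is exactly 0 or 1, meaning only one haplotype was observed) is an assumption, since the text above does not define it.

```python
from statistics import pstdev
from typing import List


def vaf_spread(vafs: List[float]) -> float:
    """Population standard deviation of the allele frequencies."""
    if not vafs:
        raise ValueError("need at least one allele frequency")
    return pstdev(vafs)


def allele_dropout_fraction(vafs: List[float]) -> float:
    """Assumed definition: share of units where only one haplotype was seen."""
    if not vafs:
        raise ValueError("need at least one allele frequency")
    dropped = sum(1 for v in vafs if v == 0.0 or v == 1.0)
    return dropped / len(vafs)


if __name__ == "__main__":
    vafs = [0.5, 0.5, 0.0, 1.0]
    print(vaf_spread(vafs), allele_dropout_fraction(vafs))
```

Under these assumptions, a uniformly amplified cell would show a small spread around 0.5 and few dropped units, while a non-uniformly amplified cell would score high on both.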