Gryte Satas (1,2), Benjamin J. Raphael (1)
Abstract
Motivation: Current technologies for single-cell DNA sequencing require whole-genome amplification (WGA), as a single cell contains too little DNA for direct sequencing. Unfortunately, WGA introduces biases in the resulting sequencing data, including non-uniformity in genome coverage and high rates of allele dropout. These biases complicate many downstream analyses, including the detection of genomic variants.
Year: 2018 PMID: 29950014 PMCID: PMC6022575 DOI: 10.1093/bioinformatics/bty286
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. (a) Single-cell DNA sequencing typically requires WGA to obtain sufficient quantities of DNA, which produces non-uniform read depth that is correlated at the scale of amplicons. Since the two homologous chromosomes are amplified independently, read-depth correlations are strongest between sequence reads originating from the same chromosome/haplotype. (b) Amplicon-scale read-depth correlations, combined with high rates of allelic dropout, result in increased rates of concurrent allelic dropout for pairs of alleles originating from the same haplotype. The entries of the dropout vectors for alleles A and b indicate whether each allele is measured in each cell. (c) We derive a phasing score for pairs of nearby SNPs based on the P-values of concurrent dropout under the different possible phasings of their alleles. Phasing scores that are either high or low (i.e. large in magnitude) correspond to amplification fragments: pairs of alleles that are likely to be on the same haplotype. These amplification fragments are used as input to haplotype assembly algorithms, augmenting the phasing information from read fragments, i.e. alleles observed on the same sequencing read.
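The scoring idea in panel (c) can be illustrated with a small sketch. It assumes, as an illustration rather than the authors' exact statistic, that the P-value of concurrent dropout for two alleles comes from a one-sided hypergeometric test for excess co-dropout across cells, and that the phasing score is log10(p_trans / p_cis), comparing allele A paired with allele B (cis) against A paired with allele b (trans). The function names and the boolean "measured" encoding of the dropout vectors are hypothetical.

```python
from math import comb, log10
from typing import Sequence


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population n, K marked items, N draws)."""
    upper = min(K, N)
    hits = sum(comb(K, i) * comb(n - K, N - i) for i in range(k, upper + 1))
    return hits / comb(n, N)


def concurrent_dropout_pvalue(measured_x: Sequence[bool], measured_y: Sequence[bool]) -> float:
    """Hypothetical helper: P-value for seeing at least the observed number of
    cells in which both alleles drop out, assuming dropouts are independent."""
    if len(measured_x) != len(measured_y):
        raise ValueError("dropout vectors must cover the same cells")
    n = len(measured_x)
    drop_x = [not m for m in measured_x]  # True where allele x was NOT measured
    drop_y = [not m for m in measured_y]
    K, N = sum(drop_x), sum(drop_y)
    k = sum(dx and dy for dx, dy in zip(drop_x, drop_y))  # concurrent dropouts
    return hypergeom_upper_tail(k, n, K, N)


def phasing_score(measured_A: Sequence[bool],
                  measured_B: Sequence[bool],
                  measured_b: Sequence[bool]) -> float:
    """Hypothetical score: positive favors A with B (cis), negative A with b (trans)."""
    p_cis = concurrent_dropout_pvalue(measured_A, measured_B)
    p_trans = concurrent_dropout_pvalue(measured_A, measured_b)
    return log10(p_trans) - log10(p_cis)


if __name__ == "__main__":
    # Ten cells: A and B drop out together in cells 0-3; b drops out in cells 6-8.
    measured_A = [False] * 4 + [True] * 6
    measured_B = [False] * 4 + [True] * 6
    measured_b = [True] * 6 + [False] * 3 + [True]
    print(round(phasing_score(measured_A, measured_B, measured_b), 3))  # 2.322
```

Here the cis P-value is 1/210 (all four dropouts of A coincide with those of B), while A and b never drop out together, so the trans P-value is 1 and the score of log10(210) favors A and B lying on the same haplotype. Thresholding the absolute score, as in Fig. 2, would then select the pairs kept as amplification fragments.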
Fig. 2. Haplotype assembly on whole-genome DNA-sequencing data. (a) We form a validation dataset of seven synthetic diploid cells with known haplotypes from X chromosomes in whole-genome DNA-sequencing data of single neurons from a male individual (Lodato et al.). (b) (Left) The accuracy of the predicted phase for the set of amplification fragments whose phasing score exceeds a threshold c in absolute value. We observe highly accurate phase prediction for pairs of SNPs separated by less than the amplicon length (here, the 95th percentile of amplicon length is 103 kb). (Right) The proportion of SNP pairs included in this set of amplification fragments. (c) The N50 and switch error of the haplotype assembly as the phasing score threshold c varies. The N50 and switch error for the haplotype assembly with no amplification fragments are marked with an ‘×’.
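Switch error is measured against the known haplotypes of the synthetic cells. A minimal sketch of one common definition follows, assuming a single haplotype block encoded as 0/1 alleles per heterozygous SNP (the other haplotype is the complement) and normalizing the number of switches by the number of adjacent phased SNP pairs; the paper's exact normalization may differ, and the function name is hypothetical.

```python
from typing import Optional, Sequence


def switch_error_rate(predicted: Sequence[Optional[int]], truth: Sequence[int]) -> float:
    """Fraction of adjacent phased SNP pairs whose relative phase disagrees with truth.

    predicted[i] is the 0/1 allele on one predicted haplotype at SNP i, or None
    if SNP i is unphased; truth[i] is the allele on one true haplotype.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and true haplotypes must have equal length")
    # Orientation of each phased SNP relative to the truth (unphased SNPs skipped).
    agree = [p == t for p, t in zip(predicted, truth) if p is not None]
    if len(agree) < 2:
        return 0.0
    # A switch occurs wherever the orientation flips between consecutive SNPs.
    switches = sum(a != b for a, b in zip(agree, agree[1:]))
    return switches / (len(agree) - 1)
```

Because only changes in orientation are counted, a prediction that matches the complementary true haplotype everywhere has zero switch error, e.g. `switch_error_rate([1, 1, 0, 1, 0], [0, 0, 1, 0, 1])` is 0.0, while `switch_error_rate([0, 0, 1, 1, 0], [0, 0, 1, 0, 1])` has one switch over four adjacent pairs.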
Fig. 3. Assembling haplotypes on whole-exome DNA-sequencing data. (a) We validate haplotype assemblies on whole-exome DNA-sequencing data from a single breast cancer patient (Wang et al.) by comparison with the haplotype of Chromosome 17, which we can determine from the eight cancer cells that have lost one homolog of this chromosome, since the remaining homolog carries a single haplotype. (b) Haplotype block length (N50) as a function of haplotype switch error for varying phasing score thresholds. The N50 and switch error for the assembly with no amplification fragments are marked with an ‘×’. Amplification fragments increase the length of haplotype assemblies by orders of magnitude with only a small increase in switch error. (c) Phasing accuracy for the highest-scoring 20% of amplification fragments as the number of cells varies.
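N50 summarizes haplotype block lengths: it is the length L such that blocks of length at least L together cover at least half of the total assembled length. A short sketch, assuming block lengths are given as plain integers (whether they count base pairs or SNPs is left open here); the function name is illustrative.

```python
from typing import Sequence


def n50(block_lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of haplotype block lengths (0 if empty)."""
    total = sum(block_lengths)
    if total == 0:
        return 0
    covered = 0
    # Walk blocks from longest to shortest until half the total length is covered.
    for length in sorted(block_lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    return 0  # not reached when total > 0
```

For blocks of lengths 4, 3, 2 and 1, the two longest blocks cover 7 of 10 units, so the N50 is 3. Merging blocks with amplification fragments raises this value even when the total phased length is unchanged.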