
Haplotype phasing in single-cell DNA-sequencing data.

Gryte Satas, Benjamin J Raphael.

Abstract

Motivation: Current technologies for single-cell DNA sequencing require whole-genome amplification (WGA), as a single cell contains too little DNA for direct sequencing. Unfortunately, WGA introduces biases in the resulting sequencing data, including non-uniformity in genome coverage and high rates of allele dropout. These biases complicate many downstream analyses, including the detection of genomic variants.
Results: We show that amplification biases have a potential upside: long-range correlations in rates of allele dropout provide a signal for phasing haplotypes at the lengths of amplicons from WGA, lengths which are generally longer than individual sequence reads. We describe a statistical test to measure concurrent allele dropout between single-nucleotide polymorphisms (SNPs) across multiple sequenced single cells. We use results of this test to perform haplotype assembly across a collection of single cells. We demonstrate that the algorithm predicts phasing between pairs of SNPs with higher accuracy than phasing from reads alone. Using whole-genome sequencing data from only seven neural cells, we obtain haplotype blocks that are orders of magnitude longer than with sequence reads alone (median length 10.2 kb versus 312 bp), with error rates <2%. We demonstrate similar advantages on whole-exome data from 16 cells, where we obtain haplotype blocks with median length 9.2 kb (comparable to typical gene lengths), compared with median lengths of 41 bp with sequence reads alone, with error rates <4%. Our algorithm will be useful for haplotyping of rare alleles and studies of allele-specific somatic aberrations.
Availability and implementation: Source code is available at https://www.github.com/raphael-group.
Supplementary information: Supplementary data are available at Bioinformatics online.

Year:  2018        PMID: 29950014      PMCID: PMC6022575          DOI: 10.1093/bioinformatics/bty286

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

In recent years, single-cell DNA-sequencing technologies have enabled the measurement of the genomic changes in individual cells (Gawad ). This technology has been used to measure somatic mutations in normal tissue (Lodato ; McConnell ), to quantify somatic evolution in cancer (Navin, 2015; Wang ), to investigate the genomes of unculturable microorganisms (Marcy ), and for other applications. Unfortunately, it is not yet possible to directly sequence the DNA molecule(s) present in a single cell. Rather, current single-cell DNA-sequencing technologies first perform whole-genome amplification (WGA) in order to obtain sufficient DNA to sequence. Several WGA methods have been introduced, with the three most common being degenerate oligonucleotide primed PCR (DOP-PCR), multiple displacement amplification (MDA), and multiple annealing and looping-based amplification cycles (MALBAC) (Gawad ). The lengths of the amplified genomic fragments, or amplicons, range from 200 to 300 bp for DOP-PCR, to 1–5 kb for MALBAC, and up to 10–100 kb for MDA (Sherman ). As WGA uses repeated cycles of amplification, any errors or non-uniformity in coverage obtained during early cycles of amplification are amplified in later cycles. Thus, WGA results in highly non-uniform coverage of the genome (Fig. 1a), which is apparent in the observed strong correlations between read depth at genomic loci within the distance of an amplicon (Zhang ).
Fig. 1.

(a) Single-cell DNA sequencing typically requires WGA to obtain sufficient quantities of DNA, which results in non-uniform read depth with correlation at the scale of amplicons. Since the two homologous chromosomes are amplified independently, read-depth correlations are strongest between sequence reads originating from the same chromosome/haplotype. (b) Amplicon-scale read-depth correlations, combined with high rates of allelic dropout, result in increased rates of concurrent allelic dropout for pairs of alleles originating from the same haplotype, where entries of the dropout vectors $d_A$ and $d_b$ indicate whether alleles A and b, respectively, are measured in each cell. (c) We derive a phasing score for pairs of nearby SNPs based on the P-values of concurrent dropout for different phasings of alleles. High or low values of the phasing score correspond to amplification fragments containing pairs of alleles that are likely to be on the same haplotype. These amplification fragments are used as input to haplotype assembly algorithms, augmenting phasing information from read fragments containing alleles found on the same read.

The amplification bias resulting from WGA leads to challenges in identifying genomic variants in single cells, and is thus a drawback of single-cell sequencing technology. Considerable efforts have been made to develop analysis algorithms that overcome this bias (Bakker ; Garvin ) and to design WGA methods with less bias (Chen ; Picher ). In this article, we demonstrate a positive aspect of amplification bias: since neighboring genomic loci are often co-amplified, the correlation in sequence coverage between neighboring alleles on the same chromosome can be used to phase haplotypes in diploid genomes (Fig. 1b). Diploid genomes, such as the human genome, consist of pairs of homologous chromosomes, distinguished by single-nucleotide polymorphisms (SNPs) and other small genomic differences.
Current DNA-sequencing technologies yield sequence reads that originate from a mixture of both homologous chromosomes, losing information about the chromosomal origin of each read. Thus, for any pair of heterozygous SNPs that are further apart than a read length, the alleles that are present on the same chromosome are unknown. Haplotype assembly is the process of reconstructing the haplotypes of an individual (i.e. assigning the alleles of heterozygous SNPs to the corresponding chromosome of origin) from sequence reads obtained from that individual. Since each read generally derives from a single chromosome, if a read spans multiple SNPs, then the observed alleles are presumed to come from a single haplotype. Haplotype and SNP phase information has applications in population genetics (Tewhey ) and clinical and medical genomics (Glusman ; Roach ; van de Ven ), and is also used to improve other analyses such as SNP imputation and genotyping (Browning and Yu, 2009; Marchini ) and somatic variant calling (Bohrson ). Obtaining long continuous haplotype blocks is a challenge, as the distance between adjacent SNPs is longer than reads and read fragments in most sequencing technologies. Long reads and other technologies that provide long-range phase information, such as linked-read and Hi-C data, have been used to improve the length of haplotype assemblies (Edge ; Patterson ; Pirola ; Zheng ), and there are microfluidic techniques designed to recover haplotypes of single-cell data (Chu ; Fan ). However, none of these techniques can be applied to existing short-read single-cell data, and they may be prohibitively expensive for new experiments. We describe an algorithm that exploits amplification bias across a collection of sequenced single cells to assemble haplotypes (Fig. 1c). This algorithm is based on the observation in (Zhang ) that homologous chromosomes are amplified almost independently during WGA, and thus show similar rates of amplification bias.
Thus, amplicon-scale correlations in read depth provide a signal to phase heterozygous SNPs across amplicon lengths. Specifically, our model leverages allele dropout, where one of the two alleles of a heterozygous SNP is not measured in a cell, a common feature of single-cell sequencing data (Gawad ). Alleles of two nearby SNPs on the same chromosome are likely to drop out (not be covered by an amplicon) concurrently. We derive a statistical test of concurrent dropout of alleles of heterozygous SNPs. We validate our approach using both whole-genome and whole-exome DNA single-cell sequencing data, and show that our approach predicts haplotype phase with high accuracy, achieving >90% accuracy on the top-ranked 22% of pairs of SNPs within amplicon-length distances. We use pairs of SNPs exhibiting high rates of concurrent dropout to define amplification fragments that we input into an existing haplotype assembler (Edge ). We obtain haplotype blocks that are one to two orders of magnitude longer (10.2 kb versus 312 bp on whole-genome data, 9.2 kb versus 41 bp on whole-exome data) than those obtained using read information alone, with only a small increase in assembly errors.

2 Materials and methods

WGA methods used in single-cell sequencing result in datasets with high rates of allele dropout, where alleles that are present in the genome are not observed in a sequenced cell. The primary cause of allele dropout is the failure of amplification of a genomic region from one of the two homologous chromosomes during WGA. Thus, one expects to observe correlations in rates of allele dropout between alleles of SNPs whose distances are within the length of an amplicon (up to 10–100 kb, depending on the WGA method). More specifically, since an amplicon contains DNA sequence from one homologous chromosome, one expects to observe a higher rate of concurrent dropout (dropout of both alleles in one cell) for a pair of alleles on the same haplotype and within the length of an amplicon, than for two alleles on different haplotypes. We describe a statistical test to evaluate such concurrent dropout between alleles for a pair of SNPs. We then use pairs of alleles which show strong evidence of significant concurrent dropout as input to a haplotype assembly algorithm.
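The expectation stated above can be illustrated with a toy simulation. This is only a sketch under strong simplifying assumptions that are not part of the original method: each homolog is covered by a single amplicon per cell, that amplicon fails independently with a fixed probability, and both SNPs always fall within the same amplicon. The function name `simulate_concurrent_dropout` and its parameters are hypothetical.

```python
import random

def simulate_concurrent_dropout(n_cells, dropout_rate, seed=0):
    """Toy model: count concurrent dropouts for a same-haplotype allele pair
    (A, B) and a different-haplotype allele pair (A, b) across n_cells."""
    rng = random.Random(seed)
    same_hap = 0
    diff_hap = 0
    for _ in range(n_cells):
        # Assumption: the amplicon of each homolog fails independently.
        hap1_lost = rng.random() < dropout_rate
        hap2_lost = rng.random() < dropout_rate
        # A and B both sit on haplotype 1, so they are lost together.
        if hap1_lost:
            same_hap += 1
        # A (haplotype 1) and b (haplotype 2) are both lost only if
        # both amplicons fail in the same cell.
        if hap1_lost and hap2_lost:
            diff_hap += 1
    return same_hap, diff_hap
```

In this toy model the same-haplotype count can never be smaller than the different-haplotype count, and with a dropout rate of, say, 0.3 it is expected to be roughly three times larger. Real data are noisier, which is why a statistical test is needed.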

2.1 Quantifying concurrent dropout

We obtain DNA-sequencing data from n single cells from the same individual, and assume that these cells share m heterozygous SNPs. Consider a pair of heterozygous SNPs with alleles A, a for one SNP and alleles B, b for the other SNP. For an allele A, we define the dropout vector $d_A$ to be a binary vector of length n, where $d_A(s) = 1$ if we do not observe any reads containing allele A in cell s, and $d_A(s) = 0$ otherwise. Thus, $d_A$ indicates the dropout for allele A across cells. For an allele A, let $n_A = \sum_{s=1}^{n} d_A(s)$ be the number of dropouts, or cells where A is not measured. Similarly, for a pair of alleles A and B, let $n_{AB} = \sum_{s=1}^{n} d_A(s)\, d_B(s)$ be the number of concurrent dropouts of alleles, i.e. the number of cells where both A and B are not observed.
The key idea of our model is that if the distance between SNPs is less than the length of an amplicon, and if alleles A and B are on the same haplotype, then concurrent dropout of A and B is more frequent than expected by chance. Conversely, if SNPs are far apart or A and B are present on different haplotypes, then we expect that amplification of these alleles is independent, and concurrent dropouts are random events. Let $N_{AB}$ be a random variable indicating the number of concurrent dropouts between alleles A and B across n cells. Under the null model, dropouts of allele A and allele B are independent. To compute the distribution of $N_{AB}$ under the null, we need to compute the probability $p_{X,s}$ that allele X drops out in cell s for each allele and cell. However, $p_{X,s}$ varies by locus, allele X, and cell s. Locus-specific and allele-specific variability in dropout rates results from context-specific amplification, sequencing or alignment biases. Cell-specific variability in dropout rates results from differences in sequencing depth and uniformity of coverage across cells. Since it is difficult to model each of these effects directly, we instead compute a weighted exact distribution $\Pr(N_{AB} \mid w)$, where w are cell-specific weights obtained from the observed number of dropouts across all loci in each cell. See Leiserson ) for details of similar weighted tests used in other biological applications.
Specifically, let D be the matrix whose rows correspond to the 2m dropout vectors for the set of alleles of m heterozygous SNPs. We compute the P-value of observing $n_{AB}$ or more concurrent dropouts, conditioned on the observed row sums and column sums of the matrix D. Computing this P-value is non-trivial. Leiserson ) introduced the WExT algorithm to compute a saddlepoint approximation of a P-value for the related problem of mutually exclusive events. We use a recent release of the WExT software that computes a saddlepoint approximation for the co-occurrence test statistic $N_{AB}$.
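The quantities in this section can be sketched in a few lines of code. The sketch below is not the WExT test: as a simplified stand-in, it assumes that per-cell dropout probabilities for each allele are already given, treats cells as independent, and computes the exact upper tail of the resulting Poisson-binomial distribution by dynamic programming. It does not condition on the row and column sums of the dropout matrix and uses no saddlepoint approximation. The function names and the inputs `p_a` and `p_b` are hypothetical.

```python
def dropout_counts(d_a, d_b):
    """d_a, d_b: 0/1 lists over cells (1 = allele not observed in that cell).
    Returns the dropout count of each allele and the concurrent dropout count."""
    n_a = sum(d_a)
    n_b = sum(d_b)
    n_ab = sum(x * y for x, y in zip(d_a, d_b))
    return n_a, n_b, n_ab

def concurrent_dropout_pvalue(p_a, p_b, n_ab):
    """Upper-tail probability of n_ab or more concurrent dropouts, assuming
    (illustrative null) that in cell s the two alleles drop out independently
    with probabilities p_a[s] and p_b[s]."""
    # dist[k] = probability of exactly k concurrent dropouts so far
    dist = [1.0]
    for pa, pb in zip(p_a, p_b):
        q = pa * pb  # probability that both alleles drop out in this cell
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - q)
            new[k + 1] += mass * q
        dist = new
    return sum(dist[n_ab:])
```

For example, with dropout vectors `[1, 1, 0, 0]` and `[1, 0, 0, 1]`, each allele drops out in two cells and there is one concurrent dropout. How the per-cell probabilities would be derived from cell-specific weights is left open here, since the source defers those details to the WExT software.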

2.2 Augmenting haplotype assembly with amplification fragments

Haplotype assembly is the reconstruction of haplotypes from local information about groups of alleles that are present on the same chromosome. The input for haplotype assembly is a set of fragments $\mathcal{F}$, where a fragment defines a phasing over a set of SNPs, e.g. $ABc|abC$, where $|$ delineates the two chromosomes, indicates alleles A, B, c are on one chromosome and alleles a, b, C are on the other. Haplotype assembly algorithms aim to find the most likely haplotypes from the set $\mathcal{F}$. Typically, the set of fragments $\mathcal{F}$ is equal to $\mathcal{R}$, the set of sequenced reads (or paired-end reads in the case of mate pair libraries). This is because alleles of two SNPs measured on the same read (or paired-read) are highly likely to reside on the same haplotype. We extend the fragment set using a set $\mathcal{A}$ of amplification fragments defined from pairs of alleles from neighboring SNPs that demonstrate concurrent dropout, using the statistical test defined in the previous section.
A pair of SNPs, having alleles A, a for one SNP and alleles B, b for the second SNP, can be phased in two ways: $AB|ab$ or $Ab|aB$. As noted by Zhang ), during WGA, homologous chromosomes are amplified independently. Thus, we assume the random variables $N_{AB}$ and $N_{ab}$ are independent under the null hypothesis, and combine the P-values $P_{AB}$ and $P_{ab}$ using Fisher's method to obtain a single P-value $P_{AB|ab} = \Pr(\chi^2_4 \ge T)$, where $T = -2(\ln P_{AB} + \ln P_{ab})$ and T follows the $\chi^2$ distribution with 4 degrees of freedom, since two P-values are combined. For each pair of SNPs, this procedure yields two P-values, $P_{AB|ab}$ and $P_{Ab|aB}$, corresponding to the strength of evidence against the null model of independence for each phasing. Under phasing $AB|ab$, we expect high dropout concurrence for allele pair {A, B} and allele pair {a, b} and independent dropout for allele pairs {A, b} and {a, B}. Thus, we expect to see a low P-value $P_{AB|ab}$ and a high P-value $P_{Ab|aB}$. We summarize the evidence in support of each phasing using a phasing score defined as $h = \log P_{Ab|aB} - \log P_{AB|ab}$. Here, h > 0 indicates stronger evidence for phasing $AB|ab$, while h < 0 indicates stronger evidence for phasing $Ab|aB$.
Large values of $|h|$ suggest that the allele pairs for one phasing have more concurrent dropout than expected by chance compared with the allele pairs for the other phasing, and thus are more likely to have come from the same haplotypes. We define a set of amplification fragments $\mathcal{A}_c$ containing allele pairs whose phasing score satisfies $|h| \ge c$, for a non-negative threshold c. We then use a haplotype assembly algorithm, HapCut2 (Edge ), to assemble haplotypes using the combined set of sequence and amplification fragments as input.
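The scoring steps in this section can be sketched as follows. The four inputs are the concurrent-dropout P-values for allele pairs {A, B}, {a, b}, {A, b} and {a, B}. Fisher's method for two P-values uses a chi-square distribution with 4 degrees of freedom, whose upper tail has the closed form exp(-T/2)(1 + T/2), so no statistics library is needed. Several choices here are assumptions made for illustration and are not confirmed by the text: the phasing score is taken as a base-10 log ratio of the two combined P-values, the threshold comparison is inclusive, and P-values are floored at 1e-300 to avoid taking the logarithm of zero. All function names are hypothetical.

```python
import math

def fisher_combine(p1, p2):
    """Combine two independent P-values (each in (0, 1]) with Fisher's method."""
    t = -2.0 * (math.log(p1) + math.log(p2))
    # Upper tail of a chi-square distribution with 4 degrees of freedom.
    return math.exp(-t / 2.0) * (1.0 + t / 2.0)

def phasing_score(p_AB, p_ab, p_Ab, p_aB):
    """Positive score: evidence for A with B on one haplotype (AB|ab).
    Negative score: evidence for A with b on one haplotype (Ab|aB)."""
    p_cis = max(fisher_combine(p_AB, p_ab), 1e-300)    # floor is an assumption
    p_trans = max(fisher_combine(p_Ab, p_aB), 1e-300)
    return math.log10(p_trans) - math.log10(p_cis)     # log base is an assumption

def amplification_fragments(scores, c):
    """scores: dict mapping a SNP pair to its phasing score h.
    Keeps pairs with |h| >= c and labels each with the favored phasing."""
    fragments = []
    for pair, h in scores.items():
        if abs(h) >= c:
            # A score of exactly 0 carries no evidence; it is labeled Ab|aB arbitrarily.
            fragments.append((pair, "AB|ab" if h > 0 else "Ab|aB"))
    return fragments
```

Converting the selected pairs into the fragment input expected by a haplotype assembler is deliberately left out, since that file format is not described in the text.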

3 Results

We tested our model on two single-cell DNA-sequencing datasets: whole-genome sequencing of neurons (Lodato ) and whole-exome sequencing of breast cancer cells (Wang ).

3.1 Whole-genome single cell sequencing of neurons

We first evaluated the performance of our model on synthetic diploid cells constructed from sequenced X chromosomes in single-cell DNA sequencing of neurons from a male individual, UMB1465 (Lodato ). This dataset includes whole-genome sequencing from n = 16 single cells and two bulk samples. The single cells were amplified using MDA, and a recent analysis (Sherman ) estimates that these samples had MDA amplicon lengths up to 200 kb, with a median of 19 kb and 95th percentile of 103 kb. To create synthetic diploid cells, we extracted the haploid X chromosomes from each sequenced cell (Fig. 2a), excluding the pseudo-autosomal regions PAR1 and PAR2 on chromosome X as defined in human reference genome assembly GRCh37.p13. To account for variability in dropout rates between cells, we downsampled all cells to a fixed number of reads, and formed pairs of cells based on the total number of covered positions in the samples. We generated simulated haplotypes using population-level allele frequencies acquired from dbSNP, and spiked these alleles into the sequencing data. We introduced sequencing error into the spiked-in alleles based on the Phred quality scores of individual reads. Further details on the simulation can be found in Supplementary Material S1.
Fig. 2.

Haplotype assembly on whole-genome DNA-sequencing data. (a) We form a validation dataset of seven synthetic diploid cells with known haplotypes from X chromosomes in whole-genome DNA-sequencing data of single neuron cells from a male (Lodato ). (b) (Left) The accuracy of the predicted phase for the set of amplification fragments whose absolute phasing score is at or above a given value. We observe highly accurate prediction of phase for pairs of SNPs whose distance is less than the length of amplicons (here the 95th percentile of amplicon length is 103 kb). (Right) The proportion of SNP pairs included in the set of amplification fragments. (c) The N50 and switch error for haplotype assembly as we vary the phasing score threshold c. The N50 and switch error for the haplotype assembly with no amplification fragments is marked with an ‘×’.

To infer haplotypes, we applied the statistical test described in Section 2 to all pairs of SNPs within 50 kb of each other. Next, we constructed a set of amplification fragments for a given phasing score threshold c and input the amplification fragments and read fragments into HapCut2 to assemble haplotypes. We evaluated the ability of the phasing score to accurately phase pairs of SNPs at varying distances and over a range of phasing score values (Fig. 2b). We found that the proportion of fragments whose phase was predicted correctly increases with the absolute value of the phasing score, indicating that the absolute phasing score is correlated with the accuracy of phasing. At larger distances between SNPs (50 kb–1 Mb and 1–5 Mb), the fragment accuracy decreases rapidly as the phasing score decreases. In addition, relatively few fragments have high phasing scores, demonstrating that few SNPs are accurately phased when the distance between SNPs exceeds the length of WGA amplicons. For haplotype assembly, we use pairs of SNPs whose distance is 50 kb or less. Over this set of SNP pairs, we are able to correctly phase fragments with 77% accuracy overall.
However, if we restrict to the fragments with the highest absolute phasing scores, the top 22% of fragments, we obtain an accuracy of 91%. We assemble haplotypes using amplification fragments and read fragments as input to HapCut2 (Edge ) as described in Section 2. We ran HapCut2 with a range of thresholds c for the phasing scores, where for each c, we supply HapCut2 with the union of the set of sequence fragments and the set of amplification fragments selected at that threshold. We also ran HapCut2 using only sequence reads to get a baseline measure of results using only short reads. In each case, we ran HapCut2 with default parameters.
We evaluate the resulting haplotype assemblies using the following metrics (Fig. 2c). The N50 is the length of a haplotype block such that half of all phased variants are in a block at least as long. Switch error is the proportion of phase connections between adjacent SNPs that are incorrect.
Not surprisingly, we observe a trade-off between switch error rate and length of resulting haplotype blocks. Without any amplification fragments, HapCut2 obtains haplotype blocks with a median (N50) length of 312 bp. As the phasing score threshold c decreases, the block lengths increase by several orders of magnitude with only relatively small corresponding increases in switch error rate. With phasing score threshold c = 2.25, we obtain a block length of N50 = 10.2 kb, with a switch error rate of 0.02. At lower values of the phasing score threshold, we see larger increases in error, which corresponds to the observed decrease in fragment accuracy at the same values (Fig. 2b). Depending on the downstream analysis that utilizes the resulting haplotypes, these low thresholds may prove useful. For example, at a threshold of c = 0, we acquire blocks with an N50 = 9.96 Mb, with 92.6% of blocks containing no switch errors.
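The two evaluation metrics defined in this section can be sketched as below. The input representations are assumptions chosen for illustration: a block is a `(length_bp, n_phased_variants)` tuple, and a haplotype within one block is a 0/1 sequence giving the allele assigned to one homolog at each SNP. The function names are hypothetical, and this is not the evaluation code used in the study.

```python
def n50(blocks):
    """blocks: list of (length_bp, n_phased_variants) tuples.
    Returns the block length L such that at least half of all phased
    variants lie in blocks of length L or longer."""
    total = sum(variants for _, variants in blocks)
    running = 0
    # Walk blocks from longest to shortest, accumulating phased variants.
    for length, variants in sorted(blocks, reverse=True):
        running += variants
        if 2 * running >= total:
            return length
    return 0

def switch_error_rate(predicted, truth):
    """predicted, truth: equal-length 0/1 sequences for one haplotype block.
    A connection between adjacent SNPs is wrong when the predicted relation
    (same or different label) disagrees with the true relation."""
    links = len(predicted) - 1
    if links <= 0:
        return 0.0
    switches = sum(
        (predicted[i] != predicted[i + 1]) != (truth[i] != truth[i + 1])
        for i in range(links)
    )
    return switches / links
```

Because only the relation between adjacent SNPs is compared, a prediction that is the exact complement of the truth (the two homologs swapped) has a switch error rate of zero, which is the desired behavior for unlabeled haplotypes.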

3.2 Whole-exome single-cell sequencing of breast cancer

Whole-exome sequencing comprises a significant proportion of available single-cell sequencing datasets (Navin, 2015), due to both the lower sequencing costs compared with whole-genome sequencing and the interest in measuring variation in coding regions. Assembling haplotypes from short-read whole-exome sequencing data is challenging, as reads (or paired reads) are shorter than the lengths of most introns. Introns are estimated to have a median length of ∼1334 bp in the human genome (Hong ), while the median exon size is ∼122 bp (International Human Genome Sequencing Consortium, 2001). Thus, haplotype phase is difficult to determine across exons from short-read data. Since WGA amplicons are typically longer than an intron, we hypothesized that we could use our model to obtain haplotype blocks from single-cell whole-exome data that were substantially longer than blocks obtained with short-read sequencing data. To test this hypothesis, we evaluated our model on single-cell whole-exome data from a triple-negative breast cancer patient (Wang ). We observed that one copy of Chromosome 17 was lost in eight cancer cells in this dataset that were indicated to be hypodiploid. Using these eight cells, we obtained the true haplotypes for Chromosome 17 (Fig. 3a). We then applied our model to 14 normal (non-cancerous) diploid cells from the same individual. We ran the model and constructed amplification fragments as in previous sections. Figure 3b shows the features of the resulting haplotype assemblies.
Fig. 3.

Assembling haplotypes on whole-exome DNA-sequencing data. (a) We validate haplotype assemblies on whole-exome DNA-sequencing data from an individual breast cancer patient (Wang ), by comparing to the haplotype of Chromosome 17, whose haplotype we can determine from the eight cancer cells that have lost one homolog of this chromosome. (b) Haplotype block length (N50) as a function of haplotype switch error for varying threshold of phasing score. The N50 and switch error for the assembly with no amplification fragments is marked with an ‘×’. Amplification fragments increase the length of haplotype assemblies by orders of magnitude with small increase in switch error. (c) The accuracy of the phasing for the highest scoring 20% of amplification fragments for varying numbers of cells

Using only read fragments, the haplotype assemblies have a median block length of N50 = 41 bp, which, as expected, is shorter than the length of a single exon. Using amplification fragments, we are able to increase the block length by several orders of magnitude, with only small increases in switch error (Fig. 3b). For example, when the phasing score threshold c = 2.75, we obtain a median block length N50 = 9.3 kb, with a corresponding switch error rate of 0.04. This indicates that we are able to phase across multiple exons. Indeed, this haplotype block length is of the same order of magnitude as the typical gene length. Although read-based phasing is limited by the length of fragments, reference-based phasing algorithms are generally able to phase over longer genomic distances. We compared the haplotype assemblies we obtain to a reference-based phasing algorithm, EAGLE2 (Loh ) (Supplementary Fig. S1). We show that while EAGLE2 was able to provide a phasing over the whole chromosome, our amplification-fragment based phasing provides lower error rates within the blocks that we obtain.
To investigate the number of single cells required to obtain accurate amplification fragments, we ran the model on subsets of cells of size 2–15 (Fig. 3c). We find that on this dataset, the fragment accuracy levels off after 8–10 cells. Thus, we can obtain accurate amplification fragments even with relatively few cells.

4 Discussion

Single-cell DNA sequencing is increasingly being used to explore the genomic content of individual cells, but requires analysis algorithms that are robust to the errors and biases in these data. In this article, we exploit one bias in single-cell sequencing data, amplification bias, and show how we can leverage this local information to assemble haplotypes. Our results demonstrate that concurrent dropout between nearby alleles can provide amplicon-scale correlations that lead to better haplotype assemblies than using only correlations between alleles on the same sequence read. However, there are several limitations and avenues for further improvement of our approach. First, many recent haplotype assembly algorithms, including HapCut2 (Edge ), which is used here, employ more sophisticated probabilistic models for error in fragments. Extending our model to estimate error rates for amplification fragments could provide better integration with the error models in haplotype assemblers, and yield more accurate haplotype predictions. Although our current model does not calculate a likelihood for each phasing, we may be able to estimate the error rates based on the empirical distribution of phasing scores. Alternatively, an improved probability model that accounts for several of the features of sequencing data, including sequencing error and observed read depth, could be developed and would likely outperform the straightforward model of concurrent dropout introduced here. Additionally, extending the model to consider groups of SNPs instead of just pairs of SNPs may be useful for identifying phase when pairwise relationships are weak. This can be particularly useful with the lower sequence coverages that are common in single-cell sequencing datasets. If one’s goal is to obtain a high-quality phased diploid genome, there are a number of good approaches.
These include: the application of high-quality reference-based phasing algorithms (Browning and Yu, 2009; Delaneau ; Loh ; Stephens ) that exploit large populations of genotyped individuals; long-read (Glusman ) or linked-read (Zheng ) sequencing; or specialized techniques such as Strand-Seq (Porubský ). Phasing algorithms also exist for both bulk (Castel ) and single-cell RNA-seq data (Castel ). We do not expect that researchers will perform single-cell sequencing if their only goal is to obtain a phased, diploid genome. Rather, we anticipate that the approach described here will be a useful complement for specific analyses of single-cell sequencing data. For example, we expect that this model can be broadly useful in variant calling in single cells, which is typically significantly confounded by amplification bias. Although we validated the model on diploid genomes, the model is readily adaptable to copy-number aberrations, and thus can be applied to cancer genomes, which often demonstrate high levels of aneuploidy. The information derived from our model may be useful for allele-specific copy number calling in single cells, which to our knowledge is not currently done in any existing single-cell copy-number caller. Information on haplotype phase will also be useful for calling retrotransposon insertions (Evrony ) or single-nucleotide variants (SNVs). Recently, (Bohrson ) showed that false positive rates in SNV calling in single-cell DNA sequencing can be reduced by phasing SNVs to nearby SNPs. However, many SNVs cannot be phased to SNPs with short reads, especially in whole-exome data. Our method is able to phase across much larger distances and across exons. An additional application is phasing of structural variants in single cells. SNPs on either side of the breakpoints of a structural variant will also show more dropout concurrence than expected by chance.
An extension of our model might allow for improved phasing of structural variants, which could be useful in reconstructing highly rearranged cancer genomes. Finally, additional extensions of dropout concurrence might be applied to single-cell RNA-seq data, e.g. by exploiting correlations in allele-specific expression or allele-specific alternative splicing.

Funding

This work was supported by a US National Science Foundation (NSF) CAREER Award [CCF-1053753] and US National Institutes of Health (NIH) grants [R01HG007069 and R01CA180776 to B.J.R.].

Conflict of Interest: B.J.R. is a co-founder and consultant at Medley Genomics.

References

1. Jonathan Marchini, Bryan Howie, Simon Myers, Gil McVean, Peter Donnelly. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet, 2007-06-17.

2. Brian L Browning, Zhaoxia Yu. Simultaneous genotype calling and haplotype phasing improves genotype accuracy and reduces false-positive associations for genome-wide association studies. Am J Hum Genet, 2009-12.

3. Michael J McConnell, Michael R Lindberg, Kristen J Brennand, Julia C Piper, Thierry Voet, Chris Cowing-Zitron, Svetlana Shumilina, Roger S Lasken, Joris R Vermeesch, Ira M Hall, Fred H Gage. Mosaic copy number variation in human neurons. Science, 2013-11-01.

4. H Christina Fan, Jianbin Wang, Anastasia Potanina, Stephen R Quake. Whole-genome molecular haplotyping of single cells. Nat Biotechnol, 2010-12-19.

5. Ryan Tewhey, Vikas Bansal, Ali Torkamani, Eric J Topol, Nicholas J Schork. The importance of phase information for human genomics. Nat Rev Genet, 2011-02-08 (Review).

6. Stephane E Castel, Pejman Mohammadi, Wendy K Chung, Yufeng Shen, Tuuli Lappalainen. Rare variant phasing and haplotypic expression from RNA sequencing with phASER. Nat Commun, 2016-09-08.

7. Gustavo Glusman, Hannah C Cox, Jared C Roach. Whole-genome haplotyping approaches and genomic medicine. Genome Med, 2014-09-25.

8. Ángel J Picher, Bettina Budeus, Oliver Wafzig, Carola Krüger, Sara García-Gómez, María I Martínez-Jiménez, Alberto Díaz-Talavera, Daniela Weber, Luis Blanco, Armin Schneider. TruePrime is a novel method for whole-genome amplification from single cells based on TthPrimPol. Nat Commun, 2016-11-29.

9. Yong Wang, Jill Waters, Marco L Leung, Anna Unruh, Whijae Roh, Xiuqing Shi, Ken Chen, Paul Scheet, Selina Vattathil, Han Liang, Asha Multani, Hong Zhang, Rui Zhao, Franziska Michor, Funda Meric-Bernstam, Nicholas E Navin. Clonal evolution in breast cancer revealed by single nucleus genome sequencing. Nature, 2014-07-30.

10. Wai Keung Chu, Peter Edge, Ho Suk Lee, Vikas Bansal, Vineet Bafna, Xiaohua Huang, Kun Zhang. Ultraaccurate genome sequencing and haplotyping of single human cells. Proc Natl Acad Sci U S A, 2017-10-24.
