Nicholas H Putnam, Brendan L O'Connell, Jonathan C Stites, Brandon J Rice, Marco Blanchette, Robert Calef, Christopher J Troll, Andrew Fields, Paul D Hartley, Charles W Sugnet, David Haussler, Daniel S Rokhsar, Richard E Green.
Abstract
Long-range and highly accurate de novo assembly from short-read data is one of the most pressing challenges in genomics. Recently, it has been shown that read pairs generated by proximity ligation of DNA in chromatin of living tissue can address this problem, dramatically increasing the scaffold contiguity of assemblies. Here, we describe a simpler approach ("Chicago") based on in vitro reconstituted chromatin. We generated two Chicago data sets with human DNA and developed a statistical model and a new software pipeline ("HiRise") that can identify poor quality joins and produce accurate, long-range sequence scaffolds. We used these to construct a highly accurate de novo assembly and scaffolding of a human genome with scaffold N50 of 20 Mbp. We also demonstrated the utility of Chicago for improving existing assemblies by reassembling and scaffolding the genome of the American alligator. With a single library and one lane of Illumina HiSeq sequencing, we increased the scaffold N50 of the American alligator from 508 kbp to 10 Mbp.
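The scaffold N50 values quoted in the abstract follow the standard definition: the largest length L such that scaffolds of length L or longer together contain at least half of the assembled bases. The sketch below illustrates that definition only; the function name `scaffold_n50` and the toy lengths are assumptions for illustration and are not taken from HiRise or from the paper's data.

```python
def scaffold_n50(lengths):
    """Return the N50 of a list of scaffold lengths (in bp).

    N50 is the length of the scaffold at which the running total,
    taken from longest to shortest, first reaches half of all bases.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy scaffold lengths (hypothetical, not data from the paper).
toy_lengths = [40, 30, 20, 10]
print(scaffold_n50(toy_lengths))
```

Because N50 depends only on the sorted length distribution, joining scaffolds raises it even when the total number of assembled bases stays the same.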
Year: 2016 PMID: 26848124 PMCID: PMC4772016 DOI: 10.1101/gr.193474.115
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Figure 1. A diagram of a Chicago library generation protocol. (A) Chromatin (nucleosomes in blue) is reconstituted in vitro upon naked DNA (black strand). (B) Chromatin is fixed with formaldehyde (thin red lines are crosslinks). (C) Fixed chromatin is cut with a restriction enzyme, generating free sticky ends (performed on streptavidin-coated beads; data not shown). (D) Sticky ends are filled in with biotinylated (blue circles) and thiolated (green squares) nucleotides. (E) Free blunt ends are ligated (ligations indicated by red asterisks). (F) Crosslinks are reversed and proteins removed to yield library fragments, which are then digested with an exonuclease to remove the terminal biotinylated nucleotides. The thiolated nucleotides protect the interior of the library fragments from digestion.
Figure 2. Histogram of read pair separations for several sequencing libraries mapped to hg19. (Black) Chicago library L1, prepared with MboI and 150-kbp input DNA; (red) Chicago library L2, prepared with MluCI and 150-kbp input DNA; and (violet) Chicago library L3, prepared with 500-kbp input DNA. A human Hi-C library (Kalhor et al. 2012) is shown in dark blue for comparison.
Figure 3. Genome coverage (sum of read pair separations divided by estimated genome size) in various read pair separation bins.
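The coverage measure in the Figure 3 caption can be sketched as follows: within each separation bin, add up the read pair separations and divide the sum by the estimated genome size. The function name `coverage_by_bin`, the half-open bin convention, and the toy numbers are assumptions made for illustration; the bin boundaries used in the figure are not given here.

```python
from bisect import bisect_right


def coverage_by_bin(separations, bin_edges, genome_size):
    """Coverage per separation bin.

    separations: read pair separations in bp.
    bin_edges:   ascending edges; bin i covers [bin_edges[i], bin_edges[i + 1]).
    genome_size: estimated genome size in bp.
    Returns one value per bin: (sum of separations in the bin) / genome_size.
    """
    sums = [0] * (len(bin_edges) - 1)
    for sep in separations:
        i = bisect_right(bin_edges, sep) - 1
        # Separations outside the outermost edges are ignored.
        if 0 <= i < len(sums):
            sums[i] += sep
    return [s / genome_size for s in sums]


# Toy values (hypothetical, not data from the paper).
toy_separations = [500, 1500, 2500, 12000]
toy_edges = [0, 1000, 10000, 100000]
print(coverage_by_bin(toy_separations, toy_edges, genome_size=1000))
```

Summing separations rather than counting pairs weights each pair by the span it bridges, so a few long-range pairs can contribute as much coverage as many short ones.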
GM12878 Scaffolding results
Figure 4. The mapped locations on the GRCh38 reference sequence of Chicago read pairs are plotted in the vicinity of structural differences between GM12878 and the reference (A, deletion; B, inversion). Each Chicago pair is represented both above and below the diagonal. Above the diagonal, color indicates map quality score on the scale shown; below the diagonal, colors indicate the inferred haplotype phase of Chicago pairs based on overlap with phased SNPs, with read pairs of unknown haplotype origin shown in gray.