Louise J S Williams, Diana G Tabbaa, Na Li, Aaron M Berlin, Terrance P Shea, Iain Maccallum, Michael S Lawrence, Yotam Drier, Gad Getz, Sarah K Young, David B Jaffe, Chad Nusbaum, Andreas Gnirke.
Abstract
Eliminating the bacterial cloning step has been a major factor in the vastly improved efficiency of massively parallel sequencing approaches. However, this has also made it a technical challenge to produce the modern equivalent of the Fosmid- or BAC-end sequences that were crucial for assembling and analyzing complex genomes during the Sanger-based sequencing era. To close this technology gap, we developed Fosill, a method for converting Fosmids to Illumina-compatible jumping libraries. We constructed Fosmid libraries in vectors with Illumina primer sequences and specific nicking sites flanking the cloning site. Our family of pFosill vectors allows multiplex Fosmid cloning of end-tagged genomic fragments without physical size selection and is compatible with standard and multiplex paired-end Illumina sequencing. To excise the bulk of each cloned insert, we introduced two nicks in the vector, translated them into the inserts, and cleaved them. Recircularization of the vector via coligation of insert termini followed by inverse PCR generates a jumping library for paired-end sequencing with 101-base reads. The yield of unique Fosmid-sized jumps is sufficiently high, and the background of short, incorrectly spaced and chimeric artifacts sufficiently low, to enable applications such as mapping of structural variation and scaffolding of de novo assemblies. We demonstrate the power of Fosill to map genome rearrangements in a cancer cell line and identify three fusion genes that were corroborated by RNA-seq data. Our Fosill-powered assembly of the mouse genome has an N50 scaffold length of 17.0 Mb, rivaling the connectivity (16.9 Mb) of the Sanger-sequencing based draft assembly.
Year: 2012 PMID: 22800726 PMCID: PMC3483553 DOI: 10.1101/gr.138925.112
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Figure 1. pFosill cloning vectors. (A) General map of the pFosill family of modified pFOS1 Fosmid vectors. The cloning site for inserting the genomic DNA fragments is flanked by forward and reverse Illumina-primer sequences (ILMN-F and ILMN-R) and two Nb.BbvCI nicking endonuclease sites. Nicks (yellow triangles) are introduced on two different strands and are located 5′ of the cloning site. ILMN-F is the standard Illumina sequencing primer SBS-3. The reverse primer in pFosill-1 and pFosill-3 is the SBS-8 primer for standard paired-end sequencing. In pFosill-2 and pFosill-4, the reverse primer is SBS-12 for three-read multiplex paired-end sequencing. The pUC-derived portion between the two cos sites is not present in the final circularized Fosmids, which replicate under the control of oriS and the F-factor functions repE and sopA-C that ensure proper partition of the Fosmid between the two daughter cells. Vectors are cut at the unique AatII site as well as two restriction sites at the cloning site and dephosphorylated. (B) Cloning site of pFosill-1 (SBS-8 version) and pFosill-2 (SBS-12). Sheared, end-repaired, and size-selected genomic insert fragments are inserted by blunt-end ligation between two dephosphorylated Eco72I sites 4 bp downstream from the ILMN sequencing primers. The SapI sites shown are not useful for cloning as pFosill-1 and -2 harbor three additional SapI sites. (C) pFosill-3 (SBS-8 version) and pFosill-4 (SBS-12) are digested with SapI, which excises a single fragment that includes the 3′ ends of the sequencing primers. Sheared and end-repaired genomic insert fragments are ligated to an excess of adapters that provide an 8-bp barcode (orange), the 3′ end of the Illumina sequencing primers, and three non-self-complementary 5′ overhanging bases for sticky-end ligation to the SapI ends of the vector arms. Supplemental Table S1 summarizes the relevant features of all four pFosill vectors.
Figure 2. Conversion of a Fosmid library to an Illumina-compatible Fosill jumping library. (A,B) The two Nb.BbvCI sites in the vector are nicked. (C) The nicks are translated in opposite directions into the cloned insert. (D) The insert is cleaved at the two translated nicks as well as at nicks originating at any BbvCI sites within the genomic DNA sequence. (E) Fragments are circularized by intramolecular ligation. (F) Recircularized vector molecules serve as templates for inverse PCR with full-length Illumina enrichment primers that include the sequences required for bridge-amplification and paired-end sequencing of the coligated termini of the original Fosmid insert on the Illumina flow cell.
Summary statistics for four Fosill libraries
Figure 3. Length distribution of genomic distance spanned by paired-end Fosill sequences. Shown are smoothed histograms of the spacing between unique read pairs in Fosill libraries from S. pombe 972h (A), human K-562 library H1 (gray) and H2 (black) (B), and mouse C57BL/6J (C) in their respective reference genomes. (y-axis) Percentage of all unique read pairs that fall in the 1-kb bin indicated on the x-axis. The percentages of unique read pairs spanning <1 kb and 30–50 kb are indicated.
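The quantities plotted in Figure 3 can be illustrated with a minimal Python sketch. It assumes the genomic span of each unique read pair is already available as an integer number of base pairs. The function name `span_summary`, the toy values, and the inclusive treatment of the 30–50 kb boundaries are illustrative assumptions rather than the authors' pipeline, and the smoothing applied in the figure is omitted.

```python
from collections import Counter


def span_summary(spans_bp, bin_size=1000):
    """Summarize read-pair spans (hypothetical helper, not from the paper).

    Returns a tuple of:
      - dict mapping each bin start (bp) to the percentage of pairs in that bin
      - percentage of pairs spanning < 1 kb
      - percentage of pairs spanning 30-50 kb (boundaries inclusive, an assumption)
    """
    if not spans_bp:
        raise ValueError("no read-pair spans given")
    n = len(spans_bp)
    # Count pairs per bin; integer division assigns each span to its 1-kb bin.
    counts = Counter(span // bin_size for span in spans_bp)
    histogram = {b * bin_size: 100.0 * c / n for b, c in sorted(counts.items())}
    pct_short = 100.0 * sum(s < 1000 for s in spans_bp) / n
    pct_fosmid_sized = 100.0 * sum(30000 <= s <= 50000 for s in spans_bp) / n
    return histogram, pct_short, pct_fosmid_sized


# Toy spans in bp, invented purely for illustration.
toy_spans = [400, 35200, 35900, 38000, 41000, 52000, 36500, 900, 39999, 44000]
histogram, pct_short, pct_fosmid_sized = span_summary(toy_spans)
```

With real data, the spans would come from mapping both reads of each unique pair to the reference genome.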
Barcoded Fosill jumps from a multiplex Fosmid library
Examples of rearrangements in the K-562 genome identified by Fosill
Long-range connectivity of three de novo draft assemblies of the mouse genome
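The N50 scaffold length quoted in the abstract as a measure of connectivity can be computed as in the following minimal Python sketch. It uses the common definition of N50: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of all assembled bases. The function name `n50` and the toy lengths are illustrative assumptions; the source does not state how scaffold lengths were tallied (for example, whether gaps were counted).

```python
def n50(lengths):
    """Return the N50 of a list of scaffold lengths (hypothetical helper).

    Scaffolds are visited from longest to shortest; the N50 is the length of
    the scaffold at which the running total first reaches half of the total.
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy scaffold lengths (arbitrary units), invented purely for illustration.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # prints 70
```

In the toy example the total is 300, and the two longest scaffolds (80 + 70) already cover half of it, so the N50 is 70.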