
Optical mapping as a routine tool for bacterial genome sequence finishing.

Phil Latreille, Stacie Norton, Barry S Goldman, John Henkhaus, Nancy Miller, Brad Barbazuk, Helge B Bode, Creg Darby, Zijin Du, Steve Forst, Sophie Gaudriault, Brad Goodner, Heidi Goodrich-Blair, Steven Slater.

Abstract

BACKGROUND: In sequencing the genomes of two Xenorhabdus species, we encountered a large number of sequence repeats and assembly anomalies that stalled finishing efforts. This included a stretch of about 12 Kb that is over 99.9% identical between the plasmid and chromosome of X. nematophila.
RESULTS: Whole genome restriction maps of the sequenced strains were produced through optical mapping technology. These maps allowed rapid resolution of sequence assembly problems, permitted closing of the genome, and allowed correction of a large inversion in a genome assembly that we had considered finished.
CONCLUSION: Our experience suggests that routine use of optical mapping in bacterial genome sequence finishing is warranted. When combined with data produced through 454 sequencing, an optical map can rapidly and inexpensively generate an ordered and oriented set of contigs to produce a nearly complete genome sequence assembly.

Year: 2007; PMID: 17868451; PMCID: PMC2045679; DOI: 10.1186/1471-2164-8-321

Source DB: PubMed; Journal: BMC Genomics; ISSN: 1471-2164; Impact factor: 3.969


Background

Xenorhabdus species are symbiotic bacteria associated with insectivorous nematodes of the genus Steinernema (for review see [1]). They reside in a specialized segment of the nematode gut [2,3], and provide insecticidal proteins [4,5] and small molecules [6-10] that help to kill the insect larvae that are the prey of the nematode. Both organisms reproduce in the dead larvae, the Xenorhabdus colonize the young nematodes, and the cycle repeats [11]. Xenorhabdus are closely related to the enteric gamma proteobacteria such as Escherichia coli [12], and are an emerging model for both mutualism and pathogenicity in invertebrate hosts. To better understand the genetic basis of these relationships, we are sequencing the genomes of two Xenorhabdus species: X. nematophila ATCC 19061 and an X. bovienii strain from Monsanto's collection. In the course of this work, we found that the X. nematophila genome contained large numbers of highly repetitive DNA regions, and efforts to finish the genome stalled. We sought a means to produce whole-genome maps for comparison with the genomic DNA sequence, and identified optical mapping as a useful means to align and orient the genome sections in silico. In addition, we produced an optical map of a second genome that we had considered finished, and identified a large sequence inversion that would otherwise have gone unnoticed.

Results

A whole-genome restriction map permits finishing of the X. nematophila genome sequence

Eight-fold genome sequence coverage of X. nematophila ATCC 19061 (Goodrich-Blair et al., in preparation) was generated, with 26,976 reads from a 2–4 kb insert library and 41,376 reads from a 4–8 kb insert library. This yielded an initial assembly consisting of 100 contiguous sequences (contigs) greater than 2 kb, 14 contigs greater than 100 kb, and 2 contigs greater than 200 kb in length. Our initial research had shown the presence of a 150 kb plasmid in addition to the circular chromosome (Goodrich-Blair and Goodner, unpublished). It rapidly became clear that multiple areas of repeated sequence were causing problems. In fact, the final X. nematophila sequence assembly shows a nearly identical 12 kb region found on both the plasmid and chromosome, many transposons (including over 30 copies of a single transposon) scattered throughout the genome, and seven rRNA regions. Using the paired clone-end sequences and syntenic comparison to the related species Photorhabdus luminescens [13], we attempted to resolve misassemblies and close gaps by walking across individual clones and amplifying potentially adjacent regions using the polymerase chain reaction (PCR). The resulting assembly contained over 50 contigs, but most lacked linkage information from gap-spanning paired ends. Multiplex PCR resolved some gaps, but gave no indication of whether an amplified product was actually the correct size, whether a particular gap was resistant to amplification, or whether a reaction failed because the primers were not properly paired to cross a gap. After four months of concerted effort, the assembly still contained 36 contigs, which collectively contained several hundred copies of transposons plus seven ribosomal RNA coding regions. Given this complexity, optical mapping was attempted to provide a structural scaffold for aligning and orienting the contigs.
Optical mapping permits assembly of whole-genome restriction endonuclease maps by digesting immobilized DNA molecules and determining the size and order of fragments [14-22]. In collaboration with OpGen Technologies (Madison, WI), optical maps of X. nematophila ATCC 19061 were produced using AflII and EagI restriction enzymes. Through repeated overlapping of restriction maps from individual molecules (over 50-fold coverage), OpGen's assembler program reconstructed the ordered restriction map of the genome [23]. Each restriction map produced by optical mapping was aligned with the restriction map predicted from the X. nematophila genome sequence. The map permitted alignment and orientation of all 36 contigs, and identification of misassemblies, allowing production of PCR products to cover all remaining gaps in the sequence (Figure 1 panel A). Once the optical map was available, PCR, sequencing, and validation of the final assembly were accomplished in approximately one month. The map also detected several regions of misassembled sequence, including a plasmid that had been integrated into the chromosomal sequence among the assembled contigs (Figure 1 panel B). The plasmid shares a highly conserved stretch of sequence with the chromosome (only 37 bp differences over approximately 12.5 kb), and this duplication led to the in silico misassembly. The final sequenced genome aligned directly to the restriction map generated by optical mapping (Figure 1 panel C).
Figure 1

Alignments between the whole-genome optical maps and the sequence assemblies. Green regions indicate perfect alignment, white regions indicate no alignment, red regions indicate sequence that is present on at least two contigs, and yellow regions indicate inversions. Lines between maps indicate the position of identical sequences on the two maps, and can be used to visually identify misassemblies and inversions. Panel A: An early comparison of an optical map derived from EagI digestion of the X. nematophila genome to the assembled contigs generated by traditional sequencing technologies. All contigs could be ordered for gap closure. In addition, the optical map indicated an overlooked misassembly. Panel B: The finishing strategy, including gap closure and misassembly resolution, was simplified using the optical map as an assembly model. The X. nematophila optical map derived from an AflIII digestion of the chromosome is presented as a single contig in the center. The sequenced genome contains nine contigs that have a corresponding match to the optical map. The X. nematophila plasmid is 158 kb and is too small to be identified using the current optical map technology. Nonetheless, small sections of the plasmid can be identified as regions that do not have corresponding optical map locations (white in figure). Panel C: Comparison of the final assembly of the X. nematophila genome (bottom) to the optical map (top) for the EagI digest. The non-aligned contig represents the plasmid, which was generated by traditional sequencing technologies. Panel D: Comparison of the finished sequence of Xenorhabdus bovienii to the EagI optical map revealed a large inverted region of the genome. The red regions indicate regions of repeats within the genome that cannot be resolved by optical mapping. These regions were resolved using traditional sequencing methods. The sequenced genome was easily re-oriented to correct the assembly.


Optical mapping identifies an assembly error in the X. bovienii sequence

In addition to X. nematophila, we had previously sequenced and assembled the genome of the related organism X. bovienii using traditional finishing technologies. Although the X. bovienii genome does not contain as many repeats as that of X. nematophila, the X. nematophila project had shown the value of non-sequence-based methodologies in validating sequence assemblies. After generating an optical map for X. bovienii (NCBI designation Xenorhabdus bovienii SS-2004) using AflIII, a large inversion was detected in the sequence assembly, permitting a simple re-orientation of the data and correction of the genome sequence (Figure 1 panel D). It is doubtful that this assembly inversion would have been detected without the optical map.
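The orientation check that underlies this kind of finding can be illustrated with a small sketch. It is not OpGen's method: it is a simplified, ungapped sliding-window comparison that assumes no missed or spurious cut sites, and the function name `best_placement`, the 10% size tolerance, and the fragment sizes are all hypothetical values chosen for illustration.

```python
def best_placement(contig_frags, map_frags, tolerance=0.10):
    """Find the best ungapped placement of a contig on an optical map.

    contig_frags: internal in silico fragment sizes of a contig (kb); the two
                  end fragments should be excluded because contig ends are
                  not real restriction sites.
    map_frags:    ordered fragment sizes of the optical map (kb).
    Returns (offset, orientation, mean_relative_error), or None if no window
    matches within the tolerance. Simplified: no missed/false cuts allowed.
    """
    best = None
    n = len(contig_frags)
    candidates = (("forward", contig_frags), ("reverse", contig_frags[::-1]))
    for orientation, frags in candidates:
        for offset in range(len(map_frags) - n + 1):
            window = map_frags[offset:offset + n]
            errors = [abs(f - m) / m for f, m in zip(frags, window)]
            if max(errors) <= tolerance:
                score = sum(errors) / n
                if best is None or score < best[2]:
                    best = (offset, orientation, score)
    return best


# Hypothetical optical map and a contig whose fragment order is reversed
# relative to the map, as would happen inside an inverted region.
map_frags = [12.0, 30.5, 8.2, 45.0, 19.7, 60.1, 25.3]
contig_frags = [60.0, 20.0, 44.5]

result = best_placement(contig_frags, map_frags)
print(result[0], result[1])  # 3 reverse
```

A segment of a supposedly finished assembly that only matches the map in the reverse orientation, while its neighbours match in the forward orientation, is the signature of an assembly inversion of the kind described above.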

Discussion

The Xenorhabdus genomes analyzed in this project contain many highly repetitive regions, and these became a major obstacle in our attempts to assemble the genome sequences. Genome finishing traditionally relies on cosmid libraries or overlapping restriction maps of BACs to build larger meta-contigs. With the X. nematophila genome the traditional approach failed, and we used a genome-scale restriction map generated by optical mapping. This permitted rapid and accurate closing of X. nematophila, and provided savings of labor, reagents and time. Finishing the X. nematophila genome sequence would otherwise have required production of a fine-scale genetic-physical map at much greater cost in time and materials. Optical mapping also identified an inversion in the X. bovienii genome sequence assembly that we had considered finished.
High-throughput processes like DNA sequencing normally require trade-offs among cost, speed, and data quality. Sequencing costs are being reduced, and speed increased, by novel methods such as the pyrosequencing technology of 454 Life Sciences [24,25]. However, 454 technology produces shorter sequences (100 to 250 bases per reaction) than traditional Sanger sequencing using ABI instrumentation (800–1000 bases per reaction). These shorter 454-derived sequences mean that sequence contigs are also, on average, shorter than those produced using ABI instruments. The lower quality of sequence assemblies from 454 data is offset by gains in speed and cost. Excluding labor and the purchase of instrumentation, a typical 5 Mb bacterial genome takes approximately 2 days and costs about $6,000 in consumables using 454. The same genome sequence produced by ABI instrumentation would cost approximately 10-fold more and take several weeks. In our experience, a typical 5 Mb assembly using 454 data would contain about 80–90 contigs, with an average length around 60–70 kb.
A similar genome assembled using data from ABI 3730 instruments would contain about 50 contigs with an average length >100 kb. Both strategies would typically add about 4,000 end-paired sequences from cosmids or fosmids to help scaffold the genome, at a cost of about another $4,000. The current cost for an optical map with a single enzyme is approximately $7,000, and adding a second enzyme costs around another $3,000 (in our experience, only one enzyme is typically required). The optical mapping system can accurately quantify fragments down to about 4 kb in size, and a contig of 40 kb has an approximately 80% probability of being placed within a whole genome optical map (OpGen, unpublished data). When all of these data are combined, a 454 shotgun sequence plus cosmid end sequences and an optical map can produce an assembled and oriented set of contigs containing about 95% of the genome for under $20,000 with very limited input by a human finisher. This is about one-fifth the cost of a project produced through traditional means, provides very high quality data, and puts production of finished bacterial genomes within the reach of even small labs. We are currently working on a genome produced in this manner that will be primarily closed using undergraduate researchers supported by some bioinformatics infrastructure.
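The budget quoted above can be checked with a short tally. The figures are the approximate consumable costs given in the text; the variable names and the grouping into a dictionary are only an illustrative choice, and labor and instrument purchase are excluded, as in the text.

```python
# Approximate consumable costs in US dollars, as quoted in the Discussion.
costs_454_route = {
    "454 shotgun sequencing of a 5 Mb genome": 6_000,
    "about 4,000 end-paired cosmid/fosmid sequences": 4_000,
    "optical map, first enzyme": 7_000,
}
second_enzyme = 3_000  # optional; the text notes one enzyme usually suffices

total_one_enzyme = sum(costs_454_route.values())
total_two_enzymes = total_one_enzyme + second_enzyme

print(f"One-enzyme map:  ${total_one_enzyme:,}")   # One-enzyme map:  $17,000
print(f"Two-enzyme map:  ${total_two_enzymes:,}")  # Two-enzyme map:  $20,000
```

With a single-enzyme map the itemized total comes to about $17,000, consistent with the "under $20,000" figure; adding the second enzyme brings the total to roughly $20,000.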

Conclusion

Even on these relatively small genomes, the whole-genome maps were very valuable. In the X. nematophila project, we had the advantages of long sequence reads and clone end-pairing data, yet still were unable to assemble contigs because of the presence of numerous highly repetitive sequences. The optical map allowed rapid closure of one genome and identified an assembly error in a fully-assembled genome sequence that gave no prior indication of having errors. As shotgun sequencing costs come down, the optical map becomes a significant portion of the budget for a new bacterial genome sequence. However, for genomes that contain particularly large numbers of repetitive sequences, require finishing, or simply require ordered and oriented contigs from shotgun sequence, an optical map can increase the speed and decrease the overall cost of the project. We also expect that mapping costs will come down as optical mapping becomes more routinely used by sequencing centers, and as resolution of fragment size moves toward the 1–2 kb range. We now routinely confirm the in silico assemblies of bacterial genomes using a whole-genome restriction map, and believe this is a relatively low cost method to speed finishing and ensure accuracy of finished bacterial genome sequences.

Methods

Genomic library construction, DNA sequencing, and finishing

The genomic DNA was sonicated at a scale of 8.5 for two seconds, repeated three times (Misonix Inc. Sonicator XL2020). The ends were repaired using T4 DNA polymerase and T4 kinase (NEB), and the DNA was fractionated on a 1% agarose gel. Fractions representing size ranges of 2–4 kb and 4–8 kb were excised from the gel and purified using a QIAquick Gel Extraction column (Qiagen, Cat. No. 28704). DNA samples from the isolated fractions were checked for size on an agarose gel and then ligated into pUC18. Clones were plated and colonies picked on a Q-Bot (Genetix), to achieve 80% of sequence from the 2–4 kb library and 20% of sequence from the 4–8 kb library. Each template was sequenced using the Big Dye terminator protocol (Applied Biosystems) and analyzed on ABI 3700 and ABI 3730 sequencers. Both the forward (M13 -40) and reverse (M13 -21) primers were used on each template, yielding two related sequences per subclone. Data were assembled using phred/phrap (ver. 0.990319; [26,27]), and finished in Consed and Autofinish (v.13.0; [28-30]) using a variety of directed primer walks on subclones, and using PCR/walking to close any gaps. The sequence assemblies were confirmed by OpGen using optical mapping, as described below and previously [14-22]. These alignments were viewed using OpGen's MapViewer software (Figure 1; see below).

Optical map construction

Optical maps were prepared at OpGen Technologies, Inc. (Madison, WI) according to methods described previously [22,23]. Briefly, high molecular weight DNA was prepared by first embedding bacterial cells harvested at stationary phase in low melting temperature agarose plugs, followed by treatment with bacterial lysing solutions. The genomic DNA was recovered after thoroughly rinsing the plugs in TE, followed by melting the plugs at 42 °C and subsequent treatment with β-agarase. The high molecular weight DNA was then immobilized as individual molecules onto Optical Chips, digested with EagI or AflII restriction enzymes (New England Biolabs), fluorescently stained with YOYO-1 (Invitrogen) and positioned onto an automated fluorescence microscope system for image capture and fragment size measurement, resulting in high resolution single-molecule restriction maps. Collections of single molecule maps were then assembled to produce whole genome, ordered restriction maps.

Sequence-to-map comparison

Comparisons between Optical maps and sequence contigs were performed as described previously [22]. Sequence FASTA files were converted to in silico restriction maps via the MapViewer software (OpGen Technologies, Inc.) for direct comparison to the Optical maps. Comparisons were accomplished by aligning the sequence with the Optical maps according to their restriction fragment pattern. Alignments were generated with a dynamic programming algorithm which finds the optimal location, or placement, of a sequence contig by first performing a global alignment of the sequence contig against the Optical map. Local alignment analyses were also performed, in which segments of the sequence contigs were compared to the Optical map.
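The conversion of a sequence into an in silico restriction map can be sketched as follows. This is not the MapViewer implementation, which is not described here; it is a minimal illustration that assumes a linear contig, uses the standard recognition sequences for EagI (C^GGCCG) and AflII (C^TTAAG), which are not stated in the text, and introduces the hypothetical names `in_silico_digest` and `ENZYMES`.

```python
def in_silico_digest(sequence, site, cut_offset):
    """Return ordered fragment lengths (bp) from cutting a linear sequence.

    site:       recognition sequence (both sites used here are palindromic,
                so searching one strand is sufficient).
    cut_offset: number of bases into the site at which the enzyme cuts.
    """
    sequence = sequence.upper()
    cut_positions = []
    hit = sequence.find(site)
    while hit != -1:
        cut_positions.append(hit + cut_offset)
        hit = sequence.find(site, hit + 1)  # step by 1 to catch overlapping sites
    boundaries = [0] + cut_positions + [len(sequence)]
    return [end - start for start, end in zip(boundaries, boundaries[1:])]


# Recognition sequence and cut offset per enzyme (assumed standard values).
ENZYMES = {"EagI": ("CGGCCG", 1), "AflII": ("CTTAAG", 1)}

# Toy contig with two EagI sites; a real input would be read from a FASTA file.
toy_contig = "AAAACGGCCGTTTTTTCGGCCGAA"
site, offset = ENZYMES["EagI"]
fragments = in_silico_digest(toy_contig, site, offset)
print(fragments)  # [5, 12, 7]
```

The ordered list of fragment lengths is what gets compared against the optical map. In a real comparison, the first and last fragments of a contig would be treated separately because contig ends are not restriction sites, and very small fragments would need special handling given the sizing limit of the optical system.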

Competing interests

JH is employed by OpGen Technologies, Inc., the commercial provider of optical mapping technology.

Authors' contributions

PL performed quality control analysis on the sequence and prepared an early draft of the manuscript. JH produced the optical maps. SN, ZD and NM performed the genome finishing work. SS assisted with data analysis and was the primary writer of the manuscript. BB ran automated annotation on the genomes and provided an HTML interface for analysis. HGB, SF and BG performed genetic and molecular analysis of the strains to confirm their identity prior to sequencing and optical mapping. HGB, SF, BG, HB, CD and SG assisted with data analysis. BSG conceived and coordinated the project and helped to write the manuscript.
References (24 in total; 10 listed)

1.  N-phenethyl-2-phenylacetamide isolated from Xenorhabdus nematophilus induces apoptosis through caspase activation and calpain-mediated Bax cleavage in U937 cells.

Authors:  Seok-Young Hwang; Seunguk Paik; Sun-Ho Park; Hyun-Su Kim; In-Seon Lee; Sang-Pyo Kim; Won-Ki Baek; Min-Ho Suh; Taeg Kyu Kwon; Jong-Wook Park; Jae-Bok Park; Jung-Jeung Lee; Seong-Il Suh
Journal:  Int J Oncol       Date:  2003-01       Impact factor: 5.650

2.  Genome sequencing in microfabricated high-density picolitre reactors.

Authors:  Marcel Margulies; Michael Egholm; William E Altman; Said Attiya; Joel S Bader; Lisa A Bemben; Jan Berka; Michael S Braverman; Yi-Ju Chen; Zhoutao Chen; Scott B Dewell; Lei Du; Joseph M Fierro; Xavier V Gomes; Brian C Godwin; Wen He; Scott Helgesen; Chun Heen Ho; Chun He Ho; Gerard P Irzyk; Szilveszter C Jando; Maria L I Alenquer; Thomas P Jarvie; Kshama B Jirage; Jong-Bum Kim; James R Knight; Janna R Lanza; John H Leamon; Steven M Lefkowitz; Ming Lei; Jing Li; Kenton L Lohman; Hong Lu; Vinod B Makhijani; Keith E McDade; Michael P McKenna; Eugene W Myers; Elizabeth Nickerson; John R Nobile; Ramona Plant; Bernard P Puc; Michael T Ronan; George T Roth; Gary J Sarkis; Jan Fredrik Simons; John W Simpson; Maithreyan Srinivasan; Karrie R Tartaro; Alexander Tomasz; Kari A Vogt; Greg A Volkmer; Shally H Wang; Yong Wang; Michael P Weiner; Pengguang Yu; Richard F Begley; Jonathan M Rothberg
Journal:  Nature       Date:  2005-07-31       Impact factor: 49.962

3.  A shotgun optical map of the entire Plasmodium falciparum genome.

Authors:  Z Lai; J Jing; C Aston; V Clarke; J Apodaca; E T Dimalanta; D J Carucci; M J Gardner; B Mishra; T S Anantharaman; S Paxia; S L Hoffman; J Craig Venter; E J Huff; D C Schwartz
Journal:  Nat Genet       Date:  1999-11       Impact factor: 38.330

4.  Sequence analysis of insecticidal genes from Xenorhabdus nematophilus PMFI296.

Authors:  J A Morgan; M Sergeant; D Ellis; M Ousley; P Jarrett
Journal:  Appl Environ Microbiol       Date:  2001-05       Impact factor: 4.792

5.  Shotgun optical maps of the whole Escherichia coli O157:H7 genome.

Authors:  A Lim; E T Dimalanta; K D Potamousis; G Yen; J Apodoca; C Tao; J Lin; R Qi; J Skiadas; A Ramanathan; N T Perna; G Plunkett; V Burland; B Mau; J Hackett; F R Blattner; T S Anantharaman; B Mishra; D C Schwartz
Journal:  Genome Res       Date:  2001-09       Impact factor: 9.043

6.  Automated finishing with autofinish.

Authors:  D Gordon; C Desmarais; P Green
Journal:  Genome Res       Date:  2001-04       Impact factor: 9.043

7.  Interactions of insecticidal toxin gene products from Xenorhabdus nematophilus PMFI296.

Authors:  Martin Sergeant; Paul Jarrett; Margaret Ousley; J Alun W Morgan
Journal:  Appl Environ Microbiol       Date:  2003-06       Impact factor: 4.792

8.  Xenorhabdus nematophilus inhibits p-bromophenacyl bromide (BPB)-sensitive PLA2 of Spodoptera exigua.

Authors:  Youngjin Park; Yonggyun Kim
Journal:  Arch Insect Biochem Physiol       Date:  2003-11       Impact factor: 1.698

9.  The bacterium Xenorhabdus nematophilus depresses nodulation reactions to infection by inhibiting eicosanoid biosynthesis in tobacco hornworms, Manduca sexta.

Authors:  Youngjin Park; Yonggyun Kim; Sean M Putnam; David W Stanley
Journal:  Arch Insect Biochem Physiol       Date:  2003-02       Impact factor: 1.698

10.  The genome sequence of the entomopathogenic bacterium Photorhabdus luminescens.

Authors:  Eric Duchaud; Christophe Rusniok; Lionel Frangeul; Carmen Buchrieser; Alain Givaudan; Séad Taourit; Stéphanie Bocs; Caroline Boursaux-Eude; Michael Chandler; Jean-François Charles; Elie Dassa; Richard Derose; Sylviane Derzelle; Georges Freyssinet; Sophie Gaudriault; Claudine Médigue; Anne Lanois; Kerrie Powell; Patricia Siguier; Rachel Vincent; Vincent Wingate; Mohamed Zouine; Philippe Glaser; Noël Boemare; Antoine Danchin; Frank Kunst
Journal:  Nat Biotechnol       Date:  2003-10-05       Impact factor: 54.908
