
ABACAS: algorithm-based automatic contiguation of assembled sequences.

Samuel Assefa, Thomas M Keane, Thomas D Otto, Chris Newbold, Matthew Berriman.

Abstract

SUMMARY: Due to the availability of new sequencing technologies, we are now increasingly interested in sequencing closely related strains of existing finished genomes. Recently a number of de novo and mapping-based assemblers have been developed to produce high quality draft genomes from new sequencing technology reads. New tools are necessary to take contigs from a draft assembly through to a fully contiguated genome sequence. ABACAS is intended as a tool to rapidly contiguate (align, order, orientate), visualize and design primers to close gaps on shotgun assembled contigs based on a reference sequence. The input to ABACAS is a set of contigs, which are aligned to the reference genome, ordered and orientated, and visualized in the ACT comparative browser; optimal primer sequences are then generated automatically.
AVAILABILITY AND IMPLEMENTATION: ABACAS is implemented in Perl and is freely available for download from http://abacas.sourceforge.net.

Year:  2009        PMID: 19497936      PMCID: PMC2712343          DOI: 10.1093/bioinformatics/btp347

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

The recent development of ultra high-throughput sequencing technologies has led to a huge increase in the number of genome sequencing projects being carried out (Mardis, 2008). For small genomes, it is now possible to obtain high sequencing coverage with a single run of a new sequencing machine. Therefore there is widespread interest in sequencing large numbers of closely related species or strains where a high quality reference sequence already exists, for instance, to explore population structure and genetic variation. A number of new assemblers have been developed for carrying out both de novo and mapped assemblies from new technology reads (Chaisson et al. 2004; Miller et al. 2008). However, a significant amount of manual intervention is still required to go from a set of contigs to a fully contiguated sequence.
The problem of rapidly contiguating draft assemblies has existed since the inception of genome sequencing. A number of tools have previously been developed for this purpose, such as Bambus (Pop et al. 2004), BACCardI (Bartels et al. 2005), Projector2 (van Hijum et al. 2005) and OSLay (Richter et al. 2007). Ideally, automatic contiguation programs consist of two major parts: ordering and orientating contigs based on a reference, and closing gaps between the ordered contigs. Bambus requires users to provide linking information between contigs generated from various methods, including mapping of contigs to a related genome. Post-processing of the resulting set of scaffolds is required to generate a pseudomolecule and close gaps. BACCardI provides support in genome finishing by scaffolding contigs based on virtual clone maps, alongside other features. Projector2 is a web service application for closing gaps in prokaryotic genome assemblies. OSLay, the most recent tool in the area, requires a mapping file between a reference (or a set of contigs) and sets of contigs to find synteny. Its results can be used as input to the assembly viewer Consed (Gordon et al. 1998) to design primers.
ABACAS is a standalone program intended to rapidly contiguate (align, order, orientate), visualize and design primers to close gaps on shotgun assembled contigs based on a reference sequence. Its features include showing ambiguous or overlapping contigs, visualizing repetitive regions, considering base qualities of contigs for primer design and enabling users to drag and drop contigs.

2 METHODS

Figure 1 describes the overall pipeline implemented in ABACAS. It uses MUMmer (Kurtz et al. 2004) to find alignment positions and identify areas of synteny of the contigs against the reference: MUMmer's alignment programs, NUCmer and PROmer, are run, followed by the ‘delta-filter’ and ‘show-tiling’ utilities. The output is then processed to generate a pseudomolecule, taking overlapping contigs and gaps into account.
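The ordering and orientation step can be illustrated with a minimal Python sketch. It is not the ABACAS Perl code: the `Placement` record, the `build_pseudomolecule` function and the fixed `overlap_gap` padding are hypothetical, and the placements are assumed to have been parsed already from a tiling of contigs on the reference. Contigs are sorted by reference start, reverse-complemented when they map to the reverse strand, and joined with runs of N's sized to the estimated gap.

```python
from dataclasses import dataclass

# Hypothetical record for one placed contig. The field names are assumptions
# made for this sketch, not the columns of any real MUMmer output file.
@dataclass
class Placement:
    name: str
    ref_start: int   # 1-based start of the contig on the reference
    ref_end: int     # 1-based inclusive end of the contig on the reference
    forward: bool    # False if the contig maps to the reverse strand
    sequence: str

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_pseudomolecule(placements, overlap_gap=100):
    """Order, orient and join contigs, padding gaps with N's.

    Adjacent or overlapping neighbours are not merged here; they are
    separated by a fixed run of `overlap_gap` N's (a simplifying assumption).
    """
    ordered = sorted(placements, key=lambda p: p.ref_start)
    parts = []
    previous_end = None
    for p in ordered:
        seq = p.sequence if p.forward else reverse_complement(p.sequence)
        if previous_end is not None:
            gap = p.ref_start - previous_end - 1
            parts.append("N" * (gap if gap > 0 else overlap_gap))
        parts.append(seq)
        previous_end = p.ref_end if previous_end is None else max(previous_end, p.ref_end)
    return "".join(parts)
```

Padding overlaps with a fixed run of N's keeps the sketch short; a real tool would need to decide how to report or resolve overlapping contigs.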
Fig. 1. A flow-chart describing the pipeline implemented in ABACAS.

Gaps in the pseudomolecule are represented by N's. ABACAS automatically extracts gaps on the pseudomolecule and, based on flanking sequences above a base quality threshold, designs primers for gap closure using Primer3 (Koressaar and Remm 2007). As part of the primer design step, the uniqueness of the sequence is checked by running a sensitive NUCmer alignment. ABACAS allows users to adjust parameters such as melting temperature, size, flanking region and the size of contig ends to exclude from primer picking. It then produces a list of sense and antisense primer oligos as well as a detailed Primer3 output that contains additional information on each gap position.
ABACAS generates a comparison file that can be used to visualize ordered and oriented contigs in ACT, the Artemis Comparison Tool (Carver et al. 2008). Synteny is represented by red bars whose colour intensity decreases with lower percent identity between comparable blocks. Information on contigs such as orientation, percent identity, coverage and overlap with other contigs can also be visualized by loading the output feature file into ACT. Contigs that were not mapped can be included separately. Repetitive regions in the reference can also be identified using a MUMmer self-comparison and visualized in ACT alongside the quality of the contigs.
If not all of the contigs are mapped, there is an option to run tBLASTx (Altschul et al. 1997) on the contigs that are not included in the pseudomolecule, using sequences from the reference that correspond to the gaps. Additional contigs can then be dragged and dropped into the correct position in the pseudomolecule using ACT.
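As an illustration of the gap-extraction step, the following Python sketch locates runs of N's in a pseudomolecule and returns the flanking sequence on each side, which is the kind of input a primer-design tool would need. The function name `find_gaps` and the `flank` and `min_gap` parameters are hypothetical; base-quality filtering, the uniqueness check and the call to Primer3 are not modelled here.

```python
import re


def find_gaps(pseudomolecule, flank=500, min_gap=1):
    """Return (start, end, left_flank, right_flank) for each run of N's.

    Coordinates are 0-based and end-exclusive. `flank` is a hypothetical
    parameter: the number of bases taken on each side of the gap.
    """
    gaps = []
    pattern = re.compile(r"[Nn]{%d,}" % min_gap)
    for match in pattern.finditer(pseudomolecule):
        start, end = match.span()
        left = pseudomolecule[max(0, start - flank):start]
        right = pseudomolecule[end:end + flank]
        gaps.append((start, end, left, right))
    return gaps
```

Note that a flank may itself contain N's when two gaps lie closer together than `flank` bases; a fuller implementation would trim such flanks before primer design.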

3 IMPLEMENTATION

ABACAS is implemented in Perl and requires MUMmer and (optionally) BLAST to be installed on the local machine. The user supplies a FASTA file of contigs and a reference genome in FASTA format. As the program can be used in an iterative process of contiguating a genome sequence, the output files produced by each run can be fed back into the program as input.
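Because the program depends on external tools, a wrapper script might first check that they are on the PATH. The sketch below is a hypothetical Python helper, not part of ABACAS. The executable names are those shipped with MUMmer and NCBI BLAST+; which of them ABACAS actually invokes, and under what names, is an assumption.

```python
import shutil

# Executable names as distributed with MUMmer and NCBI BLAST+. The exact set
# that ABACAS calls is an assumption made for this sketch.
REQUIRED = ("nucmer", "promer", "delta-filter", "show-tiling")
OPTIONAL = ("tblastx",)


def missing_tools(names, which=shutil.which):
    """Return the names in `names` that cannot be found on the PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    for tool in missing_tools(REQUIRED):
        print(f"required tool not found: {tool}")
    for tool in missing_tools(OPTIONAL):
        print(f"optional tool not found: {tool}")
```

The `which` argument is injectable so the check can be exercised without the tools installed.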

4 RESULTS AND DISCUSSION

ABACAS has already been used on a number of eukaryote and prokaryote genome projects at the Wellcome Trust Sanger Institute. Prokaryote genome projects include Escherichia coli 8178, Escherichia coli K88, Escherichia coli K99, Yersinia enterocolitica (five biotypes), Clostridium difficile and Mycobacterium canettii. It is also being used to finish a number of eukaryote genomes: Babesia bigemina (four chromosomes), Trypanosoma vivax (11 chromosomes) and Plasmodium berghei (14 chromosomes). To give a quantitative example of the results, in the Plasmodium berghei chromosome 2 finishing effort, the number of contigs was reduced from 60 to 36 with nine potential joins via ABACAS. Forty-six PCR products were generated to close gaps and 38 of these were successful in closing gaps in the assembly. In C. difficile cdbi1, the initial number of contigs larger than 2 kb was reduced from 37 to 10 after PCR and primer walks. On the other hand, the six contigs of C. difficile cd196 were contiguated to a single contig based on primers suggested by ABACAS. The main difference compared to existing tools is the possibility to include ABACAS in a high-throughput automated workflow. In summary, ABACAS is primarily used in post-assembly applications as a finishing tool. It is used both to generate high quality reference-based genome scaffolds and to assist finishing efforts by directing primer design. At the Sanger Institute, ABACAS has been included in the production pipeline to automatically finish draft assemblies. Ongoing work includes improving mapping of contigs in highly divergent species.

REFERENCES

1.  Hierarchical scaffolding with Bambus.

Authors:  Mihai Pop; Daniel S Kosack; Steven L Salzberg
Journal:  Genome Res       Date:  2004-01       Impact factor: 9.043

2.  Fragment assembly with short reads.

Authors:  Mark Chaisson; Pavel Pevzner; Haixu Tang
Journal:  Bioinformatics       Date:  2004-04-01       Impact factor: 6.937

3.  Basic local alignment search tool.

Authors:  S F Altschul; W Gish; W Miller; E W Myers; D J Lipman
Journal:  J Mol Biol       Date:  1990-10-05       Impact factor: 5.469

4.  Enhancements and modifications of primer design program Primer3.

Authors:  Triinu Koressaar; Maido Remm
Journal:  Bioinformatics       Date:  2007-03-22       Impact factor: 6.937

5.  OSLay: optimal syntenic layout of unfinished assemblies.

Authors:  Daniel C Richter; Stephan C Schuster; Daniel H Huson
Journal:  Bioinformatics       Date:  2007-04-26       Impact factor: 6.937

6.  The impact of next-generation sequencing technology on genetics.

Authors:  Elaine R Mardis
Journal:  Trends Genet       Date:  2008-02-11       Impact factor: 11.639

7.  Consed: a graphical tool for sequence finishing.

Authors:  D Gordon; C Abajian; P Green
Journal:  Genome Res       Date:  1998-03       Impact factor: 9.043

8.  Versatile and open software for comparing large genomes.

Authors:  Stefan Kurtz; Adam Phillippy; Arthur L Delcher; Michael Smoot; Martin Shumway; Corina Antonescu; Steven L Salzberg
Journal:  Genome Biol       Date:  2004-01-30       Impact factor: 13.583

9.  Projector 2: contig mapping for efficient gap-closure of prokaryotic genome sequence assemblies.

Authors:  Sacha A F T van Hijum; Aldert L Zomer; Oscar P Kuipers; Jan Kok
Journal:  Nucleic Acids Res       Date:  2005-07-01       Impact factor: 16.971

10.  Aggressive assembly of pyrosequencing reads with mates.

Authors:  Jason R Miller; Arthur L Delcher; Sergey Koren; Eli Venter; Brian P Walenz; Anushka Brownley; Justin Johnson; Kelvin Li; Clark Mobarry; Granger Sutton
Journal:  Bioinformatics       Date:  2008-10-24       Impact factor: 6.937
