Rommel Thiago Jucá Ramos, Adriana R Carneiro, Pablo H Caracciolo, Vasco Azevedo, Maria Paula C Schneider, Debmalya Barh, Artur Silva.
Abstract
Genome assembly has always been complicated by the inherent limitations of sequencing technologies, as well as of the computational methods used to process sequences. Although many of the problems in generating contigs from reads are well known, especially those involving short reads, the orientation and ordering of contigs in the finishing stages is still very challenging and time-consuming, as it requires manual curation of the contigs to guarantee that they are correctly identified and to prevent misassembly. Because of the large number of sequences produced, especially from next-generation sequencer reads, this process demands considerable manual effort, and few software options are available to facilitate it. To address this problem, we have developed the Graphic Contig Analyzer for All Sequencing Platforms (G4ALL): a stand-alone multi-user tool that facilitates the editing of the contigs produced in the assembly process. Besides providing information on the gene products contained in each contig, obtained through a search of the available biological databases, G4ALL produces a scaffold of the genome based on the overlap of the contigs after curation.
Availability: The software is available at http://www.genoma.ufpa.br/rramos/softwares/g4all.xhtml.
Keywords: Bioinformatic tools; Genome assembly; sequence analysis; software
Year: 2013 PMID: 23888102 PMCID: PMC3717189 DOI: 10.6026/97320630009599
Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063
Figure 1. The interface lists the curated contigs, showing their position in the reference genome and in relation to the other contigs. After selecting a contig, the user can move it to the left or to the right to correct its alignment in relation to the others, copy, save or edit its sequence, and trim both ends. In addition, the BLAST results can be viewed through the menu “Blast Result” in order to identify the gene products present.
Figure 2. Assembly of the consensus sequence. When the contigs are aligned with each other by similarity, they are extended to produce the consensus sequence.
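The extension step described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not G4ALL's implementation: it assumes the contigs are already ordered and oriented, and it uses exact suffix-prefix matching in place of similarity-based alignment. The function names and the `min_len` parameter are hypothetical.

```python
def suffix_prefix_overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no exact overlap of at least `min_len` bases exists.
    """
    shortest_allowed = max(min_len, 1)
    longest_possible = min(len(left), len(right))
    # Try the longest candidate overlap first and shrink until one matches.
    for k in range(longest_possible, shortest_allowed - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def extend_consensus(contigs: list[str], min_len: int = 3) -> str:
    """Merge ordered, oriented contigs into one consensus sequence.

    Assumption: each contig overlaps its predecessor exactly; a real tool
    would tolerate mismatches and let the curator resolve conflicts.
    """
    if not contigs:
        return ""
    consensus = contigs[0]
    for contig in contigs[1:]:
        k = suffix_prefix_overlap(consensus, contig, min_len)
        if k == 0:
            raise ValueError("no sufficient overlap between adjacent contigs")
        # Append only the part of the contig that extends past the overlap.
        consensus += contig[k:]
    return consensus


merged = extend_consensus(["ACGTAC", "TACGGA", "GGATTT"])
```

Raising an error when no overlap is found is a design choice of this sketch; it mirrors the idea that unresolved joins are left to manual curation rather than guessed.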
Figure 3. Possible alignments of the contigs against the reference sequence that produce multiple BLAST alignments: A) Two regions of the contig align with two parts of the reference due to the deletion of a region in the contig; B) A region that is not in the reference was inserted into the contig; C) The central region of the contig does not align with the reference, which could indicate an error in assembly or a sequence insertion.
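The three cases in the Figure 3 caption can be told apart by comparing the gap between two alignments on the contig with the gap between them on the reference. The sketch below is one possible reading, not G4ALL's method: the `Hsp` type, the `tolerance` threshold, and the returned labels are hypothetical, coordinates are assumed to be 1-based, inclusive and on the same strand, and case C is interpreted as a gap on both the contig and the reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hsp:
    """One contig-versus-reference alignment (hypothetical record type)."""
    q_start: int  # start on the contig (query)
    q_end: int    # end on the contig
    s_start: int  # start on the reference (subject)
    s_end: int    # end on the reference


def classify_pair(first: Hsp, second: Hsp, tolerance: int = 10) -> str:
    """Label a pair of alignments according to the cases in Figure 3."""
    a, b = sorted((first, second), key=lambda h: h.q_start)
    contig_gap = b.q_start - a.q_end - 1      # unaligned bases on the contig
    reference_gap = b.s_start - a.s_end - 1   # skipped bases on the reference
    contig_has_gap = contig_gap > tolerance
    reference_has_gap = reference_gap > tolerance
    if reference_has_gap and not contig_has_gap:
        return "deletion"          # case A: region missing from the contig
    if contig_has_gap and not reference_has_gap:
        return "insertion"         # case B: extra region in the contig
    if contig_has_gap and reference_has_gap:
        return "unaligned_center"  # case C (assumed interpretation)
    return "contiguous"


case_a = classify_pair(Hsp(1, 500, 1000, 1499), Hsp(501, 1000, 2500, 2999))
case_b = classify_pair(Hsp(1, 500, 1000, 1499), Hsp(801, 1300, 1500, 1999))
case_c = classify_pair(Hsp(1, 500, 1000, 1499), Hsp(801, 1300, 1800, 2299))
```

A label from such a check is only a hint for the curator; as the caption notes, case C may reflect either an assembly error or a genuine insertion.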
Figure 4. Analysis of synteny using the genomes of C. pseudotuberculosis strains 316 and 258. We observed a highly conserved gene order between the genomes.