Rommel Thiago Jucá Ramos, Adriana Ribeiro Carneiro, Vasco Azevedo, Maria Paula Schneider, Debmalya Barh, Artur Silva.
Abstract
Modern genomic sequencing technologies produce a large amount of data at a reduced cost per base; however, this data consists of short reads. This reduction in read size, compared with reads obtained by previous methodologies, presents new challenges, including the need for efficient algorithms that assemble genomes from short reads and resolve repeats. Additionally, after ab initio assembly, curation of the hundreds or thousands of contigs generated by assemblers demands considerable time and computational resources. We developed Simplifier, a stand-alone software tool that selectively eliminates redundant sequences from the collection of contigs generated by ab initio assembly of genomes. Application of Simplifier to data generated by assembly of the genome of Corynebacterium pseudotuberculosis strain 258 reduced the number of contigs generated by ab initio methods from 8,004 to 5,272, a reduction of 34.14%; in addition, N50 increased from 1 kb to 1.5 kb. Processing the contigs of Escherichia coli DH10B with Simplifier reduced the mate-paired library by 17.47% and the fragment library by 23.91%. In tests with data from prokaryotic organisms, Simplifier removed redundant sequences from datasets produced by assemblers, thereby reducing the effort required to finalize the genome assembly.
AVAILABILITY: Simplifier is available at http://www.genoma.ufpa.br/rramos/softwares/simplifier.xhtml. It requires Sun JDK 6 or higher.
Keywords: NGS sequencing; ab initio assembly of genomes; redundant sequences
Year: 2012 PMID: 23275695 PMCID: PMC3524941 DOI: 10.6026/97320630008996
Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063
Figure 1. Contig analysis interface of G4ALL. The superposition shows that the last three bases of contig_2648 (white) differ from those of contig NODE_185 (black) and from the reference. After elimination of these bases, the sequences are redundant.
Figure 2. Criteria for eliminating redundant contigs. Each contig in a group is compared with the others, considering the following possibilities: A, contig that is completely redundant; B, contig that is redundant when trimmed at the 5' end; C, contig that is redundant when trimmed at the 3' end; D, contig that is redundant when trimmed at both ends.
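The four cases in Figure 2 can be illustrated with a minimal Python sketch. It is not Simplifier's implementation: it assumes that "redundant" means the contig occurs as an exact substring of another contig, and the function name `classify_redundancy` and the `max_trim` limit (set to 3 here, echoing the three mismatched bases in Figure 1) are hypothetical choices made only for illustration. Reverse-complement matches are ignored.

```python
from typing import Optional


def classify_redundancy(contig: str, other: str, max_trim: int = 3) -> Optional[str]:
    """Return the Figure 2 case ('A', 'B', 'C', 'D') or None if not redundant.

    Assumption: redundancy is exact containment of `contig` in `other`,
    optionally after removing up to `max_trim` bases from one or both ends.
    """
    # Case A: the whole contig is contained in the other contig.
    if contig in other:
        return "A"
    # Case B: redundant after trimming bases from the 5' end.
    for t in range(1, max_trim + 1):
        if len(contig) > t and contig[t:] in other:
            return "B"
    # Case C: redundant after trimming bases from the 3' end.
    for t in range(1, max_trim + 1):
        if len(contig) > t and contig[:-t] in other:
            return "C"
    # Case D: redundant only after trimming both ends.
    for left in range(1, max_trim + 1):
        for right in range(1, max_trim + 1):
            if len(contig) > left + right and contig[left:-right] in other:
                return "D"
    return None


# Toy sequences, invented for this example.
longer = "AAACCCGGGTTT"
for name, seq in [("full", "CCCGGG"), ("5' trim", "TCCCGGG"),
                  ("3' trim", "CCCGGGA"), ("both", "TCCCGGGA")]:
    print(name, classify_redundancy(seq, longer))
```

A real tool would more likely use alignment than exact substring search, so that internal mismatches and strand orientation are handled; the sketch only shows how the four cases relate to one another.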
Figure 3. Relation between the frequency of contigs of Escherichia coli DH10B and their size range. The X-axis shows the size of the contigs; the Y-axis indicates the number of contigs in each size range for the fragment and mate-paired libraries, with and without the use of Simplifier.
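The N50 value reported in the abstract and the per-size-range counts plotted in Figure 3 can both be derived from a list of contig lengths. The sketch below uses the common definition of N50 (the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length). The function names, the 1,000-base bin width, and the toy lengths are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, Iterable, List


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the total."""
    ordered: List[int] = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def size_histogram(lengths: Iterable[int], bin_width: int = 1000) -> Dict[int, int]:
    """Count contigs per size range; keys are the lower bound of each range."""
    counts = Counter((length // bin_width) * bin_width for length in lengths)
    return dict(sorted(counts.items()))


# Toy contig lengths, invented for this example.
lengths = [150, 950, 1200, 2500]
print(n50(lengths))
print(size_histogram(lengths))
```

Running both functions on the contig set before and after redundancy removal gives the kind of comparison summarised in the abstract and in Figure 3.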