Miao Zhang (1), Warren Gish. (1) Department of Genetics, School of Medicine, Washington University-St Louis, 4566 Scott Avenue, St Louis, MO 63110, USA.
Abstract
MOTIVATION: mRNA sequences and expressed sequence tags represent some of the most abundant experimental data for identifying genes and alternatively spliced products in metazoans. These transcript sequences are frequently studied by aligning them to a genomic sequence template. For existing programs, error-prone, polymorphic and cross-species data, as well as non-canonical splice sites, still present significant barriers to producing accurate, complete alignments.
RESULTS: We took a novel approach to spliced alignment that meaningfully combined information from sequence similarity with that obtained from PSSM splice site models. Scoring systems were chosen to maximize their power of discrimination, and dynamic programming (DP) was employed to guarantee optimal solutions would be found. The resultant program, EXALIN, performed better than other popular tools tested under a wide range of conditions that included detection of micro-exons and human-mouse cross-species comparisons. For improved speed with only a marginal decrease in splice site prediction accuracy, EXALIN could perform limited DP guided by a result from BLASTN.
AVAILABILITY: The source code, binaries, scripts, scoring matrices and splice site models for human, mouse, rice and Caenorhabditis elegans utilized in this study are posted at http://blast.wustl.edu/exalin. The software (scripts, source code and binaries) is copyrighted but free for all to use.
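As an illustration of the PSSM splice site models mentioned under RESULTS, the following is a minimal sketch of log-odds scoring of candidate donor sites. It is not EXALIN's implementation: the frequency table `DONOR_FREQS`, the uniform background, the 4-base window and the function names are all hypothetical, and the dynamic programming step that combines such scores with sequence similarity is not shown.

```python
import math

# Hypothetical per-position base frequencies for a 4-base donor-site window.
# Illustrative values only; these are NOT the splice site models from the study.
DONOR_FREQS = [
    {"A": 0.10, "C": 0.05, "G": 0.80, "T": 0.05},  # last exon base
    {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01},  # G of the canonical GT
    {"A": 0.01, "C": 0.01, "G": 0.01, "T": 0.97},  # T of the canonical GT
    {"A": 0.60, "C": 0.05, "G": 0.30, "T": 0.05},  # third intron base
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def build_pssm(freqs, background=BACKGROUND):
    """Convert per-position frequencies into log2-odds scores."""
    return [
        {base: math.log2(p / background) for base, p in column.items()}
        for column in freqs
    ]


def pssm_score(pssm, window):
    """Sum the log-odds scores of a window that has the same width as the PSSM."""
    if len(window) != len(pssm):
        raise ValueError("window width must match PSSM width")
    return sum(column[base] for column, base in zip(pssm, window))


def best_donor(pssm, genomic):
    """Return (score, offset) of the highest-scoring window in a genomic string."""
    width = len(pssm)
    scored = [
        (pssm_score(pssm, genomic[i:i + width]), i)
        for i in range(len(genomic) - width + 1)
    ]
    return max(scored)


pssm = build_pssm(DONOR_FREQS)
score, offset = best_donor(pssm, "ACGGTAAC")
print(f"best candidate donor window starts at offset {offset}, score {score:.2f}")
```

In a spliced aligner, a score of this kind would typically be added to the alignment score at each candidate intron boundary, so that a strong splice signal can outweigh a slightly poorer match between transcript and genome.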