Thomas D Wu¹, Colin K Watanabe. ¹Department of Bioinformatics, Genentech, Inc., South San Francisco, CA 94080, USA. twu@gene.com
Abstract
MOTIVATION: We introduce GMAP, a standalone program for mapping and aligning cDNA sequences to a genome. The program maps and aligns a single sequence with minimal startup time and memory requirements, and provides fast batch processing of large sequence sets. It generates accurate gene structures, even in the presence of substantial polymorphisms and sequence errors, without using probabilistic splice site models. The underlying methodology includes a minimal sampling strategy for genomic mapping, oligomer chaining for approximate alignment, sandwich dynamic programming (DP) for splice site detection, and microexon identification with statistical significance testing.
RESULTS: On a set of human messenger RNAs with random mutations at 1% and 3% rates, GMAP identified all splice sites accurately in over 99.3% of the sequences, an error rate one-tenth that of existing programs. On a large set of human expressed sequence tags, GMAP provided higher-quality alignments more often than blat did. On a set of Arabidopsis cDNAs, GMAP performed comparably to GeneSeqer. In these experiments, GMAP was several-fold faster than existing programs.
AVAILABILITY: Source code for gmap and associated programs is available at http://www.gene.com/share/gmap.
SUPPLEMENTARY INFORMATION: http://www.gene.com/share/gmap.
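The abstract lists oligomer chaining as the approximate-alignment step. The sketch below illustrates the general idea only; it is not GMAP's implementation. It indexes every k-mer of a short genomic segment, collects exact k-mer hits from the cDNA, and keeps the longest colinear chain of non-overlapping hits using a simple O(n²) dynamic program. The function names `kmer_hits` and `chain_hits`, the fixed k, and the chaining rule (each hit must advance by at least k in both sequences) are hypothetical choices made for illustration.

```python
from collections import defaultdict


def kmer_hits(query: str, genome: str, k: int) -> list[tuple[int, int]]:
    """Return (query_pos, genome_pos) pairs where the same exact k-mer occurs in both."""
    # Index every k-mer of the genomic segment by its start position.
    index = defaultdict(list)
    for g in range(len(genome) - k + 1):
        index[genome[g:g + k]].append(g)
    # Look up each k-mer of the query (cDNA) in the index.
    hits = []
    for q in range(len(query) - k + 1):
        for g in index.get(query[q:q + k], []):
            hits.append((q, g))
    return hits


def chain_hits(hits: list[tuple[int, int]], k: int) -> list[tuple[int, int]]:
    """Longest colinear chain of hits; consecutive hits advance >= k in both sequences.

    Hypothetical chaining rule for illustration: hits may not overlap, and the
    genomic coordinate may jump arbitrarily far (allowing for introns).
    """
    hits = sorted(hits)
    n = len(hits)
    if n == 0:
        return []
    best = [1] * n   # best[i]: length of the longest chain ending at hits[i]
    prev = [-1] * n  # prev[i]: predecessor of hits[i] in that chain, or -1
    for i in range(n):
        qi, gi = hits[i]
        for j in range(i):
            qj, gj = hits[j]
            if qi - qj >= k and gi - gj >= k and best[j] + 1 > best[i]:
                best[i] = best[j] + 1
                prev[i] = j
    # Trace back from the end of the best-scoring chain.
    i = max(range(n), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(hits[i])
        i = prev[i]
    return chain[::-1]
```

For example, with a toy cDNA made of two 8-nt exons (`"ACGTTGCA" + "GGCATCCA"`) and a genome that places a 14-nt intron between them (`"ACGTTGCA" + "GT" + "T" * 10 + "AG" + "GGCATCCA"`), `chain_hits(kmer_hits(cdna, genome, 4), 4)` returns `[(0, 0), (4, 4), (8, 22), (12, 26)]`. The jump in the diagonal offset (genome position minus cDNA position) from 0 to 14 marks a candidate intron; in GMAP, the exact splice sites within such a gap are then resolved by a separate step (sandwich DP).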
Authors: Bianca O Carmello; Rafael L B Coan; Adauto L Cardoso; Erica Ramos; Bruno E A Fantinatti; Diego F Marques; Rogério A Oliveira; Guilherme T Valente; Cesar Martins Journal: Chromosome Res Date: 2017-08-03 Impact factor: 5.239
Authors: Nanette R Boyle; Mark Dudley Page; Bensheng Liu; Ian K Blaby; David Casero; Janette Kropat; Shawn J Cokus; Anne Hong-Hermesdorf; Johnathan Shaw; Steven J Karpowicz; Sean D Gallaher; Shannon Johnson; Christoph Benning; Matteo Pellegrini; Arthur Grossman; Sabeeha S Merchant Journal: J Biol Chem Date: 2012-03-08 Impact factor: 5.157
Authors: Emily A Lescak; Susan L Bassham; Julian Catchen; Ofer Gelmond; Mary L Sherbick; Frank A von Hippel; William A Cresko Journal: Proc Natl Acad Sci U S A Date: 2015-12-14 Impact factor: 11.205
Authors: Ratan Chopra; Gloria Burow; Andrew Farmer; Joann Mudge; Charles E Simpson; Thea A Wilkins; Michael R Baring; Naveen Puppala; Kelly D Chamberlin; Mark D Burow Journal: Mol Genet Genomics Date: 2015-02-07 Impact factor: 3.291