Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner.

David V Lu, Randall H Brown, Manimozhiyan Arumugam, Michael R Brent.

Abstract

MOTIVATION: The most accurate way to determine the intron-exon structures in a genome is to align spliced cDNA sequences to the genome. Thus, cDNA-to-genome alignment programs are a key component of most annotation pipelines. The scoring system used to choose the best alignment is a primary determinant of alignment accuracy, while heuristics that prevent consideration of certain alignments are a primary determinant of runtime and memory usage. Both accuracy and speed are important considerations in choosing an alignment algorithm, but scoring systems have received much less attention than heuristics.
RESULTS: We present Pairagon, a pair hidden Markov model based cDNA-to-genome alignment program, as the most accurate aligner for sequences with high- and low-identity levels. We conducted a series of experiments testing alignment accuracy with varying sequence identity. We first created 'perfect' simulated cDNA sequences by splicing the sequences of exons in the reference genome sequences of fly and human. The complete reference genome sequences were then mutated to various degrees using a realistic mutation simulator and the perfect cDNAs were aligned to them using Pairagon and 12 other aligners. To validate these results with natural sequences, we performed cross-species alignment using orthologous transcripts from human, mouse and rat. We found that aligner accuracy is heavily dependent on sequence identity. For sequences with 100% identity, Pairagon achieved accuracy levels of >99.6%, with one quarter of the errors of any other aligner. Furthermore, for human/mouse alignments, which are only 85% identical, Pairagon achieved 87% accuracy, higher than any other aligner.
AVAILABILITY: Pairagon source and executables are freely available at http://mblab.wustl.edu/software/pairagon/
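
The simulation setup described under RESULTS (splice exons into a 'perfect' cDNA, then mutate the genome to a chosen degree) can be illustrated with a toy sketch. This is not the authors' pipeline: the function names `splice_cdna`, `mutate` and `percent_identity`, the exon coordinates, and the random test genome are all hypothetical, and the mutation step below is a substitution-only stand-in for the realistic mutation simulator mentioned in the abstract.

```python
import random


def splice_cdna(genome, exons):
    """Concatenate exon intervals (0-based, end-exclusive) into a 'perfect' cDNA."""
    return "".join(genome[start:end] for start, end in exons)


def mutate(genome, rate, rng):
    """Substitute each base independently with probability `rate`.

    Assumption: substitutions only (no indels), so genome coordinates are
    preserved. A realistic simulator would model far more than this.
    """
    out = []
    for base in genome:
        if rng.random() < rate:
            # Always pick a base different from the original one.
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def percent_identity(a, b):
    """Percentage of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(300))  # toy reference genome
exons = [(10, 60), (120, 170), (220, 280)]                # hypothetical exon annotation

cdna = splice_cdna(genome, exons)        # 'perfect' cDNA from the unmutated genome
mutated = mutate(genome, 0.15, rng)      # target genome at roughly 85% identity

print(len(cdna), round(percent_identity(genome, mutated), 1))
```

Because this toy mutation step never inserts or deletes bases, the original exon coordinates remain the ground truth against which an aligner's predicted intron-exon structure could be scored when `cdna` is aligned to `mutated`.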

Year:  2009        PMID: 19414532      PMCID: PMC2732315          DOI: 10.1093/bioinformatics/btp273

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


