PALMA: mRNA to genome alignments using large margin algorithms.

Uta Schulze, Bettina Hepp, Cheng Soon Ong, Gunnar Rätsch.

Abstract

MOTIVATION: Despite many years of research on how to properly align sequences in the presence of sequencing errors, alternative splicing and micro-exons, the correct alignment of mRNA sequences to genomic DNA is still a challenging task.
RESULTS: We present a novel approach based on large margin learning that combines accurate splice site predictions with common sequence alignment techniques. By solving a convex optimization problem, our algorithm, called PALMA, tunes the parameters of the model such that true alignments score higher than other alignments. We study the accuracy of alignments of mRNAs containing artificially generated micro-exons to genomic DNA. In a carefully designed experiment, we show that our algorithm accurately identifies the intron boundaries as well as the boundaries of the optimal local alignment. It outperforms all other methods: for 5702 artificially shortened EST sequences from Caenorhabditis elegans and human, it correctly identifies the intron boundaries in all except two cases. The best other method is a recently proposed method called exalin, which misaligns 37 of the sequences. Our method also demonstrates robustness to mutations, insertions and deletions, retaining accuracy even at high noise levels.
AVAILABILITY: Datasets for training, evaluation and testing, additional results and a stand-alone alignment tool implemented in C++ and Python are available at http://www.fml.mpg.de/raetsch/projects/palma
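
The large margin idea in the RESULTS paragraph (tune parameters so that the true alignment outscores competing alignments) can be illustrated with a minimal sketch. This is not the PALMA implementation: the linear scoring model, the three features (match count, mismatch count, splice site score), the toy numbers and the function names are all assumptions made for illustration, and plain subgradient descent on a regularized hinge loss stands in for whatever solver the authors used.

```python
# Illustrative sketch only: features, data and names are hypothetical,
# not taken from PALMA.

def dot(u, v):
    """Linear alignment score: weights times alignment features."""
    return sum(a * b for a, b in zip(u, v))

def train_large_margin(examples, n_features, reg=0.01, lr=0.01,
                       epochs=2000, margin=1.0):
    """Subgradient descent on an L2-regularized hinge loss (convex).

    examples: list of (phi_true, [phi_wrong, ...]) pairs, where each phi
    is the feature vector of one candidate alignment.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        grad = [reg * wi for wi in w]  # gradient of the regularizer
        for phi_true, wrongs in examples:
            # Highest-scoring wrong alignment under the current weights.
            worst = max(wrongs, key=lambda phi: dot(w, phi))
            # Hinge term is active if the true alignment does not win
            # by at least `margin`.
            if dot(w, phi_true) - dot(w, worst) < margin:
                for k in range(n_features):
                    grad[k] -= phi_true[k] - worst[k]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

# Hypothetical features: [matches, mismatches, splice_site_score]
examples = [
    ([10, 0, 0.9], [[10, 0, 0.1], [8, 2, 0.9]]),
    ([7, 1, 0.8], [[8, 0, 0.05]]),  # competitor matches more bases but uses a weak splice site
]

w = train_large_margin(examples, n_features=3)
for phi_true, wrongs in examples:
    print(dot(w, phi_true), [dot(w, phi) for phi in wrongs])
```

In the second toy example a scorer that only counts matches would prefer the wrong alignment; the learned weights have to give the splice site score enough influence to reverse that ranking, which is the kind of trade-off the abstract describes.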


Year:  2007        PMID: 17537755     DOI: 10.1093/bioinformatics/btm275

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  5 in total

1.  Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner.

Authors:  David V Lu; Randall H Brown; Manimozhiyan Arumugam; Michael R Brent
Journal:  Bioinformatics       Date:  2009-05-04       Impact factor: 6.937

2.  Detecting polymorphic regions in Arabidopsis thaliana with resequencing microarrays.

Authors:  Georg Zeller; Richard M Clark; Korbinian Schneeberger; Anja Bohlen; Detlef Weigel; Gunnar Rätsch
Journal:  Genome Res       Date:  2008-03-06       Impact factor: 9.043

3.  WebGMAP: a web service for mapping and aligning cDNA sequences to genomes.

Authors:  Chun Liang; Lin Liu; Guoli Ji
Journal:  Nucleic Acids Res       Date:  2009-05-22       Impact factor: 16.971

4.  Benchmarking spliced alignment programs including Spaln2, an extended version of Spaln that incorporates additional species-specific features.

Authors:  Hiroaki Iwata; Osamu Gotoh
Journal:  Nucleic Acids Res       Date:  2012-07-30       Impact factor: 16.971

5.  A space-efficient and accurate method for mapping and aligning cDNA sequences onto genomic sequence.

Authors:  Osamu Gotoh
Journal:  Nucleic Acids Res       Date:  2008-03-15       Impact factor: 16.971
