
Alignments of DNA and protein sequences containing frameshift errors.

X Guan, E C Uberbacher.

Abstract

Molecular sequences, like all experimental data, are subject to error. Many current DNA sequencing protocols have very significant error rates and often generate artefactual insertions and deletions of bases (indels), which corrupt the translation of sequences and compromise the detection of protein homologies. The impact of these errors on the utility of molecular sequence data depends on the analytic technique used to interpret the data. In the presence of frameshift errors, standard algorithms using six-frame translation can miss important homologies because only subfragments of the correct translation are available in any given frame. We present a new algorithm which can detect and correct frameshift errors in DNA sequences during comparison of translated sequences with protein sequences in the databases. This algorithm can recognize homologous proteins sharing 30% identity even in the presence of a 7% frameshift error rate. Our algorithm uses dynamic programming, producing a guaranteed optimal alignment in the presence of frameshifts, and has a sensitivity equivalent to Smith-Waterman. The time complexity of the algorithm is O(nm), where n and m are the lengths of the two sequences being compared. The algorithm does not rely on prior knowledge or heuristic rules and performs significantly better than any previously reported method.
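
The abstract does not give the recurrence itself, so the following is only a minimal sketch of the general idea: a Smith-Waterman-style local alignment of a DNA read against a protein in which the DNA index advances by three bases for a codon, but may also advance by one or two bases at a frameshift penalty. The function name `frameshift_local_score`, the identity scoring (+5/-4), and the `gap` and `frameshift` penalties are illustrative assumptions, not the authors' published parameters; a realistic version would use a substitution matrix such as PAM or BLOSUM and return the alignment as well as the score. The table has (n+1) x (m+1) cells with constant work per cell, which is consistent with the O(nm) bound stated above.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def sub_score(a, b, match=5, mismatch=-4):
    """Toy identity scoring (assumption; stands in for PAM/BLOSUM)."""
    return match if a == b else mismatch


def frameshift_local_score(dna, protein, gap=8, frameshift=12):
    """Best local score of a DNA read against a protein, tolerating frameshifts.

    S[i][j] is the best score of an alignment ending after dna[:i] and
    protein[:j]. Penalties are hypothetical values chosen for illustration.
    """
    n, m = len(dna), len(protein)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # 0 restarts the alignment (local); skipping one base models
            # a spurious inserted base in the read.
            cands = [0, S[i - 1][j] - frameshift, S[i][j - 1] - gap]
            if i >= 2:
                # Two bases left of a codon that lost a base, set against
                # one residue and charged only the frameshift penalty.
                cands.append(S[i - 2][j - 1] - frameshift)
            if i >= 3:
                aa = CODON_TABLE.get(dna[i - 3:i], "X")  # "X" for ambiguous bases
                cands.append(S[i - 3][j - 1] + sub_score(aa, protein[j - 1]))
                cands.append(S[i - 3][j] - gap)  # whole codon against a gap
            S[i][j] = max(cands)
            best = max(best, S[i][j])
    return best


protein = "MKTAYIAKQR"
clean = "ATGAAAACCGCGTATATTGCGAAACAGCGT"   # encodes the protein exactly
shifted = clean[:15] + "C" + clean[15:]    # one spurious base after codon 5

print(frameshift_local_score(clean, protein))
print(frameshift_local_score(shifted, protein))
print(frameshift_local_score(shifted, protein, frameshift=10**6))
```

In this toy case the inserted base splits the correct translation across two reading frames. With a prohibitive frameshift penalty, which mimics scoring each frame separately, only a five-residue fragment can be aligned; with the moderate penalty the alignment crosses the frameshift and recovers all ten residues at the cost of one penalty.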

Year:  1996        PMID: 8670617     DOI: 10.1093/bioinformatics/12.1.31

Source DB:  PubMed          Journal:  Comput Appl Biosci        ISSN: 0266-7061


  13 in total

1.  Detecting and analyzing DNA sequencing errors: toward a higher quality of the Bacillus subtilis genome sequence.

Authors:  C Médigue; M Rose; A Viari; A Danchin
Journal:  Genome Res       Date:  1999-11       Impact factor: 9.043

2.  PairWise and SearchWise: finding the optimal alignment in a simultaneous comparison of a protein profile against all DNA translation frames.

Authors:  E Birney; J D Thompson; T J Gibson
Journal:  Nucleic Acids Res       Date:  1996-07-15       Impact factor: 16.971

3.  Frameshift alignment: statistics and post-genomic applications.

Authors:  Sergey L Sheetlin; Yonil Park; Martin C Frith; John L Spouge
Journal:  Bioinformatics       Date:  2014-08-28       Impact factor: 6.937

4.  Functional assignment of metagenomic data: challenges and applications. (Review)

Authors:  Tulika Prakash; Todd D Taylor
Journal:  Brief Bioinform       Date:  2012-07-06       Impact factor: 11.622

5.  BioJava: an open-source framework for bioinformatics in 2012.

Authors:  Andreas Prlić; Andrew Yates; Spencer E Bliven; Peter W Rose; Julius Jacobsen; Peter V Troshin; Mark Chapman; Jianjiong Gao; Chuan Hock Koh; Sylvain Foisy; Richard Holland; Gediminas Rimsa; Michael L Heuer; H Brandstätter-Müller; Philip E Bourne; Scooter Willis
Journal:  Bioinformatics       Date:  2012-08-09       Impact factor: 6.937

6.  HMM-FRAME: accurate protein domain classification for metagenomic sequences containing frameshift errors.

Authors:  Yuan Zhang; Yanni Sun
Journal:  BMC Bioinformatics       Date:  2011-05-24       Impact factor: 3.169

7.  MACSE: Multiple Alignment of Coding SEquences accounting for frameshifts and stop codons.

Authors:  Vincent Ranwez; Sébastien Harispe; Frédéric Delsuc; Emmanuel J P Douzery
Journal:  PLoS One       Date:  2011-09-16       Impact factor: 3.240

8.  ICDS database: interrupted CoDing sequences in prokaryotic genomes.

Authors:  Emmanuel Perrodou; Caroline Deshayes; Jean Muller; Christine Schaeffer; Alain Van Dorsselaer; Raymond Ripp; Olivier Poch; Jean-Marc Reyrat; Odile Lecompte
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

9.  Composition-based statistics and translated nucleotide searches: improving the TBLASTN module of BLAST.

Authors:  E Michael Gertz; Yi-Kuo Yu; Richa Agarwala; Alejandro A Schäffer; Stephen F Altschul
Journal:  BMC Biol       Date:  2006-12-07       Impact factor: 7.431

10.  Detecting the molecular scars of evolution in the Mycobacterium tuberculosis complex by analyzing interrupted coding sequences.

Authors:  Caroline Deshayes; Emmanuel Perrodou; Daniel Euphrasie; Eric Frapy; Olivier Poch; Pablo Bifani; Odile Lecompte; Jean-Marc Reyrat
Journal:  BMC Evol Biol       Date:  2008-03-06       Impact factor: 3.260

