Correcting sequencing errors in DNA coding regions using a dynamic programming approach.

Abstract

This paper presents an algorithm for detecting and 'correcting' sequencing errors that occur in DNA coding regions. The types of sequencing errors addressed are insertions and deletions (indels) of DNA bases. The goal is to provide a capability that makes single-pass or low-redundancy sequence data more informative, reducing the need for high-redundancy sequencing for gene identification and characterization purposes. This would permit improved sequencing efficiency and reduce genome sequencing costs. The algorithm detects sequencing errors by discovering changes in the statistically preferred reading frame within a putative coding region, and then inserts a number of 'neutral' bases at a perceived reading frame transition point to make the putative exon candidate frame-consistent. We have implemented the algorithm as a front-end subsystem of the GRAIL DNA sequence analysis system to construct a highly error-tolerant version, and we intend to use it as a testbed for further development of sequencing error-correction technology. Preliminary test results have shown the usefulness of this algorithm and have also exposed some of its weaknesses, suggesting possible directions for further improvement. On a test set consisting of 68 human DNA sequences with 1% randomly generated indels in coding regions, the algorithm detected and corrected 76% of the indels. The average distance between the position of an indel and the predicted one was 9.4 bases. With this subsystem in place, GRAIL correctly predicted 89% of the coding messages with 10% false messages on the 'corrected' sequences, compared to 69% correctly predicted coding messages and 11% falsely predicted messages on the 'corrupted' sequences using the standard GRAIL II method (version 1.2). (ABSTRACT TRUNCATED AT 250 WORDS)
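The frame-transition idea described in the abstract can be illustrated with a small sketch. This is not the GRAIL implementation, whose scoring and recurrence are not given in the truncated abstract. It assumes that a per-position score for each of the three reading frames is already available (the `scores` input is hypothetical), that frame `f` means codons start at indices `j` with `j % 3 == f`, and that a fixed `switch_penalty` discourages spurious frame changes. The function names `best_frame_path` and `make_frame_consistent` are illustrative only.

```python
from typing import List, Sequence, Tuple


def best_frame_path(scores: Sequence[Sequence[float]], switch_penalty: float) -> List[int]:
    """Viterbi-style DP: choose one frame (0, 1 or 2) per position so that the
    summed frame scores, minus a penalty per frame change, is maximal."""
    if not scores:
        return []
    best = list(scores[0])  # best[f]: best total score of a path ending in frame f
    back = []               # back[i - 1][f]: previous frame on that best path
    for i in range(1, len(scores)):
        new_best, pointers = [], []
        for f in range(3):
            # Staying in frame f is free; arriving from another frame costs the penalty.
            prev = max(range(3), key=lambda g: best[g] - (switch_penalty if g != f else 0.0))
            cost = switch_penalty if prev != f else 0.0
            new_best.append(best[prev] - cost + scores[i][f])
            pointers.append(prev)
        best = new_best
        back.append(pointers)
    # Trace back from the best final frame.
    f = max(range(3), key=lambda g: best[g])
    path = [f]
    for pointers in reversed(back):
        f = pointers[f]
        path.append(f)
    path.reverse()
    return path


def make_frame_consistent(seq: str, path: Sequence[int], neutral: str = "N") -> Tuple[str, List[int]]:
    """Insert neutral bases at each frame transition so the whole sequence
    ends up in the frame of its first segment (assumed convention)."""
    if len(seq) != len(path):
        raise ValueError("seq and path must have the same length")
    out, transitions = [], []
    shift = 0  # number of neutral bases inserted so far
    for i, base in enumerate(seq):
        if i > 0 and path[i] != path[i - 1]:
            # Pick k so that (path[i] + shift + k) % 3 equals the first frame.
            k = (path[0] - path[i] - shift) % 3
            out.append(neutral * k)
            shift += k
            transitions.append(i)
        out.append(base)
    return "".join(out), transitions


# Toy input: frame 0 is preferred for 30 positions, then frame 2,
# which is what a single deleted base would look like under this convention.
scores = [(1.0, 0.0, 0.0)] * 30 + [(0.0, 0.0, 1.0)] * 30
path = best_frame_path(scores, switch_penalty=5.0)
corrected, transitions = make_frame_consistent("A" * 60, path)
```

In this toy case the path switches frame once, and a single neutral base is inserted at the transition. A larger penalty makes the path less sensitive to local noise but also places transitions less precisely, which is one plausible reason the predicted indel position can differ from the true one by several bases.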

Year: 1995 PMID: 7620982 DOI: 10.1093/bioinformatics/11.2.117

Source DB: PubMed Journal: Comput Appl Biosci ISSN: 0266-7061

Cited

3 in total

1. PairWise and SearchWise: finding the optimal alignment in a simultaneous comparison of a protein profile against all DNA translation frames.

Authors: E Birney; J D Thompson; T J Gibson
Journal: Nucleic Acids Res Date: 1996-07-15 Impact factor: 16.971

2. Expansion of 50 CAG/CTG repeats excluded in schizophrenia by application of a highly efficient approach using repeat expansion detection and a PCR screening set.

Authors: T Bowen; C Guy; G Speight; L Jones; A Cardno; K Murphy; P McGuffin; M J Owen; M C O'Donovan
Journal: Am J Hum Genet Date: 1996-10 Impact factor: 11.025

3. Detecting the molecular scars of evolution in the Mycobacterium tuberculosis complex by analyzing interrupted coding sequences.

Authors: Caroline Deshayes; Emmanuel Perrodou; Daniel Euphrasie; Eric Frapy; Olivier Poch; Pablo Bifani; Odile Lecompte; Jean-Marc Reyrat
Journal: BMC Evol Biol Date: 2008-03-06 Impact factor: 3.260
