Correcting base-assignment errors in repeat regions of shotgun assembly.

Degui Zhi, Uri Keich, Pavel Pevzner, Steffen Heber, Haixu Tang.

Abstract

Accurate base-assignment in repeat regions of a whole genome shotgun assembly is an unsolved problem. Since reads in repeat regions cannot be easily attributed to a unique location in the genome, current assemblers may place these reads arbitrarily. As a result, the base-assignment error rate in repeats is likely to be much higher than that in the rest of the genome. We developed an iterative algorithm, EULER-AIR, that is able to correct base-assignment errors in finished genome sequences in public databases. The Wolbachia genome is among the best finished genomes. Using this genome project as an example, we demonstrated that EULER-AIR can 1) discover and correct base-assignment errors, 2) provide accurate read assignments, 3) utilize finishing reads for accurate base-assignment, and 4) provide guidance for designing finishing experiments. In the genome of Wolbachia, EULER-AIR found 16 positions with ambiguous base-assignment and two positions with erroneous bases. Besides Wolbachia, many other genome sequencing projects have significantly fewer finishing reads and, hence, are likely to contain more base-assignment errors in repeats. We demonstrate that EULER-AIR is a software tool that can be used to find and correct base-assignment errors in a genome assembly project.
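The abstract does not describe how EULER-AIR works internally, so the following is only a toy illustration of the general idea of iteratively re-assigning reads among repeat copies and re-calling the consensus bases. It is not the authors' algorithm. All names (`hamming`, `majority_consensus`, `iterative_reassign`) and the simplifying assumptions (every read spans the whole repeat window, reads start out arbitrarily placed on the first copy, mismatches are counted by Hamming distance) are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def majority_consensus(reads, fallback):
    """Column-wise majority vote over reads.

    Keeps the fallback base where there is no coverage or the top two
    bases are tied (an ambiguous position in this toy model).
    """
    out = []
    for i, base in enumerate(fallback):
        counts = Counter(r[i] for r in reads).most_common(2)
        tied = len(counts) == 2 and counts[0][1] == counts[1][1]
        out.append(base if not counts or tied else counts[0][0])
    return "".join(out)


def iterative_reassign(copies, reads, max_iter=10):
    """Alternate read assignment and consensus calling until stable.

    Assumption: all reads initially sit on copy 0, mimicking an
    arbitrary placement of repeat reads by an assembler.
    """
    copies = list(copies)
    assignment = [0] * len(reads)
    for _ in range(max_iter):
        new_assignment = []
        for read, cur in zip(reads, assignment):
            dists = [hamming(read, c) for c in copies]
            best = min(dists)
            # On a tie, leave the read where it currently is.
            new_assignment.append(cur if dists[cur] == best else dists.index(best))
        new_copies = [
            majority_consensus(
                [r for r, a in zip(reads, new_assignment) if a == k], copies[k]
            )
            for k in range(len(copies))
        ]
        if new_assignment == assignment and new_copies == copies:
            break
        assignment, copies = new_assignment, new_copies
    return copies, assignment


# Two repeat copies that truly differ at positions 4 and 8; the draft of
# copy 0 carries one wrong base at position 7 (A instead of T).
draft = ["ACGTACGAAC", "ACGTTCGTCC"]
reads = ["ACGTACGTAC"] * 3 + ["TCGTACGTAC"] + ["ACGTTCGTCC"] * 3
corrected, assignment = iterative_reassign(draft, reads)
```

In this example the reads from the second copy move off copy 0 in the first round, after which the majority vote on copy 0 restores the T at position 7 and outvotes the single-read error at position 0.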

Year: 2007    PMID: 17277413    DOI: 10.1109/TCBB.2007.1005

Source DB: PubMed    Journal: IEEE/ACM Trans Comput Biol Bioinform    ISSN: 1545-5963    Impact factor: 3.710


  3 in total

1.  Genome assembly reborn: recent computational challenges.

Authors:  Mihai Pop
Journal:  Brief Bioinform       Date:  2009-05-29       Impact factor: 11.622

2.  SEQuel: improving the accuracy of genome assemblies.

Authors:  Roy Ronen; Christina Boucher; Hamidreza Chitsaz; Pavel Pevzner
Journal:  Bioinformatics       Date:  2012-06-15       Impact factor: 6.937

3.  Repeat-aware modeling and correction of short read errors.

Authors:  Xiao Yang; Srinivas Aluru; Karin S Dorman
Journal:  BMC Bioinformatics       Date:  2011-02-15       Impact factor: 3.169
