
Separation of nearly identical repeats in shotgun assemblies using defined nucleotide positions, DNPs.

Martti T Tammi, Erik Arner, Tom Britton, Björn Andersson.

Abstract

An increasingly important problem in genome sequencing is the failure of commonly used shotgun assembly programs to correctly assemble repetitive sequences. Assembling non-repetitive regions, or regions containing repeats considerably shorter than the average read length, is in practice straightforward, whereas longer repeats have remained a difficult problem. Here we present a statistical method to separate arbitrarily long, almost identical repeats, which makes it possible to correctly assemble complex repetitive regions. The differences between repeat units may be as low as 1%, and the sequencing error rate may be up to ten times higher. The method is based on the realization that comparing only a subset of the overlapping sequences in a data set at a time does not generate enough information for a conclusive analysis. Our method instead uses optimal multi-alignments consisting of all the overlaps of each read. This makes it possible to determine defined nucleotide positions, DNPs, which constitute the differences between the repeat units. Differences between repeats are distinguished from sequencing errors using statistical methods, where the probabilities of obtaining certain combinations of candidate DNPs are calculated from the information in the multi-alignments. The use of DNPs and combinations of DNPs allows optimal and rapid assembly of repeated regions. This method can separate repeats that differ in only two positions within a read length, which is the theoretical limit for repeat separation. We predict that this method will be highly useful in future shotgun sequencing projects.
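The abstract describes telling repeat differences apart from sequencing errors by asking how likely particular combinations of candidate DNPs are. The sketch below is a simplified illustration under our own assumptions, not the statistic published in the paper: reads are rows of an already computed multi-alignment ('.' marks positions a read does not cover), errors are independent substitutions at one fixed rate spread evenly over the three other bases, and a plain binomial tail stands in for the authors' probability calculation. All function names (`candidate_columns`, `column_pvalue`, `pair_pvalue`) are hypothetical.

```python
from collections import Counter
from math import comb


def binom_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def candidate_columns(alignment: list[str], min_support: int = 2) -> dict[int, str]:
    """Map column -> minority base for columns where a non-consensus base
    is seen in at least `min_support` reads (a candidate DNP)."""
    candidates = {}
    for col in range(len(alignment[0])):
        counts = Counter(row[col] for row in alignment if row[col] != ".")
        if len(counts) < 2:
            continue  # every covering read agrees: no candidate here
        (_, _), (minority, support) = counts.most_common(2)
        if support >= min_support:
            candidates[col] = minority
    return candidates


def column_pvalue(alignment, candidates, col, error_rate):
    """Chance that at least the observed number of reads show this specific
    minority base by sequencing error alone (simplified assumption:
    errors are uniform over the 3 other bases, so p = error_rate / 3)."""
    covering = [row for row in alignment if row[col] != "."]
    k = sum(row[col] == candidates[col] for row in covering)
    return binom_tail(len(covering), k, error_rate / 3)


def pair_pvalue(alignment, candidates, col_a, col_b, error_rate):
    """Chance that at least the observed number of reads carry the minority
    base at BOTH columns by error (independent errors: p = (error_rate/3)^2)."""
    covering = [r for r in alignment if r[col_a] != "." and r[col_b] != "."]
    k = sum(
        r[col_a] == candidates[col_a] and r[col_b] == candidates[col_b]
        for r in covering
    )
    return binom_tail(len(covering), k, (error_rate / 3) ** 2)


if __name__ == "__main__":
    # Toy multi-alignment of reads from two repeat copies, A and B.
    alignment = [
        "ACGTACGTAC",  # copy A
        "ACGTTCGTGC",  # copy B
        "ACGTTCGTGC",  # copy B
        "ACGTACGTAC",  # copy A
        "ACGTTCGTGC",  # copy B
        "ACGAACGTAC",  # copy A, lone sequencing error at column 3
        "ACGTACGTAC",  # copy A
    ]
    cands = candidate_columns(alignment)
    print(cands)  # {4: 'T', 8: 'G'}; column 3 has support 1, so it is dropped
    print(column_pvalue(alignment, cands, 4, 0.1))
    print(pair_pvalue(alignment, cands, 4, 8, 0.1))
```

With a 10% error rate in this toy model, three reads sharing a deviating base at a single column are not conclusive on their own (about 1.2 × 10⁻³), whereas the same three reads deviating together at two columns are very unlikely to arise by chance (about 5 × 10⁻⁸). This illustrates why combinations of candidate DNPs carry far more evidence than single positions.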

Year:  2002        PMID: 11934736     DOI: 10.1093/bioinformatics/18.3.379

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


Related articles (13 in total; the first 10 are listed):

1.  Correcting errors in shotgun sequences.

Authors:  Martti T Tammi; Erik Arner; Ellen Kindlund; Björn Andersson
Journal:  Nucleic Acids Res       Date:  2003-08-01       Impact factor: 16.971

2.  Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs.

Authors:  Bastien Chevreux; Thomas Pfisterer; Bernd Drescher; Albert J Driesel; Werner E G Müller; Thomas Wetter; Sándor Suhai
Journal:  Genome Res       Date:  2004-05-12       Impact factor: 9.043

3.  Genome assembly reborn: recent computational challenges.

Authors:  Mihai Pop
Journal:  Brief Bioinform       Date:  2009-05-29       Impact factor: 11.622

4.  Deep repeat resolution-the assembly of the Drosophila Histone Complex.

Authors:  Philipp Bongartz; Siegfried Schloissnig
Journal:  Nucleic Acids Res       Date:  2019-02-20       Impact factor: 16.971

5.  SEQuel: improving the accuracy of genome assemblies.

Authors:  Roy Ronen; Christina Boucher; Hamidreza Chitsaz; Pavel Pevzner
Journal:  Bioinformatics       Date:  2012-06-15       Impact factor: 6.937

6.  SeqEntropy: genome-wide assessment of repeats for short read sequencing.

Authors:  Hsueh-Ting Chu; William W L Hsiao; Theresa T H Tsao; D Frank Hsu; Chaur-Chin Chen; Sheng-An Lee; Cheng-Yan Kao
Journal:  PLoS One       Date:  2013-03-27       Impact factor: 3.240

7.  Shotgun haplotyping: a novel method for surveying allelic sequence variation.

Authors:  Sarah J Lindsay; James K Bonfield; Matthew E Hurles
Journal:  Nucleic Acids Res       Date:  2005-10-12       Impact factor: 16.971

8.  Database of Trypanosoma cruzi repeated genes: 20,000 additional gene variants.

Authors:  Erik Arner; Ellen Kindlund; Daniel Nilsson; Fatima Farzana; Marcela Ferella; Martti T Tammi; Björn Andersson
Journal:  BMC Genomics       Date:  2007-10-26       Impact factor: 3.969

9.  DNPTrapper: an assembly editing tool for finishing and analysis of complex repeat regions.

Authors:  Erik Arner; Martti T Tammi; Anh-Nhi Tran; Ellen Kindlund; Björn Andersson
Journal:  BMC Bioinformatics       Date:  2006-03-20       Impact factor: 3.169

10.  Genome assembly forensics: finding the elusive mis-assembly.

Authors:  Adam M Phillippy; Michael C Schatz; Mihai Pop
Journal:  Genome Biol       Date:  2008-03-14       Impact factor: 13.583