A quantitative comparison of DNA sequence assembly programs.

M J Miller, J I Powell.

Abstract

We have compared 11 sequence assembly programs for the accuracy and reproducibility with which they assemble DNA fragments into a completed sequence. To test the assemblers under controlled conditions, the rat multidrug resistance (RATMDRM) gene sequence was randomly divided into overlapping 200- to 400-base fragments. Various degrees of error, in the form of misidentified bases, missed bases, and duplicated bases, were randomly added to these fragments. The probability of an error, and the type of error, were modified using an error distribution template that was developed by comparing the original fragments used to sequence RATMDRM with the final, edited sequence stored in GenBank. Independent sets of fragments were then given from 0 to 15% error, and assembly was attempted on each set. The quality of each assembly was evaluated by counting the differences between the assembled sequence and the original sequence. Tests were also done to determine whether the order in which fragments were added to a project affected the final sequence and whether the quality of assembly was sequence dependent. Similar results were obtained using other, unrelated sequences. The programs could be roughly divided into three groups based on the accuracy and reproducibility of assembly. Three (GCG, FAB, and AutoAssembler) consistently produced consensus sequences of low error and high reproducibility. Intermediate results were obtained with five other programs (Sequencher, AssemblyLIGN, XBAP, SeqMan, and AutoAssembler in a mode that made use of an external special processor). Less satisfactory results were obtained with the remaining three programs (GeneWorks, GENeration, and PC/Gene). The ability of the programs to edit the assembled sequence was also compared. Five of the programs were able to display and edit automatic sequencer trace files. The Sequencher program had a particularly well-designed sequence editor that allowed rapid examination and correction of assembly errors.
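The test-data protocol in the abstract (random overlapping fragments, per-base errors of three kinds, scoring by counting differences from the original) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' software: the function names `make_fragments`, `add_errors`, and `count_differences` are hypothetical, the `coverage` parameter is invented, equal weights for the three error types stand in for the study's empirically derived error template, and Levenshtein edit distance is assumed as the way to "count differences".

```python
import random


def make_fragments(seq, min_len=200, max_len=400, coverage=5, rng=None):
    """Sample overlapping fragments of min_len..max_len bases at random
    positions until roughly `coverage`-fold coverage is reached.
    `coverage` is a hypothetical parameter, not a value from the study."""
    rng = rng if rng is not None else random.Random()
    fragments, total = [], 0
    while total < coverage * len(seq):
        length = rng.randint(min_len, max_len)
        start = rng.randint(0, max(0, len(seq) - length))
        frag = seq[start:start + length]
        fragments.append(frag)
        total += len(frag)
    return fragments


def add_errors(frag, rate, weights=(1, 1, 1), rng=None):
    """Corrupt each base with probability `rate`. The error type is drawn from
    substitution (misidentified base), deletion (missed base) and duplication
    (duplicated base) using `weights`; equal weights are an ASSUMPTION standing
    in for the study's empirical error distribution template."""
    rng = rng if rng is not None else random.Random()
    out = []
    for base in frag:
        if rng.random() >= rate:
            out.append(base)  # base kept unchanged
            continue
        kind = rng.choices(("sub", "del", "dup"), weights=weights)[0]
        if kind == "sub":
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        elif kind == "dup":
            out.append(base + base)
        # "del": the base is simply dropped
    return "".join(out)


def count_differences(a, b):
    """Levenshtein edit distance (assumed scoring): the minimum number of
    substitutions, insertions and deletions turning `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    rng = random.Random(1994)
    original = "".join(rng.choice("ACGT") for _ in range(2000))
    fragments = make_fragments(original, rng=rng)
    for rate in (0.0, 0.05, 0.15):
        noisy = [add_errors(f, rate, rng=rng) for f in fragments]
        # Differences introduced into the first fragment at this error rate;
        # a full benchmark would instead assemble `noisy` with each program
        # and score the consensus with count_differences(consensus, original).
        print(rate, count_differences(fragments[0], noisy[0]))
```

Drawing the error type separately from the error probability mirrors the abstract's distinction between "the probability of an error" and "the type of error", so a real template could replace the uniform `weights` without changing the rest of the sketch.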

Year: 1994    PMID: 8790470    DOI: 10.1089/cmb.1994.1.257

Source DB: PubMed    Journal: J Comput Biol    ISSN: 1066-5277    Impact factor: 1.479
