Sequence errors described in GenBank: a means to determine the accuracy of DNA sequence interpretation.

S A Krawetz

Abstract

The accuracy of nucleic acid sequence data interpretation was determined by assessing and quantifying the discrepancies reported in the GenBank database. This permitted the calculation of an Error Rate (ER) for nucleic acid sequence determination. If one assumes that most entries (TB, Total Bases) were independently verified or those without reported discrepancies were correct, the ER is 0.368 errors per 1000 bases. However, if one assumes that only those sequences with reported discrepancies (TBIQ, Total Bases from entries In Question) were verified and are thus correct, the ER is 2.887 errors per 1000 bases. This establishes the first set of limit boundaries of the ER for sequence interpretation and sequence errors within the GenBank database and provides the foundation for future assessments and the monitoring of sequence data accumulation. In addition, the ER measure provides a basis to evaluate the efficiency and merit of present and future automated nucleic acid sequencing technologies which will have a direct impact upon the final outcome of the "Human Genome Initiative".
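The two bounds quoted in the abstract can be read as the same kind of ratio taken over different denominators. The sketch below is a minimal illustration of that arithmetic, not the author's procedure: the abstract does not give the raw error or base counts, so the counts in the usage example are hypothetical, and the function names are invented here. It also assumes, as the abstract suggests but does not state outright, that both bounds share one error count in the numerator.

```python
def error_rate_per_kb(errors: int, bases: int) -> float:
    """Errors per 1000 bases: ER = 1000 * errors / bases."""
    if bases <= 0:
        raise ValueError("bases must be positive")
    if errors < 0:
        raise ValueError("errors must be non-negative")
    return 1000.0 * errors / bases


def implied_fraction_in_question(er_lower: float, er_upper: float) -> float:
    """Estimate TBIQ / TB from the two published bounds.

    Assumption (not stated in the abstract): both bounds use the same
    error count E, so er_lower = 1000*E/TB and er_upper = 1000*E/TBIQ,
    which gives TBIQ / TB = er_lower / er_upper.
    """
    if er_lower <= 0 or er_upper <= 0:
        raise ValueError("error rates must be positive")
    return er_lower / er_upper


# Bounds as reported in the abstract (errors per 1000 bases).
ER_LOWER = 0.368   # denominator: TB, all bases in the database
ER_UPPER = 2.887   # denominator: TBIQ, bases from entries in question

if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(error_rate_per_kb(errors=10, bases=20_000))
    # Share of all bases that would sit in entries with reported discrepancies.
    print(round(implied_fraction_in_question(ER_LOWER, ER_UPPER), 4))
```

Under the shared-numerator assumption, the ratio of the two bounds suggests that entries with reported discrepancies accounted for roughly one eighth of the bases surveyed.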

Year:  1989        PMID: 2734106      PMCID: PMC317871          DOI: 10.1093/nar/17.10.3951

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


  8 in total

1.  The GenBank genetic sequence data bank.

Authors:  H S Bilofsky; C Burks
Journal:  Nucleic Acids Res       Date:  1988-03-11       Impact factor: 16.971

2.  The EMBL data library.

Authors:  G N Cameron
Journal:  Nucleic Acids Res       Date:  1988-03-11       Impact factor: 16.971

3.  Fidelity of DNA synthesis by the Thermus aquaticus DNA polymerase.

Authors:  K R Tindall; T A Kunkel
Journal:  Biochemistry       Date:  1988-08-09       Impact factor: 3.162

4.  A new method for sequencing DNA.

Authors:  A M Maxam; W Gilbert
Journal:  Proc Natl Acad Sci U S A       Date:  1977-02       Impact factor: 11.205

5.  Automatic reading of DNA sequencing gel autoradiographs using a large format digital scanner.

Authors:  J K Elder; D K Green; E M Southern
Journal:  Nucleic Acids Res       Date:  1986-01-10       Impact factor: 16.971

6.  Computer video acquisition and analysis system for biological data.

Authors:  T P Keenan; S A Krawetz
Journal:  Comput Appl Biosci       Date:  1988-03

7.  Fidelity of mammalian DNA polymerases.

Authors:  T A Kunkel; L A Loeb
Journal:  Science       Date:  1981-08-14       Impact factor: 47.728

8.  DNA sequencing with chain-terminating inhibitors.

Authors:  F Sanger; S Nicklen; A R Coulson
Journal:  Proc Natl Acad Sci U S A       Date:  1977-12       Impact factor: 11.205

  7 in total

1.  Assignment of position-specific error probability to primary DNA sequence data.

Authors:  C B Lawrence; V V Solovyev
Journal:  Nucleic Acids Res       Date:  1994-04-11       Impact factor: 16.971

2.  HOVERGEN: a database of homologous vertebrate genes.

Authors:  L Duret; D Mouchiroud; M Gouy
Journal:  Nucleic Acids Res       Date:  1994-06-25       Impact factor: 16.971

3.  A modular assembly cloning technique (aided by the BIOF software tool) for seamless and error-free assembly of long DNA fragments.

Authors:  Nadezhda A Orlova; Alexandre V Orlov; Ivan I Vorobiev
Journal:  BMC Res Notes       Date:  2012-06-18

4.  Error and error mitigation in low-coverage genome assemblies.

Authors:  Melissa J Hubisz; Michael F Lin; Manolis Kellis; Adam Siepel
Journal:  PLoS One       Date:  2011-02-14       Impact factor: 3.240

5.  Estimating variation within the genes and inferring the phylogeny of 186 sequenced diverse Escherichia coli genomes.

Authors:  Rolf S Kaas; Carsten Friis; David W Ussery; Frank M Aarestrup
Journal:  BMC Genomics       Date:  2012-10-31       Impact factor: 3.969

6.  Control control control: a reassessment and comparison of GenBank and chromatogram mtDNA sequence variation in Baltic grey seals (Halichoerus grypus).

Authors:  Katharina Fietz; Jeff A Graves; Morten Tange Olsen
Journal:  PLoS One       Date:  2013-08-16       Impact factor: 3.240

7.  Comprehensive assessment of the quality of Salmonella whole genome sequence data available in public sequence databases using the Salmonella in silico Typing Resource (SISTR).

Authors:  James Robertson; Catherine Yoshida; Peter Kruczkiewicz; Celine Nadon; Anil Nichani; Eduardo N Taboada; John Howard Eagles Nash
Journal:  Microb Genom       Date:  2018-01-17
