
Corruption of genomic databases with anomalous sequence.

E D Lamperti, J M Kittelberger, T F Smith, L Villa-Komaroff.

Abstract

We describe evidence that DNA sequences from vectors used for cloning and sequencing have been incorporated accidentally into eukaryotic entries in the GenBank database. These incorporations were not restricted to one type of vector or to a single mechanism. Many minor instances may have been the result of simple editing errors, but some entries contained large blocks of vector sequence that had been incorporated by contamination or other accidents during cloning. Some cases involved unusual rearrangements and areas of vector distant from the normal insertion sites. Matches to vector were found in 0.23% of 20,000 sequences analyzed in GenBank Release 63. Although the possibility of anomalous sequence incorporation has been recognized since the inception of GenBank and should be easy to avoid, recent evidence suggests that this problem is increasing more quickly than the database itself. The presence of anomalous sequence may have serious consequences for the interpretation and use of database entries, and will have an impact on issues of database management. The incorporated vector fragments described here may also be useful for a crude estimate of the fidelity of sequence information in the database. In alignments with well-defined ends, the matching sequences showed 96.8% identity to vector; when poorer matches with arbitrary limits were included, the aggregate identity to vector sequence was 94.8%.
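The abstract rests on two computations: flagging stretches of an entry that match a cloning or sequencing vector, and summarizing those matches as percent identity. The Python sketch below illustrates both under stated assumptions. It uses a simple exact shared k-mer screen, which is a swapped-in technique and not the alignment procedure used in the paper. The default k=12, the toy sequences, and all function names are hypothetical. Gap columns count as mismatches. "Aggregate identity" is read here as pooled identical columns over pooled aligned columns, which is one plausible interpretation rather than the authors' stated formula.

```python
"""Toy sketch: flag vector-derived stretches in a sequence entry and report
percent identity. Illustrative only; not the procedure used in the paper."""
from typing import Iterable, List, Set, Tuple

# Complement table for A/C/G/T/N (other IUPAC ambiguity codes are left as-is).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement, so vector inserted in either orientation is caught."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def vector_kmer_hits(query: str, vector: str, k: int = 12) -> List[int]:
    """Start positions in `query` whose k-mer also occurs in `vector` (either strand).

    Exact-match screen only: the circular junction of the vector is ignored,
    and k=12 is an arbitrary, hypothetical default.
    """
    kmers: Set[str] = set()
    for strand in (vector.upper(), reverse_complement(vector)):
        kmers.update(strand[i:i + k] for i in range(len(strand) - k + 1))
    q = query.upper()
    return [i for i in range(len(q) - k + 1) if q[i:i + k] in kmers]


def _identical_columns(a: str, b: str) -> int:
    """Count identical, non-gap columns in two equal-length aligned strings."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(a.upper(), b.upper()) if x == y and x != "-")


def percent_identity(a: str, b: str) -> float:
    """Identical columns / all columns x 100; gap columns count as mismatches."""
    if not a:
        raise ValueError("empty alignment")
    return 100.0 * _identical_columns(a, b) / len(a)


def aggregate_identity(alignments: Iterable[Tuple[str, str]]) -> float:
    """Pooled identity over many alignments: total identical columns divided by
    total columns, so longer alignments carry more weight than short ones."""
    matches = columns = 0
    for a, b in alignments:
        matches += _identical_columns(a, b)
        columns += len(a)
    if columns == 0:
        raise ValueError("no aligned columns")
    return 100.0 * matches / columns


if __name__ == "__main__":
    vector_fragment = "GGGGACGTACGTCCCC"  # toy stand-in, not a real cloning vector
    entry = "TTTTACGTACGTTTTT"
    print(vector_kmer_hits(entry, vector_fragment, k=8))  # [4]
    print(percent_identity("ACGTACGT", "ACGTACGA"))  # 87.5
    print(round(0.0023 * 20_000))  # 46 entries, the abstract's 0.23% of 20,000
```

Read this way, the abstract's 96.8% identity for well-bounded alignments corresponds to roughly 3.2 non-identical columns per 100 aligned bases. A real screen would presumably search a curated vector collection and confirm candidate hits by alignment rather than rely on exact k-mers alone.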

Year: 1992 | PMID: 1614861 | PMCID: PMC336916 | DOI: 10.1093/nar/20.11.2741

Source DB: PubMed | Journal: Nucleic Acids Res | ISSN: 0305-1048 | Impact factor: 16.971
