A quality control algorithm for DNA sequencing projects.

O White, T Dunning, G Sutton, M Adams, J C Venter, C Fields.

Abstract

Heterologous DNA sequences from rearrangements with the genomes of host cells, genomic fragments from hybrid cells, or impure tissue sources can threaten the purity of libraries that are derived from RNA or DNA. Hybridization methods can only detect contaminants from known or suspected heterologous sources, and whole library screening is technically very difficult. Detection of contaminating heterologous clones by sequence alignment is only possible when related sequences are present in a known database. We have developed a statistical test to identify heterologous sequences that is based on the differences in hexamer composition of DNA from different organisms. This test does not require that sequences similar to potential heterologous contaminants are present in the database, and can in principle detect contamination by previously unknown organisms. We have applied this test to the major public expressed sequence tag (EST) data sets to evaluate its utility as a quality control measure and a peer evaluation tool. There is detectable heterogeneity in most human and C. elegans EST data sets but it is not apparently associated with cross-species contamination. However, there is direct evidence for both yeast and bacterial sequence contamination in some public database sequences annotated as human. Results obtained with the hexamer test have been confirmed with similarity searches using sequences from the relevant data sets.
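
The abstract does not state the test statistic, so the following is only a minimal sketch of the general idea of separating sequences by hexamer composition. It assumes a simple mean log-likelihood ratio between two hexamer frequency models with add-one smoothing; the function names (`kmer_counts`, `train_model`, `log_odds`) and the pseudocount are hypothetical choices for illustration, not the authors' published method.

```python
from collections import Counter
from itertools import product
from math import log

K = 6  # hexamers, as named in the abstract
ALPHABET = "ACGT"
ALL_KMERS = ["".join(p) for p in product(ALPHABET, repeat=K)]  # 4**6 = 4096 words


def kmer_counts(seq, k=K):
    """Count overlapping k-mers, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(c in ALPHABET for c in word):
            counts[word] += 1
    return counts


def train_model(seqs, pseudocount=1.0):
    """Estimate smoothed hexamer probabilities from reference sequences.

    The pseudocount is an assumption of this sketch; it keeps unseen
    hexamers from producing a zero probability.
    """
    total = Counter()
    for s in seqs:
        total.update(kmer_counts(s))
    denom = sum(total.values()) + pseudocount * len(ALL_KMERS)
    return {w: (total[w] + pseudocount) / denom for w in ALL_KMERS}


def log_odds(seq, host, other):
    """Mean per-hexamer log-likelihood ratio; positive values favour `host`."""
    counts = kmer_counts(seq)
    n = sum(counts.values())
    if n == 0:
        return 0.0  # sequence too short or fully ambiguous: no evidence
    return sum(c * log(host[w] / other[w]) for w, c in counts.items()) / n


# Toy reference sets (made up for illustration, not real genomic data).
host_model = train_model(["ATATATTTAAATATTA" * 20])
other_model = train_model(["GCGCGGCCGCGGGCCC" * 20])

at_rich_score = log_odds("ATATTTAAATAT", host_model, other_model)
gc_rich_score = log_odds("GCGGCCGCGG", host_model, other_model)
```

In this toy setup a sequence resembling the host reference scores above zero and one resembling the other reference scores below zero. A real quality control test would also need a calibrated threshold and reference sets large enough to estimate 4096 hexamer frequencies reliably.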

Year: 1993; PMID: 8367301; PMCID: PMC309901; DOI: 10.1093/nar/21.16.3829

Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971


  7 in total

1.  Detecting and analyzing DNA sequencing errors: toward a higher quality of the Bacillus subtilis genome sequence.

Authors:  C Médigue; M Rose; A Viari; A Danchin
Journal:  Genome Res       Date:  1999-11       Impact factor: 9.043

Review 2.  Comparative analysis of environmental sequences: potential and challenges.

Authors:  Konrad U Foerstner; Christian von Mering; Peer Bork
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2006-03-29       Impact factor: 6.237

3.  VecScreen_plus_taxonomy: imposing a tax(onomy) increase on vector contamination screening.

Authors:  Alejandro A Schäffer; Eric P Nawrocki; Yoon Choi; Paul A Kitts; Ilene Karsch-Mizrachi; Richard McVeigh
Journal:  Bioinformatics       Date:  2018-03-01       Impact factor: 6.937

4.  Contamination of cDNA libraries and expressed-sequence-tags databases.

Authors:  M Dean; R Allikmets
Journal:  Am J Hum Genet       Date:  1995-11       Impact factor: 11.025

5.  Classifying coding DNA with nucleotide statistics.

Authors:  Nicolas Carels; Diego Frías
Journal:  Bioinform Biol Insights       Date:  2009-10-28

6.  On the species of origin: diagnosing the source of symbiotic transcripts.

Authors:  P T Hraber; J W Weller
Journal:  Genome Biol       Date:  2001-08-23       Impact factor: 13.583

7.  Mobilomics in Saccharomyces cerevisiae strains.

Authors:  Giulia Menconi; Giovanni Battaglia; Roberto Grossi; Nadia Pisanti; Roberto Marangoni
Journal:  BMC Bioinformatics       Date:  2013-03-20       Impact factor: 3.169
