Establishing a method of vector contamination identification in database sequences.

G A Seluja, A Farmer, M McLeod, C Harger, P A Schad.

Abstract

MOTIVATION: The nucleotide sequence databases are invaluable tools for both the private and the academic research communities, supporting uses that range from sequence retrieval to homology searching. The databases face several data-quality issues, such as sequencing artifacts and errors. We investigated a major source of these errors, namely the presence of vector-contaminated sequences.
RESULTS: Using a panel of 180 vector polylinker sequences, we found 3029 vector-matching sequences (0.36% of the total) in GenBank Release 95-96, with an average vector-matching length of 72 nucleotides. The number of vector-contaminated sequences has grown along with the database; however, the percentage of contaminated sequences has remained approximately constant, averaging 0.28% from 1982 to 1996.
AVAILABILITY: The database of vector polylinker sequences can be searched by sequence similarity at http://seqsim.ncgr.org/vector/
CONTACT: gas@molinfo.com
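The RESULTS paragraph reports two screening metrics: the percentage of database sequences with a vector match and the mean vector-matching length. The following is a minimal sketch of that kind of screen on toy data. It is not the authors' pipeline: they searched a panel of 180 polylinker sequences by sequence similarity, whereas this sketch substitutes exact longest-common-substring matching on both strands. The 20-nt cutoff (`min_len`), all function names, and the toy sequences are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean

# Complement table for A/C/G/T plus N (an ambiguous base maps to itself).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


@dataclass
class VectorHit:
    seq_id: str     # identifier of the screened database sequence
    vector_id: str  # identifier of the polylinker it matched
    start: int      # 0-based start of the match in the screened sequence
    length: int     # length of the exact vector-matching stretch (nt)


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def longest_shared_stretch(a: str, b: str) -> tuple[int, int]:
    """Longest common substring by dynamic programming.

    Returns (start position in a, length); length is 0 if nothing is shared.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return best_end - best_len, best_len


def screen(queries: dict[str, str], vectors: dict[str, str],
           min_len: int = 20) -> list[VectorHit]:
    """Report the best vector match (either strand) for each query sequence.

    min_len is a hypothetical cutoff, not a value taken from the paper.
    """
    hits = []
    for qid, qseq in queries.items():
        qseq = qseq.upper()
        best = None
        for vid, vseq in vectors.items():
            vseq = vseq.upper()
            # A vector fragment can appear in either orientation, so test both strands.
            for probe in (vseq, reverse_complement(vseq)):
                start, length = longest_shared_stretch(qseq, probe)
                if length >= min_len and (best is None or length > best.length):
                    best = VectorHit(qid, vid, start, length)
        if best is not None:
            hits.append(best)
    return hits


def percent_contaminated(n_hits: int, n_total: int) -> float:
    return 100.0 * n_hits / n_total


if __name__ == "__main__":
    # Toy polylinker built from EcoRI, SacI, KpnI, BamHI, XbaI and HindIII sites (illustrative only).
    vectors = {"toy_mcs": "GAATTCGAGCTCGGTACCGGATCCTCTAGAAAGCTT"}
    queries = {
        "est_1": "ATGCGTACGTTAGC" + "GAATTCGAGCTCGGTACCGGATCCTCTAGA" + "TTTGGGCCCAAA",
        "est_2": "ATGAAACCCGGGTTTATGAAACCCGGGTTT",
    }
    hits = screen(queries, vectors)
    for h in hits:
        print(h)
    print(f"{percent_contaminated(len(hits), len(queries)):.2f}% contaminated")
    print(f"mean vector-matching length: {mean(h.length for h in hits)} nt")
```

Exact matching misses vector stretches that contain sequencing errors, which is one reason alignment-based similarity searching (tolerating mismatches and gaps) is the practical choice; the quadratic dynamic program here is only suitable for short toy inputs.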

Year: 1999
PMID: 10089195
DOI: 10.1093/bioinformatics/15.2.106

Source DB: PubMed
Journal: Bioinformatics
ISSN: 1367-4803
Impact factor: 6.937


  7 in total

1.  Bioinformatics and clinical informatics: the imperative to collaborate.

Authors:  I S Kohane
Journal:  J Am Med Inform Assoc       Date:  2000 Sep-Oct       Impact factor: 4.497

2.  VecScreen_plus_taxonomy: imposing a tax(onomy) increase on vector contamination screening.

Authors:  Alejandro A Schäffer; Eric P Nawrocki; Yoon Choi; Paul A Kitts; Ilene Karsch-Mizrachi; Richard McVeigh
Journal:  Bioinformatics       Date:  2018-03-01       Impact factor: 6.937

3.  SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read.

Authors:  Juan Falgueras; Antonio J Lara; Noé Fernández-Pozo; Francisco R Cantón; Guillermo Pérez-Trabado; M Gonzalo Claros
Journal:  BMC Bioinformatics       Date:  2010-01-20       Impact factor: 3.169

4.  Expressed sequence tags with cDNA termini: previously overlooked resources for gene annotation and transcriptome exploration in Chlamydomonas reinhardtii.

Authors:  Chun Liang; Yuansheng Liu; Lin Liu; Adam C Davis; Yingjia Shen; Qingshun Quinn Li
Journal:  Genetics       Date:  2008-05       Impact factor: 4.562

5.  CleanEST: a database of cleansed EST libraries.

Authors:  Byungwook Lee; Gwangsik Shin
Journal:  Nucleic Acids Res       Date:  2008-10-02       Impact factor: 16.971

6.  An optimized procedure greatly improves EST vector contamination removal.

Authors:  Yi-An Chen; Chang-Chun Lin; Chin-Di Wang; Huan-Bin Wu; Pei-Ing Hwang
Journal:  BMC Genomics       Date:  2007-11-13       Impact factor: 3.969

7.  Review: Microarray probes and probe sets.

Authors:  Hongfang Liu; Ionut Bebu; Xin Li
Journal:  Front Biosci (Elite Ed)       Date:  2010-01-01