Establishing a method of vector contamination identification in database sequences.

G A Seluja, A Farmer, M McLeod, C Harger, P A Schad.

Abstract

MOTIVATION: The nucleotide sequence databases are invaluable tools both for the private and the academic research communities, from the retrieval of sequences to homology searching. Several issues related to data quality, such as the existence of sequencing artifacts and errors, are facing the databases. We investigated a major source of these errors, i.e. the presence of vector-contaminated sequences.
RESULTS: Using a panel of 180 vector polylinker sequences, we found 0.36% or 3029 vector-matching sequences in GenBank Release 95-96, with an average vector-matching length of 72 nucleotides. The number of vector-contaminated sequences has been growing with the database; however, the percent contamination has remained approximately constant at an average of 0.28% from 1982 to 1996.
AVAILABILITY: Access to the database of vector polylinker sequences via sequence similarity searching is available at http://seqsim.ncgr.org/vector/
CONTACT: gas@molinfo.com
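
The screening idea in the abstract (compare each database sequence against a panel of vector polylinker sequences, then report the contaminated fraction and the average match length) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses exact longest-common-substring matching from Python's `difflib` as a stand-in for the sequence similarity search mentioned in the abstract, and the function names, the `min_len` threshold of 30, and the toy polylinker string are all hypothetical, not taken from the paper.

```python
from difflib import SequenceMatcher

# Complement table for DNA; anything unexpected is mapped to N.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(_COMPLEMENT.get(base, "N") for base in reversed(seq.upper()))


def longest_vector_match(sequence, vector_panel, min_len=30):
    """Return (vector_name, match_length) for the longest exact match of at
    least min_len bases on either strand, or None if no vector qualifies.

    min_len is a hypothetical threshold chosen for this sketch.
    """
    sequence = sequence.upper()
    best = None
    for name, vector in vector_panel.items():
        for strand in (vector.upper(), reverse_complement(vector)):
            matcher = SequenceMatcher(None, sequence, strand, autojunk=False)
            match = matcher.find_longest_match(0, len(sequence), 0, len(strand))
            if match.size >= min_len and (best is None or match.size > best[1]):
                best = (name, match.size)
    return best


def contamination_summary(records, vector_panel, min_len=30):
    """Summarise how many records match the panel and how long the matches are."""
    hits = [longest_vector_match(seq, vector_panel, min_len) for seq in records.values()]
    lengths = [hit[1] for hit in hits if hit is not None]
    total = len(records)
    return {
        "contaminated": len(lengths),
        "percent": 100.0 * len(lengths) / total if total else 0.0,
        "mean_match_length": sum(lengths) / len(lengths) if lengths else 0.0,
    }


if __name__ == "__main__":
    # Toy panel with a single illustrative polylinker-like string (assumption).
    panel = {"toy": "GAATTCGAGCTCGGTACCCGGGGATCCTCTAGAGTCGACCTGCAGGCATGCAAGCTT"}
    records = {
        "clean": "ATGC" * 20,
        "contaminated": "ACGT" * 5 + panel["toy"][:40] + "AAAA",
    }
    print(contamination_summary(records, panel))
```

A real screen would normally rely on an alignment tool that tolerates mismatches and gaps; exact matching is used here only to keep the sketch short and dependency-free.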

Substances: DNA

Year: 1999 PMID: 10089195 DOI: 10.1093/bioinformatics/15.2.106

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited by

7 in total

1. Bioinformatics and clinical informatics: the imperative to collaborate.

Authors: I S Kohane
Journal: J Am Med Inform Assoc Date: 2000 Sep-Oct Impact factor: 4.497

2. VecScreen_plus_taxonomy: imposing a tax(onomy) increase on vector contamination screening.

Authors: Alejandro A Schäffer; Eric P Nawrocki; Yoon Choi; Paul A Kitts; Ilene Karsch-Mizrachi; Richard McVeigh
Journal: Bioinformatics Date: 2018-03-01 Impact factor: 6.937

3. SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read.

Authors: Juan Falgueras; Antonio J Lara; Noé Fernández-Pozo; Francisco R Cantón; Guillermo Pérez-Trabado; M Gonzalo Claros
Journal: BMC Bioinformatics Date: 2010-01-20 Impact factor: 3.169

4. Expressed sequence tags with cDNA termini: previously overlooked resources for gene annotation and transcriptome exploration in Chlamydomonas reinhardtii.

Authors: Chun Liang; Yuansheng Liu; Lin Liu; Adam C Davis; Yingjia Shen; Qingshun Quinn Li
Journal: Genetics Date: 2008-05 Impact factor: 4.562

5. CleanEST: a database of cleansed EST libraries.

Authors: Byungwook Lee; Gwangsik Shin
Journal: Nucleic Acids Res Date: 2008-10-02 Impact factor: 16.971

6. An optimized procedure greatly improves EST vector contamination removal.

Authors: Yi-An Chen; Chang-Chun Lin; Chin-Di Wang; Huan-Bin Wu; Pei-Ing Hwang
Journal: BMC Genomics Date: 2007-11-13 Impact factor: 3.969

7. Microarray probes and probe sets. (Review)

Authors: Hongfang Liu; Ionut Bebu; Xin Li
Journal: Front Biosci (Elite Ed) Date: 2010-01-01