VecScreen_plus_taxonomy: imposing a tax(onomy) increase on vector contamination screening.

Alejandro A Schäffer¹, Eric P Nawrocki¹, Yoon Choi¹, Paul A Kitts¹, Ilene Karsch-Mizrachi¹, Richard McVeigh¹.

Abstract

Motivation: Nucleic acid sequences in public databases should not contain vector contamination, but many sequences in GenBank do (or did) contain vectors. The National Center for Biotechnology Information uses the program VecScreen to screen submitted sequences for contamination. Additional tools are needed to distinguish true-positive (contamination) from false-positive (not contamination) VecScreen matches.
Results: A principal reason for false-positive VecScreen matches is that the sequence and the matching vector subsequence originate from closely related or identical organisms (for example, both originate in Escherichia coli). We collected information on the taxonomy of sources of vector segments in the UniVec database used by VecScreen. We used that information in two overlapping software pipelines for retrospective analysis of contamination in GenBank and for prospective analysis of contamination in new sequence submissions. Using the retrospective pipeline, we identified and corrected over 8000 contaminated sequences in the nonredundant nucleotide database. The prospective analysis pipeline has been in production use since April 2017 to evaluate some new GenBank submissions.
Availability and implementation: Data on the sources of UniVec entries were included in release 10.0 (ftp://ftp.ncbi.nih.gov/pub/UniVec/). The main software is freely available at https://github.com/aaschaffer/vecscreen_plus_taxonomy.
Contact: aschaffe@helix.nih.gov.
Supplementary information: Supplementary data are available at Bioinformatics online.
Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.
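
The taxonomic reasoning described under Results can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the `min_shared` threshold, and the simplified toy lineages are all assumptions made for illustration. The idea shown is only that a match is more likely a false positive when the submitted sequence and the source of the matching vector segment share a deep common lineage.

```python
from typing import Sequence


def shared_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the leading lineage entries that two lineages have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def likely_false_positive(
    query_lineage: Sequence[str],
    vector_source_lineage: Sequence[str],
    min_shared: int = 3,  # hypothetical threshold, not taken from the paper
) -> bool:
    """Flag a match as a likely false positive when the two organisms
    share at least `min_shared` leading ranks (i.e. are closely related)."""
    return shared_prefix_len(query_lineage, vector_source_lineage) >= min_shared


# Toy, simplified lineages (root first); real taxonomies have many more ranks.
ECOLI = ["Bacteria", "Proteobacteria", "Enterobacteriaceae",
         "Escherichia", "Escherichia coli"]
SHIGELLA = ["Bacteria", "Proteobacteria", "Enterobacteriaceae",
            "Shigella", "Shigella flexneri"]
HUMAN = ["Eukaryota", "Chordata", "Hominidae", "Homo", "Homo sapiens"]

if __name__ == "__main__":
    # Vector segment derived from E. coli, matched against three submissions.
    for name, lineage in [("E. coli", ECOLI), ("Shigella", SHIGELLA), ("human", HUMAN)]:
        verdict = likely_false_positive(lineage, ECOLI)
        print(f"{name}: likely false positive = {verdict}")
```

In this toy setting, a match between a human sequence and an E. coli-derived vector segment is not excused by taxonomy and would remain a contamination candidate, whereas matches from E. coli or a close relative would be set aside for further review.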

Year: 2018 PMID: 29069347 PMCID: PMC6030928 DOI: 10.1093/bioinformatics/btx669

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
