
Figaro: a novel statistical method for vector sequence removal.

James Robert White, Michael Roberts, James A Yorke, Mihai Pop.

Abstract

MOTIVATION: Sequences produced by automated Sanger sequencing machines frequently contain fragments of the cloning vector on their ends. Software tools currently available for identifying and removing the vector sequence require knowledge of the vector sequence, the specific splice sites and any adapter sequences used in the experiment, information that is often omitted from public databases. Furthermore, the clipping coordinates themselves are missing or incorrectly reported. For example, of the approximately 1.24 billion shotgun sequences deposited in the NCBI Trace Archive, as many as 735 million (about 60%) lack vector clipping information. Correct clipping information is essential to scientists attempting to validate, improve and even finish the increasingly large number of genomes released at a 'draft' quality level.
RESULTS: We present here Figaro, a novel software tool for identifying and removing the vector from raw sequence data without prior knowledge of the vector sequence. The vector sequence is automatically inferred by analyzing the frequency of occurrence of short oligonucleotides using Poisson statistics. We show that Figaro achieves 99.98% sensitivity when tested on approximately 1.5 million shotgun reads from Drosophila pseudoobscura. We further explore the impact of accurate vector trimming on the quality of whole-genome assemblies by re-assembling two bacterial genomes from shotgun sequences deposited in the Trace Archive. Designed as a module in large computational pipelines, Figaro is fast, lightweight and flexible.
AVAILABILITY: Figaro is released under an open-source license through the AMOS package (http://amos.sourceforge.net/Figaro).
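
The idea stated under RESULTS, inferring vector sequence from the frequency of short oligonucleotides using Poisson statistics, can be illustrated with a minimal sketch. This is not Figaro's implementation: the abstract does not give the algorithm's details, so the function names (`poisson_sf`, `infer_vector_kmers`, `clip_point`), the parameters (`k`, `prefix_len`, `alpha`), the use of read interiors as the background model, and the rule of clipping the contiguous run of flagged k-mers at the 5' end are all assumptions made for illustration.

```python
import math
from collections import Counter


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), by summing the lower tail directly.

    Adequate for the small rates used here; exp(-lam) underflows for very large lam.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def infer_vector_kmers(reads, k=8, prefix_len=30, alpha=1e-6):
    """Flag k-mers that occur in read prefixes far more often than a
    Poisson null (rate estimated from read interiors) would predict.

    Hypothetical parameters: k, prefix_len and alpha are illustrative defaults.
    """
    prefix_counts = Counter()  # number of reads whose prefix contains the k-mer
    body_counts = Counter()    # occurrences in read interiors (background)
    prefix_windows = 0
    body_windows = 0
    for read in reads:
        prefix, body = read[:prefix_len], read[prefix_len:]
        prefix_windows += max(0, len(prefix) - k + 1)
        prefix_counts.update(set(kmers(prefix, k)))
        for km in kmers(body, k):
            body_counts[km] += 1
            body_windows += 1

    flagged = set()
    for km, observed in prefix_counts.items():
        # Background rate per window, with a pseudocount so unseen k-mers get a nonzero rate.
        rate = (body_counts[km] + 1) / (body_windows + 4 ** k)
        expected = rate * prefix_windows
        if poisson_sf(observed, expected) < alpha:
            flagged.add(km)
    return flagged


def clip_point(read, flagged, k=8, prefix_len=30):
    """Return the end of the contiguous run of flagged k-mers starting at
    position 0 of the read, or 0 if the read does not start with one."""
    end = 0
    for i in range(max(0, min(prefix_len, len(read)) - k + 1)):
        if read[i:i + k] not in flagged:
            break
        end = i + k
    return end
```

Under these assumptions, a read would be trimmed as `read[clip_point(read, flagged):]`. One limitation of the sketch is that k-mers straddling the vector/insert junction can also be over-represented, so the clip point may overshoot the true junction by a few bases (fewer than `k`).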


Year: 2008; PMID: 18202027; PMCID: PMC2725436; DOI: 10.1093/bioinformatics/btm632

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937


