BACKGROUND: The decreasing costs of capillary-based Sanger sequencing and next-generation technologies, such as 454 pyrosequencing, have prompted an explosion of transcriptome projects in non-model species, where even shallow sequencing of transcriptomes can now be used to examine a range of research questions. This rapid growth in data has outstripped the ability of researchers working on non-model species to analyze and mine transcriptome data efficiently. RESULTS: Here we present a semi-automated platform, 'est2assembly', that processes raw sequence data from Sanger or 454 sequencing into a hybrid de novo assembly, annotates it and produces GMOD-compatible output, including a SeqFeature database suitable for GBrowse. Users are able to parameterize assembler variables, judge assembly quality and determine the optimal assembly for their specific needs. We used est2assembly to process Drosophila and Bicyclus public Sanger EST data and then compared them to published 454 data as well as eight new insect transcriptome collections. CONCLUSIONS: Analysis of such a wide variety of data allows us to understand how these new technologies can assist EST project design. We determine that assembler parameterization is as essential as standardized methods to judge the output of EST projects. Further, even shallow sequencing using 454 produces sufficient data to be of wide use to the community. est2assembly is an important tool to assist manual curation of gene models, which are an important resource in their own right but especially for species that are due to acquire a genome project using next-generation sequencing.