BACKGROUND: Next generation sequencing (NGS) is widely used for metagenomic and transcriptomic analyses in biodiversity research. The ease of data generation on NGS platforms has allowed researchers to perform these analyses on their own study systems. In particular, the 454 platform has become the preferred choice for PCR amplicon-based biodiversity surveys because it generates the longest reads among NGS platforms. Nevertheless, handling and organizing massive amounts of sequencing data poses a major problem for the research community, particularly when multiple researchers are involved in data acquisition and analysis. An integrated, user-friendly tool that performs quality control, read trimming, PCR primer removal, and data organization is therefore urgently needed to make data interpretation fast and manageable.

FINDINGS: We developed CANGS DB (Cleaning and Analyzing Next Generation Sequences DataBase), a flexible, stand-alone, user-friendly integrated database tool. CANGS DB is specifically designed to organize and manage the massive amounts of sequencing data arising from various NGS projects. It also provides an intuitive user interface for sequence trimming and quality control, taxonomy analysis, and rarefaction analysis. The tool can easily be adapted to handle multiple sequencing projects in parallel, each with its own sample information, amplicon sizes, primer sequences, and quality thresholds, which makes the software especially useful for non-bioinformaticians. Furthermore, CANGS DB is well suited to projects in which multiple users need to access the data. CANGS DB is available at http://code.google.com/p/cangsdb/.

CONCLUSION: CANGS DB provides a simple, user-friendly solution to process, store, and analyze 454 sequencing data. As a local database accessible through a user-friendly interface, it is an ideal tool for collaborative amplicon-based biodiversity surveys and requires no prior bioinformatics skills.
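The abstract does not say how CANGS DB cleans individual reads. As a rough illustration of the kind of per-read cleaning it refers to (PCR primer removal, quality trimming, and length filtering), here is a minimal Python sketch. It is not CANGS DB's implementation: the function names, the mismatch-tolerant prefix match for the primer, the cut-at-first-low-quality-base rule, and the default thresholds (Phred 20, 100 bases, 2 primer mismatches) are all hypothetical choices made for this example.

```python
from typing import List, Optional


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def trim_primer(seq: str, primer: str, max_mismatches: int = 2) -> Optional[str]:
    """Remove the forward primer from the 5' end (hypothetical rule:
    tolerate up to max_mismatches substitutions); return None if absent."""
    prefix = seq[:len(primer)]
    if len(prefix) < len(primer) or hamming(prefix, primer) > max_mismatches:
        return None
    return seq[len(primer):]


def quality_trim(seq: str, quals: List[int], min_quality: int = 20) -> str:
    """Cut the read at the first base whose Phred score is below min_quality."""
    for i, q in enumerate(quals):
        if q < min_quality:
            return seq[:i]
    return seq


def clean_read(seq: str, quals: List[int], primer: str,
               min_quality: int = 20, min_length: int = 100,
               max_mismatches: int = 2) -> Optional[str]:
    """Return the cleaned amplicon sequence, or None if the read is discarded."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    insert = trim_primer(seq, primer, max_mismatches)
    if insert is None:
        return None  # primer not found: discard the read
    # Keep quality scores aligned with the bases left after primer removal.
    insert_quals = quals[len(primer):]
    trimmed = quality_trim(insert, insert_quals, min_quality)
    # Discard reads with ambiguous bases or that became too short.
    if "N" in trimmed or len(trimmed) < min_length:
        return None
    return trimmed


if __name__ == "__main__":
    read = "ACGTTTGGCCAA"
    quals = [30] * 10 + [10, 10]  # last two bases are low quality
    print(clean_read(read, quals, primer="ACGT", min_length=5))
```

Running the example prints `TTGGCC`: the primer `ACGT` is removed and the two low-quality 3' bases are cut. Because primer, quality cut-off, and minimum length are parameters, the same function could be reused across projects with different settings, in the spirit of the per-project configuration the abstract describes.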
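Rarefaction analysis, which CANGS DB also offers, estimates how many OTUs would be observed if a sample had been sequenced less deeply. The minimal Python sketch below assumes OTU read counts are already available and uses the standard analytical expectation for drawing n of N reads without replacement, E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)], where N_i is the read count of OTU i. The function names are hypothetical, and CANGS DB may compute its curves differently (for example, by repeated random subsampling).

```python
from math import comb
from typing import List, Tuple


def expected_richness(otu_counts: List[int], n: int) -> float:
    """Expected number of OTUs seen in a random subsample of n reads,
    drawn without replacement from a sample with the given OTU counts."""
    total = sum(otu_counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total read count")
    denom = comb(total, n)
    # An OTU is missed only if all n reads come from the other OTUs;
    # math.comb returns 0 when n exceeds total - c, so such OTUs are always seen.
    return sum(1 - comb(total - c, n) / denom for c in otu_counts)


def rarefaction_curve(otu_counts: List[int], step: int) -> List[Tuple[int, float]]:
    """Expected richness at subsample sizes 0, step, 2*step, ... up to the total."""
    total = sum(otu_counts)
    return [(n, expected_richness(otu_counts, n)) for n in range(0, total + 1, step)]


if __name__ == "__main__":
    counts = [50, 20, 5, 1, 1]  # hypothetical OTU read counts for one sample
    for n, s in rarefaction_curve(counts, step=11):
        print(f"{n:3d} reads -> {s:.2f} expected OTUs")
```

A curve that is still rising at the full sample size suggests that deeper sequencing would reveal additional OTUs; a curve that levels off suggests the sample is close to saturation.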