Ram Vinay Pandey, Viola Nolte, Christian Schlötterer. Institut für Populationsgenetik, Veterinärmedizinische Universität Wien, Veterinärplatz 1, Vienna, Austria. christian.schloetterer@vetmeduni.ac.at.
Abstract
BACKGROUND: Next generation sequencing (NGS) technologies have substantially increased sequence output while dramatically reducing costs. In addition to its use in whole genome sequencing, the 454 GS-FLX platform is becoming a widely used tool for biodiversity surveys based on amplicon sequencing. Using NGS for biodiversity surveys requires software tools that perform quality control, trim the sequence reads, remove PCR primers, and generate input files for downstream analyses. A user-friendly software utility that carries out these steps is still lacking. FINDINGS: We developed CANGS (Cleaning and Analyzing Next Generation Sequences), a flexible and user-friendly integrated software utility designed for amplicon-based biodiversity surveys using the 454 sequencing platform. CANGS filters low-quality sequences, removes PCR primers, filters singletons, identifies barcodes, and generates input files for downstream analyses. The downstream analyses rely either on third-party software (e.g., rarefaction analyses) or on CANGS-specific scripts. The latter include modules that link each 454 sequence to the name of the closest taxonomic reference retrieved from the NCBI database and report the sequence divergence between them. The software can be easily adapted to sequencing projects with different amplicon sizes, primer sequences, and quality thresholds, which makes it especially useful for non-bioinformaticians. CONCLUSION: CANGS clips PCR primers, filters low-quality sequences, links sequences to NCBI taxonomy, and provides input files for common rarefaction analysis programs. CANGS is written in Perl, runs on Mac OS X and Linux, and is available at http://i122server.vu-wien.ac.at/pop/software.html.
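The cleaning steps listed in the abstract (quality filtering, barcode identification, primer removal, singleton filtering) can be illustrated with a minimal sketch. This is not the CANGS implementation, which is written in Perl. The function name `clean_reads`, its parameters, the default thresholds, and the use of exact prefix matching for barcodes and primers are all assumptions made for illustration, and barcodes are assumed to be of equal length.

```python
from collections import Counter


def clean_reads(reads, barcodes, forward_primer,
                min_length=50, min_mean_quality=20):
    """Hypothetical sketch of an amplicon read-cleaning pass.

    reads          -- iterable of (sequence, quality_scores) pairs
    barcodes       -- dict mapping barcode sequence -> sample name
    forward_primer -- primer expected directly after the barcode
    Returns a dict: sample name -> Counter of cleaned sequences,
    with sequences seen only once (singletons) removed.
    """
    per_sample = {}
    for seq, quals in reads:
        # 1. Quality filter: discard reads with a low mean quality score.
        if not quals or sum(quals) / len(quals) < min_mean_quality:
            continue

        # 2. Barcode identification: exact match at the start of the read.
        sample = None
        for barcode, name in barcodes.items():
            if seq.startswith(barcode):
                sample = name
                seq = seq[len(barcode):]
                break
        if sample is None:
            continue

        # 3. Primer clipping: require and remove the forward primer.
        if not seq.startswith(forward_primer):
            continue
        seq = seq[len(forward_primer):]

        # 4. Length filter on the remaining amplicon.
        if len(seq) < min_length:
            continue

        per_sample.setdefault(sample, Counter())[seq] += 1

    # 5. Singleton filter: keep only sequences observed more than once.
    return {
        sample: Counter({s: n for s, n in counts.items() if n > 1})
        for sample, counts in per_sample.items()
    }
```

A real tool would likely tolerate mismatches in barcodes and primers and also handle the reverse primer; exact matching is used here only to keep the sketch short.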