BACKGROUND: Understanding the taxonomic composition of a sample, whether from a patient, food or the environment, is important for several types of studies, including pathogen diagnostics, epidemiological studies, biodiversity analysis and food quality regulation. With the decreasing cost of sequencing, metagenomic data is quickly becoming the preferred type of data for such analyses.

RESULTS: Rapidly defining the taxonomic composition (both taxonomic profile and relative frequency) of a metagenomic sequence dataset is challenging because mapping millions of sequence reads from a metagenomic study to a non-redundant nucleotide database, such as the NCBI non-redundant nucleotide database (nt), is computationally intensive. We have developed a robust subsampling-based algorithm, implemented in a tool called CensuScope, that takes a 'sneak peek' into the population distribution and estimates taxonomic composition as if a census had been taken of the metagenomic landscape. CensuScope is a rapid and accurate metagenome taxonomic profiling tool that randomly extracts a small number of reads (based on user input) and maps them to NCBI's nt database. This process is repeated multiple times to ascertain the taxonomic composition found in the majority of the iterations, thereby providing a robust estimate of the population and measures of accuracy for the results.

CONCLUSION: CensuScope can be run on a laptop or on a high-performance computer. Based on our analysis, we provide recommendations for the number of sequence reads to analyze and the number of iterations to use. For example, to quantify taxonomic groups present in the sample at a level of 1% or higher, a subsampling size of 250 random reads with 50 iterations yields a statistical power of >99%. Windows and UNIX versions of CensuScope are available for download at https://hive.biochemistry.gwu.edu/dna.cgi?cmd=censuscope.
CensuScope is also available through the High-performance Integrated Virtual Environment (HIVE) and can be used in conjunction with other HIVE analysis and visualization tools.
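The subsample-and-repeat procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not CensuScope's implementation: `subsample_census` and `detection_power` are hypothetical names, `classify` is a stand-in for mapping a single read against NCBI nt, and "statistical power" is modeled here as the probability that a taxon at a given relative frequency contributes at least one read to a strict majority of the iterations, treating each sampled read as an independent draw.

```python
import random
from collections import Counter
from math import comb
from typing import Callable, Dict, List, Sequence


def subsample_census(
    reads: Sequence[str],
    classify: Callable[[str], str],
    sample_size: int = 250,
    iterations: int = 50,
    seed: int = 0,
) -> Dict[str, float]:
    """Estimate taxon relative frequencies by repeated random subsampling.

    `classify` is a hypothetical stand-in for mapping one read to NCBI nt
    and returning a taxon label. Only taxa observed in a strict majority of
    iterations are reported; each reported value is the mean of that taxon's
    per-iteration relative frequency.
    """
    rng = random.Random(seed)
    pool = list(reads)
    k = min(sample_size, len(pool))
    per_iteration: List[Counter] = []
    for _ in range(iterations):
        # Draw k reads without replacement and classify each one.
        subsample = rng.sample(pool, k)
        per_iteration.append(Counter(classify(read) for read in subsample))

    # Number of iterations in which each taxon received at least one read.
    presence = Counter(taxon for counts in per_iteration for taxon in counts)

    estimate: Dict[str, float] = {}
    for taxon, n_seen in presence.items():
        if n_seen > iterations / 2:  # majority-of-iterations filter
            freqs = [counts[taxon] / k for counts in per_iteration]
            estimate[taxon] = sum(freqs) / iterations
    return estimate


def detection_power(freq: float, sample_size: int, iterations: int) -> float:
    """P(a taxon at relative frequency `freq` is seen in a strict majority of
    iterations), treating every sampled read as an independent draw."""
    p_hit = 1.0 - (1.0 - freq) ** sample_size  # P(>= 1 read in one subsample)
    need = iterations // 2 + 1  # smallest strict majority
    return sum(
        comb(iterations, j) * p_hit**j * (1.0 - p_hit) ** (iterations - j)
        for j in range(need, iterations + 1)
    )


if __name__ == "__main__":
    # Toy dataset: labels double as reads, so classify is the identity.
    toy_reads = ["E. coli"] * 9000 + ["B. fragilis"] * 900 + ["rare taxon"] * 100
    print(subsample_census(toy_reads, classify=lambda r: r))
    print(round(detection_power(0.01, 250, 1), 3))  # single subsample
    print(detection_power(0.01, 250, 50) > 0.99)  # 50 iterations, majority rule
```

Under this simplified model, a single 250-read subsample detects a taxon at 1% frequency with probability 1 - 0.99^250, or about 0.92, while requiring detection in a majority of 50 iterations raises the value above 0.99. In this sketch, the majority filter also drops any taxon that shows up in only a minority of subsamples.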