MOTIVATION: Next-generation sequencing methods are generating increasingly massive datasets, yet still do not fully capture genetic diversity in the richest environments. To understand such complicated and elusive systems, effective tools are needed to assist with delineating the differences found in and between community datasets.

RESULTS: The Small Subunit Markov Modeler (SSuMMo) was developed to probabilistically assign SSU rRNA gene fragments from any sequence dataset to recognized taxonomic clades, producing consistent, comparable cladograms. Accuracy tests predicted >90% of genera correctly for sequences downloaded from public reference databases. Sequences from a next-generation sequence dataset, sampled from lean, overweight and obese individuals, were analysed to demonstrate parallel visualization of comparable datasets. SSuMMo shows potential as a valuable curatorial tool, as numerous incorrect and outdated taxonomic entries and annotations were identified in public databases.

AVAILABILITY AND IMPLEMENTATION: SSuMMo is GPLv3 open source Python software, available at http://code.google.com/p/ssummo/. Taxonomy and HMM databases can be downloaded from http://bioltfws1.york.ac.uk/ssummo/.

SUPPLEMENTARY INFORMATION: Supplemental materials are available at Bioinformatics Online.
Authors: Anders Lanzén; Steffen L Jørgensen; Daniel H Huson; Markus Gorfer; Svenn Helge Grindhaug; Inge Jonassen; Lise Øvreås; Tim Urich Journal: PLoS One Date: 2012-11-08 Impact factor: 3.240
Authors: Chao Xie; Chin Lui Wesley Goi; Daniel H Huson; Peter F R Little; Rohan B H Williams Journal: BMC Bioinformatics Date: 2016-12-22 Impact factor: 3.169
Authors: Lindsey M Solden; David W Hoyt; William B Collins; Johanna E Plank; Rebecca A Daly; Erik Hildebrand; Timothy J Beavers; Richard Wolfe; Carrie D Nicora; Sam O Purvine; Michelle Carstensen; Mary S Lipton; Donald E Spalinger; Jeffrey L Firkins; Barbara A Wolfe; Kelly C Wrighton Journal: ISME J Date: 2016-12-13 Impact factor: 10.302