Abstract
Assignment of 16S rRNA gene sequences to operational taxonomic units (OTUs) allows microbial ecologists to overcome the inconsistencies and biases within bacterial taxonomy and provides a strategy for clustering similar sequences that do not have representatives in a reference database. I have applied the Matthews correlation coefficient to assess the ability of 15 reference-independent and -dependent clustering algorithms to assign sequences to OTUs. This metric quantifies the ability of an algorithm to reflect the relationships between sequences without the use of a reference and can be applied to any data set or method. The most consistently robust method was the average neighbor algorithm; however, for some data sets, other algorithms matched its performance.
Keywords: 16S rRNA gene sequences; OTU; QIIME; bioinformatics; environmental microbiology; metagenomics; microbial ecology; microbiome; mothur
Year: 2016 PMID: 27832214 PMCID: PMC5069744 DOI: 10.1128/mSystems.00027-16
Source DB: PubMed Journal: mSystems ISSN: 2379-5077 Impact factor: 6.496
FIG 1 Comparison of OTU quality generated by multiple algorithms applied to four data sets. The nearest, average, and furthest neighbor clustering algorithms were used as implemented in mothur (v.1.37) (25). Abundance-based greedy clustering (AGC) and distance-based greedy clustering (DGC) were implemented using USEARCH (v.6.1) and VSEARCH (v.1.5.0) (3, 5, 26). Other de novo clustering algorithms included Swarm (v.2.1.1) (6, 7), OTUCLUST (v.0.1) (27), and Sumaclust (v.1.0.20). The MCC values for Swarm were determined by selecting the distance threshold that generated the maximum MCC value for each data set. The USEARCH and SortMeRNA (v.2.0) closed-reference clusterings were performed using QIIME (v.1.9.1) (28, 29). Closed-reference clustering was also performed using VSEARCH (v.1.5.0) and NINJA-OPS (v.1.5.0) (16). The order of the sequences in each data set was randomized 30 times, and the intramethod range in MCC values was smaller than the plotting symbol. MCC values were calculated using mothur.
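The Matthews correlation coefficient (MCC) referred to in the abstract and in the FIG 1 legend can be illustrated with a small sketch. This is not mothur's implementation. It assumes a pairwise formulation in which every pair of sequences is scored by whether the two sequences are similar (distance at or below a threshold) and whether they were placed in the same OTU. The function name `otu_mcc`, the input layout, and the 0.03 default threshold are hypothetical choices made for illustration; none of them is stated in the text above.

```python
from itertools import combinations
from math import sqrt


def otu_mcc(distances, otu_of, threshold=0.03):
    """Score an OTU assignment with the Matthews correlation coefficient.

    distances: dict mapping frozenset({a, b}) -> pairwise distance (assumed input layout)
    otu_of:    dict mapping sequence name -> OTU label
    threshold: distance cutoff for calling two sequences "similar" (assumed value)
    """
    tp = tn = fp = fn = 0
    for a, b in combinations(sorted(otu_of), 2):
        similar = distances[frozenset((a, b))] <= threshold
        together = otu_of[a] == otu_of[b]
        if similar and together:
            tp += 1  # similar pair, same OTU
        elif not similar and not together:
            tn += 1  # dissimilar pair, different OTUs
        elif together:
            fp += 1  # dissimilar pair forced into one OTU
        else:
            fn += 1  # similar pair split across OTUs

    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: report 0.0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data: A-B and C-D are close pairs; every other pair is distant.
names = ["A", "B", "C", "D"]
toy_distances = {frozenset(p): 0.10 for p in combinations(names, 2)}
toy_distances[frozenset(("A", "B"))] = 0.01
toy_distances[frozenset(("C", "D"))] = 0.02

perfect = otu_mcc(toy_distances, {"A": 1, "B": 1, "C": 2, "D": 2})
split = otu_mcc(toy_distances, {"A": 1, "B": 1, "C": 2, "D": 3})
lumped = otu_mcc(toy_distances, {"A": 1, "B": 1, "C": 1, "D": 1})
print(perfect, split, lumped)
```

On the toy data, the assignment that keeps both close pairs together scores higher than the one that splits C and D. Lumping everything into one OTU leaves no true negatives, so the denominator is zero and the sketch returns 0.0 by the convention noted in the code.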