Frédéric Mahé, Torbjørn Rognes, Christopher Quince, Colomban de Vargas, Micah Dunthorn.
Abstract
Popular de novo amplicon clustering methods suffer from two fundamental flaws: arbitrary global clustering thresholds, and input-order dependency induced by centroid selection. Swarm was developed to address these issues by first clustering nearly identical amplicons iteratively using a local threshold, and then by using clusters' internal structure and amplicon abundances to refine its results. This fast, scalable, and input-order independent approach reduces the influence of clustering parameters and produces robust operational taxonomic units.
Keywords: Barcoding; Environmental diversity; Molecular operational taxonomic units
Year: 2014 PMID: 25276506 PMCID: PMC4178461 DOI: 10.7717/peerj.593
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Schematic view of the greedy clustering approach and comparison with Swarm.
(A) Visualization of the widely used greedy clustering approach based on centroid selection and a global clustering threshold, t, where closely related amplicons can be placed into different OTUs. (B) By contrast, Swarm clusters iteratively by using a small user-chosen local clustering threshold, d, allowing OTUs to reach their natural limits.
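The contrast between the two panels can be illustrated with a minimal Python sketch. It is not Swarm's implementation: the function names (`edit_distance`, `greedy_centroid_clusters`, `iterative_local_clusters`) and the toy sequences are hypothetical, the distance is assumed to be a plain Levenshtein distance, and the refinement step based on internal structure and abundances is left out entirely.

```python
from collections import deque


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # match or substitution
        prev = curr
    return prev[-1]


def greedy_centroid_clusters(seqs, t):
    """Panel A (assumed simplification): a sequence joins the first centroid
    within the global threshold t, otherwise it becomes a new centroid."""
    clusters = []  # each cluster is a list whose first element is its centroid
    for s in seqs:
        for cluster in clusters:
            if edit_distance(s, cluster[0]) <= t:
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters


def iterative_local_clusters(seqs, d=1):
    """Panel B (assumed simplification): grow a cluster step by step, linking
    any unassigned sequence within the local threshold d of a member."""
    unassigned = list(dict.fromkeys(seqs))  # drop duplicates, keep order
    clusters = []
    while unassigned:
        seed = unassigned.pop(0)
        cluster, queue = [seed], deque([seed])
        while queue:
            current = queue.popleft()
            linked = [s for s in unassigned if edit_distance(current, s) <= d]
            for s in linked:
                unassigned.remove(s)
                cluster.append(s)
                queue.append(s)
        clusters.append(cluster)
    return clusters


# Toy chain in which each sequence differs from the next by one substitution.
order_a = ["AAAA", "AAAT", "AATT", "ATTT", "TTTT"]
order_b = ["AATT", "AAAA", "AAAT", "ATTT", "TTTT"]

# The greedy result depends on which sequence happens to come first.
print(len(greedy_centroid_clusters(order_a, t=2)))  # 2 clusters
print(len(greedy_centroid_clusters(order_b, t=2)))  # 1 cluster

# The iterative result is the same partition for both orders.
print(len(iterative_local_clusters(order_a, d=1)))  # 1 cluster
print(len(iterative_local_clusters(order_b, d=1)))  # 1 cluster
```

In this toy case the greedy approach splits the chain or keeps it whole depending on input order, while the iterative approach follows the chain to its end regardless of order. Unrestricted chaining of this kind can also link clusters that should stay separate, which is why a refinement step is needed in practice.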
Figure 2. Uneven mock-community.
Comparisons of five clustering methods, over 20 different clustering thresholds, and 100 amplicon input-order shufflings of a community composed of species of uneven abundances. Clustering precision and recall are estimated using amplicon taxonomic assignments as ground truth, and are summarized by the adjusted Rand index.
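As a rough illustration of the summary metric, the adjusted Rand index can be computed from two label lists with the standard pair-counting formula. The sketch below is not the authors' evaluation code: the function name `adjusted_rand_index` and the example labels are hypothetical, with `truth` standing in for taxonomic assignments and `predicted` for OTU identifiers.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, predicted):
    """Adjusted Rand index between two labelings of the same amplicons."""
    if len(truth) != len(predicted):
        raise ValueError("both labelings must cover the same amplicons")
    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two amplicons: nothing to disagree on

    # Pairs of amplicons placed together in both labelings, in the ground
    # truth only, and in the predicted clustering only.
    together_both = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    together_truth = sum(comb(c, 2) for c in Counter(truth).values())
    together_pred = sum(comb(c, 2) for c in Counter(predicted).values())

    expected = together_truth * together_pred / total_pairs  # chance agreement
    maximum = (together_truth + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings are one big cluster
    return (together_both - expected) / (maximum - expected)


# Hypothetical labels: taxonomic assignment versus OTU identifier per amplicon.
truth = ["sp1", "sp1", "sp2", "sp3"]
predicted = ["otu1", "otu1", "otu2", "otu2"]
print(adjusted_rand_index(truth, predicted))
```

A value of 1 means the clustering reproduces the taxonomic grouping exactly, values near 0 indicate agreement no better than chance, and negative values indicate agreement worse than chance.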