Dominik Forster, Micah Dunthorn, Thorsten Stoeck, Frédéric Mahé.
Abstract
Discovery of novel diversity in high-throughput sequencing studies is an important aspect of environmental microbial ecology. To evaluate the effects that amplicon clustering methods have on the discovery of novel diversity, we clustered an environmental marine high-throughput sequencing dataset of protist amplicons together with reference sequences from the taxonomically curated Protist Ribosomal Reference (PR2) database using three de novo approaches: sequence similarity networks, USEARCH, and Swarm. The potentially novel diversity uncovered by each clustering approach differed drastically in the number of operational taxonomic units (OTUs) and in the number of environmental amplicons in these novel-diversity OTUs. Global pairwise alignment comparisons revealed that numerous amplicons classified as potentially novel by USEARCH and Swarm were more than 97% similar to PR2 references. Using shortest path analyses on sequence similarity network OTUs and Swarm OTUs, we found additional novel diversity within OTUs that would have gone unnoticed without further exploiting their underlying network topologies. These results demonstrate that graph theory provides powerful tools for microbial ecology and the analysis of environmental high-throughput sequencing datasets. Furthermore, sequence similarity networks were most accurate in delineating novel diversity from previously discovered diversity.
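In a sequence similarity network, each amplicon is a node and two amplicons are linked when their pairwise similarity reaches a threshold; the OTUs are then the connected components (CCs) of the network. The following minimal sketch assumes the pairwise similarities have already been computed and groups amplicons with a union-find structure. The `similarities` mapping, the threshold value and the amplicon IDs are hypothetical, and this is an illustration of the idea rather than the authors' pipeline.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Group nodes into connected components with a union-find structure."""
    parent = {n: n for n in nodes}

    def find(x):
        # Walk up to the root, halving the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return list(groups.values())


def similarity_network_otus(amplicons, similarities, threshold):
    """Return OTUs as connected components of a sequence similarity network.

    amplicons:    iterable of amplicon IDs (environmental and reference)
    similarities: dict mapping (id_a, id_b) -> pairwise similarity in [0, 1]
    threshold:    hypothetical cutoff; pairs below it get no edge
    """
    edges = [pair for pair, sim in similarities.items() if sim >= threshold]
    return connected_components(set(amplicons), edges)


otus = similarity_network_otus(
    ["env1", "env2", "env3", "ref1", "ref2"],
    {("env1", "env2"): 0.99, ("env2", "ref1"): 0.98, ("env3", "ref2"): 0.90},
    threshold=0.97,
)
print(sorted(sorted(otu) for otu in otus))
# [['env1', 'env2', 'ref1'], ['env3'], ['ref2']]
```

Connected components chain amplicons together, so two members of the same OTU need not be directly similar to each other; the shortest path analyses look inside that chained structure.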
Keywords: Barcoding; Environmental diversity; Molecular operational taxonomic unit
Year: 2016 PMID: 26966652 PMCID: PMC4782723 DOI: 10.7717/peerj.1692
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Table 1. Sequence clustering results of the three tested approaches.
For each class of OTUs defined in our analyses, the table gives the number of OTUs and the number (and type) of amplicons within these OTUs.
| | USEARCH | Sequence similarity networks | Swarm |
|---|---|---|---|
| OTUs | 12,427 | 8,202 | 13,240 |
| OTUs containing environmental and reference amplicons | 2,527 | 1,619 | 1,993 |
| Environmental amplicons | 223,735 | 253,965 | 142,946 |
| Reference amplicons | 33,386 | 54,988 | 18,774 |
| OTUs containing exclusively reference amplicons | 4,558 | 3,138 | 5,019 |
| Reference amplicons | 59,368 | 46,255 | 49,147 |
| OTUs containing exclusively environmental amplicons | 5,342 | 3,445 | 6,228 |
| Environmental amplicons | 71,337 | 47,116 | 81,073 |
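Each OTU falls into exactly one of the three classes in the table, depending on whether it holds environmental amplicons, reference amplicons, or both. The sketch below uses hypothetical helpers and assumes reference amplicons can be recognised by membership in a set of PR2 IDs. It classifies OTUs and tallies OTUs and amplicons per class, then checks with the values from the table that the three classes add up to the total OTU count for every approach.

```python
from collections import Counter


def classify_otu(members, reference_ids):
    """Assign one OTU (a set of amplicon IDs) to one of the table's three classes."""
    n_ref = sum(1 for m in members if m in reference_ids)
    if n_ref == 0:
        return "exclusively environmental"
    if n_ref == len(members):
        return "exclusively reference"
    return "environmental and reference"


def summarize(otus, reference_ids):
    """Count OTUs per class and amplicons per (class, amplicon type)."""
    otu_counts, amplicon_counts = Counter(), Counter()
    for members in otus:
        otu_class = classify_otu(members, reference_ids)
        otu_counts[otu_class] += 1
        for m in members:
            kind = "reference" if m in reference_ids else "environmental"
            amplicon_counts[(otu_class, kind)] += 1
    return otu_counts, amplicon_counts


# OTU counts copied from the table: (total, mixed, reference-only, environmental-only)
TABLE = {
    "USEARCH": (12427, 2527, 4558, 5342),
    "Sequence similarity networks": (8202, 1619, 3138, 3445),
    "Swarm": (13240, 1993, 5019, 6228),
}
for method, (total, mixed, ref_only, env_only) in TABLE.items():
    # The three classes partition all OTUs, so they must sum to the total
    assert mixed + ref_only + env_only == total, method
```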
Figure 1. Venn diagram of the number of amplicons in exclusively environmental OTUs.
The area of each clustering approach is proportional to the number of amplicons that approach placed in exclusively environmental OTUs. Overlapping areas represent amplicons placed in exclusively environmental OTUs by each of the overlapping approaches, and the numbers give how many amplicons each area contains.
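The region counts of a three-set Venn diagram can be derived by recording, for every amplicon, which approaches placed it in an exclusively environmental OTU. A minimal sketch, assuming each approach's result is available as a set of amplicon IDs (the example IDs are hypothetical):

```python
from collections import Counter


def venn_regions(sets_by_approach):
    """Map each combination of approaches to the number of amplicons found by
    exactly that combination (one entry per non-empty Venn region)."""
    all_amplicons = set().union(*sets_by_approach.values())
    regions = Counter()
    for amplicon in all_amplicons:
        found_by = frozenset(
            name for name, ids in sets_by_approach.items() if amplicon in ids
        )
        regions[found_by] += 1
    return regions


regions = venn_regions({
    "USEARCH": {"a", "b", "c"},
    "Sequence similarity networks": {"b", "c"},
    "Swarm": {"c", "d"},
})
# Every amplicon lands in exactly one region
assert sum(regions.values()) == 4
```

Summing all regions that include one approach gives back that approach's total number of amplicons in exclusively environmental OTUs.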
Figure 2. Genetic divergence of amplicons in exclusively environmental OTUs from PR2 references, by clustering approach.
Each point represents one amplicon clustered into an exclusively environmental OTU by the respective clustering approach. Position on the x-axis gives the abundance of each amplicon in the initial dataset before dereplication. The y-axis gives the highest pairwise sequence similarity score of an amplicon to any entry in the PR2 database as calculated by VSEARCH.
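The y-axis values come from global pairwise alignments computed with VSEARCH. To illustrate the idea only, the sketch below swaps in a plain Needleman-Wunsch alignment with unit match, mismatch and gap scores (all assumptions) and reports identity as identical columns divided by alignment columns, ignoring terminal gaps. VSEARCH uses its own scoring and identity definitions, so this sketch will not reproduce the plotted values exactly; the 97% cutoff is the one quoted in the abstract.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns the two gapped sequences."""
    n, m = len(a), len(b)

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def identity(aln_a, aln_b):
    """Fraction of identical columns, ignoring terminal gap columns."""
    cols = list(zip(aln_a, aln_b))
    start, end = 0, len(cols)
    while start < end and "-" in cols[start]:
        start += 1
    while end > start and "-" in cols[end - 1]:
        end -= 1
    core = cols[start:end]
    if not core:
        return 0.0
    return sum(x == y for x, y in core) / len(core)


def best_reference_similarity(amplicon, references):
    """Highest identity of one amplicon to any reference sequence."""
    return max(identity(*global_align(amplicon, ref)) for ref in references)


best = best_reference_similarity("ACGTACGTAC", ["ACGTTCGTAC", "TTTTTTTTTT"])
print(best, best > 0.97)
```

An amplicon from an exclusively environmental OTU whose best score exceeds 0.97 is the kind of case the abstract flags as not actually novel.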
Figure 3. Shortest path analyses of connected components (CCs) and swarms.
The plots illustrate how many edges separated each environmental amplicon from its closest reference amplicon in sequence similarity networks and Swarm. A distance of ‘1’ edge means that the environmental amplicon was directly connected to a reference. ‘Infinite’ means that the environmental amplicon was placed into an exclusively environmental OTU (see also Table 1) and did not exhibit any connection to a reference amplicon.
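Counting the edges between an environmental amplicon and its closest reference is a shortest-path problem on an unweighted graph, which breadth-first search solves. The sketch below assumes the links inside each OTU (connected component or swarm) are given as an adjacency list with hypothetical node IDs. It starts one breadth-first search from all reference amplicons at once; environmental amplicons the search never reaches get an infinite distance, matching the 'Infinite' category.

```python
import math
from collections import deque


def distance_to_nearest_reference(adjacency, reference_ids):
    """Edges from each environmental amplicon to its closest reference amplicon.

    adjacency:     dict node -> list of neighbouring nodes (undirected graph;
                   every neighbour must also be a key)
    reference_ids: set of reference amplicon IDs
    Unreachable amplicons get math.inf.
    """
    dist = {node: math.inf for node in adjacency}
    queue = deque()
    # Multi-source BFS: every reference starts at distance 0
    for ref in reference_ids:
        if ref in dist:
            dist[ref] = 0
            queue.append(ref)
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if dist[neighbour] == math.inf:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return {n: d for n, d in dist.items() if n not in reference_ids}


graph = {
    "ref1": ["env1"],
    "env1": ["ref1", "env2"],
    "env2": ["env1"],
    "env3": [],  # an exclusively environmental OTU of one amplicon
}
print(distance_to_nearest_reference(graph, {"ref1"}))
# {'env1': 1, 'env2': 2, 'env3': inf}
```

Running a single search from all references visits each node and edge once, instead of launching one search per environmental amplicon.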