Timothy Chappell1, Shlomo Geva1, James M Hogan1, Flavia Huygens2,3, Irani U Rathnayake2,3, Stephen Rudd4, Wayne Kelly1, Dimitri Perrin5.
Abstract
BACKGROUND: Sequencing highly-variable 16S regions is a common and often effective approach to the study of microbial communities, and next-generation sequencing (NGS) technologies provide abundant quantities of data for analysis. However, the speed of existing analysis pipelines may limit our ability to work with these quantities of data. Furthermore, the limited coverage of existing 16S databases may hamper our ability to characterise these communities, particularly in the context of complex or poorly studied environments.
Keywords: Clustering; Community analysis; Metagenomics; Read signatures; Wound healing
Year: 2018 PMID: 30577803 PMCID: PMC6302383 DOI: 10.1186/s12859-018-2540-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Clustering and analysis pipeline
Fig. 2 Needleman-Wunsch global alignment cluster analysis. Histogram of Needleman-Wunsch scores between random pairs of reads in the same cluster (intracluster pairs) and pairs of reads from different clusters (intercluster pairs)
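As a rough illustration of the score plotted in this histogram, the following is a minimal sketch of a Needleman-Wunsch global alignment score with a linear gap penalty. The function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions made for the example; the scoring scheme actually used for the figure is not stated in this record.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score of a and b (illustrative scoring values)."""
    # prev[j] is the best score for aligning the already-processed prefix
    # of a with b[:j]; the first row aligns the empty prefix with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] with the empty string costs i gaps.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[len(b)]
```

Because every position of both reads must be aligned, unrelated reads accumulate mismatch and gap penalties, so scores for dissimilar pairs can be strongly negative.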
Comparison of clustering methods based on Needleman-Wunsch alignment scores
| Wound Microbiome |
| 46313157 sequences, average sequence length: 337 |
| Method | Time (m) | Clusters | Intercluster Avg | Intercluster SD | Intracluster Avg | Intracluster SD |
| SigClust | 16 | 5254 | -68.8 | 125.1 | 219.4 | 107.7 |
| Merged SigClust | 18 | 1260 | -73.9 | 118.0 | 180.9 | 136.9 |
| UClust T=0.75 | 156 | 7141 | -81.4 | 111.6 | 148.5 | 142.1 |
| 16S Genus | 2586 | 2221 | -82.4 | 104.6 | 98.6 | 167.4 |
| 16S Species | 2586 | 8375 | -76.9 | 110.8 | 125.6 | 162.5 |
| 16S Strains | 2586 | 9354 | -82.4 | 104.6 | 98.6 | 167.4 |
| Oral Metagenome – Human (mgp41) |
| 1237319 sequences, average sequence length: 59 |
| Method | Time (m) | Clusters | Intercluster Avg | Intercluster SD | Intracluster Avg | Intracluster SD |
| SigClust | 0.2 | 17621 | 5.5 | 22.3 | 51.6 | 14.1 |
| UClust T=0.75 | 1.7 | 17621 | -4.4 | 14.4 | 38.8 | 13.3 |
| PRJEB4688 |
| 5497442 sequences, average sequence length: 253 |
| Method | Time (m) | Clusters | Intercluster Avg | Intercluster SD | Intracluster Avg | Intracluster SD |
| SigClust | 1.62 | 6998 | -94.8 | 126.6 | 250.1 | 77.2 |
| UClust T=0.75 | 9 | 6998 | -109.0 | 117.4 | 121.5 | 93.5 |
Results are shown for the wound data, and for two previously published Illumina metagenomic datasets. We report for each method the clustering time in minutes and the number of clusters returned. The remaining columns of the table show the mean and standard deviation of the separation for the sampled intercluster and intracluster pairs
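The intercluster and intracluster columns summarise alignment scores over sampled read pairs. The sketch below shows one way such summaries could be produced: draw random pairs of reads, sort them by whether the two reads share a cluster label, and report the mean and standard deviation of each group. The rejection-sampling scheme, the function names, the default sample size, and the use of the population standard deviation are all assumptions for illustration; the sampling procedure behind the table is not described in this record.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def sample_pair_scores(reads: Sequence[str], labels: Sequence[int],
                       score: Callable[[str, str], float],
                       n_pairs: int = 1000, seed: int = 0,
                       max_draws: int = 1_000_000) -> Tuple[List[float], List[float]]:
    """Return (intracluster scores, intercluster scores) for random read pairs."""
    rng = random.Random(seed)
    intra: List[float] = []
    inter: List[float] = []
    for _ in range(max_draws):
        if len(intra) >= n_pairs and len(inter) >= n_pairs:
            break
        i, j = rng.sample(range(len(reads)), 2)  # two distinct read indices
        bucket = intra if labels[i] == labels[j] else inter
        if len(bucket) < n_pairs:
            bucket.append(score(reads[i], reads[j]))
    return intra, inter


def summarise(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and population standard deviation of a non-empty score list."""
    return statistics.mean(scores), statistics.pstdev(scores)
```

Any pairwise score can be passed as `score`, for example a global or local alignment score. A clustering that separates reads well should give a high intracluster mean and a low intercluster mean.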
Fig. 3 Smith-Waterman local alignment cluster analysis. Histogram of Smith-Waterman scores between random pairs of reads in the same cluster (intracluster pairs) and pairs of reads from different clusters (intercluster pairs)
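Smith-Waterman scores the best-matching local region of two reads rather than their full length, which is why these scores never fall below zero. A minimal sketch follows; the function name and the scoring values (match +2, mismatch -1, gap -1) are assumptions for the example, not the parameters used for the figure.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score of a and b (illustrative scoring values)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at zero lets an alignment restart anywhere,
            # which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```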
Comparison of clustering methods based on Smith-Waterman alignment scores
| Wound Microbiome |
| 46313157 sequences, average sequence length: 337 |
| Method | Time (m) | Clusters | Intercluster Avg | Intercluster SD | Intracluster Avg | Intracluster SD |
| SigClust | 16 | 5254 | 59.0 | 58.7 | 234.4 | 97.9 |
| Merged SigClust | 18 | 1260 | 55.5 | 51.2 | 211.6 | 104.9 |
| UClust T=0.75 | 156 | 7141 | 50.3 | 41.8 | 202.9 | 95.0 |
| 16S Genus | 2586 | 2221 | 43.9 | 28.0 | 188.9 | 92.5 |
| 16S Species | 2586 | 8375 | 48.7 | 39.6 | 205.8 | 93.6 |
| 16S Strains | 2586 | 9354 | 50.1 | 42.6 | 206.7 | 93.5 |
| Oral Metagenome – Human (mgp41) |
| 1237319 sequences, average sequence length: 59 |
| Method | Time (m) | Clusters | Intercluster Avg | Intercluster SD | Intracluster Avg | Intracluster SD |
| SigClust | 0.2 | 17621 | 25.5 | 11.8 | 53.7 | 9.6 |
| UClust T=0.75 | 1.7 | 17621 | 20.2 | 6.9 | 43.4 | 9.1 |
| PRJEB4688 |
| 5497442 sequences, average sequence length: 253 |
| Method | Time (m) | Clusters | Intercluster Avg | Intercluster SD | Intracluster Avg | Intracluster SD |
| SigClust | 1.62 | 6998 | 44.4 | 48.5 | 257.7 | 67.0 |
| UClust T=0.75 | 9 | 6998 | 37.1 | 38.0 | 159.1 | 66.9 |
As before, results are shown for the wound data, and for two previously published Illumina metagenomic datasets. We report for each method the clustering time in minutes and the number of clusters returned. The remaining columns of the table show the mean and standard deviation of the separation for the sampled intercluster and intracluster pairs
Fig. 4 Relative Cluster Abundance for wound #4059
Fig. 5 Bray-Curtis dissimilarity analysis for wound #4059. Each series shows the variation in BC dissimilarity for each time point relative to the observation immediately before, commencing with the time point following the label. For example, the series labelled W4 shows the observations for W5 relative to W4, for W6 relative to W5, and so on. For W11, we see only the single observation at W12, relative to W11
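The comparison of each time point with the one immediately before it can be sketched as follows, using the standard Bray-Curtis formula (sum of absolute abundance differences divided by the sum of all abundances). The function names, the dictionary layout mapping cluster identifiers to read counts, and the choice to return 0.0 for two empty samples are assumptions for illustration.

```python
from typing import Dict, List


def bray_curtis(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity between two cluster-abundance profiles."""
    keys = set(u) | set(v)
    num = sum(abs(u.get(k, 0) - v.get(k, 0)) for k in keys)
    den = sum(u.get(k, 0) + v.get(k, 0) for k in keys)
    # Assumption: two empty samples are treated as identical.
    return num / den if den else 0.0


def consecutive_dissimilarity(samples: List[Dict[str, float]]) -> List[float]:
    """Dissimilarity of each time point relative to the one before it."""
    return [bray_curtis(prev, curr) for prev, curr in zip(samples, samples[1:])]
```

A value of 0 means the two time points have identical cluster abundances, and a value of 1 means they share no clusters.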
Fig. 6 Relative Cluster Abundance for wound #4032
Fig. 7 Bray-Curtis dissimilarity analysis for wound #4032. Each series shows the variation in BC dissimilarity for each time point relative to the observation immediately before, commencing with the time point following the label. See the caption for Fig. 5 for a more detailed explanation
Fig. 8 Relative Cluster Abundance for wound #4068
Fig. 9 Relative Cluster Abundance for wound #4046