Javier Ramiro-Garcia1,2,3, Gerben D A Hermes1,2, Christos Giatsis4, Detmer Sipkema2, Erwin G Zoetendal1,2, Peter J Schaap1,3, Hauke Smidt2.
Abstract
Background: Massive high-throughput sequencing of short, hypervariable segments of the 16S ribosomal RNA (rRNA) gene has transformed the methodological landscape describing microbial diversity within and across complex biomes. However, several studies have shown that the methodology rather than the biological variation is responsible for the observed sample composition and distribution. This compromises meta-analyses, although this fact is often disregarded.
Keywords: 16S rRNA amplicon analysis; bioinformatic pipeline; microbial community analysis; microbial ecology; next-generation sequencing
Year: 2016 PMID: 30918626 PMCID: PMC6419982 DOI: 10.12688/f1000research.9227.2
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Figure 1. NG-Tax layout.
Input files are depicted in blue, output files are depicted in green and clustering processes using usearch are indicated with dashed lines. Details for some steps of the pipeline are marked with red numbers.
Figure 2. NG-Tax Assignment quality of the 55 MC phylotypes.
Three taxonomic assignments are shown: RDP full length, NG-Tax V5-V6 trimmed and NG-Tax V4 trimmed. If NG-Tax assignments are in agreement with the SINA full length assignment, that classification is shown in green. Assignment specificity (the fraction of hits with an identical label) and the total number of hits supporting this taxonomic label are shown in blue for the V5-V6 region and in red for the V4 region.
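The assignment specificity mentioned in this caption, the fraction of hits with an identical label, can be illustrated with a minimal sketch. The function name and the example labels are hypothetical, and taking the most common label as the assigned one is an assumption made for illustration; this is not NG-Tax's own code.

```python
from collections import Counter


def assignment_specificity(hit_labels):
    """Return (label, specificity, n_supporting) for a list of hit labels.

    Assumption: the assigned label is the most common label among the hits.
    Specificity is the fraction of all hits carrying that identical label.
    """
    if not hit_labels:
        raise ValueError("at least one hit is required")
    label, n_supporting = Counter(hit_labels).most_common(1)[0]
    return label, n_supporting / len(hit_labels), n_supporting


# Hypothetical genus-level labels of database hits for one sequence.
hits = ["Bacteroides", "Bacteroides", "Bacteroides", "Prevotella"]
label, specificity, support = assignment_specificity(hits)
```

Here three of the four hypothetical hits share one label, so the specificity is 3/4 with three supporting hits.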
Performance of NG-Tax and QIIME at different taxonomic levels for region V4 and V5-V6.
Classified reads are defined as reads mapped to a sequence for which a genus, family or order level classification is given, without considering accuracy. The percentage represents the average over all samples. Spurious taxa are taxonomic classes not included in the MCs. The percentage of spurious reads is the percentage of total reads in the misclassified classes. F: forward read, R: reverse read.
| Classified reads (%): NG-Tax | Classified reads (%): QIIME | Spurious taxa (#): NG-Tax | Spurious taxa (#): QIIME F | Spurious taxa (#): QIIME R | Spurious reads (%): NG-Tax | Spurious reads (%): QIIME F | Spurious reads (%): QIIME R |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 86.23 | 60.66 | 4 | 110 | 110 | 0.19 | 9.02 | 15.05 |
| 99.97 | 96.23 | 1 | 82 | 81 | 0.19 | 8.43 | 6.42 |
| 100 | 100.00 | 1 | 49 | 47 | 0.19 | 6.40 | 5.47 |

| Classified reads (%): NG-Tax | Classified reads (%): QIIME | Spurious taxa (#): NG-Tax | Spurious taxa (#): QIIME F | Spurious taxa (#): QIIME R | Spurious reads (%): NG-Tax | Spurious reads (%): QIIME F | Spurious reads (%): QIIME R |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 99.23 | 69.99 | 5 | 53 | 51 | 0.28 | 13.42 | 18.65 |
| 99.89 | 93.63 | 0 | 29 | 29 | 0.00 | 9.64 | 12.05 |
| 100 | 99.81 | 0 | 15 | 17 | 0.00 | 6.33 | 6.45 |
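The three quantities reported in the table follow from the definitions in its caption and can be sketched for a single sample as below. This is an illustrative sketch, not the authors' code: the function name, the use of `None` for unclassified reads, and the example counts are all hypothetical, and the denominator for both percentages is assumed to be the total read count of the sample.

```python
def summarize_sample(read_counts, expected_taxa):
    """Summarize one sample at a single taxonomic level.

    read_counts: dict mapping a taxon label to its read count; the key None
                 (an assumption of this sketch) holds unclassified reads.
    expected_taxa: set of taxon labels known to be in the mock community.
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    # Classified reads: any read with a label, regardless of its accuracy.
    classified = sum(n for taxon, n in read_counts.items() if taxon is not None)
    # Spurious taxa: labelled taxa that are not part of the mock community.
    spurious = {
        taxon: n
        for taxon, n in read_counts.items()
        if taxon is not None and taxon not in expected_taxa
    }
    return {
        "classified_reads_pct": 100.0 * classified / total,
        "spurious_taxa": len(spurious),
        "spurious_reads_pct": 100.0 * sum(spurious.values()) / total,
    }


# Hypothetical counts for one sample.
counts = {"Bacteroides": 600, "Escherichia": 300, "Lactobacillus": 50, None: 50}
summary = summarize_sample(counts, expected_taxa={"Bacteroides", "Escherichia"})
```

Since the caption states that the percentages are averages over all samples, per-sample values such as these would then be averaged across samples.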
Figure 3. Observed composition of all MCs compared with the expected ones (EXP) for both regions obtained with NG-Tax.
Figure 4. Observed composition of all MCs compared with the expected ones (EXP) for both regions and each read separately obtained with QIIME.
Figure 5. Distances to expected taxonomical profiles.
NG-Tax results are depicted in blue and QIIME in red.
Figure 6. PCoA using Weighted UniFrac of all sequenced and expected MCs as obtained after processing of data using NG-Tax (A) and QIIME (C). Darker colored triangles represent the expected composition while lighter colored circles represent sequenced samples. B/D. Rarefaction curves of PD for all MCs and their expected counterparts for NG-Tax (B) and QIIME (D). Dashed lines represent the expected composition while solid lines represent sequenced samples.
Figure 7. Pairwise Weighted UniFrac distances.
NG-Tax results are depicted in blue and QIIME in red.