Sten Anslan, R Henrik Nilsson, Christian Wurzbacher, Petr Baldrian, Mohammad Bahram.
Abstract
Recent developments in high-throughput sequencing (HTS) technologies, and the resulting fast accumulation of HTS data, have created a growing need for tools for HTS data processing and communication. In particular, a number of bioinformatics tools have been designed for analysing metabarcoding data, each with specific features, assumptions and outputs. To evaluate how the choice of bioinformatics workflow affects the results, we compared the performance of different analysis platforms on two contrasting high-throughput sequencing data sets. Our analysis revealed that the computation time, the quality of error filtering and hence the output of a specific bioinformatics process largely depend on the platform used. Our results show that none of the bioinformatics workflows appears to perfectly filter out the accumulated errors and generate Operational Taxonomic Units (OTUs), although PipeCraft, LotuS and PIPITS perform better than QIIME2 and Galaxy for the tested fungal amplicon dataset. We conclude that the output of each platform requires manual validation of the OTUs by examining the taxonomy assignment values.
Keywords: Microbial communities; amplicon sequencing; fungal biodiversity; metagenomics; microbiome; mycobiome
Year: 2018; PMID: 30271256; PMCID: PMC6160831; DOI: 10.3897/mycokeys.39.28109
Source DB: PubMed; Journal: MycoKeys; ISSN: 1314-4049; Impact factor: 2.984
Used software, sequence and OTU counts (values in bold) by a) Illumina and b) PacBio analysis platforms. The number of sequences denotes raw input reads and remaining reads after each analysis step. Singleton OTUs were excluded from the OTU counts.
| a) | LotuS | QIIME2 | PipeCraft | Galaxy | PIPITS |
|---|---|---|---|---|---|
| Raw reads | 7,981,812a | 7,335,838b | 7,981,812a | 7,981,812a | 7,335,838b |
| Assembly | FLASH/ NA | DADA2/ NA | VSEARCH/ 7,511,274 | FASTQ joiner/ 7,911,554 | VSEARCH/ 7,198,094 |
| Quality filtering | sdm/ NA | DADA2/ 5,428,563 | VSEARCH/ 7,511,274 | trimmomatic/ 7,879,960 | fastqx/ 7,142,354 |
| Demultiplexing | sdm/ 6,727,631 | NP | mothur/ 6,558,772 | mothur/ 1,643,879 | NP |
| Chimera filtering | USEARCH/ 6,486,802 | NP | VSEARCH/ 6,300,085 | VSEARCH/ 1,621,330 | VSEARCH/ NA |
| | 5,919,084 | NP | 6,262,000 | NP | 6,401,097 |
| Clustering (OTUs) | UPARSE/ 8,659 | VSEARCH/ 7,477 | UPARSE/ 7,598 | VSEARCH/ 23,167 | VSEARCH/ 7,887 |

| b) | LotuS | PipeCraft | Galaxy |
|---|---|---|---|
| CCSc reads | 720,222a | 720,222a | 720,222a |
| Quality filtering | sdm/ NA | VSEARCH/ 462,010 | trimmomatic/ 672,292 |
| Demultiplexing | sdm/ 258,085 | mothur/ 380,722 | mothur/ 457,173 |
| Chimera filtering | USEARCH/ 255,746 | VSEARCH/ 341,154 | VSEARCH/ 405,025 |
| | 192,485 | 338,150 | NP |
| Clustering (OTUs) | UPARSE/ 942 | UPARSE/ 4,176 | VSEARCH/ 8,338 |

a: multiplexed input data; b: demultiplexed input data; c: circular consensus sequences; NA: not available; NP: not performed.
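
The read counts in the table can be turned into per-step retention figures, which makes it easier to see where each platform loses sequences. The following is a minimal sketch, not part of the original study: it uses only the PipeCraft Illumina counts reported above, the helper name `retention` is hypothetical, and the unlabelled row after chimera filtering is left out because its step name is not given here.

```python
# Read counts copied from the PipeCraft column of the Illumina table (a).
# The unlabelled row after chimera filtering is omitted on purpose.
pipecraft_illumina = [
    ("Raw reads", 7_981_812),
    ("Assembly", 7_511_274),
    ("Quality filtering", 7_511_274),
    ("Demultiplexing", 6_558_772),
    ("Chimera filtering", 6_300_085),
]


def retention(steps):
    """Return (step, reads, fraction of raw input retained) for each step."""
    raw = steps[0][1]  # the first entry is treated as the raw input
    return [(name, reads, reads / raw) for name, reads in steps]


for name, reads, frac in retention(pipecraft_illumina):
    print(f"{name:<18} {reads:>10,} {frac:6.1%}")
```

The same function could be applied to any other column, provided that steps reported as NA or NP are dropped first, since no read count is available for them.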
Figure 1. Outline of workflow in different analysis pipelines.
Figure 2. OTU accumulation curves of the evaluated pipelines for a) PacBio and b) Illumina datasets.
Figure 3. Number of OTUs per sample for Illumina data recorded from a) pipeline-generated OTU tables (median differences = 38 OTUs) and from b) filtered OTU tables (median differences = 12 OTUs). The Galaxy workflow was excluded here.