Haris Zafeiropoulos1, Ha Quoc Viet1, Katerina Vasileiadou1,2, Antonis Potirakis1, Christos Arvanitidis1,3, Pantelis Topalis4, Christina Pavloudi1, Evangelos Pafilis1.
Abstract
BACKGROUND: Environmental DNA and metabarcoding allow the identification of a mixture of species and launch a new era in bio- and eco-assessment. Many steps are required to obtain taxonomically assigned matrices from raw data. For most of these, a plethora of tools are available; each tool's execution parameters need to be tailored to reflect each experiment's idiosyncrasy. Adding to this complexity, the computational capacity of high-performance computing systems is frequently required for such analyses. To address these difficulties, bioinformatic pipelines need to combine state-of-the-art technologies and algorithms with an easy to get-set-use framework, allowing researchers to tune each study. Software containerization technologies ease the sharing and running of software packages across operating systems; thus, they strongly facilitate pipeline development and usage. Likewise, programming languages specialized for big data pipelines incorporate features like roll-back checkpoints and on-demand partial pipeline execution.
Keywords: Docker; HPC; container; eDNA; high performance computing; metabarcoding; pipeline; singularity
Year: 2020 PMID: 32161947 PMCID: PMC7066391 DOI: 10.1093/gigascience/giaa022
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: PEMA comprises 4 parts. The first step (top left) is the quality control and pre-processing of the Illumina sequencing reads. This step is common for both 16S rRNA and COI marker genes. The second step (top right) is the clustering of reads to (M)OTUs or their inferring to ASVs. The third step (bottom left) is the taxonomy assignment to the generated (M)OTUs/ASVs. In the fourth step (bottom right), the results of the metabarcoding analysis are provided to the user and visualized. *Noun Project icons by: ProSymbols (US), IconMark (PH), Nithinan Tatah (TH). Clustering figure adapted from DOI: 10.7717/peerj.1420/fig-1.
Figure 2: Phylogeny-based taxonomy assignment. A: Building a reference tree for the phylogeny-based taxonomy assignment to 16S rRNA marker gene OTUs: from the latest edition of Silva SSU, all entries referring to Bacteria and Archaea were used and, with the “art” algorithm, 10,000 consensus taxa were kept. B: Using PaPaRa and the OTUs produced by each analysis, an MSA was built and EPA-ng performed the phylogeny-based taxonomy assignment. *Noun Project icons by: Rockicon and A Beale.
Summary benchmark of PEMA marker-gene–specific mock community recovery (precision, recall, and F1)
| Marker gene | Precision | Recall | F1 |
|---|---|---|---|
| 16S rRNA | 0.81 | 0.85 | 0.83 |
| 18S rRNA | 0.75 | 0.90 | 0.82 |
| ITS | 0.79 | 0.94 | 0.86 |
| COI | 0.62 | 0.93 | 0.74 |
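The F1 column in the table above is consistent with the usual definition of F1 as the harmonic mean of precision and recall. The following is a minimal sketch, assuming that standard definition; the function name `f1_score` and the dictionary `reported` are illustrative and are not part of PEMA.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
reported = {
    "16S rRNA": (0.81, 0.85, 0.83),
    "18S rRNA": (0.75, 0.90, 0.82),
    "ITS": (0.79, 0.94, 0.86),
    "COI": (0.62, 0.93, 0.74),
}

for marker, (precision, recall, reported_f1) in reported.items():
    computed = f1_score(precision, recall)
    print(f"{marker}: computed F1 = {computed:.2f}, reported F1 = {reported_f1:.2f}")
```

Rounded to two decimals, the recomputed values can be compared directly with the reported column.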
Comparison of the basic features of the different pipelines
OTU predictions and execution time for the different pipelines
| Parameter | LotuS | mothur | QIIME 2 (Deblur) | QIIME 2 (DADA2) | PEMA | Pavloudi et al. [ |
|---|---|---|---|---|---|---|
| No. of OTUs | 9,849 | 142,669 | 517 | 1,023 | 6,028 | 7,050 |
| Execution time (h) | ∼9 | ∼67 | 2.5 | ∼5 | ∼1.5 | ∼26 |
(∼56 if the reference database is already built).
Figure 3: OTU bar plot at the phylum level. Bar plot depicting the taxonomy of the retrieved OTUs from PEMA for the dataset of Pavloudi et al. [49], at the phylum level for the case of the 16S marker gene. AR: Arachthos; ARO: Arachthos Neochori; ARDelta: Arachthos Delta; LOin: Logarou station inside the lagoon; LOout: Logarou station in the channel connecting the lagoon to the gulf; Kal: Kalamitsi.
PEMA's output and execution time
| Parameter | | | | | |
|---|---|---|---|---|---|
| MOTUs after pre-process and clustering steps | 83,791 | 59,833 | 33,227 | 7,384 | 4,829 |
| MOTUs after chimera removal | 80,347 | 57,863 | 32,539 | 7,339 | 4,796 |
| Non-singleton MOTUs | 6,381 | 4,947 | 2,658 | 1,914 | 1,634 |
| Assigned species | 62 | 83 | 86 | 86 | 84 |
| Execution time (h:mm:ss) | 2:01:35 | 2:09:49 | 1:51:44 | 2:17:26 | 2:31:15 |
PEMA's output and execution time (using a 20-core node) for different values of Swarm's d parameter.
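The effect of each filtering step in the table above can be read more easily as differences and fractions. The sketch below only does arithmetic on the counts in the table. It assumes that the drop between the first two rows equals the number of MOTUs discarded as chimeras, and that the non-singleton fraction is taken relative to the post-chimera-removal count. Columns are identified by position because the column headers are not shown here, and the names `summarize` and `summaries` are illustrative.

```python
# MOTU counts copied from the table above, one entry per column (left to right).
after_clustering = [83_791, 59_833, 33_227, 7_384, 4_829]
after_chimera_removal = [80_347, 57_863, 32_539, 7_339, 4_796]
non_singletons = [6_381, 4_947, 2_658, 1_914, 1_634]


def summarize(clustered: int, dechimerized: int, non_singleton: int) -> dict:
    """Summarize one table column (assumption: row 1 minus row 2 = chimeric MOTUs)."""
    chimeric = clustered - dechimerized
    return {
        "chimeric_motus": chimeric,
        "chimeric_fraction": chimeric / clustered,
        "non_singleton_fraction": non_singleton / dechimerized,
    }


summaries = [
    summarize(c, d, n)
    for c, d, n in zip(after_clustering, after_chimera_removal, non_singletons)
]

for position, s in enumerate(summaries, start=1):
    print(
        f"column {position}: {s['chimeric_motus']} chimeric MOTUs "
        f"({s['chimeric_fraction']:.1%}), "
        f"{s['non_singleton_fraction']:.1%} non-singletons"
    )
```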
Comparison of the taxonomy of retrieved MOTUs among PEMA, Barque, and the positive controls of Bista et al. [50]
| Barque | PEMA | Bista et al. [ |
|---|---|---|
| Chironomidae sp. | Chironomidae sp. | |
Taxonomies identical to the published study (species level).
Taxonomies identical to the published study (genus level).