Alejandro Abdala Asbun, Marc A Besseling, Sergio Balzano, Judith D L van Bleijswijk, Harry J Witte, Laura Villanueva, Julia C Engelmann
Abstract
Marker gene sequencing of the rRNA operon (16S, 18S, ITS) or cytochrome c oxidase I (CO1) is a popular means to assess microbial communities in the environment and microbiomes associated with plants and animals, as well as communities of multicellular organisms via environmental DNA sequencing. Because this technique sequences a single gene, or only part of a single gene, rather than the entire genome, fewer reads per sample are needed to assess microbial community structure than in metagenome sequencing. This makes marker gene sequencing affordable to nearly any laboratory. Despite the relative ease and cost-efficiency of data generation, analyzing the resulting sequence data requires computational skills that may go beyond the standard repertoire of today's molecular biologists and ecologists. We have developed Cascabel, a scalable, flexible, and easy-to-use amplicon sequence data analysis pipeline, which uses Snakemake and a combination of existing and newly developed solutions for its computational steps. Cascabel takes the raw data as input and delivers a table of operational taxonomic units (OTUs) or amplicon sequence variants (ASVs) in BIOM and text format, together with representative sequences. Cascabel is highly versatile software that allows users to customize several steps of the pipeline, for example by selecting from a set of OTU clustering methods or by performing ASV analysis. In addition, we designed Cascabel to run in any Linux/Unix computing environment, from desktop computers to computing servers, making use of parallel processing where possible. The analyses and results are fully reproducible and are documented in an HTML report and an optional PDF report. Cascabel is freely available on GitHub: https://github.com/AlejandroAb/CASCABEL.
Keywords: 16S/18S rRNA gene; Illumina; amplicon sequencing; community profiling; microbiome; pipeline; snakemake
Year: 2020 PMID: 33329686 PMCID: PMC7718033 DOI: 10.3389/fgene.2020.489357
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Input file structure for Cascabel. (A) This input file structure is generated from the file paths provided in the config file when the dataset consists of a single sequencing library. For multiple libraries, it is created either from a text file specifying the individual libraries or by the helper script initSample.sh. (B) Example of a barcode mapping file for four samples. Barcode and primer sequences are listed in 5′-3′ direction and have been abbreviated.
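Figure 1B describes a barcode mapping file that links each sample to its barcode. The Python sketch below shows roughly how such a file can drive demultiplexing. It assumes a QIIME-style tab-separated layout, with a header starting with `#SampleID` and columns `BarcodeSequence` and `LinkerPrimerSequence`. Each read is assigned to the sample whose barcode is an exact prefix of the read. The sample names and sequences are invented. This is not Cascabel's own code, since the pipeline delegates demultiplexing to QIIME (Table 1). Production tools may also tolerate or correct barcode errors, which this sketch does not.

```python
import csv
import io
from collections import defaultdict

# Invented example of a QIIME-style mapping file (tab-separated; assumed layout).
MAPPING_TSV = (
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tDescription\n"
    "S1\tACGTAC\tGTGCCAGC\tsample_one\n"
    "S2\tTGCATG\tGTGCCAGC\tsample_two\n"
)


def read_mapping(text):
    """Return {barcode: sample_id} parsed from a tab-separated mapping file."""
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(rows)
    header[0] = header[0].lstrip("#")  # "#SampleID" -> "SampleID"
    i_sample = header.index("SampleID")
    i_barcode = header.index("BarcodeSequence")
    return {row[i_barcode]: row[i_sample] for row in rows if row}


def demultiplex(reads, barcode_to_sample):
    """Assign reads to samples by exact 5' barcode match and trim the barcode.

    Longer barcodes are tried first; reads matching no barcode are
    collected under the key "unassigned".
    """
    lengths = sorted({len(b) for b in barcode_to_sample}, reverse=True)
    by_sample = defaultdict(list)
    for read in reads:
        for n in lengths:
            sample = barcode_to_sample.get(read[:n])
            if sample is not None:
                by_sample[sample].append(read[n:])
                break
        else:  # loop finished without a match
            by_sample["unassigned"].append(read)
    return dict(by_sample)


mapping = read_mapping(MAPPING_TSV)
reads = ["ACGTACGTGCCAGCAAA", "TGCATGGTGCCAGCTTT", "NNNNNNGTGCCAGC"]
print(demultiplex(reads, mapping))
```

In this sketch, trimming the barcode leaves the primer at the start of each assigned read. Primer and adapter removal is a separate step in Table 1.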
Figure 2. Overview of Cascabel. The workflow indicates input files (config file, sequence data in fastq format, barcode mapping file), mandatory and optional steps of the pipeline (blue boxes), and the main output files. The boxes of optional steps have dashed borders. “Clean and filter” refers to removing primers/adapters and chimeras. Table 1 shows a detailed summary of the steps, available tools, and output files.
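Cascabel is built on Snakemake, so the order of its steps follows from the outputs each step consumes. Snakemake derives this dependency graph (a DAG) from the input and output files declared by its rules. The sketch below illustrates the idea with Python's standard `graphlib` module (Python 3.9+) instead of Snakemake. The step names, their wiring, and the choice of `clean_and_filter` as the optional step are simplifying assumptions loosely based on Figure 2, not the pipeline's actual rule set. When an optional step is skipped, the steps that depended on it are rewired to that step's own prerequisites.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical step graph: each step maps to the steps whose outputs it consumes.
STEPS = {
    "quality_control": set(),
    "merge_reads": set(),
    "demultiplex": {"merge_reads"},
    "clean_and_filter": {"demultiplex"},  # treated as optional here (assumption)
    "dereplicate": {"clean_and_filter"},
    "generate_otus": {"dereplicate"},
    "assign_taxonomy": {"generate_otus"},
    "otu_table": {"generate_otus", "assign_taxonomy"},
    "report": {"quality_control", "otu_table"},
}


def execution_order(steps, skip=frozenset()):
    """Return a dependency-respecting run order, omitting steps in `skip`.

    Steps that depended on a skipped step are rewired to that step's
    own prerequisites, so the chain of outputs stays connected.
    The input mapping is copied and left unchanged.
    """
    graph = {name: set(deps) for name, deps in steps.items()}
    for name in skip:
        prereqs = graph.pop(name)
        for deps in graph.values():
            if name in deps:
                deps.discard(name)
                deps |= prereqs
    return list(TopologicalSorter(graph).static_order())


print(execution_order(STEPS))
print(execution_order(STEPS, skip={"clean_and_filter"}))
```

With `skip={"clean_and_filter"}`, dereplication starts directly from the demultiplexed sequences. Any order that respects the dependencies is valid, so steps without mutual dependencies may appear in either order.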
Table 1. Outline of the steps performed by Cascabel. “Script(s)” refers to Cascabel scripts in bash, Java, or R.
| Step | Tool(s) | Output |
| --- | --- | --- |
| Initialize structure | Script | Project folder and file structure |
| Quality control | FastQC (Andrews, | FastQC report |
| Merge reads | PEAR (Zhang et al., | Merged (assembled) sequences |
| Demultiplex | QIIME (Caporaso et al., | Sequences assigned to samples, in one file and per sample |
| Align vs. reference | Mothur (Schloss et al., | Aligned sequences |
| Remove chimeras | usearch61 (Edgar, | Chimera-free sequences |
| Remove adapters | Cutadapt (Martin, | Adapter-free sequences |
| Size filter | Script | Filtered sequences |
| Dereplicate | VSEARCH | Dereplicated sequences |
| Generate OTUs | Mothur (Schloss et al., | OTU table |
| Pick representatives (OTUs) | random, longest, most_abundant, first | FASTA file with representative sequences |
| Generate ASVs | DADA2 (Callahan et al., | ASV table |
| Assign taxonomy (OTUs) | QIIME [BLAST (Altschul et al., | Taxonomic assignments for each OTU |
| Assign taxonomy (ASVs) | RDP | Taxonomic assignments for each ASV |
| Generate OTU table | QIIME, scripts | Annotated OTU table |
| Generate ASV table | DADA2 | Annotated ASV table |
| Alignment | PyNAST (Caporaso et al., | Multiple sequence alignment |
| Make tree | MUSCLE, ClustalW, RAxML (Stamatakis, | Phylogenetic tree |
| Report | Scripts, Krona (Ondov et al., | HTML and PDF reports, Krona charts |
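Three of the steps in Table 1 are simple enough to sketch directly: the size (length) filter, exact-match dereplication, and the `most_abundant` strategy for picking an OTU representative. The Python functions below illustrate them. The length bounds, the toy sequences, and the cluster membership passed to `pick_most_abundant` are hypothetical. In the pipeline, dereplication and clustering are handled by the dedicated tools listed in the table, and clustering is not reimplemented here.

```python
from collections import Counter


def length_filter(seqs, min_len, max_len):
    """Keep sequences whose length lies within [min_len, max_len]."""
    return [s for s in seqs if min_len <= len(s) <= max_len]


def dereplicate(seqs):
    """Collapse identical sequences into (sequence, count) pairs.

    Sorted by decreasing count, ties broken alphabetically.
    """
    counts = Counter(seqs)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def pick_most_abundant(clusters, abundance):
    """Pick the most abundant member of each OTU as its representative.

    clusters:  {otu_id: [member sequences]} (output of some clustering tool)
    abundance: {sequence: count}, e.g. dict(dereplicate(...))
    Ties go to the member listed first.
    """
    return {
        otu: max(members, key=lambda s: abundance.get(s, 0))
        for otu, members in clusters.items()
    }


seqs = ["ACGT", "ACGT", "ACGA", "AC", "ACGTTT"]
kept = length_filter(seqs, min_len=4, max_len=5)
derep = dereplicate(kept)
clusters = {"OTU_1": ["ACGA", "ACGT"]}  # hypothetical clustering result
print(derep)
print(pick_most_abundant(clusters, dict(derep)))
```

Dereplicating before clustering means each unique sequence is processed once, with its count carried along as abundance.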
Figure 3. Figures shown in Cascabel reports. (A) Smoothed sequence length distribution after merging reads, for one library. The plot is meant to help users make a sensible choice for sequence length filtering. (B) Number of sequences per sample. This histogram is part of the OTU report (including all libraries). (C) Number of sequences after individual pre-processing steps. “Assembled” refers to the number of raw read pairs that could be merged based on their overlap. “Demultiplexed” refers to the number of raw reads that could be assembled and assigned to a sample, and “Length filtering” indicates the number of raw reads passing both the previous criteria and the sequence length criterion. This plot is part of the library report. (D) Number of sequences at individual steps after optionally combining several libraries (total number of reads) and generating OTUs. “Derep.” indicates the number of dereplicated reads and their percentage relative to the total combined reads. “OTUs” is the total number of OTUs, and the percentage is relative to the number of combined reads. “Assigned OTUs” is the number and percentage of OTUs with a taxonomic assignment. “No singletons” refers to the number and percentage of OTUs excluding singleton OTUs, and “Assigned NO singletons” is the number and percentage of singleton-free OTUs with a taxonomic assignment. The plot is part of the OTU report. (E) Krona chart for one sample. Krona charts are interactive and can be viewed in a web browser. Colors indicate the taxonomic groups to which the OTUs were assigned, and each ring of the pie chart represents a different taxonomic level. An example of a full library report is shown in Supplementary Datasheet 3, and an OTU report is provided in Supplementary Datasheet 4.
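The counts summarized in Figure 3D can be derived from an OTU table plus taxonomic assignments. The Python sketch below assumes an in-memory table of the form `{otu_id: {sample_id: count}}`. It defines a singleton as an OTU with a total count of one across all samples, which is an assumed definition. The caption states that the “OTUs” percentage is relative to the combined reads. The denominators for the other percentages are assumptions: all OTUs for “Assigned OTUs” and “No singletons”, and singleton-free OTUs for “Assigned NO singletons”. The example data are invented.

```python
def pct(part, whole):
    """Percentage rounded to two decimals (0.0 if the denominator is zero)."""
    return round(100.0 * part / whole, 2) if whole else 0.0


def otu_summary(otu_table, taxonomy):
    """Summarise an OTU table along the lines of Figure 3D.

    otu_table: {otu_id: {sample_id: count}}
    taxonomy:  {otu_id: lineage string, or None if unassigned}
    Singleton = OTU with a total count of 1 (assumed definition).
    """
    totals = {otu: sum(counts.values()) for otu, counts in otu_table.items()}
    total_reads = sum(totals.values())
    n_otus = len(otu_table)
    assigned = {otu for otu in otu_table if taxonomy.get(otu)}
    no_singletons = {otu for otu, n in totals.items() if n > 1}
    assigned_ns = assigned & no_singletons
    return {
        "total_reads": total_reads,
        "otus": (n_otus, pct(n_otus, total_reads)),
        "assigned_otus": (len(assigned), pct(len(assigned), n_otus)),
        "no_singletons": (len(no_singletons), pct(len(no_singletons), n_otus)),
        "assigned_no_singletons": (
            len(assigned_ns),
            pct(len(assigned_ns), len(no_singletons)),
        ),
    }


table = {"OTU_1": {"S1": 5, "S2": 3}, "OTU_2": {"S1": 1}, "OTU_3": {"S2": 2}}
lineages = {"OTU_1": "Bacteria;Proteobacteria", "OTU_2": "Bacteria", "OTU_3": None}
print(otu_summary(table, lineages))
```

In this toy table, OTU_2 is the only singleton, and OTU_3 has no assignment. Only OTU_1 therefore counts toward “Assigned NO singletons”.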