Xiaoli Dong1, Manuel Kleiner1, Christine E Sharp1, Erin Thorson1, Carmen Li2, Dan Liu1, Marc Strous1.
Abstract
Microbial community profiling by barcoded 16S rRNA gene amplicon sequencing currently has many applications in microbial ecology. The low cost of parallel sequencing of multiplexed samples, combined with the relative ease of data processing and interpretation (compared to shotgun metagenomes), has made this an entry-level approach. Here we present the MetaAmp pipeline for processing SSU rRNA gene and other non-coding or protein-coding amplicon sequencing data by investigators who are inexperienced with bioinformatics procedures. It accepts single-end or paired-end sequences in FASTA or FASTQ format from various sequencing platforms. It includes read quality control and merging of the forward and reverse reads of paired-end data. It makes use of UPARSE, Mothur, and the SILVA database for clustering, removal of chimeric reads, taxonomic classification, and generation of diversity metrics. The pipeline has been validated with a mock community of known composition. MetaAmp provides a convenient web interface as well as a command-line interface. It is freely available at: http://ebg.ucalgary.ca/metaamp. Since its launch 2 years ago, MetaAmp has been used >2,800 times by many users worldwide.
Keywords: amplicon sequencing; bioinformatics; metagenomics; microbial ecology; microbiome
Year: 2017 PMID: 28824589 PMCID: PMC5540949 DOI: 10.3389/fmicb.2017.01461
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Figure 1. Overview of the MetaAmp workflow.
Expected vs. detected bacterial species in the mock community from the two separate MiSeq runs.
| 3,459 | 7,032 | ||
| 968 | 2,171 | ||
| 3,303 | 7,858 | ||
| N/A | N/A | ||
| 1,753 | 3,748 | ||
| 3,609 | 9,463 | ||
| 1,703 | 4,010 | ||
| 989 | 2,105 | ||
| 2,262 | 5,351 | ||
| 1,539 | 3,921 | ||
| 1,933 | 4,503 | ||
| 4,589 | 9,359 | ||
| 1,092 | 2,206 | ||
| 6,741 | 15,621 | ||
| 1,373 | 3,143 | ||
| 2,761 | 6,411 | ||
| 341 | 913 | ||
| 2 | N/A | ||
| 25 | 89 | ||
| N/A | N/A | ||
| N/A | N/A | ||
| + | 18 | 38 | |
| + | 7 | 16 | |
| + | N/A | 5 | |
| + | N/A | 2 | |
| + | ML635J-21 | N/A | 2 |
| + | N/A | 2 | |
| + | N/A | 3 | |
| + | N/A | 11 |
* indicates that the organism's DNA input was very low and that the organism was present only in UEC samples. + indicates unexpected OTUs detected in the analysis of the mock samples. # indicates that multiple strains were added.
Blue color indicates the co-culture contaminants or other contaminants and orange color indicates carry-over or cross-talk.
Figure 2. Comparison of the inferred and expected community composition. The line plots show the expected species abundance and the scatter plots show the observed abundance based on MetaAmp analysis of the MiSeq runs. Each point in the plot represents a biological or technical replicate. (A) ECC is the equal cell number community, in which the same number of cells was added for each strain. (B) EPC is the equal protein amount community, in which the same amount of each strain was added based on protein mass. (C) UEC is the uneven community, in which the cell counts or protein amounts of the different strains cover a large abundance range. Species names starting with # indicate that multiple strains were added to the mock samples, and species names starting with * indicate species that were added only to the UEC community and whose DNA input abundance was very low. The expected relative abundance of each organism was corrected for the 16S rRNA gene copy number of that organism.
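The copy-number correction mentioned in the caption can be illustrated with a small sketch. The caption only states that expected abundances were corrected by 16S rRNA gene copy number; the proportional weighting below is one plausible reading, and the strain names, cell fractions, and copy numbers are hypothetical values chosen for illustration.

```python
def expected_amplicon_fractions(cell_fractions, copy_numbers):
    """Weight each organism's cell fraction by its 16S rRNA gene copy
    number, then renormalize so the fractions sum to 1.

    Assumption: amplicon yield is proportional to cells x gene copies.
    """
    weighted = {
        name: cell_fractions[name] * copy_numbers[name]
        for name in cell_fractions
    }
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("all weighted abundances are zero")
    return {name: value / total for name, value in weighted.items()}


# Hypothetical equal-cell community with three strains.
cells = {"strain_a": 1 / 3, "strain_b": 1 / 3, "strain_c": 1 / 3}
copies = {"strain_a": 1, "strain_b": 2, "strain_c": 7}
expected = expected_amplicon_fractions(cells, copies)
for name, fraction in expected.items():
    print(f"{name}: {fraction:.2f}")
```

In this toy case the strain with seven gene copies is expected to contribute most of the amplicon reads even though all three strains have equal cell numbers, which is why uncorrected cell fractions would be a misleading reference line.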
MetaAmp parameter configuration for mock community analysis.
| -an | Analysis name | Benchmark |
| | Email address | your@ucalgary.ca |
| -seqformat | Sequence format | Default |
| -seqtype | Sequencing type | Default |
| -map | Mapping.txt | |
| -oligos | Oligos.txt | |
| -g | Marker gene type | Default |
| -s | Similarity cutoff | Default |
| -minoverlen | Minimum length of overlap | 100 |
| -maxdiffs | Maximum number of mismatches in the overlap region | 8 |
| -pdiffs | Maximum number of differences to the primer sequence | Default |
| -maxee | Maximum number of expected errors | Default |
| -trunclen | Trim amplicon to a fixed length | Default |
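The `-maxee` row refers to a maximum number of expected errors per read. As a rough illustration, the sketch below uses the usual definition of expected errors popularized by USEARCH/UPARSE: the sum of per-base error probabilities derived from Phred quality scores. The function names, the Phred+33 offset, and the threshold of 1.0 are assumptions made for this example; they are not MetaAmp's code or its stated default.

```python
def phred_from_fastq(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def expected_errors(phred_scores):
    """Sum of per-base error probabilities, P = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def passes_maxee(phred_scores, maxee=1.0):
    """Keep a read only if its expected error count is at most maxee.

    The default of 1.0 is a hypothetical threshold for illustration.
    """
    return expected_errors(phred_scores) <= maxee


# 'I' encodes Q40 and '+' encodes Q10 under the Phred+33 assumption.
good_read = phred_from_fastq("I" * 20)
poor_read = phred_from_fastq("+" * 20)
print(passes_maxee(good_read), passes_maxee(poor_read))
```

A filter of this kind penalizes reads in proportion to how many bases are likely wrong, so a long read of moderate quality can fail where a short read of the same per-base quality passes.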
mapping.txt file (Table .
The oligos.txt file contains the amplicon sequencing primers. It can contain multiple primers, and each unique primer must be on a separate line. IUPAC ambiguity codes are accepted. The oligos.txt file for the mock community analysis contains only two lines, as follows:
forward CCTACGGGAGGCAGCAG
reverse GACTACHVGGGTATCTAATCC
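To show what accepting IUPAC codes means in practice, here is a minimal sketch that parses two-column primer lines and expands ambiguity codes into a regular expression. The parser and helper names are hypothetical, and the matching is exact: it does not model the mismatch tolerance controlled by `-pdiffs`, nor does it claim to reproduce how MetaAmp or Mothur handle primers internally.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def parse_oligos(text):
    """Collect primers by direction from lines like 'forward SEQUENCE'."""
    primers = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            primers.setdefault(fields[0].lower(), []).append(fields[1])
    return primers


def primer_to_regex(primer):
    """Expand each IUPAC code into a character class (exact matching only)."""
    parts = []
    for base in primer.upper():
        choices = IUPAC[base]
        parts.append(choices if len(choices) == 1 else f"[{choices}]")
    return re.compile("".join(parts))


oligos = parse_oligos(
    "forward CCTACGGGAGGCAGCAG\nreverse GACTACHVGGGTATCTAATCC\n"
)
reverse = primer_to_regex(oligos["reverse"][0])
print(reverse.pattern)
print(bool(reverse.match("GACTACAGGGGTATCTAATCC")))  # H=A, V=G
print(bool(reverse.match("GACTACGGGGGTATCTAATCC")))  # G is not allowed for H
```

The degenerate positions H and V in the reverse primer each stand for three bases, so this single line covers nine concrete primer sequences.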
Figure 3. A screenshot montage of MetaAmp output in different views. (A) The mock sample relation dendrogram shows the similarity of the mock samples to each other, measured by Bray-Curtis dissimilarity. (B) Mock sample NMDS analysis based on Bray-Curtis dissimilarity.
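For readers unfamiliar with the measure behind both panels, the following sketch computes the standard Bray-Curtis dissimilarity between two samples from their OTU counts. The OTU names and counts are hypothetical, and the sketch works on raw counts; any subsampling or normalization the pipeline applies beforehand is not reproduced here.

```python
def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity: 1 - 2 * shared / (total_a + total_b),
    where shared is the sum of the smaller count for each OTU."""
    otus = set(counts_a) | set(counts_b)
    shared = sum(
        min(counts_a.get(otu, 0), counts_b.get(otu, 0)) for otu in otus
    )
    total = sum(counts_a.values()) + sum(counts_b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return 1 - 2 * shared / total


# Hypothetical OTU counts for two samples.
sample_1 = {"otu1": 6, "otu2": 4}
sample_2 = {"otu1": 2, "otu2": 4, "otu3": 4}
distance = bray_curtis(sample_1, sample_2)
print(round(distance, 3))
```

The value ranges from 0 for identical samples to 1 for samples that share no OTUs; a matrix of such pairwise values is the kind of input that hierarchical clustering and NMDS operate on.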