Mohamed Mysara, Natalie Leys, Jeroen Raes, Pieter Monsieurs.
Abstract
BACKGROUND: The development of high-throughput sequencing technologies has revolutionized the field of microbial ecology via the sequencing of phylogenetic marker genes (e.g. 16S rRNA gene amplicon sequencing). Denoising, the removal of sequencing errors, is an important step in preprocessing amplicon sequencing data. The increasing popularity of the Illumina MiSeq platform for these applications requires the development of appropriate denoising methods.
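To make the idea of denoising concrete, here is a minimal sketch of a simple abundance-based approach, loosely in the spirit of mothur's Pre-cluster (one of the methods compared in the tables below). Unique sequences are processed from most to least abundant, and a rarer sequence within a small number of mismatches of a more abundant one is treated as an erroneous copy of it. This is not IPED. The names `hamming`, `abundance_precluster` and the `max_diffs` parameter are hypothetical, and the sketch assumes equal-length, pre-aligned reads with substitution errors only.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def abundance_precluster(reads: list[str], max_diffs: int = 1) -> dict[str, int]:
    """Fold rare unique sequences into a more abundant one within max_diffs mismatches.

    Hypothetical simplified sketch: assumes all reads are already aligned to the
    same length and that only substitution errors occur (no indels).
    """
    counts = Counter(reads)
    # Most abundant first; ties broken by sequence so the result is deterministic.
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    centroids: dict[str, int] = {}
    for seq in ordered:
        # Centroids are stored in decreasing abundance, so the first hit is the
        # most abundant sequence within reach.
        for centroid in centroids:
            if hamming(seq, centroid) <= max_diffs:
                # Treat seq as an error-containing variant of this centroid.
                centroids[centroid] += counts[seq]
                break
        else:
            # No abundant neighbour found: keep seq as a new centroid.
            centroids[seq] = counts[seq]
    return centroids
```

For example, ten copies of `ACGTACGT`, two of `ACGTACGA` and five of `TTTTACGT` collapse to two sequences, because `ACGTACGA` differs from the dominant sequence at a single position while `TTTTACGT` differs at three.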
Keywords: 16S rRNA gene amplicon sequencing; Denoising; Error correction; Metagenomics; MiSeq
Year: 2016 PMID: 27130479 PMCID: PMC4850673 DOI: 10.1186/s12859-016-1061-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Overview of the mock sequencing data discussed in this work: amplified regions, sample IDs, number of contigs (i.e. merged paired-end reads), average contig length (i.e. length after merging both paired-end reads), and average length of the overlap between the two paired-end reads.
| Name | Region | Avg. contig length (bp) | Avg. overlap (bp) | Sample ID | # contigs |
|---|---|---|---|---|---|
| MOCK1 [ | V34 | 430 | 70 | 130401 | 184216 |
| | | | | 130403 | 131241 |
| | | | | 130417 | 102547 |
| | V4 | 250 | 250 | 130422 | 79701 |
| | | | | 130401 | 1217529 |
| | | | | 130403 | 1191998 |
| | V45 | 375 | 125 | 130417 | 1015673 |
| | | | | 130422 | 871118 |
| | | | | 130401 | 826262 |
| MOCK2 [ | V4 | 250 | 250 | v4.I.1 | 213043 |
| | | | | v4.I.05 | 240682 |
| | V45 | 390 | 110 | v4.v5.I.1 | 2484 |
| | | | | v4.v5.I.11 | 90126 |
| MOCK3 (SRP066114) | V34 | 421 | 140 | M1 | 35168 |
| | | | | M2 | 60488 |
| | | | | M3 | 21723 |
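In the data-overview table, every MOCK1 and MOCK2 row satisfies contig length + overlap = 500 (e.g. 430 + 70, 375 + 125, 390 + 110), which is consistent with 2 × 250 bp reads, i.e. overlap = 2 × 250 − contig length. The MOCK3 row (421 + 140) does not fit this relation. A minimal sketch of how a read pair can be merged into a contig follows. It is not mothur's `make.contigs` implementation; `merge_pair`, `reverse_complement` and `min_overlap` are hypothetical names, and the sketch assumes a gapless overlap, an amplicon at least as long as each read, and Phred qualities given as integer lists.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd: str, fwd_q: list[int], rev: str, rev_q: list[int],
               min_overlap: int = 10) -> Optional[str]:
    """Merge a forward/reverse read pair into one contig (hypothetical sketch).

    Assumes no indels and an amplicon at least as long as each read, so the
    overlap is the 3' end of the forward read and the 5' end of the
    reverse-complemented reverse read.
    """
    rc = reverse_complement(rev)
    rc_q = rev_q[::-1]  # qualities follow their bases when the read is reversed
    best_len, best_score = None, None
    for olen in range(min_overlap, min(len(fwd), len(rc)) + 1):
        f_part, r_part = fwd[len(fwd) - olen:], rc[:olen]
        matches = sum(a == b for a, b in zip(f_part, r_part))
        score = matches - (olen - matches)  # reward matches, penalize mismatches
        if best_score is None or score > best_score:
            best_len, best_score = olen, score
    if best_len is None:
        return None  # reads shorter than the requested minimum overlap
    start = len(fwd) - best_len
    overlap = []
    for i in range(best_len):
        f_base, f_qual = fwd[start + i], fwd_q[start + i]
        r_base, r_qual = rc[i], rc_q[i]
        # On disagreement keep the base with the higher quality score.
        overlap.append(f_base if f_qual >= r_qual else r_base)
    # Contig length = len(fwd) + len(rev) - overlap length.
    return fwd[:start] + "".join(overlap) + rc[best_len:]
```

Scoring matches minus mismatches, rather than raw matches, keeps longer but spurious overlaps from winning simply because they compare more bases.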
Fig. 1 Schematic overview of the different steps of the IPED algorithm
Error rates of the samples after denoising with UNOISE (following USEARCH preprocessing) and, following mothur preprocessing, without denoising, after Pre-cluster, or after IPED. Because USEARCH and mothur apply different preprocessing steps, they remove different numbers of reads: around 53 % and 39 % of reads, respectively.
| Variable region | Sample ID | Error rate: USEARCH + UNOISE | Error rate: mothur (make.contigs only) | Error rate: mothur + Pre-cluster | Error rate: mothur + IPED |
|---|---|---|---|---|---|
| V34 | 130403 | 0.0003 | 0.0026 | 0.0013 | 0.0002 |
| | 130417 | 0.0004 | 0.0023 | 0.0010 | 0.0003 |
| | 130422 | 0.0008 | 0.0028 | 0.0017 | 0.0008 |
| | M1 | 0.0006 | 0.00149 | 0.0007 | 0.0004 |
| | M2 | 0.0007 | 0.00150 | 0.0008 | 0.0006 |
| | M3 | 0.0005 | 0.00140 | 0.0007 | 0.0005 |
| V4 | 130403 | 0.00011 | 0.00056 | 0.00013 | 0.00010 |
| | 130417 | 0.00009 | 0.00051 | 0.00010 | 0.00008 |
| | 130422 | 0.00009 | 0.00049 | 0.00010 | 0.00008 |
| | v4.I.1 | 0.00002 | 0.00061 | 0.00008 | 0.00004 |
| | v4.I.05 | 0.00002 | 0.00068 | 0.00010 | 0.00004 |
| V45 | 130403 | 0.0030 | 0.0084 | 0.0055 | 0.0022 |
| | 130417 | 0.0029 | 0.0069 | 0.0041 | 0.0020 |
| | 130422 | 0.0026 | 0.0060 | 0.0033 | 0.0016 |
| | v4.v5.I.1 | 0.0082 | 0.0066 | 0.0061 | 0.0041 |
| | v4.v5.I.11 | 0.0084 | 0.0033 | 0.0034 | 0.0031 |
| Average | All samples | 0.0018 | 0.0029 | 0.0018 | 0.0010 |
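The Average row of the error-rate table is consistent with an unweighted arithmetic mean of the 16 per-sample values in each column; the check below reproduces it from the table. The `error_rate` helper shows the usual per-base definition (erroneous bases divided by aligned bases); that definition is an assumption here, since this section does not state exactly how errors were counted, and the column labels in `RATES` are shortened for illustration.

```python
from statistics import mean


def error_rate(erroneous_bases: int, aligned_bases: int) -> float:
    """Per-base error rate (assumed definition): erroneous bases / aligned bases."""
    return erroneous_bases / aligned_bases


# Per-sample error rates from the error-rate table, in row order
# (V34: 6 samples, V4: 5 samples, V45: 5 samples).
RATES = {
    "USEARCH + UNOISE": [0.0003, 0.0004, 0.0008, 0.0006, 0.0007, 0.0005,
                         0.00011, 0.00009, 0.00009, 0.00002, 0.00002,
                         0.0030, 0.0029, 0.0026, 0.0082, 0.0084],
    "mothur (no denoising)": [0.0026, 0.0023, 0.0028, 0.00149, 0.00150, 0.00140,
                              0.00056, 0.00051, 0.00049, 0.00061, 0.00068,
                              0.0084, 0.0069, 0.0060, 0.0066, 0.0033],
    "mothur + Pre-cluster": [0.0013, 0.0010, 0.0017, 0.0007, 0.0008, 0.0007,
                             0.00013, 0.00010, 0.00010, 0.00008, 0.00010,
                             0.0055, 0.0041, 0.0033, 0.0061, 0.0034],
    "mothur + IPED": [0.0002, 0.0003, 0.0008, 0.0004, 0.0006, 0.0005,
                      0.00010, 0.00008, 0.00008, 0.00004, 0.00004,
                      0.0022, 0.0020, 0.0016, 0.0041, 0.0031],
}

for method, rates in RATES.items():
    # Unweighted mean over all samples, printed with four decimals like the table.
    print(f"{method}: {mean(rates):.4f}")
```

Running it prints 0.0018, 0.0029, 0.0018 and 0.0010, matching the Average row; every sample therefore counts equally, regardless of how many reads it contributed.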
Fig. 2 Error rate versus position in the read after treatment with Pre-cluster (blue), UNOISE (violet) and IPED (red). Raw error rates (i.e. without a denoising algorithm) are shown in black.
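A per-position error profile like the one plotted in Fig. 2 can be computed by counting, at each reference position, how many covering reads disagree with the reference. The sketch below is a simplified illustration under stated assumptions, not the procedure used in this work: `per_position_error` is a hypothetical name, reads are assumed to be gaplessly aligned to the reference from position 0 (substitutions only), and shorter reads contribute only to the positions they cover.

```python
def per_position_error(reads: list[str], reference: str) -> list[float]:
    """Fraction of covering reads that mismatch the reference at each position.

    Hypothetical sketch: assumes gapless alignment starting at reference
    position 0; indels are not modelled.
    """
    errors = [0] * len(reference)
    coverage = [0] * len(reference)
    for read in reads:
        # Ignore any bases that run past the end of the reference.
        for i, base in enumerate(read[:len(reference)]):
            coverage[i] += 1
            if base != reference[i]:
                errors[i] += 1
    # Positions with no coverage get a rate of 0.0 instead of dividing by zero.
    return [e / c if c else 0.0 for e, c in zip(errors, coverage)]
```

Dividing by per-position coverage rather than by the total number of reads keeps the rate comparable near read ends, where fewer (trimmed) reads contribute.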