John M Gaspar, W Kelley Thomas.
Abstract
BACKGROUND: Reducing the effects of sequencing errors and PCR artifacts has emerged as an essential component in amplicon-based metagenomic studies. Denoising algorithms have been designed that can reduce error rates in mock community data, but they change the sequence data in a manner that can be inconsistent with the process of removing errors in studies of real communities. In addition, they are limited by the size of the dataset and the sequencing technology used.
Year: 2015 PMID: 25885646 PMCID: PMC4380255 DOI: 10.1186/s12859-015-0532-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Results of the filtering step of FlowClus
| | | | | | | Total |
|---|---|---|---|---|---|---|
| Reads eliminated | 2226 | 0 | 107 | n/a | 3702 | 6035 |
| Reads truncated | n/a | n/a | 1359 | 12486 | 15199 | 29044 |
Figure 1. Effects of altering the denoising distance parameter of FlowClus. The numbers of changes to the reads made during denoising using different values for the maximum allowed distance between flow values. A: Effects of changing the constant denoising value (-j). B: Effects of using different multiples (-k) of the distances based on the standard deviations of Balzer et al. [22].
Figure 2. Denoising “misses” with FlowClus. As FlowClus denoises flowgrams by comparing pairs of flow values, it records the flow values of the cluster and the read each time they are judged as being distinct. The set of these “misses” for a denoising run can be visualized as a levelplot, such as those shown here. The red-orange colors represent a large number of misses at those particular pairs of cluster and read flow values. A: Denoising using a constant distance value of 0.50. B: Denoising using a multiple of five distances based on the standard deviations of Balzer et al. [22].
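The pairwise comparison described in the captions of Figures 1 and 2 can be illustrated with a minimal Python sketch. It is not FlowClus's implementation: the function name `compare_flowgrams`, the absolute-difference criterion, and the rounding of recorded values are all assumptions made for illustration. Only the idea of a maximum allowed distance between flow values (a constant such as 0.50) and the recording of (cluster, read) value pairs judged distinct comes from the captions.

```python
from collections import Counter


def compare_flowgrams(cluster_flows, read_flows, max_dist=0.50):
    """Compare two flowgrams position by position (hypothetical helper).

    A pair of flow values is treated as a "miss" when the absolute
    difference exceeds max_dist. This criterion is an assumption, not
    the documented FlowClus rule.

    Returns (matched, misses), where misses counts the
    (cluster value, read value) pairs that were judged distinct.
    """
    misses = Counter()
    matched = True
    for cluster_value, read_value in zip(cluster_flows, read_flows):
        if abs(cluster_value - read_value) > max_dist:
            # Record the pair so that misses can later be tallied,
            # for example as input to a levelplot-style heatmap.
            misses[(round(cluster_value, 2), round(read_value, 2))] += 1
            matched = False
    return matched, misses


# Made-up flow values, for illustration only.
cluster = [1.02, 0.05, 2.10]
read = [0.98, 0.61, 2.05]
matched, misses = compare_flowgrams(cluster, read, max_dist=0.50)
```

With these made-up values the second position differs by 0.56, so a constant distance of 0.50 yields one miss, whereas a looser limit of 0.60 would accept the read. A variant using value-dependent distances (such as multiples of per-value standard deviations) would replace the constant `max_dist` with a lookup, which is not attempted here because those values are not given in this text.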
Comparison of the run-times (in seconds) of different denoising pipelines
| | | | | |
|---|---|---|---|---|
| Number of reads denoised | 40627 | 34592 | 31868 | 36281 |
| Filtering | 63 | 64 | 5568 | 2211 |
| Denoising | clustering: 16; trie: 3 | clustering: 8; trie: 2 | 16536 | 35796 |
*Filtering options as shown in Table 1.
**Denoising by the PyroNoise step only.
Figure 3. Analyzing a mock community dataset. A comparison of the error rates (solid lines) and total sequence alignment length (dashed lines) of the Titanium mock community dataset (Quince et al. [6]) analyzed by FlowClus and AmpliconNoise.