Michal Strejcek, Qiong Wang, Jakub Ridl, Ondrej Uhlik.
Abstract
Functional gene ecological analyses using amplicon sequencing can be challenging as translated sequences are often burdened with shifted reading frames. The aim of this work was to evaluate several bioinformatics tools designed to correct errors which arise during sequencing in an effort to reduce the number of frameshifts (FS). Genes encoding alpha subunits of biphenyl (bphA) and benzoate (benA) dioxygenases were used as model sequences. FrameBot, a FS correction tool, was able to reduce the number of detected FS to zero. However, up to 44% of sequences were discarded by FrameBot as non-specific targets. Therefore, we proposed a de novo mode of FrameBot for FS correction, which works on a similar basis as common chimera-identifying platforms and is not dependent on reference sequences. By nature of the FrameBot de novo design, it is crucial to provide it with data as error-free as possible. We tested the ability of several publicly available correction tools to decrease the number of errors in the data sets. The combination of maximum expected error filtering and single linkage pre-clustering proved to be the most efficient read processing approach. Applying FrameBot de novo on the processed data enabled analysis of BphA sequences with minimal losses of potentially functional sequences not homologous to those previously known. This experiment also demonstrated the extensive diversity of dioxygenases in soil. A script which performs FrameBot de novo is presented in the supplementary material to the study and is available at https://github.com/strejcem/FBdenovo. The tool was also implemented into FunGene Pipeline, available at http://fungene.cme.msu.edu/FunGenePipeline/.
Keywords: FrameBot; Frameshift; amplicon sequencing; benzoate dioxygenase; biphenyl dioxygenase; functional genes
Year: 2015; PMID: 26635739; PMCID: PMC4656815; DOI: 10.3389/fmicb.2015.01267
Source DB: PubMed; Journal: Front Microbiol; ISSN: 1664-302X; Impact factor: 5.640
Frameshift corrections reported by FrameBot (FB) in reference-based mode and de novo mode.
| Data treatment | FB reference-based, corrected (%) | FB de novo, corrected (%) | FB reference-based, sequences discarded (%) | FB de novo, sequences discarded (%) |
|---|---|---|---|---|
| P_BenA 1.0 MEE + SLP | 1.5 | 1.4 | 0.1 | <0.1 |
| P_BenA AmpliconNoise | 1.0 | 1.0 | 0.3 | 0.3 |
| C_BenA 1.0 MEE + SLP | 9.6 | 8.8 | 1.1 | 0.5 |
| C_BenA AmpliconNoise | 9.5 | 9.7 | 2.1 | 1.3 |
| P_BphA 1.0 MEE + SLP | 2.7 | 1.9 | 10.1 | 3.4 |
| P_BphA AmpliconNoise | 4.7 | 4.5 | 12.3 | 4.7 |
| C_BphA 1.0 MEE + SLP | 0.6 | 2.4 | 41.2 | 0.4 |
| C_BphA AmpliconNoise | 0.8 | 6.0 | 43.6 | 0.6 |
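
The "1.0 MEE" treatments in the table refer to maximum expected error filtering with a threshold of 1.0. As a rough illustration of that idea only, the sketch below computes the expected number of errors in a read as the sum of per-base error probabilities derived from Phred quality scores (p = 10^(-Q/10)) and keeps reads at or below the threshold. The function names, the Phred+33 quality encoding, and the inclusive comparison are assumptions made for this example; this is not the code used in the study.

```python
def decode_phred33(quality_string):
    """Convert a FASTQ quality string to Phred scores (Phred+33 encoding assumed)."""
    return [ord(char) - 33 for char in quality_string]


def expected_errors(phred_scores):
    """Expected number of errors in a read: sum of 10^(-Q/10) over all bases."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def passes_mee(phred_scores, max_ee=1.0):
    """Return True if the read's expected errors do not exceed max_ee.

    The default of 1.0 mirrors the "1.0 MEE" label in the table; the
    inclusive comparison is an assumption of this sketch.
    """
    return expected_errors(phred_scores) <= max_ee


if __name__ == "__main__":
    # Hypothetical reads: (identifier, quality string)
    reads = [
        ("read_high_quality", "I" * 50),  # Q40 at every base
        ("read_low_quality", "+" * 20),   # Q10 at every base
    ]
    for name, qualities in reads:
        scores = decode_phred33(qualities)
        print(name, round(expected_errors(scores), 4), passes_mee(scores))
```

A read with fifty Q40 bases accumulates 0.005 expected errors and is kept, whereas twenty Q10 bases accumulate 2.0 expected errors and the read is rejected at a threshold of 1.0.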