Kishori M Konwar, Niels W Hanson, Maya P Bhatia, Dongjae Kim, Shang-Ju Wu, Aria S Hahn, Connor Morgan-Lang, Hiu Kan Cheung, Steven J Hallam.
Abstract
Next-generation sequencing is producing vast amounts of sequence information from natural and engineered ecosystems. Although this data deluge has enormous potential to transform our lives, knowledge creation and translation require software applications that scale with increasing data processing and analysis demands. Here, we present improvements to MetaPathways, an annotation and analysis pipeline for environmental sequence information that expedites this transformation. We specifically address pathway prediction hazards by integrating a weighted taxonomic distance (WTD), and we enable quantitative comparison of assembled annotations through a normalized read-mapping measure. Additionally, we improve LAST homology searches through BLAST-equivalent E-values and output formats that are natively compatible with prevailing software applications. Finally, an updated graphical user interface supports keyword-based annotation queries and projection onto user-defined functional gene hierarchies, including the Carbohydrate-Active Enzyme database.
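BLAST-equivalent E-values are conventionally derived from Karlin-Altschul statistics. The sketch below assumes the standard relations E = K·m·n·exp(−λS) and bit score S' = (λS − ln K)/ln 2, so that E = m·n·2^(−S'), where S is the raw alignment score and m, n are the query and database lengths. The function names are hypothetical, and the λ and K values in the usage line are illustrative (roughly the values NCBI BLAST uses for BLOSUM62 with gap costs 11/1), not parameters taken from LAST or MetaPathways; the pipeline's actual procedure (for example, its use of effective lengths) may differ.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized (bit) score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue(raw_score: float, lam: float, k: float,
           query_len: int, db_len: int) -> float:
    """Karlin-Altschul E-value: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def evalue_from_bits(bits: float, query_len: int, db_len: int) -> float:
    """Equivalent form using the bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


# Illustrative parameters (assumed, not LAST defaults): a 300-residue query
# searched against a 10^8-residue database with a raw score of 100.
lam, k = 0.267, 0.041
e_raw = evalue(100, lam, k, 300, 10**8)
e_bits = evalue_from_bits(bit_score(100, lam, k), 300, 10**8)
print(e_raw, e_bits)  # the two forms agree up to floating-point rounding
```

Expressing the score in bits makes E-values comparable across search tools only when both tools use the same scoring-system parameters, which is one reason a direct LAST-to-BLAST comparison such as the one in Fig. 1b is informative.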
Year: 2015 PMID: 26076725 PMCID: PMC4595896 DOI: 10.1093/bioinformatics/btv361
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Quantitative functional and taxonomic improvements. (a) WTD provides a measure of taxonomic agreement between the observed RefSeq lowest common ancestor (LCA) taxonomy and the expected taxonomic range of predicted MetaCyc pathways, with distances separated into 'High' (red), 'Medium' (orange) and 'Low' (green) taxonomic hazard classes based on negative quartile order statistics. Positive distances represent taxa found within a pathway's expected taxonomic range and therefore have a hazard class of 'None' (grey). (b) The LAST and BLAST homology search algorithms are highly correlated in terms of E-value (, P < 0.01). (c) ORF count and the RPKM measure show a linear relationship (R² = 0.816, P < 0.01). Ninety percent prediction intervals, displayed as a pair of thin blue lines about the fitted line, capture ∼96.7% and 91.3% of observed points in (b) and (c), respectively. Analysis code can be found in the Supplementary information.
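Panel (c) compares ORF counts with RPKM (reads per kilobase of ORF per million mapped reads), the normalized read-mapping measure mentioned in the abstract. A minimal sketch of the conventional RPKM formula, RPKM = reads_mapped × 10^9 / (ORF_length_bp × total_mapped_reads), is shown below. The helper names and ORF identifiers are hypothetical, and `rpkm_table` assumes, for simplicity, that the total mapped reads equal the sum over the listed ORFs; MetaPathways may count the total differently (for example, over the whole sample) or handle multi-mapping reads in its own way.

```python
from typing import Dict, Tuple


def rpkm(reads_mapped: int, orf_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of ORF per million mapped reads."""
    if orf_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("ORF length and total mapped reads must be positive")
    # Dividing by length in kb (bp / 1e3) and by total reads in millions
    # (reads / 1e6) is equivalent to multiplying the raw count by 1e9.
    return reads_mapped * 1e9 / (orf_length_bp * total_mapped_reads)


def rpkm_table(counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Map each (hypothetical) ORF id -> RPKM, given (reads_mapped, length_bp).

    Assumption: the total mapped reads is the sum over the ORFs supplied.
    """
    total = sum(reads for reads, _ in counts.values())
    return {orf: rpkm(reads, length, total)
            for orf, (reads, length) in counts.items()}


# 500 reads on a 1,000 bp ORF in a library of 10 million mapped reads.
print(rpkm(500, 1000, 10_000_000))  # 50.0

# A 1,500 bp ORF with 300 reads and a 500 bp ORF with 100 reads receive the
# same RPKM: the raw counts differ only because the ORF lengths differ.
print(rpkm_table({"orf_1": (300, 1500), "orf_2": (100, 500)}))
```

Normalizing by ORF length and sequencing depth is what allows annotations from samples of different sizes to be compared quantitatively, whereas a raw ORF count reflects only how many predicted genes carry a given annotation.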