
ASAR: visual analysis of metagenomes in R.

Askarbek N Orakov^1,2, Nazgul K Sakenova^1,2, Anatoly Sorokin^3,4, Igor I Goryanin^1,5,6.

Abstract

Motivation: Functional and taxonomic analyses are critical steps in understanding interspecific interactions within microbial communities. Currently, such analyses are run separately, which complicates the interpretation of results. Here we present ASAR, an interactive tool for the simultaneous analysis of metagenomic data in three dimensions: taxonomy, function and metagenome.
Results: An interactive data analysis tool for the selection, aggregation and visualization of metagenomic data is presented. Functional analysis is available both with the SEED hierarchy and with pathway diagrams based on KEGG orthology, using MG-RAST annotation results.
Availability and implementation: Source code of ASAR is accessible at GitHub (https://github.com/Askarbek-orakov/ASAR).
Contact: askarbek.orakov@nu.edu.kz or goryanin@gmail.com.

Year: 2018 PMID: 29211828 PMCID: PMC5905653 DOI: 10.1093/bioinformatics/btx775

Journal: Bioinformatics ISSN: 1367-4803

1 Introduction

Metagenomics allows investigation of microbial communities in a culture-independent way, an important advantage given that an estimated 99% of prokaryotes have not been successfully cultured (Schloss and Handelsman, 2005). In addition, the decreasing cost of sequencing and the increasing throughput of metagenomic data generation make the development of tools for functional, taxonomic and metabolic analyses of metagenomes extremely important (Hugenholtz and Tyson, 2008; Lindgreen et al., 2016). However, even the most useful existing metagenomic analysis tools provide only taxonomic analysis (Menzel et al., 2016), only functional analysis (Westbrook et al., 2017), or both but separately (Keegan et al., 2016). Although annotations cannot currently be performed flawlessly, cross-linking taxonomic and functional annotations at the read level could resolve many important questions, such as which taxonomic group in a sample is the main contributor to a particular function or metabolic pathway. Moreover, the capacity to analyze changes in microbiomes in the context of metabolic networks, and to find the most interesting pathways (i.e. those that change the most), is a critical requirement for understanding biotechnological processes and would considerably improve analysis. These challenges have been addressed in ASAR (Advanced metagenomic Sequence Analysis in R). The core advantage of ASAR is its ability to perform taxonomic and functional analyses simultaneously by subsetting and aggregating abundance data at various levels of the taxonomic and functional hierarchies. It is designed to let researchers build the most meaningful view of their data conveniently.

2 Materials and methods

The ASAR application was written in the R programming language (R Core Team, 2014) on the Shiny platform (Chang et al.). The application can be used either locally, on machines with R installed, or as a web service. Sequencing and annotation data are combined to form a 3D data cube (Kimball, 1996) with taxonomy, function and metagenome as dimensions. By interactively applying selection and aggregation operations at a user-defined level, the interface supports drill-down analysis of metagenomic data. For analysis, we combine the ‘best hit’ functional and taxonomic classification from SEED (Overbeek et al., 2005) and the ‘best hit’ functional classification from KEGG orthology (Kanehisa et al.), both provided by MG-RAST (Keegan et al., 2016). At the moment, we use the annotation files from MG-RAST, but any other annotation pipeline that assigns annotations at the read level could be incorporated as well. A detailed description of data preparation is available in the Supplementary Material.
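
As a rough illustration of how read-level annotations can populate such a cube, the sketch below (in Python, although ASAR itself is written in R) joins the taxonomic and functional best hit of each read and counts reads per (taxon, function, metagenome) cell. The input dictionaries, the hierarchy paths and the names `build_cube`, `MG1` and `MG2` are hypothetical; this is neither the MG-RAST file format nor ASAR's own data-preparation code.

```python
from collections import Counter


def build_cube(taxonomy, function):
    """Count reads per (taxon path, function path, metagenome) cell.

    `taxonomy` and `function` map a hypothetical (metagenome, read_id) key to
    a best-hit path from the root of the corresponding hierarchy. Only reads
    with both a taxonomic and a functional hit contribute, because each cell
    cross-links the two annotations of the same read.
    """
    cube = Counter()
    for key, tax_path in taxonomy.items():
        fun_path = function.get(key)
        if fun_path is None:
            continue  # read has no functional best hit
        metagenome, _read_id = key
        cube[(tax_path, fun_path, metagenome)] += 1
    return cube


# Hypothetical best hits for two illustrative metagenomes.
taxonomy = {
    ("MG1", "r1"): ("Bacteria", "Firmicutes"),
    ("MG1", "r2"): ("Bacteria", "Firmicutes"),
    ("MG1", "r3"): ("Bacteria", "Proteobacteria"),
    ("MG2", "r1"): ("Archaea", "Euryarchaeota"),
}
function = {
    ("MG1", "r1"): ("Carbohydrates", "Fermentation"),
    ("MG1", "r2"): ("Carbohydrates", "Fermentation"),
    ("MG2", "r1"): ("Energy", "Methanogenesis"),
}
cube = build_cube(taxonomy, function)
print(dict(cube))
```

In this toy input, read `r3` of `MG1` has no functional hit and is therefore left out of the cube; a real pipeline might instead keep such reads under an "unassigned" function.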

3 Results

The ASAR interface consists of a main panel with seven tabs and a control panel. The control panel provides a set of parameter selectors that control the content of the displayed tab. When tabs share the same parameter inputs, the selected values are retained when moving between tabs, so that different projections of the same data subset can be analyzed. Users can select a color scheme for heatmaps from the RColorBrewer package (Neuwirth, 2014). All heatmaps and KEGG diagrams in ASAR can be downloaded as high-resolution, publication-quality images in PDF or PNG format.

3.1 Three-dimensional dataset visualization

The combination of taxonomic and functional annotations across several metagenomic samples forms a three-dimensional data cube, in which each cell holds the reads mapped to a particular function and taxon in a particular metagenome. We have implemented an interactive tool for visual analysis of the data cube contents that applies selection, aggregation and projection operations and represents two-dimensional projections of selected subsets of the cube as heatmaps. The taxonomic and functional dimensions are organized into hierarchies, so the user can specify the level at which to select data and the level at which to aggregate it. All reads annotated with a chosen value at the selected level of a hierarchy are collected and aggregated according to their annotation at the aggregation level. In the metagenomic dimension, each metagenome is annotated with a set of user-defined metadata properties, so aggregation in this dimension is implemented by averaging the data of metagenomes that share the same value of a selected property. Together, these operations allow precise selection of data with concise, interpretable visualization.
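
The selection and aggregation operations can be sketched on a cube stored as a dictionary keyed by (taxon path, function path, metagenome), where each path runs from the root of its hierarchy. This Python sketch assumes that representation and the hypothetical helpers `select`, `aggregate` and `aggregate_metagenomes`; ASAR's R implementation may be organized differently. Level 0 denotes the first level below the root, and the toy data and metadata property `reactor` are invented for illustration.

```python
from collections import defaultdict

TAXONOMY, FUNCTION = 0, 1  # positions of the hierarchical dimensions in a key


def select(cube, dim, level, value):
    """Keep cells whose path in `dim` has `value` at hierarchy depth `level`."""
    return {
        key: count
        for key, count in cube.items()
        if len(key[dim]) > level and key[dim][level] == value
    }


def aggregate(cube, dim, level):
    """Truncate paths in `dim` to depth `level` and sum the reads they share."""
    out = defaultdict(int)
    for key, count in cube.items():
        new_key = list(key)
        new_key[dim] = key[dim][: level + 1]
        out[tuple(new_key)] += count
    return dict(out)


def aggregate_metagenomes(cube, metadata, prop):
    """Average cells over metagenomes sharing the same value of `prop`.

    Dividing by the group size treats a metagenome with no reads in a cell
    as contributing zero to that cell's mean.
    """
    group_size = defaultdict(int)
    for props in metadata.values():
        group_size[props[prop]] += 1
    out = defaultdict(float)
    for (tax, fun, mg), count in cube.items():
        group = metadata[mg][prop]
        out[(tax, fun, group)] += count / group_size[group]
    return dict(out)


# Hypothetical toy cube and metadata; labels are illustrative only.
cube = {
    (("Bacteria", "Firmicutes", "Clostridium"), ("Carbohydrates", "Fermentation"), "MG1"): 4,
    (("Bacteria", "Firmicutes", "Bacillus"), ("Carbohydrates", "Fermentation"), "MG1"): 2,
    (("Bacteria", "Proteobacteria", "Escherichia"), ("Carbohydrates", "Fermentation"), "MG2"): 3,
    (("Archaea", "Euryarchaeota", "Methanosarcina"), ("Energy", "Methanogenesis"), "MG2"): 5,
}
metadata = {"MG1": {"reactor": "A"}, "MG2": {"reactor": "A"}}

bacteria = select(cube, TAXONOMY, 0, "Bacteria")  # select the domain Bacteria
phyla = aggregate(bacteria, TAXONOMY, 1)  # aggregate the selection to phylum level
by_reactor = aggregate_metagenomes(phyla, metadata, "reactor")
print(by_reactor)
```

Selecting at one level and aggregating at a deeper one mirrors the drill-down described above: the user picks a group (here, Bacteria) and then views it broken down by its sub-groups (phyla).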

3.1.1 Function versus Taxonomy heatmap

This heatmap projects the data cube along the metagenomic dimension by combining functional and taxonomic data for a single metagenome. It is useful for discovering relationships between functions and taxonomic groups in a given metagenome.
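
For one metagenome, this projection amounts to slicing the cube at that metagenome and laying out the remaining cells as a function-by-taxon matrix. The Python sketch below assumes a hypothetical dictionary representation (keys are (taxon path, function path, metagenome)) and a hypothetical helper `function_by_taxonomy`; rows and columns are aggregated by truncating paths to the chosen levels, and the toy data are invented.

```python
from collections import defaultdict


def function_by_taxonomy(cube, metagenome, fun_level, tax_level):
    """Build a function-by-taxon matrix for a single metagenome.

    Rows are function paths truncated to `fun_level`, columns are taxon
    paths truncated to `tax_level`; empty combinations are filled with 0.
    """
    cells = defaultdict(int)
    for (tax, fun, mg), count in cube.items():
        if mg == metagenome:
            cells[(fun[: fun_level + 1], tax[: tax_level + 1])] += count
    functions = sorted({f for f, _ in cells})
    taxa = sorted({t for _, t in cells})
    matrix = [[cells.get((f, t), 0) for t in taxa] for f in functions]
    return functions, taxa, matrix


# Hypothetical toy cube; labels are illustrative only.
cube = {
    (("Bacteria", "Firmicutes", "Clostridium"), ("Carbohydrates", "Fermentation"), "MG1"): 4,
    (("Bacteria", "Firmicutes", "Bacillus"), ("Carbohydrates", "Fermentation"), "MG1"): 2,
    (("Bacteria", "Proteobacteria", "Escherichia"), ("Respiration", "Aerobic"), "MG1"): 1,
    (("Archaea", "Euryarchaeota", "Methanosarcina"), ("Energy", "Methanogenesis"), "MG2"): 5,
}
functions, taxa, matrix = function_by_taxonomy(cube, "MG1", fun_level=0, tax_level=1)
print(functions)
print(taxa)
print(matrix)  # [[6, 0], [0, 1]]
```

The cell for `MG2` is ignored because the projection keeps a single metagenome; the two Firmicutes genera merge into one column at phylum level.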

3.1.2 Function versus Metagenome heatmap

This heatmap projects the data cube along the taxonomic dimension by aggregating abundance data for a specified set of taxa. It is designed to compare the abundances of functional groups within the selected taxonomic groups between metagenomes.
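
Here the taxonomic dimension is collapsed: reads from every taxon in the selected set are summed, leaving a function-by-metagenome matrix. In this hypothetical Python sketch, a taxon selection is a list of path prefixes, so `("Bacteria", "Firmicutes")` selects every genus under that phylum; the dictionary representation, the helper names `in_selection` and `function_by_metagenome`, and the toy hierarchy labels are assumptions for illustration.

```python
from collections import defaultdict


def in_selection(path, prefixes):
    """True if `path` starts with any of the given path prefixes."""
    return any(path[: len(p)] == p for p in prefixes)


def function_by_metagenome(cube, taxa, fun_level):
    """Sum reads of the selected taxa per (function at `fun_level`, metagenome)."""
    cells = defaultdict(int)
    for (tax, fun, mg), count in cube.items():
        if in_selection(tax, taxa):
            cells[(fun[: fun_level + 1], mg)] += count
    functions = sorted({f for f, _ in cells})
    metagenomes = sorted({m for _, m in cells})
    matrix = [[cells.get((f, m), 0) for m in metagenomes] for f in functions]
    return functions, metagenomes, matrix


# Hypothetical toy cube; hierarchy labels are illustrative only.
cube = {
    (("Bacteria", "Firmicutes", "Clostridium"),
     ("Carbohydrates", "Fermentation", "Butanol Biosynthesis"), "MG1"): 4,
    (("Bacteria", "Firmicutes", "Bacillus"),
     ("Carbohydrates", "Fermentation", "Lactate Fermentation"), "MG2"): 2,
    (("Bacteria", "Proteobacteria", "Escherichia"),
     ("Carbohydrates", "Fermentation", "Lactate Fermentation"), "MG2"): 3,
    (("Archaea", "Euryarchaeota", "Methanosarcina"),
     ("Energy", "Methanogenesis", "Methanogenesis from CO2"), "MG2"): 5,
}
functions, metagenomes, matrix = function_by_metagenome(
    cube, taxa=[("Bacteria", "Firmicutes")], fun_level=1
)
print(functions, metagenomes, matrix)
```

Only the two Firmicutes cells survive the selection, so the single row compares fermentation reads from that phylum between `MG1` and `MG2`.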

3.1.3 Taxonomy versus Metagenome heatmap

This heatmap projects the data cube along the functional dimension by aggregating the abundances of selected functional groups. It reduces to standard taxonomic analysis when the root of the functional hierarchy is selected. It is designed for exploring taxonomic groups that differ in abundance across the selected metagenomes.
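
The reduction to standard taxonomic analysis can be seen directly if the functional selection is expressed as path prefixes: the empty prefix, standing for the root, matches every function path, so every cell of the cube is counted. The Python sketch below assumes that hypothetical representation, the helper name `taxonomy_by_metagenome` and invented toy data.

```python
from collections import defaultdict

ROOT = ()  # hypothetical convention: the empty prefix matches every function path


def taxonomy_by_metagenome(cube, functions, tax_level):
    """Sum reads annotated with any selected function prefix, per taxon
    (truncated to `tax_level`) and metagenome."""
    cells = defaultdict(int)
    for (tax, fun, mg), count in cube.items():
        if any(fun[: len(p)] == p for p in functions):
            cells[(tax[: tax_level + 1], mg)] += count
    taxa = sorted({t for t, _ in cells})
    metagenomes = sorted({m for _, m in cells})
    matrix = [[cells.get((t, m), 0) for m in metagenomes] for t in taxa]
    return taxa, metagenomes, matrix


# Hypothetical toy cube; hierarchy labels are illustrative only.
cube = {
    (("Bacteria", "Firmicutes", "Clostridium"),
     ("Carbohydrates", "Fermentation", "Butanol Biosynthesis"), "MG1"): 4,
    (("Bacteria", "Firmicutes", "Bacillus"),
     ("Carbohydrates", "Fermentation", "Lactate Fermentation"), "MG2"): 2,
    (("Bacteria", "Proteobacteria", "Escherichia"),
     ("Carbohydrates", "Fermentation", "Lactate Fermentation"), "MG2"): 3,
    (("Archaea", "Euryarchaeota", "Methanosarcina"),
     ("Energy", "Methanogenesis", "Methanogenesis from CO2"), "MG2"): 5,
}
all_taxa = taxonomy_by_metagenome(cube, [ROOT], tax_level=0)  # plain taxonomic profile
energy = taxonomy_by_metagenome(cube, [("Energy",)], tax_level=0)
print(all_taxa)
print(energy)
```

With `ROOT` selected, the matrix is simply the read count of each domain per metagenome; narrowing the selection to `("Energy",)` keeps only the taxa contributing to that functional group.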

3.2 KEGG pathway abundance analysis

The KEGG Pathway Abundance heatmap shows the pathways whose genes differ most in abundance for the selected taxonomic groups and samples. The pathway diagram can be visualized in the KEGG Diagram tab, with genes color-coded by the pathview package (Luo and Brouwer, 2013). The color of each enzyme on the KEGG diagram represents the percentage of that enzyme's abundance contributed by the selected taxa in each metagenome. This allows estimation of the role of the selected taxa in providing this function to the whole microbial community.
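
The enzyme coloring can be read as a per-metagenome share: for each KEGG orthology (KO) group, the reads assigned to the selected taxa divided by all reads assigned to that KO. The Python sketch below computes that percentage under stated assumptions: cells are keyed by (taxon, KO, metagenome) with taxa given as flat labels, and the helper name `taxon_share`, the KO identifiers and the counts are illustrative. It does not reproduce pathview, which is what maps such values onto diagram colors in ASAR.

```python
from collections import defaultdict


def taxon_share(ko_cube, selected_taxa):
    """Percentage of each (KO, metagenome) abundance contributed by `selected_taxa`."""
    total = defaultdict(int)
    selected = defaultdict(int)
    for (taxon, ko, mg), count in ko_cube.items():
        total[(ko, mg)] += count
        if taxon in selected_taxa:
            selected[(ko, mg)] += count
    # Skip KO/metagenome pairs with no reads to avoid dividing by zero.
    return {key: 100.0 * selected[key] / n for key, n in total.items() if n > 0}


# Hypothetical toy counts; KO identifiers are used only as example labels.
ko_cube = {
    ("Clostridium", "K00001", "MG1"): 3,
    ("Escherichia", "K00001", "MG1"): 1,
    ("Escherichia", "K00016", "MG1"): 2,
    ("Escherichia", "K00001", "MG2"): 4,
}
share = taxon_share(ko_cube, {"Clostridium"})
print(share)
```

A value of 100% would mean the selected taxa account for all reads assigned to that KO in the metagenome, and 0% that they contribute none, which is the kind of contrast the diagram coloring is meant to reveal.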

References

1. Microbiology: metagenomics.

Authors: Philip Hugenholtz; Gene W Tyson
Journal: Nature Date: 2008-09-25

2. MG-RAST, a Metagenomics Service for Analysis of Microbial Community Structure and Function.

Authors: Kevin P Keegan; Elizabeth M Glass; Folker Meyer
Journal: Methods Mol Biol Date: 2016

3. Metagenomics for studying unculturable microorganisms: cutting the Gordian knot.

Authors: Patrick D Schloss; Jo Handelsman
Journal: Genome Biol Date: 2005-08-01

4. The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes.

Authors: Ross Overbeek; Tadhg Begley; Ralph M Butler; Jomuna V Choudhuri; Han-Yu Chuang; Matthew Cohoon; Valérie de Crécy-Lagard; Naryttza Diaz; Terry Disz; Robert Edwards; Michael Fonstein; Ed D Frank; Svetlana Gerdes; Elizabeth M Glass; Alexander Goesmann; Andrew Hanson; Dirk Iwata-Reuyl; Roy Jensen; Neema Jamshidi; Lutz Krause; Michael Kubal; Niels Larsen; Burkhard Linke; Alice C McHardy; Folker Meyer; Heiko Neuweger; Gary Olsen; Robert Olson; Andrei Osterman; Vasiliy Portnoy; Gordon D Pusch; Dmitry A Rodionov; Christian Rückert; Jason Steiner; Rick Stevens; Ines Thiele; Olga Vassieva; Yuzhen Ye; Olga Zagnitko; Veronika Vonstein
Journal: Nucleic Acids Res Date: 2005-10-07

5. Pathview: an R/Bioconductor package for pathway-based data integration and visualization.

Authors: Weijun Luo; Cory Brouwer
Journal: Bioinformatics Date: 2013-06-04

6. KEGG as a reference resource for gene and protein annotation.

Authors: Minoru Kanehisa; Yoko Sato; Masayuki Kawashima; Miho Furumichi; Mao Tanabe
Journal: Nucleic Acids Res Date: 2015-10-17

7. An evaluation of the accuracy and speed of metagenome analysis tools.

Authors: Stinus Lindgreen; Karen L Adair; Paul P Gardner
Journal: Sci Rep Date: 2016-01-18

8. Fast and sensitive taxonomic classification for metagenomics with Kaiju.

Authors: Peter Menzel; Kim Lee Ng; Anders Krogh
Journal: Nat Commun Date: 2016-04-13

9. PALADIN: protein alignment for functional profiling whole metagenome shotgun data.

Authors: Anthony Westbrook; Jordan Ramsdell; Taruna Schuelke; Louisa Normington; R Daniel Bergeron; W Kelley Thomas; Matthew D MacManes
Journal: Bioinformatics Date: 2017-05-15