Askarbek N Orakov1,2, Nazgul K Sakenova1,2, Anatoly Sorokin3,4, Igor I Goryanin1,5,6. 1. Biological Systems Unit, Okinawa Institute of Science and Technology, Onna-son 904-0412, Japan. 2. Department of Biology, School of Science and Technology, Nazarbayev University, Astana 010000, Kazakhstan. 3. Mechanism of Cell Genome Functioning Laboratory, Institute of Cell Biophysics RAS, Pushchino 142290, Russia. 4. Laboratory of Ion and Molecular Physics, Moscow Institute of Physics and Technology, Dolgoprudny, Moscow 141701, Russia. 5. School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, UK. 6. Biodesign Centre, Tianjin Institute of Industrial Biotechnology, Tianjin 300308, China.
Abstract
Motivation: Functional and taxonomic analyses are critical steps in understanding interspecific interactions within microbial communities. Currently, such analyses are run separately, which complicates interpretation of the results. Here we present ASAR, an interactive tool for the simultaneous analysis of metagenomic data in three dimensions: taxonomy, function and metagenome. Results: An interactive data analysis tool for the selection, aggregation and visualization of metagenomic data is presented. Functional analysis is available both through the SEED hierarchy and through KEGG pathway diagrams based on KEGG orthology, using MG-RAST annotation results. Availability and implementation: The ASAR source code is available on GitHub (https://github.com/Askarbek-orakov/ASAR). Contact: askarbek.orakov@nu.edu.kz or goryanin@gmail.com.
1 Introduction
Metagenomics allows microbial communities to be investigated in a culture-independent way. This matters because an estimated 99% of prokaryotes have not been successfully cultured (Schloss and Handelsman, 2005). In addition, the decreasing cost of sequencing and the increasing throughput of metagenomic data generation make the development of tools for functional, taxonomic and metabolic analyses of metagenomes extremely important (Hugenholtz and Tyson, 2008; Lindgreen et al.). However, even the most useful existing metagenomic analysis tools provide only taxonomic analysis (Menzel et al.), only functional analysis (Westbrook et al., 2017), or both, but separately (Keegan et al.). Annotations cannot yet be performed flawlessly. Even so, cross-linking taxonomic and functional annotations at the read level could resolve many important questions, such as which taxonomic group in a sample is the main contributor to a particular function or metabolic pathway. Moreover, two capabilities are critical requirements for understanding biotechnological processes and would considerably improve analysis: analyzing changes in microbiomes in the context of metabolic networks, and finding the most interesting pathways (i.e. those that change most). These challenges are addressed in ASAR (Advanced metagenomic Sequence Analysis in R). The core advantage of ASAR is its ability to perform taxonomic and functional analyses simultaneously, by subsetting and aggregating abundance data at various levels of the taxonomic and functional hierarchies. It is designed to let researchers build the most meaningful view of their data conveniently.
2 Materials and methods
The ASAR application was written in the R programming language (R Core Team, 2014) on the Shiny platform (Chang et al.). It can be used either locally, on machines with R installed, or as a web service. Sequencing and annotation data are combined to form a 3D data cube (Kimball, 1996) with taxonomy, function and metagenome as its dimensions. Users apply selection and aggregation operations interactively at a level they define, which gives an interface for drill-down analysis of metagenomic data. For the analysis, we combine two classifications provided by MG-RAST (Keegan et al.): the ‘best hit’ functional and taxonomic classification from SEED (Overbeek et al., 2005), and the ‘best hit’ functional classification from KEGG orthology (Kanehisa et al.). At present, we use annotation files from MG-RAST, but any other annotation pipeline that assigns annotations at the read level could be incorporated as well. A detailed description of data preparation is available in the Supplementary Material.
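The following minimal Python sketch makes the data cube concrete by building one from read-level annotations. It is not ASAR's code, which is written in R. It assumes, hypothetically, that the best-hit taxonomic and functional annotations of each metagenome are available as two dictionaries keyed by read ID. The helper name `add_metagenome` and the example annotations are illustrative.

```python
from collections import Counter
from typing import Dict


def add_metagenome(cube: Counter, metagenome: str,
                   taxon_by_read: Dict[str, str],
                   function_by_read: Dict[str, str]) -> None:
    """Add one metagenome's reads to a sparse data cube (hypothetical helper).

    Keys of `cube` are (taxon, function, metagenome) tuples and values are
    read counts. Taxonomic and functional best hits are joined on read ID.
    Reads lacking either annotation are skipped (an assumption of this sketch).
    """
    for read_id, taxon in taxon_by_read.items():
        function = function_by_read.get(read_id)
        if function is not None:
            cube[(taxon, function, metagenome)] += 1


# Usage with made-up read IDs and annotations; read r4 has no taxon and is skipped.
cube = Counter()
add_metagenome(cube, "MG1",
               taxon_by_read={"r1": "E. coli", "r2": "E. coli", "r3": "B. subtilis"},
               function_by_read={"r1": "dnaA", "r2": "dnaA", "r3": "recA", "r4": "recA"})
print(cube[("E. coli", "dnaA", "MG1")])  # 2
```

The sketch uses a sparse mapping that stores only non-empty cells. This avoids allocating a dense array over every taxon, function and metagenome combination.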
3 Results
The ASAR interface consists of a main panel with seven tabs and a control panel. The control panel provides a set of parameter selectors that control the content displayed in each tab. When tabs share the same set of parameter inputs, the selected values are kept when moving between tabs. This allows different projections of the same data subset to be analyzed. Users can select a color scheme for heatmaps from the RColorBrewer package (Neuwirth, 2014). All heatmaps and KEGG diagrams in ASAR can be downloaded as high-resolution, publication-quality images in PDF or PNG format.
3.1 Three-dimensional dataset visualization
Combining taxonomic and functional annotations across several metagenomic samples yields a three-dimensional data cube. Each cell of the cube holds the reads mapped to a particular function and taxon in a particular metagenome. We have implemented an interactive tool for visual analysis of the cube's contents. It applies selection, aggregation and projection operations, and displays two-dimensional projections of selected subsets of the cube as heatmaps. The taxonomic and functional dimensions are each organized into a hierarchy, so the user can specify the level at which data are selected and the level at which they are aggregated. All reads annotated with a chosen value at the selection level of a hierarchy are collected and aggregated according to their annotation at the aggregation level. In the metagenomic dimension, each metagenome is annotated with a set of user-defined metadata properties. Aggregation along this dimension is therefore implemented by averaging the data of metagenomes that share the same value of a selected property. Together, these operations allow precise selection of data with concise and interpretable visualization.
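The selection, aggregation and metagenome-averaging operations described above can be sketched in Python as follows. This is not ASAR's R implementation, and it rests on several assumptions:

- The cube is a dict mapping (taxon, function, metagenome) to read counts.
- Every hierarchy leaf has a full-depth lineage tuple, listed root first.
- When averaging, a metagenome with no reads in a cell counts as zero.

The names `select_and_aggregate` and `average_by_property` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Tuple

Key = Tuple[str, str, str]  # (taxon, function, metagenome)


def select_and_aggregate(cube: Dict[Key, float], dim: int,
                         lineage: Dict[str, Tuple[str, ...]],
                         select_level: int, select_value: str,
                         agg_level: int) -> Dict[Key, float]:
    """Drill down along one hierarchy (dim 0 = taxonomy, dim 1 = function).

    Keeps cells whose annotation has `select_value` at `select_level`,
    replaces that coordinate with its ancestor at `agg_level`, and sums counts.
    """
    out: Dict[Key, float] = defaultdict(float)
    for key, count in cube.items():
        path = lineage[key[dim]]  # root-to-leaf labels of this annotation
        if path[select_level] != select_value:
            continue
        new_key = key[:dim] + (path[agg_level],) + key[dim + 1:]
        out[new_key] += count
    return dict(out)


def average_by_property(cube: Dict[Key, float],
                        metadata: Dict[str, Dict[str, str]],
                        prop: str) -> Dict[Key, float]:
    """Average cells over metagenomes sharing the same value of `prop`.

    The divisor is the number of metagenomes in each group, so a metagenome
    without reads in a cell contributes zero (an assumption of this sketch).
    """
    group_size = Counter(meta[prop] for meta in metadata.values())
    sums: Dict[Key, float] = defaultdict(float)
    for (taxon, function, mg), count in cube.items():
        sums[(taxon, function, metadata[mg][prop])] += count
    return {key: total / group_size[key[2]] for key, total in sums.items()}


# Usage with a toy taxonomy (kingdom, phylum, species) and made-up counts.
lineage = {
    "E. coli": ("Bacteria", "Proteobacteria", "E. coli"),
    "S. enterica": ("Bacteria", "Proteobacteria", "S. enterica"),
    "B. subtilis": ("Bacteria", "Firmicutes", "B. subtilis"),
}
cube = {("E. coli", "dnaA", "MG1"): 2, ("S. enterica", "dnaA", "MG1"): 3,
        ("B. subtilis", "dnaA", "MG1"): 5, ("E. coli", "dnaA", "MG2"): 4}
# Select all Bacteria (level 0) and aggregate them to phylum (level 1).
phyla = select_and_aggregate(cube, dim=0, lineage=lineage,
                             select_level=0, select_value="Bacteria", agg_level=1)
metadata = {"MG1": {"site": "A"}, "MG2": {"site": "A"}}
print(average_by_property(phyla, metadata, "site"))
# {('Proteobacteria', 'dnaA', 'A'): 4.5, ('Firmicutes', 'dnaA', 'A'): 2.5}
```

Selecting the root value at level 0 and aggregating at a deeper level corresponds to an overview. Selecting a specific phylum and aggregating at species level corresponds to drilling down into that phylum.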
3.1.1 Function versus Taxonomy heatmap
This heatmap projects the data cube along the metagenomic dimension, crossing functional and taxonomic data for a single metagenome. It is useful for discovering relationships between functions and taxonomic groups in a given metagenome.
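As an illustration only (not ASAR's code), the Function versus Taxonomy matrix for one metagenome could be built as below. The sketch assumes the cube is a plain dict keyed by (taxon, function, metagenome). The hypothetical `function_by_taxon` returns row labels, column labels and a nested-list matrix that any plotting library could draw as a heatmap. Missing cells are filled with zero.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]  # (taxon, function, metagenome)


def function_by_taxon(cube: Dict[Key, float], metagenome: str
                      ) -> Tuple[List[str], List[str], List[List[float]]]:
    """Function (rows) x taxon (columns) matrix restricted to one metagenome."""
    cells = {(function, taxon): count
             for (taxon, function, mg), count in cube.items() if mg == metagenome}
    functions = sorted({f for f, _ in cells})
    taxa = sorted({t for _, t in cells})
    # Combinations with no reads in this metagenome become 0.
    matrix = [[cells.get((f, t), 0) for t in taxa] for f in functions]
    return functions, taxa, matrix


# Usage with made-up counts; the MG2 cell is ignored when MG1 is requested.
cube = {("E. coli", "dnaA", "MG1"): 2, ("B. subtilis", "recA", "MG1"): 5,
        ("E. coli", "recA", "MG2"): 4}
rows, cols, m = function_by_taxon(cube, "MG1")
print(rows, cols, m)  # ['dnaA', 'recA'] ['B. subtilis', 'E. coli'] [[0, 2], [5, 0]]
```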
3.1.2 Function versus Metagenome heatmap
This heatmap projects the data cube along the taxonomic dimension by aggregating abundance data over a specified set of taxa. It is designed to compare the abundances of functional groups in the selected taxonomic groups between metagenomes.
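A minimal Python sketch of this projection follows; it is not ASAR's implementation. It assumes the same dict cube keyed by (taxon, function, metagenome) and sums counts over a user-selected set of taxa. The function name `function_by_metagenome` is hypothetical. In this simplified version, a metagenome with no reads from the selected taxa does not appear as a column.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str, str]  # (taxon, function, metagenome)


def function_by_metagenome(cube: Dict[Key, float], taxa: Iterable[str]
                           ) -> Tuple[List[str], List[str], List[List[float]]]:
    """Function (rows) x metagenome (columns) matrix summed over selected taxa."""
    selected = set(taxa)
    cells: Dict[Tuple[str, str], float] = defaultdict(float)
    for (taxon, function, mg), count in cube.items():
        if taxon in selected:
            cells[(function, mg)] += count
    functions = sorted({f for f, _ in cells})
    metagenomes = sorted({m for _, m in cells})
    # .get() does not insert into the defaultdict; absent cells read as 0.0.
    matrix = [[cells.get((f, m), 0.0) for m in metagenomes] for f in functions]
    return functions, metagenomes, matrix


# Usage: compare functions contributed by two selected taxa across metagenomes.
cube = {("E. coli", "dnaA", "MG1"): 2, ("S. enterica", "dnaA", "MG1"): 3,
        ("E. coli", "recA", "MG2"): 4, ("B. subtilis", "recA", "MG2"): 5}
rows, cols, m = function_by_metagenome(cube, ["E. coli", "S. enterica"])
print(rows, cols, m)  # ['dnaA', 'recA'] ['MG1', 'MG2'] [[5.0, 0.0], [0.0, 4.0]]
```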
3.1.3 Taxonomy versus Metagenome heatmap
This heatmap projects the data cube along the functional dimension by aggregating the abundances of selected functional groups. If the root of the functional hierarchy is selected, it reduces to a standard taxonomic analysis. This heatmap is designed for exploring taxonomic groups whose abundances differ across the selected metagenomes.
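The reduction to a standard taxonomic analysis can be seen in the Python sketch below, which is not ASAR's code. It assumes a dict cube keyed by (taxon, function, metagenome). In the hypothetical `taxon_by_metagenome`, passing `functions=None` stands for selecting the root of the functional hierarchy. Within this sketch, the resulting profile covers only the reads held in the cube.

```python
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple

Key = Tuple[str, str, str]  # (taxon, function, metagenome)


def taxon_by_metagenome(cube: Dict[Key, float],
                        functions: Optional[Set[str]] = None
                        ) -> Dict[Tuple[str, str], float]:
    """Sum read counts per (taxon, metagenome) over the selected functions.

    `functions=None` means "root of the functional hierarchy", i.e. every
    function, which gives a plain taxonomic profile of the cube's reads.
    """
    out: Dict[Tuple[str, str], float] = defaultdict(float)
    for (taxon, function, mg), count in cube.items():
        if functions is None or function in functions:
            out[(taxon, mg)] += count
    return dict(out)


# Usage with made-up counts.
cube = {("E. coli", "dnaA", "MG1"): 2, ("E. coli", "recA", "MG1"): 3,
        ("B. subtilis", "recA", "MG2"): 5}
print(taxon_by_metagenome(cube, {"recA"}))  # {('E. coli', 'MG1'): 3.0, ('B. subtilis', 'MG2'): 5.0}
print(taxon_by_metagenome(cube))            # {('E. coli', 'MG1'): 5.0, ('B. subtilis', 'MG2'): 5.0}
```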
3.2 KEGG pathway abundance analysis
The KEGG Pathway Abundance heatmap shows the pathways whose genes differ most in abundance for the selected taxonomic groups and samples. Each pathway diagram can be viewed in the KEGG Diagram tab, with genes color-coded using the pathview package (Luo and Brouwer, 2013). The color of each enzyme on the KEGG diagram represents the percentage of that enzyme's abundance contributed by the selected taxa in each metagenome. This allows estimation of the role of the selected taxa in providing this function to the whole microbial community.
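The percentage used to color each enzyme can be illustrated with the Python sketch below. It is neither ASAR's R code nor the pathview API. It assumes a dict cube whose function coordinate is a KEGG orthology (KO) identifier. The function `selected_taxa_share` is hypothetical, and the KO identifiers in the usage serve only as labels. For every KO and metagenome, the sketch divides the reads assigned to the selected taxa by all reads assigned to that KO.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str, str]  # (taxon, KO identifier, metagenome)


def selected_taxa_share(cube: Dict[Key, float], taxa: Iterable[str],
                        kos: Iterable[str]) -> Dict[Tuple[str, str], float]:
    """Percentage of each KO's reads, per metagenome, contributed by the selected taxa."""
    selected, wanted = set(taxa), set(kos)
    total: Dict[Tuple[str, str], float] = defaultdict(float)
    from_selected: Dict[Tuple[str, str], float] = defaultdict(float)
    for (taxon, ko, mg), count in cube.items():
        if ko not in wanted:
            continue
        total[(ko, mg)] += count          # all reads for this KO in this metagenome
        if taxon in selected:
            from_selected[(ko, mg)] += count  # reads from the selected taxa only
    return {key: 100.0 * from_selected.get(key, 0.0) / n
            for key, n in total.items() if n > 0}


# Usage with made-up counts: E. coli supplies 3 of 4 reads for the first KO in MG1.
cube = {("E. coli", "K02313", "MG1"): 3, ("B. subtilis", "K02313", "MG1"): 1,
        ("E. coli", "K03553", "MG2"): 2}
print(selected_taxa_share(cube, ["E. coli"], ["K02313", "K03553"]))
# {('K02313', 'MG1'): 75.0, ('K03553', 'MG2'): 100.0}
```

The result has one value per KO and metagenome. Each value could then be mapped to a color on the corresponding enzyme node of the pathway diagram.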
References
Ross Overbeek, Tadhg Begley, Ralph M Butler, Jomuna V Choudhuri, Han-Yu Chuang, Matthew Cohoon, Valérie de Crécy-Lagard, Naryttza Diaz, Terry Disz, Robert Edwards, Michael Fonstein, Ed D Frank, Svetlana Gerdes, Elizabeth M Glass, Alexander Goesmann, Andrew Hanson, Dirk Iwata-Reuyl, Roy Jensen, Neema Jamshidi, Lutz Krause, Michael Kubal, Niels Larsen, Burkhard Linke, Alice C McHardy, Folker Meyer, Heiko Neuweger, Gary Olsen, Robert Olson, Andrei Osterman, Vasiliy Portnoy, Gordon D Pusch, Dmitry A Rodionov, Christian Rückert, Jason Steiner, Rick Stevens, Ines Thiele, Olga Vassieva, Yuzhen Ye, Olga Zagnitko, Veronika Vonstein. Nucleic Acids Res, 2005.
Anthony Westbrook, Jordan Ramsdell, Taruna Schuelke, Louisa Normington, R Daniel Bergeron, W Kelley Thomas, Matthew D MacManes. Bioinformatics, 2017.