AXIOME: automated exploration of microbial diversity.

Michael D J Lynch, Andre P Masella, Michael W Hall, Andrea K Bartram, Josh D Neufeld.

Abstract

BACKGROUND: Although high-throughput sequencing of small subunit rRNA genes has revolutionized our understanding of microbial ecosystems, these technologies generate data at depths that benefit from automated analysis. Here we present AXIOME (Automation, eXtension, and Integration Of Microbial Ecology), a highly flexible and extensible management tool for popular microbial ecology analysis packages that promotes reproducibility and customization in microbial research.
FINDINGS: AXIOME streamlines and manages analysis of small subunit (SSU) rRNA marker data in QIIME and mothur. AXIOME also implements features including the PAired-eND Assembler for Illumina sequences (PANDAseq), non-negative matrix factorization (NMF), multi-response permutation procedures (MRPP), exploring and recovering phylogenetic novelty (SSUnique) and indicator species analysis. AXIOME has a companion graphical user interface (GUI) and is designed to be easily extended to facilitate customized research workflows.
CONCLUSIONS: AXIOME is an actively developed, open source project written in Vala and available from GitHub (http://neufeld.github.com/axiome) and as a Debian package. Axiometic, a GUI companion tool, is also freely available (http://neufeld.github.com/axiometic). Given that data analysis has become an important bottleneck for microbial ecology studies, the development of user-friendly computational tools remains a high priority. AXIOME represents an important step in this direction by automating multi-step bioinformatic analyses and enabling the customization of procedures to suit the diverse research needs of the microbial ecology community.

Year:  2013        PMID: 23587322      PMCID: PMC3626533          DOI: 10.1186/2047-217X-2-3

Source DB:  PubMed          Journal:  Gigascience        ISSN: 2047-217X            Impact factor:   6.524


Findings

Rationale

Next-generation sequencing technologies have improved our ability to study complex microbial communities, but have also posed significant computational challenges associated with analyzing such large sequence datasets. The research community has developed multiple analysis platforms [1-4] to manage analysis of taxonomic high-throughput sequencing data, particularly for the study of microbial small subunit (SSU) rRNA sequence data. As implemented, these pipelines require extensive user intervention and are not particularly well suited to extension. To address these issues, we developed the Automation, eXtension, and Integration Of Microbial Ecology (AXIOME) package and the associated graphical user interface, Axiometic. AXIOME simplifies analyses common to native installs of multiple analysis platforms by using XML scripting and configuration. Generating AXIOME’s XML input using Axiometic, the companion GUI, simplifies scripting and further increases usability. AXIOME also extends functionality by offering several additional analytical plug-ins and easily enables the implementation of user-specific functionality for customized workflows.

Functionality

Automation, checkpointing and reproducible research

Analysis environments such as QIIME and mothur provide effective open source tools for the analysis of high-throughput marker sequencing data (e.g., SSU rRNA). These environments are not easily automated beyond shell scripting, which is available for essentially any software installed on a Unix/Linux-based operating system. While effective, shell scripting can present a significant barrier to researchers using the software. AXIOME avoids this difficulty by interpreting commands from an XML-based input file containing simplified instructions and analysis blocks. This XML-based configuration file can be created manually, from workflow templates, or using the simple and interactive GUI tool, Axiometic.

Analysis pipelines for high-throughput marker data can consume many CPU hours due to dataset size and analysis complexity. Workflows can be interrupted for many reasons, including power failure and software error. Furthermore, analytical parameters may be modified, or analyses added, based on preliminary results. Using the make software package, AXIOME can automatically restart a workflow from the last valid position without unnecessarily recomputing previously valid results. This behavior can significantly reduce computational load and errors caused by excessive user intervention.

Efforts such as metadata standardization [5] and bioinformatic tools stressing reproducibility [6,7] represent attempts to make bioinformatic workflows documented and reproducible, thus facilitating collaboration. The use of XML-based instructions defining an entire AXIOME workflow has the added benefit of contributing to these reproducible research initiatives. Such a workflow file is useful both to individual researchers and to collaborating investigators, and can be packaged with research publications.
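The make-style checkpointing described above can be illustrated with a minimal Python sketch. It is not AXIOME's implementation (AXIOME delegates this to make itself). The file names, the `copy_step` stand-in and the `run_workflow` helper are all hypothetical. The sketch only shows the underlying rule: a step is redone when its output is missing or older than one of its inputs, and is skipped otherwise.

```python
import os
import tempfile


def is_up_to_date(target, dependencies):
    """Return True when target exists and no dependency is newer than it."""
    if not os.path.exists(target):
        return False
    target_mtime = os.path.getmtime(target)
    return all(os.path.getmtime(dep) <= target_mtime for dep in dependencies)


def run_workflow(steps, log):
    """Run (target, dependencies, action) steps in order, skipping valid results."""
    for target, dependencies, action in steps:
        if is_up_to_date(target, dependencies):
            log.append(("skipped", os.path.basename(target)))
            continue
        action(target, dependencies)
        log.append(("ran", os.path.basename(target)))


def copy_step(target, dependencies):
    """Hypothetical stand-in for a real analysis: concatenate inputs into target."""
    with open(target, "w") as out:
        for dep in dependencies:
            with open(dep) as src:
                out.write(src.read())


def demo():
    """Run a two-step toy workflow three times and return the three logs."""
    with tempfile.TemporaryDirectory() as workdir:
        raw = os.path.join(workdir, "reads.txt")
        assembled = os.path.join(workdir, "assembled.txt")
        table = os.path.join(workdir, "otu_table.txt")
        with open(raw, "w") as handle:
            handle.write("ACGT\n")
        steps = [(assembled, [raw], copy_step), (table, [assembled], copy_step)]

        first, second, third = [], [], []
        run_workflow(steps, first)   # nothing exists yet: both steps run
        run_workflow(steps, second)  # all outputs valid: both steps are skipped
        os.remove(table)             # simulate an interruption that lost the last output
        run_workflow(steps, third)   # only the missing step is redone
        return first, second, third
```

Calling `demo()` returns one log per run, so the skipped and re-run steps can be inspected directly. In a real workflow the same effect comes from re-running make against the generated Makefile.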

Extension

AXIOME manages QIIME [2] analyses and supports sequence processing and α-diversity in mothur [1]. In addition to offering common α- and β-diversity measures, AXIOME v.1.6 provides several functions of its own. AXIOME enables the assembly and de-multiplexing of Illumina paired-end reads through the PAired-eND Assembler for Illumina sequences (PANDAseq [8]). Post-assembly analysis techniques unique to AXIOME include (i) non-negative matrix factorization [9], a technique that identifies overlapping patterns between samples and generates a concordance model that can then be used for a non-negative factorization of the sample matrix (used to visualize the importance of specific taxa within a sample or a cluster of samples), (ii) multi-response permutation procedures (MRPP [10]), which test for significant differences between sampling groups and the degree of within-group sample clustering, (iii) recovering and exploring phylogenetic novelty (SSUnique [11]) and (iv) indicator species analysis [12], a method for identifying operational taxonomic units (OTUs) that are significantly associated with user-defined sample treatment groups. Future releases of AXIOME will continue to incorporate analysis techniques reflecting advances in the analysis of marker data in ecology.

One advantage of the AXIOME package is the ability to quickly extend functionality to include new protocols, easily customizing research workflows. This also provides the opportunity to test alternative approaches to sequence analysis before implementing them in standard distributions of software environments, such as QIIME or mothur. In AXIOME, individual analyses are managed through the corresponding XML tag within the configuration file, and custom XML tags that invoke novel analyses can be built into the source to extend AXIOME. Full instructions and templates for extension are provided with the source and documentation.
By facilitating extension of analysis pipelines, the implementation of previously existing ecological methods and the development of novel techniques can progress at a faster rate than through standard release schedules. This will foster experimentation and increase community involvement in these efforts.
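To make one of the extensions listed above more concrete, the following is a minimal sketch of a multi-response permutation procedure. It is not the code AXIOME runs. It assumes Euclidean distances between samples, group weights of n_g / N, and a chance-corrected agreement statistic A estimated from the permuted values. The function names and the toy data are illustrative only.

```python
import random
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length abundance vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def mrpp_delta(samples, labels):
    """Weighted mean within-group distance, using weights n_g / N (an assumption)."""
    total = len(samples)
    delta = 0.0
    for group in sorted(set(labels)):
        members = [s for s, lab in zip(samples, labels) if lab == group]
        pairs = list(combinations(members, 2))
        if not pairs:  # a single-sample group has no within-group distance
            continue
        mean_dist = sum(euclidean(a, b) for a, b in pairs) / len(pairs)
        delta += (len(members) / total) * mean_dist
    return delta


def mrpp(samples, labels, permutations=999, seed=0):
    """Return (observed delta, permutation p-value, agreement A)."""
    rng = random.Random(seed)
    observed = mrpp_delta(samples, labels)
    shuffled = list(labels)
    permuted = []
    for _ in range(permutations):
        rng.shuffle(shuffled)  # reassign group labels at random
        permuted.append(mrpp_delta(samples, shuffled))
    # Small delta means tight groups, so count permutations at least as tight.
    as_small = sum(1 for d in permuted if d <= observed)
    p_value = (as_small + 1) / (permutations + 1)
    expected = sum(permuted) / len(permuted)
    agreement = 1 - observed / expected
    return observed, p_value, agreement


# Toy data: two clearly separated groups of five samples each.
example_samples = [
    (0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5),
    (10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0), (10.5, 10.5),
]
example_labels = ["a"] * 5 + ["b"] * 5
```

Calling `mrpp(example_samples, example_labels)` on this toy data should give a small p-value and an agreement value close to 1, because samples within each group are far more similar to one another than to samples in the other group.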

AXIOME workflow

All user-defined workflow analyses and parameters are outlined in a single XML configuration file, which is processed by AXIOME, generating a Makefile that controls and manages analyses. By leveraging these tools, an entire run of QIIME and all requested extensions requires only three steps (Figure 1). Additionally, any interruption in processing can be circumvented by re-running the make command, which will act as a checkpoint by restarting from the last valid position. Furthermore, a companion GUI, Axiometic (Figure 2), allows users to easily construct this XML configuration file. Even though AXIOME is designed to work within a Linux environment, the XML configuration file can be generated on any system and transferred to the analysis environment. Axiometic is based on a platform-independent toolkit, further facilitating XML script generation.
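The translation from an XML configuration file to a Makefile can be sketched as follows. The `<workflow>` and `<step>` tags, their attributes and the command names are hypothetical and do not reflect AXIOME's actual schema, which is documented with the source. The sketch only shows the general idea of turning declarative steps into make rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: tag names, attributes and commands are placeholders.
EXAMPLE_CONFIG = """\
<workflow>
  <step output="assembled.fasta" input="reads.fastq"
        command="hypothetical_assembler reads.fastq assembled.fasta"/>
  <step output="otu_table.txt" input="assembled.fasta"
        command="hypothetical_otu_picker assembled.fasta otu_table.txt"/>
</workflow>
"""


def config_to_makefile(xml_text):
    """Turn each <step> into a make rule of the form 'output: input' plus a recipe."""
    root = ET.fromstring(xml_text)
    steps = root.findall("step")
    all_targets = " ".join(step.get("output") for step in steps)
    lines = [".PHONY: all", "all: " + all_targets, ""]
    for step in steps:
        lines.append(f"{step.get('output')}: {step.get('input')}")
        lines.append("\t" + step.get("command"))  # make recipes must start with a tab
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(config_to_makefile(EXAMPLE_CONFIG))
```

Because each rule names its input as a prerequisite, re-running make against such a file rebuilds only the outputs that are missing or out of date, which is the checkpoint behavior described in the text.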
Figure 1

Schematic representation of the AXIOME workflow and its relation to existing QIIME analyses.

Figure 2

Axiometic, a multi-platform companion GUI for AXIOME, with four steps defining the metadata associated with samples, the data source files, the expressions for sorting samples within the source files and the analyses for the AXIOME pipeline to run. The resulting XML control file is interpreted by AXIOME to run analyses. Axiometic is available for Linux and Windows and is in development for Mac OS X.

Distributed with the source is a sample analysis, including instructions, an XML workflow script, data files and expected output. The workflow management of AXIOME fosters reproducible, predictable and automated analyses, which are challenging goals given the large sequence datasets now being generated. AXIOME adds to the expanding computational toolbox supporting the next generation of research efforts in microbial ecology.

Comparison to related work

The Galaxy project [7,13,14] is a web-based analysis management and distribution system for biomedical research. Currently, analysis of SSU rRNA marker data within Galaxy can be accomplished through a mothur v.1.27.0 [1] implementation. Additionally, a QIIME [2] wrapper (QIIME-Galaxy) is under development for Galaxy. Within Galaxy, these implementations are intended to provide a managed workflow, either locally or within the cloud, which will be a suitable solution for many researchers. Besides analysis techniques specific to AXIOME, our software offers some features not present in either Galaxy-associated package. For example, AXIOME can be installed through a package manager (apt-get install), which greatly simplifies installation. Furthermore, AXIOME excels at local management and extension of marker gene workflows with modular XML scripting and checkpointing, allowing for rapid exploration of parameter and analysis modifications. When marker gene workflows are more fully integrated into Galaxy, we envision AXIOME as a complementary system for workflow management and extension. To actively contribute to the open source analysis community, we are in the process of contributing to Galaxy the various analysis routines specific to AXIOME (Figure 1).

Availability of AXIOME

AXIOME is an actively developed, open source project written in Vala and available from GitHub (http://neufeld.github.com/axiome) and also as a Debian package, serving as a companion to any native QIIME (v.1.4 and above) or mothur install. Specific details of the design and implementation of AXIOME are packaged with the source, including detailed explanations of tool development and the relationship between files in the repository. Axiometic, a GUI companion tool for easily generating AXIOME XML analysis instructions, is also freely available (http://neufeld.github.com/axiometic). AXIOME is compatible with the .biom universal data file format [15] and was designed to work within a Linux environment, including suitable cloud-computing infrastructures such as Amazon EC2. A comprehensive manual, dependencies list, tutorial and sample data are also provided. Given that data analysis has become an important bottleneck for the effectiveness of microbial ecology studies, the development of user-friendly computational tools remains a high priority. AXIOME represents an important step in this direction by automating multi-step bioinformatic analyses and enabling the customization of procedures to suit the diverse research needs of the microbial ecology community.

Availability and requirements

Project name: AXIOME, Axiometic
Project home page: http://neufeld.github.com/axiome, http://neufeld.github.com/axiometic
Operating system: Linux (AXIOME), Platform independent (Axiometic)
Programming language: C, Vala, R, Python
Other requirements: QIIME (and dependencies therein), make, awk, mothur (optional), PANDAseq (optional); see documentation for a comprehensive list of optional dependencies, based on workflow requirements.
License: GPL v3
Any restrictions to use by non-academics: No

Abbreviations

AXIOME: Automation, eXtension, and Integration Of Microbial Ecology; CPU: Central processing unit; GUI: Graphical user interface; MRPP: Multi-response permutation procedures; NMF: Non-negative matrix factorization; OTU: Operational taxonomic unit; PANDAseq: PAired-eND assembler for Illumina sequences; QIIME: Quantitative insights into microbial ecology; SSU: Small subunit

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

MDJL contributed to the design of AXIOME and manuscript preparation. APM designed and implemented AXIOME. MWH contributed to AXIOME and implemented Axiometic. AKB contributed to testing of AXIOME. JDN contributed to the design and coordination of the project, testing of AXIOME and manuscript preparation. All authors read and approved the final manuscript.

References

1.  Galaxy: a platform for interactive large-scale genome analysis.

Authors:  Belinda Giardine; Cathy Riemer; Ross C Hardison; Richard Burhans; Laura Elnitski; Prachi Shah; Yi Zhang; Daniel Blankenberg; Istvan Albert; James Taylor; Webb Miller; W James Kent; Anton Nekrutenko
Journal:  Genome Res       Date:  2005-09-16       Impact factor: 9.043

2.  A non-negative matrix factorization framework for identifying modular patterns in metagenomic profile data.

Authors:  Xingpeng Jiang; Joshua S Weitz; Jonathan Dushoff
Journal:  J Math Biol       Date:  2011-06-01       Impact factor: 2.259

3.  Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications.

Authors:  Pelin Yilmaz; Renzo Kottmann; Dawn Field; Rob Knight; James R Cole; Linda Amaral-Zettler; Jack A Gilbert; Ilene Karsch-Mizrachi; Anjanette Johnston; Guy Cochrane; Robert Vaughan; Christopher Hunter; Joonhong Park; Norman Morrison; Philippe Rocca-Serra; Peter Sterk; Manimozhiyan Arumugam; Mark Bailey; Laura Baumgartner; Bruce W Birren; Martin J Blaser; Vivien Bonazzi; Tim Booth; Peer Bork; Frederic D Bushman; Pier Luigi Buttigieg; Patrick S G Chain; Emily Charlson; Elizabeth K Costello; Heather Huot-Creasy; Peter Dawyndt; Todd DeSantis; Noah Fierer; Jed A Fuhrman; Rachel E Gallery; Dirk Gevers; Richard A Gibbs; Inigo San Gil; Antonio Gonzalez; Jeffrey I Gordon; Robert Guralnick; Wolfgang Hankeln; Sarah Highlander; Philip Hugenholtz; Janet Jansson; Andrew L Kau; Scott T Kelley; Jerry Kennedy; Dan Knights; Omry Koren; Justin Kuczynski; Nikos Kyrpides; Robert Larsen; Christian L Lauber; Teresa Legg; Ruth E Ley; Catherine A Lozupone; Wolfgang Ludwig; Donna Lyons; Eamonn Maguire; Barbara A Methé; Folker Meyer; Brian Muegge; Sara Nakielny; Karen E Nelson; Diana Nemergut; Josh D Neufeld; Lindsay K Newbold; Anna E Oliver; Norman R Pace; Giriprakash Palanisamy; Jörg Peplies; Joseph Petrosino; Lita Proctor; Elmar Pruesse; Christian Quast; Jeroen Raes; Sujeevan Ratnasingham; Jacques Ravel; David A Relman; Susanna Assunta-Sansone; Patrick D Schloss; Lynn Schriml; Rohini Sinha; Michelle I Smith; Erica Sodergren; Aymé Spo; Jesse Stombaugh; James M Tiedje; Doyle V Ward; George M Weinstock; Doug Wendel; Owen White; Andrew Whiteley; Andreas Wilke; Jennifer R Wortman; Tanya Yatsunenko; Frank Oliver Glöckner
Journal:  Nat Biotechnol       Date:  2011-05       Impact factor: 54.908

4.  Phyloseq: a bioconductor package for handling and analysis of high-throughput phylogenetic sequence data.

Authors:  Paul J McMurdie; Susan Holmes
Journal:  Pac Symp Biocomput       Date:  2012

5.  Bioconductor: open software development for computational biology and bioinformatics.

Authors:  Robert C Gentleman; Vincent J Carey; Douglas M Bates; Ben Bolstad; Marcel Dettling; Sandrine Dudoit; Byron Ellis; Laurent Gautier; Yongchao Ge; Jeff Gentry; Kurt Hornik; Torsten Hothorn; Wolfgang Huber; Stefano Iacus; Rafael Irizarry; Friedrich Leisch; Cheng Li; Martin Maechler; Anthony J Rossini; Gunther Sawitzki; Colin Smith; Gordon Smyth; Luke Tierney; Jean Y H Yang; Jianhua Zhang
Journal:  Genome Biol       Date:  2004-09-15       Impact factor: 13.583

6.  QIIME allows analysis of high-throughput community sequencing data.

Authors:  J Gregory Caporaso; Justin Kuczynski; Jesse Stombaugh; Kyle Bittinger; Frederic D Bushman; Elizabeth K Costello; Noah Fierer; Antonio Gonzalez Peña; Julia K Goodrich; Jeffrey I Gordon; Gavin A Huttley; Scott T Kelley; Dan Knights; Jeremy E Koenig; Ruth E Ley; Catherine A Lozupone; Daniel McDonald; Brian D Muegge; Meg Pirrung; Jens Reeder; Joel R Sevinsky; Peter J Turnbaugh; William A Walters; Jeremy Widmann; Tanya Yatsunenko; Jesse Zaneveld; Rob Knight
Journal:  Nat Methods       Date:  2010-04-11       Impact factor: 28.547

7.  PANDAseq: paired-end assembler for illumina sequences.

Authors:  Andre P Masella; Andrea K Bartram; Jakub M Truszkowski; Daniel G Brown; Josh D Neufeld
Journal:  BMC Bioinformatics       Date:  2012-02-14       Impact factor: 3.169

8.  Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences.

Authors:  Jeremy Goecks; Anton Nekrutenko; James Taylor
Journal:  Genome Biol       Date:  2010-08-25       Impact factor: 13.583

9.  Targeted recovery of novel phylogenetic diversity from next-generation sequence data.

Authors:  Michael D J Lynch; Andrea K Bartram; Josh D Neufeld
Journal:  ISME J       Date:  2012-07-12       Impact factor: 10.302

10.  The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome.

Authors:  Daniel McDonald; Jose C Clemente; Justin Kuczynski; Jai Ram Rideout; Jesse Stombaugh; Doug Wendel; Andreas Wilke; Susan Huse; John Hufnagle; Folker Meyer; Rob Knight; J Gregory Caporaso
Journal:  Gigascience       Date:  2012-07-12       Impact factor: 6.524

