Seed: a user-friendly tool for exploring and visualizing microbial community data.

Daniel Beck¹, Christopher Dennis¹, James A Foster¹.

Abstract

SUMMARY: In this article we present Simple Exploration of Ecological Data (Seed), a data exploration tool for microbial communities. Seed is written in R using the Shiny library. This provides access to powerful R-based functions and libraries through a simple user interface. Seed allows users to explore ecological datasets using principal coordinate analyses, scatter plots, bar plots, hierarchical clustering and heatmaps.
AVAILABILITY AND IMPLEMENTATION: Seed is open source and available at https://github.com/danlbek/Seed.
CONTACT: danlbek@gmail.com
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year: 2014 PMID: 25332377 PMCID: PMC4325548 DOI: 10.1093/bioinformatics/btu693

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

The proliferation of microbial community profiling is allowing researchers to study microbial communities in new ways. Increasingly, researchers in diverse fields are asking how microbial communities vary across samples. For example, researchers studying the human microbiome are interested in how microbial composition changes across body sites and through time (HMP Consortium). Researchers studying disease look at how microbial communities differ between samples from healthy and unhealthy individuals (Srinivasan and Fredricks, 2009). It is now standard practice to use cultivation-independent high-throughput sequencing to identify the microbial composition of many samples. This produces a wealth of data about microbial composition in many different environments and conditions. In conjunction with advances in sequencing resources, researchers have developed a number of powerful software tools to analyze and visualize these data. Packages such as mothur (Schloss et al.) and Qiime (Caporaso et al.) aggregate many tools to allow researchers to quickly and efficiently process large sequencing datasets. These currently available packages excel at performing robust, computationally intensive calculations that attempt to minimize the effects of noise and sequencing artifacts on downstream analyses. Even when they provide a graphical user interface for their own functions, they often rely on a non-visual interface for analysis, which requires the user to know specific command and parameter combinations. While this setup is ideal for pipeline development, it is often a hindrance for data exploration. Simple Exploration of Ecological Data (Seed) fills a currently unmet need for a tool that allows researchers to quickly and easily visualize and explore the data that result from these pipelines. This so-called exploratory data analysis has an ‘important place in the toolbox of ecologists’ (Borcard).
Though there are texts that recommend specific exploratory techniques (Borcard; Legendre and Legendre, 2012), we know of no tool such as Seed that bundles appropriate tools into an easy-to-use system for non-programmers. In this article, we present Seed, a software package that focuses on data exploration and visualization of microbial community data derived from high-throughput sequencing.

2 SEED SOFTWARE

Seed is an open-source application that allows researchers to visually explore microbial community data. It supports many different analyses and visualizations, including principal component and principal coordinate analysis (PCA/PCoA), hierarchical clustering, scatter plots, bar plots and heatmaps. These plots allow users to visualize similarities and differences among samples and how environmental and microbial features vary across samples. Seed is written in the R programming language (R Core Team, 2013) using the Shiny framework (RStudio Inc., 2013). R is open source and available for Linux, MacOS and Windows operating systems. The use of R allows us to take advantage of the wealth of R packages available for complex analyses and visualizations. Seed is a web-based application, which may be installed locally or hosted on a remote server. When running Seed from a central server, users can access it through a web browser and are not required to install it locally. This means non-expert users can begin using Seed quickly, even without local installations of R. Additionally, updates to R, Shiny, Seed and underlying packages can be done seamlessly and invisibly to the end user. The use of a web browser also provides an interface familiar to most users, making Seed easy to learn. The user interface for Seed can be seen in Figure 1.
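
Both PCoA and hierarchical clustering start from pairwise dissimilarities between samples. As a minimal sketch of that first step, the Python below (not Seed's R code) computes a Bray-Curtis dissimilarity matrix from a small made-up abundance table. The choice of Bray-Curtis, the function names and the example counts are assumptions made for illustration; the text does not state which dissimilarity measures Seed offers.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples' abundance vectors."""
    total = sum(a) + sum(b)
    if total == 0:
        # Assumption: two empty samples are treated as identical.
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def distance_matrix(samples):
    """Square matrix of pairwise Bray-Curtis dissimilarities."""
    return [[bray_curtis(a, b) for b in samples] for a in samples]


# Hypothetical abundance table: rows are samples, columns are taxa.
samples = [
    [10, 0, 30],
    [0, 5, 5],
    [10, 0, 30],
]

matrix = distance_matrix(samples)
for row in matrix:
    print(["%.2f" % value for value in row])
```

A matrix of this kind is what an ordination or clustering routine would take as input; identical samples have dissimilarity 0 and samples sharing no taxa have dissimilarity 1.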

Fig. 1.

This figure shows Seed’s simple web-based interface. The stacked bar plot shown here is based on data originally published by Ravel et al.

Currently, Seed requires two types of data: microbial abundance data and sample metadata. The microbial abundance data contain counts or abundances of each microbial taxon in each sample. The sample metadata contain information about each sample, for example the sample pH or temperature. Seed allows the user to modify the abundance data using a number of common transformations including presence/absence, relative abundance and Hellinger transformations. Seed is not limited to microbial data, though that was our primary research domain. It can be used to explore any data that include both feature counts and values for response variables.

Once users have imported and verified their dataset, they may easily explore their data with many plot types. Examples of some of the plots generated by Seed are shown in the supplementary information. Many of the plots include options to incorporate sample information by coloring points or bars according to metadata values. This allows users to easily visualize the relationship between the sample metadata and the structure of the microbial communities present in the samples.

The design of Seed emphasizes simplicity over exhaustive inclusion of parameters. In many or most cases, researchers will use Seed to understand general trends in the data, which may then inform more specialized analyses. Seed is designed to quickly explore ecological datasets and to act as a hypothesis-generating tool. Publication-quality figures and polished analyses are beyond the current scope of this project, though Seed can output all plots in PDF or PNG format. Additionally, large dataset analysis may be too slow for a comfortable user experience. Note, however, that we used published microbiome and patient data with nearly 400 samples and 250 taxa (Ravel et al.) on a standard laptop while preparing this publication. Seed is certainly capable of handling datasets with hundreds of samples and more than a thousand taxa.

As with any software package, not all analyses have been implemented in Seed. We encourage users to also consider other visualization tools including phyloseq (McMurdie and Holmes, 2013) for analyses incorporating phylogenetic relationships and EMPeror (Vázquez-Baeza et al.) for PCoA analyses of very large datasets. Additionally, while Seed provides some guidance for users, tool selection and result interpretation still rely on user expertise.
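
The abundance transformations mentioned in this section (presence/absence, relative abundance and Hellinger) can be illustrated with a short sketch. The Python below is not taken from Seed, which is written in R; the function names and the example table are hypothetical, and the Hellinger transformation is taken in its usual form as the square root of each sample's relative abundances.

```python
import math


def presence_absence(counts):
    """Map each count to 1 if the taxon was observed in the sample, else 0."""
    return [[1 if c > 0 else 0 for c in row] for row in counts]


def relative_abundance(counts):
    """Divide each count by its sample (row) total; all-zero rows stay zero."""
    result = []
    for row in counts:
        total = sum(row)
        result.append([c / total if total else 0.0 for c in row])
    return result


def hellinger(counts):
    """Square root of each sample's relative abundances."""
    return [[math.sqrt(p) for p in row] for row in relative_abundance(counts)]


# Hypothetical abundance table: rows are samples, columns are taxa.
abundance = [
    [10, 0, 30],
    [0, 5, 5],
]

print(presence_absence(abundance))
print(relative_abundance(abundance))
print(hellinger(abundance))
```

Each transformation works on one sample at a time, so samples with very different sequencing depths end up on a comparable scale before plotting.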

3 FUTURE DIRECTIONS

Seed is freely available at https://github.com/danlbek/Seed. Development of Seed is ongoing. We are continuing to add new visualizations and to improve existing ones. Future development will focus on adding phylogenetic and taxonomic data structures, which will allow for analyses that take microbial relationships into account. We welcome user contributions to the project and encourage labs to copy and modify the code to suit their own needs.

REFERENCES

1. Vaginal microbiome of reproductive-age women.

Authors: Jacques Ravel; Pawel Gajer; Zaid Abdo; G Maria Schneider; Sara S K Koenig; Stacey L McCulle; Shara Karlebach; Reshma Gorle; Jennifer Russell; Carol O Tacket; Rebecca M Brotman; Catherine C Davis; Kevin Ault; Ligia Peralta; Larry J Forney
Journal: Proc Natl Acad Sci U S A Date: 2010-06-03 Impact factor: 11.205

2. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities.

Authors: Patrick D Schloss; Sarah L Westcott; Thomas Ryabin; Justine R Hall; Martin Hartmann; Emily B Hollister; Ryan A Lesniewski; Brian B Oakley; Donovan H Parks; Courtney J Robinson; Jason W Sahl; Blaz Stres; Gerhard G Thallinger; David J Van Horn; Carolyn F Weber
Journal: Appl Environ Microbiol Date: 2009-10-02 Impact factor: 4.792

3. QIIME allows analysis of high-throughput community sequencing data.

Authors: J Gregory Caporaso; Justin Kuczynski; Jesse Stombaugh; Kyle Bittinger; Frederic D Bushman; Elizabeth K Costello; Noah Fierer; Antonio Gonzalez Peña; Julia K Goodrich; Jeffrey I Gordon; Gavin A Huttley; Scott T Kelley; Dan Knights; Jeremy E Koenig; Ruth E Ley; Catherine A Lozupone; Daniel McDonald; Brian D Muegge; Meg Pirrung; Jens Reeder; Joel R Sevinsky; Peter J Turnbaugh; William A Walters; Jeremy Widmann; Tanya Yatsunenko; Jesse Zaneveld; Rob Knight
Journal: Nat Methods Date: 2010-04-11 Impact factor: 28.547

4. Structure, function and diversity of the healthy human microbiome.

Authors:
Journal: Nature Date: 2012-06-13 Impact factor: 49.962

5. The human vaginal bacterial biota and bacterial vaginosis.

Authors: Sujatha Srinivasan; David N Fredricks
Journal: Interdiscip Perspect Infect Dis Date: 2009-02-16

6. phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data.

Authors: Paul J McMurdie; Susan Holmes
Journal: PLoS One Date: 2013-04-22 Impact factor: 3.240

7. EMPeror: a tool for visualizing high-throughput microbial community data.

Authors: Yoshiki Vázquez-Baeza; Meg Pirrung; Antonio Gonzalez; Rob Knight
Journal: Gigascience Date: 2013-11-26 Impact factor: 6.524
