
Efficient computation of Faith's phylogenetic diversity with applications in characterizing microbiomes.

George Armstrong^1,2,3, Kalen Cantrell^2, Shi Huang^1,2, Daniel McDonald^1, Niina Haiminen^4, Anna Paola Carrieri^5, Qiyun Zhu^6,7, Antonio Gonzalez^1, Imran McGrath^2,8, Kristen L Beck^9, Daniel Hakim^1,3, Aki S Havulinna^10,11, Guillaume Méric^12,13, Teemu Niiranen^10,14,15, Leo Lahti^16, Veikko Salomaa^10, Mohit Jain^2,17,18, Michael Inouye^12,19, Austin D Swafford^2, Ho-Cheol Kim^9, Laxmi Parida^4, Yoshiki Vázquez-Baeza^2, Rob Knight^1,2,20,21.

Abstract

The number of publicly available microbiome samples is continually growing. As data set size increases, bottlenecks arise in standard analytical pipelines. Faith's phylogenetic diversity (Faith's PD) is a highly utilized phylogenetic alpha diversity metric that has thus far failed to effectively scale to trees with millions of vertices. Stacked Faith's phylogenetic diversity (SFPhD) enables calculation of this widely adopted diversity metric at a much larger scale by implementing a computationally efficient algorithm. The algorithm reduces the amount of computational resources required, resulting in more accessible software with a reduced carbon footprint, as compared to previous approaches. The new algorithm produces identical results to the previous method. We further demonstrate that the phylogenetic aspect of Faith's PD provides increased power in detecting diversity differences between younger and older populations in the FINRISK study's metagenomic data.


Year: 2021 PMID: 34479875 PMCID: PMC8559715 DOI: 10.1101/gr.275777.121

Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043

In microbiome research, particular attention is given to evaluating the diversity of microbes within samples (The Human Microbiome Project Consortium 2012; Thompson et al. 2017; McDonald et al. 2018a). Alpha diversity (within sample diversity) represents a family of summary statistics that can summarize the breadth of diversity present in an environment. More recently, many examples have been reported on the associations between various host or environmental factors and alpha diversity of microbiomes, including country and diet in human guts (McDonald et al. 2018a), disease status in humans and canines (Gevers et al. 2014; Vázquez-Baeza et al. 2016), and the pH (Lauber et al. 2009), salinity (Thompson et al. 2017), and temperature (Zhou et al. 2016) of soils, among many others (Jeffery et al. 2016; Youngblut et al. 2019). A popular metric that accounts for the phylogenetic relatedness of the community members, Faith's phylogenetic diversity (Faith's PD) (Faith 1992), has been noted to be more sensitive in distinguishing disease factors in the human digestive system, relative to other alpha diversity indices (Scherson and Faith 2018; Youngblut et al. 2021). Modern DNA sequencing instruments have enabled microbiome studies at the scale of tens of thousands of samples, which presents a computational challenge for metrics that rely on a phylogeny, such as Faith's PD. This metric is computed by summing the branch lengths (edge weights) of the phylogeny that exclusively represents the sequences contained in a biological sample. The amount of memory and number of necessary operations needed to calculate Faith's PD depends on the number of edges in the phylogenetic tree, as well as the number of samples in the underlying data table. In today's increasingly large and sparse data sets and meta-analyses, these phylogenetic trees and tables can exceed hundreds of thousands of samples and millions of tree tips (McDonald et al. 2018b). 
Recent advances have enabled efficient computation of the UniFrac metric for beta diversity. UniFrac is also a metric computed over phylogenetic trees (Lozupone and Knight 2005) and is mathematically related to Faith's PD (Faith et al. 2009). Specifically, Striped UniFrac (McDonald et al. 2018b) improves upon previous UniFrac implementations (Hamady et al. 2010) by using space- and time-efficient tree data structures (Cordova and Navarro 2016) and reducing the number of vectors required to store intermediate scores in the tree. Additionally, the usefulness of techniques like Faith's PD and UniFrac remains underexplored for metagenomics sequencing. Recent molecular protocol optimizations, such as SHOGUN (Hillmann et al. 2018), have enabled the metagenomic characterization of large human cohorts (Borodulin et al. 2015; Kaplan et al. 2019; Salosensaari et al. 2021). In this context, the applicability of Faith's PD has largely been limited by the technical difficulties associated with constructing phylogenies from metagenomic features (Zhu et al. 2019). Efforts like the Web of Life (WoL) (Zhu et al. 2019) and Genome Taxonomy Database (GTDB) (Parks et al. 2018, 2020) are now addressing this issue by providing a phylogenomic tree as part of their database releases that can be used for phylogeny-informed analysis. Motivated by these advances in algorithms and resources for analyzing phylogenies, phylogenomic trees, and sparse data, we developed a new algorithm and implementation, stacked Faith's phylogenetic diversity (SFPhD), for rapidly computing Faith's PD. Additionally, we aim to demonstrate concrete benefits of phylogeny-informed analysis in metagenomic studies where this metric is less frequently used.

Results

SFPhD is a new implementation for calculating Faith's PD. The key advances of SFPhD are using a sparse matrix representation, an efficient tree structure, and partial aggregation of metric constituents. Our BSD-licensed implementation of this algorithm is available in the “unifrac” package (via PyPI and bioconda; Grüning et al. 2018), which has 57,007 total conda downloads and 40,434 conda downloads since the introduction of SFPhD, as of the time of writing (August 28, 2021). The package produces a C/C++ shared library with Python bindings and is additionally linkable by any programming language (https://github.com/biocore/unifrac). Additionally, by investigating the previously documented relationship between age and bacterial richness of the gut microbiome (de la Cuesta-Zuluaga et al. 2019), we demonstrate that accounting for phylogeny in metagenomic data can increase the statistical power for detecting group differences (Supplemental Code).

Stacked Faith's PD provides a faster and memory-efficient implementation over the previous state-of-the-art algorithm

SFPhD uses the structure of microbiome data along with other practical considerations to achieve decreased time and memory requirements. An example feature table is shown in Figure 1A, with a corresponding phylogenetic tree in Figure 1B. Note that, for a given tree $T$, Faith's PD can be expressed as

$$PD_i = \sum_{j \in T} I_{ij} \cdot \mathrm{branchLen}_j(T),$$

where $PD_i$ is Faith's PD for sample $i$, $I_{ij}$ indicates if sample $i$ has any features that descend from node $j$, and $\mathrm{branchLen}_j(T)$ indicates the length of the branch to node $j$ in the tree $T$.
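The summation above can be illustrated with a minimal dense sketch in Python. It is not the scikit-bio or unifrac code: the toy tree, the `parent` array encoding, and the function name `faith_pd_dense` are assumptions made for illustration. It builds the full indicator table for every sample and node before summing, which is the memory-heavy pattern discussed in this section.

```python
import numpy as np

# Hypothetical toy tree, nodes indexed in postorder: parent[j] is the parent of
# node j (-1 marks the root). Nodes 0, 1 and 3 are tips; node 2 joins tips 0 and 1.
parent = np.array([2, 2, 4, 4, -1])
branch_len = np.array([1.0, 2.0, 0.5, 3.0, 0.0])
tips = [0, 1, 3]

# Presence/absence of each tip (columns, same order as `tips`) in each sample (rows).
table = np.array([
    [1, 1, 0],  # sample 0 contains tips 0 and 1
    [0, 1, 1],  # sample 1 contains tips 1 and 3
    [0, 0, 1],  # sample 2 contains tip 3 only
], dtype=bool)

def faith_pd_dense(table, tips, parent, branch_len):
    """Compute PD_i = sum_j I_ij * branchLen_j using a full samples x nodes table."""
    n_samples, n_nodes = table.shape[0], len(parent)
    indicator = np.zeros((n_samples, n_nodes), dtype=bool)
    indicator[:, tips] = table
    # Postorder guarantees children are visited before their parent, so a single
    # pass is enough to propagate presence from the tips up to the root.
    for j in range(n_nodes):
        if parent[j] >= 0:
            indicator[:, parent[j]] |= indicator[:, j]
    # Multiply by branch lengths and sum over nodes in one matrix-vector product.
    return indicator @ branch_len

pd_values = faith_pd_dense(table, tips, parent, branch_len)
```

For this toy input, sample 0 accumulates the branches to nodes 0, 1, 2, and the zero-length root, giving 3.5. The `indicator` array grows with samples times nodes, which is why this approach does not scale to large trees.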

Figure 1.

Partially aggregating branch lengths reduces the space complexity of the algorithm. (A) Faith's PD calculation depends on the representation of features present in samples. In the table, the letters (R, O, B, K) represent samples and the numbers (0, 1, 2, 4, 6, 9, 10) represent features. A “1” in an entry indicates the presence of a feature in the sample. SFPhD uses sparse table data structures, which reduce memory by only keeping track of the nonzero values in a matrix (highlighted in gray). (B) A mock reference phylogenetic tree is shown, with the features from A as tips. Labels for the samples from A are located next to tips that they contain. The nodes are labeled by their order in a postorder traversal of the tree. (C) Graphic depiction of the reference implementation's calculation of Faith's PD by first aggregating the presence/absence information for each branch in the tree, followed by multiplication by the branch lengths to get the metric constituents, and finally a sum over the entire branch × metric constituent table. (D) Graphic representation of the execution of SFPhD. On the left, the stack of presence/absence information is shown at three points during the algorithm's execution (i, ii, iii). Each of these times shows the stack immediately before memory is freed. On the right, the state of the partially aggregated phylogenetic diversity (PD) is shown after each node is added to the stack. Each row represents the vector after a step in the algorithm. In practice, there is only one such vector. (E) The balanced-parentheses representation for the phylogenetic tree from B.

The previous state-of-the-art reference implementation (scikit-bio, http://scikit-bio.org/) computes Faith's PD for a batch of samples by first fully computing $I$. $I$ is computed by traversing the entire phylogenetic tree in a postorder traversal, where the children of a node must be visited before the node itself can be visited (the nodes in Fig. 1B are labeled in the order of a postorder traversal). During the traversal, when a given node $j$ is visited, all $I_{ij}$ are set by determining the features present in all children of node $j$. Subsequently, $I_{ij} \cdot \mathrm{branchLen}_j(T)$ is calculated for all branches. The results are obtained by summing over the branches for each sample (Fig. 1C).

However, this approach tends to use much more space than is needed. Microbiome data are known to be sparse (Morton et al. 2017; Kumar et al. 2018; Martino et al. 2019), that is, of the entries in a data table, many are likely to be zero. This issue is exacerbated in large data sets, where many microbes are only observed in a handful of samples. In an extreme case, such a table (McDonald et al. 2018b), with 113,721 samples rarefied at 500 sequences per sample, has only 0.0126% nonzero entries. Sparse representations have been used previously for storing microbiome data (McDonald et al. 2012a) and have been applied for accelerating microbiome analyses (McDonald et al. 2018b), but they have not been previously applied to Faith's PD. We identified that a major drawback of the state-of-the-art implementation in scikit-bio is that it uses a full, dense table to represent all of $I$ in memory at once. A key advancement of our approach is the use of a sparse matrix implementation for storing information on the taxa present for each sample and feature. Sparse matrices save space by only retaining information about positions in the matrix that have nonzero values (e.g., only the gray values in Fig. 1A and information about their positions are retained by a sparse matrix).

Another key advance is the partial aggregation of Faith's PD (Fig. 1D). Note that $I_{ij} \cdot \mathrm{branchLen}_j(T)$, which we will call a metric constituent, can be added in any order and that $I_{ij}$ only depends on the children of node $j$. Thus, if node $k$ is a child of node $j$, $I_{ik}$ is no longer needed once metric constituents for node $k$ have been computed and $I_{ij}$ is known.
As a result, we can reduce the memory used to store $I$ by traversing the phylogeny with a postorder traversal and freeing each $I_{ij}$ once it is no longer needed. Furthermore, we can reduce the storage needed for the metric constituents by keeping a running summation of them while traversing the tree. Thus, this approach reduces the expected space complexity for storing the metrics from $O(nk)$ to $O(n \log k)$, where n is the number of samples and k is the number of vertices in the tree.

In addition to the algorithmic improvements, we have included several practical enhancements that improve the performance of the code. The topology of the phylogenetic tree (Fig. 1B) is now represented as a balanced-parentheses vector (Fig. 1E) that corresponds to additional vectors of branch lengths and node names; this structure has a lower memory footprint and a sequential memory representation, which reduces the number of cache misses during a tree traversal (Cordova and Navarro 2016). Finally, the software is written using C/C++ (with Python extensions using Cython; https://cython.org/) and builds upon the foundation established by Striped UniFrac (McDonald et al. 2018b). Reuse of this library facilitated our access to a much faster Newick format parser, which reduces the overhead when reading a tree from disk. These factors make for an improved expected and in-practice performance, despite the time complexity and worst-case memory complexity remaining the same.

To demonstrate the scalability of SFPhD, we used a collection of 307,237 public and anonymized private 16S rRNA V4 microbiome samples amounting to 1,264,796 phylogenetic tree tips (after rarefaction at 500 sequences per sample). The samples were retrieved using the redbiom command line interface (McDonald et al. 2019), which queried a cache of public and anonymized private studies available in Qiita (Gonzalez et al. 2018). Amplicon sequence variants (ASVs) were placed into the Greengenes (DeSantis et al. 2006; McDonald et al. 2012b; Gonzalez et al. 2018) phylogeny using SEPP (Mirarab et al. 2012). Computing the full alpha diversity vector took SFPhD 1 h and 5 min wall-clock time and required a maximum resident set size of less than 3 GB (see Methods for hardware details). In addition, we iteratively measured runtime and memory consumption for increasingly large random subsets of samples while fixing the size of the tree at 100,000 tips (Fig. 2A,B; Supplemental Table S1). For the iteration with 20,000 samples, the memory usage of the reference implementation exceeded 150 GB and the process ran for over 15 min. In contrast, with SFPhD, the process took 14 sec to execute and required less than 0.5 GB of memory. Additionally, using Green Algorithms (Lannelongue et al. 2021), we estimated the carbon footprint of the scikit-bio reference implementation on the 20,000 sample table to be 12.84 g CO2e, whereas we estimated the carbon footprint of SFPhD would be 0.04 g CO2e in the United States, which is a 321-fold reduction in impact on global warming.
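The partial aggregation described in this section can be sketched in Python as follows. This is a simplified illustration and not the C/C++ code in the unifrac package: the function name `faith_pd_stacked`, the `n_children` encoding of a postorder tree, and the toy inputs are assumptions. Presence vectors for children are popped from a stack and discarded as soon as their parent is computed, and metric constituents are added to a single running vector.

```python
import numpy as np

def faith_pd_stacked(n_samples, tip_samples, n_children, branch_len):
    """Faith's PD via a postorder pass with a stack and a running sum.

    tip_samples: dict mapping a tip's node index to the sample indices that
                 contain it (a sparse view of the feature table).
    n_children:  number of children of each node; nodes are indexed in postorder.
    Returns the PD vector and the largest number of presence vectors held at once.
    """
    pd_running = np.zeros(n_samples)  # the single partially aggregated PD vector
    stack = []
    max_stack = 0
    for j in range(len(n_children)):
        present = np.zeros(n_samples, dtype=bool)
        if n_children[j] == 0:
            # Tip: mark only the samples listed for it.
            present[np.asarray(tip_samples.get(j, []), dtype=np.intp)] = True
        else:
            # Internal node: in postorder its children are on top of the stack.
            # Popping them releases their vectors once they are merged.
            for _ in range(n_children[j]):
                present |= stack.pop()
        # Add this node's metric constituent to the running sum right away.
        pd_running += present * branch_len[j]
        stack.append(present)
        max_stack = max(max_stack, len(stack))
    return pd_running, max_stack

# Hypothetical toy tree in postorder: tips 0, 1, 3; node 2 = (0, 1); node 4 = root.
n_children = [0, 0, 2, 0, 2]
branch_len = np.array([1.0, 2.0, 0.5, 3.0, 0.0])
tip_samples = {0: [0], 1: [0, 1], 3: [1, 2]}

pd_values, max_stack = faith_pd_stacked(3, tip_samples, n_children, branch_len)
```

In this toy run the stack never holds more than two presence vectors, although the tree has five nodes. This sketch still allocates a dense boolean vector per stacked node, so it illustrates the traversal order and the freeing of intermediate state rather than the memory layout of the real implementation.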

Figure 2.

SFPhD outperforms the reference implementation in terms of runtime and memory usage. (A) Runtime in seconds for computing Faith's PD on data sets with thousands of samples and 100,000 tips in the phylogeny. Data are independently subsampled from a collection of 113,721 public samples in Qiita (Gonzalez et al. 2018; Zhu et al. 2019) as previously processed (McDonald et al. 2018b). Mean of n = 10 repetitions with 95% CI error bars. (B) Memory usage for the same experiment as in A. For both A and B, jobs were terminated if they exceeded 250 GB of memory.

Phylogenetic diversity is a suitable metric to analyze stool metagenomic samples

To demonstrate SFPhD's versatility and applicability to newer data sets, we reanalyzed paired 16S rRNA and metagenomic data from 2661 stool samples from the FINRISK (Borodulin et al. 2015, 2018; Salosensaari et al. 2021) study (n = 1563 aged 60 and older; n = 1098 aged 35 and younger). In this experiment, we select random subsets of the full sample set and compare each metric's (observed features and Faith's PD) ability to detect differences in mean alpha diversity distributions. For each step, we randomly select N paired 16S and metagenomic samples and then compute the difference in mean alpha diversity between samples taken from younger adults (35 yr and younger) and older adults (60 yr and older) together with an empirical P-value. For both 16S and metagenomics, the alpha diversity of younger adults is lower than in older adults. In metagenomics, but not in 16S sequencing, Faith's PD provides improved statistical power over observed features, a phylogenetically agnostic alternative (Fig. 3A,B). With 16S data, the difference between the two metrics is subtle (Fig. 3A). In both cases, the statistical power increases as the number of samples grows. With metagenomic data, the number of observed features shows a weaker effect compared to Faith's PD regardless of the number of samples (Fig. 3B). Unlike 16S data sets (5600 features), metagenomic data sets (1700 features) are resolution-limited by the reference databases, whereas the nature of amplicon sequence variants allows for a broader feature space that can capture age differences without the need for a phylogeny.

Figure 3.

Phylogenetic diversity provides increased statistical power to differentiate age groups in shotgun metagenomics but not in 16S rRNA sequencing. (A) Statistical power to differentiate young adults from old adults in two alpha diversity metrics at different sample sizes using 16S rRNA sequencing in the FINRISK cohort. (B) Same as A but for shallow shotgun metagenomic sequencing.

We investigated the difference in mean alpha diversity in metagenomic samples (Fig. 4A) by computing the log of the likelihood ratio of older to younger adult samples present for each branch in the WoL phylogenomic tree (Zhu et al. 2019). We were able to identify portions of the WoL tree responsible for the increase in phylogenetic diversity (Fig. 4B). From this analysis, we found that the majority of the tree is comparably represented in young and old adult samples. However, we also found two clades where older adult samples were more prevalent than younger adult samples (Clade 1 has a log likelihood ratio bounded with an 80% confidence interval of [1.20, 1.45] and Clade 2 has an 80% confidence interval of [0.55, 0.74]). Clade 1 corresponds to a majority of Lactobacillales genomes, and Clade 2 corresponds to Proteobacteria genomes. The branches in Clade 1 primarily have a large log likelihood ratio, indicating that the features across the entire clade are more likely to be found in samples from older adults. However, the internal branches in Clade 2 additionally have low log likelihood ratios, indicating that the enrichment of features in older adults is not completely consistent across the entire clade. Lastly, although not confined to a few clades, there are several tips (e.g., Staphylococcus aureus, Bavariicoccus seileri, Nitratireductor indicus, and Campylobacter ureolyticus) in the phylogeny that are only associated with younger adults.

Figure 4.

Phylogenetic tree colored by age-group log of the likelihood ratio of older to younger adults per node. (A) Distribution of Faith's PD by age group on the full data set. (B) Web of Life (WoL) phylogenetic tree with branches colored by the log of likelihood ratio of old adults compared to young adults in descendants of the branch, for the FINRISK data set. The inner circle is colored by the log of likelihood ratio of older adults compared to younger adults in the tips of the tree. The outer circle is colored by the phylum of the taxon represented by each tree tip. Red ellipses mark two clades enriched for samples from older individuals.

Discussion

By accounting for the relationship between features in a data set, Faith's PD can mitigate issues with sparsity and heterogeneity common to modern “omics” data sets. Although this metric was first introduced 30 yr ago, the underlying algorithm for computing this metric had largely remained unchanged. In this paper, we demonstrated that our novel algorithm, SFPhD, performed efficiently on data sets with hundreds of thousands of samples and millions of tree tips, producing identical results to those of previous algorithms for computing this metric while producing a speedup of up to 64× and requiring as little as 0.21% of the memory in our benchmarks. An important aspect of SFPhD's underlying algorithm is substituting calculation of the full presence/absence table over the phylogeny, for a tree traversal that partially aggregates diversity values and frees presence/absence information when no longer needed. The result is a high-performance implementation that demonstrates improved scaling with the number of samples in the input data set. Much of the engineering work here was facilitated by the balanced parenthesis tree implementation provided in the UniFrac package (McDonald et al. 2018b). Therefore, we believe that increasing the availability of efficient and flexible data structures for phylogenetic analyses is likely to accelerate and facilitate the development of novel analytical methods. In a broader sense, this is similar to the impact of NumPy's (McDonald et al. 2018b; Harris et al. 2020) N-dimensional array in image processing, machine learning, neuroscience, and other fields. In addition, in a stool metagenomic study, Faith's PD demonstrates increased statistical power compared to observed features for differentiating younger from older subjects based on their microbial communities. In this context, we show that Faith's PD consistently provided increased statistical power for determining age-based differences in the shotgun metagenomic sequencing data. 
While this metric was originally developed to analyze data with vastly different statistical and biological properties, its use here demonstrates the versatility and applicability of measuring diversity using a tree. Furthermore, enabling efficient Faith's PD computation on microbiome data sets is of particular importance when examining the impact of COVID-19 on gut health (Kim et al. 2021). Although we show the utility of SFPhD in large and complex microbiome studies, the underlying implementation is not tied to a particular molecular technology. Thus, this implementation will be relevant to fields outside of microbiology, such as conservation prioritization, which inspired the original version of Faith's PD (Faith 1992) and where it continues to be applied (Rosauer et al. 2017). We also envision that our implementation will be applicable in fields like nutrition and metabolomics research, which only recently began adopting trees for analytical tasks (Johnson et al. 2019; Tripathi et al. 2021).

Methods

Construction of benchmarking tables

Data for the benchmarking in this study were subsampled from a BIOM table of 113,721 samples and 761,003 ASVs, which is composed of studies aggregated from several large sources of publicly available microbiome data in Qiita (Amir et al. 2017; Gonzalez et al. 2018). This data table was produced as previously described (McDonald et al. 2018b). The data were subset by uniformly randomly sampling the desired number of ASVs and samples from the table. Ten different tables were created for each number of samples and ASVs. The published insertion tree (McDonald et al. 2018b) was collapsed to only contain sequences that were selected to be included in the given subsampled table. The table with 307,237 public and anonymized private 16S rRNA V4 microbiome samples and 1,264,796 phylogenetic tree tips was also prepared as previously described (McDonald et al. 2018b) but included samples with private sequencing data from Qiita.

Benchmarking time and memory estimates

The SFPhD implementation available in the Python package unifrac v0.10.0 was used. The reference implementation uses the Faith's PD implementation from scikit-bio v0.5.4. All methods were run single-threaded on shared compute nodes that were not running other compute tasks. The nodes all had Intel Xeon CPU E5-2640 v3 @ 2.60GHz processors. A job was terminated if it exceeded 6 h of wall time or 250 GB of memory (system max). Space was tracked using GNU Time. Time for both implementations was tracked with a Python wrapper script. The time needed to parse data is not included in the scikit-bio timings but is included in the SFPhD timings, due to the lack of access to this information in the unifrac interface. This is acceptable given that it results in a conservative estimate of the speedup with SFPhD.
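A wrapper for this kind of measurement might look like the sketch below. It is not the authors' wrapper script: the function name `run_and_measure` and the placeholder command are assumptions. Peak memory is read from `resource.getrusage`, a Unix-only standard library call, where the benchmarks above used GNU Time.

```python
import resource
import subprocess
import sys
import time

def run_and_measure(cmd):
    """Run a command and return (wall-clock seconds, peak resident set size).

    On Linux, ru_maxrss is reported in kibibytes. With RUSAGE_CHILDREN it is the
    largest value over all child processes waited for so far, so each benchmark
    should be measured from a fresh wrapper process.
    """
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    elapsed = time.perf_counter() - start
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak_rss

# Placeholder workload standing in for a Faith's PD job.
elapsed, peak_rss = run_and_measure(
    [sys.executable, "-c", "x = list(range(10**6))"]
)
```

Timing the whole subprocess includes interpreter start-up and data parsing, which corresponds to the more conservative of the two timing conventions described above.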

Carbon footprint estimation

The Green Algorithms interface (Lannelongue et al. 2021) was used to estimate the carbon dioxide equivalent (CO2e) of the benchmarked methods. The Intel Xeon CPU E5-2640 v3 CPUs used in benchmarking have a thermal design power (TDP) per core of 11.25 W/core.
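As a worked check of the per-core figure, assuming the manufacturer-listed specifications of 90 W TDP and 8 cores for this processor (these two values are assumptions not stated in this section):

$$\frac{90\ \mathrm{W}}{8\ \mathrm{cores}} = 11.25\ \mathrm{W/core}$$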

FINRISK processing

The 16S rRNA data were demultiplexed, quality filtered, and denoised with deblur (Amir et al. 2017). Greengenes 13.8 (McDonald et al. 2012b), at a clustering level of 99%, was used as the reference phylogeny for open-reference feature picking with SEPP (Mirarab et al. 2012). ASVs with a total frequency of less than 10 were discarded, and the table was then rarefied to a sampling depth of 1000 reads/sample. The resulting table and insertion tree were used for calculation of Faith's PD. The shotgun metagenomic data were trimmed and quality-filtered using Atropos (Didion et al. 2017). They were aligned to the WoL database using the SHOGUN pipeline (v1.0.8) with the Bowtie 2 alignment option. A table was generated from the alignments using the OGU workflow (Zhu et al. 2021). OGUs with a total frequency of less than 10 were discarded, and the table was then rarefied to a sampling depth of 1000 reads/sample. The WoL phylogenomic tree (Zhu et al. 2019, 2021) was used for Faith's PD. Both tables were filtered to include only samples from individuals 35 and younger (younger criterion) or 60 and older (older criterion).
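The frequency filter and rarefaction steps can be sketched with NumPy as below. This is an illustration rather than the pipeline used in the study: the function name `filter_and_rarefy`, the toy count matrix, and the choice to drop samples with fewer reads than the sampling depth are assumptions.

```python
import numpy as np

def filter_and_rarefy(counts, min_total=10, depth=1000, seed=0):
    """Drop rare features, then rarefy each remaining sample to `depth` reads.

    counts: (samples x features) matrix of non-negative integer read counts.
    Returns the rarefied matrix and a boolean mask of the samples that were kept.
    """
    rng = np.random.default_rng(seed)
    # Discard features whose total frequency across all samples is below min_total.
    counts = counts[:, counts.sum(axis=0) >= min_total]
    # Assumption: samples with fewer than `depth` reads are dropped.
    keep = counts.sum(axis=1) >= depth
    counts = counts[keep]
    rarefied = np.zeros_like(counts)
    for s, row in enumerate(counts):
        # Expand counts into one entry per read, then draw without replacement.
        reads = np.repeat(np.arange(row.size), row)
        chosen = rng.choice(reads, size=depth, replace=False)
        rarefied[s] = np.bincount(chosen, minlength=row.size)
    return rarefied, keep

# Toy data: 6 samples x 100 features; the first sample is made too shallow to keep.
toy = np.random.default_rng(1).integers(0, 50, size=(6, 100))
toy[0] = 0
rarefied, keep = filter_and_rarefy(toy)
```

Sampling without replacement means a rarefied count can never exceed the original count for that feature, and every retained sample ends with exactly `depth` reads.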

Power estimation for mean difference in alpha diversity

For a given N (shown on the horizontal axis in Fig. 3A,B), the FINRISK processed samples matching the younger/older criteria were randomly subsampled to N samples. On the subsampled data, the difference in mean alpha diversity between younger and older adults, $\delta$, was computed. A null distribution, $\Delta_0$, was generated by 1000 repetitions of shuffling the age category associated with each alpha diversity value and recomputing the difference in mean alpha diversity between the groups. The P-value was computed by finding the percentile of $\delta$ in $\Delta_0$. This test procedure was repeated 1000 times. The power for N is estimated as the proportion of tests found significant at α = 0.05.
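The procedure above can be sketched as a permutation test nested inside a subsampling loop. This is a minimal illustration, not the Supplemental Code: the function names, the two-sided P-value with a +1 correction, and the synthetic alpha diversity values are assumptions, and the repetition counts are reduced from 1000 so the example runs quickly.

```python
import numpy as np

def permutation_pvalue(alpha_div, is_older, n_perm, rng):
    """Empirical P-value for the difference in group means under label shuffling."""
    observed = alpha_div[is_older].mean() - alpha_div[~is_older].mean()
    null = np.empty(n_perm)
    for p in range(n_perm):
        shuffled = rng.permutation(is_older)  # shuffle the age categories
        null[p] = alpha_div[shuffled].mean() - alpha_div[~shuffled].mean()
    # Assumption: two-sided test, with +1 so the P-value is never exactly zero.
    return (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)

def estimate_power(alpha_div, is_older, n, n_tests=1000, n_perm=1000,
                   alpha=0.05, seed=0):
    """Proportion of random size-n subsamples in which the test is significant."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_tests):
        idx = rng.choice(alpha_div.size, size=n, replace=False)
        p = permutation_pvalue(alpha_div[idx], is_older[idx], n_perm, rng)
        hits += p < alpha
    return hits / n_tests

# Synthetic example: 200 samples, older group shifted upward by two standard deviations.
data_rng = np.random.default_rng(42)
is_older = np.arange(200) % 2 == 0
shifted = data_rng.normal(10.0, 1.0, 200) + 2.0 * is_older
unshifted = data_rng.normal(10.0, 1.0, 200)

power_effect = estimate_power(shifted, is_older, n=40, n_tests=50, n_perm=200)
power_null = estimate_power(unshifted, is_older, n=40, n_tests=50, n_perm=200)
```

With a real effect the estimated power should approach 1, whereas with no group difference it should stay near the significance level.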

Older-younger log likelihood ratio calculation

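The WoL tree (Zhu et al. 2019) was pruned and filtered to only include the OGUs (Zhu et al. 2021) belonging to the FINRISK samples with age ≤35 and ≥60. For each node t in the tree, the log likelihood ratio of older to younger adult samples, agelog(t), was computed from the samples associated with Descendants(t), where Descendants(t) is the set of descendants of t in the tree, and the samples associated with a set of nodes are the samples that contain any features in that set.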
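One way to compute such a per-node log ratio is sketched below. It assumes that the likelihood for each age group is estimated as the fraction of that group's samples containing at least one feature below the node. This estimator, the function name `age_log_ratio`, and the toy table are assumptions for illustration and may differ from the expression used in the study.

```python
import numpy as np

def age_log_ratio(table, is_older, descendant_tips):
    """Log ratio of older to younger prevalence for one node.

    table:           (samples x tips) presence/absence matrix.
    is_older:        boolean vector, True for samples from older adults.
    descendant_tips: column indices of the tips that descend from the node.
    """
    # A sample counts toward the node if it contains any descendant feature.
    has_clade = table[:, descendant_tips].any(axis=1)
    p_older = has_clade[is_older].mean()
    p_younger = has_clade[~is_older].mean()
    # Note: a prevalence of zero in either group gives an infinite or undefined ratio.
    return np.log(p_older / p_younger)

# Toy table: 8 samples (4 older, 4 younger) and 3 tips; the node covers tips 0 and 1.
table = np.array([
    [1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1],   # older samples
    [1, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 0],   # younger samples
], dtype=bool)
is_older = np.array([True] * 4 + [False] * 4)

ratio = age_log_ratio(table, is_older, [0, 1])
```

Here three of four older samples and one of four younger samples contain the clade, so the ratio is positive, which would mark the node as enriched in older adults.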

Phylogenetic visualization

The tree was visualized using EMPress (Cantrell et al. 2021). A node in the tree was considered old if its agelog > 0 and young if its agelog < 0.

Software availability

The data used for benchmarking Faith's PD timing and memory usage are available as per the Striped UniFrac paper (McDonald et al. 2018b). The code for the benchmarking is available on GitHub (https://github.com/biocore/faiths-pd-benchmarking). The data and code needed for benchmarking the FINRISK metagenomics data are also available on GitHub. The SFPhD code is available in the unifrac Python package (https://github.com/biocore/unifrac). All of the software is also available in the Supplemental Code.

41 in total

1. Pyrosequencing-based assessment of soil pH as a predictor of soil bacterial community structure at the continental scale.

Authors: Christian L Lauber; Micah Hamady; Rob Knight; Noah Fierer
Journal: Appl Environ Microbiol Date: 2009-06-05 Impact factor: 4.792

2. Daily Sampling Reveals Personalized Diet-Microbiome Associations in Humans.

Authors: Abigail J Johnson; Pajau Vangay; Gabriel A Al-Ghalith; Benjamin M Hillmann; Tonya L Ward; Robin R Shields-Cutler; Austin D Kim; Anna Konstantinovna Shmagel; Arzang N Syed; Jens Walter; Ravi Menon; Katie Koecher; Dan Knights
Journal: Cell Host Microbe Date: 2019-06-12 Impact factor: 21.023

3. Phylogenetically informed spatial planning is required to conserve the mammalian tree of life.

Authors: Dan F Rosauer; Laura J Pollock; Simon Linke; Walter Jetz
Journal: Proc Biol Sci Date: 2017-10-25 Impact factor: 5.349

4. Striped UniFrac: enabling microbiome analysis at unprecedented scale.

Authors: Daniel McDonald; Yoshiki Vázquez-Baeza; David Koslicki; Jason McClelland; Nicolai Reeve; Zhenjiang Xu; Antonio Gonzalez; Rob Knight
Journal: Nat Methods Date: 2018-11 Impact factor: 28.547

5. A complete domain-to-species taxonomy for Bacteria and Archaea.

Authors: Donovan H Parks; Maria Chuvochina; Pierre-Alain Chaumeil; Christian Rinke; Aaron J Mussig; Philip Hugenholtz
Journal: Nat Biotechnol Date: 2020-04-27 Impact factor: 54.908

6. A communal catalogue reveals Earth's multiscale microbial diversity.

Authors: Luke R Thompson; Jon G Sanders; Daniel McDonald; Amnon Amir; Joshua Ladau; Kenneth J Locey; Robert J Prill; Anupriya Tripathi; Sean M Gibbons; Gail Ackermann; Jose A Navas-Molina; Stefan Janssen; Evguenia Kopylova; Yoshiki Vázquez-Baeza; Antonio González; James T Morton; Siavash Mirarab; Zhenjiang Zech Xu; Lingjing Jiang; Mohamed F Haroon; Jad Kanbar; Qiyun Zhu; Se Jin Song; Tomasz Kosciolek; Nicholas A Bokulich; Joshua Lefler; Colin J Brislawn; Gregory Humphrey; Sarah M Owens; Jarrad Hampton-Marcell; Donna Berg-Lyons; Valerie McKenzie; Noah Fierer; Jed A Fuhrman; Aaron Clauset; Rick L Stevens; Ashley Shade; Katherine S Pollard; Kelly D Goodwin; Janet K Jansson; Jack A Gilbert; Rob Knight
Journal: Nature Date: 2017-11-01 Impact factor: 49.962

7. Analysis and correction of compositional bias in sparse sequencing count data.

Authors: M Senthil Kumar; Eric V Slud; Kwame Okrah; Stephanie C Hicks; Sridhar Hannenhalli; Héctor Corrada Bravo
Journal: BMC Genomics Date: 2018-11-06 Impact factor: 3.969

8. EMPress Enables Tree-Guided, Interactive, and Exploratory Analyses of Multi-omic Data Sets.

Authors: Kalen Cantrell; Marcus W Fedarko; Gibraan Rahman; Daniel McDonald; Yimeng Yang; Thant Zaw; Antonio Gonzalez; Stefan Janssen; Mehrbod Estaki; Niina Haiminen; Kristen L Beck; Qiyun Zhu; Erfan Sayyari; James T Morton; George Armstrong; Anupriya Tripathi; Julia M Gauglitz; Clarisse Marotz; Nathaniel L Matteson; Cameron Martino; Jon G Sanders; Anna Paola Carrieri; Se Jin Song; Austin D Swafford; Pieter C Dorrestein; Kristian G Andersen; Laxmi Parida; Ho-Cheol Kim; Yoshiki Vázquez-Baeza; Rob Knight
Journal: mSystems Date: 2021-03-16 Impact factor: 6.496

9. Reversion of Gut Microbiota during the Recovery Phase in Patients with Asymptomatic or Mild COVID-19: Longitudinal Study.

Authors: Han-Na Kim; Eun-Jeong Joo; Chil-Woo Lee; Kwang-Sung Ahn; Hyung-Lae Kim; Dong-Il Park; Soo-Kyung Park
Journal: Microorganisms Date: 2021-06-07

10. Temperature mediates continental-scale diversity of microbes in forest soils.

Authors: Jizhong Zhou; Ye Deng; Lina Shen; Chongqing Wen; Qingyun Yan; Daliang Ning; Yujia Qin; Kai Xue; Liyou Wu; Zhili He; James W Voordeckers; Joy D Van Nostrand; Vanessa Buzzard; Sean T Michaletz; Brian J Enquist; Michael D Weiser; Michael Kaspari; Robert Waide; Yunfeng Yang; James H Brown
Journal: Nat Commun Date: 2016-07-05 Impact factor: 14.919
