
Towards multi-label classification: Next step of machine learning for microbiome research.

Shunyao Wu¹, Yuzhu Chen¹, Zhiruo Li², Jian Li¹, Fengyang Zhao¹, Xiaoquan Su¹.

Abstract

Machine learning (ML) has been widely used in microbiome research for biomarker selection and disease prediction. By training on microbial profiles of samples from patients and healthy controls, ML classifiers construct data models from community features that are highly correlated with the target diseases, so as to determine the status of new samples. To clearly understand the host-microbe interaction of specific diseases, previous studies mostly focused on well-designed cohorts, in which each sample was labeled with exactly one status type. In practice, however, an individual may be associated with multiple diseases simultaneously, which introduces additional variation in microbial patterns that interferes with status detection. More importantly, comorbidities or complications can be missed by regular ML models, limiting the practical application of microbiome techniques. In this review, we summarize the typical ML approaches of single-label classification for microbiome research, and demonstrate their limitations in multi-label disease detection using a real dataset. We then look ahead to a further step of ML towards multi-label classification that can potentially solve this problem, covering a series of promising strategies and key technical issues for applying multi-label classification in microbiome-based studies.


Keywords: Machine learning; Microbiome; Multi-label classification; Single-label classification

Year: 2021 PMID: 34093989 PMCID: PMC8131981 DOI: 10.1016/j.csbj.2021.04.054

Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271

Introduction

Microbiome analysis characterizes the dynamics of complex microbial communities, and thus provides opportunities to investigate the associations between microbial profiles and human diseases [1], [2], [3]. In recent years, the volume of publicly available microbiome data has grown rapidly thanks to high-throughput sequencing. Usually, microbiome features can be surveyed at a shallow taxonomic level by clustering amplicon-based DNA reads into OTUs (Operational Taxonomic Units) [4], [5], or at species/strain level, together with metabolic functions, by decoding shotgun metagenomic sequences [6], [7]. Such features (e.g. species, OTUs, functions) are quantified either by sequence count or by relative abundance, i.e. the count normalized to sequence proportion. Machine learning (ML) algorithms then uncover the distinct patterns of microbiome features under different statuses, and thus promote microbiome-based disease detection and treatment [8], [9], [10]. As an important ML technique, supervised classification has been widely used to predict inflammatory bowel disease (IBD) [11], [12], cancer [13], [14], diabetes [15], gingivitis [16], [17] and other diseases from human microbiome profiles [18], [19]. By constructing models from the taxonomical or functional profiles of patients and healthy controls as training data, ML classifiers determine the status of new samples. In addition, some ML approaches such as support vector machines (SVM) [20] and random forest (RF) [21] can measure the importance of each feature during model training, which helps identify the microbial biomarkers that contribute most to the classification [2], [22], [23]. To clearly understand the interaction between microbes and health status, previous research cohorts were mostly well designed, so that each sample has exactly one label describing its health status, e.g. a sample is either confidently healthy or associated with one definite disease (Fig. 1a).
Nevertheless, this strategy shows its limitations in practical and clinical applications, since a patient may have more than one label (multiple diseases, also denoted as complications or comorbidities; Fig. 1b). For example, in the American Gut Project [24] cohort, 8297 of 13,545 patients were marked with at least 2 diseases. In this case, regular classifiers do not work well, because the prediction can be significantly interfered with by the co-effects and interactions of multiple diseases. More importantly, since only a single label (e.g. a specific disease) is presented in the prediction result, comorbidities or complications are missed or omitted by single-label ML models [2], [22].

Fig. 1

Comparison of single-label classification and multi-label classification. a. Single-label classification requires that a sample has exactly one label (status). b. Multi-label classification can detect more than one status for each sample.

In this work, we summarize the typical and classical machine learning methods for microbiome research, and demonstrate their limitations in disease recognition using the American Gut Project dataset. We then look ahead to a further step that addresses these problems through a series of promising multi-label classification strategies [25], [26], [27], [28]. Finally, we raise and discuss some key technical issues in applying multi-label classification to microbiome-based disease detection.

Single-label classification in microbiome studies

Microbiome-based disease detection can be considered as a classification problem on microbial profiles, which are parsed from DNA sequences by bioinformatics tools such as UPARSE [5], QIIME/QIIME2 [29], [30], Parallel-META 3 [31], MetaPhlAn2 [6], HUMAnN2 [7] and Kraken [32], according to the sequencing method and type [3]. Given microbiome profiles $X = \{x_1, x_2, \ldots, x_n\}$ for n samples ($x_i$ is the microbial profile of a sample, represented by the normalized abundance of features such as species, OTUs or functions) and their corresponding status meta-data (labels) $Y = \{y_1, y_2, \ldots, y_n\}$, an ML classifier solves a function $f: X \rightarrow Y$ that maps the profiles to their meta-data, and thus predicts the status of a new subject from its profile. Usually, the classifier requires that each subject has a single label (status), which is known as single-label classification. Here a label y in meta-data Y is a discrete variable with $y \in \{c_1, c_2, \ldots, c_k\}$, where each c is a status (e.g. a specific disease). Specifically, when $k = 2$ (e.g., $c_1$ is healthy and $c_2$ is IBD), the model works as a binary classifier that only differentiates IBD samples from healthy ones; when $k > 2$, it becomes a multi-category classifier that can determine the disease type of a new sample from multiple disease categories (Fig. 1a). Here we review the commonly used single-label classification approaches, including logistic regression, support vector machine, k-nearest neighbors, random forest, gradient boosting decision tree and neural networks (Table 1).
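To make the single-label setting concrete, the following is a minimal sketch of fitting a classifier that maps profiles to one status per sample, first for two statuses and then for three. The profiles are simulated from a Dirichlet distribution purely for illustration; the feature count, the three statuses and the helper names (`simulate_profiles`, `evaluate`) are assumptions and do not come from the reviewed studies.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_features = 20  # hypothetical number of taxa per profile


def simulate_profiles(n_samples, enriched_taxon):
    """Simulate relative-abundance profiles (rows sum to 1) with one enriched taxon."""
    alpha = np.ones(n_features)
    alpha[enriched_taxon] = 8.0
    return rng.dirichlet(alpha, size=n_samples)


def evaluate(X, y):
    """Fit a single-label classifier on 80% of the samples and score the remaining 20%."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return model, model.score(X_test, y_test)


# Three hypothetical statuses c_1, c_2, c_3, each sample carrying exactly one of them.
X = np.vstack([simulate_profiles(100, taxon) for taxon in (0, 1, 2)])
y = np.repeat([0, 1, 2], 100)

# k = 2: binary classifier that separates status 0 from status 1.
two_status = y < 2
binary_model, binary_accuracy = evaluate(X[two_status], y[two_status])

# k > 2: multi-category classifier over all three statuses.
multi_model, multi_accuracy = evaluate(X, y)

print(f"binary accuracy: {binary_accuracy:.2f}")
print(f"multi-category accuracy: {multi_accuracy:.2f}")
```

In both cases the prediction for a new profile is a single status, which is exactly the restriction that the later sections relax.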

Table 1

Characteristics of machine learning methods widely used for microbiome-based disease detection.

ML approach	Feature importance measurement	Interpretability	Package (applicable programming language)
LR	Y	Excellent	Scikit-learn (Python) [33]
SVM	Y	Good	Scikit-learn (Python); LibSVM (Python/R/Java) [34]
k-NN	N	Weak	Scikit-learn (Python)
RF	Y	Good	Scikit-learn (Python); randomForest (R) [35]
GBDT	Y	Good	XGBoost (Python/R/C++) [36], [37]; LightGBM (Python/R/C++) [38]; CatBoost (Python/R/C++) [39]
Neural Networks	N	Weak	TensorFlow (Python/Java) [40]; PyTorch (Python) [41]; Keras (Python) [42]

Logistic regression (LR) is a typical linear model for binary classification that uses a logistic function to model a binary dependent variable [43]. Basically, it calculates the probability that a specified event occurs, e.g. that a microbiome sample is healthy or diseased. Thanks to its efficiency and interpretability, it is commonly used as a benchmark in microbiome-based disease detection [9], [44], although its performance is often not as good as that of other methods. Different from LR, the support vector machine (SVM) can capture non-linear associations between microbiome profiles and host status while maximizing the margin between healthy and disease samples [20], and typically achieves better performance than LR. Notably, although LR and SVM are binary classifiers, they can be extended to multi-category classification by assigning a separate classifier to each disease. Another method is k-nearest neighbors (k-NN), which directly labels a new sample according to its k nearest neighbors [45]. One crucial problem of k-NN is how to appropriately measure the neighborship among microbiomes [46], either by geometry-based distance metrics such as Bray-Curtis, JSD and Jaccard [47], or by phylogeny-based algorithms such as UniFrac [48] and Meta-Storms [49]. Recently, a search-based strategy employed the microbiome search engine (MSE) [50] to separate unhealthy microbiomes from healthy ones by an outlier novelty score, and then recognized their detailed disease type via a phylogeny-distance based k-NN, which outperformed traditional ML implementations in sensitivity, robustness and speed [51]. To further improve the performance of microbiome disease detection, ensemble classification approaches have been developed by integrating individual ML methods [52], [53].
As an ensemble classifier, random forest (RF) constructs a multitude of decision trees through random selection of samples and features in the training data, and then combines the predicted statuses of new samples by voting [2], [8], [9], [21]. Different from RF, the gradient boosting decision tree (GBDT) assigns a weight to each microbiome sample, builds the tree-like model in a stage-wise fashion [54], [55] and then updates parameters iteratively to minimize estimation errors [56]. Both RF and GBDT are not only superior to individual ML methods in precision, but can also elucidate the contribution of each microbial feature to the classification [22], [23]. In traditional ML, feature extraction from the input data is fundamental for accuracy and sensitivity, e.g. selecting biomarker species that act as signatures during the development of a disease, but this process usually requires manual effort [57]. Deep learning performs feature extraction automatically and trains deep neural networks in an end-to-end way [58], which can alleviate the high dimensionality introduced by the complexity of microbial communities. Neural networks, such as deep neural networks (DNNs) [59], recurrent neural networks (RNNs) [60] and convolutional neural networks (CNNs) [61], have been successfully transferred from image analysis to microbiome research. In computer vision, CNNs apply convolution operations to neighboring pixels to generate new variables. However, neighborship relations between microbes are not well defined in a community. Therefore, Sharma et al. [62] developed a CNN-based method that incorporates a stratified approach to group OTUs into phylum clusters. Lo et al. [63] modeled microbiome profiles with a negative binomial distribution and addressed the over-fitting problem in CNNs by a data augmentation technique.
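The feature-importance output of tree ensembles mentioned above can be illustrated with a short sketch. It trains a scikit-learn random forest on simulated profiles in which two hypothetical taxa (`taxon_3` and `taxon_7`) are enriched in the disease group, and then ranks taxa by the forest's impurity-based importance. The data, taxon names and group sizes are assumptions made only for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n_per_group, n_taxa = 150, 30
taxa = [f"taxon_{i}" for i in range(n_taxa)]  # hypothetical feature names

# Healthy profiles: no taxon is favored.
healthy = rng.dirichlet(np.ones(n_taxa), size=n_per_group)

# Disease profiles: two taxa are enriched (an assumption for the illustration).
alpha_disease = np.ones(n_taxa)
alpha_disease[3] = 6.0
alpha_disease[7] = 6.0
disease = rng.dirichlet(alpha_disease, size=n_per_group)

X = np.vstack([healthy, disease])
y = np.array([0] * n_per_group + [1] * n_per_group)

forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

# Rank taxa from most to least important for the classification.
ranking = np.argsort(forest.feature_importances_)[::-1]
top_taxa = [taxa[i] for i in ranking[:5]]
print(top_taxa)
```

Impurity-based importances are a ranking heuristic rather than a causal measure, so in practice the top-ranked taxa would be treated as candidate biomarkers to be checked by other means.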

Limitations of single-label classification on real microbiome dataset

To measure the feasibility of single-label classifiers in handling microbiomes with multiple labels, we performed disease detection on a subset of the American Gut Project [24] cohort (refer to Materials and Methods for details). 16S rRNA amplicon microbiomes were collected from 3433 healthy hosts as controls and from 10,826 patients, among whom five target diseases were examined: irritable bowel syndrome (IBS), autoimmune disease, lung disease, migraine and thyroid disease (Table 2). For each target disease, microbiome samples were divided into two groups: i) the Single Disease group (SD), which contains controls and samples with only this target disease; ii) the Multiple Disease group (MD), which contains controls and samples with this target disease and other comorbidities. Controls in each group were randomly selected from the healthy samples, and their number was set equal to the number of disease samples. We implemented two ensemble single-label classifiers, RF and GBDT, to detect the target disease in each group using OTU-level profiles. Performance was evaluated by AUC (Area Under the receiver operating characteristic Curve) using 5-fold cross-validation (refer to Materials and Methods for detailed configurations and parameters).

Table 2

Brief summary of samples labeled with target diseases.

Target disease	Total number of disease samples	Number of single-disease samples	Number of comorbidities samples
IBS	2351	1064	1287
Autoimmune	2301	487	1814
Lung disease	2251	1248	1003
Migraine	2109	938	1171
Thyroid	1814	559	1255

As shown in Table 3, when detecting the target disease in the single-disease group, classifiers trained on SD outperformed those trained on MD, mainly because the additional variation in microbiota patterns caused by comorbidities was eliminated. On the other hand, classifiers trained on MD were superior to those trained on SD for multi-disease samples. We then further dissected the microbial biomarkers and ML models of SD and MD that led to these results. A distribution-free test [64], [65] on autoimmune samples showed that biomarkers selected from SD were shared with MD (Fig. 2; taxonomy was annotated at genus level; refer to Materials and Methods for details). However, the decision tree constructed by the GBDT binary classifier from SD was quite different from that from MD (Fig. 3; e.g. the structure and the interactions between nodes in the MD tree were more complicated), implying that microbial interactions vary between single disease and multiple diseases. Therefore, the influence of comorbidities on the microbiota should be considered in ML model design and construction in practical cases. Notably, although the precision of target disease detection can be optimized, neither of the single-label ML classifiers is able to detect comorbidities or complications beyond the target disease.

Table 3

Results of single-label classifiers on target diseases detection.

a. Performance (AUC) on IBS
Classifier	Test SD, train SD	Test SD, train MD	Test MD, train SD	Test MD, train MD
RF	0.681 ± 0.039	0.661 ± 0.032	0.718 ± 0.025	0.757 ± 0.018
GBDT	0.713 ± 0.025	0.689 ± 0.036	0.731 ± 0.022	0.787 ± 0.015

Fig. 2

Microbial biomarkers of autoimmune selected from SD and MD by distribution-free independence test.

Fig. 3

Decision tree of GBDT binary classifier constructed from SD (A) was less complicated than that from MD (B). In each tree internal nodes represent taxa on genus-level, leaf nodes represent labels, and branch weights represent criteria for decision.


Multi-label classification: one step forward of machine learning for microbiome

Different from single-label ML classifiers (Fig. 1a), multi-label classification allows each sample to have more than one status (label; Fig. 1b). It is natural to introduce multi-label classification into microbiome-based disease detection, because a sample (patient) may have multiple labels (comorbidities or complications). Here we introduce two schemes for multi-label classification: algorithm adaptation and problem transformation [27]. Algorithm adaptation processes multi-label data by directly modifying single-label classifiers. For example, ML-kNN (multi-label k-nearest neighbors) combines k-NN with the Bayesian rule to determine the label set of a new sample [66]. Another example adapts the C4.5 decision tree algorithm [67] by letting leaves represent a set of labels and modifying the entropy-like function [68] for multi-label classification. Recently, a new ML-DT (Multi-Label Decision Tree) algorithm has been developed based on the non-parametric predictive inference model on multinomial data, which achieves a robust performance using precise probabilities [69].

Problem transformation, as the name suggests, transforms the multi-label problem into single-label ones by binary relevance, calibrated label ranking or classifier chains. Binary relevance is based on a one-against-all strategy that converts m (m > 1) labels into m separate binary classification problems, and determines each label with its own binary classifier. Although it provides a simple and efficient solution, binary relevance ignores the possible correlations between labels and can therefore lead to erroneous results [70]. To tackle this disadvantage, calibrated label ranking transforms m-label classification into a label ranking problem [71] by considering the relevance between pairs of labels, and constructs m(m − 1)/2 binary classifiers. Hence, each label is voted on by m − 1 binary classifiers. Additionally, the voting probabilities of the m − 1 binary classifiers can be used as features to train a new binary classifier that further improves the performance. Furthermore, one label may depend on other labels, e.g. the diagnosis and treatment of cardiovascular disease have been linked with those of IBD [72]. In this situation, classifier chains [73], which treat dependent labels as features of the binary classifiers, are an ideal option.
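Two of the problem-transformation strategies can be sketched with scikit-learn, which provides `MultiOutputClassifier` (one independent binary classifier per label, i.e. binary relevance) and `ClassifierChain` (each classifier also receives the labels earlier in the chain as features). The simulated profiles, the three hypothetical disease labels and the rule that makes the third label depend on the first two are assumptions chosen only to illustrate label dependence; calibrated label ranking is not implemented here, only its pairwise classifier count is computed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier

rng = np.random.default_rng(2)
n_samples, n_taxa, n_labels = 300, 25, 3

X = rng.dirichlet(np.ones(n_taxa), size=n_samples)

# Hypothetical label rules: labels 0 and 1 follow one taxon each,
# label 2 occurs only when labels 0 and 1 are both present.
Y = np.zeros((n_samples, n_labels), dtype=int)
Y[:, 0] = X[:, 0] > np.median(X[:, 0])
Y[:, 1] = X[:, 1] > np.median(X[:, 1])
Y[:, 2] = Y[:, 0] & Y[:, 1]

train, test = slice(0, 240), slice(240, None)
base = RandomForestClassifier(n_estimators=100, random_state=0)

# Binary relevance: m independent binary classifiers, label correlations ignored.
binary_relevance = MultiOutputClassifier(base).fit(X[train], Y[train])
br_pred = binary_relevance.predict(X[test])

# Classifier chain: label 2 is predicted with labels 0 and 1 as extra features.
chain = ClassifierChain(base, order=[0, 1, 2], random_state=0).fit(X[train], Y[train])
cc_pred = np.asarray(chain.predict(X[test])).astype(int)

# Number of pairwise classifiers that calibrated label ranking would need.
n_pairwise = n_labels * (n_labels - 1) // 2

print("binary relevance label sets:", br_pred[:3].tolist())
print("classifier chain label sets:", cc_pred[:3].tolist())
print("pairwise classifiers for m = 3:", n_pairwise)
```

Each row of `br_pred` and `cc_pred` is a label set, so a sample can be reported with zero, one or several diseases at once.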

Key technical issues of multi-label classification for microbiome-based disease detection

Well-established multi-label classification methods still show shortcomings in processing microbiome datasets, due to the high data complexity, data heterogeneity and microbe-disease interactions. In the past years, hundreds of microbiome-disease interactions have been studied and reported, e.g. the Disbiome database [74] collected 10,934 experimentally verified microbe-disease associations between 372 diseases and 1622 microbes. A general challenge is that such a large number of labels can lead to unexpectedly high computational cost (Fig. 4a). For example, to train a 100-label classification model (a sample has more than one disease out of a total of 100 diseases), the binary relevance approach needs 100 binary classifiers, and calibrated label ranking requires up to 4950 classifiers. Recently, embedding methods such as the SLEEC (Sparse Local Embeddings for Extreme Classification) algorithm [75] have been proposed for the many-label challenge. SLEEC projects labels into lower-dimensional vectors, constructs a regression for each label, and decodes the predicted labels via compression techniques. To fit large-scale datasets, SLEEC uses the unsupervised k-means algorithm to partition the training data into several smaller subsets before the projection step. However, because the label information is omitted, this pre-partition may affect the quality of the subsequent projection. Therefore, embedding methods have been further improved by incorporating feature vectors and label information using a graph embedding algorithm [76] and an adaptive feature agglomeration technique such as DEFRAG (aDaptive Extreme FeatuRe AGglomeration) [77].
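The classifier counts quoted above follow directly from the definitions of the two strategies: one classifier per label for binary relevance, and one per label pair, m(m − 1)/2, for calibrated label ranking as described in this review. A small sketch of the arithmetic, where the function name `classifiers_needed` is an assumption made for the example:

```python
from math import comb


def classifiers_needed(n_labels):
    """Count binary classifiers per strategy for a given number of labels."""
    return {
        "binary_relevance": n_labels,                   # one classifier per label
        "calibrated_label_ranking": comb(n_labels, 2),  # one per label pair: m(m - 1)/2
    }


counts_100 = classifiers_needed(100)
print(counts_100)  # {'binary_relevance': 100, 'calibrated_label_ranking': 4950}

# The same arithmetic for the 372 diseases listed in the Disbiome database.
counts_372 = classifiers_needed(372)
print(counts_372["calibrated_label_ranking"])
```

The pairwise count grows quadratically with the number of labels, which is why embedding-based methods become attractive when hundreds of diseases are considered.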

Fig. 4

Three key technical issues in multi-label classification. a. Too many labels in the training data lead to unexpectedly high computational cost. b. Missing labels reduce the detection sensitivity. c. Ambiguous labels introduce false positive results.

Label missing is another common problem in multi-label classification (Fig. 4b). It is possible that in the American Gut Project cohort, some multi-disease samples were incorrectly grouped into SD because of inadequate clinical examination, leaving a ‘Negative’ or ‘Not provided’ record for some diseases in the meta-data. Label missing may also occur in multi-label classification results, due to the low sensitivity when detecting multiple statuses at the same time. Here we introduce two alternatives, the graph-based method and the low-rank method, to improve the sensitivity. The former estimates the complete labels from a label-specific graph [78] or from label vectors [79]. The latter formulates multi-label learning as a matrix completion problem with side information [80], which can be estimated within an empirical risk minimization framework [81] to avoid label missing.

In real-world scenarios, the disease meta-data may be based on the hosts’ personal experience or other unreliable conclusions, without clinical diagnosis or confirmation from medical professionals. Such ambiguous labels in the training data (Fig. 4c) may introduce false positive results. Partial multi-label approaches can reduce the errors caused by ambiguous or erroneous labels, mainly by maintaining a confidence value for each candidate label [82]. Depending on how the confidence value is calculated, partial multi-label approaches are generally divided into two types: two-stage methods and end-to-end learning methods. A two-stage method estimates the confidence of the candidate labels of each sample by iterative label propagation, and then trains multi-label classifiers using the credible labels with high confidence [83]. This straightforward concept, however, can be error-prone due to insufficient disambiguation. Instead of separating confidence estimation and classifier construction into two stages, the end-to-end method treats the confidence values as weights in the model training functions [82], [84], [85] and enhances label disambiguation by combining the two stages into a unified framework.
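The two-stage idea can be illustrated with a deliberately simplified sketch; it is not the algorithm of any cited work. Stage 1 smooths the candidate labels over a k-nearest-neighbor graph so that a label shared by similar samples gains confidence, and stage 2 trains a binary relevance classifier on the labels whose confidence passes a threshold. The simulated data, the 15% rate of spurious labels, the function `propagate_confidence` and the parameters `k`, `alpha` and the 0.7 threshold are all assumptions made for the illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.neighbors import NearestNeighbors


def propagate_confidence(X, candidates, k=10, alpha=0.5, n_iter=20):
    """Stage 1: iteratively smooth candidate labels over a k-NN graph.

    candidates is an (n_samples, n_labels) 0/1 matrix of possibly noisy labels.
    Labels that are not candidates always keep a confidence of zero.
    """
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    idx = idx[:, 1:]  # drop the first neighbor, which is the sample itself
    confidence = candidates.astype(float)
    for _ in range(n_iter):
        neighbor_mean = confidence[idx].mean(axis=1)
        confidence = candidates * (alpha + (1 - alpha) * neighbor_mean)
    return confidence


rng = np.random.default_rng(3)
n_per_cluster, n_taxa = 100, 20


def simulate_cluster(dominant_taxon):
    """Profiles of one hypothetical disease group, dominated by a single taxon."""
    alpha = np.ones(n_taxa)
    alpha[dominant_taxon] = 20.0
    return rng.dirichlet(alpha, size=n_per_cluster)


X = np.vstack([simulate_cluster(0), simulate_cluster(1)])
true_labels = np.zeros((2 * n_per_cluster, 2), dtype=int)
true_labels[:n_per_cluster, 0] = 1
true_labels[n_per_cluster:, 1] = 1

# Self-reported meta-data: the true labels plus about 15% spurious labels.
spurious = (rng.random(true_labels.shape) < 0.15) & (true_labels == 0)
candidates = true_labels | spurious

confidence = propagate_confidence(X, candidates)
credible = (confidence >= 0.7).astype(int)

false_before = int((candidates & (1 - true_labels)).sum())
false_after = int((credible & (1 - true_labels)).sum())
kept_true = (credible & true_labels).sum() / true_labels.sum()

# Stage 2: train a multi-label classifier on the credible labels only.
classifier = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, credible)

print("spurious labels before/after stage 1:", false_before, false_after)
print("fraction of true labels kept:", round(float(kept_true), 3))
```

Because stage 2 never revisits the confidences, any label wrongly kept or dropped in stage 1 is passed on unchanged, which is the weakness that end-to-end methods aim to address.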

Conclusion and discussion

In this work, we reviewed typical single-label machine learning methods in microbiome research. While such ML approaches help in interpreting the patterns of microbiome-disease linkages and in predicting the status of newly sequenced samples, they show a significant limitation in handling multi-label problems, in which a single microbiome can be associated with several different health conditions. Hence, we look ahead to one step forward of ML in the microbiome field, towards multi-label classification, which provides promising opportunities to tackle this limitation in research and application.

Another concern is that interactions among microbes have not been effectively considered by ML classifiers. Although the biomarker fractions from single disease and comorbidities were similar (Fig. 2), their hierarchies in the GBDT decision trees are highly diverse (Fig. 3), probably driven by different interactions between microbes. Recently, co-occurrence or correlation among microbes has been widely studied in various ecosystems [86], [87], [88], [89], surveying microbe-microbe interactions from a biological perspective. Nevertheless, how to efficiently and effectively integrate such biological information into ML classifiers is still an open problem for further work [22], [90].

Few studies have concentrated on the interpretability of ML models in microbiome research, although it is important for explaining disease prediction results. Among single-label classification methods, logistic regression has the best interpretability and the lowest performance, while NNs are on the opposite side. Although RF and GBDT also output feature importance, the calculation is too rough for further causal interpretation. Advanced statistical methods such as the single index model, which combines flexibility of modeling with interpretability of (linear) coefficients [91], [92], may provide a potential solution for balancing interpretability and performance. Meanwhile, host heterogeneity in age, gender, diet, lifestyle and other factors [93], as well as the sparsity, variance and high dimensionality [94] of microbiome data, can also confound disease detection and interpretation, and should be evaluated and considered in experiment design and ML analysis.

Materials and methods

Experiment design and datasets

The American Gut Project cohort contains 29,344 subjects, including 15,799 healthy controls and 13,545 patients. The disease statuses of each subject were obtained from the original questionnaire-based meta-data, which consists of information on diet, health status and hygiene. 16S rRNA OTU profiles of gut microbiomes (by closed-reference OTU picking) were downloaded from Qiita [95], and taxonomy annotation at genus level was parsed against the Greengenes 13_8 database [96] using Parallel-META 3 [31]. The relative abundance at OTU and genus level was calculated directly from sequence counts, and then normalized by 16S rRNA gene copy number from PICRUSt2 [97]. Subjects without microbiome samples were dropped. A subject was treated as a patient if recorded as ‘Diagnosed by a medical professional (doctor, physician assistant)’ for a specified disease in the meta-data, or as healthy if marked as ‘I do not have this condition’ for all diseases. Finally, we collected data of 3433 healthy samples and 10,826 patients.

For each target disease, microbiome samples were selected and divided into two groups: the Single Disease group (SD) contains controls and samples with only the target disease; the Multiple Disease group (MD) contains controls and samples with the target disease and other comorbidities. Control samples in each group were randomly selected from the 3433 healthy samples, and their number was set equal to the number of disease samples. For each target disease we performed two experiments. First, we assessed the ML classifiers in distinguishing disease samples from healthy controls in the SD group, with classifier models constructed from the SD group and from the MD group. Specifically, 5-fold cross-validation was employed when detecting SD samples by models trained on the SD group (in each fold, 80% of the samples were randomly selected as the training set for model construction and the remaining 20% were the testing set for validation). Meanwhile, in each of the 5 folds we also randomly selected the same number of samples from the MD group to train another model for target disease detection on the identical SD testing set. The AUCs of the SD-trained model and the MD-trained model were recorded for comparison. Second, we assessed the ML classifiers in detecting disease in the MD group, with models again constructed from the SD group and the MD group following the same procedure.
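The first experiment can be sketched as follows, as one reading of the protocol above rather than the authors' code (which is in the repository listed under Code and data availability). Profiles are simulated, the group sizes and the helper names (`simulate_profiles`, `balanced_group`, `fit_forest`) are assumptions, and a 100-tree forest is used only to keep the example fast.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(4)
n_taxa = 30


def simulate_profiles(n_samples, enriched_taxa):
    """Simulated relative-abundance profiles with the given taxa enriched."""
    alpha = np.ones(n_taxa)
    for taxon in enriched_taxa:
        alpha[taxon] = 4.0
    return rng.dirichlet(alpha, size=n_samples)


healthy_pool = simulate_profiles(600, [])
sd_cases = simulate_profiles(150, [0])     # target disease only
md_cases = simulate_profiles(200, [0, 5])  # target disease plus a comorbidity


def balanced_group(cases):
    """Pair the cases with an equal number of randomly selected healthy controls."""
    chosen = rng.choice(len(healthy_pool), size=len(cases), replace=False)
    X = np.vstack([healthy_pool[chosen], cases])
    y = np.r_[np.zeros(len(cases), dtype=int), np.ones(len(cases), dtype=int)]
    return X, y


def fit_forest(X, y):
    return RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)


X_sd, y_sd = balanced_group(sd_cases)
X_md, y_md = balanced_group(md_cases)

auc_sd_trained, auc_md_trained = [], []
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in folds.split(X_sd, y_sd):
    # Model 1: trained on the SD training fold (80% of the SD group).
    sd_model = fit_forest(X_sd[train_idx], y_sd[train_idx])
    # Model 2: trained on the same number of samples drawn from the MD group.
    # Note: this sketch does not de-duplicate controls shared by the two groups.
    md_idx = rng.choice(len(y_md), size=len(train_idx), replace=False)
    md_model = fit_forest(X_md[md_idx], y_md[md_idx])
    # Both models are scored on the identical SD testing fold.
    X_test, y_test = X_sd[test_idx], y_sd[test_idx]
    auc_sd_trained.append(roc_auc_score(y_test, sd_model.predict_proba(X_test)[:, 1]))
    auc_md_trained.append(roc_auc_score(y_test, md_model.predict_proba(X_test)[:, 1]))

print(f"SD-trained AUC: {np.mean(auc_sd_trained):.3f} ± {np.std(auc_sd_trained):.3f}")
print(f"MD-trained AUC: {np.mean(auc_md_trained):.3f} ± {np.std(auc_md_trained):.3f}")
```

The second experiment swaps the roles of the two groups, so that the testing folds come from the MD group.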

Machine learning methods and biomarker selection

Two popular ensemble single-label classification methods, random forest and GBDT, were employed to construct single-label classifiers. Random forest was implemented with the ‘scikit-learn’ package in Python; the ‘number of trees’ was set to 500, while other parameters were kept at their default configuration. GBDT was implemented with the ‘lightgbm’ package in Python, with parameters ‘learning rate’ = 0.02, ‘maximum tree depth’ = 6, ‘number of boosted trees’ = 1000, ‘maximum tree leaves’ = 64, ‘subsample ratio’ = 0.8 and ‘colsample_bytree’ = 0.8. Biomarker analysis was performed by a distribution-free test (‘mvtpy’ package in Python) on genus-level abundance between disease and control samples, and the top 10 taxa by test statistic with p-value < 0.01 were selected as biomarkers.
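The configuration above might be written as follows. The mapping from the quoted parameter descriptions to keyword names of the scikit-learn style API is an assumption of this sketch (only ‘colsample_bytree’ is named explicitly in the text), and the exact settings used by the authors should be taken from their repository. The LightGBM import is guarded so that the snippet also runs where the package is not installed.

```python
from sklearn.ensemble import RandomForestClassifier

# Random forest: 500 trees, all other parameters left at scikit-learn defaults.
rf_params = {"n_estimators": 500}
random_forest = RandomForestClassifier(**rf_params)

# GBDT: assumed keyword names for the parameters described in the text.
gbdt_params = {
    "learning_rate": 0.02,    # 'learning rate'
    "max_depth": 6,           # 'maximum tree depth'
    "n_estimators": 1000,     # 'number of boosted trees'
    "num_leaves": 64,         # 'maximum tree leaves'
    "subsample": 0.8,         # 'subsample ratio'
    "colsample_bytree": 0.8,  # named as such in the text
}

try:
    from lightgbm import LGBMClassifier

    # Row subsampling in LightGBM is applied only when subsample_freq is positive;
    # whether the original analysis set it is not stated in the text.
    gbdt = LGBMClassifier(**gbdt_params)
except ImportError:
    gbdt = None  # lightgbm is not installed in this environment

print(random_forest.get_params()["n_estimators"])
print(gbdt_params)
```

Both objects follow the scikit-learn estimator interface, so they can be passed to the same cross-validation code.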

Code and data availability

All datasets and code in this work are available at https://github.com/BruceQD/Microbiome-based-disease-detection. All other relevant data is available upon request.

Author statement

S.W. and Y.C. contributed to the description and summary of algorithms, and performed the analysis. Z.L., J.L. and F.Z. reviewed and edited the manuscript before submission. X.S. conceived the idea and wrote the manuscript. Author order was determined by mutual agreement.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
