M Noguera-Julian, M L Calle, J Rivera-Pinto, J J Egozcue, V Pawlowsky-Glahn, R Paredes.
Abstract
High-throughput sequencing technologies have revolutionized microbiome research by allowing the relative quantification of microbiome composition and function in different environments. In this work we focus on the identification of microbial signatures, groups of microbial taxa that are predictive of a phenotype of interest. We do this by acknowledging the compositional nature of the microbiome and the fact that it carries relative information. Thus, instead of defining a microbial signature as a linear combination in real space corresponding to the abundances of a group of taxa, we consider microbial signatures given by the geometric means of data from two groups of taxa whose relative abundances, or balance, are associated with the response variable of interest. In this work we present selbal, a greedy stepwise algorithm for selection of balances or microbial signatures that preserves the principles of compositional data analysis. We illustrate the algorithm with 16S rRNA abundance data from a Crohn's microbiome study and an HIV microbiome study.

IMPORTANCE We propose a new algorithm for the identification of microbial signatures. These microbial signatures can be used for diagnosis, prognosis, or prediction of therapeutic response based on an individual's specific microbiota.
Keywords: balances; compositional data; microbiome
Year: 2018; PMID: 30035234; PMCID: PMC6050633; DOI: 10.1128/mSystems.00053-18
Source DB: PubMed; Journal: mSystems; ISSN: 2379-5077; Impact factor: 6.496
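The balance described in the abstract, a log ratio between the geometric means of two groups of taxa, can be illustrated with a small sketch. The abstract does not state the formula, so the scaling constant used here follows the usual isometric log-ratio convention of compositional data analysis and should be read as an assumption. The taxon names and counts are hypothetical, and zero counts are assumed to have been replaced beforehand.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values, computed on the log scale."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def balance(sample, numerator, denominator):
    """Balance between two disjoint groups of taxa for a single sample.

    The sqrt(k+ * k- / (k+ + k-)) factor is the conventional ilr scaling
    (an assumption; the abstract does not give the constant).
    """
    k_num, k_den = len(numerator), len(denominator)
    g_num = geometric_mean([sample[t] for t in numerator])
    g_den = geometric_mean([sample[t] for t in denominator])
    scale = math.sqrt(k_num * k_den / (k_num + k_den))
    return scale * math.log(g_num / g_den)


# Hypothetical counts for one sample (zeros assumed already replaced).
sample = {"taxon_a": 40, "taxon_b": 10, "taxon_c": 5, "taxon_d": 20}
numerator = ["taxon_a", "taxon_b"]
denominator = ["taxon_c", "taxon_d"]

score = balance(sample, numerator, denominator)
print(round(score, 3))  # 0.693, i.e. log(20 / 10)

# The same sample expressed as proportions gives the same balance.
total = sum(sample.values())
proportions = {taxon: count / total for taxon, count in sample.items()}
print(math.isclose(score, balance(proportions, numerator, denominator)))
```

Because the score depends only on ratios between taxa, converting counts to proportions leaves it unchanged, which is the sense in which a balance uses only relative information.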
FIG 1 Mean area under the ROC curve (AUC) as a function of the number of components included in the balance in the cross-validation process for Crohn’s disease. The optimal number of components according to the “1se rule” is highlighted with a vertical dashed line.
FIG 2 Description of the global balance for Crohn’s disease. The two groups of taxa that form the global balance are specified at the top of the plot. The box plot represents the distribution of the balance scores for CD and non-CD individuals. The right part of the figure contains the ROC curve with its AUC value (0.838) and the density curve for each group.
FIG 3 Cross-validation (CV) results for the Crohn’s disease study: most frequent taxa and most frequent balances selected in the CV procedure compared to the global balance obtained with the whole data set. Colored rectangles indicate whether the component is in the numerator of the balance (BAL) (red), in the denominator (blue), or not included (white). FREQ, frequency.
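The "1se rule" named in the FIG 1 caption is not defined in this record. A common formulation, assumed here, picks the smallest model whose mean cross-validated AUC is within one standard error of the best mean AUC. The sketch below applies that formulation to hypothetical per-fold AUC values; the function name and the data are illustrative only.

```python
import math
import statistics


def one_se_choice(auc_by_size):
    """Smallest number of components whose mean AUC is within one
    standard error of the best mean AUC (assumed form of the 1se rule)."""
    summary = {}
    for size, aucs in auc_by_size.items():
        mean = statistics.mean(aucs)
        std_err = statistics.stdev(aucs) / math.sqrt(len(aucs))
        summary[size] = (mean, std_err)

    # Locate the best-performing size, then relax by its standard error.
    best_size = max(summary, key=lambda size: summary[size][0])
    best_mean, best_se = summary[best_size]
    threshold = best_mean - best_se

    return min(size for size, (mean, _) in summary.items() if mean >= threshold)


# Hypothetical AUC values per CV fold, keyed by number of components.
auc_by_size = {
    2: [0.70, 0.72, 0.71, 0.69, 0.73],
    4: [0.78, 0.80, 0.79, 0.77, 0.81],
    6: [0.80, 0.82, 0.81, 0.79, 0.83],
    8: [0.81, 0.82, 0.80, 0.79, 0.84],
}

print(one_se_choice(auc_by_size))  # 6
```

In this toy input the best mean AUC occurs at 8 components, but 6 components falls within one standard error of it, so the rule prefers the smaller balance.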
Comparison of model complexity and discrimination accuracy of microbial signatures for Crohn’s disease status
| Method | Median no. of taxa | Mean cv-AUC |
|---|---|---|
| selbal | 12 | 0.8196 |
| DESeq2 | 33 | 0.7752 |
| edgeR | 34 | 0.7721 |
| ANCOM | 5 | 0.7125 |
| ALDEx2 | 31 | 0.8156 |
For each method, the table indicates the median number of taxa included in the model and the mean cv-AUC for 10 iterations of a 5-fold cross-validation process.
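The mean cv-AUC summarized in the table footnote (10 iterations of a 5-fold cross-validation) could be computed along the following lines. This is a sketch under stated assumptions, not the procedure used in the study: the folds are stratified by class, the AUC is computed as a rank statistic, and `fit_direction` is a hypothetical toy model standing in for whichever method is being evaluated. All data values are made up.

```python
import random
import statistics


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as one half."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_cv_auc(x, y, fit, n_repeats=10, n_folds=5, seed=0):
    """Mean test-fold AUC over repeated stratified k-fold cross-validation.

    Assumes each class has at least n_folds members, so every fold
    contains both classes. Stratification is an assumption of this sketch.
    """
    rng = random.Random(seed)
    pos = [i for i, lab in enumerate(y) if lab == 1]
    neg = [i for i, lab in enumerate(y) if lab == 0]
    fold_aucs = []
    for _ in range(n_repeats):
        rng.shuffle(pos)
        rng.shuffle(neg)
        folds = [pos[k::n_folds] + neg[k::n_folds] for k in range(n_folds)]
        for test in folds:
            held_out = set(test)
            train = [i for i in range(len(y)) if i not in held_out]
            # Refit on the training part only, then score the held-out fold.
            score = fit([x[i] for i in train], [y[i] for i in train])
            fold_aucs.append(auc([score(x[i]) for i in test], [y[i] for i in test]))
    return statistics.mean(fold_aucs)


def fit_direction(x_train, y_train):
    """Hypothetical toy model: the score is the feature itself, with its
    sign chosen from the class means in the training data."""
    mean_pos = statistics.mean(v for v, lab in zip(x_train, y_train) if lab == 1)
    mean_neg = statistics.mean(v for v, lab in zip(x_train, y_train) if lab == 0)
    sign = 1.0 if mean_pos >= mean_neg else -1.0
    return lambda value: sign * value


# Made-up one-dimensional scores (for example, balance values) and labels.
x = [0.9, 1.4, 0.3, 1.1, 2.0, 1.7, 0.8, 1.5, 1.2, 1.9,
     -0.2, 0.4, 0.1, 1.0, -0.5, 0.6, 0.2, -0.1, 0.7, 0.5]
y = [1] * 10 + [0] * 10

mean_cv_auc = repeated_cv_auc(x, y, fit_direction)
print(round(mean_cv_auc, 4))
```

Refitting inside each training split keeps the held-out fold out of model selection, which is what makes the averaged AUC a cross-validated estimate rather than a resubstitution one.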