| Literature DB >> 32868914 |
Cameron Martino1,2,3, Liat Shenhav4, Clarisse A Marotz3, George Armstrong2,3, Daniel McDonald3, Yoshiki Vázquez-Baeza1,5, James T Morton6, Lingjing Jiang7, Maria Gloria Dominguez-Bello8,9, Austin D Swafford1, Eran Halperin4,10,11,12,13, Rob Knight14,15,16,17.
Abstract
The translational power of human microbiome studies is limited by high interindividual variation. We describe a dimensionality reduction tool, compositional tensor factorization (CTF), that incorporates information from the same host across multiple samples to reveal patterns driving differences in microbial composition across phenotypes. CTF identifies robust patterns in sparse compositional datasets, allowing for the detection of microbial changes associated with specific phenotypes that are reproducible across datasets.Entities:
Year: 2020 PMID: 32868914 PMCID: PMC7878194 DOI: 10.1038/s41587-020-0660-7
Source DB: PubMed Journal: Nat Biotechnol ISSN: 1087-0156 Impact factor: 54.908
Figure 1.Overview of the CTF algorithm.
(a) CTF utilizes feature abundance matrices for subjects over time. For each subject with a phenotype of interest, the data is represented as relative abundances of features (abundance gradient represented in grayscale) over time. (b) The matrices are concatenated, robust-centered log-ratio transformed (R-CLR) and structured into a tensor format with modes corresponding to subjects, features and time. (c) The resulting tensor is then factored based only on observed data into loading vectors for each dimension (i.e. subject, timepoint, and feature). (d) Simulated count data is plotted on the y-axis for three taxa with the mean counts in bold and missing values absent from the bold line. Standard deviation of distributions are shaded behind. Two phenotypes are compared; a control unchanging in time (left) and a dynamic phenotype with a perturbation at time point 2 (right). Taxon 1 (blue) is highly abundant and noisy, taxon 2 (red) is lowly abundant but growing exponentially in phenotype 2, and taxon 3 (orange) is oscillatory with increasing amplitude in phenotype 2. The first two principal component axes (i.e. loadings) from CTF (PC1 (top) and PC2 (bottom)) are plotted on the y-axis with the corresponding sample (e), time (f), and feature loadings (g). In PC1, phenotype 2 is linked to the unstable oscillatory waveform of highly loaded taxon 3 (orange, top). Similarly, in PC2, phenotype 2 is linked to the sigmoidal waveform of highly loaded taxon 2 (red, bottom).
Figure 2.CTF outperforms popular distance metrics in longitudinal in silico data-driven simulations.
Increasing sequencing depth (500 – 10,000; rows) over differing temporal sampling densities (x-axis) evaluated for PERMANOVA F-statistic as a measure of discriminatory power (left column), in addition to KNN-classification cross-validation by AUC (n=100; middle column), and APR (n=100; right column). Compared among CTF (green) and popular distance metrics Aitchison (blue), Bray-Curtis (orange), Jaccard (grey), unweighted (purple), and weighted (red) UniFrac. Error bars represent standard error of the mean.