Joseph N Paulson, O Colin Stine, Héctor Corrada Bravo, Mihai Pop.
Abstract
We introduce a methodology to assess differential abundance in sparse high-throughput microbial marker-gene survey data. Our approach, implemented in the metagenomeSeq Bioconductor package, relies on a novel normalization technique and a statistical model that accounts for undersampling, a common feature of large-scale marker-gene studies. Using simulated data and several published microbiota data sets, we show that metagenomeSeq outperforms the tools currently used in this field.
Year: 2013 PMID: 24076764 PMCID: PMC4010126 DOI: 10.1038/nmeth.2658
Source DB: PubMed Journal: Nat Methods ISSN: 1548-7091 Impact factor: 28.547
Figure 1. Clustering analysis is improved substantially by CSS normalization
We plot the first two principal coordinates in a multi-dimensional scaling analysis of mouse stool data normalized by (A) CSS, (B) DESeq size factors, (C) trimmed mean of M-values, and (D) total-sum. Colors indicate clinical phenotype (diet). CSS-normalized data successfully separate samples by diet while controlling within-group variability. (E) Class posterior probability log-ratio for Western diet obtained from linear discriminant analysis (LDA). Each box corresponds to the distribution of leave-one-out posterior probability of assignment to the “Western” cluster for one normalization method (whiskers indicate 1.5 times the inter-quartile range). Samples were best distinguished by phenotype when CSS normalization was used.
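The contrast between total-sum and CSS scaling in the caption above can be illustrated with a minimal Python sketch. It is not the metagenomeSeq implementation: the function names `total_sum_scale` and `css_scale`, the fixed `quantile=0.5`, and the `scale=1000.0` constant are assumptions made for illustration, and the quantile is fixed here rather than chosen from the data. The idea shown is that each sample is divided by the sum of its counts up to a quantile of its nonzero counts, so a few dominant features do not set the scaling factor.

```python
import numpy as np


def total_sum_scale(counts):
    """Divide each sample (column) by its total count."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=0, keepdims=True)


def css_scale(counts, quantile=0.5, scale=1000.0):
    """CSS-style scaling with a fixed quantile (illustrative assumption).

    counts: features x samples matrix of raw counts.
    Each sample is divided by the sum of its counts that are at or below
    the chosen quantile of its nonzero counts, then multiplied by `scale`.
    """
    counts = np.asarray(counts, dtype=float)
    normalized = np.zeros_like(counts)
    for j in range(counts.shape[1]):
        column = counts[:, j]
        nonzero = column[column > 0]
        if nonzero.size == 0:
            continue  # an empty sample stays all zeros
        threshold = np.quantile(nonzero, quantile)
        factor = column[column <= threshold].sum()
        normalized[:, j] = column / factor * scale
    return normalized


# Toy data: sample 0 has one dominant feature (1000 reads); sample 1 is
# otherwise sample 0 sequenced twice as deeply.
toy = np.array([
    [0, 0],
    [1, 2],
    [2, 4],
    [3, 6],
    [1000, 10],
])
tss = total_sum_scale(toy)
css = css_scale(toy)
```

In this toy matrix the dominant feature inflates the total of sample 0, so total-sum scaling gives the low-abundance features very different values in the two samples, while the quantile-based factor leaves them equal.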
Figure 2. Simulation results indicate that metagenomeSeq has greater sensitivity and specificity in a variety of settings
We use area under the receiver operating characteristic curve (AUC) to compare Metastats[13], Xipe[12], the Kruskal-Wallis test as used in Lefse[14], a non-zero-inflated log-normal model[30], edgeR[19] and DESeq[18]. (A) AUC as dataset sparsity decreases. MetagenomeSeq achieves larger AUC values than any other method in datasets with high sparsity (vertical dashed line represents the least sparse metagenomic dataset). (B) AUC as the effect size between two conditions increases. Both metagenomeSeq and Lefse are better at detecting features with small effect size. (C) AUC as the variability in depth of sequencing increases. MetagenomeSeq and Kruskal-Wallis are robust to high variability in sequencing depth. (D) AUC as average sequencing depth increases. All models (except the non-zero-inflated log-normal model and Xipe) perform similarly well at sufficient depth of coverage.
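As a reminder of what the AUC values in this figure measure, the following is a minimal Python sketch of the rank-based definition: the probability that a randomly chosen truly differential feature receives a higher score than a randomly chosen null feature, with ties counted as one half. The function name `auc_from_scores` and the toy scores are hypothetical; how each method's output is converted to a score (for example, one minus the p-value) is an assumption and is not specified in the caption.

```python
def auc_from_scores(scores, labels):
    """Rank-based AUC.

    scores: one number per feature, larger meaning "more likely differential".
    labels: truthy for truly differential features, falsy for null features.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative feature")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy example: two truly differential features, two null features.
example_auc = auc_from_scores([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a ranking that places every truly differential feature above every null feature.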