Yifei Zhang, Alexander V Alekseyenko.
Abstract
The diversity of microbiota is best explored by understanding the phylogenetic structure of the microbial communities. Traditionally, sequence alignment has been used for phylogenetic inference. However, alignment-based approaches come with significant challenges and limitations when massive amounts of data are analyzed. In the past decade, alignment-free approaches have enabled genome-scale phylogenetic inference. Here we evaluate three alignment-free methods, ACS, CVTree, and Kr, for phylogenetic inference with 16S rRNA gene data. We use a taxonomic gold standard to compare the accuracy of alignment-free phylogenetic inference with that of common microbiome-wide phylogenetic inference pipelines based on PyNAST and MUSCLE alignments with FastTree and RAxML. We re-simulate fecal communities from Human Microbiome Project data to evaluate the performance of the methods on datasets with properties of real data. Our comparisons show that alignment-free methods are not inferior to alignment-based methods in giving accurate and robust phylogenetic trees. Moreover, consensus ensembles of alignment-free phylogenies are superior to those built from alignment-based methods in their ability to highlight community differences in low-power settings. In addition, the overall running times of alignment-based and alignment-free phylogenetic inference are comparable. Taken together, our empirical results suggest that alignment-free methods provide a viable approach for microbiome-wide phylogenetic inference.
Year: 2017 PMID: 29136663 PMCID: PMC5685621 DOI: 10.1371/journal.pone.0187940
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 2. Tree distances of alignment-free methods and alignment-based methods relative to the gold standard taxonomic trees.
Ten replicate subsets of sequences were drawn from Greengenes, and phylogenies were inferred using alignment-free methods (ACS, CVTree, and Kr) and alignment-based methods (PyNAST or MUSCLE-based alignment, FastTree or RAxML inference). TREEDIST distances were computed between the phylogeny inferred by each method and the taxonomic gold standard. Smaller distances indicate that the inferred phylogeny more closely resembles the taxonomy. Sequences from HMP-derived stool samples have been used to compare all the methods. Distances across the replicates are reported. P = PyNAST, M = MUSCLE, F = FastTree, R = RAxML.
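To make the idea of a tree distance concrete, the sketch below computes the symmetric difference (Robinson-Foulds) distance between two tree topologies, which is one of the distance types that PHYLIP's TREEDIST can report. The caption does not say which TREEDIST distance was used, so the choice here is an assumption. The nested-tuple tree encoding and the function names (`clades`, `splits`, `symmetric_difference`) are hypothetical and chosen only for illustration.

```python
def clades(tree, out):
    """Return the leaf set under `tree`, adding each internal node's leaf set to `out`.

    A tree is a nested tuple of children; a leaf is a string label.
    """
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(clades(child, out) for child in tree))
    out.add(leaves)
    return leaves


def splits(tree):
    """Return the non-trivial bipartitions of the leaves, treating the tree as unrooted."""
    found = set()
    all_leaves = clades(tree, found)
    ref = min(all_leaves)  # reference leaf used to pick a canonical side of each split
    result = set()
    for clade in found:
        other = all_leaves - clade
        # Splits with fewer than two leaves on a side occur in every tree; skip them.
        if len(clade) > 1 and len(other) > 1:
            result.add(other if ref in clade else clade)
    return result


def symmetric_difference(tree_a, tree_b):
    """Count the splits present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


# Toy example: the two trees disagree on whether B or C is the sister of A.
reference = ((("A", "B"), "C"), ("D", "E"))
inferred = ((("A", "C"), "B"), ("D", "E"))
print(symmetric_difference(reference, inferred))   # 2
print(symmetric_difference(reference, reference))  # 0
```

A distance of zero means the two topologies contain the same splits; each split found in only one tree adds one to the count. This sketch ignores branch lengths, which a branch-score distance would take into account.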
Significance and effect size estimates for PERMANOVA testing (10,000 permutations) of the association of the microbiome and experimental variables.
| Method | R² | ω² | P-value |
|---|---|---|---|
| Location | | | |
| FastTree | 0.546 | 0.536 | <0.0001 |
| ACS | 0.362 | 0.351 | <0.0001 |
| Kr | 0.370 | 0.359 | <0.0001 |
| CV | 0.464 | 0.454 | <0.0001 |
| Consensus (ACS, Kr, CV) | 0.297 | 0.286 | <0.0001 |
| FastTree | 0.098 | 0.056 | <0.0001 |
| ACS | 0.127 | 0.084 | <0.0001 |
| Kr | 0.121 | 0.079 | <0.0001 |
| CV | 0.111 | 0.068 | <0.0001 |
| Consensus (ACS, Kr, CV) | 0.138 | 0.095 | <0.0001 |
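The quantities in the table can be related to a pairwise distance matrix with a short sketch. The code below is a minimal illustration and not the authors' pipeline: it assumes the standard one-way PERMANOVA sums of squares, a permutation P-value with the usual plus-one correction, and one commonly used bias-adjusted definition of ω². The function name `permanova`, its parameters, and the toy data are hypothetical.

```python
import random
from itertools import combinations


def permanova(dist, groups, n_perm=999, seed=0):
    """One-way PERMANOVA on a square distance matrix (list of lists).

    Returns the pseudo-F statistic, R^2, omega^2, and a permutation P-value.
    """
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)

    # Total sum of squares from all pairwise squared distances.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n

    def ss_within(assignment):
        total = 0.0
        for label in labels:
            idx = [i for i in range(n) if assignment[i] == label]
            total += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
        return total

    def pseudo_f(ss_w):
        return ((ss_total - ss_w) / (a - 1)) / (ss_w / (n - a))

    ss_w = ss_within(groups)
    ss_a = ss_total - ss_w        # between-group sum of squares
    ms_w = ss_w / (n - a)         # within-group mean square
    f_obs = pseudo_f(ss_w)

    # Permute group labels to build the null distribution of pseudo-F.
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = list(groups)
        rng.shuffle(shuffled)
        if pseudo_f(ss_within(shuffled)) >= f_obs:
            hits += 1

    return {
        "F": f_obs,
        "R2": ss_a / ss_total,
        "omega2": (ss_a - (a - 1) * ms_w) / (ss_total + ms_w),
        "p": (hits + 1) / (n_perm + 1),
    }


# Toy data: two well-separated groups of one-dimensional points.
points = [0, 1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 15]
labels = ["a"] * 6 + ["b"] * 6
dist = [[abs(x - y) for y in points] for x in points]
print(permanova(dist, labels, n_perm=199))
```

Under this definition ω² is always smaller than R² whenever there is within-group variation, which matches the pattern in the table. With 10,000 permutations the smallest attainable P-value under the plus-one correction is about 0.0001, so the reported "<0.0001" entries are at the resolution limit of the test.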