Jennifer Fouquier, Jai Ram Rideout, Evan Bolyen, John Chase, Arron Shiffer, Daniel McDonald, Rob Knight, J Gregory Caporaso, Scott T Kelley.
Abstract
BACKGROUND: Fungi play critical roles in many ecosystems, cause serious diseases in plants and animals, and pose significant threats to human health and to the structural integrity of built environments. While most fungal diversity remains unknown, the development of PCR primers for the internal transcribed spacer (ITS), combined with next-generation sequencing, has substantially improved our ability to profile fungal diversity. Although the high sequence variability of the ITS region facilitates more accurate species identification, it also makes multiple sequence alignment and phylogenetic analysis unreliable across evolutionarily distant fungi, because such divergent sequences are hard to align accurately. To address this issue, we created ghost-tree, a bioinformatics tool that integrates sequence data from two genetic markers into a single phylogenetic tree that can be used for diversity analyses. Our approach starts with a "foundation" phylogeny based on one genetic marker whose sequences can be aligned across organisms spanning divergent taxonomic groups (e.g., fungal families). Then, "extension" phylogenies are built for more closely related organisms (e.g., fungal species or strains) using a second, more rapidly evolving genetic marker. These smaller phylogenies are then grafted onto the foundation tree by mapping taxonomic names, so that each corresponding foundation-tree tip branches into its new "extension tree" child.
Year: 2016 PMID: 26905735 PMCID: PMC4765138 DOI: 10.1186/s40168-016-0153-6
Source DB: PubMed Journal: Microbiome ISSN: 2049-2618 Impact factor: 14.650
Fig. 1 ghost-tree workflow diagram
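The grafting step described in the abstract can be illustrated with a minimal sketch. This is not ghost-tree's own code or API: the `Node` class, the `graft` and `to_newick` functions, and the taxon and OTU labels (`Aspergillus`, `Candida`, `otu1` to `otu4`) are hypothetical names chosen for illustration, and the sketch assumes that foundation tips and extension trees are already keyed by the same taxonomic name.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Minimal rooted tree node (hypothetical; not ghost-tree's data model)."""
    name: Optional[str] = None
    length: float = 0.0
    children: List["Node"] = field(default_factory=list)

    def is_tip(self) -> bool:
        return not self.children


def graft(foundation: Node, extensions: Dict[str, Node]) -> Node:
    """Return a copy of `foundation` where each tip whose name is a key in
    `extensions` branches into that extension tree's children."""
    if foundation.is_tip():
        ext = extensions.get(foundation.name)
        if ext is None:
            # No extension tree for this taxon: keep the tip unchanged.
            return Node(foundation.name, foundation.length)
        # The tip keeps its foundation branch length and becomes an
        # internal node above the extension tree's branches.
        return Node(foundation.name, foundation.length,
                    copy.deepcopy(ext.children))
    return Node(foundation.name, foundation.length,
                [graft(child, extensions) for child in foundation.children])


def to_newick(node: Node) -> str:
    """Serialize a tree as a Newick string without the trailing semicolon."""
    label = f"{node.name or ''}:{node.length:g}"
    if node.is_tip():
        return label
    return "(" + ",".join(to_newick(c) for c in node.children) + ")" + label


# Foundation tree: slowly evolving marker, tips named by taxon (assumed labels).
foundation = Node(children=[Node("Aspergillus", 0.3), Node("Candida", 0.5)])

# Extension trees: rapidly evolving marker, one small tree per taxon.
extensions = {
    "Aspergillus": Node(children=[Node("otu1", 0.01), Node("otu2", 0.02)]),
    "Candida": Node(children=[Node("otu3", 0.03), Node("otu4", 0.01)]),
}

grafted = graft(foundation, extensions)
print(to_newick(grafted) + ";")
```

In this sketch the long branches come from the foundation marker and the short within-taxon branches come from the extension marker, which is the property the abstract relies on. Any rescaling of branch lengths between the two markers is left out.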
Quantitative group comparisons using ANOSIM and PCoA to analyze large effect sizes (between environments) of simulated and unsimulated human saliva and public restroom floor samples
| | Test statistic (R) | p value | % explained |
|---|---|---|---|
| Unsimulated (real) data community analysis | |||
| Jaccard (Fig. 2a) | 0.865 | 0.001 | 31.24 |
| Bray-Curtis (Fig. 2b) | 0.849 | 0.001 | 47.17 |
| Unweighted UniFrac with FastTree (Fig. 2c) | 0.734 | 0.001 | 43.56 |
| Weighted UniFrac with FastTree (Fig. 2d) | 0.263 | 0.001 | 63.52 |
| Unweighted UniFrac with | 0.753 | 0.001 | 50.79 |
| Weighted UniFrac with | 0.463 | 0.001 | 76.21 |
| Unweighted UniFrac with | 0.730 | 0.001 | 50.61 |
| Weighted UniFrac with | 0.458 | 0.001 | 76.91 |
| Unweighted UniFrac with | 0.700 | 0.001 | 66.11 |
| Weighted UniFrac with | 0.453 | 0.001 | 67.97 |
| Simulated data community analysis | |||
| Jaccard to analyze FTSCs (Fig. 3a) | 0.191 | 0.001 | 3.05 |
| Bray-Curtis to analyze FTSCs (Fig. 3b) | 0.191 | 0.001 | 2.71 |
| Jaccard to analyze GTSCs (Fig. 3c) | 0.036 | 0.001 | 1.63 |
| Bray-Curtis to analyze GTSCs (Fig. 3d) | 0.036 | 0.001 | 2.43 |
| Unweighted UniFrac with FastTree to analyze FTSCs (Fig. 3e) | 0.675 | 0.001 | 22.10 |
| Weighted UniFrac with FastTree to analyze FTSCs (Fig. 3f) | 0.255 | 0.001 | 43.69 |
| Unweighted UniFrac with FastTree to analyze GTSCs (Fig. 3g) | 0.298 | 0.001 | 68.87 |
| Weighted UniFrac with FastTree to analyze GTSCs (Fig. 3h) | 0.150 | 0.001 | 54.08 |
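| Unweighted UniFrac with ghost-tree to analyze FTSCs (Fig. 3i) | 0.302 | 0.001 | 27.72 |
| Weighted UniFrac with ghost-tree to analyze FTSCs (Fig. 3j) | 0.117 | 0.001 | 35.55 |
| Unweighted UniFrac with ghost-tree to analyze GTSCs (Fig. 3k) | 0.580 | 0.001 | 20.40 |
| Weighted UniFrac with ghost-tree to analyze GTSCs (Fig. 3l) | 0.307 | 0.001 | 44.98 |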
Note: For unsimulated samples, the sample size is 36, and two groups were analyzed using 999 permutations. For simulated samples, the sample size is 360, and two groups were analyzed using 999 permutations. The test statistic (R), p value, and percent variation explained by the first PCoA axes are presented for each comparison.
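As a rough illustration of how an ANOSIM test statistic and permutation p value of the kind reported above are obtained, the following sketch implements the standard rank-based definition, R = (mean between-group rank - mean within-group rank) / (n(n - 1)/4), in plain Python. It is not the pipeline used in the study: the function names, the toy distance matrix, and the fixed random seed are assumptions for illustration. It uses the common (hits + 1) / (permutations + 1) convention for the p value, under which 0.001 is the smallest value attainable with 999 permutations.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Rank values starting at 1, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def anosim_r(ranks, pairs, groups):
    """ANOSIM R from ranked pairwise distances and group labels."""
    within = [r for r, (a, b) in zip(ranks, pairs) if groups[a] == groups[b]]
    between = [r for r, (a, b) in zip(ranks, pairs) if groups[a] != groups[b]]
    n = len(groups)
    mean_between = sum(between) / len(between)
    mean_within = sum(within) / len(within)
    return (mean_between - mean_within) / (n * (n - 1) / 4)


def anosim(dist, groups, permutations=999, seed=0):
    """Return (R, p) for a square distance matrix and one label per sample."""
    n = len(groups)
    pairs = list(combinations(range(n), 2))
    ranks = average_ranks([dist[a][b] for a, b in pairs])
    observed = anosim_r(ranks, pairs, groups)
    rng = random.Random(seed)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # permute labels, keep distances fixed
        if anosim_r(ranks, pairs, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy data (hypothetical): six samples on a line, two well-separated groups.
positions = [0, 1, 2, 10, 11, 12]
labels = ["saliva"] * 3 + ["restroom"] * 3
dist = [[abs(p - q) for q in positions] for p in positions]

r_stat, p_value = anosim(dist, labels)
print(f"R = {r_stat:.3f}, p = {p_value:.3f}")
```

With only six samples there are few distinct label arrangements, so the p value in this toy case cannot become very small even though the groups are perfectly separated. The sample sizes of 36 and 360 in the table leave far more room for the permutation test.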
Fig. 2 Principal coordinates comparing unsimulated (real) samples based on (a) Jaccard distances, (b) Bray-Curtis distances, (c) unweighted UniFrac distances where trees are computed using FastTree, (d) weighted UniFrac distances where trees are computed using FastTree, (e) unweighted UniFrac distances where trees are computed using ghost-tree, and (f) weighted UniFrac distances where trees are computed using ghost-tree. Blue points are simulated and real human saliva samples, and red points are simulated and real restroom surface samples. Plots were made using EMPeror software [26].
Fig. 3 Principal coordinates comparing simulated samples based on (a) Jaccard distances to analyze FastTree-simulated communities (FTSCs), (b) Bray-Curtis distances to analyze FTSCs, (c) Jaccard distances to analyze ghost-tree-simulated communities (GTSCs), (d) Bray-Curtis distances to analyze GTSCs, (e) unweighted UniFrac distances where trees are computed using FastTree to analyze FTSCs, (f) weighted UniFrac distances where trees are computed using FastTree to analyze FTSCs, (g) unweighted UniFrac distances where trees are computed using FastTree to analyze GTSCs, (h) weighted UniFrac distances where trees are computed using FastTree to analyze GTSCs, (i) unweighted UniFrac distances where trees are computed using ghost-tree to analyze FTSCs, (j) weighted UniFrac distances where trees are computed using ghost-tree to analyze FTSCs, (k) unweighted UniFrac distances where trees are computed using ghost-tree to analyze GTSCs, and (l) weighted UniFrac distances where trees are computed using ghost-tree to analyze GTSCs. Blue points are simulated and real human saliva samples, and red points are simulated and real restroom surface samples. Plots were made using EMPeror software [26].
Quantitative group comparisons using ANOSIM to analyze small effect sizes (within environments) of simulated human saliva and public restroom floor samples
| | Test statistic (R) | p value |
|---|---|---|
| Restroom samples | ||
| Non-phylogenetic methods | ||
| Jaccard to analyze FTSCs | 0.225 | 0.001 |
| Bray-Curtis to analyze FTSCs | 0.239 | 0.001 |
| Jaccard to analyze GTSCs | 0.053 | 0.001 |
| Bray-Curtis to analyze GTSCs | 0.056 | 0.001 |
| FastTree | ||
| Unweighted UniFrac with FastTree to analyze FTSCs | 0.673 | 0.001 |
| Weighted UniFrac with FastTree to analyze FTSCs | 0.798 | 0.001 |
| Unweighted UniFrac with FastTree to analyze GTSCs | 0.038 | 0.057 |
| Weighted UniFrac with FastTree to analyze GTSCs | -0.001 | 0.518 |
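| ghost-tree | ||
| Unweighted UniFrac with ghost-tree to analyze FTSCs | 0.125 | 0.001 |
| Weighted UniFrac with ghost-tree to analyze FTSCs | 0.073 | 0.001 |
| Unweighted UniFrac with ghost-tree to analyze GTSCs | 0.619 | 0.001 |
| Weighted UniFrac with ghost-tree to analyze GTSCs | 0.655 | 0.001 |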
| Saliva samples | ||
| Non-phylogenetic methods | ||
| Jaccard to analyze FTSCs | 0.250 | 0.001 |
| Bray-Curtis to analyze FTSCs | 0.253 | 0.001 |
| Jaccard to analyze GTSCs | 0.032 | 0.001 |
| Bray-Curtis to analyze GTSCs | 0.032 | 0.001 |
| FastTree | ||
| Unweighted UniFrac with FastTree to analyze FTSCs | 0.852 | 0.001 |
| Weighted UniFrac with FastTree to analyze FTSCs | 0.756 | 0.001 |
| Unweighted UniFrac with FastTree to analyze GTSCs | 0.031 | 0.001 |
| Weighted UniFrac with FastTree to analyze GTSCs | 0.023 | 0.001 |
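| ghost-tree | ||
| Unweighted UniFrac with ghost-tree to analyze FTSCs | 0.125 | 0.001 |
| Weighted UniFrac with ghost-tree to analyze FTSCs | 0.068 | 0.001 |
| Unweighted UniFrac with ghost-tree to analyze GTSCs | 0.524 | 0.001 |
| Weighted UniFrac with ghost-tree to analyze GTSCs | 0.596 | 0.001 |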
Note: For restroom sample diversity metrics, sample size is 160, and 16 groups were analyzed using 999 permutations. For saliva sample diversity metrics, sample size is 200, and 20 groups were analyzed using 999 permutations