Hyeonsoo Jeong, Bushra Arif, Gustavo Caetano-Anollés, Kyung Mo Kim, Arshan Nasir.
Abstract
Horizontal gene transfer (HGT) is widespread in the evolution of prokaryotes, especially those associated with the human body. Here, we implemented large-scale gene-species phylogenetic tree reconstructions and reconciliations to identify putative HGT-derived genes in the reference genomes of microbiota isolated from six major human body sites by the NIH Human Microbiome Project. Comparisons with a control group representing microbial genomes from diverse natural environments indicated that HGT activity increased significantly in the genomes of human microbiota, which is confirmatory of previous findings. Roughly, more than half of total genes in the genomes of human-associated microbiota were transferred (donated or received) by HGT. Up to 60% of the detected HGTs occurred either prior to the colonization of the human body or involved bacteria residing in different body sites. The latter could suggest 'genetic crosstalk' and movement of bacterial genes within the human body via hitherto poorly understood mechanisms. We also observed that HGT activity increased significantly among closely-related microorganisms and especially when they were united by physical proximity, suggesting that the 'phylogenetic effect' can significantly boost HGT activity. Finally, we identified several core and widespread genes least influenced by HGT that could become useful markers for building robust 'trees of life' and address several outstanding technical challenges to improve the phylogeny-based genome-wide HGT detection method for future applications.
Year: 2019 PMID: 30976019 PMCID: PMC6459891 DOI: 10.1038/s41598-019-42227-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. HGT detection workflow. From a large pool of available completely sequenced genomes, non-redundant genomes are filtered and selected for downstream analysis. Putative orthologous gene sets and corresponding reference species trees are then reconstructed based on different criteria (e.g. NJ, ML, and other approaches[16]). Gene sets are called ‘putative’ orthologs because they are subjected to downstream tests for HGT participation. Each gene-species tree pair is evaluated for topological incongruence (see the dark shaded area in trees). Tree conflicts can arise from any of the following gene family evolution events: (i) duplication, (ii) HGT, and (iii) gene loss, commonly known as the duplication-transfer-loss (DTL) problem[20]. From the most parsimonious reconciliation (in terms of the total cost of gene family evolution events)[20], conflicts arising from transfer are retained for further analysis.
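The parsimony step in this workflow can be illustrated with a minimal sketch. It only scores candidate scenarios that have already been enumerated, whereas real reconciliation software searches over mappings of the gene tree onto the species tree. The event costs (duplication 2, transfer 3, loss 1) are an assumption not stated in the text, and the names `Scenario` and `most_parsimonious` are hypothetical.

```python
from dataclasses import dataclass

# Assumed event costs; the text above does not state the values used.
COSTS = {"duplication": 2, "transfer": 3, "loss": 1}


@dataclass(frozen=True)
class Scenario:
    """Event counts for one candidate reconciliation of a gene tree."""
    duplications: int
    transfers: int
    losses: int

    def cost(self) -> int:
        """Total cost of all gene family evolution events in this scenario."""
        return (self.duplications * COSTS["duplication"]
                + self.transfers * COSTS["transfer"]
                + self.losses * COSTS["loss"])


def most_parsimonious(scenarios):
    """Return the scenario with the lowest total event cost."""
    return min(scenarios, key=Scenario.cost)


# Two hypothetical explanations of the same gene-species tree conflict.
candidates = [
    Scenario(duplications=1, transfers=0, losses=3),  # cost 2 + 3 = 5
    Scenario(duplications=0, transfers=1, losses=0),  # cost 3
]
best = most_parsimonious(candidates)

# Only conflicts explained by at least one transfer are kept for HGT analysis.
keep_for_hgt_analysis = best.transfers > 0
```

Under these assumed costs the single-transfer scenario wins, so the conflict would be stored as a putative HGT.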
Figure 2. Genus and species composition of studied body sites. Six-way Venn diagrams describe the genus (A) and species (B) composition of each body site and its combinations with other body sites in the HMP-genomes dataset. Histograms below give the total count of genera and species present in each body site. Genome names having distinct suffixes following “sp.” were treated as different species (Supplementary Table S2). Diagram generated using the online version of the jvenn program[80], available from http://jvenn.toulouse.inra.fr/app/index.html.
Composition of HMP and HGTree derived datasets used in this study. HGT-genes produced detectable conflict during gene and species tree reconciliation and this conflict was evaluated to be a result of HGT rather than gene duplication and loss (two other competing scenarios for gene family evolution), as evaluated by RANGER-DTL (ver. 1.0) software[20].
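| Dataset | # Genomes (a) | # Phyla (b) | # Genera (c) | # Species (d) | # Archaea (e) | # Bacteria (f) | # Gene sets | # HGT-genes | # HGT events |
|---|---|---|---|---|---|---|---|---|---|
| HMP-genomes | 1,059 | 8 | 152 | 591 | 2 | 1,057 | 81,357 | 55,059 | 511,330 |
| HGTree-genomes | 2,472 | 41 | 699 | 1,321 | 156 | 2,316 | 154,805 | 93,028 | 660,894 |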
a, number of genomes,
b, number of distinct phyla,
c, number of distinct genera,
d, number of distinct species,
e, number of archaeal genomes,
f, number of bacterial genomes.
Figure 3. The many faces of HGT. The intra-niche HGT events occur between genomes occupying the same body site either in unique or mixed phylogenetic trees and involve either one-to-one (A), one-to-many (or many-to-one) (B), or many-to-many gene transfers (C). The inter-niche HGT events occur among genomes occupying different body sites and involve one-to-one (D), one-to-many (or many-to-one) (E), or many-to-many (F) transfers, as illustrated on the trees.
HGT events detected in the microbial genomes of each body site.
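| Body sites | # Genomes | # total genera | # total phyla | # Intra-niche HGTs, unique gene sets (a) | # Intra-niche HGTs, mixed gene sets (b) | # Intra-niche HGTs, total (c) | # Inter-niche HGTs, mixed gene sets (d) | Ratio (e) |
|---|---|---|---|---|---|---|---|---|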
| Airways | 49 | 16 | 4 | 34 | 3708 | 3742 | 98559 | 26.34 |
| Blood | 45 | 8 | 3 | 6 | 465 | 471 | 50360 | 106.92 |
| GI tract | 452 | 99 | 7 | 16301 | 139668 | 155969 | 219505 | 1.41 |
| Oral | 244 | 64 | 8 | 1080 | 37201 | 38281 | 216387 | 5.65 |
| Skin | 123 | 16 | 4 | 38 | 2843 | 2881 | 91880 | 31.89 |
| UG tract | 146 | 41 | 7 | 206 | 5366 | 5572 | 157006 | 28.18 |
a, number of intra-niche gene transfers detected in unique gene sets,
b, number of intra-niche gene transfers detected in mixed gene sets,
c, sum of a and b,
d, number of inter-niche gene transfers detected in mixed gene sets,
e, ratio (d/c).
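The two derived columns can be recomputed from the raw counts as a quick consistency check. The sketch below copies the counts from the table; the function and variable names are illustrative only.

```python
# (intra-niche unique, intra-niche mixed, inter-niche) counts per body site,
# copied from columns a, b, and d of the table above.
rows = {
    "Airways": (34, 3708, 98559),
    "Blood": (6, 465, 50360),
    "GI tract": (16301, 139668, 219505),
    "Oral": (1080, 37201, 216387),
    "Skin": (38, 2843, 91880),
    "UG tract": (206, 5366, 157006),
}


def derived_columns(unique_intra, mixed_intra, inter):
    """Return (column c, column e) for one body site."""
    total_intra = unique_intra + mixed_intra   # c = a + b
    ratio = round(inter / total_intra, 2)      # e = d / c
    return total_intra, ratio


summary = {site: derived_columns(*counts) for site, counts in rows.items()}

for site, (total_intra, ratio) in summary.items():
    print(f"{site:9s} intra-niche total = {total_intra:6d}  inter/intra = {ratio}")
```

A ratio above 1 means more inter-niche than intra-niche transfers were detected for that site; the GI tract has the lowest ratio and blood the highest.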
Figure 4. Evaluation of phylogenetic versus spatial effect. HGT ratio represents the total number of one-to-one (A), one-to-many (B), and many-to-many (C) HGT events detected on a gene tree divided by the total number of HGT events (i.e. the sum of one-to-one, one-to-many, and many-to-many) detected on that gene tree. Phylogenetically similar microorganisms (PS) belong to the same genus. Phylogenetically diverse microorganisms (PD) belong to different genera. Similar habitat (SH) implies microorganisms inhabiting the same body site or niche. Different habitats (DH) imply microorganisms residing in different body sites or niches. See Supplementary Table S6 for P-values (pairwise Mann–Whitney U test).
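The HGT ratio and the four pair categories defined in this caption can be sketched as follows. The event labels, function names, and example taxa are hypothetical; the sketch assumes each detected event has already been classified by transfer type.

```python
from collections import Counter

TRANSFER_TYPES = ("one-to-one", "one-to-many", "many-to-many")


def hgt_ratios(event_types):
    """Fraction of each transfer type among all HGT events on one gene tree."""
    counts = Counter(event_types)
    total = sum(counts[t] for t in TRANSFER_TYPES)
    if total == 0:
        return {}  # no HGT detected on this gene tree
    return {t: counts[t] / total for t in TRANSFER_TYPES}


def pair_category(genus_a, site_a, genus_b, site_b):
    """Label a donor-recipient pair as PS/PD (genus) and SH/DH (body site)."""
    phylo = "PS" if genus_a == genus_b else "PD"
    habitat = "SH" if site_a == site_b else "DH"
    return f"{phylo}-{habitat}"


# Hypothetical events detected on a single gene tree.
events = ["one-to-one", "one-to-one", "one-to-many", "many-to-many"]
ratios = hgt_ratios(events)

# Same genus, different body sites.
label = pair_category("Streptococcus", "Oral", "Streptococcus", "Airways")
```

Here `ratios` assigns 0.5 to one-to-one transfers and 0.25 to each of the other two types, and `label` is `"PS-DH"`.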
Figure 5. Timing of detected HGT events. Protein sequence identity decreases in the order one-to-one, one-to-many, and many-to-many for each pair of sequences involved in gene transfer. All comparisons were statistically significant (P < 2.2e-16 for all comparisons, Mann–Whitney U test).
Counts (#) of total, intra-niche, and inter-niche HGT events detected in mixed gene sets comprising genomes from only two distinct body sites. For six body sites, a total of 15 possible combinations existed involving only two body sites (for full list of combinations see Supplementary Table S7). Data sorted by the counts of inter-niche HGTs in a descending manner. Body sites including the GI tract are highlighted in bold font.
| Body site combinations | # Total HGTs | # Intra-niche HGTs | # Inter-niche HGTs |
|---|---|---|---|
| Oral, UG tract | 1095 | 557 | 538 |
| Airway, Oral | 491 | 237 | 254 |
| Skin, UG tract | 322 | 77 | 245 |
| Blood, Oral | 246 | 97 | 149 |
| Oral, Skin | 171 | 48 | 123 |
| Airway, Skin | 75 | 42 | 33 |
| Blood, UG tract | 29 | 7 | 22 |
| Airway, Blood | 23 | 5 | 18 |
| Blood, Skin | 6 | 0 | 6 |
| Airway, UG tract | 8 | 3 | 5 |
Figure 6. Network visualization of species whose genomes were present in two or more human body sites. A total of 918 genome pairs from different body sites matched with ANI similarity >95%[24]. Data were visualized using Cytoscape ver. 3.6.1[79]. Nodes and edges indicate genomes and links between genomes, respectively. Nodes in red, blue, cyan, green, grey, and yellow represent genomes from airways, blood, GI tract, oral cavity, skin, and urogenital tract, respectively. The visualization resulted in 54 species networks (see the upper left corner), of which the six major networks (C1 to C6), each consisting of more than 10 genomes (15 to 71 genomes), were magnified for emphasis.
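The species networks described in this caption correspond to connected components of a graph whose edges are cross-site genome pairs above the ANI threshold. A minimal sketch with a union-find structure is shown below; the input format, function name, and genome identifiers are hypothetical, and the ANI values are assumed to be precomputed.

```python
from collections import defaultdict

ANI_THRESHOLD = 95.0  # percent, as stated in the caption


def species_networks(pairs):
    """Group genomes into networks linked by cross-site ANI > threshold.

    pairs: iterable of (genome_a, site_a, genome_b, site_b, ani_percent).
    Returns a list of genome sets, largest network first.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, site_a, b, site_b, ani in pairs:
        # Keep only pairs from different body sites above the threshold.
        if ani > ANI_THRESHOLD and site_a != site_b:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for genome in list(parent):
        groups[find(genome)].add(genome)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical genome pairs with precomputed ANI values.
pairs = [
    ("g1", "Oral", "g2", "Airways", 98.7),
    ("g2", "Airways", "g3", "Skin", 96.1),
    ("g4", "Blood", "g5", "Skin", 99.0),
    ("g6", "Oral", "g7", "Oral", 99.5),   # same site: ignored
    ("g8", "Oral", "g9", "Skin", 90.0),   # below threshold: ignored
]
networks = species_networks(pairs)
```

With this toy input the result is two networks, one of three genomes and one of two.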
Figure 7. HGT activity increases significantly in human-associated microbes. (A) Box plots displaying the distribution of HGT-index for HMP-genomes in six body sites, and for the HGT-C (a total of 2,440 genomes after excluding 32 identical genomes that were part of HMP proteomes) and HGT-R (only the 402 proteomes not belonging to any of the 8 HMP phyla) datasets extracted from HGTree-genomes. Numbers in parentheses indicate the total number of genomes in each dataset. Statistically significant comparisons (Welch’s two-tailed t-test with unequal variances, P < 0.05) are indicated by different letters (in italics) on each plot. (B) Box plots comparing HGT-index distributions for genomes belonging to phyla common between HMP- and HGTree-genomes. All comparisons were statistically significant (Welch’s two-tailed t-test with unequal variances, P < 0.05). PB, Proteobacteria (n = 214 HMP-genomes vs. 1,037 HGTree-genomes); FM, Firmicutes (470 vs. 518); AB, Actinobacteria (197 vs. 243); BT, Bacteroidetes (128 vs. 86); EY, Euryarchaeota (2 vs. 101); SN, Synergistetes (6 vs. 4); FS, Fusobacteria (25 vs. 6); SP, Spirochaetes (17 vs. 43).
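For readers unfamiliar with Welch's test, the sketch below computes its t statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. The sample values are invented for illustration and are not taken from the study; obtaining a P-value additionally requires the t-distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means.
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Invented HGT-index values for two groups of genomes.
hmp_like = [4.0, 5.0, 6.0, 5.0]
control_like = [1.0, 2.0, 3.0, 2.0, 2.0]

t_stat, dof = welch_t(hmp_like, control_like)
```

Unlike Student's t-test, this form does not pool the variances, which matters when group sizes differ as widely as they do here (for example 2 vs. 101 genomes for Euryarchaeota).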
Significantly enriched biological process GO terms in the top 10% frequently transferred genes (FTGs). Data sorted by the number of GO terms in a descending manner. FDR, false discovery rate.
| GO ID | # | P-value | FDR | GO description |
|---|---|---|---|---|
| GO:0008152 | 205 | 2.80E-08 | 9.20E-07 | metabolic process |
| GO:0044710 | 132 | 9.40E-10 | 9.10E-08 | single-organism metabolic process |
| GO:0006807 | 107 | 2.00E-06 | 2.30E-05 | nitrogen compound metabolic process |
| GO:1901360 | 84 | 1.1E-04 | 8.4E-04 | organic cyclic compound metabolic process |
| GO:0046483 | 82 | 8.20E-05 | 7.00E-04 | heterocycle metabolic process |
| GO:0034641 | 82 | 3.2E-04 | 0.0019 | cellular nitrogen compound metabolic process |
| GO:0006725 | 81 | 2.6E-04 | 0.0017 | cellular aromatic compound metabolic process |
| GO:0055114 | 77 | 8.30E-10 | 9.10E-08 | oxidation-reduction process |
| GO:0044281 | 67 | 9.30E-08 | 2.60E-06 | small molecule metabolic process |
| GO:1901564 | 66 | 3.20E-07 | 5.10E-06 | organonitrogen compound metabolic process |
| GO:0006139 | 61 | 0.0051 | 0.022 | nucleobase-containing compound metabolic process |
| GO:1901566 | 47 | 4.10E-07 | 5.80E-06 | organonitrogen compound biosynthetic process |
| GO:0016491 | 44 | 0.0058 | 0.046 | oxidoreductase activity |
| GO:0044283 | 37 | 1.00E-08 | 5.00E-07 | small molecule biosynthetic process |
| GO:0044711 | 37 | 1.10E-07 | 2.60E-06 | single-organism biosynthetic process |
| GO:0019752 | 37 | 2.10E-05 | 2.1E-04 | carboxylic acid metabolic process |
| GO:0043436 | 37 | 2.60E-05 | 2.4E-04 | oxoacid metabolic process |
| GO:0006082 | 37 | 3.10E-05 | 2.7E-04 | organic acid metabolic process |
| GO:0006520 | 34 | 3.40E-07 | 5.20E-06 | cellular amino acid metabolic process |
| GO:0044765 | 31 | 0.012 | 0.048 | single-organism transport |
| GO:0006259 | 30 | 1.2E-04 | 8.4E-04 | DNA metabolic process |
| GO:0016053 | 26 | 6.00E-07 | 7.30E-06 | organic acid biosynthetic process |
| GO:0046394 | 26 | 6.00E-07 | 7.30E-06 | carboxylic acid biosynthetic process |
| GO:0008652 | 25 | 1.00E-08 | 5.00E-07 | cellular amino acid biosynthetic process |
| GO:1901605 | 25 | 2.50E-08 | 9.20E-07 | alpha-amino acid metabolic process |
| GO:0055086 | 23 | 1.00E-04 | 8.00E-04 | nucleobase-containing small molecule metabolic process |
| GO:0016616 | 19 | 2.00E-10 | 2.20E-08 | oxidoreductase activity, acting on the CH-OH group of donors, NAD or NADP as acceptor |
| GO:0016614 | 19 | 1.30E-08 | 6.10E-07 | oxidoreductase activity, acting on CH-OH group of donors |
| GO:0006812 | 19 | 0.012 | 0.047 | cation transport |
| GO:1901607 | 17 | 1.70E-07 | 3.10E-06 | alpha-amino acid biosynthetic process |
| GO:0006753 | 17 | 0.0022 | 0.011 | nucleoside phosphate metabolic process |
| GO:0016741 | 17 | 5.5E-04 | 0.012 | transferase activity, transferring one-carbon groups |
| GO:0009117 | 16 | 0.0039 | 0.017 | nucleotide metabolic process |
| GO:0048037 | 14 | 0.0014 | 0.019 | cofactor binding |
| GO:0006732 | 12 | 0.0028 | 0.013 | coenzyme metabolic process |
| GO:0016747 | 12 | 0.0014 | 0.019 | transferase activity, transferring acyl groups other than amino-acyl groups |
| GO:0006790 | 11 | 3.00E-04 | 0.0019 | sulfur compound metabolic process |
| GO:0009110 | 11 | 4.5E-04 | 0.0025 | vitamin biosynthetic process |
| GO:0042364 | 11 | 4.5E-04 | 0.0025 | water-soluble vitamin biosynthetic process |
| GO:0006766 | 11 | 6.5E-04 | 0.0032 | vitamin metabolic process |
| GO:0006767 | 11 | 6.5E-04 | 0.0032 | water-soluble vitamin metabolic process |
| GO:0016835 | 11 | 0.0035 | 0.034 | carbon-oxygen lyase activity |
| GO:0004803 | 10 | 1.70E-08 | 6.10E-07 | transposase activity |
| GO:0032196 | 10 | 1.60E-07 | 3.10E-06 | transposition |
| GO:0006313 | 10 | 1.60E-07 | 3.10E-06 | transposition, DNA-mediated |
| GO:0006310 | 10 | 3.50E-06 | 3.80E-05 | DNA recombination |
| GO:0044272 | 10 | 9.90E-05 | 8.00E-04 | sulfur compound biosynthetic process |
| GO:0009108 | 10 | 0.0025 | 0.012 | coenzyme biosynthetic process |
| GO:0050662 | 10 | 0.0018 | 0.022 | coenzyme binding |
| GO:0016836 | 9 | 6.8E-04 | 0.012 | hydro-lyase activity |
| GO:0009066 | 8 | 1.30E-05 | 1.3E-04 | aspartate family amino acid metabolic process |
| GO:0072527 | 8 | 4.4E-04 | 0.0025 | pyrimidine-containing compound metabolic process |
| GO:0006575 | 8 | 0.0071 | 0.029 | cellular modified amino acid metabolic process |
| GO:0072528 | 7 | 5.4E-04 | 0.0028 | pyrimidine-containing compound biosynthetic process |
| GO:0009401 | 7 | 5.4E-04 | 0.0028 | phosphoenolpyruvate-dependent sugar phosphotransferase system |
| GO:0016407 | 7 | 1.1E-04 | 0.0031 | acetyltransferase activity |
| GO:0008643 | 7 | 0.0037 | 0.016 | carbohydrate transport |
| GO:0042398 | 7 | 0.0037 | 0.016 | cellular modified amino acid biosynthetic process |
| GO:0008509 | 7 | 0.0041 | 0.035 | anion transmembrane transporter activity |
| GO:0015074 | 6 | 1.2E-04 | 8.4E-04 | DNA integration |
| GO:0009067 | 6 | 1.2E-04 | 8.4E-04 | aspartate family amino acid biosynthetic process |
| GO:0016410 | 5 | 0.002 | 0.022 | N-acyltransferase activity |
| GO:0009072 | 5 | 0.007 | 0.029 | aromatic amino acid family metabolic process |
| GO:0016645 | 5 | 0.0037 | 0.034 | oxidoreductase activity, acting on the CH-NH group of donors |
| GO:0015291 | 5 | 0.0065 | 0.047 | secondary active transmembrane transporter activity |
Figure 8. Widespread and core genes in human microbiota. (A) Venn diagram highlighting the distribution of widespread genes in each body site. Widespread genes are defined as genes present in >70% of the genomes of that body site or its combinations. (B) Bar plots illustrating the proportion of COG functional categories mapped to total widespread genes in each body site. (C) Pie chart indicating the enrichment of COG functional categories in core genes, which by definition were widespread in all six body sites. (D) Box plots comparing HGT-index distributions of core genes, as distinguished by COG categories. HGT-index is the number of HGT events detected on a gene tree divided by the total number of taxa in that gene tree.
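The three definitions in this caption (HGT-index, widespread genes, core genes) can be sketched in a few lines. The data layout, function names, and the two-site toy input are hypothetical; the study itself uses six body sites.

```python
def hgt_index(n_hgt_events, n_taxa):
    """HGT events detected on a gene tree divided by the taxa in that tree."""
    if n_taxa <= 0:
        raise ValueError("a gene tree must contain at least one taxon")
    return n_hgt_events / n_taxa


def widespread_genes(gene_to_genomes, n_genomes, threshold=0.70):
    """Genes present in more than `threshold` of a body site's genomes."""
    return {
        gene
        for gene, genomes in gene_to_genomes.items()
        if len(genomes) / n_genomes > threshold
    }


def core_genes(sites):
    """Genes that are widespread in every body site.

    sites: {site_name: (gene_to_genomes, n_genomes)}
    """
    per_site = [widespread_genes(g, n) for g, n in sites.values()]
    return set.intersection(*per_site) if per_site else set()


# Toy input with two body sites of four genomes each.
sites = {
    "Oral": ({"geneA": {"o1", "o2", "o3", "o4"}, "geneB": {"o1"}}, 4),
    "Skin": ({"geneA": {"s1", "s2", "s3"}, "geneB": {"s1", "s2", "s3"}}, 4),
}
core = core_genes(sites)          # only geneA passes 70% in both sites
example_index = hgt_index(12, 48)  # 12 events on a 48-taxon gene tree
```

Dividing by the number of taxa keeps large gene trees from dominating comparisons simply because they offer more opportunities to detect transfers.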