Fernando D K Tria, William F Martin.
Abstract
The contribution of gene duplications to the evolution of eukaryotic genomes is well studied. By contrast, studies of gene duplications in prokaryotes are scarce and generally limited to a handful of genes or careful analysis of a few prokaryotic lineages. Systematic broad-scale studies of prokaryotic genomes that sample available data are lacking, leaving gaps in our understanding of the contribution of gene duplications as a source of genetic novelty in the prokaryotic world. Here, we report conservative and robust estimates for the frequency of recent gene duplications within prokaryotic genomes relative to recent lateral gene transfer (LGT), as mechanisms to generate multiple copies of related sequences in the same genome. We obtain our estimates by focusing on evolutionarily recent events among 5,655 prokaryotic genomes, thereby avoiding vagaries of deep phylogenetic inference and confounding effects of ancient events and differential loss. We find that recent, genome-specific gene duplications are at least 50 times less frequent and probably 100 times less frequent than recent, genome-specific gene acquisitions via LGT. The frequency of gene duplications varies across lineages and functional categories. The findings improve our understanding of genome evolution in prokaryotes and have far-reaching implications for evolutionary models that use the ratio of LGT to gene duplication as a parameter.
Keywords: frequency of events; gene duplication; lateral gene transfer; prokaryote evolution
Year: 2021 PMID: 34599337 PMCID: PMC8536544 DOI: 10.1093/gbe/evab224
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1. Schematic representation of the approaches used for inferences of recent gene transfers and recent gene duplications. (a) Recent gene transfers were inferred using the presence-absence distribution of genes (plus symbol) across prokaryotic genomes and the assignment of the genomes to taxa (triangles). A gene in a genome was considered to be the result of a gene transfer if no homologue was present in any other member of the same taxon. Genome-taxon assignments were performed using traditional prokaryotic classifications at different taxonomic levels: domain, phylum, class, order, family, genus, and species. (b) Recent gene duplications were inferred on the basis of gene trees and were identified as pairs of genes from the same genome (paralogs) that branch as sisters in the unrooted tree (h and h′ leaves). Genes from the same genome that do not branch as sisters (for instance the a leaves) were not scored, since they may be the result of either ancient gene duplication followed by differential gene loss or ancient gene transfer.
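The two criteria in the caption can be expressed as small functions. The sketch below is an illustration under stated assumptions, not the authors' pipeline: gene families are given as sets of genome IDs, the unrooted gene tree is given as an adjacency list, and families present in a single genome (singletons) are skipped, which is an assumption based on the non-singleton normalization mentioned for Fig. 2. All identifiers (`recent_transfers`, `recent_duplications`, and the toy genomes, taxa, and tree) are hypothetical.

```python
from collections import defaultdict


def recent_transfers(presence, genome_taxon):
    """Flag (family, genome) pairs that are taxon-specific at one rank (Fig. 1a).

    presence: gene family -> set of genome IDs that carry it
    genome_taxon: genome ID -> taxon name at the chosen rank
    """
    flagged = []
    for family, genomes in presence.items():
        if len(genomes) < 2:
            continue  # assumption: singletons (found in one genome only) are not scored
        carriers = defaultdict(int)  # number of carriers of this family per taxon
        for g in genomes:
            carriers[genome_taxon[g]] += 1
        for g in sorted(genomes):
            # no other member of g's taxon carries a homologue
            if carriers[genome_taxon[g]] == 1:
                flagged.append((family, g))
    return flagged


def recent_duplications(adjacency, leaf_genome):
    """Return same-genome leaf pairs that are sisters in an unrooted tree (Fig. 1b).

    adjacency: node -> list of neighbouring nodes (undirected, unrooted tree)
    leaf_genome: leaf -> genome ID; internal nodes are absent from this dict
    """
    pairs = []
    for node, neighbours in adjacency.items():
        if node in leaf_genome:
            continue  # sister leaves meet at a shared internal node
        leaves = sorted(n for n in neighbours if n in leaf_genome)
        for i, a in enumerate(leaves):
            for b in leaves[i + 1:]:
                if leaf_genome[a] == leaf_genome[b]:
                    pairs.append((a, b))
    return pairs


# Toy data (hypothetical genomes, taxa, and tree)
presence = {"F1": {"g1", "g2", "g3"}, "F2": {"g1", "g4"}, "F3": {"g2"}}
genome_taxon = {"g1": "T1", "g2": "T1", "g3": "T2", "g4": "T2"}
print(recent_transfers(presence, genome_taxon))
# [('F1', 'g3'), ('F2', 'g1'), ('F2', 'g4')]

tree = {
    "u": ["h1", "h2", "v"], "v": ["u", "w", "z"],
    "w": ["a1", "b1", "v"], "z": ["a2", "c1", "v"],
    "h1": ["u"], "h2": ["u"], "a1": ["w"], "b1": ["w"], "a2": ["z"], "c1": ["z"],
}
leaf_genome = {"h1": "H", "h2": "H", "a1": "A", "a2": "A", "b1": "B", "c1": "C"}
print(recent_duplications(tree, leaf_genome))
# [('h1', 'h2')]
```

In the toy tree, h1 and h2 are reported as a recent duplication, while a1 and a2, which share a genome but are not sisters, are left unscored, mirroring the a leaves in panel (b).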
Number of prokaryotic genes with recent gene duplication and recent LGT crossing different taxonomic boundaries (taxon level). The inferences were performed using all genomes with available taxonomic classifications (no filter). To counter biases stemming from sparsely sampled taxa, the analyses were repeated considering only genomes from taxa with ≥ 2 genomes and from taxa with ≥ 6 genomes. Note that the genome set varies across taxonomic levels and that only inferences of gene transfers depend on taxonomic classifications; gene duplications were nonetheless inferred on the same genome sets so that the two counts can be compared. Percentages are relative to the total number of genes in each genome set, and t/d is the number of transferred genes divided by the number of duplicated genes. The number of genomes, the number of taxa, and the total number of genes in each genome set are indicated.
| Taxon Level | Transferred Genes (t) | Duplicated Genes (d) | t/d | No. of Genomes | No. of Taxa | No. of Genes |
|---|---|---|---|---|---|---|
| No filter | ||||||
| Domain | 5,338 (2.0%) | 16,687 (6.4%) | 0.32 | 5,655 | 2 | 260,972 |
| Phylum | 54,457 (20.9%) | 16,643 (6.4%) | 3.27 | 5,652 | 34 | 260,972 |
| Class | 78,813 (30.8%) | 15,184 (5.9%) | 5.19 | 5,543 | 65 | 255,886 |
| Order | 111,173 (42.9%) | 16,369 (6.3%) | 6.79 | 5,584 | 149 | 259,218 |
| Family | 134,535 (51.6%) | 16,012 (6.1%) | 8.40 | 5,567 | 310 | 260,738 |
| Genus | 165,738 (63.5%) | 16,091 (6.2%) | 10.30 | 5,608 | 871 | 260,972 |
| Species | 227,974 (87.4%) | 16,687 (6.4%) | 13.66 | 5,655 | 2,370 | 260,972 |
| ≥ 2 genomes | ||||||
| Domain | 5,338 (2.0%) | 16,687 (6.4%) | 0.32 | 5,655 | 2 | 260,972 |
| Phylum | 50,383 (19.3%) | 16,563 (6.3%) | 3.04 | 5,646 | 28 | 260,972 |
| Class | 73,108 (28.6%) | 14,804 (5.8%) | 4.94 | 5,530 | 52 | 255,847 |
| Order | 101,692 (39.2%) | 15,401 (5.9%) | 6.60 | 5,555 | 120 | 259,194 |
| Family | 115,729 (44.4%) | 13,691 (5.3%) | 8.45 | 5,479 | 222 | 260,692 |
| Genus | 100,264 (38.8%) | 8,246 (3.2%) | 12.16 | 5,118 | 381 | 258,383 |
| Species | 50,696 (22.5%) | 741 (0.3%) | 68.42 | 3,765 | 480 | 224,990 |
| ≥6 genomes | ||||||
| Domain | 5,338 (2.0%) | 16,687 (6.4%) | 0.32 | 5,655 | 2 | 260,972 |
| Phylum | 46,472 (17.8%) | 16,393 (6.3%) | 2.83 | 5,620 | 20 | 260,455 |
| Class | 65,051 (25.6%) | 14,267 (5.6%) | 4.56 | 5,482 | 36 | 254,103 |
| Order | 89,309 (34.8%) | 13,795 (5.4%) | 6.47 | 5,433 | 81 | 256,763 |
| Family | 86,784 (34.3%) | 9,965 (3.9%) | 8.71 | 5,203 | 128 | 253,356 |
| Genus | 53,245 (23.7%) | 4,063 (1.8%) | 13.10 | 4,417 | 138 | 224,678 |
| Species | 15,129 (10.5%) | 146 (0.1%) | 103.62 | 2,821 | 140 | 143,745 |
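The percentages and the t/d column follow directly from the raw counts. A small sketch recomputing a few rows from the published counts; the function name `summarize_level` is hypothetical, and returning `None` when no duplication is detected is a choice made here for illustration.

```python
def summarize_level(transferred, duplicated, total_genes):
    """Percent of genes transferred, percent duplicated, and t/d for one table row."""
    pct_t = round(100 * transferred / total_genes, 1)
    pct_d = round(100 * duplicated / total_genes, 1)
    # t/d is undefined when no duplication was detected
    ratio = round(transferred / duplicated, 2) if duplicated else None
    return pct_t, pct_d, ratio


# Counts taken from the table: (row, transferred, duplicated, genes)
rows = [
    ("Domain, no filter", 5338, 16687, 260972),
    ("Species, no filter", 227974, 16687, 260972),
    ("Species, >= 6 genomes", 15129, 146, 143745),
]
for level, t, d, n in rows:
    print(level, summarize_level(t, d, n))
```

The three rows come out as (2.0, 6.4, 0.32), (87.4, 6.4, 13.66), and (10.5, 0.1, 103.62), matching the table.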
Fig. 2. Quantification of recent gene transfers and recent gene duplications across 5,655 prokaryotic genomes. For each prokaryotic genome, the numbers of gene duplications (horizontal axis) and gene transfers (vertical axis) are reported as fractions relative to the number of non-singleton genes (see Materials and Methods and fig. 1 for details on inferences). Recent gene transfers across different taxonomic ranges were distinguished: inter-domain transfers (a, h, o), inter-phylum transfers (b, i, p), inter-class transfers (c, j, q), inter-order transfers (d, k, r), inter-family transfers (e, l, s), inter-genus transfers (f, m, t), and inter-species transfers (g, n, u). In (a–g), all genomes with taxonomic classifications were used. In (h–n), genomes belonging to taxa with only one representative genome were discarded. In (o–u), genomes from taxa with fewer than six representative genomes were discarded. Inset upper numbers show the total number of genomes and taxa, respectively. t/d indicates the ratio of the mean fraction of transfers to the mean fraction of duplications. The color scale shows the number of representative genomes affiliated to the same taxon (taxon size). See also supplementary figures 5–7, Supplementary Material online, for the distribution plots in log scale.
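In the caption, t/d is a ratio of means: each genome's transfer and duplication counts are divided by its number of non-singleton genes, the fractions are averaged over genomes, and the two means are divided. A minimal sketch of that calculation together with a taxon-size filter; the dictionary keys and the toy counts are hypothetical, and setting `min_taxon_size` to 2 or 6 corresponds to the ≥ 2 and ≥ 6 genome thresholds used for the gene-count table.

```python
from collections import Counter
from statistics import mean


def ratio_of_mean_fractions(genomes, min_taxon_size=1):
    """t/d as the mean per-genome transfer fraction over the mean duplication fraction.

    genomes: list of dicts with hypothetical keys "taxon", "transfers",
             "duplications", and "nonsingleton" (gene counts for one genome)
    min_taxon_size: genomes from taxa with fewer members are discarded
    """
    sizes = Counter(g["taxon"] for g in genomes)
    kept = [g for g in genomes if sizes[g["taxon"]] >= min_taxon_size]
    # fractions are taken relative to each genome's non-singleton genes
    frac_t = [g["transfers"] / g["nonsingleton"] for g in kept]
    frac_d = [g["duplications"] / g["nonsingleton"] for g in kept]
    return mean(frac_t) / mean(frac_d), len(kept)


genomes = [  # hypothetical counts
    {"taxon": "X", "transfers": 20, "duplications": 1, "nonsingleton": 1000},
    {"taxon": "X", "transfers": 40, "duplications": 1, "nonsingleton": 2000},
    {"taxon": "Y", "transfers": 100, "duplications": 0, "nonsingleton": 1000},
]
print(ratio_of_mean_fractions(genomes))                    # all genomes kept
print(ratio_of_mean_fractions(genomes, min_taxon_size=2))  # taxon Y dropped
```

Dropping the single-genome taxon Y changes the ratio from about 93 to about 27 in this toy example, which illustrates why the filtered and unfiltered panels can report different t/d values.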
Summary statistics for recent inter-species gene transfers and recent gene duplications in distinct bacterial (bottom) and archaeal (top) taxa with at least two representative genomes. Means were taken across the genomes of each taxon, and SD denotes the standard deviation. t/d is the ratio of the mean fraction of LGT to the mean fraction of gene duplications for each taxon; a dash (‘—’) marks lineages for which the ratio could not be estimated because no gene duplication was detected in their genomes.
| Domain | Class | Fraction of Transfers, Mean (SD) | Fraction of Duplications, Mean (SD) | No. of Nonsingleton Genes, Mean (SD) | No. of Genomes (Total) | t/d |
|---|---|---|---|---|---|---|
| Archaea | Thermoprotei | 7.19E−03 (1.17E−02) | 3.91E−04 (5.91E−04) | 2,480.9 (230.0) | 25 | 18.4 |
| | Archaeoglobi | 4.31E−02 (1.05E−02) | 1.07E−03 (2.77E−04) | 2,334.0 (56.6) | 2 | 40.4 |
| | Halobacteria | 2.42E−02 (2.99E−02) | 3.93E−04 (8.37E−04) | 2,687.0 (649.2) | 8 | 61.7 |
| | Methanobacteria | 2.27E−02 (1.45E−02) | 1.47E−04 (2.55E−04) | 2,276.0 (13.5) | 3 | 154.4 |
| | Methanococci | 2.21E−02 (6.49E−03) | 1.03E−03 (1.18E−03) | 1,736.6 (19.2) | 5 | 21.3 |
| | Methanomicrobia | 2.57E−02 (2.96E−02) | 9.84E−04 (1.85E−03) | 3,440.0 (236.9) | 14 | 26.1 |
| | Thermococci | 5.95E−02 (6.10E−02) | 1.06E−03 (1.83E−03) | 2,107.5 (194.1) | 4 | 55.9 |
| Bacteria | Actinobacteria | 1.93E−02 (4.38E−02) | 2.80E−04 (8.24E−04) | 3,300.7 (1,761.1) | 337 | 68.9 |
| | Aquificae | 5.36E−04 (7.58E−04) | 0.00E+00 (0.00E+00) | 1,864.5 (2.1) | 2 | — |
| | Bacteroidia | 2.43E−02 (2.05E−02) | 2.72E−04 (3.76E−04) | 2,886.2 (1,100.1) | 20 | 89.3 |
| | Flavobacteriia | 2.48E−02 (4.21E−02) | 7.26E−04 (2.05E−03) | 2,683.1 (730.9) | 37 | 34.1 |
| | Chlamydiia | 4.44E−04 (1.59E−03) | 0.00E+00 (0.00E+00) | 926.6 (45.4) | 105 | — |
| | Cyanobacteria | 2.29E−02 (3.69E−02) | 6.65E−04 (1.79E−03) | 2,591.7 (810.6) | 23 | 34.4 |
| | Chlorobia | 2.63E−01 (2.02E−02) | 4.21E−03 (1.11E−03) | 2,269.5 (96.9) | 2 | 62.5 |
| | Dehalococcoidia | 1.25E−02 (7.07E−03) | 2.19E−04 (4.50E−04) | 1,402.7 (50.8) | 13 | 57.3 |
| | Deinococci | 3.13E−02 (7.48E−03) | 7.89E−04 (3.87E−04) | 2,471.8 (416.6) | 6 | 39.6 |
| | Fibrobacteria | 4.10E−03 (1.16E−03) | 0.00E+00 (0.00E+00) | 3,046.0 (2.8) | 2 | — |
| | Bacilli | 1.57E−02 (2.74E−02) | 3.61E−04 (9.00E−04) | 2,998.2 (1,263.9) | 833 | 43.6 |
| | Clostridia | 3.71E−02 (6.93E−02) | 8.78E−04 (2.40E−03) | 3,342.0 (767.1) | 72 | 42.2 |
| | Erysipelotrichia | 1.70E−02 (1.75E−02) | 0.00E+00 (0.00E+00) | 1,596.0 (142.1) | 4 | — |
| | Fusobacteriia | 2.90E−02 (1.81E−02) | 8.38E−04 (6.87E−04) | 2,116.6 (140.8) | 13 | 34.6 |
| | Nitrospira | 6.74E−02 (6.18E−03) | 9.24E−04 (6.60E−04) | 2,172.0 (21.2) | 2 | 73.0 |
| | Acidithiobacillia | 6.93E−02 (2.64E−02) | 2.15E−03 (1.89E−03) | 2,662.5 (53.9) | 4 | 32.2 |
| | Alphaproteobacteria | 2.69E−02 (4.73E−02) | 7.25E−04 (1.70E−03) | 3,230.8 (1,907.9) | 249 | 37.1 |
| | Betaproteobacteria | 2.15E−02 (4.78E−02) | 3.96E−04 (9.68E−04) | 4,262.0 (1,729.5) | 353 | 54.4 |
| | Deltaproteobacteria | 1.00E−01 (1.28E−01) | 1.34E−03 (1.79E−03) | 3,989.6 (2,133.0) | 20 | 74.9 |
| | Epsilonproteobacteria | 7.50E−03 (2.16E−02) | 3.27E−04 (6.79E−04) | 1,574.2 (144.3) | 221 | 23.0 |
| | Gammaproteobacteria | 1.58E−02 (3.20E−02) | 3.05E−04 (8.66E−04) | 4,103.4 (1,249.2) | 1,229 | 51.9 |
| | Spirochaetia | 2.35E−02 (3.12E−02) | 5.03E−04 (8.64E−04) | 2,075.6 (1,190.8) | 39 | 46.8 |
| | Mollicutes | 1.08E−02 (2.17E−02) | 7.01E−04 (2.48E−03) | 714.6 (148.5) | 102 | 15.3 |
| | Thermotogae | 1.67E−02 (3.03E−02) | 1.57E−04 (2.54E−04) | 1,897.6 (88.8) | 10 | 106.5 |
| | Verrucomicrobiae | 7.94E−02 (6.90E−03) | 6.59E−04 (9.32E−04) | 1,523.5 (7.8) | 2 | 120.6 |
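Each row of the table above summarizes the genomes of one class. The sketch below computes such a row with Python's `statistics` module and returns `None` where the table shows a dash. Whether a sample or a population SD was used is not stated here, so the sample SD is an assumption; it does appear consistent with the two-genome rows (for Aquificae, two integer gene counts averaging 1,864.5 differ by an odd number, a difference of 3 gives a sample SD of 3/√2 ≈ 2.12, and no integer difference gives a population SD of 2.1). The inputs below are one hypothetical set of values consistent with the Aquificae row, not the actual genomes, and `taxon_row` is a hypothetical name.

```python
from statistics import mean, stdev


def taxon_row(frac_t, frac_d, nonsingleton):
    """Mean (SD) summaries and t/d for one taxon; one list entry per genome.

    stdev() is the sample (n - 1) standard deviation and needs >= 2 genomes.
    """
    mean_t, mean_d = mean(frac_t), mean(frac_d)
    return {
        "transfers": (mean_t, stdev(frac_t)),
        "duplications": (mean_d, stdev(frac_d)),
        "nonsingleton": (mean(nonsingleton), stdev(nonsingleton)),
        "genomes": len(frac_t),
        # None stands for the dash: no detectable duplication, ratio undefined
        "t/d": round(mean_t / mean_d, 1) if mean_d > 0 else None,
    }


# Hypothetical two-genome taxon without detectable duplications
row = taxon_row([0.0, 0.001072], [0.0, 0.0], [1866, 1863])
print(row["transfers"], row["nonsingleton"], row["t/d"])
```

Rounded to the table's precision, these inputs give 5.36E−04 (7.58E−04) for transfers and 1,864.5 (2.1) for non-singleton genes, with no t/d.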
Fig. 3. Effect of recent gene duplications and recent gene transfers on genome-size expansion in prokaryotes. The plot shows genome size, measured as the number of protein-coding genes (vertical axis), against the fraction (horizontal axis) of recent gene duplications (a) and recent gene transfers (b–h), using genomes affiliated to taxa with more than one representative genome (see panels h–n in fig. 2 for sample sizes). Insets: r denotes the Spearman correlation coefficient, and p denotes the FDR-adjusted P-value from the two-tailed test (see Materials and Methods).
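Spearman's r is the Pearson correlation computed on ranks, with tied values sharing their average rank. A dependency-free sketch of the coefficient follows; the function names and the toy genome sizes and fractions are hypothetical. In practice, `scipy.stats.spearmanr` returns the coefficient together with a two-sided P-value, and those P-values would then be FDR-adjusted across panels; the adjustment procedure is not named in the caption, so Benjamini-Hochberg would be an assumption.

```python
def average_ranks(values):
    """1-based ranks in which tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical genomes: size in protein-coding genes vs. fraction of recent transfers
sizes = [1500, 2400, 3100, 4200, 5600]
fractions = [0.004, 0.010, 0.008, 0.021, 0.030]
print(round(spearman_r(sizes, fractions), 2))
```

Only one adjacent pair is out of order in the toy data, so the printed coefficient is 0.9.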
Functional distribution of the genes analyzed in this study. The All Genes column gives the total number of annotated genes in each functional category. Genes with inter-species LGT (t) and genes with duplications (d) were scored only for species with at least two members. Functional annotations were performed using the KEGG database (see Materials and Methods for details). t/d denotes the ratio of transfers to duplications, and FDR denotes the adjusted P-value from the one-tailed binomial test (enrichment test).
| KEGG Category (B Level) | All Genes | Transferred Genes (t) | Duplicated Genes (d) | t/d | FDR (t) | FDR (d) |
|---|---|---|---|---|---|---|
| Genetic information processing | 4,836 | 2,622 (54%) | 357 (7%) | 7.3 | 0.000 | 0.000 |
| Membrane transport | 19,982 | 9,283 (46%) | 325 (2%) | 28.6 | 0.509 | 1.000 |
| Carbohydrate metabolism | 4,831 | 2,435 (50%) | 130 (3%) | 18.7 | 0.000 | 0.001 |
| Replication and repair | 3,497 | 1,702 (49%) | 116 (3%) | 14.7 | 0.005 | 0.000 |
| Transcription | 7,244 | 3,932 (54%) | 105 (1%) | 37.4 | 0.000 | 1.000 |
| Poorly characterized | 6,211 | 2,560 (41%) | 94 (2%) | 27.2 | 1.000 | 1.000 |
| Amino acid metabolism | 3,772 | 2,054 (54%) | 85 (2%) | 24.2 | 0.000 | 0.212 |
| Metabolism | 4,257 | 2,106 (49%) | 75 (2%) | 28.1 | 0.000 | 1.000 |
| Transport and catabolism | 2,842 | 1,605 (56%) | 74 (3%) | 21.7 | 0.000 | 0.026 |
| Cellular community—prokaryotes | 3,985 | 1,771 (44%) | 62 (2%) | 28.6 | 1.000 | 1.000 |
| Cellular processes and signaling | 3,900 | 1,597 (41%) | 52 (1%) | 30.7 | 1.000 | 1.000 |
| Energy metabolism | 2,701 | 1,026 (38%) | 48 (2%) | 21.4 | 1.000 | 1.000 |
| Enzyme families | 3,732 | 1,387 (37%) | 44 (1%) | 31.5 | 1.000 | 1.000 |
| Cell motility | 3,619 | 1,383 (38%) | 41 (1%) | 33.7 | 1.000 | 1.000 |
| Glycan biosynthesis and metabolism | 3,348 | 1,513 (45%) | 41 (1%) | 36.9 | 1.000 | 1.000 |
| Metabolism of cofactors and vitamins | 2,440 | 1,122 (46%) | 41 (2%) | 27.4 | 1.000 | 1.000 |
| Xenobiotics biodegradation and metabolism | 1,602 | 941 (59%) | 41 (3%) | 23.0 | 0.000 | 0.136 |
| Signal transduction | 6,709 | 2,654 (40%) | 40 (1%) | 66.4 | 1.000 | 1.000 |
| Lipid metabolism | 2,859 | 1,402 (49%) | 37 (1%) | 37.9 | 0.004 | 1.000 |
| Nucleotide metabolism | 1,417 | 608 (43%) | 26 (2%) | 23.4 | 1.000 | 1.000 |
| Translation | 2,413 | 915 (38%) | 24 (1%) | 38.1 | 1.000 | 1.000 |
| Metabolism of terpenoids and polyketides | 1,472 | 734 (50%) | 24 (2%) | 30.6 | 0.006 | 1.000 |
| Folding, sorting, and degradation | 1,872 | 662 (35%) | 23 (1%) | 28.8 | 1.000 | 1.000 |
| Drug resistance | 1,754 | 778 (44%) | 23 (1%) | 33.8 | 1.000 | 1.000 |
| Metabolism of other amino acids | 744 | 357 (48%) | 21 (3%) | 17.0 | 0.345 | 0.136 |
| Biosynthesis of other secondary metabolites | 506 | 254 (50%) | 8 (2%) | 31.8 | 0.079 | 1.000 |
| Total | 102,545 | 47,403 (46%) | 1,957 (2%) | 24.2 | — | — |
Note.—Significantly enriched, FDR < 0.05.
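The caption names a one-tailed binomial enrichment test followed by FDR adjustment but gives no formula. A minimal sketch, assuming that for each category the number of transferred (or duplicated) genes is compared with a Binomial(n, p) null, where n is the category's All Genes count and p is the overall proportion from the Total row (47,403/102,545 for transfers, 1,957/102,545 for duplications), and assuming the Benjamini-Hochberg procedure for the FDR step. Both choices are assumptions, the function names are hypothetical, and the sketch is not expected to reproduce the published FDR values exactly.

```python
from math import exp, lgamma, log, log1p


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space; needs 0 < p < 1."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    logs = [lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log1p(-p) for i in range(k, n + 1)]
    top = max(logs)
    return min(1.0, exp(top) * sum(exp(v - top) for v in logs))


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Transfers in three categories, with the background rate from the Total row
p_transfer = 47403 / 102545
categories = {  # name: (all genes, transferred genes)
    "Genetic information processing": (4836, 2622),
    "Membrane transport": (19982, 9283),
    "Signal transduction": (6709, 2654),
}
raw = [binom_upper_tail(t, n, p_transfer) for n, t in categories.values()]
for name, q in zip(categories, bh_adjust(raw)):
    print(f"{name}: {q:.3g}")
```

Adjusting over only three categories, as here, is for illustration; an adjustment over every row of the table would give different values.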