Ramy K Aziz, Mya Breitbart, Robert A Edwards.
Abstract
Genes, like organisms, struggle for existence, and the most successful genes persist and disseminate widely in nature. The unbiased determination of the most successful genes requires access to sequence data from a wide range of phylogenetic taxa and ecosystems, which has finally become achievable thanks to the deluge of genomic and metagenomic sequences. Here, we analyzed 10 million protein-encoding genes and gene tags in sequenced bacterial, archaeal, eukaryotic and viral genomes and metagenomes, and our analysis demonstrates that genes encoding transposases are the most prevalent genes in nature. The finding that these genes, classically considered selfish genes, outnumber essential or housekeeping genes suggests that they offer a selective advantage to the genomes and ecosystems they inhabit, a hypothesis in agreement with an emerging body of literature. Their mobile nature not only promotes dissemination of transposable elements within and between genomes but also leads to mutations and rearrangements that can accelerate biological diversification and, consequently, evolution. By securing their own replication and dissemination, transposases ensure that they will thrive for as long as nucleic acid-based life forms exist.
Year: 2010 PMID: 20215432 PMCID: PMC2910039 DOI: 10.1093/nar/gkq140
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Abundance of different functional roles in 2137 genomes plotted against the ubiquity of these functional roles (defined as the number of genomes in which the functional role is represented at least once). r, Pearson’s product moment correlation between abundance and ubiquity; Cys, cysteine; Thio, thioredoxin; ThioR, thioredoxin reductase. Proteins annotated solely on the basis of their location or post-translational modification rather than their biological function (e.g. membrane proteins, cytoplasmic proteins, secreted proteins, transmembrane proteins and generic lipoproteins) were excluded; the one exception was the ‘outer membrane protein’ annotation, which describes specific bacterial proteins rather than protein localization.
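The two quantities plotted in Figure 1, and their correlation, can be made concrete with a short Python sketch. It assumes a hypothetical input format (a dict mapping each genome ID to the functional-role labels of its annotated genes) and only illustrates the caption's definitions: abundance as the total number of genes with a role, ubiquity as the number of genomes containing that role at least once. It is not a reproduction of the authors' pipeline.

```python
from collections import Counter
from math import sqrt


def abundance_and_ubiquity(annotations):
    """Return (abundance, ubiquity) Counters for every functional role.

    annotations: hypothetical input, a dict mapping genome ID to a list of
    functional-role labels, one label per annotated gene.
    """
    abundance = Counter()  # total number of genes with the role, all genomes
    ubiquity = Counter()   # number of genomes with the role at least once
    for roles in annotations.values():
        abundance.update(roles)
        ubiquity.update(set(roles))  # count each role once per genome
    return abundance, ubiquity


def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient of two samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a sample has zero variance")
    return sxy / sqrt(sxx * syy)


# Toy example with made-up annotations for three genomes.
toy = {
    "genome_1": ["Transposase", "Transposase", "Transposase", "DnaK"],
    "genome_2": ["Transposase", "DnaK"],
    "genome_3": ["DnaK", "Integrase"],
}
abundance, ubiquity = abundance_and_ubiquity(toy)
roles = sorted(abundance)
r = pearson_r([abundance[k] for k in roles], [ubiquity[k] for k in roles])
```

In the toy data, Transposase is the most abundant role (4 genes), but DnaK is the most ubiquitous (3 genomes). This is why the two measures are plotted against each other rather than treated as interchangeable.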
The 20 most abundant non-hypothetical protein-encoding genes in all sequenced genomes
| Rank | Functional role | nG / Count | V | A | B | E | C/n | % |
|---|---|---|---|---|---|---|---|---|
| 1 | Transposase | 693 / 26 625 | 15 (1.1%) / 21 | 31 (66%) / 736 | 630 (86.9%) / 25 226 | 17 (58.6%) / 642 | 38.42 | 0.83 |
| 2 | ABC transporter, ATP-binding protein | 738 / 9382 | 1 (<1%) / 1 | 39 (83%) / 264 | 682 (94.1%) / 8998 | 16 (55.2%) / 119 | 12.71 | 0.29 |
| 3 | Sensor histidine kinase | 574 / 5575 | – | 22 (46.8%) / 294 | 550 (75.9%) / 5276 | 2 (6.9%) / 5 | 9.71 | 0.17 |
| 4 | DNA-binding response regulator | 578 / 4708 | – | 13 (27.7%) / 33 | 562 (77.5%) / 4669 | 3 (10.3%) / 6 | 8.20 | 0.15 |
| 5 | Methyl-accepting chemotaxis protein | 408 / 4389 | 1 (<1%) / 1 | 15 (31.9%) / 64 | 391 (53.9%) / 4318 | 1 (3.4%) / 6 | 10.76 | 0.14 |
| 6 | ABC transporter, permease protein | 580 / 4377 | – | 33 (70.2%) / 137 | 545 (75.2%) / 4238 | 2 (6.9%) / 2 | 7.55 | 0.14 |
| 7 | Glycosyltransferase (EC 2.4.1.-) | 649 / 4172 | – | 41 (87.2%) / 287 | 598 (82.5%) / 3863 | 10 (34.5%) / 22 | 6.43 | 0.13 |
| 8 | Transcriptional regulator, LysR family | 441 / 4037 | – | 8 (17%) / 10 | 430 (59.3%) / 4017 | 3 (10.3%) / 10 | 9.15 | 0.13 |
| 9 | Transcriptional regulator, TetR family | 535 / 3709 | – | 14 (29.8%) / 45 | 521 (71.9%) / 3664 | – | 6.93 | 0.12 |
| 10 | Acetyltransferase, GNAT family | 480 / 3516 | – | 19 (40.4%) / 53 | 458 (63.2%) / 3453 | 3 (10.3%) / 10 | 7.33 | 0.11 |
| 11 | Transcriptional regulator, AraC family | 459 / 3382 | 1 (<1%) / 1 | 7 (14.9%) / 7 | 450 (62.1%) / 3373 | 1 (3.4%) / 1 | 7.37 | 0.11 |
| 12 | Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3) | 589 / 2995 | – | 31 (66%) / 68 | 533 (73.5%) / 2728 | 25 (86.2%) / 199 | 5.08 | 0.09 |
| 13 | Transcriptional regulator, MarR family | 546 / 2905 | – | 23 (48.9%) / 71 | 522 (72%) / 2831 | 1 (3.4%) / 3 | 5.32 | 0.09 |
| 14 | Permeases of the major facilitator superfamily | 393 / 2733 | 1 (<1%) / 1 | 12 (25.5%) / 22 | 375 (51.7%) / 2701 | 5 (17.2%) / 9 | 6.95 | 0.09 |
| 15 | Acetyltransferase (EC 2.3.1.-) | 559 / 2436 | 4ᵃ / 4 | 22 (46.8%) / 57 | 532 (73.4%) / 2374 | 5 (17.2%) / 5 | 4.36 | 0.08 |
| 16 | Cysteine desulfurase (EC 2.8.1.7) | 783 / 2362 | – | 36 (76.6%) / 66 | 722 (99.6%) / 2239 | 25 (86.2%) / 57 | 3.02 | 0.07 |
| 17 | 3-oxoacyl-[acyl-carrier protein] reductase (EC 1.1.1.100) | 706 / 1975 | – | 27 (57.4%) / 68 | 665 (91.7%) / 1863 | 14 (48.3%) / 44 | 2.80 | 0.06 |
| 18 | Integrase | 534 / 1829 | 70 (5.2%) / 70 | 11 (23.4%) / 19 | 448 (61.8%) / 1729 | 5 (17.2%) / 11 | 3.43 | 0.06 |
| 19 | Outer membrane protein | 415 / 1803 | 1 (<1%) / 1 | 10 (21.3%) / 12 | 402 (55.4%) / 1786 | 2 (6.9%) / 4 | 4.34 | 0.06 |
| 20 | Permease of the drug/metabolite transporter (DMT) superfamily | 518 / 1746 | – | 28 (59.6%) / 53 | 486 (67%) / 1688 | 4 (13.8%) / 5 | 3.37 | 0.05 |
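nG: number of genomes in which the functional role is present at least once; Count: number of genes in all sequenced genomes; V, A, B, E: viruses, archaea, bacteria and eukarya, respectively (in these columns, the first value is nG, with the percentage of that domain's sequenced genomes in parentheses, and the second is Count; –, absent); C/n: average number of genes per positive genome (Count/nG); %: Count as a percentage of the total number of genes in all genomes (n = 3 204 918).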
ᵃ Acetyltransferase-like proteins that were missed in the automated analysis; these are not included in the nG and Count totals.
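The derived columns can be recomputed from the raw counts, which also checks the table's internal consistency. The Python sketch below transcribes three rows by hand (the row selection and the `ROWS` layout are illustrative choices). It treats "–" cells as 0 and leaves out the footnoted viral acetyltransferase-like proteins, because the rank 15 totals add up only when they are excluded.

```python
# Total number of genes in all sequenced genomes (n in the table footnote).
TOTAL_GENES = 3_204_918

# Rows transcribed from the table: (nG, Count, per-domain nG, per-domain Count),
# with domains in the order V, A, B, E and "–" cells entered as 0.
ROWS = {
    "Transposase": (693, 26_625, (15, 31, 630, 17), (21, 736, 25_226, 642)),
    "ABC transporter, ATP-binding protein": (
        738, 9_382, (1, 39, 682, 16), (1, 264, 8_998, 119)),
    # Rank 15: the footnoted viral entries are not part of the totals.
    "Acetyltransferase (EC 2.3.1.-)": (
        559, 2_436, (0, 22, 532, 5), (0, 57, 2_374, 5)),
}


def derived_columns(n_g, count):
    """Return (C/n, %) rounded to two decimals, as printed in the table."""
    c_per_n = count / n_g                 # average genes per positive genome
    percent = 100 * count / TOTAL_GENES   # share of all genes in all genomes
    return round(c_per_n, 2), round(percent, 2)


for role, (n_g, count, domain_ng, domain_count) in ROWS.items():
    # The per-domain values should add up to the row totals.
    assert sum(domain_ng) == n_g, role
    assert sum(domain_count) == count, role
    print(role, derived_columns(n_g, count))
```

For these three rows, the printed C/n and % values match those in the table.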
The 20 most abundant functional roles in metagenomes
| Rank | Functional role | nMG | nCAI |
|---|---|---|---|
| 1 | Transposase | 178 | 4026.17 |
| 2 | Retrotransposon-related p150 protein | 69 | 3412.12 |
| 3 | Viral structural protein | 126 | 1909.75 |
| 4 | ABC transporter, ATP-binding protein | 170 | 1528.03 |
| 5 | Replication-associated protein | 32 | 1481.67 |
| 6 | Photosystem II CP43 protein (PsbC) | 47 | 1429.44 |
| 7 | Photosystem II protein D2 (PsbD) | 71 | 1224.89 |
| 8 | Replication protein Rep | 66 | 1213.18 |
| 9 | Photosystem II protein D1 (PsbA) | 83 | 930.20 |
| 10 | Cytochrome b6-f complex subunit, cytochrome b6 | 51 | 925.32 |
| 11 | Viral nonstructural protein | 39 | 847.57 |
| 12 | ATP synthase alpha chain (EC 3.6.3.14) | 157 | 804.47 |
| 13 | Ribonucleotide reductase of class Ia (aerobic), alpha subunit (EC 1.17.4.1) | 165 | 776.57 |
| 14 | Thymidylate synthase thyX (EC 2.1.1.-) | 140 | 771.16 |
| 15 | Single-stranded DNA-binding protein | 151 | 769.41 |
| 16 | Major capsid protein | 100 | 745.51 |
| 17 | ATP synthase beta chain (EC 3.6.3.14) | 156 | 661.21 |
| 18 | UDP-glucose 4-epimerase (EC 5.1.3.2) | 169 | 657.36 |
| 19 | Ribonucleotide reductase of class Ia (aerobic), beta subunit (EC 1.17.4.1) | 150 | 652.32 |
| 20 | Integrase | 164 | 633.18 |
nMG: number of metagenomes in which the functional role is present at least once; nCAI: normalized cumulative abundance index. For each metagenome, a normalized abundance index (nAI) was calculated for each functional role as its relative, length-normalized count per million environmental gene tags (EGTs); the nAI values for each functional role were then summed across all metagenomes to give the nCAI.
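The two-step index in the footnote can be sketched in Python. The exact length normalization is not specified here, so the sketch assumes a hypothetical per-role `length_factor` (for example, the role's mean protein length divided by the mean length of all proteins). It divides each role's hit count by that factor before scaling to hits per million EGTs. The input format is likewise an assumption.

```python
from collections import defaultdict


def ncai(metagenomes, length_factor):
    """Normalized cumulative abundance index (nCAI) for each functional role.

    metagenomes: dict mapping metagenome ID to (total_egts, {role: hit_count}).
    length_factor: HYPOTHETICAL dict mapping role to a length-normalization
        factor; the study's exact normalization is not reproduced here.
    """
    totals = defaultdict(float)
    for total_egts, hits in metagenomes.values():
        for role, count in hits.items():
            # nAI: length-normalized hits per million EGTs in this metagenome.
            nai = count / length_factor[role] * 1_000_000 / total_egts
            # nCAI: sum of the nAI values over all metagenomes.
            totals[role] += nai
    return dict(totals)


# Made-up example: two metagenomes of different sequencing depth.
example = {
    "mg_1": (200_000, {"Transposase": 50, "DnaK": 10}),
    "mg_2": (500_000, {"Transposase": 20}),
}
scores = ncai(example, {"Transposase": 1.0, "DnaK": 2.0})
```

Here Transposase scores 250 + 40 = 290 and DnaK scores 25. Scaling to hits per million EGTs keeps deeply sequenced metagenomes from dominating the sum. Roles absent from a metagenome contribute nothing to its total.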
Figure 2. The normalized cumulative abundance indices (nCAI) of different functional roles in 187 metagenomes plotted against the ubiquity of these functional roles (defined as the number of metagenomes in which the functional role is represented at least once). r, Pearson’s product moment correlation between abundance and ubiquity; DNA Pol, DNA polymerase; dTDP-G 4,6 DH, dTDP-glucose 4,6-dehydratase; Rep, replication-associated protein; RNR, ribonucleotide reductase; SSB, single-stranded DNA-binding protein; ThyX, thymidylate synthase thyX (EC 2.1.1.-); UDP-G 4-epi, UDP-glucose 4-epimerase.
The 20 most ubiquitous functional roles in metagenomes
| Rank | Functional role | nMG | % |
|---|---|---|---|
| 1 | Transposase | 178 | 95.19 |
| 2 | DNA polymerase I (EC 2.7.7.7) | 171 | 91.44 |
| 3 | dTDP-glucose 4,6-dehydratase (EC 4.2.1.46) | 170 | 90.91 |
| 4 | DNA polymerase III alpha subunit (EC 2.7.7.7) | 170 | 90.91 |
| 5 | ABC transporter, ATP-binding protein | 170 | 90.91 |
| 6 | UDP-glucose 4-epimerase (EC 5.1.3.2) | 169 | 90.37 |
| 7 | Heat shock protein 60 family chaperone GroEL | 167 | 89.30 |
| 8 | Chaperone protein DnaK | 167 | 89.30 |
| 9 | Ribonucleotide reductase of class II (coenzyme B12-dependent) (EC 1.17.4.1) | 166 | 88.77 |
| 10 | Ribonucleotide reductase of class Ia (aerobic), alpha subunit (EC 1.17.4.1) | 165 | 88.24 |
| 11 | Replicative DNA helicase (EC 3.6.1.-) | 165 | 88.24 |
| 12 | Integrase | 164 | 87.70 |
| 13 | Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3) | 164 | 87.70 |
| 14 | Phosphate starvation-inducible protein PhoH, predicted ATPase | 163 | 87.17 |
| 15 | Carbamoyl-phosphate synthase large chain (EC 6.3.5.5) | 163 | 87.17 |
| 16 | DNA primase (EC 2.7.7.-) | 163 | 87.17 |
| 17 | Glycosyltransferase | 163 | 87.17 |
| 18 | Valyl-tRNA synthetase (EC 6.1.1.9) | 163 | 87.17 |
| 19 | Thymidylate synthase (EC 2.1.1.45) | 163 | 87.17 |
| 20 | ATP-dependent Clp protease ATP-binding subunit clpX | 162 | 86.63 |
nMG: number of metagenomes in which the functional role is present at least once; %: nMG as a percentage of the total number of metagenomes analyzed (187).
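The percentage column is simply nMG divided by the 187 metagenomes analyzed. A short Python check, using values transcribed from the table, reproduces the printed percentages after rounding to two decimals.

```python
N_METAGENOMES = 187  # total number of metagenomes analyzed (table footnote)


def ubiquity_percent(n_mg, total=N_METAGENOMES):
    """nMG as a percentage of all metagenomes, rounded to two decimals."""
    return round(100 * n_mg / total, 2)


# (functional role, nMG) pairs transcribed from the table.
for role, n_mg in [
    ("Transposase", 178),
    ("DNA polymerase I (EC 2.7.7.7)", 171),
    ("ATP-dependent Clp protease ATP-binding subunit clpX", 162),
]:
    print(role, ubiquity_percent(n_mg))
```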
Figure 3. Word clouds (created on http://www.wordle.net) representing (A) the 100 most abundant functional roles (Supplementary Table S3) and (B) the 100 most ubiquitous functional roles (Supplementary Table S4) in metagenomes. The font size of each functional role is proportional to its (A) abundance index or (B) number of metagenomes in which it is present.
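The proportional scaling described in the caption amounts to a single ratio. The sketch below is an assumption about that mapping, not a description of Wordle's internals. It draws the largest value at a hypothetical `max_size` of 72 points and scales every other role linearly, so the ratio of font size to value is the same for all roles.

```python
def font_sizes(values, max_size=72.0):
    """Map each role to a font size proportional to its value.

    values: dict mapping functional role to its abundance index (panel A)
        or number of metagenomes (panel B).
    max_size: hypothetical size in points for the largest value.
    """
    top = max(values.values())
    return {role: max_size * v / top for role, v in values.items()}


# nCAI values transcribed from the abundance table.
sizes = font_sizes({"Transposase": 4026.17, "Integrase": 633.18})
```

With these inputs, Transposase is drawn at 72 points and Integrase at roughly 11 points.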