Celine Petitjean, Kira S Makarova, Yuri I Wolf, Eugene V Koonin.
Abstract
Origin of new biological functions is a complex phenomenon ranging from single-nucleotide substitutions to the gain of new genes via horizontal gene transfer or duplication. Neofunctionalization and subfunctionalization of proteins are often attributed to the emergence of paralogs that are subject to relaxed purifying selection or positive selection and thus evolve at accelerated rates. Such phenomena potentially could be detected as anomalies in the phylogenies of the respective gene families. We developed a computational pipeline to search for such anomalies in 1,834 orthologous clusters of archaeal genes, focusing on lineage-specific subfamilies that significantly deviate from the expected rate of evolution. Multiple potential cases of neofunctionalization and subfunctionalization were identified, including some ancient, house-keeping gene families, such as ribosomal protein S10, general transcription factor TFIIB and chaperone Hsp20. As expected, many cases of apparent acceleration of evolution are associated with lineage-specific gene duplication. On other occasions, long branches in phylogenetic trees correspond to horizontal gene transfer across long evolutionary distances. Significant deceleration of evolution is less common than acceleration, and the underlying causes are not well understood; functional shifts accompanied by increased constraints could be involved. Many gene families appear to be "highly evolvable," that is, include both long and short branches. Even in the absence of precise functional predictions, this approach allows one to select targets for experimentation in search of new biology. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution 2017. This work is written by US Government employees and is in the public domain in the US.
Keywords: archaea; clade-specific acceleration of evolution; evolutionary rate; neofunctionalization; subfunctionalization; tree anomalies
Year: 2017 PMID: 28985292 PMCID: PMC5737733 DOI: 10.1093/gbe/evx189
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1.—Computational pipeline for identification of evolutionary rate deviations from phylogenetic trees. The main steps of the computational pipeline developed and applied here for the detection of evolutionary rate deviations in phylogenetic trees of arCOGs are shown. (A) Detection of monophyletic groups (clades) in arCOG trees. The deepest clades satisfying the purity and coverage criteria are identified for all well-represented taxa in all arCOG trees. Empty circles indicate the tree clades that are analyzed; filled circles indicate the clades that were identified for the given taxa in the given tree. (B) Calculation of the distance deviation for selected clades. The table of observed clade heights is decomposed into the expected distances r and deviations e; the distance deviations (in the log scale) for all clades are recorded.
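Panel B can be illustrated with a minimal sketch. It assumes, for illustration only, a multiplicative model in which each observed clade height is the product of a gene-family term, a taxon term, and a deviation, so that the log height is the sum of a family term, a taxon term, and a log-scale deviation. This model, the alternating-means fit, the function name `log_deviations`, and the toy heights are assumptions of the sketch and are not taken from the pipeline itself.

```python
import math

def log_deviations(heights, n_iter=50):
    """Split log clade heights into family + taxon terms and residual deviations.

    heights: dict mapping (family, taxon) -> observed clade height (> 0).
    Returns a dict mapping (family, taxon) -> log-scale deviation.
    Assumed model (not from the source): log h = family term + taxon term + deviation.
    """
    logs = {key: math.log(h) for key, h in heights.items()}
    fam = {f: 0.0 for f, _ in logs}  # per-family term
    tax = {t: 0.0 for _, t in logs}  # per-taxon term
    for _ in range(n_iter):
        # Refit each family term as the mean of what the taxon terms leave over.
        for f in fam:
            vals = [x - tax[t] for (g, t), x in logs.items() if g == f]
            fam[f] = sum(vals) / len(vals)
        # Refit each taxon term as the mean of what the family terms leave over.
        for t in tax:
            vals = [x - fam[g] for (g, u), x in logs.items() if u == t]
            tax[t] = sum(vals) / len(vals)
    return {(f, t): x - fam[f] - tax[t] for (f, t), x in logs.items()}

# Hypothetical toy table: heights are exactly rate * depth, except one inflated clade.
rates = {"famA": 1.0, "famB": 2.0, "famC": 0.5}
depths = {"taxX": 1.0, "taxY": 3.0, "taxZ": 2.0}
heights = {(f, t): r * d for f, r in rates.items() for t, d in depths.items()}
heights[("famB", "taxY")] *= 4.0  # simulated accelerated clade

deviations = log_deviations(heights)
longest = max(deviations, key=deviations.get)
print(longest, round(deviations[longest], 3))
```

In this toy table the inflated clade receives the largest positive deviation. Part of the excess is absorbed into the fitted family and taxon terms, so the recovered deviation is smaller than log 4.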
Fig. 2.—Distribution of distance deviations across archaeal lineages and functional categories of genes. (A) Probability density functions of the distance deviations (log scale) for the clades belonging to the three major archaeal phyla (Euryarchaeota, Crenarchaeota, and Thaumarchaeota). (B) Probability density functions of the distance deviation (log scale) for the clades belonging to the four functional classes of arCOGs (Information Storage and Processing, Cellular Processes and Signaling, Metabolism, Poorly Characterized [PC]).
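Density curves of this kind can be sketched as follows. The Gaussian kernel estimator, the bandwidth, the function names, and the toy deviation values are assumptions made for illustration; the source does not state how the densities were estimated.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return pdf

def density_curves(groups, bandwidth=0.3, lo=-4.0, hi=4.0, steps=400):
    """Evaluate one density curve per group on a shared grid of deviation values."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    curves = {}
    for name, values in groups.items():
        pdf = gaussian_kde(values, bandwidth)
        curves[name] = [pdf(x) for x in grid]
    return grid, curves

# Hypothetical log-scale deviations per phylum (not data from the paper).
groups = {
    "Euryarchaeota": [-0.4, -0.1, 0.0, 0.2, 0.9],
    "Crenarchaeota": [-0.6, -0.2, 0.1, 0.3],
    "Thaumarchaeota": [-0.3, 0.0, 0.4, 1.2],
}
grid, curves = density_curves(groups)
```

Evaluating every group on the same grid makes the curves directly comparable; the same call would work for the four functional classes in panel B.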
Top 20 arCOGs with Long Branches
| arCOG, Protein, Annotation | Clade with Long and/or Short Branches (rank) | Comments |
|---|---|---|
| | ↑(1) | Family specific to archaea and eukaryotes. Multiple duplications in different archaeal lineages. |
| | ↓(2) | Sub- or neofunctionalization in the duplicated copies. See text for details. |
| | ↓(3) | |
| | ↑(6) | |
| | ↓(6) | |
| | ↑(9) | |
| | ↑(11) | |
| | ↑(16) | |
| | ↑(2) | No evidence of HGT. The only duplication in archaea most likely occurred in the common ancestor of |
| | ↓(75) | |
| | ↑(3) | Thermococcal sequences from the fast-evolving clade lack the N-terminal membrane-associated domain but are predicted to be active proteases. Probably regulated by a dedicated transcriptional regulator (arCOG05764), which is encoded in the same predicted operon. Likely sub- or neofunctionalization of this paralog. |
| | ↓(14) | |
| | ↑(4) | The slow-evolving and the “regular-evolving” paralogs in |
| | ↓(35) | |
| | ↑(5) | Probable HGT from bacteria to |
| | ↓(66) | |
| | ↑(7) | Multiple lineage-specific paralogs in |
| | ↑(8) | Duplication in the ancestor of |
| | ↓(55) | |
| | ↑(10) | Possible HGT from bacteria. Same pattern as in arCOG04406 [↑(96) |
| | ↑(12) | Probable duplication in the ancestor of |
| | ↑(13) | |
| | ↑(14) | Patchy distribution in archaea, with many cases of HGT from bacteria including the fast-evolving |
| | ↑(15) | Multiple paralogs. No evidence of HGT from bacteria. Likely neo- or subfunctionalization in both fast-evolving lineages. |
| | ↑(23) | |
| | ↑(17) | |
| | ↓(27) | |
| | ↑(81) | |
| | ↑(18) | HGT of the entire operon from either |
| | ↑(19) | A second, highly diverged copy, in addition to the typical archaeal version. Predicted to be an active enzyme. Lacks ∼80 aa N-terminal region responsible for dimerization ( |
| | ↑(20) | HGT from bacteria to |
| | ↑(21) | Casposon-associated Cas1. Clear case of neofunctionalization. Most likely, Cas1 exapted by the CRISPR-Cas system from a casposon ( |
| | ↑(22) | See arCOG04143 above. |
| | ↑(24) | Likely a single HGT from bacteria to the three taxa |
| | ↑(25) | Multiple paralogs. HGT from bacteria to |
| | ↓(23) | |
Upward arrow shows an exceptionally long branch and downward arrow shows an exceptionally short branch.
For additional information, see ftp://ftp.ncbi.nlm.nih.gov/pub/wolf/_suppl/archdev.
Top 10 arCOGs with Short Branches
| arCOG, Protein, Annotation | Clade with Short and/or Long Branches (rank) | Comments |
|---|---|---|
| | ↓(1) | Large family with multiple paralogs ( |
| | ↑(35) | |
| | ↓(57) | |
| | ↓(4) | Large family with multiple paralogs. The short branches in |
| | ↑(34) | |
| | ↓(5) | Likely xenologous gene displacement via HGT from |
| | ↓(7) | Likely multiple HGT events between archaea and bacteria. Potential coevolution with uncharacterized protein of arCOG01766 family. Possible subfunctionalization. |
| | ↓(8) | Probable HGT from bacteria to |
| | ↓(9) | Duplication in the common ancestor of |
| | ↑(44) | |
| | ↓(10) | Large family with multiple paralogs; many apparent cases of HGT. Metal-binding site conserved in all enzymes ( |
| | ↓(11) | Two paralogs in most of |
| | ↓(12) | No paralogs, archaea-specific. In a conserved neighborhood with an uncharacterized Zn-finger containing protein (arCOG00578). No new function predicted; the cause behind strong purifying selection is not clear. |
| | ↓(60) | |
| | ↓(13) | No paralogs, archaea-specific. No new function predicted; the cause behind strong purifying selection is not clear. |
Upward arrow shows an exceptionally long branch and downward arrow shows an exceptionally short branch.
For additional information, see ftp://ftp.ncbi.nlm.nih.gov/pub/wolf/_suppl/archdev.
Fig. 3.—Phylogeny and conserved genomic neighborhoods of ribosomal protein S10 and its paralog. (A) Schematic phylogenetic tree of the S10 ribosomal protein family in Bacteria and Archaea. Collapsed branches are shown by triangles and denoted by the taxon name. Color code: Archaea, black; Bacteria, light blue. (B) Schematic phylogenetic tree of the S10 ribosomal protein family in Archaea. Collapsed branches are shown by triangles and denoted by the taxon name. The branch identified as fast evolving is shown by a green rectangle and the regular branch by a gray rectangle. Color code: Euryarchaeota, dark blue; Crenarchaeota, light blue; Thaumarchaeota and Caldiarchaeum subterraneum, magenta; Korarchaeota, brown; Nanoarchaeota, pink. The conserved gene neighborhoods are shown next to the respective branches. Genes in these neighborhoods are shown schematically by arrows (not to scale). Abbreviations: RpsJ, ribosomal protein S10; TEF1, translation elongation factor EF-1 alpha, GTPase; FusA, translation elongation factor G, EF-G (GTPase); RpsG, ribosomal protein S7; AbgB, metal-dependent amidase/aminoacylase/carboxypeptidase; 1078, arCOG01078 NUDIX family hydrolase.
Fig. 4.—Phylogeny, conserved genomic neighborhoods, domain organization, and phyletic patterns of the TFIIB family. (A) Schematic phylogenetic tree of the TFIIB family. The maximum likelihood tree is from previous work (Makarova et al. 2015). Collapsed branches are shown by triangles and denoted by the corresponding taxon name. Branches for multiple Halobacterial paralogs are labelled by numbers (1–3) and letters (a–e). For functionally characterized genes of Halobacterium salinarum NRC-1 (Turkarslan et al. 2014), the respective gene name is indicated in orange to the right of the corresponding branch. Branches identified in this work as fast evolving are shown by green rectangles, and slow evolving branches are shown by orange rectangles. Color code is the same as in figure 3. The scale bar indicates a “number of substitutions per site.” The “//” symbol indicates a long branch that is shown not to scale. Conserved gene neighborhoods are shown next to the respective branches (not included for branches in which the neighborhood was not conserved). Genes in these neighborhoods are depicted schematically by arrows (not to scale). Homologous genes are shown by the same color. Abbreviations: HTH, helix-turn-helix; GAR1, small nucleolar RNP required for pre-mRNA processing; Hit, HIT family hydrolase. (B) Domain organization of TFIIB and its paralogs. (C) Phyletic patterns of TFIIB-like and TBP-like genes in Archaea. Filled circles show presence and empty circles show absence of the given arCOG in the respective archaeal lineage. The mean number of paralogs is indicated inside each circle. Asterisks indicate that the number of paralogs significantly varies in different genomes of the respective lineage.
Fig. 5.—Schematic phylogenetic tree of the Hsp20 family. The maximum likelihood tree was built in the course of previous work (Makarova et al. 2015). Collapsed branches are shown by triangles and denoted by the taxon name and a number. The branch identified in this work as fast evolving is shown by a green rectangle and the regular branch by a gray rectangle. Color code is the same as in figure 3.
Fig. 6.—Schematic phylogeny of TtdA (A) and FumA (B) families in Archaea and Bacteria. Collapsed branches are shown by triangles and denoted by the taxon name. Branches identified as long are shown by green rectangles and regular branches by gray rectangles. Color code: Methanobacteria, green; rest of the Archaea, black; Bacteria, light blue. Functions for the major clades are assigned according to Kronen and Berg (2015). The “//” symbol indicates a long branch that is shown not to scale. Two extremely long branches are interrupted; their actual lengths are ∼5-fold greater than shown (see Supplementary Material online). Conserved gene neighborhoods are shown next to the respective branches. Genes in these neighborhoods are shown schematically by arrows (not to scale). Homologous genes are shown by the same color. Abbreviations: PpsA, phosphoenolpyruvate synthase/pyruvate phosphate dikinase; MfnA, L-tyrosine decarboxylase, PLP-dependent protein; GltA, citrate synthase.
Fig. 7.—Schematic phylogeny of the IlvD family in Archaea and Bacteria. Designations are the same as in figures 3–5. Collapsed branches are shown by triangles. IlvD, light blue; EDD, dark blue. Color code: Methanocellales, green; the rest of the Archaea, black; Bacteria, light blue. The “//” symbol indicates a long branch that is shown not to scale. Two extremely long branches are interrupted; their actual lengths are ∼2-fold greater than shown (see Supplementary Material online).