Thomas F M Cummings, Kevin Gori, Luis Sanchez-Pulido, Gavriil Gavriilidis, David Moi, Abigail R Wilson, Elizabeth Murchison, Christophe Dessimoz, Chris P Ponting, Maria A Christophorou.
Abstract
Protein posttranslational modifications add great sophistication to biological systems. Citrullination, a key regulatory mechanism in human physiology and pathophysiology, is enigmatic from an evolutionary perspective. Although the citrullinating enzymes peptidylarginine deiminases (PADIs) are ubiquitous across vertebrates, they are absent from yeast, worms, and flies. Based on this distribution, PADIs were proposed to have been horizontally transferred, but this has been contested. Here, we map the evolutionary trajectory of PADIs into the animal lineage. We present strong phylogenetic support for a clade encompassing animal and cyanobacterial PADIs that excludes fungal and other bacterial homologs. The animal and cyanobacterial PADI proteins share functionally relevant primary and tertiary synapomorphic sequences that are distinct from a second PADI type present in fungi and actinobacteria. Molecular clock calculations and sequence divergence analyses using the fossil record estimate the last common ancestor of the cyanobacterial and animal PADIs to be less than 1 billion years old. Additionally, under an assumption of vertical descent, PADI sequence change during this evolutionary time frame is anachronistically low, even when compared with products of likely endosymbiont gene transfer, mitochondrial proteins, and some of the most highly conserved sequences in life. The consilience of evidence indicates that PADIs were introduced from cyanobacteria into animals by horizontal gene transfer (HGT). The ancestral cyanobacterial PADI is enzymatically active and can citrullinate eukaryotic proteins, suggesting that the PADI HGT event introduced a new catalytic capability into the regulatory repertoire of animals. This study reveals the unusual evolution of a pleiotropic protein modification.
Keywords: citrullination; enzyme; horizontal gene transfer; posttranslational modification
Year: 2022 PMID: 34730808 PMCID: PMC8826395 DOI: 10.1093/molbev/msab317
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
The Number and Proportion of Species Harboring a Putative PADI Ortholog.
| Group | NCBI Taxonomy ID | Unique Species with a PADI | Species with Proteomes in UniProtKB | Percentage of Species with a PADI |
|---|---|---|---|---|
| Bacteria | 2 | 295 | 38,842 | 0.76 |
| Cyanobacteria | 1,117 | 56 | 506 | 11.07 |
| Actinobacteria | 201,174 | 136 | 4,870 | 2.79 |
| Proteobacteria | 1,223 | 69 | 16,196 | 0.43 |
| Eukaryotes | 2,759 | 406 | 2,241 | 18.12 |
| Animals (Metazoa) | 33,208 | 229 | 612 | 37.42 |
| Insects | 50,557 | 0 | 142 | 0.00 |
| Worms (Annelida) | 6,340 | 0 | 2 | 0.00 |
| Fungi | 4,751 | 177 | 1,098 | 16.12 |
| Yeast (Ascomycota) | 4,890 | 176 | 760 | 23.16 |
| Yeast (Saccharomyces) | 4,930 | 0 | 13 | 0.00 |
| Plants (Viridiplantae) | 33,090 | 0 | 244 | 0.00 |
| Opisthokonta (metazoa and fungi) | 33,208 and 4,751 | 406 | 1,710 | 23.74 |
| Pre-opisthokonta (eukarya, not metazoa or fungi) | 2,759 and NOT (33,208 or 4,751) | 0 | 531 | 0.00 |
| Archaea | 2,157 | 1 | 2,107 | 0.05 |
| Viruses | 10,239 | 1 | 99,210 | 0.001 |
Note.—HMM searches (https://www.ebi.ac.uk/Tools/hmmer, last accessed June 7, 2020) for similarity to the vertebrate PAD_C domain from human PADI2 were carried out using HmmerWeb version 2.41.1 against the UniProtKB (v.2019_09) database. Unique species with significant sequence similarity (E-value < 1 × 10⁻³) are presented. Proportions are given relative to the total number of species within UniProtKB for each group.
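The percentage column can be reproduced directly from the two count columns. The short Python sketch below does this for a subset of the rows; the counts are copied from the table, while the dictionary layout and the function name `percent_with_padi` are illustrative choices, not part of the original analysis.

```python
# Counts copied from the table above:
# (unique species with a PADI, species with proteomes in UniProtKB)
counts = {
    "Bacteria": (295, 38842),
    "Cyanobacteria": (56, 506),
    "Actinobacteria": (136, 4870),
    "Eukaryotes": (406, 2241),
    "Animals (Metazoa)": (229, 612),
    "Fungi": (177, 1098),
}


def percent_with_padi(with_padi, total):
    """Percentage of species with a putative PADI, rounded to two decimals."""
    return round(100 * with_padi / total, 2)


# Recompute the final column for each group.
percentages = {group: percent_with_padi(*pair) for group, pair in counts.items()}

for group, pct in percentages.items():
    print(f"{group}: {pct:.2f}%")
```

Because each percentage is relative to the number of species with proteomes in UniProtKB, groups with few sequenced proteomes (for example, the two annelid species) give less reliable proportions than well-sampled groups.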
Fig. 1. Phylogeny of the PADI sequence. (a) Consensus topology for all phylogenetic methods with branch lengths from Bayesian phylogenetic inference with MrBayes. Solid circles indicate consensus node support of >95%. (b) Summary table of the different phylogenetic analyses performed, corresponding to trees shown in full in the Supplementary Material online. Ultrafast bootstrap 2 values with 1,000 replicates for trees 1, 3, 4, 5, 6, and 7; Felsenstein bootstrap values with 100 replicates for tree 2; or posterior probabilities for trees 10 and 11 are presented in the table for the nodes labeled in the tree that are critical to different evolutionary scenarios. Log likelihoods and the Bayesian information criterion are presented for all maximum likelihood trees. In addition, maximum likelihood constraint trees 8 and 9 were constructed in which opisthokonta were constrained to be monophyletic under the maximum likelihood models used for tree 1 and tree 5. Trees were concatenated and analyzed using the AU-test with 10,000 replicates. Nomenclature for the different models is as used in IQ-TREE 1.6.12. The best supported maximum likelihood tree and the Bayesian trees are shown in bold.
Fig. 2. Synapomorphic features among PADI orthologs. (a) Alignment of putative PAD_N domains from SPM/NX clade cyanobacterial PADI sequences with the PAD_N domain from human PADI paralogs and Rhincodon typus (whale shark). The coloring scheme indicates the average BLOSUM62 scores of each alignment column: red (>3.5), violet (between 3.5 and 2), and light yellow (between 2 and 0.5). Peach arrows shown below the cyanobacterial sequences indicate PsiPred predicted secondary structure (beta sheets). Green arrows (beta sheets) correspond to the known secondary structure of the PAD_N domain of human PADI2. (b) Analysis of synapomorphic regions, representing six PADI sequences from each of metazoa, cyanobacteria, actinobacteria, and fungi. Consensus sites across the six species are shown with standard single letter amino acid abbreviations. “nc” (nonconserved) represents the absence of consensus conservation to one or two amino acids across the six species. The numbering given above the alignment corresponds to the ungapped site of human PADI2 such that residues can be compared with Slade et al. Sites showing conservation across all four domains are colored in green; sequence features common to metazoan and cyanobacterial PADIs that are excluded from fungal/actinobacterial sequences are colored in purple; sequence features common to fungal and actinobacterial PADIs that are excluded from metazoan and cyanobacterial sequences are colored in yellow. The existence of both purple and yellow sequence features is indicative of synapomorphic primary sequence features. (c) Crystal structure of human PADI2 presented with PAD_N domain colored in black, PAD_M domain in gray, and PAD_C domain in white. Synapomorphic regions are colored in cyan and calcium ions are shown as yellow spheres.
Fig. 3. Sequence divergence analyses. (a, b) Analysis of the sequence divergence of 26 vertically transferred proteins, 19 candidate EGT proteins, and ten proteins encoded in the mitochondrial genome. (a) Box and whisker plot showing the calculated AGD between Cyanothece sp. PCC 8801 and Branchiostoma floridae relative to Homo sapiens. (b) Box and whisker plot showing the normalized Δbitscore between B. floridae and H. sapiens. The cross represents the mean. All protein values are plotted, with outliers exceeding 1.5× the interquartile range shown. The null hypothesis that PADIs fall within the normal distribution of each set of proteins was rejected with P < 0.0001 denoted as ***; or P < 0.05 denoted as *. (c, d) Estimated divergence time of late diverging SPM/NX clade cyanobacteria and metazoa based on their PADI sequences, as calibrated using geologically defined constraints from the fossil record. Metazoan and SPM/NX DNA sequences were used for Bayesian phylogenetic analysis in BEAST2 under the strict clock and the UCLN clock models. A calibrated Yule model was used as the tree prior using a GTR model with five gamma distributed rate categories. Divergence times from the fossil record were used as normally distributed node age priors centered on the median ages of six different nodes from metazoa with a sigma value covering the uncertainty of the estimate. The marginal posterior distribution of the age of the root of the whole tree was used to estimate the divergence time. (c) Box and whisker plot for the estimated divergence time from each analysis showing two independent runs per analysis. (d) Kernel density estimate for each analysis showing two independent runs per analysis. (e) Table of summary statistics for the estimated divergence time.
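The box and whisker plots in panels (a) and (b) flag values lying more than 1.5× the interquartile range beyond the quartiles. A minimal Python sketch of that rule follows. The function name `tukey_outliers`, the quartile convention (`method="inclusive"`), and the example values are assumptions for illustration only; they are not data or code from the study, and plotting software may compute quartiles slightly differently.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # Quartiles with linear interpolation; this convention is an assumption here.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical values for illustration only; not data from the study.
example = [1.0, 1.2, 1.3, 1.4, 1.5, 1.6, 1.8, 5.0]
print(tukey_outliers(example))
```

With these hypothetical values, only the point far above the upper fence is flagged as an outlier.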
Fig. 4. Biochemical analyses of the cyanobacterial PADI enzyme from Cyanothece sp. 8801 (cyanoPADI). (a) The citrullination reaction results in the conversion of a positively charged peptidyl arginine residue to a neutral peptidyl citrulline, and it is carried out by PADI enzymes in a calcium-dependent manner. (b, c) Immunoblot analyses of citrullination assays using GST-His-tagged recombinant enzymes. (b) Whole cell lysates from mouse embryonic stem cells were used as substrate and the presence of citrullination in a protein sequence-independent manner was assessed using the ModCit antibody. Nucleophosmin (NPM1) is used as a loading control. (c) Recombinant human histone H3 was used as substrate and citrullination of H3 arginine 2 was assessed. Total histone H3 is used as a loading control.
Fig. 5. Proposed model of PADI evolution. Domain architecture is denoted in the figure legend. Horizontal transfer of the three-domain sequence from cyanobacteria to metazoa is denoted by a black arrow, likely horizontal transfer of the two-domain sequence from actinobacteria to fungi by a dark gray arrow, and transfer of the mitochondrion to the LECA by a light gray arrow. The proposed origin of the PADI sequence is within bacterial evolution, and the emergence of the three-domain PADI is within the SPM/NX cyanobacterial clade. Gene losses observed in various metazoan lineages after the HGT are indicated with a narrow dashed line.