Matthew S Fullmer, Matthew Ouellette, Artemis S Louyakis, R Thane Papke, Johann Peter Gogarten.
Abstract
Restriction–modification (RM) systems in bacteria are implicated in multiple biological roles, ranging from defense against parasitic genetic elements to selfish addiction cassettes and barriers to gene transfer and lineage homogenization. In bacteria, DNA methylation without cognate restriction also plays important roles in DNA replication, mismatch repair, protein expression, and in biasing DNA uptake. Little is known about archaeal RM systems and DNA methylation. To better understand the role of RM systems and DNA methylation in Archaea, we surveyed the presence of RM system genes and related genes, including orphan DNA methylases, in the halophilic archaeal class Halobacteria. Our results reveal that some orphan DNA methyltransferase genes were highly conserved among lineages, indicating an important functional constraint, whereas RM systems demonstrated patchy patterns of presence and absence. This irregular distribution is due to frequent horizontal gene transfer and gene loss, a finding suggesting that the evolution and life cycle of RM systems may be best described as that of a selfish genetic element. A putative target motif (CTAG) of one of the orphan methylases was underrepresented in all of the analyzed genomes, whereas another motif (GATC) was overrepresented in most of the haloarchaeal genomes, particularly in those that encoded the cognate orphan methylase.
Keywords: DNA methylase; HGT; archaea; epigenetics; gene transfer; haloarchaea; methylation; restriction; selfish genes
Year: 2019 PMID: 30893937 PMCID: PMC6471742 DOI: 10.3390/genes10030233
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
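The abstract's statement that CTAG is underrepresented and GATC overrepresented can be illustrated with an observed/expected ratio. The following is a minimal sketch, not the authors' method: it assumes a simple zero-order model in which the expected count is derived from single-base frequencies, and the function names are hypothetical. Both motifs are palindromic, so scanning one strand is sufficient for them.

```python
from collections import Counter


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlapping matches."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def observed_expected_ratio(seq: str, motif: str) -> float:
    """Ratio of observed motif count to the count expected by chance.

    Assumption (not from the paper): bases are independent, so the
    expected count is (number of windows) * product of base frequencies.
    A ratio below 1 suggests underrepresentation, above 1 overrepresentation.
    """
    seq = seq.upper()
    motif = motif.upper()
    n, k = len(seq), len(motif)
    if n < k:
        raise ValueError("sequence is shorter than the motif")
    base_counts = Counter(seq)
    expected = float(n - k + 1)
    for base in motif:
        expected *= base_counts[base] / n
    if expected == 0:
        raise ValueError("motif contains a base absent from the sequence")
    return count_overlapping(seq, motif) / expected


if __name__ == "__main__":
    toy_genome = "GATC" * 10  # toy sequence, not real data
    print(observed_expected_ratio(toy_genome, "GATC"))
    print(observed_expected_ratio(toy_genome, "CTAG"))
```

A real analysis would more likely use a higher-order background model (for example, one based on shorter k-mer frequencies), since base composition alone ignores dinucleotide biases.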
Table 1. Collapsed homologous group descriptions $.
| Alpha Code | Numerical Code | Annotated arCOG Function $$ | arCOG Number |
|---|---|---|---|
| A | cHG_021 | T_I_M | arCOG02632 |
| B | cHG_024 | T_I_M | arCOG05282 |
| C | cHG_018 | T_I_R | arCOG00880 |
| D | cHG_034 | T_I_R | arCOG00879 |
| E | cHG_045 | T_I_R | arCOG00878 |
| F | cHG_006 | T_I_S | arCOG02626 |
| G | cHG_025 | T_I_S | arCOG02628 |
| H | cHG_036 | probable_T_II_M | arCOG00890 |
| I | cHG_001 | T_II_M | arCOG02635 |
| J | cHG_003 | T_II_M | arCOG02634 |
| K | cHG_011 | T_II_M | arCOG04814 |
| L | cHG_033 | T_II_M | arCOG03521 |
| M | cHG_007 | T_II_R | arCOG11279 |
| N | cHG_013 | T_II_R | arCOG11717 |
| O | cHG_023 | T_II_R | arCOG03779 |
| P | cHG_029 | T_II_R | arCOG08993 |
| Q | cHG_042 | Adenine_DNA_methylase_probable_T_III_M | arCOG00108 |
| R | cHG_008 | T_III_R | arCOG06887 |
| S | cHG_009 | T_III_R_probable | arCOG07494 |
| T | cHG_014 | Adenine_DNA_methylase | arCOG00889 |
| U | cHG_022 | DNA_methylase | arCOG00115 |
| V | cHG_027 | DNA_methylase | arCOG00129 |
| W | cHG_031 | dam_methylase | arCOG03416 |
| X | cHG_035 | probable_RMS_M | arCOG08990 |
| Y | cHG_044 | dcm_methylase | arCOG04157 |
| Z | cHG_048 | Adenine_DNA_methylase | arCOG02636 |
| AA | cHG_010 | RNA_methylase | arCOG00910 |
| AB | cHG_040 | SAM-methylase | arCOG01792 |
| AC | cHG_012 | RestrictionEndonuclease | arCOG05724 |
| AD | cHG_038 | PredictedRestrictionEndonuclease | arCOG06431 |
| AE | cHG_015 | HNH_endonuclease | arCOG07787 |
| AF | cHG_019 | Endonuclease | arCOG02782 |
| AG | cHG_020 | Endonuclease | arCOG02781 |
| AH | cHG_004 | HNH_endonuclease | arCOG09398 |
| AI | cHG_037 | HNH_nuclease | arCOG05223 |
| AJ | cHG_039 | HNH_nuclease | arCOG03898 |
| AK | cHG_041 | HNH_nuclease | arCOG08099 |
| AL | cHG_046 | MBF1 | arCOG01863 |
| AM | cHG_028 | CBS_domain | arCOG00608 |
| AN | cHG_005 | MarR | arCOG03182 |
| AO | cHG_030 | ParB-like nuclease | arCOG01875 |
| AP | cHG_016 | GVPC | arCOG06392 |
| AQ | cHG_002 | ASCH domain RNA binding | arCOG01734 |
| AR | cHG_017 | Uncharacterized | arCOG10082 |
| AS | cHG_026 | Uncharacterized | arCOG13171 |
| AT | cHG_032 | Uncharacterized | arCOG08946 |
| AU | cHG_043 | Uncharacterized | arCOG08856 |
| AV | cHG_047 | Uncharacterized | arCOG04588 |
$: A listing of associated Gene Ontology terms and gene family descriptions is available in Table S2. $$: T_I and T_II denote type I and type II restriction enzymes, respectively. M, R, and S denote the methylase, restriction endonuclease, and specificity subunits, respectively.
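The annotation labels in the table follow the naming scheme described in the footnote, so they can be decoded mechanically. The sketch below is illustrative only: `parse_annotation` and `RMAnnotation` are hypothetical names, and reading `T_III` as type III is an assumption, since the footnote defines only T_I and T_II.

```python
import re
from typing import NamedTuple, Optional


class RMAnnotation(NamedTuple):
    system_type: Optional[str]  # "I", "II", "III", or None if no type is encoded
    subunit: Optional[str]      # decoded subunit name, or None
    probable: bool              # True if the label is flagged as "probable"


SUBUNITS = {
    "M": "methylase",
    "R": "restriction endonuclease",
    "S": "specificity",
}

# Matches T_I_M, T_II_R, T_III_M, ... anywhere in the label; the subunit
# letter must not be followed by another letter.
_PATTERN = re.compile(r"T_(I{1,3})_([MRS])(?![A-Za-z])")


def parse_annotation(label: str) -> RMAnnotation:
    """Decode a label such as 'probable_T_II_M' into its components."""
    probable = "probable" in label.lower()
    match = _PATTERN.search(label)
    if match is None:
        return RMAnnotation(None, None, probable)
    return RMAnnotation(match.group(1), SUBUNITS[match.group(2)], probable)


if __name__ == "__main__":
    for label in ("T_I_M", "probable_T_II_M", "T_III_R_probable", "dam_methylase"):
        print(label, parse_annotation(label))
```

Labels without a type prefix, such as `dam_methylase` or `HNH_endonuclease`, return no system type, which matches their status as orphan or ambiguously assigned candidates.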
Figure 1. Distribution of collapsed Homologous Groups (cHGs) among haloarchaeal genomes. (A) The number of genomes represented in each cHG. No cHG contains a representative from every genome used in this study. With the exception of one cHG, all contain members from fewer than half of the genomes. The cHGs are ordered by the number of genomes they contain. (B) Rarefaction plot of the number of genomes represented as cHGs accumulate. A 95% confidence interval is shown as the shaded blue area, and the yellow box-and-whisker plots give the number of taxa from random subsamples (permutations = 100) over 48 gene families.
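The rarefaction in panel (B) can be approximated by adding gene families in random order and recording how many distinct genomes have been seen after each addition. This is a minimal sketch under stated assumptions: the input is a hypothetical mapping from cHG name to the set of genomes that contain it, and `rarefaction_curve` is an illustrative name rather than the software used for the figure.

```python
import random
from statistics import mean
from typing import Dict, List, Set


def rarefaction_curve(
    families: Dict[str, Set[str]],
    permutations: int = 100,
    seed: int = 0,
) -> List[float]:
    """Mean number of genomes covered after adding 1, 2, ..., N families.

    Each permutation shuffles the family order and accumulates the union
    of genomes; the curve is the average over all permutations.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    names = list(families)
    totals: List[List[int]] = [[] for _ in names]
    for _ in range(permutations):
        rng.shuffle(names)
        seen: Set[str] = set()
        for i, name in enumerate(names):
            seen |= families[name]
            totals[i].append(len(seen))
    return [mean(counts) for counts in totals]


if __name__ == "__main__":
    # Toy presence data, not taken from the study.
    toy = {
        "cHG_a": {"g1", "g2", "g3"},
        "cHG_b": {"g2", "g4"},
        "cHG_c": {"g5"},
        "cHG_d": {"g1", "g5", "g6"},
    }
    print(rarefaction_curve(toy))
```

The per-step lists in `totals` are also what box-and-whisker plots or a confidence band would be drawn from.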
Figure 2. Presence–absence matrix of the 48 candidate RMS cHGs plotted against the reference phylogeny. For most cHGs, the pattern of presence–absence does not match the reference phylogeny (compare Figures S2–S5). RMS-candidate cHGs are loosely ordered by system type, with the ambiguously assigned RM candidates at the end. Table 1 gives a key relating the column names to the majority functional annotation.
Figure 3. Gene maps for syntenic clusters of gene families (A) EFA and (B) BFD found in a subset of organisms, identified to the right of each map. Genes are colored by gene family, with Type I methylases (families A and B) in grays, Type I restriction endonucleases (families D and E) in blues, and the Type I site specificity unit (family F) in green.
Table 2. Important traits of cHGs with four or more open reading frames (ORFs).
| Alpha (Numeric) cHG | No. of Taxa | No. of Transfers a | Function b | Predicted Recognition Sites c | Frequency e |
|---|---|---|---|---|---|
| I (001) | 16 | 9 | T_II_M | GAAGGC | 31% |
| | | | | GGRCA | 31% |
| J (003) | 38 | 21 | T_II_M | CANCATC | 53% |
| | | | | TAGGAG | 21% |
| AH (004) | 12 | 4 | HNH_endonuclease | GGCGCC | 89% |
| | | | | GATC | 11% |
| F (006) | 61 | 44 | T_I_S | GGAYNNNNNNTGG | 24% |
| | | | | CAGNNNNNNTGCT | 16% |
| R (008) | 14 | 0 | T_III_R | NA d | 100% |
| AA (010) | 55 | 15 | RNA_methylase | ATTAAT | 33% |
| K (011) | 137 | 97 | T_II_M | GCAAGG | 49% |
| | | | | GKAAYG | 28% |
| AC (012) | 8 | 5 | Restriction Endonuclease | GCGAA | 29% |
| | | | | CAACNNNNNTC | 29% |
| | | | | CTGGAG | 29% |
| T (014) | 130 | 93 | Adenine_DNA_methylase | GCAGG | 45% |
| | | | | AAGCTT | 32% |
| AE (015) | 21 | 13 | HNH_endonuclease | GGCGCC | 70% |
| | | | | YSCNS | 15% |
| AP (016) | 12 | 6 | GVPC | CANCATC | 83% |
| C (018) | 7 | 4 | T_I_R | AACNNNNNNGTGC | 73% |
| | | | | CTANNNNNNRTTC | 27% |
| AF (019) | 4 | 3 | Endonuclease | NA d | 100% |
| A (021) | 88 | 58 | T_I_M | GGAYNNNNNNTGG | 37% |
| | | | | GTCANNNNNNRTCA | 12% |
| | | | | CTCGAG | 9% |
| U (022) | 290 | 120 | DNA_methylase | CTAG | 59% |
| | | | | CATTC | 14% |
| | | | | CCCGGG | 7% |
| O (023) | 37 | 28 | T_II_R | NA d | 100% |
| B (024) | 16 | 8 | T_I_M | GAGNNNNNNVTGAC | 75% |
| | | | | GACNNNNNNRTAC | 19% |
| G (025) | 4 | 2 | T_I_S | GAGNNNNRTAA | 75% |
| | | | | GAGNNNNNTAC | 25% |
| V (027) | 5 | 1 | DNA_methylase | CATTC | 100% |
| AO (030) | 4 | 2 | ParB-like_nuclease | GATC | 75% |
| | | | | CTAG | 25% |
| W (031) | 153 | 70 | dam_methylase | GATC | 70% |
| | | | | AB/SAAM | 22% |
| AT (032) | 116 | 60 | Uncharacterized | GCAAGG | 43% |
| | | | | GKAAYG | 26% |
| | | | | GGTTAG | 14% |
| L (033) | 66 | 38 | T_II_M-033 | CAARCA | 40% |
| | | | | CTGAAG | 36% |
| D (034) | 16 | 11 | T_I_R-034 | GCANNNNNRTTA | 69% |
| | | | | GGCANNNNNNTTC | 19% |
| X (035) | 19 | 9 | probable_RMS_M | GGGAC | 83% |
| H (036) | 38 | 24 | probable_T_II_M | CCWGG | 42% |
| | | | | CCSGG | 18% |
| | | | | GTAC | 16% |
| AI (037) | 6 | 4 | HNH_nuclease | NA d | 100% |
| AJ (039) | 5 | 4 | HNH_nuclease | GGCGCC | 100% |
| AK (041) | 6 | 4 | HNH_nuclease | NA d | 100% |
| Q (042) | 21 | 8 | Adenine_DNA_methylase probable_T_III_M | RGTAAT | 71% |
| | | | | NA d | 19% |
| Y (044) | 179 | 110 | dcm_methylase | CGGCCG | 24% |
| | | | | GTCGAC | 13% |
| | | | | ACGT | 11% |
| E (045) | 58 | 42 | T_I_R | CCCNNNNNRTTGY | 63% |
| | | | | GCANNNNNRTTA | 28% |
| Z (048) | 54 | 35 | Adenine_DNA_methylase | CCRGAG | 36% |
| | | | | GTMKAC | 30% |
a Number of estimated horizontal gene transfer events. b T_I and T_II denote type I and type II restriction enzymes, respectively; M, R, and S denote the methylase, restriction endonuclease, and specificity subunits, respectively. c Top predicted recognition sites. d No predicted recognition site. e Frequency of predictions within the cHG.
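Many of the predicted recognition sites in the table use IUPAC ambiguity codes (for example, W in CCWGG or N in the split type I sites). The sketch below shows one way to expand such a site into a regular expression and locate it in a sequence. It is an illustration under stated assumptions: the function names are hypothetical, only the forward strand is scanned, and placeholder entries such as "NA" must be filtered out beforehand because both letters are also valid IUPAC codes.

```python
import re
from typing import List

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def site_to_regex(site: str) -> "re.Pattern[str]":
    """Compile a degenerate site into a regex; raises KeyError on non-IUPAC symbols."""
    pattern = "".join(IUPAC[symbol] for symbol in site.upper())
    # A lookahead makes each match zero-width, so overlapping sites are found.
    return re.compile("(?=(" + pattern + "))")


def find_sites(seq: str, site: str) -> List[int]:
    """Return 0-based start positions of the site on the forward strand."""
    return [m.start() for m in site_to_regex(site).finditer(seq.upper())]


if __name__ == "__main__":
    toy = "CCAGGTTCCTGGCCGGG"  # toy sequence, not real data
    print(find_sites(toy, "CCWGG"))
    print(find_sites(toy, "CCSGG"))
```

For non-palindromic sites, the reverse complement of the sequence would also need to be scanned to cover both strands.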
Figure 4. Heatmap of co-occurrence between the 48 RMS-candidate cHGs. A positive correlation indicates that two cHGs tend to co-occur, while a negative correlation indicates that the presence of one cHG is associated with the absence of the other. The significance level is p < 0.05 with a Bonferroni correction applied for multiple tests. Blue indicates a significant positive correlation; red indicates a significant negative correlation.
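The co-occurrence analysis in this caption can be sketched with a correlation between two presence/absence vectors and a Bonferroni-adjusted threshold. The example below makes two assumptions not stated in the caption: it uses the phi coefficient (Pearson correlation for binary data) as the correlation measure, and it counts one test per unordered pair of cHGs. The function names are hypothetical.

```python
from math import sqrt
from typing import Sequence


def phi_coefficient(a: Sequence[int], b: Sequence[int]) -> float:
    """Phi coefficient between two equal-length presence/absence vectors.

    +1 means the two cHGs always co-occur, -1 means they never do.
    """
    if len(a) != len(b):
        raise ValueError("vectors must cover the same genomes")
    n11 = sum(1 for x, y in zip(a, b) if x and y)
    n10 = sum(1 for x, y in zip(a, b) if x and not y)
    n01 = sum(1 for x, y in zip(a, b) if not x and y)
    n00 = sum(1 for x, y in zip(a, b) if not x and not y)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        raise ValueError("a vector that is all present or all absent has no variance")
    return (n11 * n00 - n10 * n01) / denom


def bonferroni_alpha(alpha: float, n_groups: int) -> float:
    """Per-test threshold, assuming one test per unordered pair of groups."""
    n_tests = n_groups * (n_groups - 1) // 2
    return alpha / n_tests


if __name__ == "__main__":
    # Toy vectors over six genomes, not taken from the study.
    chg_x = [1, 1, 1, 0, 0, 0]
    chg_y = [1, 1, 0, 0, 0, 0]
    print(phi_coefficient(chg_x, chg_y))
    print(bonferroni_alpha(0.05, 48))  # 48 cHGs give 1128 pairwise tests
```

Under these assumptions, a pairwise p-value would need to fall below 0.05 / 1128 to be shaded in the heatmap.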