Vivek Anantharaman, Kira S Makarova, A Maxwell Burroughs, Eugene V Koonin, L Aravind.
Abstract
BACKGROUND: The major role of enzymatic toxins that target nucleic acids in biological conflicts at all levels has become increasingly apparent, thanks in large part to the advances of comparative genomics. Typically, toxins evolve rapidly, which hampers the identification of these proteins by sequence analysis. Here we analyze an unexpectedly widespread superfamily of toxin domains, most of which possess RNase activity.
Year: 2013 PMID: 23768067 PMCID: PMC3710099 DOI: 10.1186/1745-6150-8-15
Source DB: PubMed Journal: Biol Direct ISSN: 1745-6150 Impact factor: 4.540
Figure 1. Multiple alignment of the HEPN superfamily. The multiple sequence alignment includes the conserved blocks based on the MUSCLE alignment [45], which was corrected manually on the basis of HHpred [46] and PSI-BLAST results [47]. Due to the low similarity, the alignment of helices 1, 2.1 and 4 should be considered tentative. Secondary structure, which is a consensus between the proteins with solved structures, is shown above the alignment; ‘H’ indicates α-helix. The sequences are denoted by their GI numbers and species names. The HEPN family to which each sequence belongs is indicated after the species name. Positions of the first and the last residues of the aligned region in the corresponding protein are indicated for each sequence. The PDB identifiers for proteins with solved structure are indicated on the right. The numbers (of amino acid residues) within the alignment represent poorly conserved inserts that are not shown. The coloring is based on the consensus shown underneath the alignment; ‘h’ indicates hydrophobic residues (WFYMLIVACTH), ‘p’ indicates polar residues (EDKRNQHTS), and ‘s’ indicates small residues (ACDGNPSTV). Predicted catalytic amino acids are shown by reverse shading. The GI and species name are underlined if the HEPN domain has lost the conserved Rx4-6H motif.
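The residue classes and the Rx4-6H motif named in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `find_rx4_6h` and `column_classes` are hypothetical, and the 0.8 conservation threshold is an assumption, since the caption does not state the cutoff used for the consensus line. Only the three residue sets and the motif spacing (R, then 4 to 6 residues, then H) are taken from the caption.

```python
import re

# Residue classes exactly as listed in the Figure 1 caption.
CLASSES = {
    "h": set("WFYMLIVACTH"),  # hydrophobic
    "p": set("EDKRNQHTS"),    # polar
    "s": set("ACDGNPSTV"),    # small
}

# Rx4-6H: an arginine, then 4 to 6 residues of any kind, then a histidine.
# The lookahead reports one candidate per start position (the longest
# spacing that fits), so overlapping candidates are not skipped.
RX4_6H = re.compile(r"(?=(R.{4,6}H))")


def find_rx4_6h(seq):
    """Return (0-based start, matched text) for each Rx4-6H candidate."""
    return [(m.start(), m.group(1)) for m in RX4_6H.finditer(seq.upper())]


def column_classes(column, threshold=0.8):
    """Return class letters shared by at least `threshold` of a column.

    `column` is a string of residues from one alignment column; gap
    characters are ignored. The threshold is an assumed value.
    """
    residues = [c for c in column.upper() if c.isalpha()]
    if not residues:
        return []
    return [
        label
        for label, members in CLASSES.items()
        if sum(r in members for r in residues) / len(residues) >= threshold
    ]
```

A pattern match of this kind only flags candidate positions. As the caption notes, the catalytic residues are predicted, so a hit would still need support from the alignment or a solved structure.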
Classification, domain architectures, gene-neighborhoods and other salient features of HEPN proteins
| HEPN-T (PF05168) | D replaces conserved H in several cases | Standalone versions and fusions to MNT; in the case of Sacsin it is part of a multi-domain protein, with vertebrates showing a further fusion to a Ubiquitin-like domain and some animals showing a fusion to a Death domain. Several instances of genomic clustering with R-M system operons | Bacteria, Archaea, Eukaryotes. PDB: 1wwp, 2hsb, 1o3u. Proteins with conserved D in place of H have a conserved H elsewhere which could contribute to activity |
| HEPN-T(Parep1/8) | Lacks R but H is conserved | Fused to inactive LAF-1/Vasa-like RNA helicase N-terminal ATPase domain in | Archaea. Has two distinct families PAE0096 and PaREP1. PDB:2q00 |
| HEPN-T (Cpin_6617) | No | Fusions to a dyad of ferredoxin domains (gi: 381187024, | Mostly Bacteria |
| HEPN-M (PF08780/DUF86-PF01934) | Mostly conserved (83%) | Occasionally fused to MNT, a previously undetected archaeal Holliday junction resolvase-like REase (Additional file 1), and nucleic acid methylase domains. In operon with a HAD phosphoesterase gene | PDB: 1ylm, 1jog-A. Bacteria, Archaea |
| HEPN-M (SAV_6107) | No | - | Actinobacteria |
| Aminoglycoside_NT_C (PF07827/DUF4037) | No | Found at the C-termini of aminoglycoside nucleotidyltransferase and related proteins (gi: 15923025). Occasionally fused to TPRs (gi: 296454793) | PDB:1kny, 3jyy, 3jz0, 2pbe Bacteria |
| GlnD/GlnE (PF08335)/ DUF294_C (PF10335) | No | Fused to GlnD/E-like nucleotidyltransferase. Usually part of the glutamine synthetase modifying complex. DrrA is a secreted toxin in | PDB:1v4a, 3l0i Bacteria |
| DUF4145 (PF13643) | Mostly conserved (80%) | Fused to Restriction Endonuclease (REase, SF-II-Helicase); Sel1, Zinc Ribbon, TM and SH3 (Firmicutes), UvrD Helicase (endoV alpha subunit), TIR and ATPase ( | Bacteria > Archaea (footnote a), dsDNA viruses |
| In operon with R-M, TerD, McrB/C and symE toxin | |||
| c2405 | Conserved H but lacks R | Fused to N-terminal AbiTii domain and in a few cases to a C-terminal Helix-hairpin-helix domain | Bacteria |
| MtlR | 60% | Most often a part of the mannitol operon with other mannitol utilization genes | Gammaproteobacteria. PDB: 3c8g, 3brj |
| Abi2/AbiF/AbiD | Yes | Abi2/AbiF/AbiD and jhp1408 families | Bacteria |
| Embedded in R-M operons and also a protein with DNase domains ParB and HNH ( | |||
| Swt1-like | Partly conserved | Helicase, Vsr REase and 2 wHTH (MTES_1575), active; CBS and HD (alr3009), active; RNASEIII and DSRBD (Cyanobacteria), active; STAND-ATPase, TPR, S1 (Npun_F6454, MED222_16016, Desac_1927), mostly active; SWI2/SNF2-ATPase (WQE_15321), active; Zinc Ribbon (Npun_R5629); ZnR with two TMs (Plim_2023), active. | |
| Ribo L-PSP-HEPN | Yes | Fused to endoRNase L-PSP (gi: 166363853); in operon with ParB | Bacteria. Distantly related to AbiF and AbiD |
| AbiU2 | Yes | In operon with a gene encoding protein with Sel1 repeats; R-M operons; | Bacteria |
| AbiV | No | - | Bacteria; Has an alternative conserved H at the same position as the first HEPN-T family; hence, could be related to that family |
| AbiJ | Yes | Fused to various novel N terminal domains labeled AbiJ-NTD1 to 5; Some of the solos occur in operon with R-M system | Bacteria |
| AbiA-CTD | Yes | Fused to Reverse Transcriptase ; in operon with R-M system | Bacteria |
| MAE_28990 | Yes | In operon with a ParB nuclease and DNA methylase genes | Bacteria |
| MAE_18760 | Yes | Fused to HEPN/RES-NTD1, HEPN/Toprim-NTD1, Schlafen and a novel beta rich domain. In operon with ParA/Soj ATPase of SIMIBI-type GTPase fold | Bacteria |
| Csx1 (MJ1666) | Yes | A dyad of HEPN domains fused to a Rossmann fold domain (PF09455) | Archaea > Bacteria; PDB: 2i71, 4EOG |
| Csx1 (TM1812) | Yes | HEPN fused to a Rossmann fold (PF09455), and a few other novel domains | Bacteria |
| Csm6 | Yes | HEPN fused to Csm6 (PF09659) and a helical domain | Bacteria |
| Csm6 (Cas_Cas02710) | Yes | HEPN fused to Csm6 (PF09670) | Bacteria > Archaea |
| Ymh (PF09509) | Yes | Solos and fusions to pMORC, AbiJ-NTD1 and AbiTii domain. In operon with R-M | Bacteria > Archaea |
| C6orf70 | Yes | Fused to TPR; WD40 ( | Bacteria > Eukaryotes. Overlaps with DUF4209 (PF13910). This family can be traced to LECA |
| Occurs in R-M related operons | |||
| DUF2526 (PF10735) | Yes | None detected | Gammaproteobacteria |
| KEN (RnaseL/Ire1) | Mostly conserved (95%) | Fused to S/T/Y-Kinase, along with ankyrin repeats, CCCH in some. Also found fused to UBI (gi:125543109) and BRCT (gi: 218187285) | Eukaryotes. pdb:3lj2; solo RNase L in |
| Las1 | Yes | Mainly Solos. Sometimes fused to Metallo-beta-lactamase and EF-HAND (Ascomycota) and to family specific globular domains | Eukaryotes |
| RNase LS | Yes | Fused to RNase H (gi: 300902643), along with Caulimovirus viroplasmin domain (gi: 222100146). In some, a TATA-binding protein (TBP)-like domain replaces the RNase H fold domain. In operon with antitoxin RnlB | Bacteria |
| DZIP3/ hRUL138 | Mostly conserved | Fusion to TPR, Zn-ribbon, RING, Ankyrin, CARD, NACHT ATPase, DEATH and LRR in various animal lineages | Eukaryotes. Mainly animal lineage: LSEs in |
| PrrC/RloC/ APECO1_4465 | Yes | Fused to ABC-ATPase. Often found in R-M operon and with genes for RhuM-like or Fic/Doc-like toxins. APECO1_4465 is also found in prophages | Bacteria |
| ERFG_01251 | Yes | Fused to ABC-ATPase and HEPN/TOPRIM-NTD1 | Bacteria |
| ApeA/BMEI1217 | Yes | In epsilonproteobacteria embedded in R-M operons | Bacteria > Archaea; |
| EC042_2821 | Yes | Fused to wHTH, REase and ZnR domains. Occurs in R-M system operons | Bacteria. Overlaps with DUF3644 |
| Integron cassette HEPN | Yes | Part of mobile integron element | PDB:3jrt Gammaproteobacteria |
| pEK499_p136_Ecoli like (B) | Yes | Some in operon with R-M genes, ADP-ribosyltransferase-like enzymes (ART), and Macro. Also found in operon with NamA-like RNase H fold nuclease and with the Pgl components | NamA toxin / RlfA Replication in Phage P1 has an RNase H fold |
| LA2681 | Yes | Fused to TPR, and in operon with TPR | Bacteria > Archaea |
| Cthe_2314 | Yes | None detected | Bacteria |
| Bxe_C0808 | Yes | In operon with AbiU2 | Bacteria |
a: The “>” sign indicates a postulated transfer from one lineage to another.
Figure 2. Structural diversity of HEPN domains. A member of each of the seven HEPN families with solved crystal structures is rendered as a cartoon; labels provide the HEPN family name and PDB ID. Equivalent core helices are colored the same across all structures but labeled in the order observed from the N-terminus to the C-terminus to highlight circular permutations. In the canonical configuration, helix-1 (H1) and helix-2 (H2) from the first α-hairpin are colored green and blue, respectively, and helix-3 (H3) and helix-4 (H4) from the second α-hairpin are colored cyan and yellow, respectively. The conserved insert region found between helix-2 and helix-3 in the canonical configuration is colored and labeled in light grey in each cartoon. The kink and further distortions are labeled in yellow. Conserved active site residues are rendered as balls and sticks and colored and labeled in red. Note the structural reorganization of the HEPN domain in the Csx1 family. The distinctive β-hairpins of this family are colored and labeled in brown, and the zinc ion found in the vicinity of the active site residues is rendered as a sphere and colored in purple.
Selected novel domains fused to HEPN domains
| AbiJ-NTD1 | ~ 140 aa; (e.g. 1 to 140 aa, gi: 134296193, Bcep1808_2091 | Mostly alpha helical | Fused to HEPN families: AbiJ, DUF4145 (gi: 113972064), Ymh (gi: 148556575). It is also found fused to other domains potentially involved in biological conflicts: HKD-Phosphoesterases (gi:302346766), STYKinase (gi:47459341), REase (gi: 358072046) and flavodoxin fold nucleoside deoxyribosyltransferase (gi: 397664865) |
| AbiJ-NTD2 | ~ 100 aa; (e.g. 1 to 102 aa, gi: 60680647, BF1118, | Mostly alpha helical with a conserved beta strand next to the first alpha helix | Found fused to AbiJ, and to other domains potentially involved in biological conflicts: Mrr family REase (gi: 91784007), TIR nuclease (gi: 269963288). Many AbiJ_NTD2 sequences have been erroneously included in the DUF3644 Pfam model, a new HEPN domain described here. However, profile-profile searches do not demonstrate an independent relationship between AbiJ_NTD2 and HEPN |
| AbiJ-NTD3 | ~ 140 aa; (e.g. 1 to 142 aa, gi:187251857, Emin_1454, | Alpha + beta | Found fused to AbiJ. Fused to other domains presumably involved in defense: ABC ATPase (gi:319955098), REase domains prototyped by the Pfam model DUF2726 (gi:56476843) |
| AbiJ-NTD4 | ~ 160 aa; (e.g. 1 to 165 aa, gi: 182417316, CBY_0614, | Alpha + beta | Found fused to AbiJ and HEAT repeats (gi: 71907952) |
| AbiJ-NTD5 | ~ 100 aa; (e.g. 1 to 115 aa, gi: 149930787, w0043, | Mostly alpha helical | Found fused to AbiJ, and to other domains presumably involved in defense: TIR nuclease (gi:296123260); some have a further N-terminal DnaG-like CxxH-CxxC Zn ribbon domain |
| AbiTii | ~ 180aa; (e.g. 1 to 180 aa of gi: 358446093) | Alpha + beta | Found fused to the N-terminus of the c2405 family of HEPN domains and in few cases to Ymh (gi: 372210551) |
| HEPN/RES-NTD1 | ~ 100 aa; (e.g. 1 to 95 aa, gi:206576331, KPK_1764 | Mostly alpha helical | Fused to HEPN (MAE_28990 superfamily), RES domain, a potential RNase found in various toxin systems (gi: 30248753). Also occasionally fused to an ABC ATPase and two other novel domains (Supplementary material). Some of those fused to RES have a further N-terminal Zn ribbon domain |
| HEPN/Toprim-NTD1 | ~240 aa; (e.g. 1 to 240 aa, gi: 423201025; HMPREF1167_01188 | Alpha + beta | Fused to two distinct HEPN families: MAE_28990 and ERFG_01251 families (gi: 118587223), TOPRIM (gi: 160895002) and a Mrr-like REase domain (gi: 383455290) |
| DpnII/MboI-NTD | ~100 aa; (e.g. 1 to 115 aa gi:218440340, PCC7424_3406, | Mostly alpha helical with a conserved beta strand next to the first alpha helix | This domain can be unified with the α-helical domain found at the N-termini of the type-II REases DpnII and MboI. It is fused to the HEPN domain prototyped by the Pfam DUF4145 model and to other domains presumably involved in defense, e.g. a novel REase (gi: 146284642) |
| ApeA-NTD1 | ~ 300 aa; (e.g. 1 to 300 aa, gi: 218441941 PCC7424_5050, | Mostly beta strands | Fused to HEPN (ApeA). Several conserved aromatic residues, abundant but poorly conserved |
| MAE_18760-NTD1 | ~121 aa; (e.g. 1 to 121 of gi: 385800275) | Mostly beta strands | Found at the N-terminus of certain members of the MAE_18760 family |
| AAA-ATPase (Ava_2192-CTD) | ~300aa; (e.g. 160 to 460aa gi:75908411, Ava_2192, | AAA-ATPase fold | The AAA-ATPase domain overlaps with Pfam DUF499. Fused to HEPN (SWT1/Abi2 family) |
| wHTH | ~65aa ; (e.g. 2006 to 2075 and 2095 to 2161 gi: 323358023, MTES_1575, | wHTH fold | Fused to HEPN (SWT1/Abi2 family), along with Transglutaminase and Vsr–family REase domains. Overlaps with DUF3320. |
| Novel Vsr-REase (MTES_1575_REase) | ~180aa (e.g., 1810..1990 aa gi: 323358023, MTES_1575, | Vsr REase Fold | Fused to HEPN (SWT1/Abi2 family), along with Transglutaminase and wHTH. |
| Novel REase (EC042_2821_CTD) | ~180aa (eg, 240..420aa gi: 387608267, EC042_2821, | REase Fold | Fused to HEPN (EC042_2821) and an N-terminal wHTH in some. |
Figure 3. A domain architecture and gene-neighborhood network showing the manifold functional connections of the HEPN domain. The graphs were rendered using the Cytoscape program [73]. The network is an ordered graph with the cyan edges representing the connection between adjacent domains combined in the same polypeptides and the gold edges representing the context in the gene neighborhood. (A) The “force-directed” network was derived using the spring-embedded layout utilizing the Kamada–Kawai algorithm, which works well for graphs with 50–100 nodes [74]. The natural clustering of the functional categories emerging from this algorithm is indicated with labels. (B) The nodes of the network arranged by function. (C) Condensed network, where the domains belonging to a given functional category have been collapsed into that category name. (D) A domain architecture graph of HEPN and the various N-terminal domains which co-occur with other defense-related domains, showing the interchangeability of HEPN and the defense-related domains.
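The condensation step described for panel (C) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used to draw the figure: the `collapse` function and the `CATEGORY` mapping are hypothetical, the category names are invented for the example, and the edge list is a toy subset built from fusions and operon contexts listed in the classification table (MNT, REase and Vsr REase fusions; R-M and ParB operons). Layout, which the caption attributes to Cytoscape, is not reproduced here.

```python
from collections import defaultdict

# Toy ordered edges: (source, target, kind).
# "fusion" = domains combined in the same polypeptide (cyan edges),
# "neighborhood" = gene-neighborhood context (gold edges).
EDGES = [
    ("HEPN", "MNT", "fusion"),
    ("HEPN", "REase", "fusion"),
    ("HEPN", "Vsr REase", "fusion"),
    ("HEPN", "R-M", "neighborhood"),
    ("HEPN", "ParB", "neighborhood"),
]

# Hypothetical functional categories, for illustration only.
CATEGORY = {
    "MNT": "nucleotidyltransferase",
    "REase": "nuclease",
    "Vsr REase": "nuclease",
    "ParB": "nuclease",
    "R-M": "restriction-modification",
}


def collapse(edges, category):
    """Replace each node by its category and count the merged edges.

    Nodes without a category keep their own name. Edge direction and
    edge kind are preserved; self-loops created by merging are dropped.
    """
    counts = defaultdict(int)
    for source, target, kind in edges:
        a = category.get(source, source)
        b = category.get(target, target)
        if a != b:
            counts[(a, b, kind)] += 1
    return dict(counts)


if __name__ == "__main__":
    for (a, b, kind), n in sorted(collapse(EDGES, CATEGORY).items()):
        print(f"{a} -> {b} [{kind}] x{n}")
```

Keeping the edge kind in the key means a category that is both fused to HEPN and found in its gene neighborhood stays visible as two separate connections, which matches the two edge colors described in the caption.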
Figure 4. Selected domain architectures of HEPN proteins. The domains are not drawn to scale. Domain architectures are labeled with a representative gene name, the GenBank identifier (gi) number, and the species name separated by semicolons. The labels of eukaryotes are colored green. The generic functional categories are shown in red letters. Uncharacterized globular domains of limited phyletic spread are shown with a grey rectangle. Domain names of most domains follow the Pfam database or literature [78] (also see Additional file 1). Non-standard domain abbreviations: Ank – Ankyrin; CARF – CRISPR/Cas-associated Rossmann fold domain; PlipaseD – Phospholipase D; Taminase – Transglutaminase; TM – transmembrane helix; Helical – Helical domain.
Figure 5. Selected gene-neighborhoods of HEPN genes. The gene neighborhood data for some of the genes encoding HEPN domain-containing proteins are depicted using arrows. The HEPN gene is marked with an asterisk. The direction of the arrow is the direction of transcription of the gene. The gene name, GenBank identifier (gi), and the species name of the starred gene are shown next to the operon. The multi-gene modules that always co-occur are boxed. The cartoon representations of the genes are not drawn to scale. The depicted operons are typically representative of the types of operons found in a range of diverse organisms. Domain names of most domains follow the Pfam database or literature [78] (also see Additional file 1). Non-standard abbreviations: RM_TRD, restriction-modification target recognition domain.