Jagoda Jablonska, Dorota Matelska, Kamil Steczkiewicz, Krzysztof Ginalski.
Abstract
The His-Me finger endonucleases, also known as HNH or ββα-metal endonucleases, form a large and diverse protein superfamily. The His-Me finger domain can be found in proteins that play an essential role in cells, including genome maintenance, intron homing, host defense and target offense. Its overall structural compactness and non-specificity make it a perfectly-tailored pathogenic module that participates on both sides of inter- and intra-organismal competition. An extremely low sequence similarity across the superfamily makes it difficult to identify and classify new His-Me fingers. Using state-of-the-art distant homology detection methods, we provide an updated and systematic classification of His-Me finger proteins. In this work, we identified over 100 000 proteins and clustered them into 38 groups, of which three groups are new and cannot be found in any existing public domain database of protein families. Based on an analysis of sequences, structures, domain architectures, and genomic contexts, we provide a careful functional annotation of the poorly characterized members of this superfamily. Our results may inspire further experimental investigations that should address the predicted activity and clarify the potential substrates, to provide more detailed insights into the fundamental biological roles of these proteins.
Year: 2017 PMID: 29040665 PMCID: PMC5714182 DOI: 10.1093/nar/gkx924
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The conserved structural core of His-Me finger fold of Cas9 (PDB ID: 4ogc). The core elements are coloured in yellow and blue for β-strands and α-helix, respectively. The catalytic histidine is shown as red sticks. The remaining HNH sequence motif residues are shown as green sticks. The residues from outside the motif, chelating catalytic zinc ion (orange), are denoted as grey sticks. The sequence logo was generated based on the structure-guided multiple sequence alignment of all superfamily sequences clustered at 70% sequence identity using the Skylign server (106). The total height of the letters depicts the information content of the position in bits.
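The statement that letter height reflects information content in bits can be illustrated with a minimal sketch. It assumes a simplified Shannon-style measure against a uniform 20-residue background, which is not necessarily the exact calculation performed by Skylign. The function names and the toy alignment are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column.

    Simplified assumption: uniform background over `alphabet_size`
    residues; gap characters ('-') are ignored.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    # Shannon entropy of the observed residue frequencies.
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    # Information = maximum possible entropy minus observed entropy.
    return math.log2(alphabet_size) - entropy


def alignment_information(sequences, alphabet_size=20):
    """Per-column information content for equal-length aligned sequences."""
    return [column_information(col, alphabet_size) for col in zip(*sequences)]


# Hypothetical toy alignment, for illustration only.
toy_alignment = ["HNH", "HNN", "HDH", "HNH"]
for position, bits in enumerate(alignment_information(toy_alignment), start=1):
    print(f"position {position}: {bits:.2f} bits")
```

A fully conserved column reaches the maximum of log2(20) bits under this assumption, while a column split evenly between two residues loses exactly one bit.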
Figure 2. Profile hidden Markov model (HMM) connectivity network for His-Me finger proteins. Nodes represent HMM profiles built with hmmbuild (using default parameters) from the HMMER3 suite, based on the alignments of sequences from respective groups clustered at 70% identity. Edges depict E-value scores (<0.001) between the groups calculated with hhsearch with default parameters. The size of the nodes is proportional to the number of all sequences in the NCBI NR database; the taxonomy distribution is also provided. All groups are numbered as in Table 1, those with available 3D structures are denoted with underlined numbers, whereas new groups are marked with red font. The network was visualized with the Cytoscape software (29).
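The edge rule in the Figure 2 caption (connect two groups when their profile-profile E-value is below 0.001) can be sketched as follows. This is only an illustration of the thresholding and connectivity idea, not the authors' pipeline: the hit list, the group labels A to D, and the function names are hypothetical, and in practice the E-values would come from parsed hhsearch output.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-3  # threshold stated in the Figure 2 caption


def build_network(hits, cutoff=E_VALUE_CUTOFF):
    """Keep undirected edges whose E-value is below the cutoff.

    `hits` is an iterable of (query_group, target_group, e_value).
    When a pair appears more than once, the best (lowest) E-value is kept.
    """
    edges = {}
    for query, target, e_value in hits:
        if query == target or e_value >= cutoff:
            continue
        key = tuple(sorted((query, target)))
        edges[key] = min(e_value, edges.get(key, float("inf")))
    return edges


def connected_components(nodes, edges):
    """Group nodes that are linked directly or indirectly by edges."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for node in sorted(nodes):
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Hypothetical E-values between placeholder groups, for illustration only.
example_hits = [
    ("A", "B", 1e-10),
    ("B", "A", 1e-5),
    ("B", "C", 0.5),   # above the cutoff, so no edge is drawn
    ("C", "D", 1e-4),
]
network = build_network(example_hits)
print(network)
print(connected_components({"A", "B", "C", "D"}, network))
```

The resulting edge dictionary could then be exported to a tool such as Cytoscape for layout, with node sizes set from per-group sequence counts.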
Table 1. 38 groups of proteins retaining the His-Me finger fold
| Group | PDB90 | Pfam | COG/KOG | Functions of defining family members | Taxonomy: Viruses | Taxonomy: Bacteria | Taxonomy: Archaea | Taxonomy: Eukaryota |
|---|---|---|---|---|---|---|---|---|
| 1 | 4h9d | PF01844 (HNH) | COG3183 | GVE2 thermostable nicking nuclease | 588 | 44036 | 404 | 1020 |
| | 4ogc | PF13391 (HNH_2) | COG1403 | ATP-dependent nuclease EA31 | | | | |
| | 4cmp | PF13395 (HNH_4) | COG3440 | phiRv1 integrase | | | | |
| | 5axw | PF14279 (HNH_5) | COG3513 | CRISPR RNA-guided endonuclease Cas9 | | | | |
| | 5h0o | | | McrA restriction endonuclease | | | | |
| | | | | MnlI restriction endonuclease | | | | |
| | | | | Homing endonucleases F-LimI, F-LimIV | | | | |
| | | | | Bacteriophage T4 homing endonucleases MobA, MobB, MobC, MobD, MobE | | | | |
| | | | | F-TflIV nuclease | | | | |
| | | | | DNA annealing helicase and endonuclease ZRANB3 | | | | |
| | | | | PLC-like phosphodiesterase | | | | |
| 2 | 1u3e | PF05551 (zf-His_Me_endon) | | I-HmuI homing endonuclease | 795 | 3288 | 13 | 28 |
| | | PF13392 (HNH_3) | | SPBc2 prophage-derived putative HNH homing endonuclease YosQ | | | | |
| | | | | Pathogenesis-related transcriptional factor and ERF protein | | | | |
| 3 | | PF07510 (DUF1524) | COG3513 | GmrD from GmrSD restriction system | 28 | 12124 | 174 | 263 |
| | | | COG1217 | | | | | |
| 4 | | PF14414 (WHH) | COG5585 | WHH toxin domain in toxin-antitoxin systems | 0 | 2098 | 2 | 0 |
| | | PF12639 (Colicin-DNase) | | Cell wall assembly Smi1/Knr4 family proteins | | | | |
| | | | | Type VII secretion proteins | | | | |
| 5 | 1g8t | PF01223 (Endonuclease_NS) | KOG3721 | YxiD toxin component of the YxiD-YxxD toxin-antitoxin system | 22 | 8333 | 10 | 2456 |
| | 1zm8 | PF13930 (Endonuclea_NS_2) | COG1864 | FhaB proteins | | | | |
| | 3ism | | COG5444 | RhsA proteins | | | | |
| | 3s5b | | | Eukaryotic DNA/RNA mitochondrial endonuclease G | | | | |
| | 4a1n | | | Bacterial virulence factor NucA | | | | |
| | 5gkc | | | NET-degrading DNA-entry nuclease (EndA) | | | | |
| | 2xgr | | | Streptodornase | | | | |
| | 5fgw | | | | | | | |
| 6 | | PF13391 (HNH_2) | COG3440 | Restriction endonucleases AlwI, BbrI | 3 | 5351 | 88 | 120 (Fungi) |
| 7 | 3g27 | PF06147 (DUF968) | | Prophage-derived proteins YbcO, YdfU | 30 (Bacteriophages) | 6423 | 0 | 3 |
| | | PF07102 (DUF1364) | | | | | | |
| 8 | | PF14279 (HNH_5) | | | 0 | 158 | 1 | 0 |
| 9 | 1oup | PF04231 (Endonuclease_1) | COG2356 | Bacterial periplasmic or secreted endonuclease I, Dns, DnsH, Vvn | 0 | 5245 | 0 | 79 |
| | 2pu3 | | | Mg²⁺-activated RNase Bsn | | | | |
| | 2vnd | | | | | | | |
| 10 | | PF14412 (AHH) | | AHH toxin domain in bacterial polymorphic toxin system | 0 | 474 | 0 | 1 |
| 11 | | PF14411 (LHH) | | LHH toxin domain in bacterial polymorphic toxin system | 0 | 522 | 0 | 1 |
| 12 | 1a73 | PF05551 (zf-His_Me_endon) | | Homing endonucleases I-PpoI, I-CpiI, I-PchI | 0 | 0 | 0 | 62 (Fungi) |
| 13 | 3fc3 | PF02945 (Endonuclease_7) | | Type II restriction endonuclease VII Hpy99I | 330 | 289 | 3 | 0 |
| | 1en7 | | | Phage T4 recombination endonuclease VII | | | | |
| 14 | | PF14410 (GH-E) | | GH-E toxin domain in bacterial polymorphic toxin system proteins | 0 | 366 | 8 | 0 |
| 15 | | PF01844 (HNH) | | Putative mismatch repair endonuclease YisB | 0 | 948 | 35 | 205 |
| 16 | 1ozj | PF03165 (MH1) | KOG3701 | MH1 domain of Smad proteins | 0 | 0 | 0 | 2850 |
| | 4zkg | | | | | | | |
| 17 | | PF01844 (HNH) | COG3183 | Group II intron-encoded protein LtrA with reverse transcriptase and RNA maturase activities | 16 | 1119 | 1 | 148 |
| | | | KOG4768 | Eukaryotic putative COX1/OXI3 intron 1 protein | | | | |
| | | | | Mitochondrial DNA mismatch repair protein MutS | | | | |
| 18 | 3u43 | PF01844 (HNH) | | Colicins, pyocins | 0 | 1097 | 0 | 0 |
| | 7cei | | | | | | | |
| | 1bxi | | | | | | | |
| | 4uhp | | | | | | | |
| | 4qko | | | | | | | |
| 19 | | PF16784 (HNHc_6) | | Phage Orf family recombinases | 32 | 1163 | 0 | 1 |
| 20* | 3ldy | | | Restriction endonuclease PacI | 78 (Giant viruses) | 4 | 0 | 6 |
| 21 | 4gtw | PF01223 (Endonuclease_NS) | | Ectonucleotide pyrophosphatase/phosphodiesterase (Enpp1, Enpp2, Enpp3) | 5 (Birds viruses) | 0 | 0 | 1309 (Fish and marine species, i.e., tunicates) |
| | 4zg9 | | | Venom phosphodiesterase | | | | |
| 22 | | PF13392 (HNH_3) | | Gyrase subunit B intein | 0 | 84 | 7 | 0 |
| | | | | Deoxycytidine triphosphate deaminase intein | | | | |
| 23* | | | | | 0 | 180 | 0 | 0 |
| 24 | | PF05766 (NinG) | | Rap (recombination adept with plasmid)/NinG proteins involved in recombination | 27 | 1972 | 0 | 2 |
| 25 | | PF15657 (Tox-HNH-EHHH) | | Tox-HNH-EHHH toxin domain in bacterial polymorphic toxin system | 0 | 197 | 0 | 0 |
| 26 | | PF16786 (RecA_dep_nuc) | | RecA-dependent nuclease involved in genome degradation | 20 | 184 | 0 | 0 |
| 27 | | PF05315 (ICEA) | | Type II restriction endonucleases NlaIII and IceA1 | 2 | 184 | 0 | 0 |
| 28* | | | | SphI restriction endonuclease | 0 | 34 | 0 | 0 |
| 29 | | PF09665 (RE_Alw26IDE) | | Restriction endonucleases RE_Alw26IDE, BsaI, BsmBI | 0 | 57 | 0 | 0 |
| 30 | | PF15652 (Tox-SHH) | | Tox-SHH toxin domain in bacterial polymorphic toxin system | 0 | 77 | 0 | 0 |
| 31 | 3plw | PF16786 (RecA_dep_nuc) | | RecA-dependent nuclease involved in genome degradation | 1 | 203 | 0 | 0 |
| 32 | | PF01844 (HNH) | | Hcp1 family polymorphic toxin system protein YhhZ | 0 | 543 | 0 | 0 |
| 33 | | PF13392 (HNH_3) | | Putative early/late infection stage endonucleases from | 125 | 0 | 0 | 0 |
| 34 | | PF15637 (Tox-HNH-HHH) | | Tox-HNH-HHH toxin domain in bacterial polymorphic toxin system | 0 | 71 | 0 | 0 |
| 35 | 1v0d | PF09230 (DFF40) | | Rep4, DffB, Drep-4 caspase-activated DNases | 0 | 0 | 0 | 399 |
| 36 | | PF15635 (Tox-GHH2) | | Tox-GHH2 toxin domain in bacterial polymorphic toxin system | 0 | 166 | 0 | 0 |
| 37 | | PF15636 (Tox-GHH) | KOG4659 | Tox-GHH toxin domain in bacterial polymorphic toxin system | 0 | 181 | 0 | 2476 |
| | | | | Metazoan teneurin | | | | |
| 38 | | PF13391 (HNH_2) | | Eukaryotic pathogens effector CR-HNH | 0 | 0 | 0 | 132 |
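
The taxonomy cells in the table mix plain counts with counts followed by a qualifier, such as "120 (Fungi)". A minimal sketch of how such cells could be reduced to integers and summed per group is given below. The helper name `parse_count` and the dictionary layout are hypothetical; the three example rows (groups 1, 6 and 21) copy their counts from the table.

```python
import re


def parse_count(cell):
    """Return the leading integer of a taxonomy cell, ignoring any qualifier."""
    match = re.match(r"\s*(\d+)", cell)
    if match is None:
        raise ValueError(f"no leading count in {cell!r}")
    return int(match.group(1))


# Viruses, Bacteria, Archaea, Eukaryota cells for three groups from the table.
rows = {
    "1": ["588", "44036", "404", "1020"],
    "6": ["3", "5351", "88", "120 (Fungi)"],
    "21": ["5 (Birds viruses)", "0", "0",
           "1309 (Fish and marine species, i.e., tunicates)"],
}

# Total number of sequences per group across the four taxonomic columns.
totals = {group: sum(parse_count(cell) for cell in cells)
          for group, cells in rows.items()}
print(totals)
```

Because only the leading digits are read, a cell whose qualifier was lost in extraction still parses to its count.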
Figure 3. Structural diversity of His-Me fingers. The core elements of the fold (ββα motif) are coloured in yellow and blue for β-strands and α-helix, respectively. The Ω-loop is coloured in magenta. The zinc knuckle preceding the core β-hairpin is shown in light pink. The catalytic site His/Tyr is presented as red sticks. (A) The structure of viral RecA-dependent Ref nuclease (PDB ID: 3plw) with finger loop insertion (orange). (B) Dimerized nuclease domains of Holliday Junction Resolvase ENDOVII (PDB ID: 2qnc). (C) The structure of Vibrio vulnificus nuclease (Vvn) (PDB ID: 1oup) with the α-helical domain (green) at the back of the HNH domain. (D) The structure of Caspase-Activated DNase (CAD) (PDB ID: 1v0d). The loop substituting for the core α-helix is coloured in blue. (E) The structure of Serratia marcescens nuclease (Sm) (PDB ID: 1g8t) with the β-sheet domain (green) at the back of HNH domain. (F) The structure of PacI (PDB ID: 3m7k) with tyrosine substituting for catalytic histidine.
Figure 4. Multiple sequence alignment of the conserved core regions of the His-Me finger superfamily. The numbers of excluded residues are specified in parentheses. The extensive finger loop in the sequence corresponding to PDB ID: 3plw is denoted with ‘ = ’. Sequences are labelled according to the group number followed by their NCBI accession number or PDB ID, and an abbreviation of the species name. Respective Pfam families are given in the last column. Residue conservation is denoted by the following scheme: uncharged, highlighted in yellow; polar, highlighted in grey; HNH motif residues, highlighted in green; remaining metal ion coordinating residues, highlighted in red. The secondary structure is shown above the corresponding alignment blocks.