
HNHDb: a database on pattern based classification of HNH domains reveals functional relevance of sequence patterns and domain associations.

Alaguraj Veluchamy, Sujitha Mary, Vishal Acharya, Preethi Mehta, Taru Deva, Sankaran Krishnaswamy.

Abstract

The HNH Database is a collection and sequence-based classification of HNH domain proteins. The database contains 1913 HNH domain-containing proteins, classified into 10 subsets based on sequence pattern. Each subset has unique signature sequences. We show that the combination of subsets correlates with domain association and function. Functional divergence of this domain may be due to the combination of these conserved patterns and the large variations in the non-conserved regions. HNHDb is freely available at http://bicmku.in:8081/hnh.


Keywords: HNH domains; HNHDb; sequence classification

Year: 2009 PMID: 20198175 PMCID: PMC2823387 DOI: 10.6026/97320630004080

Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063

Background

Decomposing each protein into modular domains and each domain into subclasses is a basic prerequisite for accurate functional classification of protein molecules. Protein sequence classification also helps to organize the large volume of data produced by large-scale genome sequencing projects. The HNHc domain (SMART id: SM00507) is a conserved domain of around 50 amino acids, characterized by a central conserved Asp/His residue flanked at some distance by a conserved His (N-terminal) and a conserved His/Asp/Glu (C-terminal) residue. HNH domains are found in homing endonucleases, inteins, group I and group II introns, and free-standing ORFs in viruses, archaebacteria, eubacteria and eukaryotes. They show a polyphyletic relationship and are associated with a range of DNA-binding proteins that perform a variety of binding and cutting functions [1-3]. They are involved in a variety of cellular activities, including bacterial toxicity, homing functions in group I and II introns and inteins, recombination, developmentally controlled DNA rearrangement, phage packaging and restriction endonuclease activity [4-6]. HNH homing endonucleases form one of the five families of homing endonucleases: LAGLIDADG, GIY-YIG, HNH, His-Cys box and cyanobacterial intron homing endonucleases. Among these five families, the HNH family is evolutionarily more closely related to the His-Cys box nucleases. Structurally, the HNH motif family is one of the six families of His-Me finger endonucleases and has a topology similar to that of a zinc finger motif [7] or treble clef motif [8], containing two β-strands and one α-helix linked together by a divalent metal ion, often referred to as the ββα-Me motif. Although sequence variation is common among members of the His-Me finger endonuclease superfamily, they share structural similarity over particular residues in the active site region [9], and many type II restriction endonucleases belong to the HNH fold [10,11].
The HNH catalytic motif is highly adaptable and shows slight configurational modifications depending on the enzyme and the substrate [12,13]; HNH enzymes differ in their activity against double-stranded DNA, single-stranded DNA and single-stranded RNA. The goal of this classification is to address the growing need to corroborate and integrate data by delineating characteristic subsequences using a regular-expression-type method. Since few HNH proteins have a known function, such a classification database will help functional and structural analysis.

Methodology

Datasets

We used the HNH domain sequence family derived from the SMART database.

HNH Protein sub-classification

The sub-classification of HNH proteins involves two steps. A total of 2483 HNH domain sequences obtained from the SMART database were cross-checked against Swiss-Prot; redundant sequences were removed and obsolete entries deleted, leaving a set of 2143 sequences. PROSITE patterns were generated from the unaligned HNH domain sequences with the PRATT program [14], and the resulting multiple patterns were used to generate weight-based matrices [15]. WAPAM was used to search against the sequences in the SMART database [16] and cluster them into subsets. The subset members, along with other features, were loaded into the planned database architecture.
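The filtering and clustering steps above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: PRATT and WAPAM are not reimplemented, redundancy is reduced to exact sequence identity, and plain regular expressions stand in for the weight-based matrices. The accession numbers, sequences and subset names are hypothetical toy data.

```python
import re
from collections import defaultdict

def remove_redundant(records):
    """Keep the first record for each distinct sequence (exact matches only)."""
    seen = set()
    unique = []
    for accession, sequence in records:
        key = sequence.upper()
        if key not in seen:
            seen.add(key)
            unique.append((accession, key))
    return unique

def cluster_by_pattern(records, subset_patterns):
    """Assign each record to every subset whose regular expression it matches."""
    compiled = {name: re.compile(rx) for name, rx in subset_patterns.items()}
    clusters = defaultdict(list)
    for accession, sequence in records:
        for name, rx in compiled.items():
            if rx.search(sequence):
                clusters[name].append(accession)
    return dict(clusters)

# Hypothetical toy input: (accession, domain sequence)
records = [
    ("X00001", "MKLLARDGGAACPKCAAA"),
    ("X00002", "mkllardggaacpkcaaa"),  # same sequence as X00001, lower case
    ("X00003", "MSTGGHAAAAAN"),
]

# Hypothetical subset patterns, written directly as regular expressions
subset_patterns = {
    "toy_LL": r"LL",
    "toy_GG": r"GG",
    "toy_CxxC": r"C.{2,4}C",
}

unique = remove_redundant(records)
clusters = cluster_by_pattern(unique, subset_patterns)
```

In this sketch a sequence may fall into more than one subset, which matches the idea that proteins carry combinations of patterns; a real pipeline would also need a similarity threshold for redundancy rather than exact identity.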

Database Design

The HNH database is implemented in MySQL (v 4.1.12) as the RDBMS, with PHP (v 4.3.5) as the front end. Perl scripts are used to generate the PROSITE patterns and to construct the database. HNHDb entries are linked to other databases: to Swiss-Prot by accession number and to PDB by ID. HNH protein sequences can be queried by accession number, by PROSITE pattern or by keyword search of protein name or function. ClustalW and MView are integrated so that users can select sequences and align them. The subsets are named HNHDb:Sub:1 to HNHDb:Sub:10.
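The three query routes described above (accession number, pattern, keyword) can be sketched as follows. This is an illustration only: an in-memory SQLite database stands in for the MySQL back end, and the table name, column names and rows are assumptions, not the actual HNHDb schema or data.

```python
import re
import sqlite3

# In-memory SQLite stands in for MySQL; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE hnh_protein (
           accession TEXT PRIMARY KEY,
           name      TEXT,
           subset    TEXT,
           sequence  TEXT
       )"""
)
conn.executemany(
    "INSERT INTO hnh_protein VALUES (?, ?, ?, ?)",
    [
        # Toy rows; the subset assignments are invented for illustration.
        ("X00001", "putative homing endonuclease", "HNHDb:Sub:1", "MKLLARDGGAACPKCAAA"),
        ("X00002", "colicin-like nuclease", "HNHDb:Sub:10", "MSTGGHAAAAAN"),
    ],
)

def by_accession(accession):
    """Look up one entry by its accession number."""
    return conn.execute(
        "SELECT accession, name, subset FROM hnh_protein WHERE accession = ?",
        (accession,),
    ).fetchone()

def by_keyword(word):
    """Return accessions whose protein name contains the keyword."""
    rows = conn.execute(
        "SELECT accession FROM hnh_protein WHERE name LIKE ? ORDER BY accession",
        (f"%{word}%",),
    )
    return [row[0] for row in rows]

def by_pattern(regex):
    """Return accessions whose sequence matches a regular expression."""
    rx = re.compile(regex)
    rows = conn.execute(
        "SELECT accession, sequence FROM hnh_protein ORDER BY accession"
    )
    return [accession for accession, sequence in rows if rx.search(sequence)]
```

The pattern search is done in Python rather than in SQL because SQLite has no built-in regular expression operator; a PROSITE pattern would first have to be translated into a regular expression.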

Discussion

About 90% of the available HNH proteins (1913 HNH domain proteins) are classified into 10 subsets, and the database covers 100% of the HNH proteins in Pfam [17]. Each subclass has a particular set of defined conserved patterns. A few functional classes were derived, each with a combination of these patterns. The whole HNH domain appears to be a combination of these patterns, and functional variation could result from the presence or absence of these significant regions. The HNH domain sequence can be characteristically represented as a PROSITE pattern: [L]-[L]-x-[R]-[D]-G-G-x(2,4)-C-x(2,4)-C-x(6,7)-[D]-H-x(5,6)-G-G-x(5)-N-x(1,3)-[L]-[L]-x(2,5)-C-x(2,4)-C-[NH]. Most HNH domain sequences contain repeats of a few dyads, i.e. -G-G-, -L-L- or -C-x(2,4)-C-. This could account for differences in function or in DNA-binding strategy. Sequences that lack these characteristic patterns may be mis-annotated, or may belong to a different group altogether. Apart from the H-N-H residues, there are three dyads whose combinations, i.e. their presence or absence, form particular patterns.
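To make the PROSITE pattern above concrete, the following Python sketch translates a PROSITE-style pattern into a regular expression and applies it to a sequence. The converter handles only the syntax used here (residues, bracketed sets, excluded sets in braces, the wildcard x and repeat counts). The pattern string assumes that "Cx( 6,7)" in the text reads C-x(6,7), and the test sequence is synthetic, built only to satisfy the pattern.

```python
import re

# One pattern element: a residue, a [set], an {excluded set} or the wildcard x,
# optionally followed by a repeat count such as (5) or (2,4).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a regular expression string."""
    parts = []
    for raw in pattern.split("-"):
        match = ELEMENT.fullmatch(raw.strip())
        if match is None:
            raise ValueError(f"cannot parse pattern element: {raw!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(core + repeat)
    return "".join(parts)

# Pattern as given in the text, with the garbled element read as C-x(6,7).
HNH_PATTERN = (
    "[L]-[L]-x-[R]-[D]-G-G-x(2,4)-C-x(2,4)-C-x(6,7)-[D]-H-x(5,6)-G-G-x(5)-N-"
    "x(1,3)-[L]-[L]-x(2,5)-C-x(2,4)-C-[NH]"
)
hnh_regex = re.compile(prosite_to_regex(HNH_PATTERN))

# Synthetic sequence: alanine fills every wildcard position.
toy = (
    "LLARDGG" + "AA" + "C" + "AA" + "C" + "A" * 6 + "DH" + "A" * 5
    + "GG" + "A" * 5 + "N" + "A" + "LL" + "AA" + "C" + "AA" + "CH"
)
has_full_pattern = hnh_regex.search(toy) is not None
```

Real HNH sequences rarely match the full consensus; the subsets in the database correspond to partial combinations of these elements, so in practice shorter sub-patterns would be searched individually.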

Defining features

Figure 1

(a) HNH pattern; (b) Subset patterns and number of hits; (c) Percentage identity of the HNH sequence across the subsets; (d) Screenshot of the web interface of the HNH Database.

Common protein signatures (Figure 1a) are: an L-L leucine dyad at the beginning of the domain; an R-D immediately following the leucine dyad; a cysteine double dyad, one before the first H and the other between N and H, which delimits the domain boundary; and a GG double dyad, one before the first H and the other between H and N of the HNH pattern H-x(*)-N-y(*)-N/H. An unusual property is that the defining features occur twice, once on the N-terminal side and once on the C-terminal side, wrapping around the active residues of the HNH pattern.
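The defining features listed above lend themselves to a simple presence/absence profile. The Python sketch below counts each dyad in a sequence; the function name, feature labels and both toy sequences are assumptions made for illustration, and counting non-overlapping matches is a simplification of how the subsets were actually derived.

```python
import re

# Regular expressions for the defining features named in the text.
FEATURES = {
    "LL": re.compile(r"LL"),         # leucine dyad
    "RD": re.compile(r"RD"),         # R-D following the leucine dyad
    "GG": re.compile(r"GG"),         # glycine dyad
    "CxC": re.compile(r"C.{2,4}C"),  # cysteine dyad C-x(2,4)-C
}

def dyad_profile(sequence):
    """Count non-overlapping occurrences of each defining feature."""
    return {name: len(rx.findall(sequence)) for name, rx in FEATURES.items()}

# Synthetic sequences: one with every dyad doubled, one lacking the cysteines.
full = "LLARDGGAACAACDHAAGGANLLAACAACH"
partial = "LLARDGGAAHAAGGAN"

full_profile = dyad_profile(full)
partial_profile = dyad_profile(partial)
```

Comparing such profiles across sequences gives a rough view of which combination of features a protein carries, which is the kind of signal the subset classification builds on.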

Subset description

The seven distinct features seen in the HNH sub-classification by Mehta et al. [9] are distributed among the 10 subsets, which add some further features of the HNH proteins. The most prominent patterns (Figure 1b) are G-G and -C-x(2,4)-C-, which mark the boundaries of the domain at the N- and C-termini. The two cysteine dyads are present at both ends of the domain sequence, separated by a gap of 30 amino acids. If the domain sequence is longer, the middle portion is extended and the dyads remain at the ends. This pattern can be characteristically represented as C-x(2,4)-C-x(30)-C-x(2,4)-C. The glycine dyad at the C-terminus is more conserved than the one at the N-terminus. The number of hits differs between subsets and their sequence similarity varies widely, although a common pattern is found (Figure 1c).
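The cysteine-dyad boundary pattern can be checked in two ways, sketched here in Python: a strict regular expression with exactly 30 residues between the dyads, and a looser measurement of the gap between the first and last dyad for domains with a longer middle portion. The function name and the synthetic sequences are assumptions for illustration.

```python
import re

DYAD = re.compile(r"C.{2,4}C")                      # one cysteine dyad
STRICT = re.compile(r"C.{2,4}C.{30}C.{2,4}C")       # dyads exactly 30 apart

def dyad_gap(sequence):
    """Return the number of residues between the first and last cysteine dyad.

    Returns None when fewer than two dyads are found.
    """
    hits = list(DYAD.finditer(sequence))
    if len(hits) < 2:
        return None
    return hits[-1].start() - hits[0].end()

# Synthetic domains: alanine fills all non-cysteine positions.
typical = "CAAC" + "A" * 30 + "CAAC"
longer = "CAAC" + "A" * 45 + "CAAC"

typical_gap = dyad_gap(typical)
longer_gap = dyad_gap(longer)
```

The strict expression matches only the first sequence, while the gap measurement reports 30 and 45 residues respectively, reflecting the observation that the dyads stay at the ends when the middle portion grows.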

Functional annotation

We observe that each functional class has a specific combination of protein signatures (Table 1 in supplementary material). Most functional classes are assigned based on the associated domain. Although DNA-binding activity is an invariant property of the HNH domain, the biological function may vary with the associated domain. These functional classes indicate that the derived patterns are highly unlikely to have emerged by chance. PDB structures are known for a few members of the functional classes; these can help in homology modeling of cluster members whose structures are not known. Further, residues of subsets 1 to 9 may be part of the specificity-determining residues (SDRs), while subset 10 comprises active residues or functionally conserved residues (FCRs). Variation in the SDRs may lead to divergence in activity. Regular-expression-based methods are better at differentiating SDRs and FCRs.

Database access

The database (Figure 1d) and associated information files are freely accessible at http://bicmku.in:8081/hnh.

Conclusion

The relationship between the cellular role of a protein, the conserved subsequences in its domain and the type of domain association is established for HNH domain-containing sequences. The promiscuity of HNH domain sequences, the conservation of SDRs and their organism-wise distribution among the subsets suggest horizontal transfer and coevolution.

References

1. Catalytic mechanisms of restriction and homing endonucleases.

Authors: Eric A Galburt; Barry L Stoddard
Journal: Biochemistry Date: 2002-11-26

2. HNH family subclassification leads to identification of commonality in the His-Me endonuclease superfamily.

Authors: Preeti Mehta; Krishnamohan Katta; Sankaran Krishnaswamy
Journal: Protein Sci Date: 2004-01

3. SMART 4.0: towards genomic data integration.

Authors: Ivica Letunic; Richard R Copley; Steffen Schmidt; Francesca D Ciccarelli; Tobias Doerks; Jörg Schultz; Chris P Ponting; Peer Bork
Journal: Nucleic Acids Res Date: 2004-01-01

4. Type II restriction endonuclease R.KpnI is a member of the HNH nuclease superfamily.

Authors: Matheshwaran Saravanan; Janusz M Bujnicki; Iwona A Cymerman; Desirazu N Rao; Valakunja Nagaraja
Journal: Nucleic Acids Res Date: 2004-11-23

5. Selfish DNA: homing endonucleases find a home.

Authors: David R Edgell
Journal: Curr Biol Date: 2009-02-10

6. Statistical modeling and analysis of the LAGLIDADG family of site-specific endonucleases and identification of an intein that encodes a site-specific endonuclease of the HNH family.

Authors: J Z Dalgaard; A J Klar; M J Moser; W R Holley; A Chatterjee; I S Mian
Journal: Nucleic Acids Res Date: 1997-11-15

7. Finding flexible patterns in unaligned protein sequences.

Authors: I Jonassen; J F Collins; D G Higgins
Journal: Protein Sci Date: 1995-08

8. Self-splicing group I and group II introns encode homologous (putative) DNA endonucleases of a new family.

Authors: A E Gorbalenya
Journal: Protein Sci Date: 1994-07

9. Metal ions and phosphate binding in the H-N-H motif: crystal structures of the nuclease domain of ColE7/Im7 in complex with a phosphate ion and different divalent metal ions.

Authors: Meng-Jiun Sui; Li-Chu Tsai; Kuo-Chiang Hsia; Lyudmila G Doudeva; Wen-Yen Ku; Gye Won Han; Hanna S Yuan
Journal: Protein Sci Date: 2002-12

10. The Pfam protein families database.

Authors: Robert D Finn; John Tate; Jaina Mistry; Penny C Coggill; Stephen John Sammut; Hans-Rudolf Hotz; Goran Ceric; Kristoffer Forslund; Sean R Eddy; Erik L L Sonnhammer; Alex Bateman
Journal: Nucleic Acids Res Date: 2007-11-26
