Mindaugas Laganeckas, Mindaugas Margelevicius, Ceslovas Venclovas.
Abstract
PD-(D/E)XK nucleases, initially represented by only Type II restriction enzymes, now comprise a large and extremely diverse superfamily of proteins. They participate in many different nucleic acid transactions, including DNA degradation, recombination, repair and RNA processing. Different PD-(D/E)XK families, although sharing a structurally conserved core, typically display little or no detectable sequence similarity except for the active site motifs. This makes the identification of new superfamily members using standard homology search techniques challenging. To tackle this problem, we developed a method for the detection of PD-(D/E)XK families based on the binary classification of profile-profile alignments using support vector machines (SVMs). Using a number of both superfamily-specific and general features, SVMs were trained to identify true positive alignments of PD-(D/E)XK representatives. With this method we identified several PFAM families of uncharacterized proteins as putative new members of the PD-(D/E)XK superfamily. In addition, we assigned several unclassified restriction enzymes to the PD-(D/E)XK type. Results show that the new method is able to make confident assignments even for alignments that have statistically insignificant scores. We also implemented the method as a freely accessible web server at http://www.ibt.lt/bioinformatics/software/pdexk/.
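To make the idea of binary classification of alignments concrete, the following is a minimal sketch of a linear SVM trained by subgradient descent on the regularised hinge loss. It is not the authors' implementation: the two features (a scaled HHsearch probability and a 0/1 flag for a matching active site motif), the toy data, and the names `train_linear_svm` and `predict` are all assumptions made for illustration, and the abstract does not say which kernel or feature set was actually used.

```python
def train_linear_svm(samples, labels, epochs=300, eta=0.1, lam=0.01):
    """Fit weights w and bias b on the L2-regularised hinge loss.

    labels must be +1 (true positive alignment) or -1 (false positive).
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(wj * xj for wj, xj in zip(w, x)) + b
            # The regularisation term shrinks the weights slightly each step.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if y * score < 1.0:
                # The point lies inside the margin: shift the boundary
                # so that this point gains margin.
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
                b += eta * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the decision boundary."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical toy data, one row per alignment:
# (HHsearch probability / 100, active site motif matched: 1.0 or 0.0)
TOY_SAMPLES = [(0.9, 1.0), (0.6, 1.0), (0.4, 1.0),
               (0.5, 0.0), (0.3, 0.0), (0.2, 0.0)]
TOY_LABELS = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(TOY_SAMPLES, TOY_LABELS)
```

In this toy set the alignment with probability 0.4 is labelled positive while the one with probability 0.5 is labelled negative, so a cutoff on the alignment score alone cannot separate the classes; the motif feature can. This mirrors, in a very simplified way, the abstract's point that superfamily-specific features allow confident assignments for low-scoring alignments.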
Year: 2010 PMID: 20961958 PMCID: PMC3045609 DOI: 10.1093/nar/gkq958
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Conserved structural core and typical active site arrangement in PD-(D/E)XK nucleases. Shown is the 3D structure of the archaeal Holliday junction endonuclease (PDB: 1ob8). Secondary structure elements of the conserved core are labeled and colored blue (α-helices) and yellow (β-strands). Side chains are shown for the residues representing the three active site signature motifs (I–III). Red broken arrows indicate observed variants of the active site residue ‘migration’ into alternative positions.
Description of the SVM classifiers
| SVM classifier | HHsearch probability threshold (%) | Number of positive/negative training examples | Use of motif-I (E/Q) in the definition of the active site | Training accuracy (%) |
|---|---|---|---|---|
| SVM-1 | 50 | 381/257 | No | 95.9 |
| SVM-2 | 50 | 381/257 | Yes | 95.9 |
| SVM-3 | 70 | 285/257 | No | 97.4 |
| SVM-4 | 70 | 285/257 | Yes | 98.0 |
| SVM-5 | 80 | 233/257 | Yes | 98.6 |
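
As a rough aid to reading the training accuracies, the class counts in the table can be turned into a majority-class baseline, i.e. the accuracy of a trivial classifier that always predicts the larger class. The sketch below copies the rows from the table; the baseline is an illustrative calculation added here and is not reported in the source.

```python
# Rows copied from the table above:
# (name, HHsearch threshold %, positives, negatives, uses motif-I, training accuracy %)
CLASSIFIERS = [
    ("SVM-1", 50, 381, 257, False, 95.9),
    ("SVM-2", 50, 381, 257, True, 95.9),
    ("SVM-3", 70, 285, 257, False, 97.4),
    ("SVM-4", 70, 285, 257, True, 98.0),
    ("SVM-5", 80, 233, 257, True, 98.6),
]


def majority_baseline(positives, negatives):
    """Accuracy (%) of always predicting the more frequent class."""
    return 100.0 * max(positives, negatives) / (positives + negatives)


for name, threshold, pos, neg, uses_motif_1, accuracy in CLASSIFIERS:
    baseline = majority_baseline(pos, neg)
    print(f"{name}: baseline {baseline:.1f}%, training accuracy {accuracy:.1f}%")
```

Raising the HHsearch threshold removes positive examples (381, 285, 233) while the negative set stays at 257, so the classes become more balanced and the baseline moves closer to 50%.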
Figure 2. Putative PD-(D/E)XK families. (A) Sequence alignments with PD-(D/E)XK representatives having known experimental 3D structures. Each family is denoted by the PFAM name and the accession number. Sequences within families are labeled with Uniprot (www.uniprot.org) accession codes. PD-(D/E)XK structural representatives are labeled with corresponding PDB codes and common protein/family names in parentheses. PD-(D/E)XK signature motifs (I–III) are indicated at the top. Red asterisks and open circles denote canonical and alternative positions of the active site residues, respectively. Six aligned blocks correspond to secondary structure elements of the conserved structural core characteristic of the superfamily as shown in Figure 1. Numbers in parentheses indicate excluded residues. Alignments are colored according to sequence conservation: identical residues have a blue background, similar residues a green one. Known or putative active site residues are highlighted in red. Observed (PDB: 1ob8) or predicted consensus secondary structure for each family is displayed above the sequences (H, α-helix; E, β-strand). (B) Domain composition of representative sequences from each family. Additional domains/motifs are denoted as follows: wHtH, winged-helix; HtH, helix-turn-helix; TMH, transmembrane helix.
REases newly assigned to the PD-(D/E)XK superfamily
For each REase family, the SVM probability of assignment to the PD-(D/E)XK superfamily, the HHsearch probability, putative active site motifs, the DNA recognition sequence and the subtype according to REBASE are indicated. Putative active site motifs are annotated with starting residue numbers and the number of residues between the motifs. Predicted active site residues are shown in red; motif-III residues that ‘migrated’ to non-canonical positions are underlined. Where known, cleavage sites within recognition sequences are indicated with ‘∧’, and those outside of recognition sequences with two numbers in parentheses, for the top and bottom strands respectively. REBASE subtypes are as follows: P, symmetric target and cleavage sites; G, symmetric or asymmetric target, affected by AdoMet; S, asymmetric target and cleavage sites.
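
The cleavage notation described above can be read mechanically. The sketch below is one possible parser, under stated assumptions: the function name `parse_site` is hypothetical, the two numbers in parentheses are assumed to be separated by a slash (as in `(1/5)`), and the ASCII caret `^` is accepted as an alternative to ‘∧’. The example strings are illustrative inputs and are not taken from the table.

```python
import re

# Characters accepted as the in-sequence cleavage marker (assumption: the
# ASCII caret is treated the same as the '∧' used in the table legend).
CUT_MARKS = "∧^"

# Recognition sequence followed by "(top/bottom)", e.g. "GGTCTC(1/5)".
# The slash separator is an assumption about how the two numbers are written.
_OUTSIDE = re.compile(r"^([A-Za-z]+)\((-?\d+)/(-?\d+)\)$")


def parse_site(text):
    """Split a recognition-site string into (sequence, cleavage).

    cleavage is ("inside", offset) when a cut marker is present,
    ("outside", top, bottom) for the parenthesised form, or None
    when no cleavage information is given.
    """
    text = text.strip()
    match = _OUTSIDE.match(text)
    if match:
        sequence = match.group(1).upper()
        return sequence, ("outside", int(match.group(2)), int(match.group(3)))
    for mark in CUT_MARKS:
        if text.count(mark) == 1:
            # Offset = number of bases before the cut on the written strand.
            offset = text.index(mark)
            return text.replace(mark, "").upper(), ("inside", offset)
    return text.upper(), None


print(parse_site("G∧AATTC"))
print(parse_site("GGTCTC(1/5)"))
```

Returning a tagged tuple keeps the two cases distinct, so downstream code can tell a cut inside the recognition sequence from one at a distance without re-parsing the string.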