Vered Kunik, Yasmine Meroz, Zach Solan, Ben Sandbank, Uri Weingart, Eytan Ruppin, David Horn.
Abstract
Predicting the function of a protein from its sequence is a long-standing goal of bioinformatic research. While sequence similarity is the most popular tool used for this purpose, sequence motifs may also subserve this goal. Here we develop a motif-based method consisting of applying an unsupervised motif extraction algorithm (MEX) to all enzyme sequences, and filtering the results by the four-level classification hierarchy of the Enzyme Commission (EC). The resulting motifs serve as specific peptides (SPs), appearing on single branches of the EC. In contrast to previous motif-based methods, the new method does not require any preprocessing by multiple sequence alignment, nor does it rely on over-representation of motifs within EC branches. The SPs obtained comprise on average 8.4 ± 4.5 amino acids, and specify the functions of 93% of all enzymes, which is much higher than the coverage of 63% provided by ProSite motifs. The SP classification thus compares favorably with previous function annotation methods and successfully demonstrates an added value in extreme cases where sequence similarity fails. Interestingly, SPs cover most of the annotated active and binding site amino acids, and occur in active-site neighboring 3-D pockets in a highly statistically significant manner. The latter are assumed to have strong biological relevance to the activity of the enzyme. Further filtering of SPs by biological functional annotations results in reduced small subsets of SPs that possess very large enzyme coverage. Overall, SPs both form a very useful tool for enzyme functional classification and bear responsibility for the catalytic biological function carried out by enzymes.
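The filtering step described in the abstract, keeping only motifs that occur on a single branch of the EC hierarchy, can be illustrated with a minimal Python sketch. This is not the authors' MEX pipeline: the candidate motifs are assumed to be given, the enzyme identifiers, sequences, and motif strings are invented toy data, and the function names `common_ec_prefix`, `specific_peptides`, and `coverage` are hypothetical. EC numbers are compared level by level rather than as raw strings, so that 5.1.3.2 and 5.1.3.20 are treated as distinct fourth-level branches that share the third-level branch 5.1.3.

```python
from collections import defaultdict


def common_ec_prefix(ec_numbers):
    """Return the leading EC levels shared by all given EC numbers."""
    split = [ec.split(".") for ec in ec_numbers]
    prefix = []
    for levels in zip(*split):
        if len(set(levels)) != 1:
            break
        prefix.append(levels[0])
    return tuple(prefix)


def specific_peptides(motifs, enzymes):
    """Map each motif to the deepest EC branch shared by all enzymes containing it.

    motifs  : candidate peptide strings (assumed to come from a motif finder)
    enzymes : dict of ID -> (sequence, EC number such as "5.1.3.2")
    Motifs that occur on more than one first-level EC class are discarded.
    """
    hits = defaultdict(list)
    for motif in motifs:
        for sequence, ec in enzymes.values():
            if motif in sequence:
                hits[motif].append(ec)
    result = {}
    for motif, ecs in hits.items():
        prefix = common_ec_prefix(ecs)
        if prefix:  # an empty prefix means the motif spans several EC classes
            result[motif] = ".".join(prefix)
    return result


def coverage(peptides, enzymes):
    """Fraction of enzymes that contain at least one of the given peptides."""
    covered = sum(
        any(p in sequence for p in peptides) for sequence, _ in enzymes.values()
    )
    return covered / len(enzymes)


# Toy data: invented sequences and motifs, for illustration only.
enzymes = {
    "E1": ("MMHKLMWGG", "5.1.3.2"),
    "E2": ("AAHKLMWPP", "5.1.3.2"),
    "E3": ("GGQRSTVKKACDEF", "5.1.3.20"),
    "E4": ("TTACDEFSS", "1.1.1.1"),
    "E5": ("QRSTVAAA", "5.1.3.2"),
}
motifs = ["HKLMW", "QRSTV", "ACDEF"]

sps = specific_peptides(motifs, enzymes)
for motif, branch in sps.items():
    level = branch.count(".") + 1
    print(f"{motif}: EC {branch} (level {level})")
print(f"coverage: {coverage(sps, enzymes):.0%}")
```

In this toy set, `HKLMW` is specific at the fourth level, `QRSTV` only at the third level because it occurs in both 5.1.3.2 and 5.1.3.20, and `ACDEF` is dropped because it appears in two different EC classes. The coverage figure is simply the share of toy enzymes hit by at least one retained peptide and has no relation to the percentages reported in the abstract.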
Year: 2007 PMID: 17722976 PMCID: PMC1950953 DOI: 10.1371/journal.pcbi.0030167
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. The Occurrence of Specific Peptides within the EC Hierarchy of Enzymes
(A) A sketch of the EC hierarchy and the assignments of SPs to SP classes. SPs can be compared with those appearing in Figure 1B.
(B) Aligned sequences of two groups of enzymes of level 4 that share the same third-level assignment. Alignment is performed according to SPs. The organisms in the upper group, 5.1.3.20, belong to proteobacteria, while those of the lower group, 5.1.3.2, also contain eukaryotes (ARATH, CYATE, and PEA). Boldfaced substrings denote SPs. Amino acids flanked by spaces denote active sites and binding sites, as indicated above. A list of all SPs and their assignments to SPN classes is presented below the sequences.
Specific Peptides in All Six Classes of Swiss-Prot Release 48.3
Performance of SPs Extracted from the Swiss-Prot 45 Dataset on Novel Enzyme Sequences in Swiss-Prot 48.3
Coverage of a Non-Redundant Test Set by Motifs in SP1, SP2, and SP3
Enzymes with High Sequence Similarity and Different EC Assignments
Occurrence of Specific Peptides on Active Sites
Figure 2. SPs Occurrence on a Spatial Structure of an Enzyme
(A) 3-D display of enzyme P07649 (PDB code 1DJ0), belonging to 5.4.99.12, showing (1) an active site D at sequence location 60; (2) a binding site Y at location 118; (3) a binding site L at location 245. The active site is common to two SPs (4) containing (CAGRT(D)AGVH). Other shown SPs are (5) GQVVH at locations 67–71; (6) FHARF at 107–111, known to be a tentative RNA-binding peptide; (7) ENDFTS at 157–163; and (8) HMVRNI at 201–207, sharing a pocket with the active and binding sites. QVVH and ENDFTS belong to SP3, all other peptides belong to SP4.
(B) A different display of the same enzyme focuses on the pocket containing the active site. The relevant section of the sequence is shown, with red residues signifying active and binding sites, green residues corresponding to other amino acids residing in the pocket, and underlined residues corresponding to SPs.
Occurrence of SPs in Spatial Proximity to Active Sites
Small Sets of SPs that Contain Active Sites Suffice To Specify Functionality of Many Enzymes