L Y Han, C Z Cai, Z L Ji, Z W Cao, J Cui, Y Z Chen.
Abstract
The function of a protein that has no sequence homolog of known function is difficult to assign on the basis of sequence similarity. The same problem may arise for homologous proteins of different functions if one is newly discovered and the other is the only known protein of similar sequence. It is therefore desirable to explore methods that are not based on sequence similarity. One approach is to assign a protein to a functional family, which provides a useful hint about its function. Several groups have employed a statistical learning method, support vector machines (SVMs), to predict protein functional family directly from sequence, irrespective of sequence similarity. These studies showed that SVM prediction accuracy is at a level useful for functional family assignment, but the capability of SVMs to assign distantly related proteins and homologous proteins of different functions has not been critically and adequately assessed. Here, SVM is tested for functional family assignment of two groups of enzymes. One consists of 50 enzymes that have no homolog of known function in a PSI-BLAST search of protein databases. The other contains eight pairs of homologous enzymes of different families. SVM correctly assigns 72% of the enzymes in the first group and 62% of the enzyme pairs in the second group, suggesting that it is potentially useful for facilitating functional study of novel proteins. A web version of our software, SVMProt, is accessible at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi.
Year: 2004 PMID: 15585667 PMCID: PMC535691 DOI: 10.1093/nar/gkh984
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Division of amino acids into three different groups for different physicochemical properties
| Property | Attribute | Group 1 | Group 2 | Group 3 |
|---|---|---|---|---|
| Hydrophobicity | Type | Polar | Neutral | Hydrophobic |
| | Amino acids in the group | RKEDQN | GASTPHY | CVLIMFW |
| Van der Waals volume | Value | 0–2.78 | 2.95–4.0 | 4.43–8.08 |
| | Amino acids in the group | GASCTPD | NVEQIL | MHKFRYW |
| Polarity | Value | 4.9–6.2 | 8.0–9.2 | 10.4–13.0 |
| | Amino acids in the group | LIFWCMVY | PATGS | HQRKNED |
| Polarizability | Value | 0–0.108 | 0.128–0.186 | 0.219–0.409 |
| | Amino acids in the group | GASDT | CPNVEQIL | KMHFRYW |
Figure 1. The sequence of a hypothetical protein, used to illustrate how the feature vector of a protein is derived. The sequence index gives the position of each amino acid in the sequence. The index for each type of amino acid (A or E) gives the positions of the first, second, third, … residue of that type (for example, the first, second and third A are at positions 1, 3 and 4). The A/E transition row gives the positions of AE or EA pairs in the sequence.
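The bookkeeping described in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy sequence is hypothetical (it merely starts with A at positions 1, 3 and 4, as in the caption, and is not the sequence drawn in the figure), and the helper names `residue_positions` and `transition_positions` are invented here.

```python
def residue_positions(seq, residue):
    """Return the 1-based positions at which `residue` occurs in `seq`."""
    return [i for i, aa in enumerate(seq, start=1) if aa == residue]


def transition_positions(seq, a, b):
    """Return the 1-based position of the first residue of every a-b or b-a pair."""
    return [i for i in range(1, len(seq)) if {seq[i - 1], seq[i]} == {a, b}]


# Hypothetical two-letter toy sequence (assumption, not taken from the figure).
seq = "AEAAAEAEEA"

pos_a = residue_positions(seq, "A")         # index list for residue type A
pos_e = residue_positions(seq, "E")         # index list for residue type E
trans = transition_positions(seq, "A", "E")  # positions of AE or EA pairs

# Percentages derived from the index lists.
composition_a = 100.0 * len(pos_a) / len(seq)         # share of A residues
transition_pct = 100.0 * len(trans) / (len(seq) - 1)  # share of adjacent pairs that are A/E

print(pos_a, pos_e, trans)
print(composition_a, round(transition_pct, 2))
```

The index lists are the raw material for composition-, transition- and distribution-type descriptors: their lengths give frequencies, and selected entries give positions along the chain.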
Characteristic descriptors of human insulin precursor (Swiss-Prot AC P01308)
| Property | Elements of descriptors | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Amino acid composition | 9.09 | 5.45 | 1.82 | 7.27 | 2.73 | 10.91 | 1.82 | 1.82 | 1.82 | 18.18 |
| | 1.82 | 2.73 | 5.45 | 6.36 | 4.55 | 4.55 | 2.73 | 5.45 | 1.82 | 3.64 |
| Hydrophobicity | 24.55 | 38.18 | 37.27 | 15.60 | 16.51 | 30.28 | 5.45 | 40.91 | 54.55 | 80.00 |
| | 100.0 | 1.82 | 21.82 | 47.27 | 68.18 | 98.18 | 0.91 | 12.73 | 37.27 | 72.37 |
| | 99.09 | | | | | | | | | |
| Van der Waals volume | 40.00 | 41.82 | 18.18 | 29.36 | 11.01 | 13.76 | 1.82 | 21.82 | 52.73 | 71.82 |
| | 99.09 | 2.73 | 25.45 | 56.36 | 78.18 | 100.0 | 0.91 | 15.45 | 41.82 | 50.00 |
| | 98.18 | | | | | | | | | |
| Polarity | 40.91 | 32.73 | 26.36 | 24.77 | 20.18 | 13.76 | 0.91 | 14.55 | 38.18 | 74.55 |
| | 99.09 | 1.82 | 20.91 | 49.09 | 68.18 | 91.82 | 5.45 | 33.64 | 53.64 | 79.09 |
| | 100.0 | | | | | | | | | |
| Polarizability | 29.09 | 52.73 | 18.18 | 31.19 | 9.17 | 15.60 | 1.82 | 21.82 | 52.73 | 68.18 |
| | 91.82 | 2.73 | 25.45 | 56.36 | 79.09 | 100.0 | 0.91 | 15.45 | 41.82 | 50.00 |
| | 98.18 | | | | | | | | | |
The feature vector of this protein is constructed by combining all of the descriptors in sequential order.
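As a rough sketch of how such a vector could be assembled, the Python below concatenates a 20-element amino acid composition with 21 elements per property (3 composition, 3 transition and 15 distribution values), matching the row lengths in the descriptor table. Several details are assumptions rather than the authors' stated method: the alphabetical ordering of amino acids, the order of the transition pairs, the rule used to pick the residues that mark 25%, 50% and 75% of a group, and all function names.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering: alphabetical one-letter codes

# Three-group divisions copied from the amino acid grouping table.
PROPERTY_GROUPS = {
    "hydrophobicity": ("RKEDQN", "GASTPHY", "CVLIMFW"),
    "van_der_waals_volume": ("GASCTPD", "NVEQIL", "MHKFRYW"),
    "polarity": ("LIFWCMVY", "PATGS", "HQRKNED"),
    "polarizability": ("GASDT", "CPNVEQIL", "KMHFRYW"),
}


def amino_acid_composition(seq):
    """Percentage of each of the 20 amino acids in the sequence."""
    return [100.0 * seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def ctd(seq, groups):
    """Composition (3), transition (3) and distribution (15) values for one property."""
    lookup = {aa: g for g, members in enumerate(groups) for aa in members}
    classes = [lookup[aa] for aa in seq]  # sequence re-coded as group indices 0, 1, 2
    n = len(classes)

    # Composition: percentage of residues falling in each group.
    comp = [100.0 * classes.count(g) / n for g in range(3)]

    # Transition: percentage of adjacent pairs that switch between two given groups.
    pairs = list(zip(classes, classes[1:]))
    trans = [
        100.0 * sum({x, y} == {a, b} for x, y in pairs) / len(pairs)
        for a, b in ((0, 1), (0, 2), (1, 2))  # assumed pair order
    ]

    # Distribution: chain position (as % of length) of the first, 25%, 50%,
    # 75% and last residue of each group (index rule is an assumption).
    dist = []
    for g in range(3):
        pos = [i for i, c in enumerate(classes, start=1) if c == g]
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            if pos:
                k = max(1, int(frac * len(pos)))
                dist.append(100.0 * pos[k - 1] / n)
            else:
                dist.append(0.0)  # group absent from the sequence
    return comp + trans + dist


def feature_vector(seq):
    """Concatenate all descriptors in sequential order."""
    vec = amino_acid_composition(seq)
    for groups in PROPERTY_GROUPS.values():
        vec += ctd(seq, groups)
    return vec


# Hypothetical toy input: each amino acid exactly once.
vec = feature_vector("ACDEFGHIKLMNPQRSTVWY")
print(len(vec))
```

With only the five descriptor rows shown in the table, the vector has 20 + 4 × 21 = 104 elements; a fixed length regardless of sequence length is what lets proteins of different sizes be fed to the same classifier.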
List of enzymes without a homolog in the NR and Swiss-Prot databases and the results of SVM functional family assignment
| Enzyme (EC number) [Swiss-Prot accession number] | Database containing no homolog | SVM assigned functional family (probability of correct prediction) | Assignment status |
|---|---|---|---|
| Thiocyanate hydrolase beta subunit (EC 3.5.5.8) [O66186] | NR | EC 3.5 Hydrolase of non-peptide carbon–nitrogen bonds (98.9%); EC 2.6 Transferases of nitrogenous groups (62.2%) | + |
| Potential cysteine protease avirulence protein avrPpiC2 (EC 3.4.22.-) [Q9F3T4] | NR | EC 4.2 Carbon–oxygen lyase (93.6%); EC 2.3 Acyltransferase (83.9%); EC 4.1 Carbon–carbon lyase (71.3%); Outer membrane (58.6%) | − |
| Extracellular phospholipase (EC 3.1.1.5) [P82476] | NR | EC 3.1 Hydrolase of ester bonds (98.7%) | + |
| Cytochrome | NR | EC 1.9 Oxidoreductase of a heme group of donors (99.0%) | + |
| Cytochrome | NR | EC 1.9 Oxidoreductase of a heme group of donors (98.4%); Transmembrane (98.3%); EC 3.1 Hydrolase of ester bonds (62.2%) | + |
| Alginate lyase precursor (EC 4.2.2.3) [P39049] | NR | Transmembrane (65.4%); Outer membrane (58.6%); EC 2.1 Transferase of one-carbon groups (58.6%) | − |
| DNA α-glucosyltransferase (EC 2.4.1.26) [P04519] | NR | EC 2.4 Glycosyltransferase (80.4%); EC 2.7 Transferase of phosphorus-containing groups (68.5%) | + |
| Endonuclease CviAII (EC 3.1.21.4) [P31117] | NR | EC 3.1 Hydrolase of ester bonds (99.0%) | + |
| Type II restriction enzyme CviJI (EC 3.1.21.4) [P52283] | NR | EC 3.1 Hydrolase of ester bonds (99.0%); rRNA-binding proteins (98.8%); EC 3.4 Peptidase (68.5%) | + |
| DNA-directed RNA polymerase, subunit 10 homolog (EC 2.7.7.6) [P42488] | NR | EC 2.7 Transferase of phosphorus-containing groups (99.0%); 7 transmembrane receptor metabotropic glutamate family (58.6%) | + |
| Endonuclease IV (EC 3.1.21.-) [P39250] | NR | No function predicted | − |
| Beta-agarase precursor (EC 3.2.1.81) [P13734] | NR | EC 4.1 Carbon–carbon lyase (96.7%); EC 2.4 Glycosyltransferase (71.3%) | − |
| Phenylacetaldoxime dehydratase (EC 4.2.1.-) [P82604] | Swiss-Prot | Transmembrane (98.2%); EC 3.4 Peptidase (96.4%); EC 3.3 Hydrolase of ether bonds (80.4%); EC 2.7 Transferase of phosphorus-containing groups (73.8%) | − |
| ATP synthase H chain, mitochondrial precursor (EC 3.6.3.14) [Q12349] | Swiss-Prot | EC 3.6 Hydrolase of acid anhydrides (99.0%); RNA-binding protein (58.6%) | + |
| Peptide- | Swiss-Prot | EC 3.5 Hydrolase of non-peptide carbon–nitrogen bonds (99.0%); Beta-Barrel porin (58.6%) | + |
| | Swiss-Prot | EC 3.3 Hydrolase of ether bonds (99.0%); EC 2.7 Transferase of phosphorus-containing groups (71.3%); DNA-binding protein (65.4%) | + |
| Hypothetical 52.8 kDa protein in VPS15-YMC2 intergenic region (EC 3.1.22.-) [P38257] | Swiss-Prot | DNA-binding protein (89.3%); Outer membrane (58.6%) | − |
| Hypothetical protein BBB03 (EC 3.1.22.-) [O50979] | Swiss-Prot | EC 2.7 Transferase of phosphorus-containing groups (88.1%); EC 3.4 Peptidase (86.8%); EC 2.3 Acyltransferase (71.3%); EC 4.1 Carbon–carbon lyase (65.4%) | − |
| Telomere elongation protein (EC 2.7.7.-) [P17214] | Swiss-Prot | EC 2.7 Transferase of phosphorus-containing groups (99.1%); DNA-binding protein (78.4%) | + |
| Fucose-1-phosphate guanylyltransferase (EC 2.7.7.30) [O14772] | Swiss-Prot | EC 2.7 Transferase of phosphorus-containing groups (99.1%); 7 transmembrane receptor metabotropic glutamate family (58.6%) | + |
| DNA-directed RNA polymerase I 14 kDa polypeptide (EC 2.7.7.6) [P50106] | Swiss-Prot | EC 2.7 Transferase of phosphorus-containing groups (99%); DNA-binding protein (62.2%); Beta-Barrel porin (58.6%); EC 3.4 Peptidase (58.6%) | + |
| DNA polymerase III, theta subunit (EC 2.7.7.7) [P28689] | Swiss-Prot | EC 2.7 Transferase of phosphorus-containing groups (99.0%); EC 4.2 Carbon–oxygen lyase (58.6%) | + |
| Cytochrome | Swiss-Prot | EC 1.9 Oxidoreductase of a heme group of donors (97.0%); Envelope protein (58.6%); Transmembrane (58.6%) | + |
| Cytochrome | Swiss-Prot | EC 1.9 Oxidoreductase of a heme group of donors (98.3%); Transmembrane (58.6%) | + |
| Cytochrome | Swiss-Prot | EC 1.9 Oxidoreductase of a heme group of donors (99.0%); Transmembrane (58.6%); RNA-binding protein (58.6%) | + |
| Cytochrome | Swiss-Prot | EC 1.9 Oxidoreductase of a heme group of donors (97.8%); Transmembrane (93.8%); EC 1.10 Oxidoreductase of diphenols and related substances as donors (58.6%); Alpha-type channel (58.6%) | + |
| Heme-copper oxidase subunit IV (EC 1.9.3.-) [Q9YDX4] | Swiss-Prot | EC 1.9 Oxidoreductase of a heme group of donors (99.0%); Transmembrane (99.0%) | + |
| Aminoglycoside 2′- | Swiss-Prot | EC 2.7 Transferase of phosphorus-containing groups (78.4%); EC 4.2 Carbon–oxygen lyase (58.6%) | − |
| Glycosyl transferase alg8 (EC 2.4.1.-) [Q887P9] | Swiss-Prot | Transmembrane (99.0%); EC 2.4 Glycosyltransferase (98.6%) | * |
| Beta-agarase B (EC 3.2.1.81) [P48840] | Swiss-Prot | Outer membrane (58.6%); Beta-Barrel porin (58.6%) | − |
| CM (EC 5.4.99.5) [P19080] | Swiss-Prot | EC 5.4 Intramolecular transferase (99.0%); EC 4.2 Carbon–oxygen lyase (58.6%); Outer membrane (58.6%) | + |
| DNA β-glucosyltransferase (EC 2.4.1.27) [P04547] | Swiss-Prot | EC 2.4 Glycosyltransferases (95.7%); EC 2.5 Transferase of alkyl or aryl groups, other than methyl groups (80.4%) | + |
| dNMPkinase (EC 2.7.4.13) [P04531] | Swiss-Prot | EC 2.7 Transferase of phosphorus-containing groups (99.0%); EC 2.4 Glycosyltransferase (96.4%); EC 1.1 Oxidoreductase of the CH-OH group of donors (71.3%) | + |
| Endonuclease II (EC 3.1.21.1) [P07059] | Swiss-Prot | EC 3.1 Hydrolase of ester bonds (99.0%) | + |
| Endonuclease V (EC 3.1.25.1) [P04418] | Swiss-Prot | EC 3.1 Hydrolase of ester bonds (99.0%) | + |
| Exonuclease (EC 3.1.11.3) [P03697] | Swiss-Prot | EC 3.1 Hydrolase of ester bonds (99.0%); EC 4.1 Carbon–carbon lyases (88.1%); EC 2.7 Transferase of phosphorus-containing groups (68.5%); EC 1.1 Oxidoreductase of the CH-OH group of donors (58.6%) | + |
| Ribonuclease (EC 3.1.-.-) [P13312] | Swiss-Prot | EC 3.1 Hydrolase of ester bonds (99.0%) | + |
| Intron-associated endonuclease 1 (EC 3.1.-.-) [P13299] | Swiss-Prot | EC 3.1 Hydrolase of ester bonds (99.0%); DNA-binding protein (83.9%) | + |
| Intron-associated endonuclease 2 (EC 3.1.-.-) [P07072] | Swiss-Prot | EC 3.1 Hydrolase of ester bonds (99.0%) | + |
| Putative adenine-specific methylase (EC 2.1.1.72) [P51715] | Swiss-Prot | EC 2.1 Transferase of one-carbon groups (99.0%); Outer membrane (58.6%); mRNA-binding protein (58.6%) | + |
| Protein kinase (EC 2.7.1.37) [P00513] | Swiss-Prot | EC 2.7 Transferase of phosphorus-containing groups (99.0%) | + |
| Slt35 (EC 3.2.1.-) [P41052] | Swiss-Prot | Outer membrane (99.0%); EC 1.1 Oxidoreductase acting on the CH-OH group of donors (89.3%); EC 4.1 Carbon–carbon lyase (62.2%) | − |
| Ammonia monooxygenase (EC 1.13.12.-) [Q04508] | Swiss-Prot | EC 1.13 Oxygenase (99.0%); Transmembrane (99.0%); EC 2.4 Glycosyltransferases (83.9%) | + |
| 2-Aminomuconate deaminase (EC 3.5.99.5) [P81593] | Swiss-Prot | EC 3.5 Hydrolase acting on carbon–nitrogen bonds, other than peptide bonds (99.0%); EC 3.4 Peptidase (58.6%) | + |
| ADP-ribosyltransferase (EC 2.4.2.37) [P14299] | Swiss-Prot | Transmembrane (92.9%); EC 2.4 Glycosyltransferase (90.3%); Outer membrane (58.6%) | * |
| Alpha- | Swiss-Prot | EC 3.4 Peptidase (91.3%) | − |
| Aminopeptidase G (EC 3.4.11.-) [Q54340] | Swiss-Prot | EC 3.4 Peptidase (99.0%); TC 1.C Channels/pores: pore-forming toxins (proteins and peptides) (58.6%) | + |
| Alginate lyase (EC 4.2.2.3) [Q59478] | Swiss-Prot | Transmembrane (96.4%); EC 3.1 Hydrolase of ester bonds (78.4%); Outer membrane (58.6%) | − |
| ATPE_YEAST (EC 3.6.3.14) [P21306] | Swiss-Prot | RNA-binding proteins (58.6%) | − |
| AhdA2cA1c (EC 1.14.-.-) [BAC65427.1] | Swiss-Prot | EC 3.1 Hydrolase of ester bonds (82.2%); DNA-binding protein (80.4%); Transmembrane (58.6%) | − |
The symbols +, * and − indicate that the top-ranked predicted family, one of the other predicted families, or none of the predicted families matches the enzyme function, respectively.
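The assignment-status column can be tallied to relate the table to the 72% figure quoted in the abstract. The sketch below transcribes the 50 status symbols in row order (with an ASCII `-` standing in for −); the reading that both + and * count as correct assignments is an inference from the tally, not something the text states explicitly, and the variable names are illustrative.

```python
from collections import Counter

# Assignment status of the 50 enzymes, transcribed from the table in row order.
# "+" top-ranked family matches, "*" another predicted family matches, "-" no match.
STATUS_COLUMN = (
    "+-+++-++++"
    "---+++--++"
    "+++++++-*-"
    "++++++++++"
    "+-++*-+---"
)

counts = Counter(STATUS_COLUMN)
total = len(STATUS_COLUMN)

top_hit = counts["+"]                # correct family ranked first
any_hit = counts["+"] + counts["*"]  # correct family anywhere in the prediction list

print(f"top-ranked correct: {top_hit}/{total} = {100 * top_hit / total:.0f}%")
print(f"any predicted family correct: {any_hit}/{total} = {100 * any_hit / total:.0f}%")
```

Counting only top-ranked matches gives 34 of 50 (68%), while also counting the two * cases gives 36 of 50 (72%).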
List of pairs of homologous enzymes of different families and the results of SVM functional family assignment
| Enzyme E1 (Swiss-Prot accession number) | EC class (F1) | Enzyme E2 (Swiss-Prot accession number) | EC class (F2) | Sequence similarity (BLAST E-value) | SVM functional family assignment | Assignment status |
|---|---|---|---|---|---|---|
| Glycolate oxidase (P05414) | EC 1.1 | IPP isomerase (Q8PW37) | EC 5.3 | 3.00E−07 | E1->F1;E2->F2 | + |
| Creatine amidinohydrolase (P38488) | EC 3.5 | Proline dipeptidase (O58885) | EC 3.4 | 3.00E−15 | E1->F1;E2->F2 | + |
| Cystathionine gamma-synthase (P38675) | EC 2.5 | Methionine gamma-lyase (P13254) | EC 4.4 | 2.00E−15 | E1->W;E2->F2 | − |
| Exocellobiohydrolase 1 (P38676) | EC 3.2 | Cystathionine gamma-lyase (Q8VCN5) | EC 4.4 | 1.00E−12 | E1->W;E2->F2 | − |
| Maleylacetoacetate isomerase (P57109) | EC 5.2 | Glutathione | EC 2.5 | 1.00E−51 | E1->F1;E2->F2 | + |
| Tyrosine-protein kinase FRK (P42685) | EC 2.7 | Intestinal guanylate cyclase (P70106) | EC 4.6 | 2.60E−12 | E1->F1;E2->F1 | − |
| Glutamate-1-semialdehyde aminotransferase (Q06774) | EC 5.4 | 4-aminobutyrate aminotransferase (P22256) | EC 2.6 | 5.70E−32 | E1->F1;E2->F2 | + |
| Exodeoxyribonuclease (P37454) | EC 3.1 | DNA-(apurinic or apyrimidinic site) lyase (P43138) | EC 4.2 | 1.60E−96 | E1->F1;E2->F2 | + |
E1->F1 or E2->F2 indicates that enzyme E1 or E2 is assigned to family F1 or F2, respectively. E1->W or E2->W indicates that enzyme E1 or E2 is assigned to a wrong family. The symbol + or − indicates that SVM is able or unable, respectively, to distinguish the two enzymes and assign each exclusively to its own family.
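The scoring rule in this footnote can be written out as a short Python sketch: a pair counts as distinguished only when E1 goes to F1 and E2 goes to F2. The assignment strings are copied from the table in row order; the function name `pair_status` and the ASCII `-` used in place of − are choices made for this illustration.

```python
def pair_status(assignment):
    """Return "+" if E1 is assigned to F1 and E2 to F2, otherwise "-"."""
    # "E1->F1;E2->F2" becomes {"E1": "F1", "E2": "F2"}
    targets = dict(part.split("->") for part in assignment.split(";"))
    return "+" if targets == {"E1": "F1", "E2": "F2"} else "-"


# SVM functional family assignments of the eight enzyme pairs, in table order.
ASSIGNMENTS = [
    "E1->F1;E2->F2",
    "E1->F1;E2->F2",
    "E1->W;E2->F2",
    "E1->W;E2->F2",
    "E1->F1;E2->F2",
    "E1->F1;E2->F1",
    "E1->F1;E2->F2",
    "E1->F1;E2->F2",
]

statuses = [pair_status(a) for a in ASSIGNMENTS]
correct = statuses.count("+")
percentage = 100 * correct / len(statuses)

print(statuses)
print(f"{correct}/{len(statuses)} pairs distinguished = {percentage}%")
```

Five of the eight pairs satisfy the rule, which is 62.5% and corresponds to the 62% reported in the abstract.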