Abstract
For the first time, multiple sets of n-peptide compositions from antifreeze protein (AFP) sequences of various cold-adapted fish and insects were analyzed using a support vector machine and genetic algorithms. Identifying AFPs is difficult because they exist as evolutionarily divergent types, and because few of their sequences and structures are present in currently available databases. Our results show that it is feasible to identify the sequence features shared among the various structural types of AFPs. Moreover, we were able to identify residues involved in ice binding without requiring knowledge of the three-dimensional structures of these AFPs. This approach should be useful for genomic and proteomic studies involving cold-adapted organisms.
Year: 2011 PMID: 21655262 PMCID: PMC3105057 DOI: 10.1371/journal.pone.0020445
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
The seven AFP subsets used for cross-validation testing.
| Subset | Type | PDB ID |
| 1 | insect AFP | 1c3y |
| 2 | Type III fish AFP | 1c89; 3nla; 1ucs; 1ops; 1kde; 1ame; 1msi; 1b7i; 1b7j; 1b7k; 1ekl; 1gzi; 1hg7; 1jab; 1msj; 2ame; 2jia; 2msi; 2msj; 2spg; 3ame; 3msi; 4ame; 4msi; 5msi; 6ame; 6msi; 7ame; 7msi; 8ame; 8msi; 9ame; 9msi |
| 3 | β-helical insect AFP | 1ezg |
| 4 | Type I fish AFP | 1wfa; 1j5b; 1y03 |
| 5 | β-helical insect AFP | 1eww; 1l0s; 1m8n |
| 6 | insect AFP | 2pne |
| 7 | Type II fish AFP | 2py2; 2afp |
Distribution of the 369 AFP sequences in the independent dataset by type of organism.
| Organism | Number of sequences |
| Algae | 17 |
| Bacteria | 101 |
| Fish | 123 |
| Insects | 105 |
| Plants | 23 |
Performances of SVM and SVMGA in the seven-fold cross-validation tests.
| No. entries | Subset | SVM (13 feature schemes) | SVMGA (13 feature schemes) | Doxey et al. |
| 1 | 1 | 0 | 1 | - |
| 33 | 2 | 0 | 33 | 3 |
| 1 | 3 | 1 | 1 | 1 |
| 3 | 4 | 0 | 3 | 3 |
| 3 | 5 | 2 | 3 | 2 |
| 1 | 6 | 1 | 1 | - |
| 2 | 7 | 1 | 2 | 0 |
| AFP accuracy | | 11.4% | 100.0% | 90.0% |
| AFP precision | | 25.0% | 62.9% | 42.9% |
| Overall accuracy | | 98.6% | 99.3% | 99.6% |
| Matthews correlation coefficient (MCC) | | 0.162 | 0.790 | 0.620 |
| True positives (TP) | | 5 | 44 | 9 |
| True negatives (TN) | | 3747 | 3736 | 3184 |
| False positives (FP) | | 15 | 26 | 12 |
| False negatives (FN) | | 39 | 0 | 1 |
The 13 feature schemes were defined over the parameter sets k = {1, 5, 6, 7}, g = {0, 1, 3, 6}, S = {H3, P3, S2}, and S' = {7, 11}.
Doxey and colleagues [9] used structure as the property to identify the 10 AFPs in their dataset. Only 2afp, based on an NMR structure, was not identified correctly.
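The summary rows of the table above can be checked against its count rows. The sketch below assumes that the four count rows are, in order, true positives, true negatives, false positives and false negatives, and that "AFP accuracy" means the fraction of AFPs recovered (usually called sensitivity or recall). The function name `summarize` and the dictionary layout are illustrative and not taken from the paper.

```python
from math import sqrt

def summarize(tp, tn, fp, fn):
    """Return the four summary metrics for one classifier.

    Assumption: tp/tn/fp/fn correspond to the four count rows of the table.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        # Fraction of true AFPs that were recognised (sensitivity).
        "afp_accuracy": tp / (tp + fn),
        # Fraction of predicted AFPs that really are AFPs.
        "afp_precision": tp / (tp + fp),
        # Fraction of all entries classified correctly.
        "overall_accuracy": (tp + tn) / (tp + tn + fp + fn),
        # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

# Counts copied from the table, in the order (TP, TN, FP, FN).
counts = {
    "SVM": (5, 3747, 15, 39),
    "SVMGA": (44, 3736, 26, 0),
    "Doxey et al.": (9, 3184, 12, 1),
}

results = {name: summarize(*c) for name, c in counts.items()}

for name, m in results.items():
    print(
        f"{name}: AFP accuracy {100 * m['afp_accuracy']:.1f}%, "
        f"precision {100 * m['afp_precision']:.1f}%, "
        f"overall {100 * m['overall_accuracy']:.1f}%, MCC {m['mcc']:.3f}"
    )
```

Under this reading, the computed values agree with the percentages and coefficients listed in the table for all three columns, which supports the interpretation of the count rows.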
Figure 1. Sequence identity distribution for pairs of AFPs.
The x-axis values are the best pairwise-matched sequence identity (SI) values for each AFP sequence against the other 368 sequences. The y-axis values are the best pairwise-matched SI values for each of the 369 AFP sequences of the second independent dataset against the 44 sequences of the validation set. Black symbols mark AFPs that were identified in the independent dataset; red symbols mark those that were not.
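As an illustration of what a "best pairwise-matched SI value" is, the sketch below aligns every pair of sequences and keeps, for each sequence, the highest identity found against any other. The source does not say how SI was computed, so the choices here are assumptions: a plain Needleman-Wunsch global alignment with unit scores, and identity defined as identical columns divided by alignment length. The toy sequences and the names `percent_identity` and `best_match_identity` are hypothetical.

```python
def percent_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Percent identity from a simple global alignment (assumed scoring)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back one optimal path, counting columns and identical pairs.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        columns += 1
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                if a[i - 1] == b[j - 1]:
                    identical += 1
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return 100.0 * identical / columns if columns else 0.0

def best_match_identity(seqs):
    """For each named sequence, the highest identity against all the others."""
    return {
        name: max(percent_identity(s, t) for other, t in seqs.items() if other != name)
        for name, s in seqs.items()
    }

# Hypothetical toy sequences, not taken from the dataset.
toy = {"seq_a": "ACDEFG", "seq_b": "ACDFG", "seq_c": "WWWWWW"}
best = best_match_identity(toy)
print(best)
```

With real data, a dedicated alignment tool and a substitution matrix would normally replace the unit scoring used here, and the identity definition should match whichever convention the analysis adopts.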
The feature schemes that enabled recognition of the AFPs in each subset when a single SVM classifier was used.
| Subset | Feature scheme (ten columns; scheme labels not recoverable) |
| 1 | • | |||||||||
| 2 | • | • | • | • | • | • | • | • | • | • |
| 3 | • | • | • | • | • | • | ||||
| 4 | • | • | • | • | • | • | • | • | ||
| 5 | • | • | • | • | • | • | • | • | • | |
| 6 | • | • | • | • | • | • | ||||
| 7 | • | • | • | |||||||
The filled circles correlate the feature schemes with the AFPs that they identified. The AFPs are denoted according to their subsets.
An example of votes acquired by residues in a sequence from 1msi.
| Sequence | ….. | Q9 | L10 | I11 | P12 | I13 | N14 | T15 | A16 | L17 | T18 | ….. |
| Votes | ….. | 3 | 3 | 0 | 1 | 2 | 4 | 8 | 5 | 3 | 4 | ….. |
The table also contained 13 coding rows, one per feature scheme, in which each asterisk marked one vote for a residue. Their column alignment is not recoverable; the row totals, in order, were 3, 0, 0, 0, 2, 5, 0, 4, 1, 6, 6, 5 and 1 (33 votes in all, equal to the sum of the Votes row).
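The Votes row can be turned into a ranked list of candidate key residues. The sketch below uses the ten residues and vote counts shown for 1msi; the cutoff `min_votes=4` is purely illustrative, since the source does not state the threshold that separates key residues from the rest.

```python
# Residues and vote totals copied from the Votes row for 1msi.
residues = ["Q9", "L10", "I11", "P12", "I13", "N14", "T15", "A16", "L17", "T18"]
votes = [3, 3, 0, 1, 2, 4, 8, 5, 3, 4]

# Rank residues from most to fewest votes; ties keep sequence order
# because Python's sort is stable.
ranked = sorted(zip(residues, votes), key=lambda rv: rv[1], reverse=True)

def key_residues(ranked_pairs, min_votes=4):
    """Residues with at least min_votes votes (hypothetical cutoff)."""
    return [res for res, v in ranked_pairs if v >= min_votes]

total_votes = sum(votes)
selected = key_residues(ranked)

print(ranked)
print(selected)
```

In this window T15 receives the most votes, followed by A16; how many of the lower-ranked residues count as key residues depends entirely on the cutoff chosen.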
Figure 2. Examples of key residues mapped onto the surfaces of the seven representative AFPs used in the cross-validation tests.
The structures were drawn using PyMOL [37]. The identified key residues are shown in red (more votes) and yellow (fewer votes) for the following PDB structures: (A) the winter flounder α-helical AFP (PDB ID 1wfa) [35]; (B) the snow flea AFP (PDB ID 2pne) [38]; (C) the β-helical spruce budworm AFP (PDB ID 1eww) [13]; (D) the β-helical beetle Tenebrio molitor AFP (PDB ID 1ezg) [36].
Figure 3. The surface of the eelpout type III AFP (PDB ID 1msi).
(A) Key residues selected by the SVMGA are labeled in black text. Residues Q9, V20, M21 and Q44, which were identified as key residues in a mutagenesis study but not by the SVMGA, are shown in cyan. (B) A view of the ice-binding interface; all residues reported to be part of the interface are labeled. Residues identified by the SVMGA are shown in red and yellow. Residues known to be important in ice binding but not identified by the SVMGA are shown in cyan. The remaining residues not identified by the SVMGA are shown in gray.
Figure 4. Rate of identifying the 369 AFPs from the second independent set.
Each bar gives the identification accuracy for AFPs whose maximum SI value (the y-axis value in Figure 1) falls within a given SI range, shown separately for the different species.