Morteza Eslami, Ramin Shirali Hossein Zade, Zeinab Takalloo, Ghasem Mahdevar, Abbasali Emamjomeh, Reza H Sajedi, Javad Zahiri.
Abstract
Various cold-adapted organisms produce antifreeze proteins (AFPs), which prevent cell fluids from freezing by inhibiting the growth of ice crystals. AFPs continue to be identified in a range of organisms that live at extremely low temperatures. They have several important applications: increasing the freeze tolerance of plants, preserving tissue under frozen conditions, and producing cold-hardy plants through transgenic technology. Substantial differences in the sequence and structure of AFPs make these proteins difficult to identify. In this paper, we propose a novel method to identify AFPs using a support vector machine (SVM) that incorporates four types of features. Results on two benchmark datasets demonstrate the strength of the proposed method in AFP prediction. In an independent test, our method outperformed the current state-of-the-art methods. In addition, a comparison of the discrimination power of the different feature types showed that physicochemical descriptors contribute most to AFP detection. The method has been implemented as a stand-alone tool, named afpCOOL, which runs on various operating systems and predicts AFPs through a user-friendly graphical interface.
Keywords: Biochemistry; Bioinformatics; Computational biology; Computer science; Mathematical biosciences
Year: 2018 PMID: 30094375 PMCID: PMC6074609 DOI: 10.1016/j.heliyon.2018.e00705
Source DB: PubMed Journal: Heliyon ISSN: 2405-8440
Fig. 1 Schematic representation of AFP517 dataset construction. a) Positive dataset contains 843 AFPs. b) Negative dataset contains 843 non-AFPs.
Prediction performance of afpCOOL on two benchmark datasets under 10-fold cross-validation (10-fold CV) and leave-one-out cross-validation (LOOCV). The AFP481 dataset contains 300 AFPs and 300 non-AFPs; the AFP517 dataset contains 517 AFPs and 517 non-AFPs.
| Validation procedure | Dataset | Accuracy | Precision | Sensitivity | F-Measure | MCC | AUC |
|---|---|---|---|---|---|---|---|
| 10-Fold CV | AFP481 | 0.93 | 0.93 | 0.93 | 0.93 | 0.87 | 0.93 |
| 10-Fold CV | AFP517 | 0.91 | 0.92 | 0.92 | 0.92 | 0.84 | 0.92 |
| LOOCV | AFP481 | 0.89 | 0.90 | 0.88 | 0.86 | 0.78 | 0.91 |
| LOOCV | AFP517 | 0.91 | 0.90 | 0.89 | 0.90 | 0.81 | 0.90 |
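The threshold-based measures in this table (accuracy, precision, sensitivity, F-measure, and MCC) all follow from the four confusion-matrix counts. The sketch below uses their standard definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification measures from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall
    if precision + sensitivity:
        f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    else:
        f_measure = 0.0
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Hypothetical counts for a balanced set of 300 positives and 300 negatives.
metrics = binary_metrics(tp=270, fp=30, tn=270, fn=30)
print(metrics)
```

AUC is not included because it depends on the ranking of prediction scores rather than on a single set of counts.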
Fig. 2 Receiver operating characteristic (ROC) curves of our method on the two benchmark datasets, calculated from the ten-fold cross-validation. The AFP481 dataset contains 300 AFPs and 300 non-AFPs; the AFP517 dataset contains 517 AFPs and 517 non-AFPs.
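The area under a ROC curve equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. The following is a minimal sketch of that rank-based definition. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not output of afpCOOL.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 class labels (1 = AFP, 0 = non-AFP).
    scores: iterable of classifier scores, higher meaning "more likely AFP".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy example: 3 positives and 3 negatives with made-up scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(auc)  # 8 of the 9 pairs are ranked correctly
```

The pairwise loop is quadratic in the number of examples, which is acceptable for datasets of this size. A sort-based rank-sum gives the same value more efficiently.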
Performance comparison of the proposed method (afpCOOL) with two current state-of-the-art methods in AFP prediction. All methods were trained on the AFP481 dataset in a 10-fold cross-validation procedure and tested on an independent test dataset with 181 AFPs and 9193 non-AFPs.
| Method | Sensitivity | Specificity | Accuracy |
|---|---|---|---|
| afpCOOL | 0.72 | 0.98 | 0.96 |
| AFP-Pred (Griffith and Yaish 2004) | 0.85 | 0.82 | 0.83 |
| AFP-PSSM (Zhao et al. 2012) | 0.76 | 0.93 | 0.93 |
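Because the independent test set is heavily imbalanced (181 AFPs against 9193 non-AFPs), accuracy is a prevalence-weighted average of sensitivity and specificity and is dominated by specificity. The sketch below illustrates this with a trivial baseline that labels every protein as non-AFP. The function names are hypothetical; only the two class sizes come from the caption above.

```python
def accuracy_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Accuracy as the class-size-weighted average of sensitivity and specificity."""
    return (sensitivity * n_pos + specificity * n_neg) / (n_pos + n_neg)


def balanced_accuracy(sensitivity, specificity):
    """Unweighted mean of the two rates; insensitive to class imbalance."""
    return (sensitivity + specificity) / 2


N_POS, N_NEG = 181, 9193  # class sizes of the independent test set

# Baseline that predicts "non-AFP" for everything: sensitivity 0, specificity 1.
baseline_acc = accuracy_from_rates(0.0, 1.0, N_POS, N_NEG)
baseline_bal = balanced_accuracy(0.0, 1.0)

print(f"all-negative baseline accuracy: {baseline_acc:.3f}")  # about 0.981
print(f"all-negative baseline balanced accuracy: {baseline_bal:.2f}")  # 0.50
```

This is why sensitivity and specificity are reported alongside accuracy: on this test set, a high accuracy alone does not show that a method detects AFPs.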
Fig. 3 Graphical user interface of afpCOOL.
Fig. 4 Sensitivity (a) and specificity (b) of BLAST when using each of the 801 AFPs as the query against the last update.
Antifreeze proteins with no hit in the BLAST search (35 proteins).
| Proteins (UniProt ID) | Organism |
|---|---|
| | Myoxocephalus scorpius (Shorthorn sculpin) (Cottus scorpius) |
| | Parachaenichthys charcoti (Charcot's dragonfish) (Chaenichthys charcoti) |
| | Gymnodraco acuticeps (Antarctic dragonfish) |
| | Pseudopleuronectes americanus (Winter flounder) (Pleuronectes americanus) |
| | Myoxocephalus aenaeus (Grubby sculpin) (Cottus aenaeus) |
| | Eleginus gracilis (Saffron cod) (Gadus gracilis) |
| | Secale cereale (Rye) |
| | Dissostichus mawsoni (Antarctic cod) |
| | Notothenia microlepidota |
| | Chaenocephalus aceratus (Blackfin icefish) (Chaenichthys aceratus) |
| | Liparis atlanticus (Atlantic seasnail) |
| | Tautogolabrus adspersus (Cunner) |
| | Pogonophryne cerebropogon |
| | Myoxocephalus octodecemspinosus (Longhorn sculpin) (Cottus octodecemspinosus) |
| | Liparis gibbus (Variegated snailfish) |
| | Trapa natans (Water chestnut) |
| | Solanum tuberosum (Potato) |
| | Pagothenia borchgrevinki (Bald rockcod) (Trematomus borchgrevinki) |
| | Antarctomyces psychrotrophicus |
| | Tenebrio molitor (Yellow mealworm beetle) |
| | Hypogastrura harveyi |
| | Nicotiana tabacum (Common tobacco) |
| | Bradyrhizobium sp. ORS 375 |
| | Cullen corylifolia (Malaysian scurfpea) (Psoralea corylifolia) |
| | Cardinium endosymbiont cEper1 of Encarsia pergandiella |
| | Phenylobacterium zucineum (strain HLK1) |
| | Stigmatella aurantiaca (strain DW4/3-1) |
Antifreeze proteins with 0% sensitivity in the BLAST search (11 proteins).
| Proteins (UniProt ID) | Organism |
|---|---|
| | Boreogadus saida (Polar cod) |
| | Coprinopsis cinerea (strain Okayama-7/130/ATCC MYA-4618/FGSC 9003) |
| | Ammopiptanthus nanus |
| | Dissostichus mawsoni (Antarctic cod) |
| | Chaenocephalus aceratus (Blackfin icefish) (Chaenichthys aceratus) |
| | Gadus ogac (Greenland cod) |
| | Gymnodraco acuticeps (Antarctic dragonfish) |
Fig. 5 Amino acid composition of AFP proteins that have been used in the BLAST searches.
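Amino acid composition is the fraction of a sequence made up of each of the 20 standard residues. The sketch below shows one common way to compute it. The function name and the toy sequence are hypothetical (the sequence is not a real AFP), and the paper's exact preprocessing is not described here, so skipping non-standard residue codes is an assumption.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence.

    Characters outside the 20 standard codes (e.g. X, B, Z) are ignored;
    this handling is an assumption, not the paper's documented procedure.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Made-up, alanine-rich toy sequence used purely for illustration.
toy_sequence = "AATAAKAAAL"
composition = amino_acid_composition(toy_sequence)
print(composition["A"], composition["T"])
```

The result is a fixed-length vector of 20 fractions that sum to 1, regardless of sequence length, which makes proteins of different sizes directly comparable.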
Fig. 6 Percentage of the total score of the 100 most important features contributed by each feature type.
The 10 most important features.
| Rank | Feature Type | Feature's Detail |
|---|---|---|
| 1 | Physicochemical | The number of atoms in the side chain labeled 2 + 1 |
| 2 | Physicochemical | Normalized positional residue frequency at helix termini C |
| 3 | Physicochemical | Average non-bonded energy per residue |
| 4 | Physicochemical | Free energy change of epsilon(i) to alpha(Rh) |
| 5 | Physicochemical | The number of bonds in the longest chain |
| 6 | PSSM-based | H to H |
| 7 | Physicochemical | Normalized positional residue frequency at helix termini C4 |
| 8 | PSSM-based | H to G |
| 9 | Physicochemical | Relative population of conformational state C |
| 10 | Physicochemical | Loss of Side chain hydropathy by helix formation |