
Identifying the Types of Ion Channel-Targeted Conotoxins by Incorporating New Properties of Residues into Pseudo Amino Acid Composition.

Yun Wu, Yufei Zheng, Hua Tang.

Abstract

Conotoxins are neurotoxins that interact specifically with potassium, sodium, and calcium channels. They have become potential drug candidates for treating diseases such as chronic pain, epilepsy, and cardiovascular diseases. Correctly identifying the types of ion channel-targeted conotoxins therefore provides important clues for understanding their function and finding potential drugs. On this basis, we developed a new computational method to rapidly and accurately predict the types of ion channel-targeted conotoxins. Three new kinds of residue properties were incorporated into pseudo amino acid composition to formulate conotoxin samples. A support vector machine was used as the classifier, and a feature selection technique based on the F-score was used to optimize the features. Jackknife cross-validation gave an overall accuracy of 94.6%, which is higher than other published results and indicates that the proposed method outperforms published methods on this benchmark. Hence the current method may play a complementary role to existing methods for recognizing the types of ion channel-targeted conotoxins.


Year:  2016        PMID: 27631006      PMCID: PMC5008028          DOI: 10.1155/2016/3981478

Source DB:  PubMed          Journal:  Biomed Res Int            Impact factor:   3.411


1. Introduction

The marine cone snail secretes venom for predation and defense. A key component of the venom is conotoxin, a disulfide-rich neurotoxic peptide 10–30 residues long. The high diversity of conotoxin sequences makes them difficult to study systematically: over 100,000 conotoxins have been reported to exist in approximately 700 species of cone snails [1]. Conotoxins can target G protein-coupled receptors (GPCRs) [2], nicotinic acetylcholine receptors, and neurotensin receptors. In particular, they interact with ion channels with extremely high specificity and affinity [3]. They have therefore been regarded as important drug candidates for treating chronic pain, epilepsy, spasticity, and cardiovascular diseases [4, 5].

As more and more conotoxins are discovered, investigating their function with biochemical experiments becomes increasingly difficult because wet-lab experiments are costly and slow. Computational prediction of conotoxin function offers a convenient way to analyze conotoxins systematically. In 2006, Mondal et al. combined support vector machine (SVM) with pseudo amino acid composition (PseAAC) to predict the superfamily of conotoxins [6]. Subsequently, Lin and Li developed a novel method called increment of diversity (ID) to describe dipeptide sequences and used quadratic discriminant (QD) analysis to predict the superfamily and family of conotoxins [7]. Zaki et al. combined sequence alignment, which was also used by Zou et al. [8], with amino acid composition to predict the superfamily of conotoxins using SVM [9]. They further provided an SVM-Freescore method to improve accuracy [10]. Recently, Yin et al. developed a method called dHKNN to predict the superfamily of conotoxins and achieved an overall accuracy of 90.3% by using a hidden Markov model to select the best features [11, 12]. Lisacek et al. used profile hidden Markov models (pHMMs) and position-specific scoring matrices (PSSMs) to improve the accuracy of conotoxin superfamily prediction [13-15].

Although these methods and results provide some guidance for studying conotoxins, superfamily prediction does not by itself reveal the function of a conotoxin. For example, two conotoxins (delta-conotoxin-like Ac6.1 and omega-conotoxin-like Ai6.2) belong to the same superfamily yet target different ion channels [16]. Thus, it is necessary to develop new bioinformatics tools to identify the function of conotoxins. In 2007, Saha and Raghava proposed a method based on SVM and PSI-BLAST to predict the function of neurotoxins [17]. Soli et al. developed a statistical model to predict the activity of scorpion toxins by using motifs and secondary structure information [18]. Recently, Yuan et al. developed a feature selection technique based on the binomial distribution to predict the types of ion channel-targeted conotoxins by using a radial basis function network [19]. Subsequently, they improved the accuracy by using SVM with optimal dipeptide composition [20].

However, the prediction accuracy can be further improved. The present study therefore aimed to develop a new prediction method to improve the prediction quality for conotoxin types. We incorporated three new kinds of residue properties into PseAAC to formulate conotoxin samples and then used SVM to perform classification. After feature selection, the accuracy in jackknife cross-validation improved markedly. The following sections describe the process of model construction in detail.

2. Materials and Methods

2.1. Benchmark Dataset

The benchmark dataset, extracted from UniProt [21], was constructed by Lin's group [19, 20]. The dataset is reliable and objective because (i) conotoxins with ambiguous annotations were excluded, (ii) the function of every conotoxin in the benchmark dataset has been experimentally confirmed, and (iii) highly similar sequences (cutoff = 80%) were pruned using the CD-HIT program. The benchmark dataset contains 112 mature conotoxin peptide sequences: 24 potassium ion channel-targeted conotoxins (K-conotoxins), 43 sodium ion channel-targeted conotoxins (Na-conotoxins), and 45 calcium ion channel-targeted conotoxins (Ca-conotoxins). All calculations and model construction in the following sections are based on this dataset.

2.2. Feature Extraction

A key point in protein prediction is how to extract informative features from peptide sequences. In past studies, amino acid composition has been widely used in protein prediction. To account for the correlation between residues, dipeptide composition was introduced into prediction models. Chou proposed a very popular and elegant descriptor called PseAAC, which describes not only the amino acid composition but also the correlation of the physicochemical properties of residues [22]. Furthermore, several web servers and stand-alone tools have recently been proposed to generate different modes of PseAAC, such as PseKNC [23], PseKNC-General [24], Pse-in-One [25], repRNA [26], and repDNA [27]. In this study, we proposed three new kinds of properties: rigidity, flexibility, and irreplaceability. The flexibility and rigidity of residues correlate with protein structure and function, and the irreplaceability of residues can reflect the evolution of life. The values of the three properties for the 20 residues [28] are listed in Table 1. In the following, we describe how to formulate conotoxins with PseAAC [22].
Table 1

The values of rigidity, flexibility, and irreplaceability of 20 residues.

Residue   Rigidity   Flexibility   Irreplaceability
G         −1.097     −2.746        0.56
A         −1.338     −3.102        0.52
V         −1.641     −1.339        0.54
L         −1.741     0.424         0.58
I         −1.741     0.424         0.65
F         2.877      −0.466        0.86
W         5.913      −1.000        1.82
Y         2.714      −0.672        0.98
D         −0.204     0.424         0.77
H         2.269      −0.223        0.94
N         −0.204     0.424         0.79
E         −0.365     2.009         0.76
K         −1.822     3.950         0.81
Q         −0.365     2.009         0.86
M         −1.741     2.484         1.25
R         1.169      3.06          0.6
S         −1.511     0.957         0.64
T         −1.641     −1.339        0.56
C         −1.511     0.957         1.12
P         1.979      −2.404        0.61
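
Consider a conotoxin $P = R_1 R_2 R_3 R_4 \cdots R_L$, where $R_1$, $R_2$, and $R_L$ denote the 1st, 2nd, and $L$th residues of the conotoxin sample $P$. It can be represented by a $(400 + 3\lambda)$-dimensional vector:

$$P = [x_1, x_2, \ldots, x_{400}, x_{401}, \ldots, x_{400+3\lambda}]^{T}, \tag{1}$$

where

$$x_u = \begin{cases} \dfrac{f_u}{\sum_{i=1}^{400} f_i + \omega \sum_{j=1}^{3\lambda} \tau_j}, & 1 \le u \le 400, \\[2ex] \dfrac{\omega\,\tau_{u-400}}{\sum_{i=1}^{400} f_i + \omega \sum_{j=1}^{3\lambda} \tau_j}, & 400 < u \le 400 + 3\lambda, \end{cases} \tag{2}$$

and $f_u$ is the normalized frequency of the $u$th of the 400 dipeptides in conotoxin $P$, defined as

$$f_u = \frac{n_u}{L - 1}, \tag{3}$$

where $n_u$ denotes the number of occurrences of the $u$th dipeptide in conotoxin $P$. In (2), $\omega$ is the weight factor for the sequence order effect, and $\tau_j$ is the $j$-tier sequence correlation factor, computed by the following formula:

$$\begin{cases} \tau_1 = \dfrac{1}{L-1} \sum_{i=1}^{L-1} H^{1}_{i,i+1}, \\ \tau_2 = \dfrac{1}{L-1} \sum_{i=1}^{L-1} H^{2}_{i,i+1}, \\ \tau_3 = \dfrac{1}{L-1} \sum_{i=1}^{L-1} H^{3}_{i,i+1}, \\ \quad\vdots \\ \tau_{3\lambda-2} = \dfrac{1}{L-\lambda} \sum_{i=1}^{L-\lambda} H^{1}_{i,i+\lambda}, \\ \tau_{3\lambda-1} = \dfrac{1}{L-\lambda} \sum_{i=1}^{L-\lambda} H^{2}_{i,i+\lambda}, \\ \tau_{3\lambda} = \dfrac{1}{L-\lambda} \sum_{i=1}^{L-\lambda} H^{3}_{i,i+\lambda}, \end{cases} \qquad (\lambda < L), \tag{4}$$

where $H^{n}_{i,i+j}$ ($n = 1, 2, 3$ denotes rigidity, flexibility, and irreplaceability, respectively) is the correlation function, given by

$$H^{n}_{i,i+j} = h^{n}(R_i) \cdot h^{n}(R_{i+j}), \tag{5}$$

where $h^{n}(R_i)$ is the value of the $n$th physicochemical property for amino acid $R_i$. The values are converted to standard form by

$$h^{n}(R_i) = \frac{h_0^{n}(R_i) - \langle h_0^{n} \rangle}{\mathrm{SD}(h_0^{n})}, \tag{6}$$

where $h_0^{n}(R_i)$ is the original value of the $n$th property for amino acid $R_i$, and $\langle h_0^{n} \rangle$ and $\mathrm{SD}(h_0^{n})$ are the mean and standard deviation of that property over the 20 amino acids.

To find the feature subset that produces the maximum accuracy, we performed feature selection using the F-score, defined as

$$F(i) = \frac{\sum_{k=1}^{3} \left( \bar{x}_i^{(k)} - \bar{x}_i \right)^2}{\sum_{k=1}^{3} \dfrac{1}{N_k - 1} \sum_{j=1}^{N_k} \left( x_{j,i}^{(k)} - \bar{x}_i^{(k)} \right)^2}, \tag{7}$$

where $\bar{x}_i$ and $\bar{x}_i^{(k)}$ are the average values of the $i$th feature in the whole dataset and in the $k$th dataset, respectively; $x_{j,i}^{(k)}$ is the value of the $i$th feature of the $j$th conotoxin in the $k$th dataset; and $N_k$ is the number of conotoxins in the $k$th dataset. The larger the value of $F(i)$, the better the predictive capability of the $i$th feature. We used the Python script fselect.py, downloaded from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/, to calculate the F-score.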
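
The PseAAC formulation described above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that the correlation function is the product of the standardized property values of the two residues, that standardization uses the population standard deviation over the 20 residues, and that dipeptide counts are normalized by L − 1. The function names `standardize` and `pseaac` and the example peptide string are hypothetical; the property values are those of Table 1.

```python
from itertools import product
from statistics import mean, pstdev

# Table 1 values: (rigidity, flexibility, irreplaceability) per residue.
RAW = {
    "G": (-1.097, -2.746, 0.56), "A": (-1.338, -3.102, 0.52),
    "V": (-1.641, -1.339, 0.54), "L": (-1.741, 0.424, 0.58),
    "I": (-1.741, 0.424, 0.65),  "F": (2.877, -0.466, 0.86),
    "W": (5.913, -1.000, 1.82),  "Y": (2.714, -0.672, 0.98),
    "D": (-0.204, 0.424, 0.77),  "H": (2.269, -0.223, 0.94),
    "N": (-0.204, 0.424, 0.79),  "E": (-0.365, 2.009, 0.76),
    "K": (-1.822, 3.950, 0.81),  "Q": (-0.365, 2.009, 0.86),
    "M": (-1.741, 2.484, 1.25),  "R": (1.169, 3.06, 0.6),
    "S": (-1.511, 0.957, 0.64),  "T": (-1.641, -1.339, 0.56),
    "C": (-1.511, 0.957, 1.12),  "P": (1.979, -2.404, 0.61),
}

def standardize(raw):
    """Rescale each property to zero mean and unit SD over the 20 residues.

    Assumption: population SD (pstdev) is used for the rescaling.
    """
    out = {aa: [] for aa in raw}
    for n in range(3):
        values = [raw[aa][n] for aa in raw]
        mu, sd = mean(values), pstdev(values)
        for aa in raw:
            out[aa].append((raw[aa][n] - mu) / sd)
    return out

STD = standardize(RAW)
DIPEPTIDES = ["".join(p) for p in product(RAW, repeat=2)]  # 400 dipeptides

def pseaac(seq, lam=6, w=0.2):
    """Return the (400 + 3*lam)-dimensional feature vector of a peptide."""
    length = len(seq)
    if lam >= length:
        raise ValueError("lambda must be smaller than the sequence length")
    # Normalized dipeptide frequencies f_u = n_u / (L - 1).
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(length - 1):
        counts[seq[i:i + 2]] += 1
    f = [counts[d] / (length - 1) for d in DIPEPTIDES]
    # Correlation factors: for each tier j, one factor per property n.
    # Assumption: the correlation function is a product of standardized values.
    tau = []
    for j in range(1, lam + 1):
        for n in range(3):
            total = sum(STD[seq[i]][n] * STD[seq[i + j]][n]
                        for i in range(length - j))
            tau.append(total / (length - j))
    denom = sum(f) + w * sum(tau)
    return [x / denom for x in f] + [w * t / denom for t in tau]

# Illustrative peptide string; any sequence of the 20 standard residues works.
vec = pseaac("CKGKGAKCSRLMYDCCTGSCRSGKC", lam=6, w=0.2)
print(len(vec))  # 418
```

With λ = 6 the vector has 400 + 3 × 6 = 418 components, and the components sum to 1 because every entry shares the same denominator.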

2.3. Support Vector Machine

SVM is a very popular machine learning method that is well suited to small-sample classification [29-31] and regression [32, 33]. Its basic idea is to map the original samples into a high-dimensional space and search that space for the hyperplane that best separates the different classes of samples. In this study, the LibSVM software package was used to implement SVM. The radial basis function (RBF) kernel usually exhibits excellent performance in nonlinear classification [34], so it was used in the current work. We used a grid search to find the best values of the regularization parameter $C$ and the kernel parameter $\gamma$ via jackknife cross-validation. The search spaces for $C$ and $\gamma$ are $[2^{15}, 2^{-5}]$ and $[2^{-5}, 2^{-15}]$, with steps of $2^{-1}$ and $2$, respectively.
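
The grid search with jackknife (leave-one-out) cross-validation can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the LibSVM workflow used in the study: both grids are assumed to be traversed by changing the exponent of 2 by one at a time, and `make_kernel_mean_classifier` is a hypothetical stand-in that uses an RBF kernel but is not an SVM (it ignores C). In practice the stand-in would be replaced by a call to an RBF-kernel SVM. All function names and the toy data are hypothetical.

```python
import math

def power_of_two_grid(start_exp, stop_exp):
    """Descending grid 2**start_exp, ..., 2**stop_exp (exponent step of -1)."""
    return [2.0 ** e for e in range(start_exp, stop_exp - 1, -1)]

def jackknife_accuracy(samples, labels, fit_predict):
    """Leave each sample out once, train on the rest, and test on it."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if fit_predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)

def grid_search(samples, labels, make_fit_predict, c_grid, gamma_grid):
    """Return (best accuracy, C, gamma); ties keep the first pair found."""
    best = (-1.0, None, None)
    for c in c_grid:
        for gamma in gamma_grid:
            acc = jackknife_accuracy(samples, labels,
                                     make_fit_predict(c, gamma))
            if acc > best[0]:
                best = (acc, c, gamma)
    return best

def rbf(a, b, gamma):
    """RBF kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def make_kernel_mean_classifier(c, gamma):
    """Hypothetical stand-in for an SVM: picks the class whose training
    samples have the highest mean RBF similarity to the query. C is unused."""
    def fit_predict(train_x, train_y, query):
        sums, counts = {}, {}
        for x, y in zip(train_x, train_y):
            sums[y] = sums.get(y, 0.0) + rbf(x, query, gamma)
            counts[y] = counts.get(y, 0) + 1
        return max(sums, key=lambda y: sums[y] / counts[y])
    return fit_predict

c_grid = power_of_two_grid(15, -5)       # 2^15 down to 2^-5
gamma_grid = power_of_two_grid(-5, -15)  # 2^-5 down to 2^-15

# Toy two-class data, for illustration only.
toy_x = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_y = ["a", "a", "a", "b", "b", "b"]
best_acc, best_c, best_gamma = grid_search(
    toy_x, toy_y, make_kernel_mean_classifier, c_grid, gamma_grid)
```

Passing the classifier as a factory keeps the cross-validation loop independent of the learning algorithm, so the same loop can score any (C, γ) pair.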

2.4. The Evaluation of Model Performance

We used jackknife cross-validation to evaluate the performance of the proposed method. Three metrics, namely, sensitivity (Sn), overall accuracy (OA), and average accuracy (AA), as defined in [19, 20], were used to quantitatively estimate the accuracy of the model:

$$\mathrm{Sn}_k = \frac{m_k}{N_k}, \qquad \mathrm{OA} = \frac{\sum_{k=1}^{3} m_k}{\sum_{k=1}^{3} N_k}, \qquad \mathrm{AA} = \frac{1}{3} \sum_{k=1}^{3} \mathrm{Sn}_k, \tag{8}$$

where $N_k$ is the total number of conotoxins of the $k$th type and $m_k$ is the number of conotoxins of the $k$th type that were correctly recognized.
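
As a worked check of the three metrics, the short Python sketch below computes Sn, OA, and AA from per-class counts. The function name `evaluate` is hypothetical. The class sizes come from the benchmark dataset; the numbers of correct predictions (22, 41, and 43) are not stated in the text and are inferred here from the sensitivities reported for the proposed method.

```python
def evaluate(correct, total):
    """Return per-class sensitivity, overall accuracy, and average accuracy.

    correct[k]: number of class-k samples predicted correctly (m_k)
    total[k]:   number of class-k samples in the dataset (N_k)
    """
    sn = {k: correct[k] / total[k] for k in total}
    oa = sum(correct.values()) / sum(total.values())
    aa = sum(sn.values()) / len(sn)
    return sn, oa, aa

total = {"K": 24, "Na": 43, "Ca": 45}     # benchmark class sizes
correct = {"K": 22, "Na": 41, "Ca": 43}   # inferred from reported Sn values

sn, oa, aa = evaluate(correct, total)
print({k: round(100 * v, 1) for k, v in sn.items()})
print(round(100 * oa, 1), round(100 * aa, 1))  # 94.6 94.2
```

OA weights every sample equally, whereas AA weights every class equally, so the two differ when class sizes are unbalanced, as they are here.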

3. Results and Discussion

As can be seen from (2), the results of the proposed method depend on two parameters, λ and ω, where λ represents the long-range sequence order effect and ω is the weight factor that balances the local and global effects. Generally speaking, the greater λ is, the more global sequence order information the vector contains. However, if λ is too large, the feature vector suffers from the curse of dimensionality. Therefore, we searched for the optimal values of the two parameters in the following regions:

$$1 \le \lambda \le 10 \ \text{with step } \Delta = 1, \qquad 0.1 \le \omega \le 1 \ \text{with step } \Delta = 0.1. \tag{9}$$

From (9), a total of 10 × 10 = 100 individual combinations needed to be considered to find the optimal parameter combination. Optimizing the model via this 2-dimensional grid search is routine but tedious. We used jackknife cross-validation for the parameter optimization. The results show that the accuracy reaches its maximum when λ = 6 and ω = 0.2.

With these parameters the model contains 418 features, which is still large enough for high-dimensionality and overfitting problems to appear. Therefore, we needed to select the key features from the 418 components, namely, those that produce the maximum accuracy. The best feature subset could in principle be found by investigating all combinations of features. However, examining all possible combinations is time-consuming and beyond the computational capability of most computers. For this reason, we used the F-score defined in (7) to perform feature selection. First, all 418 features were ranked by F-score from largest to smallest. Second, SVM was used to classify the three types of samples using only the feature with the maximum F-score, and the accuracy was calculated. Third, a new feature subset was produced by adding the feature with the second highest F-score to the former subset. We repeated the process, adding one feature at a time, until all 418 features had been included, and calculated the accuracy for each subset.

We plotted accuracy against feature dimension in Figure 1 and found that the maximum accuracy is 94.6% when the best 180 features are used. The detailed results are recorded in Table 2, together with other published results. The Sns of our method for Na- and Ca-conotoxins are 95.3% and 95.6%, respectively, which are higher than those of the RBF network-based method [19]. The Sns of our method for K- and Ca-conotoxins are 91.7% and 95.6%, respectively, which are higher than those of iCTX-Type [20]. Thus, in summary, our proposed method outperforms the other published methods in overall and average accuracy.
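
The F-score ranking and the one-feature-at-a-time subset search described above can be sketched as follows. This is a hedged illustration, not the fselect.py script: it assumes the multi-class F-score is the sum of squared deviations of the class means from the overall mean, divided by the sum of the within-class sample variances, and that every class has at least two samples. The function names and the `accuracy_of` callback (which in practice would return the jackknife accuracy of an SVM trained on the reduced features) are hypothetical.

```python
def f_scores(features, labels):
    """Multi-class F-score of every feature column (larger is better)."""
    classes = sorted(set(labels))
    scores = []
    for i in range(len(features[0])):
        column = [row[i] for row in features]
        overall = sum(column) / len(column)
        between = within = 0.0
        for k in classes:
            vals = [v for v, y in zip(column, labels) if y == k]
            mean_k = sum(vals) / len(vals)
            between += (mean_k - overall) ** 2
            # Sample variance of the feature inside class k (needs >= 2 samples).
            within += sum((v - mean_k) ** 2 for v in vals) / (len(vals) - 1)
        if within > 0:
            scores.append(between / within)
        else:
            scores.append(float("inf") if between > 0 else 0.0)
    return scores

def incremental_selection(features, labels, accuracy_of):
    """Add features in descending F-score order; keep the best prefix."""
    scores = f_scores(features, labels)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best_acc, best_subset = -1.0, []
    for size in range(1, len(ranked) + 1):
        subset = ranked[:size]
        reduced = [[row[i] for i in subset] for row in features]
        acc = accuracy_of(reduced, labels)
        if acc > best_acc:
            best_acc, best_subset = acc, subset
    return best_acc, best_subset

# Toy data: feature 0 separates the classes, feature 1 does not.
toy_x = [[0.0, 5.0], [0.1, 1.0], [1.0, 5.1], [1.1, 0.9]]
toy_y = ["a", "a", "b", "b"]
toy_scores = f_scores(toy_x, toy_y)
```

Because only the top-ranked prefixes are evaluated, 418 features require 418 model evaluations rather than an exhaustive search over all subsets.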
Figure 1

A plot to show the feature selection results. When the top 180 features were used to perform prediction, the overall success rate reached its peak of 94.6%.

Table 2

Comparison of the current method with published methods.

Methods            Sn (%)                  AA (%)   OA (%)
                   K       Na      Ca
RBF network [19]   91.7    88.3    88.9    89.7     89.3
iCTX-Type [20]     83.3    97.8    89.8    90.31    91.1
Our method         91.7    95.3    95.6    94.2     94.6

4. Conclusion

In this paper, we designed a new method based on three kinds of new properties to predict three kinds of ion channel-targeted conotoxins. By using feature selection technique, prediction accuracy was dramatically improved. Comparison with published methods demonstrated the advantage of our method. The properties of residues used in this paper can also be used in other fields of protein classification. In the future, we will construct a free webserver based on the proposed method for the convenience of the vast majority of experimental scientists.
  32 in total

1.  Identification and classification of conopeptides using profile Hidden Markov Models.

Authors:  Silja Laht; Dominique Koua; Lauris Kaplinski; Frédérique Lisacek; Reto Stöcklin; Maido Remm
Journal:  Biochim Biophys Acta       Date:  2011-12-30

Review 2.  Tropical marine neurotoxins: venoms to drugs.

Authors:  Michael R Watters
Journal:  Semin Neurol       Date:  2005-09       Impact factor: 3.420

3.  Probing peptide libraries from Conus achatinus using mass spectrometry and cDNA sequencing: identification of delta and omega-conotoxins.

Authors:  Konkallu Hanumae Gowd; Kalyan Kumar Dewan; Prathima Iengar; Kozhalmannom S Krishnan; Padmanabhan Balaram
Journal:  J Mass Spectrom       Date:  2008-06       Impact factor: 1.982

Review 4.  Structural studies of conotoxins.

Authors:  Norelle L Daly; David J Craik
Journal:  IUBMB Life       Date:  2009-02       Impact factor: 3.885

5.  Prediction of neurotoxins based on their function and source.

Authors:  Sudipto Saha; Gajendra P S Raghava
Journal:  In Silico Biol       Date:  2007

6.  PseKNC: a flexible web server for generating pseudo K-tuple nucleotide composition.

Authors:  Wei Chen; Tian-Yu Lei; Dian-Chuan Jin; Hao Lin; Kuo-Chen Chou
Journal:  Anal Biochem       Date:  2014-04-13       Impact factor: 3.365

7.  UniProt Knowledgebase: a hub of integrated protein data.

Authors:  Michele Magrane
Journal:  Database (Oxford)       Date:  2011-03-29       Impact factor: 3.451

8.  SMOQ: a tool for predicting the absolute residue-specific quality of a single protein model with support vector machines.

Authors:  Renzhi Cao; Zheng Wang; Yiheng Wang; Jianlin Cheng
Journal:  BMC Bioinformatics       Date:  2014-04-28       Impact factor: 3.169

9.  Bioinformatic characterizations and prediction of K+ and Na+ ion channels effector toxins.

Authors:  Rima Soli; Belhassen Kaabi; Mourad Barhoumi; Mohamed El-Ayeb; Najet Srairi-Abid
Journal:  BMC Pharmacol       Date:  2009-03-10

10.  iCTX-type: a sequence-based predictor for identifying the types of conotoxins in targeting ion channels.

Authors:  Hui Ding; En-Ze Deng; Lu-Feng Yuan; Li Liu; Hao Lin; Wei Chen; Kuo-Chen Chou
Journal:  Biomed Res Int       Date:  2014-06-01       Impact factor: 3.411

