iCTX-type: a sequence-based predictor for identifying the types of conotoxins in targeting ion channels.

Hui Ding¹, En-Ze Deng¹, Lu-Feng Yuan¹, Li Liu², Hao Lin³, Wei Chen⁴, Kuo-Chen Chou⁵.

Abstract

Conotoxins are small, disulfide-rich neurotoxic peptides that bind to ion channels with very high specificity and modulate their activities. Over the last few decades, conotoxins have been drug candidates for treating chronic pain, epilepsy, spasticity, and cardiovascular diseases. According to their functions and targets, conotoxins are generally categorized into three types: potassium-channel type, sodium-channel type, and calcium-channel type. With the avalanche of peptide sequences generated in the postgenomic age, it is urgent and challenging to develop an automated method for rapidly and accurately identifying the types of conotoxins based on their sequence information alone. To address this challenge, a new predictor, called iCTX-Type, was developed by incorporating the dipeptide occurrence frequencies of a conotoxin sequence into a 400-D (dimensional) general pseudoamino acid composition, followed by a feature optimization procedure that reduces the sample representation from a 400-D to a 50-D vector. The overall success rate achieved by iCTX-Type via a rigorous cross-validation was over 91%, outperforming its counterpart (RBF network). Moreover, iCTX-Type is so far the only predictor in this area with a web server available, and hence is particularly useful for experimental scientists who want to obtain their desired results without following the complicated mathematics involved.

Year: 2014 PMID: 24991545 PMCID: PMC4058692 DOI: 10.1155/2014/286419

Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411

1. Introduction

Conotoxins are peptides of about 10 to 30 amino acid residues secreted by cone snails for capturing prey and defending themselves. These toxins can bind to various targets, such as G protein-coupled receptors (GPCRs), nicotinic acetylcholine receptors, and neurotensin receptors. In particular, they display extremely high specificity and affinity for ion channels. Ion channels are a class of membrane-spanning protein pores that mediate the flux of ions in a variety of cell types. There are over 300 types of ion channels in a living cell [1]. Many crucial functions in life, such as heartbeat, sensory transduction, and central nervous system response, are controlled by cell signaling via various ion channels. Ion channel dysfunction may lead to a number of diseases, such as epilepsy, arrhythmia, and type II diabetes. Such diseases are primarily treated with drugs that modulate the ion channels concerned. Ion channels are also important targets for treating viral diseases (see, e.g., [2-4]). Owing to their importance to human life, ion channels have become the second most frequent targets for drug development, next only to GPCRs [5]. The following three kinds of ion channels are usually targeted by conotoxins: potassium (K) channel (Figure 1), sodium (Na) channel (Figure 2), and calcium (Ca) channel (Figure 3). Based on their functions and targets, conotoxins can accordingly be classified into three types: (i) K-channel-targeting type; (ii) Na-channel-targeting type; and (iii) Ca-channel-targeting type.

Figure 1

A ribbon drawing to show the human potassium (K) channel. Reproduced from Chou [6] with permission.

Figure 2

A ribbon drawing to show the human sodium (Na) channel. Reproduced from Chou [6] with permission.

Figure 3

A ribbon drawing to show the calcium (Ca) channel from hepatitis C virus. Reproduced from [4] with permission.

Although conotoxins are lethally venomous because they block the transmission of nerve impulses, they have been widely used to treat chronic pain, epilepsy, spasticity, and cardiovascular diseases. Therefore, conotoxins have been regarded as important pharmacological tools for neuroscience research. It has been estimated that there are more than 100,000 kinds of conotoxins secreted by over 700 kinds of Conus in the world [8]. However, far fewer conotoxins (about 3,000 peptides) have been experimentally confirmed and reported in the literature and databases. Moreover, the records about the functions of conotoxins in public databases number no more than 300. Hence, developing a computational method to predict the functions of conotoxins has become a challenging task. In a pioneering work, Mondal et al. [9] proposed a method for predicting conotoxin superfamilies by using the pseudoamino acid composition approach [10, 11]. Subsequently, a series of studies on predicting conotoxin superfamilies have been reported (see, for example, [12-15]). All these methods yielded quite encouraging results, and each of them played a role in stimulating the development of this area. However, none of these methods can be used to predict the types of conotoxins defined according to the ion channels they target. For instance, both delta-conotoxin-like Ac6.1 (UniProt accession number: P0C8V5) [16] and omega-conotoxin-like Ai6.2 (UniProt accession number: P0CB10) [17] belong to the conotoxin O1 superfamily. However, the former targets the voltage-gated sodium channels, while the latter targets the voltage-gated calcium channels. To deal with this problem, a method was recently developed [7] to identify conotoxins among the aforementioned three types by using their sequence information alone. However, further work is needed in this regard for the following reasons. (i) The prediction quality can be further improved. (ii) No web server for the prediction method in [7] was provided, and hence its usage is quite limited, especially for the majority of experimental scientists. The present study was devoted to developing a new predictor for identifying the types of conotoxins that addresses these two aspects.

As elaborated in a comprehensive review [18] and followed in a series of recent publications [19-28], to establish a really useful statistical predictor for a biological system, we need to consider the following procedures: (i) construct or select a valid benchmark dataset to train and test the predictor; (ii) formulate the biological samples with an effective mathematical expression that can truly reflect their intrinsic correlation with the target to be predicted; (iii) introduce or develop a powerful algorithm (or engine) to operate the prediction; (iv) properly perform cross-validation tests to objectively evaluate the anticipated accuracy of the predictor; (v) establish a user-friendly web server for the predictor that is accessible to the public. In what follows, we describe how to deal with these procedures one by one.

2. Materials and Methods

2.1. Benchmark Dataset

The sequences of conotoxins and their functions were collected from UniProt [29]. To ensure its quality, the benchmark dataset was constructed strictly according to the following criteria. (i) Included were only those peptides annotated with “conotoxin” and with the keyword of potassium, calcium, or sodium in their functional ontologies. (ii) Included were only those conotoxins with clear functional annotations based on experimental results. In other words, we excluded those annotated with “uncertain,” “predicted,” or “inferred from homology” because of their lack of confidence. (iii) Excluded were those annotated with “immature” because of their incompleteness. (iv) Excluded were also those that contained any invalid amino acid codes, such as “B,” “X,” and “Z”. After going through the above procedures, we obtained 195 conotoxins, of which 37 belonged to the K-channel-targeting type, 86 to the Na-channel-targeting type, and 72 to the Ca-channel-targeting type.

As elaborated in a comprehensive review [18], a benchmark dataset containing many redundant samples with high similarity would lack statistical representativeness. A predictor, if trained and tested by a benchmark dataset with many homologous sequences, might yield misleading results with overestimated accuracy [30]. To remove the homologous sequences from the benchmark dataset, a cutoff threshold of 25% was recommended [31] to exclude those protein/peptide sequences that had ≥25% pairwise sequence identity to any other sample in the same subset. However, in this study we did not use such a stringent criterion because the currently available data did not allow us to do so; otherwise, some subsets would contain too few peptides to have statistical significance. As a compromise, we set the cutoff threshold at 80% and used the CD-HIT software [32] to remove those conotoxin samples that had ≥80% sequence identity to any other in the same subset. After such a screening procedure, we obtained 112 conotoxin samples for the benchmark dataset $\mathbb{S}$, formulated as

$$\mathbb{S} = \mathbb{S}_{\mathrm{K}} \cup \mathbb{S}_{\mathrm{Na}} \cup \mathbb{S}_{\mathrm{Ca}}, \tag{1}$$

where the subset $\mathbb{S}_{\mathrm{K}}$ contains 24 conotoxin samples of K-channel-targeting type, $\mathbb{S}_{\mathrm{Na}}$ contains 43 samples of Na-channel-targeting type, and $\mathbb{S}_{\mathrm{Ca}}$ contains 45 samples of Ca-channel-targeting type, while the symbol $\cup$ represents the union in set theory. The codes of the 112 conotoxins and their sequences are given in Supporting Information S1 (see Supplementary Material available online at http://dx.doi.org/10.1155/2014/286419). Likewise, we also constructed an independent dataset $\mathbb{S}^{\mathrm{Ind}}$, formulated as

$$\mathbb{S}^{\mathrm{Ind}} = \mathbb{S}^{\mathrm{Ind}}_{\mathrm{K}} \cup \mathbb{S}^{\mathrm{Ind}}_{\mathrm{Na}} \cup \mathbb{S}^{\mathrm{Ind}}_{\mathrm{Ca}}, \tag{2}$$

where $\mathbb{S}^{\mathrm{Ind}}_{\mathrm{K}}$ contains 12 K-conotoxins, $\mathbb{S}^{\mathrm{Ind}}_{\mathrm{Na}}$ contains 37 Na-conotoxins, and $\mathbb{S}^{\mathrm{Ind}}_{\mathrm{Ca}}$ contains 21 Ca-conotoxins. None of the samples in the independent dataset occurs in the dataset $\mathbb{S}$ of (1), and their detailed sequences are given in Supporting Information S2. For simplicity, hereafter we use “K-conotoxin,” “Na-conotoxin,” and “Ca-conotoxin” to represent K-channel-targeting type conotoxin, Na-channel-targeting type conotoxin, and Ca-channel-targeting type conotoxin, respectively.
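The two sequence-level screening steps above (criterion (iv) and the 80% redundancy cutoff) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study used CD-HIT, whereas `rough_identity` below is a hypothetical stand-in built on the standard-library `difflib` matcher, and the peptide strings in the usage note are made-up toy sequences, not entries from the benchmark dataset.

```python
from difflib import SequenceMatcher

VALID = set("ACDEFGHIKLMNPQRSTVWY")  # single-letter codes of the 20 native amino acids


def is_valid(seq):
    """True if the sequence is non-empty and uses only the 20 native residue codes."""
    return bool(seq) and set(seq.upper()) <= VALID


def rough_identity(a, b):
    """Crude identity estimate: matched residues / length of the shorter sequence.

    ASSUMPTION: SequenceMatcher is only a stand-in for a real alignment;
    CD-HIT computes identity differently.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def remove_redundant(seqs, cutoff=0.80):
    """Greedy filter: keep a sequence only if it is below the cutoff against all kept ones."""
    kept = []
    for seq in sorted(seqs, key=len, reverse=True):  # process longest sequences first
        if all(rough_identity(seq, k) < cutoff for k in kept):
            kept.append(seq)
    return kept
```

In use, one would first drop invalid records with `is_valid` and then call `remove_redundant` separately on each of the three subsets, since the cutoff in the text is applied within a subset rather than across the whole dataset.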

2.2. The Dipeptide Mode of Pseudoamino Acid Composition

Given a conotoxin peptide $\mathbf{P}$ with $L$ amino acids, how do we translate it into a mathematical expression for statistical prediction? This is one of the first important problems in developing a sequence-based predictor for identifying the type of a conotoxin. The most straightforward way to formulate the sample of a conotoxin peptide $\mathbf{P}$ with $L$ residues is to use its entire amino acid sequence, which can be formulated as

$$\mathbf{P} = R_1 R_2 R_3 \cdots R_L, \tag{3}$$

where $R_1$ represents the 1st residue of the conotoxin peptide, $R_2$ the 2nd residue, and so forth. Subsequently, we can utilize various sequence-similarity-search-based tools, such as BLAST [33], to perform statistical prediction. Although this kind of sequence model is very straightforward and intuitive, it unfortunately fails to work when a query conotoxin peptide does not have significant similarity to any of the peptide sequences in the training dataset. Thus, investigators turned to using vectors to represent the peptide samples. Another reason for doing so is that statistical samples in vector format are much easier to handle than those in sequence format for many existing operation engines, such as the correlation angle approach [34], covariance discriminant (CD) [27, 35–37], neural network [38-40], optimization approach [41], support vector machine (SVM) [22, 23, 42, 43], random forest [44, 45], conditional random field [20], nearest neighbor (NN) [46, 47], K-nearest neighbor (KNN) [30], OET-KNN [48-50], fuzzy K-nearest neighbor [25, 51–55], ML-KNN algorithm [56], and SLLE algorithm [36]. The simplest vector used to represent a peptide or protein sample is its amino acid composition (AAC), given as follows:

$$\mathbf{P} = \left[f_1 \;\; f_2 \;\; \cdots \;\; f_{20}\right]^{\mathbf{T}}, \tag{4}$$

where $f_i$ ($i = 1, 2, \ldots, 20$) is the normalized occurrence frequency of the $i$th type of native amino acid in the peptide chain and $\mathbf{T}$ is the transpose operator. The AAC model was used by many investigators in predicting various attributes of proteins (see, e.g., [41, 57–59]). However, as we can see from (4), when using AAC to represent a peptide or protein sample, all its sequence order information is completely lost, which limits the prediction quality. How can we formulate a peptide or protein sequence with a vector yet still keep considerable sequence order information? To incorporate the sequence order information, the pseudoamino acid composition [10, 11], or Chou's PseAAC [60], was proposed. Since the concept of PseAAC was proposed in 2001 [10], it has penetrated into almost all the fields of protein attribute prediction (see, e.g., [61-78]). Recently, the concept of PseAAC was further extended to represent the feature vectors of DNA and nucleotides [19, 21, 23, 27, 79], as well as other biological samples (see, e.g., [80-82]). Because it has been widely and increasingly used, in addition to the web server “PseAAC” [83] built in 2008, three types of powerful open access software, called “PseAAC-Builder” [84], “propy” [85], and “PseAAC-General” [86], were recently established: the former two are for generating various modes of Chou's special PseAAC, while the third one is for those of Chou's general PseAAC. According to a comprehensive review [18], the general PseAAC is formulated as

$$\mathbf{P} = \left[\psi_1 \;\; \psi_2 \;\; \cdots \;\; \psi_u \;\; \cdots \;\; \psi_\Omega\right]^{\mathbf{T}}, \tag{5}$$

where the components $\psi_u$ ($u = 1, 2, \ldots, \Omega$) and the dimension $\Omega$ depend on how the features are extracted from the peptide sequences concerned. For the current study, since the conotoxin sequences are not long (about 10–30 residues), we considered only the sequence order information between two contiguous amino acid residues. Thus, the dimension of the vector $\mathbf{P}$ in (5) is $\Omega = 20 \times 20 = 400$, and its components are given by

$$\psi_1 = f(\mathrm{AA}), \quad \psi_2 = f(\mathrm{AC}), \quad \ldots, \quad \psi_{400} = f(\mathrm{YY}), \tag{6}$$

where A, C, …, W, Y are, respectively, the single-letter codes of the 20 native amino acids, $f(\mathrm{AA})$ is the occurrence frequency of the dipeptide AA in the conotoxin sequence (see (3)), $f(\mathrm{AC})$ is that of the dipeptide AC, and so forth. The formulation defined by (5)-(6) is actually the dipeptide mode of PseAAC, which can be automatically generated by the PseAAC server [83] for a given peptide or protein sequence.
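As a rough illustration of the dipeptide mode, the following Python sketch converts a peptide string into a 400-D vector ordered AA, AC, …, YY. It is a minimal sketch, not the PseAAC server's implementation; in particular, dividing each count by the number of overlapping dipeptides (L − 1) is an assumption about how the occurrence frequency is normalized, and the function name is hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 native residues, alphabetical by one-letter code
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # AA, AC, ..., YY


def dipeptide_composition(seq):
    """Return the 400-D vector of dipeptide frequencies of a peptide sequence."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair not in counts:
            raise ValueError(f"invalid residue in {pair!r}")
        counts[pair] += 1
    # ASSUMPTION: normalize by the number of overlapping dipeptides (L - 1).
    total = max(len(seq) - 1, 1)
    return [counts[d] / total for d in DIPEPTIDES]
```

For a toy peptide such as `"CCGC"` the three overlapping dipeptides are CC, CG, and GC, so exactly three of the 400 components are nonzero and the components sum to 1.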

2.3. Feature Selection

The original raw features usually contain redundant information and noise that may negatively affect the prediction quality [87]. Using feature selection techniques to optimize the feature set can not only enhance the prediction accuracy but also provide useful insights into the action mechanism of conotoxins. According to the feature selection algorithm [87], the F-score of the $i$th feature is defined by

$$F(i) = \frac{\sum_{k=1}^{m}\left(\bar{f}_i^{(k)} - \bar{f}_i\right)^2}{\sum_{k=1}^{m}\dfrac{1}{N_k - 1}\sum_{j=1}^{N_k}\left(f_{i,j}^{(k)} - \bar{f}_i^{(k)}\right)^2}, \tag{7}$$

where $\bar{f}_i^{(k)}$ is the average frequency of the $i$th feature in the $k$th dataset, $\bar{f}_i$ is the average frequency of the $i$th feature over all the datasets concerned, $f_{i,j}^{(k)}$ is the frequency of the $i$th feature of the $j$th sequence in the $k$th dataset, $N_k$ is the number of peptide samples in the $k$th dataset, and $m$ is the number of datasets (here $m = 3$, one per conotoxin type). The program called “fselect.py” was downloaded from http://www.csie.ntu.edu.tw/~cjlin/libsvmtools to calculate the F-score defined in (7). The larger the F-score of a feature, the more likely it is to have a better discriminative capability [87]. Accordingly, we ranked the 400 dipeptides in (5) according to their F-scores. Subsequently, based on the ranked dipeptides, we applied the incremental feature selection (IFS) strategy to find an optimal subset of features that yielded the highest predictive accuracy. The IFS procedure started with a feature subset containing only the feature with the highest F-score. A new feature subset was composed each time the feature with the next highest F-score was added. By adding the features sequentially from higher to lower ranks, 400 feature sets were obtained. The $\tau$th feature set can be formulated as

$$S_\tau = \{d_1, d_2, \ldots, d_\tau\} \quad (1 \le \tau \le 400), \tag{8}$$

where $d_r$ denotes the dipeptide feature ranked $r$th by F-score. For each of the 400 feature sets, a prediction model based on the proposed predictive algorithm was constructed and examined with the jackknife cross-validation on the benchmark dataset. By doing so, we obtained an IFS curve in a 2D (dimensional) Cartesian coordinate system with the index $\tau$ as the abscissa (or X-coordinate) and the overall accuracy as the ordinate (or Y-coordinate). The optimal feature set is expressed as

$$S_\Theta = \{d_1, d_2, \ldots, d_\Theta\}, \tag{9}$$

with which the IFS curve reached its peak. In other words, in the 2D coordinate system, the overall accuracy was at its maximum when X = Θ. Thus, we used the Θ top-ranked features to build the final predictor.
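The ranking-plus-IFS procedure can be sketched as follows. This is a minimal pure-Python sketch rather than the `fselect.py` program used in the study: the F-score follows the multiclass form described above, `evaluate` is a hypothetical callback standing in for the jackknife accuracy of the SVM, and picking the smallest τ when several subsets tie at the peak is an assumption.

```python
def f_scores(X, y):
    """F-score of every feature. X: list of feature vectors, y: list of class labels.

    Each class needs at least two samples (the variance term divides by N_k - 1).
    """
    classes = sorted(set(y))
    scores = []
    for i in range(len(X[0])):
        column = [row[i] for row in X]
        overall_mean = sum(column) / len(column)
        between, within = 0.0, 0.0
        for k in classes:
            values = [v for v, label in zip(column, y) if label == k]
            mean_k = sum(values) / len(values)
            between += (mean_k - overall_mean) ** 2
            within += sum((v - mean_k) ** 2 for v in values) / (len(values) - 1)
        scores.append(between / within if within > 0 else 0.0)
    return scores


def incremental_feature_selection(X, y, evaluate):
    """Score the top-1, top-2, ... feature subsets and return the best one.

    evaluate(X_subset, y) is a HYPOTHETICAL callback returning an accuracy,
    e.g. the jackknife overall accuracy of a classifier.
    Returns (indices of the best subset, list of accuracies for tau = 1..n).
    """
    scores = f_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    curve = []
    for tau in range(1, len(ranked) + 1):
        subset = ranked[:tau]
        X_tau = [[row[i] for i in subset] for row in X]
        curve.append(evaluate(X_tau, y))
    best_tau = curve.index(max(curve)) + 1  # ASSUMPTION: first peak wins on ties
    return ranked[:best_tau], curve
```

With 400 features and a jackknife test inside `evaluate`, this loop trains a large number of models, which is affordable here only because the benchmark dataset has 112 samples.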

2.4. Support Vector Machine (SVM)

The classification algorithm used in this work was the support vector machine (SVM). The SVM has been widely used in the realm of bioinformatics (see, e.g., [19, 22, 23, 88–90]). Its basic principle is to transform the input vector into a high-dimensional Hilbert space and seek a separating hyperplane with the maximal margin in this space by using the decision function

$$f(\mathbf{x}) = \operatorname{sgn}\left(\sum_{i=1}^{N} y_i \alpha_i K(\mathbf{x}, \mathbf{x}_i) + b\right), \tag{10}$$

where $\mathbf{x}_i$ is the $i$th training vector, $y_i$ represents the type of the $i$th training vector, $\alpha_i$ and $b$ are the coefficients and bias determined during training, and $K(\mathbf{x}, \mathbf{x}_i)$ is a kernel function that defines an inner product in a high-dimensional feature space. Because of its effectiveness and speed in nonlinear classification, the radial basis function (RBF) kernel was used in the current work. The original SVM was designed for two-class problems. For multiclass problems, several strategies such as one-versus-rest (OVR), one-versus-one (OVO), and DAGSVM have been applied to extend the traditional SVM. In the present study, we used the OVO strategy for multiclass prediction. The SVM software used (LibSVM) was downloaded from http://www.csie.ntu.edu.tw/~cjlin/libsvm. A grid search method was used to optimize the regularization parameter $C$ and the kernel parameter $\gamma$ via the jackknife cross-validation. The search spaces for $C$ and $\gamma$ are $[2^{15}, 2^{-5}]$ and $[2^{-5}, 2^{-15}]$ with steps of $2^{-1}$ and $2$, respectively. For more details about SVM, see a monograph [91].
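To make the decision function and the search grid concrete, here is a small pure-Python sketch. It is not LibSVM: the function names are hypothetical, the RBF kernel is written in its usual form exp(−γ‖x − z‖²), and the grids assume one power of two per step for both parameters, which is only one reading of the step sizes quoted in the text.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel: K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def decision(x, support_vectors, labels, alphas, b, gamma):
    """Two-class SVM decision: sign of sum_i y_i * alpha_i * K(x, x_i) + b."""
    s = sum(y * a * rbf_kernel(x, sv, gamma)
            for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if s >= 0 else -1


# Search grids read from the text: C from 2^15 down to 2^-5, gamma from 2^-5 down to 2^-15.
# ASSUMPTION: both grids move one exponent at a time.
C_GRID = [2.0 ** e for e in range(15, -6, -1)]
GAMMA_GRID = [2.0 ** e for e in range(-5, -16, -1)]
PARAMETER_GRID = [(c, g) for c in C_GRID for g in GAMMA_GRID]
```

Under the OVO strategy, three such two-class machines (K vs. Na, K vs. Ca, Na vs. Ca) would be trained for each (C, γ) pair, and a query would be assigned to the type that wins the most pairwise votes.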

3. Results and Discussion

3.1. Test Method and Criteria

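In statistical prediction, the independent dataset test, the subsampling (or K-fold cross-validation) test, and the jackknife test are the three cross-validation methods often used to check a predictor for its accuracy [92]. Among the three, the jackknife test is deemed the least arbitrary because it always yields a unique result for a given benchmark dataset [18]. Accordingly, the jackknife test has been increasingly used and widely recognized by investigators to examine the quality of various predictors (see, e.g., [19, 21, 73, 75, 93–95]). Therefore, in this study we also adopted the jackknife test. In addition to an objective test method, we also need a set of metrics to reasonably measure the test outcome. Here, we use the criterion proposed in [96, 97] to develop a set of more intuitive and easier-to-understand metrics; that is, the correct rates $\Lambda^{\mathrm{K}}$ in predicting K-conotoxins, $\Lambda^{\mathrm{Na}}$ in predicting Na-conotoxins, and $\Lambda^{\mathrm{Ca}}$ in predicting Ca-conotoxins are defined by

$$\begin{aligned}
\Lambda^{\mathrm{K}} &= 1 - \frac{N^{\mathrm{K}}_{\mathrm{Na}} + N^{\mathrm{K}}_{\mathrm{Ca}}}{N^{\mathrm{K}}}, \\
\Lambda^{\mathrm{Na}} &= 1 - \frac{N^{\mathrm{Na}}_{\mathrm{K}} + N^{\mathrm{Na}}_{\mathrm{Ca}}}{N^{\mathrm{Na}}}, \\
\Lambda^{\mathrm{Ca}} &= 1 - \frac{N^{\mathrm{Ca}}_{\mathrm{Na}} + N^{\mathrm{Ca}}_{\mathrm{K}}}{N^{\mathrm{Ca}}},
\end{aligned} \tag{11}$$

where $N^{\mathrm{K}}$ is the total number of the K-conotoxins investigated, $N^{\mathrm{K}}_{\mathrm{Na}}$ is the number of the K-conotoxins incorrectly predicted as Na-conotoxins, and $N^{\mathrm{K}}_{\mathrm{Ca}}$ is the number of the K-conotoxins incorrectly predicted as Ca-conotoxins; $N^{\mathrm{Na}}$ is the total number of the Na-conotoxins investigated, $N^{\mathrm{Na}}_{\mathrm{K}}$ is the number of the Na-conotoxins incorrectly predicted as K-conotoxins, and $N^{\mathrm{Na}}_{\mathrm{Ca}}$ is the number of the Na-conotoxins incorrectly predicted as Ca-conotoxins; and $N^{\mathrm{Ca}}$ is the total number of the Ca-conotoxins investigated, $N^{\mathrm{Ca}}_{\mathrm{Na}}$ is the number of the Ca-conotoxins incorrectly predicted as Na-conotoxins, and $N^{\mathrm{Ca}}_{\mathrm{K}}$ is the number of the Ca-conotoxins incorrectly predicted as K-conotoxins. From (11), it follows that

$$\begin{aligned}
\mathrm{OA} &= \frac{\Lambda^{\mathrm{K}} N^{\mathrm{K}} + \Lambda^{\mathrm{Na}} N^{\mathrm{Na}} + \Lambda^{\mathrm{Ca}} N^{\mathrm{Ca}}}{N^{\mathrm{K}} + N^{\mathrm{Na}} + N^{\mathrm{Ca}}}, \\
\mathrm{AA} &= \frac{\Lambda^{\mathrm{K}} + \Lambda^{\mathrm{Na}} + \Lambda^{\mathrm{Ca}}}{3},
\end{aligned} \tag{12}$$

where OA stands for the overall accuracy and AA for the average accuracy.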
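The metrics of (11)-(12) are straightforward to compute from a confusion matrix. The sketch below assumes a nested-dictionary layout `confusion[true_type][predicted_type]`, which is a convenience chosen for this example; the counts in the usage note are invented toy numbers, not results from the paper.

```python
def success_rates(confusion):
    """Per-type success rates, overall accuracy (OA) and average accuracy (AA).

    confusion[true_type][predicted_type] holds the number of samples of
    true_type that were predicted as predicted_type.
    """
    rates, correct, total = {}, 0, 0
    for true_type, row in confusion.items():
        n = sum(row.values())                      # N for this type
        wrong = n - row.get(true_type, 0)          # all misclassified samples of this type
        rates[true_type] = 1 - wrong / n           # Lambda for this type
        correct += row.get(true_type, 0)
        total += n
    oa = correct / total                           # pooled over all samples
    aa = sum(rates.values()) / len(rates)          # unweighted mean of the per-type rates
    return rates, oa, aa


# Toy counts for illustration only (NOT taken from the paper).
toy = {
    "K":  {"K": 8, "Na": 1,  "Ca": 1},
    "Na": {"K": 0, "Na": 18, "Ca": 2},
    "Ca": {"K": 1, "Na": 1,  "Ca": 18},
}
rates, oa, aa = success_rates(toy)
```

OA weights each type by its sample count while AA treats the three types equally, so the two diverge when the subsets are unbalanced, as they are in the benchmark dataset (24, 43, and 45 samples).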

3.2. The Optimal Features

As mentioned above, a sample vector should contain neither too few nor too many features: too few would limit the prediction quality through lack of information, while too many would introduce noise through redundancy. Therefore, we should find a set of optimal features that have minimal redundancy among themselves but maximal relevance to the target to be predicted. In the present study, such an optimal feature set is none other than (9). Shown in Figure 4 is the IFS curve for the value of OA against the number of features counted, as described in Section 2.3. As can be seen there, the value of OA reached its peak of 91.1% when the top-ranked 50 dipeptides (Table 1) were taken into account.

Figure 4

A plot to show the IFS curve, where the abscissa and ordinate axis denote the number of features and the overall accuracy, respectively. As shown in the figure, the value of the overall accuracy reached its peak (91.1%) when the top-ranked 50 dipeptide features were taken into account.

Table 1

List of the 50 optimal features (dipeptides) derived according to (7)–(9), as elaborated in Section 2.3.

AA	AS	CC	CH	CS	DH	DN	EN	GA	GH
GL	GT	GY	HA	HL	HS	IY	KD	KK	KM
KP	LN	LV	MC	MY	ND	NQ	NS	PI	QK
QT	RC	RD	RF	RN	RT	RW	SC	SG	TE
TF	TT	VV	WG	WI	YD	YH	YL	YT	YY

The predictor thus obtained via the aforementioned procedures is called “iCTX-Type,” where “i” stands for “identify” and “CTX” for “conotoxin.” A comparison of the current predictor iCTX-Type with the one in [7] (to the best of our knowledge, the only other existing predictor in this area) is given in Table 2, from which we can see the following. (i) For four of the five metrics defined in (11)-(12), iCTX-Type yielded higher scores than the method in [7]; the exception is Λ^K (83.3% versus 91.7%). In particular, iCTX-Type achieved higher overall accuracy (OA) and average accuracy (AA). (ii) Compared with the method of [7], which uses 70 features, the present method uses only 50 features (Table 1), indicating that iCTX-Type is more efficient in excluding redundancy and noise as well as in capturing the core features.

Table 2

Comparison of the current method with the one in [7] by the jackknife test on the same benchmark dataset (Supporting Information S1) according to the metrics defined in (11)-(12).

Method	Number of features counted	Λ^K (%)	Λ^Na (%)	Λ^Ca (%)	AA (%)	OA (%)
RBF network^a	70	91.7	88.4	88.9	89.7	89.3
iCTX-Type^b	50	83.3	97.8	89.8	90.3	91.1

^a See [7].

^b This paper.

To further verify the performance of the current predictor, iCTX-Type was also used to identify the samples in the independent dataset 𝕊^Ind (see Supporting Information S2), and the success rates (see (11)) thus obtained were 91.7%, 91.9%, and 90.5% for K-, Na-, and Ca-conotoxins, respectively. These results are fully consistent with those obtained by the jackknife test as given in Table 2, further indicating that the new predictor iCTX-Type is quite promising and holds high potential to become a useful tool for in-depth study of ion channel-targeted conotoxins. To enhance the value of its practical applications [98], a web server for the new iCTX-Type predictor was established as described below.
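As a quick arithmetic check, the reported success rates can be combined with the subset sizes of the independent dataset (12, 37, and 21) to see what they imply. The correct-prediction counts and the overall accuracy computed below are inferred by rounding and are not figures stated in the paper; the variable names are illustrative.

```python
# Subset sizes of the independent dataset and the reported success rates (%).
sizes = {"K": 12, "Na": 37, "Ca": 21}
reported = {"K": 91.7, "Na": 91.9, "Ca": 90.5}

# INFERRED: nearest whole number of correct predictions implied by each rate.
implied_correct = {t: round(sizes[t] * reported[t] / 100) for t in sizes}

# INFERRED: overall accuracy on the independent dataset implied by those counts.
overall = 100 * sum(implied_correct.values()) / sum(sizes.values())
```

Each reported rate corresponds to a single misclassified K-conotoxin, three misclassified Na-conotoxins, and two misclassified Ca-conotoxins under this reading, which gives an implied overall accuracy close to the jackknife OA in Table 2.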

3.3. Web-Server Guide

For the convenience of the vast majority of experimental scientists, a step-by-step guide is provided below on how to use the web server to get the desired results without needing to follow the mathematical equations, which were presented in this paper only for the integrity of the predictor's development.

Step 1. Open the web server at http://lin.uestc.edu.cn/server/iCTX-Type and you will see the top page of iCTX-Type on your computer screen, as shown in Figure 5. Click on the Read Me button to see a brief introduction to the predictor and the caveats for using it.

Figure 5

A screenshot to show the top page of the iCTX-Type web server. Its website address is http://lin.uestc.edu.cn/server/iCTX-Type.

Step 2. Either type or copy/paste the query peptide sequences into the input box at the center of Figure 5. The input sequences should be in FASTA format. A sequence in FASTA format consists of a single initial line beginning with a greater-than symbol “>” in the first column, followed by lines of sequence data. The words right after the “>” symbol in the initial line are optional and are used only for identification and description. All lines should be no longer than 120 characters and usually do not exceed 80 characters. A sequence ends when another line starting with “>” appears, which indicates the start of another sample sequence. Example sequences in FASTA format can be seen by clicking on the Example button right above the input box.

Step 3. Click on the Submit button to see the predicted result. For instance, when using the three example peptide sequences as input and clicking the Submit button, you will see the following on your computer screen: the outcome for the 1st query sample is “Ca-conotoxin”; the outcome for the 2nd query sample is “K-conotoxin”; the outcome for the 3rd query sample is “Na-conotoxin.” All these results are fully consistent with the experimental observations. The computation takes only a few seconds before the predicted result appears on your screen; the more query sequences submitted, the longer it usually takes.

Step 4. Click on the Data button to download the benchmark datasets used to train and test the iCTX-Type predictor.

Step 5. Click on the Citation button to find the relevant papers that document the detailed development and algorithm of iCTX-Type.

Caveats. The input query sequences must consist of the single-letter codes of the 20 native amino acids; any other characters, such as “B,” “X,” “U,” and “Z,” are invalid and should not be part of the peptide sequence.
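Before pasting sequences into the input box, it can help to check them locally against the FASTA rules and the caveat above. The following Python sketch is a minimal, hypothetical pre-check and is not part of the iCTX-Type server; the function names and the toy records in the usage note are made up for illustration.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 native amino acids


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                                   # ignore blank lines
        if line.startswith(">"):
            if header is not None:                     # a new '>' closes the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_records(records):
    """Raise ValueError if any sequence contains a character outside the 20 native codes."""
    for header, seq in records:
        bad = sorted(set(seq) - VALID_RESIDUES)
        if bad:
            raise ValueError(f"{header or 'unnamed'}: invalid characters {bad}")
```

For example, `parse_fasta(">q1 toy\nCCGP\nTAC\n>q2\nGRCC\n")` yields two records whose multi-line sequence data have been joined, and `check_records` then rejects any record containing codes such as B, X, U, or Z.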

4. Conclusion

It is anticipated that iCTX-Type may become a useful high-throughput tool for both basic research and drug development, particularly for in-depth investigation into the mechanisms of ion channels and for developing new drugs to treat chronic pain, epilepsy, spasticity, and cardiovascular diseases, among others. It is instructive to point out that, since the binding of conotoxins to ion channels is highly selective and specific, the information obtained by iCTX-Type in identifying the types of conotoxins may also be very useful for designing ion channel inhibitors according to Chou's distorted key theory, as elaborated in [99] and briefed in a Wikipedia article at http://en.wikipedia.org/wiki/Chou's_distorted_key_theory_for_peptide_drugs.

Supporting Information S1: The benchmark dataset 𝕊 contains 112 conotoxins, of which 24 belong to the K-channel-targeting type, 43 to the Na-channel-targeting type, and 45 to the Ca-channel-targeting type.

Supporting Information S2: The independent dataset 𝕊^Ind contains 70 conotoxins, of which 12 are of the K-channel-targeting type, 37 of the Na-channel-targeting type, and 21 of the Ca-channel-targeting type. None of the samples listed here occurs in the benchmark dataset 𝕊.

93 in total

1. Some insights into protein structural class prediction.

Authors: G P Zhou; N Assa-Munt
Journal: Proteins Date: 2001-07-01

2. Predicting subcellular localization of proteins in a hybridization space.

Authors: Yu-Dong Cai; Kuo-Chen Chou
Journal: Bioinformatics Date: 2004-02-05 Impact factor: 6.937

3. Prediction of protease types in a hybridization space.

Authors: Kuo-Chen Chou; Yu-Dong Cai
Journal: Biochem Biophys Res Commun Date: 2005-11-09 Impact factor: 3.575

Review 4. Structural studies of conotoxins.

Authors: Norelle L Daly; David J Craik
Journal: IUBMB Life Date: 2009-02 Impact factor: 3.885

5. An optimization approach to predicting protein structural class from amino acid composition.

Authors: C T Zhang; K C Chou
Journal: Protein Sci Date: 1992-03 Impact factor: 6.725

6. A top-down approach to enhance the power of predicting human protein subcellular localization: Hum-mPLoc 2.0.

Authors: Hong-Bin Shen; Kuo-Chen Chou
Journal: Anal Biochem Date: 2009-08-03 Impact factor: 3.365

7. Relation between amino acid composition and cellular location of proteins.

Authors: J Cedano; P Aloy; J A Pérez-Pons; E Querol
Journal: J Mol Biol Date: 1997-02-28 Impact factor: 5.469

8. SLLE for predicting membrane protein types.

Authors: Meng Wang; Jie Yang; Zhi-Jie Xu; Kuo-Chen Chou
Journal: J Theor Biol Date: 2005-01-07 Impact factor: 2.691

9. iLoc-Euk: a multi-label classifier for predicting the subcellular localization of singleplex and multiplex eukaryotic proteins.

Authors: Kuo-Chen Chou; Zhi-Cheng Wu; Xuan Xiao
Journal: PLoS One Date: 2011-03-30 Impact factor: 3.240

10. iEzy-drug: a web server for identifying the interaction between enzymes and drugs in cellular networking.

Authors: Jian-Liang Min; Xuan Xiao; Kuo-Chen Chou
Journal: Biomed Res Int Date: 2013-11-26 Impact factor: 3.411
