
Predicting secretory proteins of malaria parasite by incorporating sequence evolution information into pseudo amino acid composition via grey system model.

Wei-Zhong Lin, Jian-An Fang, Xuan Xiao, Kuo-Chen Chou.

Abstract

Malaria has become a cause of poverty and a major hindrance to economic development. The culprit of the disease is the parasite, which secretes an array of proteins within the host erythrocyte to facilitate its own survival. Accordingly, the secretory proteins of the malaria parasite have become a logical target for drug design against malaria. Unfortunately, with the increasing resistance to the drugs thus developed, the situation has become more complicated. To cope with the drug resistance problem, one strategy is to identify in a timely manner the proteins secreted by the malaria parasite, which can serve as potential drug targets. However, it is both expensive and time-consuming to identify the secretory proteins of the malaria parasite by experiments alone. To expedite the development of effective drugs against malaria, a computational predictor called "iSMP-Grey" was developed that can identify the secretory proteins of the malaria parasite based on protein sequence information alone. During the prediction process a protein sample was formulated as a 60D (dimensional) feature vector formed by incorporating the sequence evolution information into the general form of PseAAC (pseudo amino acid composition) via a grey system model, which is particularly useful for solving complicated problems that lack sufficient information or need to process uncertain information. It was observed by the jackknife test that iSMP-Grey achieved an overall success rate of 94.8%, remarkably higher than those achieved by the existing predictors in this area. As a user-friendly web-server, iSMP-Grey is freely accessible to the public at http://www.jci-bioinfo.cn/iSMP-Grey. Moreover, for the convenience of most experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results without the need to follow the complicated mathematical equations involved in this paper.

Year:  2012        PMID: 23189138      PMCID: PMC3506597          DOI: 10.1371/journal.pone.0049040

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Malaria is a potentially fatal tropical disease caused by a parasite known as Plasmodium. Four distinct species of Plasmodium can produce the disease in different forms: Plasmodium falciparum, Plasmodium vivax, Plasmodium ovale, and Plasmodium malariae. Of these four, Plasmodium falciparum, or P. falciparum, is the most widespread and dangerous. If not treated in time, it may lead to the fatal cerebral malaria, which remains one of the most devastating global health crises. Nearly half of the world's population is still at risk of infection. According to the World Health Organization's 2010 World Malaria Report (http://www.who.int/malaria/world_malaria_report_2010/worldmalariareport2010.pdf), there are more than 225 million cases of malaria each year, killing around 781,000 people, corresponding to 2.23% of deaths worldwide. Malaria is more dangerous for women and children. It was stated in the World Health Organization's 2011 World Malaria Report (http://www.who.int/malaria/world_malaria_report_2011/9789241564403_eng.pdf) that 81% of cases and 91% of deaths occurred in the African Region, mostly involving children under five and pregnant women. Malaria is commonly associated with poverty; indeed, it is itself a cause of poverty and a major hindrance to economic development. The situation has become even worse over the last few years with the increase in resistance to the drugs normally used to combat the parasites that cause the disease. Therefore, one strategy to deal with the growing malaria problem is to identify and characterize new and durable antimalarial drug targets, the majority of which are parasite proteins [1]. The parasite secretes an array of proteins within the host erythrocyte to facilitate its own survival within the host cell. These proteins can serve as potential drug or vaccine targets. However, it is difficult to identify the secretory proteins of P. falciparum experimentally owing to the complex nature of the parasite.
With the completion of the Plasmodium genome sequence, it is both challenging and urgent to develop an automatic method or high throughput tool for identifying secretory proteins of P. falciparum. Some efforts have already been made in this regard. In a pioneering study, Verma et al. [2] proposed a method for identifying proteins secreted by the malaria parasite. In their prediction method, the operation engine was the Support Vector Machine (SVM), while the protein samples were formulated with the amino acid composition, dipeptide composition, and position specific scoring matrix (PSSM) [3]. Subsequently, Zuo and Li [4] introduced the K-minimum increment of diversity (K-MID) approach to predict secretory proteins of the malaria parasite based on grouping of amino acids. Meanwhile, various studies around this topic were also carried out [5], [6], [7], [8], [9]. In the past, various predictors for protein systems were developed by incorporating the evolutionary information via PSSM [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20]. In the above papers, however, only the statistical information of PSSM [3] was utilized, while the inner interactions among the constituent amino acid residues in a protein sample, i.e., its sequence-order effects, were ignored. To avoid completely losing the sequence-order information associated with PSSM, the concept of pseudo amino acid composition (PseAAC) [21], [22] was utilized to incorporate the evolutionary information into the formulation of a protein sample, as done in predicting protein subcellular localization [23], [24], [25], predicting protein fold pattern [26], identifying membrane proteins and their types [27], predicting enzyme functional classes and subclasses [28], identifying protein quaternary structural attribute [29], predicting antibacterial peptides [30], predicting allergenic proteins [31], and identifying proteases and their types [32].
The present study was initiated in an attempt to develop a new and more powerful predictor for identifying the secretory proteins of malaria parasite by incorporating the sequence evolution information into PseAAC via a grey system model [33]. According to a recent review [34], to establish a really useful statistical predictor for a protein system, we need to consider the following procedures: (i) construct or select a valid benchmark dataset to train and test the predictor; (ii) formulate the protein samples with an effective mathematical expression that can truly reflect their intrinsic correlation with the target to be predicted; (iii) introduce or develop a powerful algorithm (or engine) to operate the prediction; (iv) properly perform cross-validation tests to objectively evaluate the anticipated accuracy of the predictor; (v) establish a user-friendly web-server for the predictor that is accessible to the public. Below, let us describe how to deal with these steps.

Materials and Methods

1. Benchmark Dataset

The benchmark dataset used in this study was taken from Verma et al. [2]. The dataset can be formulated as

$$\mathbb{S} = \mathbb{S}^{+} \cup \mathbb{S}^{-}$$

where $\mathbb{S}^{+}$ contains 252 secretory proteins of malaria parasite, $\mathbb{S}^{-}$ contains 252 non-secretory proteins of malaria parasite, and the symbol $\cup$ represents the union in set theory. The same benchmark dataset was also used by Zuo and Li [4]. For the reader's convenience, the sequences of the 252 secretory proteins in $\mathbb{S}^{+}$ and those of the 252 non-secretory proteins in $\mathbb{S}^{-}$ are given in Supporting Information S1.

2. A Novel PseAAC Feature Vector by Incorporating Sequence Evolution Information via the Grey System Theory

To develop a powerful predictor for a protein system, one of the keys is to formulate the protein samples with an effective mathematical expression that can truly reflect their intrinsic correlation with the target to be predicted [34]. To realize this, the pseudo amino acid composition (PseAAC) was proposed [21] to replace the simple amino acid composition (AAC) for representing the sample of a protein. Ever since the concept of PseAAC was introduced in 2001 [21], it has penetrated into almost all the fields of protein attribute prediction, such as predicting protein submitochondrial localization [35], predicting protein structural class [36], predicting DNA-binding proteins [37], identifying bacterial virulent proteins [38], predicting metalloproteinase family [39], predicting protein folding rate [40], predicting GABA(A) receptor proteins [41], predicting protein supersecondary structure [42], identifying protein quaternary structural attribute [43], predicting cyclin proteins [44], classifying amino acids [45], predicting enzyme family class [46], identifying risk type of human papillomaviruses [47], and discriminating outer membrane proteins [48], among many others (see a long list of references cited in [49]). Because PseAAC has been so widely used, a powerful software package called PseAAC-Builder [49] was recently proposed for generating various special modes of PseAAC, in addition to the web-server PseAAC [50] established in 2008. According to a recent review [34], the general form of PseAAC for a protein $\mathbf{P}$ can be formulated as

$$\mathbf{P} = \left[\psi_1 \;\; \psi_2 \;\; \cdots \;\; \psi_u \;\; \cdots \;\; \psi_\Omega\right]^{\mathbf{T}}$$

where $\mathbf{T}$ is the transpose operator, while the subscript $\Omega$ is an integer whose value, as well as the components $\psi_1$, $\psi_2$, …, depends on how the desired information is extracted from the amino acid sequence of $\mathbf{P}$. This general form can cover almost all the various modes of PseAAC.
In particular, it can be used to reflect essential core features deeply hidden in complicated protein sequences, such as those for the functional domain (FunD) information [51], [52], [53] (cf. Eqs.9–10 of [34]), gene ontology (GO) information [54], [55] (cf. Eqs.11–12 of [34]), and sequence evolution information [3] (cf. Eqs.13–14 of [34]). In this study, we use a novel approach to define the $\Omega$ elements of the general PseAAC vector. As is well known, biology is a natural science with a historic dimension. All biological species have developed starting out from a very limited number of ancestral species. The same is true of protein sequences [56]. Their evolution involves changes of single residues, insertions and deletions of several residues [57], gene doubling, and gene fusion. With these changes accumulated over a long period of time, many similarities between the initial and resultant amino acid sequences are gradually eliminated, but the corresponding proteins may still share many common attributes, such as having basically the same biological function and residing at the same subcellular location. To incorporate this kind of sequence evolution information into the PseAAC of a protein $\mathbf{P}$, let us use the information of the PSSM (Position-Specific Scoring Matrix) [3], as described below. According to [3], the sequence evolution information of protein $\mathbf{P}$ with $L$ amino acid residues can be expressed by an $L \times 20$ matrix, as given by

$$\mathbf{P}_{\mathrm{PSSM}}^{(0)} = \begin{bmatrix} E^{0}_{1\to 1} & E^{0}_{1\to 2} & \cdots & E^{0}_{1\to 20} \\ E^{0}_{2\to 1} & E^{0}_{2\to 2} & \cdots & E^{0}_{2\to 20} \\ \vdots & \vdots & \ddots & \vdots \\ E^{0}_{L\to 1} & E^{0}_{L\to 2} & \cdots & E^{0}_{L\to 20} \end{bmatrix}$$

where $E^{0}_{i\to j}$ represents the original score of the amino acid residue in the i-th sequential position of the protein being changed to amino acid type j during the evolution process. Here, the numerical codes 1, 2, …, 20 are used to denote the 20 native amino acid types according to the alphabetical order of their single character codes [58].
The scores in $\mathbf{P}_{\mathrm{PSSM}}^{(0)}$ were generated by using PSI-BLAST [3] to search the UniProtKB/Swiss-Prot database (Release 2010_04 of 23-Mar-2010) through three iterations with 0.001 as the E-value cutoff for multiple sequence alignment against the sequence of the protein $\mathbf{P}$. In order to bring every element of the matrix within the range of 0–1, a conversion was performed through the standard sigmoid function to give

$$\mathbf{P}_{\mathrm{PSSM}}^{(1)} = \begin{bmatrix} E^{1}_{1\to 1} & E^{1}_{1\to 2} & \cdots & E^{1}_{1\to 20} \\ E^{1}_{2\to 1} & E^{1}_{2\to 2} & \cdots & E^{1}_{2\to 20} \\ \vdots & \vdots & \ddots & \vdots \\ E^{1}_{L\to 1} & E^{1}_{L\to 2} & \cdots & E^{1}_{L\to 20} \end{bmatrix}$$

where

$$E^{1}_{i\to j} = \frac{1}{1 + e^{-E^{0}_{i\to j}}}$$

Now, let us describe how to extract the useful information from $\mathbf{P}_{\mathrm{PSSM}}^{(1)}$ via a grey system model. According to the grey system theory [33], if the information of a system investigated is fully known, it is called a “white system”; if completely unknown, a “black system”; if partially known, a “grey system”. The model developed based on such a theory is called a “grey model”, which is a kind of nonlinear and dynamic model formulated by a differential equation. The grey model is particularly useful for solving complicated problems that lack sufficient information, or that need to process uncertain information and reduce random effects in acquired data. In the grey system theory, an important and generally used model is called GM(1,1) [33]. It is quite effective for monotonic series, with good simulating effect and small error, as reflected by the fact that using the GM(1,1) model has remarkably improved the success rates in predicting protein structural classes [59]. However, if the series concerned are not monotonic, the simulating effect of the GM(1,1) model would not be good and its error might be quite large. To overcome this shortcoming, in this study we use a different grey system model called GM(2,1) [33], which can deal effectively with oscillating series. To extract the serial information of $\mathbf{P}_{\mathrm{PSSM}}^{(1)}$, let us consider the components in its j-th column, i.e., $E^{1}_{1\to j}, E^{1}_{2\to j}, \ldots, E^{1}_{L\to j}$, as an initial series. Obviously, the j-th column of $\mathbf{P}_{\mathrm{PSSM}}^{(1)}$ is an oscillating series rather than a monotonic one as in the case investigated in [59].
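To deal with such a problem, instead of the GM(1,1), let us adopt the GM(2,1) model here. Denote the initial series taken from the j-th column by $x^{(0)}(k) = E^{1}_{k\to j}$ $(k = 1, 2, \ldots, L)$. According to the GM(2,1) model [33], we have the following 2nd-order grey differential equation with one variable:

$$\alpha^{(1)} x^{(0)}(k) + a_1^{j}\, x^{(0)}(k) + a_2^{j}\, z^{(1)}(k) = b^{j}, \qquad k = 2, 3, \ldots, L$$

where

$$\alpha^{(1)} x^{(0)}(k) = x^{(0)}(k) - x^{(0)}(k-1)$$

and

$$z^{(1)}(k) = \tfrac{1}{2}\left[x^{(1)}(k) + x^{(1)}(k-1)\right], \qquad x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i)$$

In the grey differential equation, the coefficients $a_1^{j}$ and $a_2^{j}$ are the developing coefficients, and $b^{j}$ is the influence coefficient. $a_1^{j}$, $a_2^{j}$, and $b^{j}$ can be expressed as the components of a 3D vector as given by

$$\hat{\mathbf{a}}^{j} = \left[a_1^{j} \;\; a_2^{j} \;\; b^{j}\right]^{\mathbf{T}}$$

in which the components can be directly derived from the following least-squares equation

$$\hat{\mathbf{a}}^{j} = \left(\mathbf{B}^{\mathbf{T}}\mathbf{B}\right)^{-1}\mathbf{B}^{\mathbf{T}}\mathbf{Y}$$

where

$$\mathbf{B} = \begin{bmatrix} -x^{(0)}(2) & -z^{(1)}(2) & 1 \\ -x^{(0)}(3) & -z^{(1)}(3) & 1 \\ \vdots & \vdots & \vdots \\ -x^{(0)}(L) & -z^{(1)}(L) & 1 \end{bmatrix}$$

and

$$\mathbf{Y} = \left[x^{(0)}(2) - x^{(0)}(1) \;\;\; x^{(0)}(3) - x^{(0)}(2) \;\;\; \cdots \;\;\; x^{(0)}(L) - x^{(0)}(L-1)\right]^{\mathbf{T}}$$

Accordingly, the $\Omega$ elements of the general PseAAC vector are given by

$$\psi_{3j-2} = w_1 f_j\, a_1^{j}, \qquad \psi_{3j-1} = w_2 f_j\, a_2^{j}, \qquad \psi_{3j} = w_3 f_j\, b^{j} \qquad (j = 1, 2, \ldots, 20)$$

where $f_j$ $(j = 1, 2, \ldots, 20)$ are the occurrence frequencies of the 20 different types of amino acids in the protein sample concerned, and $w_1$, $w_2$, and $w_3$ are the weight factors determined by optimizing the performance of the predictor; their concrete values are given in the footnote of Table 1. Substituting these elements into the general PseAAC vector, we immediately obtain a feature vector with $\Omega = 3 \times 20 = 60$ components. The 60D feature vector thus derived is used to represent the samples of protein sequences in the rest of this study.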
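The feature construction described above can be sketched in a few lines of pure Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it assumes the usual GM(2,1) form x0(k) − x0(k−1) + a1·x0(k) + a2·z1(k) = b fitted by ordinary least squares, orders the 60 components column by column as (a1, a2, b) triples, and sets all three weight factors to 1.0 as placeholders. The function names `sigmoid`, `solve_3x3`, `gm21_coefficients`, and `grey_pseaac` are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # alphabetical single-letter codes


def sigmoid(score):
    """Map a raw PSSM score into the range 0-1."""
    return 1.0 / (1.0 + math.exp(-score))


def solve_3x3(m, v):
    """Solve the 3x3 linear system m x = v by Gaussian elimination with pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        s = a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = s / a[r][r]
    return x


def gm21_coefficients(series):
    """Least-squares estimate of (a1, a2, b) for one series (assumed GM(2,1) form)."""
    x1, total = [], 0.0
    for value in series:  # accumulated series x1(k)
        total += value
        x1.append(total)
    rows, ys = [], []
    for k in range(1, len(series)):
        z = 0.5 * (x1[k] + x1[k - 1])          # background value z1(k)
        rows.append((-series[k], -z, 1.0))     # one row of B
        ys.append(series[k] - series[k - 1])   # one entry of Y
    # Normal equations (B^T B) a = B^T Y
    btb = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    bty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve_3x3(btb, bty)


def grey_pseaac(sequence, pssm, weights=(1.0, 1.0, 1.0)):
    """Build a 60D vector from a sequence and its L x 20 PSSM (weights are placeholders)."""
    features = []
    for j, aa in enumerate(AMINO_ACIDS):
        freq = sequence.count(aa) / len(sequence)       # occurrence frequency f_j
        column = [sigmoid(row[j]) for row in pssm]      # j-th column after sigmoid
        a1, a2, b = gm21_coefficients(column)
        features.extend((weights[0] * freq * a1,
                         weights[1] * freq * a2,
                         weights[2] * freq * b))
    return features
```

Solving the 3×3 normal equations directly keeps the sketch dependency-free; for long or nearly constant columns a QR-based least-squares routine from a numerical library would be the more robust choice.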
Table 1

A comparison between iSMP-Grey and K-MID by the jackknife test.

Predictor      Sn (%)    Sp (%)    Acc (%)    MCC
iSMP-Grey^a    93.25     96.43     94.84      0.90
K-MID^b        81.75     99.60     90.67      0.83

^a The parameters used: $w_1 = $ …, $w_2 = $ …, and $w_3 = $ … for the weight factors of the feature vector; $C = $ … and $\gamma = $ … for the LIBSVM operation engine.

^b From ref. [4].

3. The SVM Operation Engine

In this study, the Support Vector Machine (SVM) algorithm was adopted to perform the prediction. The SVM implementation was taken from the LIBSVM package [60], which provides a simple interface through which users can easily perform classification by properly selecting the built-in parameters $C$ and $\gamma$. We searched for the optimal values of $C$ and $\gamma$ with the grid search tool built into the LIBSVM software; these values are also listed in the footnote of Table 1. The classifier itself was developed in the MATLAB environment. The predictor thus established is called iSMP-Grey, which can be used to identify whether a protein of malaria parasite is secretory or non-secretory according to its sequence information alone.
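The idea of a grid search over the two SVM parameters can be illustrated with a short, library-free sketch. It is not the LIBSVM tool itself: the function `grid_search`, the exponent ranges, and the scoring function `toy_score` are hypothetical stand-ins, and in practice the score would be a cross-validated accuracy returned by the SVM for each (C, γ) pair.

```python
import itertools
import math


def grid_search(score, log2_c_values, log2_gamma_values):
    """Return (best_score, C, gamma) over a log2-spaced parameter grid.

    `score` is any callable (C, gamma) -> float where higher is better,
    for example a cross-validated accuracy.
    """
    best = None
    for log2_c, log2_g in itertools.product(log2_c_values, log2_gamma_values):
        c, gamma = 2.0 ** log2_c, 2.0 ** log2_g
        result = score(c, gamma)
        if best is None or result > best[0]:
            best = (result, c, gamma)
    return best


def toy_score(c, gamma):
    # Hypothetical stand-in for a cross-validated accuracy;
    # by construction it peaks at C = 8 and gamma = 0.5.
    return -((math.log2(c) - 3) ** 2 + (math.log2(gamma) + 1) ** 2)


# Illustrative exponent ranges (assumptions, not values from the paper).
best_score, best_c, best_gamma = grid_search(
    toy_score, range(-5, 16, 2), range(3, -16, -2)
)
print(best_c, best_gamma)
```

Searching on a logarithmic grid is the usual choice because SVM performance tends to vary over orders of magnitude of C and γ rather than over small linear steps.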

4. Web-Server and User Guide

To enhance the value of its practical applications, a web-server for iSMP-Grey was established. Moreover, for the convenience of the vast majority of experimental scientists, here let us provide a step-by-step guide to show how the users can easily get the desired result by means of the web-server without the need to follow the above mathematical equations for its development and integrity.

Step 1

Open the web server at the site http://www.jci-bioinfo.cn/iSMP-Grey and you will see the top page of the predictor on your computer screen, as shown in Figure 1. Click on the Read Me button to see a brief introduction to the iSMP-Grey predictor and the caveats to observe when using it.
Figure 1

A semi-screenshot to show the top page of the iSMP-Grey web-server.

Its web-site address is at http://www.jci-bioinfo.cn/iSMP-Grey.

Step 2

Either type or copy and paste the query protein sequence into the input box at the center of Figure 1. The input sequence should be in the FASTA format. A sequence in FASTA format consists of a single initial line beginning with a greater-than symbol (“>”) in the first column, followed by lines of sequence data. The words right after the “>” symbol in the single initial line are optional and only used for the purpose of identification and description. The sequence ends if another line starting with a “>” appears; this indicates the start of another sequence. Example sequences in FASTA format can be seen by clicking on the Example button right above the input box. The maximum number of query protein sequences allowed for each submission is 10.

Step 3

Click on the Submit button to see the predicted result. For example, if you use the two query protein sequences in the Example window as the input, about 2–3 minutes after clicking the Submit button you will see on your screen that the 1st query protein is a “Secretory Protein of Malaria Parasite”, and that the 2nd query protein is a “Non-Secretory Protein of Malaria Parasite”. Both results are fully consistent with the experimental observations.

Step 4

Click on the Citation button to find the relevant paper that documents the detailed development and algorithm of iSMP-Grey.

Step 5

Click on the Data button to download the benchmark dataset used to train and test the iSMP-Grey predictor.

Step 6

The program is also available by clicking the Download button on the lower panel of Figure 1.
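Before submitting, it can be useful to check locally that the input obeys the FASTA conventions and the 10-sequence limit described in Step 2. The following is a minimal sketch of such a check; the function name `parse_fasta` and its error messages are hypothetical and are not part of the web-server.

```python
def parse_fasta(text, max_records=10):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new ">" line closes the previous record and opens the next one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    if len(records) > max_records:
        raise ValueError(f"at most {max_records} sequences per submission")
    return records


example = """>query_1 optional description
MKVLAAGIVG
LLLAQ
>query_2
MSTNPKPQRK
"""
print(parse_fasta(example))
```

The header text after “>” is kept only as a label, matching the statement above that it is optional and used purely for identification.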

5. Performance Evaluation

In statistical prediction, the following three cross-validation methods are often used to examine a predictor for its effectiveness in practical application: the independent dataset test, the subsampling (K-fold cross-validation) test, and the jackknife test. However, as elaborated in a recent review [34] and demonstrated by Eqs.28–32 therein, among the three methods the jackknife test is deemed the least arbitrary and most objective because it always yields a unique result for a given benchmark dataset; hence it has been widely recognized and increasingly used by investigators for examining the accuracy of various predictors (see, e.g., [36], [38], [39], [41], [44], [47], [61], [62], [63], [64], [65], [66]). Accordingly, the jackknife test was also adopted in this study to examine the anticipated success rates of the current predictor.

Also, to measure the prediction quality in a more intuitive and easier-to-understand way, the rates of correct predictions for the secretory proteins of malaria parasite in the positive subset $\mathbb{S}^{+}$ and the non-secretory proteins of malaria parasite in the negative subset $\mathbb{S}^{-}$ are respectively defined by [67]

$$\Lambda^{+} = \frac{N^{+} - N^{+}_{-}}{N^{+}}, \qquad \Lambda^{-} = \frac{N^{-} - N^{-}_{+}}{N^{-}}$$

where $N^{+}$ is the total number of the secreted proteins investigated and $N^{+}_{-}$ the number of the secreted proteins missed in the predicted result; $N^{-}$ is the total number of the non-secreted proteins investigated and $N^{-}_{+}$ the number of the non-secreted proteins missed in the predicted result. The overall success prediction rate is given by [68]

$$\Lambda = \frac{\Lambda^{+} N^{+} + \Lambda^{-} N^{-}}{N^{+} + N^{-}} = 1 - \frac{N^{+}_{-} + N^{-}_{+}}{N^{+} + N^{-}}$$

It is clear from these definitions that, if and only if none of the secreted proteins and non-secreted proteins are mispredicted, i.e., $N^{+}_{-} = N^{-}_{+} = 0$ and $\Lambda^{+} = \Lambda^{-} = 1$, we have the overall success rate $\Lambda = 1$. Otherwise, the overall success rate is smaller than 1.
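The jackknife (leave-one-out) procedure itself is simple to express in code: each sample is in turn held out, the predictor is trained on the remaining samples, and the held-out sample is predicted. The sketch below is a hedged illustration only. A nearest-centroid rule is swapped in for the SVM purely to keep the example self-contained, and the names `nearest_centroid_predict` and `jackknife` as well as the toy feature vectors are hypothetical.

```python
def nearest_centroid_predict(train, query):
    """Stand-in classifier (not the SVM): label of the closest class centroid."""
    sums, counts = {}, {}
    for vector, label in train:
        acc = sums.setdefault(label, [0.0] * len(vector))
        for i, value in enumerate(vector):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1

    def distance(label):
        return sum((s / counts[label] - q) ** 2
                   for s, q in zip(sums[label], query))

    return min(sorted(sums), key=distance)


def jackknife(samples, predict=nearest_centroid_predict):
    """Leave-one-out test: number of mispredicted samples per true label."""
    missed = {}
    for i, (vector, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]   # every sample except the i-th
        missed.setdefault(label, 0)
        if predict(train, vector) != label:
            missed[label] += 1
    return missed


# Toy 2D feature vectors; "+" stands for secretory, "-" for non-secretory.
toy = [([1.0, 0.9], "+"), ([0.9, 1.1], "+"), ([1.1, 1.0], "+"),
       ([0.0, 0.1], "-"), ([0.1, -0.1], "-"), ([-0.1, 0.0], "-")]
print(jackknife(toy))
```

Because no random partition is involved, repeated runs on the same dataset return the same counts, which is the uniqueness property the text refers to.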
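It is instructive to point out that the following set of equations is often used in the literature for examining the performance quality of a predictor

$$\mathrm{Sn} = \frac{TP}{TP + FN}, \qquad \mathrm{Sp} = \frac{TN}{TN + FP}, \qquad \mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

where TP represents the true positives; TN, the true negatives; FP, the false positives; FN, the false negatives; Sn, the sensitivity; Sp, the specificity; Acc, the accuracy; MCC, the Matthews correlation coefficient. Let $N^{+}$ and $N^{-}$ be the total numbers of secreted and non-secreted proteins investigated, $N^{+}_{-}$ the number of secreted proteins incorrectly predicted as non-secreted, and $N^{-}_{+}$ the number of non-secreted proteins incorrectly predicted as secreted. The relations between these symbols and those in the above equations are given by

$$TP = N^{+} - N^{+}_{-}, \qquad TN = N^{-} - N^{-}_{+}, \qquad FP = N^{-}_{+}, \qquad FN = N^{+}_{-}$$

Substituting these relations into the above equations, we obtain

$$\mathrm{Sn} = 1 - \frac{N^{+}_{-}}{N^{+}}, \qquad \mathrm{Sp} = 1 - \frac{N^{-}_{+}}{N^{-}}, \qquad \mathrm{Acc} = 1 - \frac{N^{+}_{-} + N^{-}_{+}}{N^{+} + N^{-}}$$

$$\mathrm{MCC} = \frac{1 - \left(\dfrac{N^{+}_{-}}{N^{+}} + \dfrac{N^{-}_{+}}{N^{-}}\right)}{\sqrt{\left(1 + \dfrac{N^{-}_{+} - N^{+}_{-}}{N^{+}}\right)\left(1 + \dfrac{N^{+}_{-} - N^{-}_{+}}{N^{-}}\right)}}$$

As can be seen from these equations, when $N^{+}_{-} = 0$, meaning none of the secreted proteins was missed in prediction, we have the sensitivity $\mathrm{Sn} = 1$; when $N^{+}_{-} = N^{+}$, meaning all the secreted proteins were missed in prediction, we have $\mathrm{Sn} = 0$. Likewise, when $N^{-}_{+} = 0$, meaning none of the non-secreted proteins was incorrectly predicted as a secreted protein, we have the specificity $\mathrm{Sp} = 1$; when $N^{-}_{+} = N^{-}$, meaning all the non-secreted proteins were incorrectly predicted as secreted proteins, we have $\mathrm{Sp} = 0$. When $N^{+}_{-} = N^{-}_{+} = 0$, meaning that none of the secreted proteins and none of the non-secreted proteins was incorrectly predicted, we have the overall accuracy $\mathrm{Acc} = 1$; when $N^{+}_{-} = N^{+}$ and $N^{-}_{+} = N^{-}$, meaning that all the secreted proteins and all the non-secreted proteins were incorrectly predicted, we have $\mathrm{Acc} = 0$. The MCC is usually used for measuring the quality of binary (two-class) classifications. When $N^{+}_{-} = N^{-}_{+} = 0$, we have $\mathrm{MCC} = 1$; when $N^{+}_{-} = N^{+}/2$ and $N^{-}_{+} = N^{-}/2$, we have $\mathrm{MCC} = 0$, meaning no better than random prediction; when $N^{+}_{-} = N^{+}$ and $N^{-}_{+} = N^{-}$, we have $\mathrm{MCC} = -1$, meaning total disagreement between prediction and observation. As this discussion shows, the formulation in terms of $N^{+}$, $N^{+}_{-}$, $N^{-}$, and $N^{-}_{+}$ is much more intuitive and easier to understand when examining a predictor for its sensitivity, specificity, overall accuracy, and Matthews correlation coefficient.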
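The four metrics can be computed directly from the numbers of investigated and mispredicted proteins in each class. The sketch below is illustrative; the function name `chou_metrics` is hypothetical, and the counts used in the example (17 of 252 secretory and 9 of 252 non-secretory proteins mispredicted for iSMP-Grey; 46 and 1 for K-MID) are not stated in the paper but inferred here from the percentages reported for the jackknife test, so they should be read as assumptions.

```python
import math


def chou_metrics(n_pos, n_pos_missed, n_neg, n_neg_missed):
    """Return (Sn, Sp, Acc, MCC) from class sizes and misprediction counts."""
    sn = 1 - n_pos_missed / n_pos
    sp = 1 - n_neg_missed / n_neg
    acc = 1 - (n_pos_missed + n_neg_missed) / (n_pos + n_neg)
    numerator = 1 - (n_pos_missed / n_pos + n_neg_missed / n_neg)
    denominator = math.sqrt(
        (1 + (n_neg_missed - n_pos_missed) / n_pos)
        * (1 + (n_pos_missed - n_neg_missed) / n_neg)
    )
    # MCC is undefined when every sample is assigned to one class;
    # returning 0.0 in that case is a convention chosen for this sketch.
    mcc = numerator / denominator if denominator else 0.0
    return sn, sp, acc, mcc


# Counts inferred from the reported percentages (assumptions, see above).
for name, counts in (("iSMP-Grey", (252, 17, 252, 9)),
                     ("K-MID", (252, 46, 252, 1))):
    sn, sp, acc, mcc = chou_metrics(*counts)
    print(name, round(100 * sn, 2), round(100 * sp, 2),
          round(100 * acc, 2), round(mcc, 2))
```

Working from misprediction counts rather than from TP, TN, FP, and FN makes the limiting cases discussed above easy to check by hand.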

Results and Discussion

The results obtained with iSMP-Grey on the benchmark dataset $\mathbb{S}$ by the jackknife test are given in Table 1, where, to facilitate comparison, the results obtained by the K-MID predictor [4] on the same benchmark dataset with the same test method are also given. As we can see from Table 1, the overall success rate of iSMP-Grey was 94.84% with MCC = 0.90, both remarkably higher than those of the K-MID predictor [4]. Moreover, a comparison was also made with the PSEApred predictor [2]. Although the results for PSEApred reported by Verma et al. [2] were also based on the same benchmark dataset $\mathbb{S}$, the test method used by these authors was 5-fold cross-validation. As elaborated in [34], such a test does not yield a unique result, as demonstrated below. For the current case, $\mathbb{S}$ consists of $\mathbb{S}^{+}$ and $\mathbb{S}^{-}$, where $\mathbb{S}^{+}$ contains 252 secretory proteins of malaria parasite and $\mathbb{S}^{-}$ contains 252 non-secretory proteins of malaria parasite. Substituting these data into Eqs.28–29 of [34] with $M = 2$ (number of groups for classification) and $K = 5$ (number of folds for cross-validation), we obtain

$$\Pi = \prod_{\mu=1}^{M} \frac{N_{\mu}!}{\left[N_{\mu} - \mathrm{Int}(N_{\mu}/K)\right]!\,\left[\mathrm{Int}(N_{\mu}/K)\right]!} = \left[\frac{252!}{(252 - 50)!\;50!}\right]^{2} > 10^{106}$$

where the symbol Int is the integer-truncating operator, meaning to take the integer part of the number in the bracket right after it, and $N_{\mu} = 252$ for both subsets. This result indicates that the number of possible combinations of taking one-fifth of the proteins from each of the two subsets, $\mathbb{S}^{+}$ and $\mathbb{S}^{-}$, for conducting the 5-fold cross-validation is greater than $10^{106}$, an astronomical figure, far too large for exhaustive testing to be practically feasible. In their study [2], Verma et al. randomly picked only 100 different combinations from these possible combinations to perform the 5-fold cross-validation, yielding 100 different results located within a certain region. Therefore, in their report, a range of values rather than a single figure was used to show their test result.
For example, according to their report (Table 2), Acc = 71.03%∼92.66%, meaning that the lowest of the 100 overall success rates obtained by the PSEApred predictor [2] was 71.03%, while the highest was 92.66%. To compare iSMP-Grey with PSEApred [2] under the same conditions and with the same test method, we also randomly picked 100 different combinations, as done by Verma et al. [2], to perform the 5-fold cross-validation test with iSMP-Grey; the corresponding results are given in Table 2 as well. As we can see from the table, the ranges obtained by the iSMP-Grey predictor not only lie remarkably higher than those of the PSEApred predictor [2] but are also significantly narrower, indicating that the success rates of iSMP-Grey are both higher and more stable than those of PSEApred [2].
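The size of the number of possible 5-fold selections discussed above can be checked with exact integer arithmetic. This is a small sketch under the assumption that the count is the product, over the two subsets, of the number of ways to choose Int(N/K) proteins out of N; the function name `kfold_selection_count` is hypothetical.

```python
import math


def kfold_selection_count(subset_sizes, k):
    """Number of ways to draw Int(N/k) test samples from every subset."""
    total = 1
    for n in subset_sizes:
        total *= math.comb(n, n // k)  # n // k is Int(N/k) for positive integers
    return total


count = kfold_selection_count([252, 252], 5)
print(len(str(count)))  # number of decimal digits in the count
```

Python integers are arbitrary precision, so the count is exact; comparing it with powers of ten gives its order of magnitude without any floating-point rounding.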
Table 2

A comparison between iSMP-Grey and PSEApred by 5-fold cross-validation test.

Predictor      Sn (%)^c        Sp (%)^c        Acc (%)^c       MCC^c
iSMP-Grey^a    90.48∼92.46     94.05∼98.02     92.86∼94.84     0.87∼0.90
PSEApred^b     73.41∼97.22     44.84∼100       71.03∼92.66     0.49∼0.86

^a See footnote a of Table 1.

^b From ref. [2].

^c See the discussion in the text for why the results obtained by the 5-fold cross-validation test were not unique.

All the above results indicate that the novel pseudo amino acid composition formulated via the grey system model GM(2,1) can incorporate the protein sequence evolution information more effectively, and thereby remarkably enhance the success rates of the iSMP-Grey predictor in identifying the secretory proteins of malaria parasite. It is anticipated that iSMP-Grey may become a useful high throughput tool for both basic research and drug development in the relevant areas.

Supporting Information S1. The benchmark dataset includes 504 proteins, classified into 252 secretory proteins of malaria parasite and 252 non-secretory proteins. (PDF)
  65 in total

1.  T and B-cell epitopes prediction of Iranian saffron (Crocus sativus) profilin by bioinformatics tools.

Authors:  Babak Saffari; Hassan Mohabatkar; Sasan Mohsenzadeh
Journal:  Protein Pept Lett       Date:  2008       Impact factor: 1.890

2.  DNA-Prot: identification of DNA binding proteins from protein sequence information using random forest.

Authors:  K Krishna Kumar; Ganesan Pugalenthi; P N Suganthan
Journal:  J Biomol Struct Dyn       Date:  2009-06

3.  Predicting protein structural classes with pseudo amino acid composition: an approach using geometric moments of cellular automaton image.

Authors:  Xuan Xiao; Pu Wang; Kuo-Chen Chou
Journal:  J Theor Biol       Date:  2008-06-24       Impact factor: 2.691

4.  Predicting protein fold pattern with functional domain and sequential evolution information.

Authors:  Hong-Bin Shen; Kuo-Chen Chou
Journal:  J Theor Biol       Date:  2008-10-19       Impact factor: 2.691

5.  ProtIdent: a web server for identifying proteases and their types by fusing functional domain and sequential evolution information.

Authors:  Kuo-Chen Chou; Hong-Bin Shen
Journal:  Biochem Biophys Res Commun       Date:  2008-09-05       Impact factor: 3.575

6.  Use of fuzzy clustering technique and matrices to classify amino acids and its impact to Chou's pseudo amino acid composition.

Authors:  D N Georgiou; T E Karakasidis; J J Nieto; A Torres
Journal:  J Theor Biol       Date:  2008-11-12       Impact factor: 2.691

7.  Real value prediction of protein solvent accessibility using enhanced PSSM features.

Authors:  Darby Tien-Hao Chang; Hsuan-Yu Huang; Yu-Tang Syu; Chih-Peng Wu
Journal:  BMC Bioinformatics       Date:  2008-12-12       Impact factor: 3.169

8.  Genetic programming for creating Chou's pseudo amino acid based features for submitochondria localization.

Authors:  Loris Nanni; Alessandra Lumini
Journal:  Amino Acids       Date:  2008-01-04       Impact factor: 3.520

9.  Identification of proteins secreted by malaria parasite into erythrocyte using SVM and PSSM profiles.

Authors:  Ruchi Verma; Ajit Tiwari; Sukhwinder Kaur; Grish C Varshney; Gajendra Ps Raghava
Journal:  BMC Bioinformatics       Date:  2008-04-16       Impact factor: 3.169

Review 10.  Heterologous expression of plasmodial proteins for structural studies and functional annotation.

Authors:  Lyn-Marie Birkholtz; Gregory Blatch; Theresa L Coetzer; Heinrich C Hoppe; Esmaré Human; Elizabeth J Morris; Zoleka Ngcete; Lyndon Oldfield; Robyn Roth; Addmore Shonhai; Linda Stephens; Abraham I Louw
Journal:  Malar J       Date:  2008-10-01       Impact factor: 2.979
