iNitro-Tyr: prediction of nitrotyrosine sites in proteins with general pseudo amino acid composition.

Yan Xu¹, Xin Wen¹, Li-Shu Wen², Ling-Yun Wu³, Nai-Yang Deng⁴, Kuo-Chen Chou⁵.

Abstract

Nitrotyrosine is one of the post-translational modifications (PTMs) in proteins that occurs when their tyrosine residue is nitrated. Compared with healthy people, a remarkably increased level of nitrotyrosine is detected in those suffering from rheumatoid arthritis, septic shock, and coeliac disease. Given an uncharacterized protein sequence that contains many tyrosine residues, which of them can be nitrated and which cannot? This is a challenging problem, directly related not only to an in-depth understanding of the PTM's mechanism but also to nitrotyrosine-based drug development. Particularly, with the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to develop a high-throughput tool in this regard. Here, a new predictor called "iNitro-Tyr" was developed by incorporating the position-specific dipeptide propensity into the general pseudo amino acid composition for discriminating the nitrotyrosine sites from non-nitrotyrosine sites in proteins. It was demonstrated via rigorous jackknife tests that the new predictor not only yields a higher success rate but also is much more stable and less noisy. A web-server for iNitro-Tyr is accessible to the public at http://app.aporc.org/iNitro-Tyr/. For the convenience of most experimental scientists, we have further provided a step-by-step guide, by which users can easily get their desired results without the need to follow the complicated mathematics that were presented in this paper just for the integrity of its development process. It has not escaped our notice that the approach presented here can also be used to deal with other PTM sites in proteins.

Year: 2014 PMID: 25121969 PMCID: PMC4133382 DOI: 10.1371/journal.pone.0105018

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

As one of the post-translational modifications (PTMs) of proteins, nitrotyrosine is a product of tyrosine nitration mediated by reactive nitrogen species such as peroxynitrite anion and nitrogen dioxide (Fig. 1). Compared with the fluids from healthy people, a remarkably increased level of nitrotyrosine is detected in those suffering from rheumatoid arthritis, septic shock, and coeliac disease. Accordingly, knowledge of nitrotyrosine sites in proteins is very useful for both basic research and drug development. Although conventional experimental methods did provide useful insight into the biological roles of tyrosine nitration [1]–[3], it is time-consuming and expensive to determine the nitrotyrosine sites based on the experimental approach alone. Particularly, identification of endogenous 3-NTyr modifications remains largely elusive (see, e.g., [4]–[7]). With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to develop computational methods for identifying the nitrotyrosine sites in proteins. The present study was initiated in an attempt to propose a new method for identifying the nitrotyrosine sites in proteins, in the hope that it can play a complementary role to the existing methods in this area.

Figure 1

A schematic drawing to show protein nitrotyrosine.

As summarized in [8] and demonstrated in a series of recent publications [9]–[21], to establish a really useful statistical predictor for a biological system, we need to consider the following procedures: (i) construct or select a valid benchmark dataset to train and test the predictor; (ii) formulate the biological samples with an effective mathematical expression that can truly capture their essence and intrinsic correlation with the target to be predicted; (iii) introduce or develop a powerful algorithm (or engine) to operate the prediction; (iv) properly perform cross-validation tests to objectively evaluate the anticipated accuracy; (v) establish a user-friendly web-server that is accessible to the public. Below, let us describe how to deal with these steps one by one.

Materials and Methods

1. Benchmark Dataset

To develop a statistical predictor, it is fundamentally important to establish a reliable and stringent benchmark dataset to train and test the predictor. If the benchmark dataset contains some errors, the predictor trained by it must be unreliable and the accuracy tested by it would be completely meaningless. For facilitating description later, let us adopt here Chou's peptide formulation that was used for studying HIV protease cleavage sites [22], [23], specificity of GalNAc-transferase [24], and signal peptide cleavage sites [25]. According to Chou's scheme, a potential nitrotyrosine peptide, i.e., a peptide with Tyr (namely Y) located at its center (Fig. 2), can be expressed as

$$P_\xi(\mathrm{Y}) = R_{-\xi} R_{-(\xi-1)} \cdots R_{-2} R_{-1}\, \mathrm{Y}\, R_{+1} R_{+2} \cdots R_{+(\xi-1)} R_{+\xi} \qquad (1)$$

where the subscript $\xi$ is an integer, $R_{-\xi}$ represents the $\xi$-th upstream amino acid residue from the center, $R_{+\xi}$ the $\xi$-th downstream amino acid residue, and so forth. A peptide $P_\xi(\mathrm{Y})$ can be further classified into the following categories:

$$P_\xi(\mathrm{Y}) \in \begin{cases} P_\xi^{+}(\mathrm{Y}), & \text{if its center is a nitrotyrosine site} \\ P_\xi^{-}(\mathrm{Y}), & \text{otherwise} \end{cases} \qquad (2)$$

where $P_\xi^{+}(\mathrm{Y})$ represents a true nitrotyrosine peptide, $P_\xi^{-}(\mathrm{Y})$ a false nitrotyrosine peptide, and $\in$ represents "a member of" in the set theory.

Figure 2

An illustration to show Chou's scheme for a peptide of $(2\xi + 1)$ residues with tyrosine (Y) at the center.

Adapted from Chou [55], [76] with permission.

As pointed out by a comprehensive review [26], there is no need to separate a benchmark dataset into a training dataset and a testing dataset for examining the performance of a prediction method if it is tested by the jackknife test or subsampling (K-fold) cross-validation test. Thus, the benchmark dataset for the current study can be formulated as

$$S_\xi(\mathrm{Y}) = S_\xi^{+}(\mathrm{Y}) \cup S_\xi^{-}(\mathrm{Y}) \qquad (3)$$

where $S_\xi^{+}(\mathrm{Y})$ only contains the samples of $P_\xi^{+}(\mathrm{Y})$, i.e., the nitrotyrosine peptides; $S_\xi^{-}(\mathrm{Y})$ only contains the samples of $P_\xi^{-}(\mathrm{Y})$, i.e., the non-nitrotyrosine peptides (cf. Eq. 2); and $\cup$ represents the symbol for "union" in the set theory. Since the length of the peptide $P_\xi(\mathrm{Y})$ is $2\xi + 1$ (Eq. 1), the benchmark datasets with different values of $\xi$ will contain peptides of different numbers of amino acid residues (Eq. 4); for instance, 19 residues when $\xi = 9$.

The detailed procedures to construct $S_\xi(\mathrm{Y})$ are as follows. (i) Its elements were derived based on the same 546 source proteins used in [27] that contain 1,044 nitrotyrosine sites (see columns 1 and 2 of Supporting Information S1). (ii) Slide a flexible window of $(2\xi + 1)$ amino acids (Fig. 3) along each of the 546 protein sequences taken from the Uni-Prot database (version 2014_01). (iii) Collect only those peptide segments with Y (tyrosine) at the center. (iv) If the number of upstream or downstream residues in a protein was less than $\xi$, each lacking residue was filled with a dummy residue "X" [28]. (v) The peptide samples thus obtained were put into the positive subset $S_\xi^{+}(\mathrm{Y})$ if their centers have been experimentally confirmed as nitrotyrosine sites; otherwise, into the negative subset $S_\xi^{-}(\mathrm{Y})$.

Figure 3

Illustration to show the peptide segment highlighted by sliding the scaled window $[-\xi, +\xi]$ along a protein sequence.

During the sliding process, the scales on the window are aligned with different amino acids so as to define different peptide segments. When, and only when, the scale 0 is aligned with Y (tyrosine), is the peptide segment seen within the window regarded as a potential nitrotyrosine peptide. Adapted from Chou [55], [77] with permission.

By following the aforementioned procedures, five such benchmark datasets, each corresponding to a different value of $\xi$, were constructed. Each of these datasets contained 1,044 nitrotyrosine peptides and 7,669 non-nitrotyrosine peptides. Note that the sample numbers thus obtained differ slightly from those in [27]. This is because some proteins originally used in [27] have been removed or replaced in the updated version of the Uni-Prot database. It was observed via preliminary trials that when $\xi = 9$, i.e., the peptide samples concerned were formed by 19 residues, the corresponding results were most promising (see Fig. 4 and Fig. 5). Accordingly, we chose $S_9(\mathrm{Y})$ as the benchmark dataset for further investigation. Thus, Eq. 3 can be reduced to

$$S_9(\mathrm{Y}) = S_9^{+}(\mathrm{Y}) \cup S_9^{-}(\mathrm{Y}) \qquad (5)$$

where $S_9^{+}(\mathrm{Y})$ contains 1,044 nitrotyrosine peptide samples and $S_9^{-}(\mathrm{Y})$ contains 7,669 non-nitrotyrosine peptide samples. The detailed 19-tuple peptide sequences and their positions in proteins are given in Supporting Information S1.
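The window-sliding procedure of steps (ii) to (v) can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function names, the 1-based site numbering, and the toy sequence in the usage note are assumptions made here.

```python
def extract_tyr_peptides(sequence, xi=9, pad="X"):
    """Return (position, peptide) pairs for every Y in `sequence`.

    `position` is 1-based. Each peptide has 2*xi + 1 residues with Y at
    its center; missing flanking residues are filled with the dummy `pad`.
    """
    padded = pad * xi + sequence + pad * xi
    peptides = []
    for i, residue in enumerate(sequence):
        if residue == "Y":
            # Residue i of the sequence sits at index i + xi of `padded`,
            # so the slice below is centered on it.
            peptides.append((i + 1, padded[i:i + 2 * xi + 1]))
    return peptides


def split_by_label(sequence, nitrated_positions, xi=9):
    """Split the Y-centered peptides into positive and negative subsets.

    `nitrated_positions` is a set of 1-based positions of experimentally
    confirmed nitrotyrosine sites (hypothetical input format).
    """
    positive, negative = [], []
    for position, peptide in extract_tyr_peptides(sequence, xi):
        if position in nitrated_positions:
            positive.append(peptide)
        else:
            negative.append(peptide)
    return positive, negative
```

For the toy sequence `"MYAAY"` with `xi=2`, `extract_tyr_peptides` returns `[(2, "XMYAA"), (5, "AAYXX")]`; with the default `xi=9` every peptide has 19 residues.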

Figure 4

A sequence logo plot to show the difference between the positive and negative peptides.

The window's size is 19 when $\xi = 9$. See Eq. 1 and the legend of Fig. 3 for further explanation.

Figure 5

A plot to show the different ROC curves obtained by the 10-fold cross-validation under different $\xi$ values.

As we can see, when $\xi = 9$, the corresponding AUC (i.e., the area under its curve) is the largest, meaning the most promising compared with the other values of $\xi$.

2. Feature Vector and Pseudo Amino Acid Composition

One of the most important but also most difficult problems in computational biology today is how to effectively formulate a biological sequence with a discrete model or a vector, yet still keep considerable sequence order information. This is because all the existing operation engines, such as correlation angle approach [29], covariance discriminant [30], neural network [31], support vector machine (SVM) [32], random forest [33], conditional random field [28], K-nearest neighbor (KNN) [34], OET-KNN [35], Fuzzy K-nearest neighbor [36], ML-KNN algorithm [37], and SLLE algorithm [30], can only handle vector but not sequence samples. However, a vector defined in a discrete model may totally miss the sequence-order information. To deal with such a dilemma, the approach of pseudo amino acid composition [38] or Chou's PseAAC [39] was proposed. Ever since it was introduced in 2001 [38], the concept of PseAAC has rapidly penetrated into almost all the areas of computational proteomics, such as identifying bacterial virulent proteins [40], predicting anticancer peptides [41], predicting protein subcellular location [42], predicting membrane protein types [43], analyzing genetic sequence [44], predicting GABA(A) receptor proteins [45], identifying antibacterial peptides [46], identifying allergenic proteins [47], predicting metalloproteinase family [48], identifying GPCRs and their types [49], identifying protein quaternary structural attributes [50], among many others (see a long list of references cited in a 2014 article [51]). Recently, the concept of PseAAC was further extended to represent the feature vectors of DNA and nucleotides [9], as well as other biological samples (see, e.g., [52]). Because it has been widely and increasingly used, three types of powerful open access software, called 'PseAAC-Builder' [53], 'propy' [54], and 'PseAAC-General' [51], were recently established: the former two are for generating various modes of Chou's special PseAAC, while the third is for those of Chou's general PseAAC.

According to a comprehensive review [8], PseAAC can be generally formulated as

$$\mathbf{P} = [\Psi_1\ \Psi_2\ \cdots\ \Psi_u\ \cdots\ \Psi_\Omega]^{\mathbf{T}} \qquad (6)$$

where $\mathbf{T}$ is the transpose operator, while $\Omega$ is an integer reflecting the vector's dimension. The value of $\Omega$ as well as the components $\Psi_u$ $(u = 1, 2, \ldots, \Omega)$ in Eq. 6 will depend on how to extract the desired information from a protein/peptide sequence. Below, let us describe how to extract the useful information from the benchmark datasets to define the peptide samples via Eq. 6.

For convenience in formulation, let us rewrite Eq. 1 as follows

$$\mathbf{P} = R_1 R_2 \cdots R_9 R_{10} R_{11} \cdots R_{18} R_{19} \qquad (7)$$

where $R_{10}$, the residue at the center of the peptide, is tyrosine (Y), and all the other residues can be any of the 20 native amino acids or the dummy code X as defined above. Hereafter, let us use the numerical codes 1, 2, 3, …, 20 to represent the 20 native amino acids according to the alphabetic order of their single letter codes, and use 21 to represent the dummy amino acid X. Accordingly, the number of possible different dipeptides will be $21 \times 21 = 441$, and the number of dipeptide subsite positions on the sequence of Eq. 7 will be $19 - 1 = 18$. Now, let us introduce a positive and a negative PSDP (position-specific dipeptide propensity) matrix, as given below

$$\mathbb{Z}^{+} = \left[ z^{+}_{i,j} \right]_{441 \times 18}, \qquad \mathbb{Z}^{-} = \left[ z^{-}_{i,j} \right]_{441 \times 18} \qquad (8)$$

where the element

$$z^{+}_{i,j} = F^{+}(D_i \mid j) \qquad (9)$$

and

$$z^{-}_{i,j} = F^{-}(D_i \mid j) \qquad (10)$$

In Eq. 9, $F^{+}(D_i \mid j)$ is the occurrence frequency of the $i$-th dipeptide $D_i$ $(i = 1, 2, \ldots, 441)$ at the $j$-th subsite $(j = 1, 2, \ldots, 18)$ on the sequence of Eq. 7 (or the $j$-th column in the positive subset dataset $S_9^{+}(\mathrm{Y})$) that can be easily derived using the method described in [55] from the sequences in the Supporting Information S1; while $F^{-}(D_i \mid j)$ is the corresponding occurrence frequency but derived from the negative subset dataset $S_9^{-}(\mathrm{Y})$. Thus, for the peptide sequence of Eq. 7, its attribute to the positive set or negative set can be formulated by an 18-D (dimension) vector $\mathbf{P}^{+}$ or $\mathbf{P}^{-}$, as defined by [23]

$$\mathbf{P}^{+} = [\phi^{+}_1\ \phi^{+}_2\ \cdots\ \phi^{+}_u\ \cdots\ \phi^{+}_{18}]^{\mathbf{T}}, \qquad \mathbf{P}^{-} = [\phi^{-}_1\ \phi^{-}_2\ \cdots\ \phi^{-}_u\ \cdots\ \phi^{-}_{18}]^{\mathbf{T}} \qquad (11)$$

where

$$\phi^{+}_u = \begin{cases} z^{+}_{1,u}, & \text{when } R_u R_{u+1} = \mathrm{AA} \\ z^{+}_{2,u}, & \text{when } R_u R_{u+1} = \mathrm{AC} \\ \quad\vdots & \\ z^{+}_{441,u}, & \text{when } R_u R_{u+1} = \mathrm{XX} \end{cases} \qquad (u = 1, 2, \ldots, 18) \qquad (12a)$$

$$\phi^{-}_u = \begin{cases} z^{-}_{1,u}, & \text{when } R_u R_{u+1} = \mathrm{AA} \\ z^{-}_{2,u}, & \text{when } R_u R_{u+1} = \mathrm{AC} \\ \quad\vdots & \\ z^{-}_{441,u}, & \text{when } R_u R_{u+1} = \mathrm{XX} \end{cases} \qquad (u = 1, 2, \ldots, 18) \qquad (12b)$$

and $R_u$ and $R_{u+1}$ represent the residues in the $u$-th and $(u+1)$-th positions of the peptide concerned.
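As an illustration of how a PSDP matrix and the corresponding feature vector might be computed, consider the Python sketch below. It is not the authors' implementation. It assumes that the occurrence frequency of a dipeptide at a subsite is its count divided by the number of peptides in the subset (the method of [55] is not reproduced here), it stores each matrix column as a sparse dictionary instead of a dense 441 × 18 array, and it numbers subsites from 0. The function names and the toy peptides in the usage note are hypothetical.

```python
from collections import Counter

# 20 native amino acids in alphabetical order of their single-letter
# codes, followed by the dummy residue X (code 21).
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"


def psdp_matrix(peptides):
    """Return one {dipeptide: frequency} dict per dipeptide subsite.

    All peptides must have the same length L; there are L - 1 subsites.
    Frequency is assumed to be count / number of peptides.
    """
    n_subsites = len(peptides[0]) - 1
    counts = [Counter() for _ in range(n_subsites)]
    for peptide in peptides:
        for j in range(n_subsites):
            counts[j][peptide[j:j + 2]] += 1
    total = len(peptides)
    return [{dp: c / total for dp, c in column.items()} for column in counts]


def feature_vector(peptide, matrix):
    """Look up the dipeptide at each subsite of `peptide` in `matrix`.

    Dipeptides never seen at a subsite get frequency 0.0.
    """
    return [matrix[j].get(peptide[j:j + 2], 0.0)
            for j in range(len(peptide) - 1)]
```

Calling `psdp_matrix` once on the positive subset and once on the negative subset gives the two matrices; applying `feature_vector` to a 19-residue peptide with each of them then gives its two 18-component vectors.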

3. Discriminant Function Approach

Now in the 18-D space, let us define an ideal nitrotyrosine peptide [22] and an ideal non-nitrotyrosine peptide (Eq. 13), whose components are the upper limits of the corresponding matrix elements in Eq. 12a and in Eq. 12b, respectively. Theoretically speaking, each of these hypothetical upper limits in Eq. 13 should be 1 [23]. Thus, the similarity score of the positive-set vector of a peptide with the ideal nitrotyrosine peptide, and that of its negative-set vector with the ideal non-nitrotyrosine peptide, can be defined (Eq. 14). Similar to the treatment in [23], let us define a discriminant function Δ (Eq. 15), in which an adjustable parameter is used to optimize the overall success rate when the positive and negative benchmark datasets are highly imbalanced in size. The peptide of Eq. 7 can then be identified as a nitrotyrosine or non-nitrotyrosine peptide according to the rule of Eq. 16. The predictor obtained via the above procedures is called iNitro-Tyr.

How to properly and objectively evaluate the anticipated accuracy of a new predictor and how to make it easily accessible and user-friendly are the two key issues that will have important impacts on its application value [56]. Below, let us address these problems.

Results and Discussion

1. Metrics for Scoring Prediction Quality

In the literature, the following four metrics are often used to score the quality of a predictor from four different angles

$$\mathrm{Sn} = \frac{TP}{TP + FN}, \quad \mathrm{Sp} = \frac{TN}{TN + FP}, \quad \mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \qquad (17)$$

where TP represents the number of true positives; TN, the number of true negatives; FP, the number of false positives; FN, the number of false negatives; Sn, the sensitivity; Sp, the specificity; Acc, the accuracy; MCC, the Matthews correlation coefficient. To most biologists, unfortunately, the four metrics as formulated in Eq. 17 are not quite intuitive and easy to understand, particularly the equation for MCC. Here let us adopt the formulation proposed recently in [9], [11], [28] based on the symbols introduced by Chou [25], [55] in predicting signal peptides. According to the formulation, the same four metrics can be expressed as

$$\mathrm{Sn} = 1 - \frac{N^{+}_{-}}{N^{+}}, \quad \mathrm{Sp} = 1 - \frac{N^{-}_{+}}{N^{-}}, \quad \mathrm{Acc} = 1 - \frac{N^{+}_{-} + N^{-}_{+}}{N^{+} + N^{-}}, \quad \mathrm{MCC} = \frac{1 - \left( \dfrac{N^{+}_{-}}{N^{+}} + \dfrac{N^{-}_{+}}{N^{-}} \right)}{\sqrt{\left( 1 + \dfrac{N^{-}_{+} - N^{+}_{-}}{N^{+}} \right) \left( 1 + \dfrac{N^{+}_{-} - N^{-}_{+}}{N^{-}} \right)}} \qquad (18)$$

where $N^{+}$ is the total number of the nitrotyrosine peptides investigated while $N^{+}_{-}$ is the number of the nitrotyrosine peptides incorrectly predicted as non-nitrotyrosine peptides; $N^{-}$ is the total number of the non-nitrotyrosine peptides investigated while $N^{-}_{+}$ is the number of the non-nitrotyrosine peptides incorrectly predicted as nitrotyrosine peptides [57].

Now, it is clear from Eq. 18 that when $N^{+}_{-} = 0$, meaning none of the nitrotyrosine peptides was incorrectly predicted to be a non-nitrotyrosine peptide, we have the sensitivity $\mathrm{Sn} = 1$. When $N^{+}_{-} = N^{+}$, meaning that all the nitrotyrosine peptides were incorrectly predicted as non-nitrotyrosine peptides, we have the sensitivity $\mathrm{Sn} = 0$. Likewise, when $N^{-}_{+} = 0$, meaning none of the non-nitrotyrosine peptides was incorrectly predicted to be a nitrotyrosine peptide, we have the specificity $\mathrm{Sp} = 1$; whereas when $N^{-}_{+} = N^{-}$, meaning all the non-nitrotyrosine peptides were incorrectly predicted as nitrotyrosine peptides, we have the specificity $\mathrm{Sp} = 0$. When $N^{+}_{-} = N^{-}_{+} = 0$, meaning that none of the nitrotyrosine peptides in the positive dataset and none of the non-nitrotyrosine peptides in the negative dataset was incorrectly predicted, we have the overall accuracy $\mathrm{Acc} = 1$ and $\mathrm{MCC} = 1$; when $N^{+}_{-} = N^{+}$ and $N^{-}_{+} = N^{-}$, meaning that all the nitrotyrosine peptides in the positive dataset and all the non-nitrotyrosine peptides in the negative dataset were incorrectly predicted, we have the overall accuracy $\mathrm{Acc} = 0$ and $\mathrm{MCC} = -1$; whereas when $N^{+}_{-} = N^{+}/2$ and $N^{-}_{+} = N^{-}/2$ we have $\mathrm{Acc} = 0.5$ and $\mathrm{MCC} = 0$, meaning no better than random prediction. As we can see from the above discussion based on Eq. 18, the meanings of sensitivity, specificity, overall accuracy, and Matthews correlation coefficient have become much more intuitive and easier to understand.

It is instructive to point out, however, that the set of metrics in Eqs. 17–18 is valid only for single-label systems. For multi-label systems, such as those for the subcellular localization of multiplex proteins (see, e.g., [58]–[62]) where a protein may have two or more locations, and those for the functional types of antimicrobial peptides (see, e.g., [63]) where a peptide may possess two or more functional types, a completely different set of metrics is needed as elaborated in [37].
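The equivalence between the confusion-matrix form of the four metrics (Eq. 17) and the form based on the numbers of incorrectly predicted samples (Eq. 18) can be checked numerically. The Python sketch below is only illustrative: the function and parameter names are chosen here, and the counts in the usage note are hypothetical, not results from this study.

```python
import math


def metrics_confusion(tp, tn, fp, fn):
    """Sn, Sp, Acc and MCC from the confusion-matrix counts."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return sn, sp, acc, mcc


def metrics_error_counts(n_pos, n_pos_missed, n_neg, n_neg_missed):
    """Sn, Sp, Acc and MCC from class sizes and misprediction counts.

    n_pos / n_neg: numbers of positive / negative samples investigated.
    n_pos_missed: positives incorrectly predicted as negatives.
    n_neg_missed: negatives incorrectly predicted as positives.
    """
    sn = 1 - n_pos_missed / n_pos
    sp = 1 - n_neg_missed / n_neg
    acc = 1 - (n_pos_missed + n_neg_missed) / (n_pos + n_neg)
    mcc = (1 - (n_pos_missed / n_pos + n_neg_missed / n_neg)) / math.sqrt(
        (1 + (n_neg_missed - n_pos_missed) / n_pos)
        * (1 + (n_pos_missed - n_neg_missed) / n_neg))
    return sn, sp, acc, mcc
```

With 100 positives of which 20 are missed and 400 negatives of which 40 are missed, the two functions agree when called as `metrics_confusion(80, 360, 40, 20)` and `metrics_error_counts(100, 20, 400, 40)`, since TP = 100 − 20, FN = 20, TN = 400 − 40, and FP = 40. Both forms divide by zero in degenerate cases, for example when every sample is predicted as negative, so a production implementation would need to guard against that.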

2. Jackknife Cross-Validation

With a set of clear and valid metrics as defined in Eq. 18 to measure the quality of a predictor, the next thing we need to consider is how to objectively derive the values of these metrics for a predictor. In statistical prediction, the following three cross-validation methods are often used to calculate the metrics of Eq. 18 for evaluating the quality of a predictor: independent dataset test, subsampling test, and jackknife test [64]. However, of the three test methods, the jackknife test is deemed the least arbitrary because it can always yield a unique result for a given benchmark dataset [65]. The reasons are as follows. (i) For the independent dataset test, although all the samples used to test the predictor are outside the training dataset used to train it so as to exclude the "memory" effect or bias, the way the independent samples are selected to test the predictor could be quite arbitrary unless the number of independent samples is sufficiently large. This kind of arbitrariness might result in completely different conclusions. For instance, a predictor achieving a higher success rate than another predictor for a given independent testing dataset might fail to do so when tested by another independent testing dataset [64]. (ii) For the subsampling test, the concrete procedure usually used in the literature is the 5-fold, 7-fold or 10-fold cross-validation. The problem with this kind of subsampling test is that the number of possible selections in dividing a benchmark dataset is an astronomical figure even for a very simple dataset, as demonstrated by Eqs. 28–30 in [8]. Therefore, in any actual subsampling cross-validation test, only an extremely small fraction of the possible selections is taken into account. Since different selections will generally lead to different results even for the same benchmark dataset and the same predictor, the subsampling test cannot avoid the arbitrariness either. A test method unable to yield a unique outcome cannot be deemed a good one. (iii) In the jackknife test, all the samples in the benchmark dataset will be singled out one-by-one and tested by the predictor trained by the remaining samples. During the process of jackknifing, both the training dataset and testing dataset are actually open, and each sample will be in turn moved between the two. The jackknife test can exclude the "memory" effect. Also, the arbitrariness problem as mentioned above for the independent dataset test and subsampling test can be avoided because the outcome obtained by the jackknife cross-validation is always unique for a given benchmark dataset. Accordingly, the jackknife test has been increasingly used and widely recognized by investigators to examine the quality of various predictors (see, e.g., [33], [41], [43], [45]–[47], [66]–[72]). In this study we therefore also used the jackknife cross-validation method to calculate the metrics in Eq. 18, although it takes more computational time.
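The jackknife (leave-one-out) procedure of point (iii) can be sketched generically in Python. The sketch below is an assumption-laden illustration, not the iNitro-Tyr pipeline: `jackknife` accepts any pair of `train` and `predict` callables, and the nearest-class-mean classifier on one-dimensional toy data is a stand-in used only to make the example runnable.

```python
def jackknife(samples, labels, train, predict):
    """Predict each sample with a model trained on all the other samples."""
    predictions = []
    for i in range(len(samples)):
        # Single out sample i; everything else is the training set.
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train(train_x, train_y)
        predictions.append(predict(model, samples[i]))
    return predictions


def train_nearest_mean(xs, ys):
    """Toy stand-in classifier: store the mean value of each class."""
    means = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(members) / len(members)
    return means


def predict_nearest_mean(means, x):
    """Assign x to the class whose mean is closest."""
    return min(means, key=lambda label: abs(means[label] - x))


toy_samples = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
toy_labels = [0, 0, 0, 1, 1, 1]
toy_predictions = jackknife(toy_samples, toy_labels,
                            train_nearest_mean, predict_nearest_mean)
```

Because no random partition is involved, repeating the call on the same dataset returns the same list of predictions, which is the uniqueness property discussed above. The cost is one training run per sample.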

3. Comparison with Other Methods

The jackknife test results by iNitro-Tyr on the benchmark dataset (cf. Supporting Information S1) for the four metrics defined in Eq. 18 are listed in Table 1, where for facilitating comparison, the corresponding results by GPS-YNO2 [27] with different thresholds are also given.

Table 1

Comparison of the new iNitro-Tyr predictor with the existing predictors in identifying the nitrotyrosine sites; the rates listed below were derived by the jackknife cross-validation on the 546 source proteins used in [27].

Predictor	Threshold	Acc (%)	MCC	Sn (%)	Sp (%)
GPS-YNO2 (a)	High	82.57	0.1884	28.89	90.02
	Medium	79.60	0.2171	40.53	85.02
	Low	76.51	0.2335	50.09	80.18
iNitro-Tyr (b)		84.52	0.4905	81.76	84.89

(a) As reported in [27].

(b) See Eqs. 15–16, with $\xi = 9$, i.e., the length of the potential nitrotyrosine peptides considered is 19.

From the table, we can see the following facts. (i) The overall accuracy by the current iNitro-Tyr predictor is 84.52%, which is higher than the overall accuracy by GPS-YNO2 regardless of what threshold is used for the latter. (ii) The Matthews correlation coefficient obtained by iNitro-Tyr is 0.4905, which is significantly higher than that by GPS-YNO2, indicating that the new predictor is more stable and less noisy. (iii) The sensitivity and specificity obtained by iNitro-Tyr are 81.76% and 84.89%, which are much more evenly distributed than those by the GPS-YNO2 predictor.

It is instructive to point out that, as shown by Eqs. 12a and 12b, the amino acid pairwise coupling effects [11] have been incorporated via the general form of PseAAC [8] to formulate the peptide samples. If, however, we just used the single amino acid position-specific occurrence frequency to formulate the peptide samples, the corresponding prediction quality would drop, clearly indicating that consideration of the amino acid pairwise coupling effects could significantly enhance the prediction quality. This is fully consistent with the reports by previous investigators [73], [74], where it was observed that the prediction of protein secondary structural contents had been remarkably improved by taking into account the amino acid pairwise coupling effects.

Accordingly, compared with the best of existing predictors for identifying the nitrotyrosine sites in proteins, the new iNitro-Tyr predictor not only can yield higher or comparable accuracy, but is also much more stable and less noisy. It is anticipated that iNitro-Tyr may become a useful high throughput tool in this area, or at the very least play a complementary role to the existing predictors.

4. Web-Server and User Guide

For the convenience of most experimental scientists, we have established a web-server for the iNitro-Tyr predictor, with which users can easily get their desired results according to the steps below without the need to understand the mathematical equations in the method section.

Step 1

Open the web server at http://app.aporc.org/iNitro-Tyr/ and you will see the top page of the predictor on your computer screen, as shown in Fig. 6. Click on the Read Me button to see a brief introduction about the iNitro-Tyr predictor and the caveats when using it.

Figure 6

A semi-screenshot to show the top page of the iNitro-Tyr server.

Its website address is at http://app.aporc.org/iNitro-Tyr/.

Step 2

Either type or copy/paste the sequences of query proteins into the input box shown at the center of Fig. 6. All the input sequences should be in the FASTA format. A sequence in FASTA format consists of a single initial line beginning with the symbol ">" in the first column, followed by lines of sequence data in which amino acids are represented using single-letter codes. Except for the mandatory symbol ">", all the other characters in the single initial line are optional and only used for the purpose of identification and description. The sequence ends if another line starting with the symbol ">" appears; this indicates the start of another sequence. Example sequences in FASTA format can be seen by clicking on the Example button right above the input box. Note that your input protein sequences should be formed by the 20 native amino acid codes (ACDEFGHIKLMNPQRSTVWY).
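Before submitting, the input can be checked locally against the FASTA rules described above. The following Python sketch is a hypothetical helper and has nothing to do with the server's own code; the function names and the example records are assumptions made for illustration.

```python
NATIVE_CODES = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 native amino acids


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new ">" line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def invalid_residues(sequence):
    """Return the sorted characters that are not native amino acid codes."""
    return sorted(set(sequence) - NATIVE_CODES)
```

For the two-record input `">p1 demo\nMKY\nAC\n>p2\nyyx\n"`, `parse_fasta` returns `[("p1 demo", "MKYAC"), ("p2", "YYX")]`, and `invalid_residues("YYX")` returns `["X"]`, flagging the second sequence as unsuitable for submission.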

Step 3

Click on the Submit button to see the predicted results. For example, if you use the two query protein sequences in the Example window as the input, after clicking the Submit button, you will see the following on your screen. (i) The 1st protein (P05181) contains 18 Y residues, of which only those located at the sequence positions 71, 318, 349, 381, and 423 are predicted to be nitrotyrosine sites, while all the others are predicted to be non-nitrotyrosine sites. (ii) The 2nd protein (P03023) contains 8 Y residues, of which only those located at the sequence positions 7, 12, 17, and 47 are predicted to be nitrotyrosine sites, while all the others are predicted to be non-nitrotyrosine sites. All these results are fully consistent with experimental observations except for the Y residue at position 349 in the 1st protein (P05181), which is actually a non-nitrotyrosine site but was overpredicted as a nitrotyrosine site.

Step 4

As shown on the lower panel of Fig. 6, you may also submit your query proteins in an input file (in FASTA format) via the "Browse" button. To see a sample input file, click on the Example button right under the input box.

Step 5

Click on the Data button to download the benchmark dataset used to train and test the iNitro-Tyr predictor.

Conclusions

As one of the important post-translational modifications (PTMs), nitrotyrosine is a product occurring in proteins when their tyrosine (Tyr or Y) residue is nitrated. Since a remarkably increased level of nitrotyrosine is detected in patients suffering from rheumatoid arthritis, septic shock, and coeliac disease, knowledge of nitrotyrosine is very useful for developing drugs against these diseases. A new predictor was developed for identifying the nitrotyrosine sites in proteins based on a set of 19-tuple peptides generated as follows. By sliding a window of 19 amino acids along each of the 546 protein sequences taken from a protein database, only those peptide segments with Y (tyrosine) at the center, i.e., the potential nitrotyrosine-site-containing peptides, were collected. The benchmark dataset thus obtained contains 1,044 experimentally confirmed nitrotyrosine peptides and 7,669 non-nitrotyrosine peptides. The new predictor is called iNitro-Tyr, in which each of the potential nitrotyrosine-site-containing peptides was formulated with an 18-D vector formed by incorporating the position-specific dipeptide propensity (PSDP) into the general form [8] of pseudo amino acid composition [38], [75] or Chou's PseAAC [39], [51], [54]. It has been observed by rigorous cross-validations that iNitro-Tyr not only yields higher success rates but also is more stable and less noisy, as reflected by a set of four metrics generally used to measure the quality of a predictor from different angles. For the convenience of most experimental scientists, the web-server of iNitro-Tyr has been established at http://app.aporc.org/iNitro-Tyr/. Furthermore, to maximize their convenience, a step-by-step guide has been provided, by which users can easily get their desired results without the need to follow the complicated mathematics that were presented in this paper just for the integrity of the predictor. It has not escaped our notice that the current approach can also be used to develop various effective methods for identifying other PTM sites in proteins.

Supporting Information S1

The benchmark dataset used in this study contains 8,713 peptides formed by 19 amino acid residues with Y (tyrosine) at the center. Of these peptides, 1,044 are nitrotyrosine peptides and 7,669 are non-nitrotyrosine peptides. Also listed are the codes of the source proteins from which these 19-tuple peptide sequences are derived, as well as their corresponding sites in proteins. See the main text for further explanation. (DOC)

74 in total

1. Multi-kernel transfer learning based on Chou's PseAAC formulation for protein submitochondria localization.

Authors: Suyu Mei
Journal: J Theor Biol Date: 2011-10-21 Impact factor: 2.691

2. Predicting eukaryotic protein subcellular location by fusing optimized evidence-theoretic K-Nearest Neighbor classifiers.

Authors: Kuo-Chen Chou; Hong-Bin Shen
Journal: J Proteome Res Date: 2006-08 Impact factor: 4.466

3. A top-down approach to enhance the power of predicting human protein subcellular localization: Hum-mPLoc 2.0.

Authors: Hong-Bin Shen; Kuo-Chen Chou
Journal: Anal Biochem Date: 2009-08-03 Impact factor: 3.365

4. A vectorized sequence-coupling model for predicting HIV protease cleavage sites in proteins.

Authors: K C Chou
Journal: J Biol Chem Date: 1993-08-15 Impact factor: 5.157

5. SLLE for predicting membrane protein types.

Authors: Meng Wang; Jie Yang; Zhi-Jie Xu; Kuo-Chen Chou
Journal: J Theor Biol Date: 2005-01-07 Impact factor: 2.691

6. Targets of tyrosine nitration in diabetic rat retina.

Authors: Xianquan Zhan; Yunpeng Du; John S Crabb; Xiaorong Gu; Timothy S Kern; John W Crabb
Journal: Mol Cell Proteomics Date: 2007-12-28 Impact factor: 5.911

7. iLoc-Euk: a multi-label classifier for predicting the subcellular localization of singleplex and multiplex eukaryotic proteins.

Authors: Kuo-Chen Chou; Zhi-Cheng Wu; Xuan Xiao
Journal: PLoS One Date: 2011-03-30 Impact factor: 3.240

8. Some remarks on protein attribute prediction and pseudo amino acid composition.

Authors: Kuo-Chen Chou
Journal: J Theor Biol Date: 2010-12-17 Impact factor: 2.691

9. Signal propagation in protein interaction network during colorectal cancer progression.

Authors: Yang Jiang; Tao Huang; Lei Chen; Yu-Fei Gao; Yudong Cai; Kuo-Chen Chou
Journal: Biomed Res Int Date: 2013-03-20 Impact factor: 3.411

10. iEzy-drug: a web server for identifying the interaction between enzymes and drugs in cellular networking.

Authors: Jian-Liang Min; Xuan Xiao; Kuo-Chen Chou
Journal: Biomed Res Int Date: 2013-11-26 Impact factor: 3.411
