Literature DB >> 19111559

Multi-target QPDR classification model for human breast and colon cancer-related proteins using star graph topological indices.

Cristian Robert Munteanu¹, Alexandre L Magalhães, Eugenio Uriarte, Humberto González-Díaz.

Abstract

The cancer diagnostic is a complex process and, sometimes, the specific markers can interfere or produce negative results. Thus, new simple and fast theoretical models are required. One option is the complex network graphs theory that permits us to describe any real system, from the small molecules to the complex genetic, neural or social networks by transforming real properties in topological indices. This work converts the protein primary structure data in specific Randic's star networks topological indices using the new sequence to star networks (S2SNet) application. A set of 1054 proteins were selected from previous works and contains proteins related or not with two types of cancer, human breast cancer (HBC) and human colon cancer (HCC). The general discriminant analysis method generates an input-coded multi-target classification model with the training/predicting set accuracies of 90.0% for the forward stepwise model type. In addition, a protein subset was modified by single amino acid mutations with higher log-odds PAM250 values and tested with the new classification if can be related with HBC or HCC. In conclusion, we shown that, using simple input data such is the primary protein sequence and the simples linear analysis, it is possible to obtain accurate classification models that can predict if a new protein related with two types of cancer. These results promote the use of the S2SNet in clinical proteomics.

Entities: Chemical Disease Gene Species

Mesh：

Substances：
Neoplasm Proteins

Year: 2008 PMID： 19111559 PMCID： PMC7094125 DOI： 10.1016/j.jtbi.2008.11.017

Source DB: PubMed Journal: J Theor Biol ISSN： 0022-5193 Impact factor: 2.691

Introduction

Cancer is a leading cause of death worldwide, accounted for around 13% of all deaths in 2007 (WHO, 2008). Two of the leading types of cancer are the human breast cancer (HBC) and the human colon cancer (HCC). The estimated new cancer cases and deaths in US for 2008 shows that HBC will affect 26% of the women (15% will die) and HCC will involve 10% of the men/women (8% men and 9% women will die) (Jemal et al., 2008). Therefore, simple and fast theoretical method can be very useful in the detection of cancer diseases. The actual work will use the protein quantitative proteome-disease relationship (QPDR) (Ferino et al., 2008), similar to quantitative structure-activity relationship (QSAR) (Devillers and Balaban, 1999). QPDR is one of the widely used analyse for predicting the protein properties and, in the present study, is using the macromolecular descriptors, named topological indices (TIs), obtained with the graph theory. The branch of mathematical chemistry dedicated to encode the DNA/protein information in graph representations by the use of the TIs has become an intense research area (Agüero-Chapin et al., 2006; Bielinska-Waz et al., 2007; Liao and Wang, 2004; Liao and Ding, 2005; Randic, 2000; Randic and Basak, 2001; Randic and Balaban, 2003; Randic et al., 2000). The graphic approaches of the biological systems study can provide useful insights in QSAR studies (González-Díaz et al., 2006, González-Díaz et al., 2007c; Prado-Prado et al., 2008), protein folding kinetics (Chou, 1990), enzyme-catalyzed reactions (Chou, 1989; Chou and Forsen, 1980; Chou and Liu, 1981; Kuzmic et al., 1992), inhibition kinetics of processive nucleic acid polymerases and nucleases (Althaus et al., 1993a, Althaus et al., 1993b; Althaus et al., 1994, Althaus et al., 1996; Chou et al., 1994), DNA sequence analysis (Qi et al., 2007), anti-sense strands base frequencies (Chou et al., 1996), analysis of codon usage (Chou and Zhang, 1992; Zhang and Chou, 1994) and in complicated network systems investigations (Diao et al., 2007; Gonzalez-Diaz et al., 2008, González-Díaz et al., 2007a). Recently, the “cellular automaton image” (Wolfram, 1984, Wolfram, 2002) has also been applied to study hepatitis B viral infections (Xiao et al., 2006a), HBV virus gene missense mutation (Xiao et al., 2005b), and visual analysis of SARS–CoV (Gao et al., 2006; Wang et al., 2005), as well as representing complicated biological sequences (Xiao et al., 2005a) and helping to identify protein attributes (Xiao and Chou, 2007; Xiao et al., 2006b). We have chosen the TIs for these QPDR models based on the previous work results with similar QSAR/QPDR models. Even if the TIs cannot be always interpreted, they demonstrate to encode the information that permits to create accurate QSAR/QPDR models. Other interesting fields to apply the graph theory are the oncology and clinical proteomics. A classification model for discriminating prostate cancer patients from control group with connectivity indices where constructed by González-Díaz et al. (2007b). Vilar's group designed a QSAR model for alignment-free prediction of HBC biomarkers based on electrostatic potentials of protein pseudofolding HP-lattice networks (Vilar et al., 2008). The actual work is proposing a new cancer/non-cancer classification model based on protein embedded/non-embedded star graph TIs such are the trace of connectivity matrices, Harary number, Wiener index, Gutman index, Schultz index, Moreau–Broto indices, Balaban distance connectivity index, Kier–Hall connectivity indices and Randic connectivity index. This classification can predict two types of cancer: HBC and HCC. The primary protein sequence is transformed in connectivity star graph's TIs that are used by a statistical linear method in order to construct an input-coded multi-target classification model.

Materials and methods

Protein set

Two sets of protein primary sequences are used: a set of 189 HBC/HCC cancer proteins (Sjoblom et al., 2006) and 865 non-cancer proteins (Dobson and Doig, 2005; Dobson et al., 2004). The list of cancer-related proteins in our work is the same with the list obtained by the Sjoblom group after the experimental analysis of 13,023 genes in 11 breast and 11 colorectal cancers.

Star graph TIs

Each protein sequence was transformed in a star graph, where the amino acids are the vertices (nodes), connected in a specific sequence by the peptide bonds. The star graph is a special case of trees with N vertices where one has got N−1 degrees of freedom and the remaining N−1 vertices have got one single degree of freedom (Harary, 1969). Each of the 20 possible branches (“rays”) of the star contains the same amino acid type and the star centre is a non-amino acid vertex. A protein can be represented by diverse forms of graphs, which can be associated with distinct distance matrices. The best method to construct a standard star graph is the following: each amino acid/vertex holds the position in the original sequence and the branches are labelled by alphabetical order of the 3-letter amino acid code (Randic et al., 2007). The graph is embedded if the initial sequence connectivity in the protein chain is included. Figs. 1 A and B present the non-embedded/embedded star graphs of PRPS1 using the alphabetical order of one-letter amino acid code. Thus, the primary structure of protein chains are transformed in the correspondent Star graphs invariant TIs. The resulted graphs are not depending on the three-dimensional structure or the shape of the protein.

Fig. 1

(A) The non-embedded star graphs for PRPS1 and (B) the embedded star graphs for PRPS1.

(A) The non-embedded star graphs for PRPS1 and (B) the embedded star graphs for PRPS1. The comparison of the graphs is made by using the corresponding connectivity matrix, distance matrix and degree matrix. The matrices of the connectivity in the sequence and in the star graph are combined in the case of the embedded graph. These matrices and the normalized ones are the base of the TIs calculation. The conversion of the amino acid sequences in star graph TIs was made by using sequence to star networks (S2SNet) application, developed by our group (Munteanu and Gonzáles-Diáz, 2008). S2SNet is based on wxPython (Rappin and Dunn, 2006) for the GUI application and has Graphviz (Koutsofios and North, 1993) as a graphics back-end. The present calculations are characterized by embedded and non-embedded TIs, no weights, Markov normalization and power of matrices/indices (n) up to 5. The results file contains the following TIs (Todeschini and Consonni, 2002):These TIs and other derivate ones will be used in the next step to construct a cancer/non-cancer classification model by linear statistical methods. Trace of the n connectivity matrices (tr) or the spectral moments: where n=0–power limit, M=graph connectivity matrix (i∗i dimension); ii=ith diagonal element; Harary number (H) or the reciprocal distance sum index: where d are the elements of the distance matrix and m are the elements of the M connectivity matrix; Wiener index (W) or the sum of the numbers of edges in the shortest paths in a graph between all pairs of amino acids in a protein: Gutman TI (S 6): where deg are the elements of the degree matrix; Schultz TI (non-trivial part) (S): Balaban distance connectivity index (J) or average distance sum connectivity index (measures the graph ramification): where nodes+1=AA numbers/node number in the star graph+origin, Σ is the node distance degree; Kier–Hall connectivity indices ( X): Randic connectivity index ( X):

Statistical analysis

An input-coded multi-target classification model was created with general discriminant analysis (GDA) method (Kowalski and Wold, 1982; Van Waterbeemd, 1995), STATISTICA 6.0 package (StatSoft.Inc., 2002). This model can predict if a protein is HBC or HCC-related using a single equation. For this reason, in addition to the 30 star graph embedded and non-embedded TIs are introduced other two types of continuous predictors (attributes) encoded specific information about each cancer types as following: 30 products of the HBC/HCC cancer probability with the embedded/non-embedded TIs (pTI=probHBC/HCC*TI) and 30 differences between the same TIs and the average of the TIs for each type of cancer [dTI=TI–average(TI)HBC/HCC]. The cancer probabilities represent the fractions of proteins HBC/HCC-related from the entire Sjöblom's proteins (cancer proteins) and have values of 0.639 (HBC) and 0.361 (HCC). For each protein there are two cases corresponding to both types of cancer. The dependent variable (CancerOrNot) takes 1 for cancer and 0 for non-cancer and the cross-validation (CV) variable has two values (train and val). The best CV methods to examine a predictor are the following: independent dataset test, subsampling test, and jackknife test (Chou and Zhang, 1995). Chou and Shen, 2007, Chou and Shen, 2008 have shown that only the jackknife test has the least arbitrariness . Thus, the jackknife test has been increasingly used by investigators to examine the accuracy of various predictors (Chen and Li, 2007a, Chen and Li, 2007b; Diao et al., 2007; Ding et al., 2007; Jiang et al., 2008; Li and Li, 2008; Lin, 2008; Niu et al., 2006; Xiao and Chou, 2007; Zhang et al., 2008; Zhou et al., 2007). In the actual work, the independent data test is used by splitting the data at random in a training series (train, 75%) used for model construction and a prediction one (val, 25%) for model validation (the CV column is filled by repeating 6 train and 2 val). All independent variables are standardized prior to model construction. The general QPDR formula contains embedded and non-embedded TIs, pTIs and dTIs:where C/nC−score is the continue score value for the cancer/non-cancer classification (HBC or HCC), c 1–c are the TIs coefficients (n=number of TIs), c–c pTIs coefficients (n In addition, the Dobson's set is use to select a subset of 61 non-cancer proteins with cancer probability between 0.3 and 0.5 in order to proceed 17 single amino acid mutations with log-odds PAM250 (Dayhoff, 1978) greater or equal with 2 (see Table 1 ). The best classification model predicted the probability of presence in HBC/HCC cancer for any of these mutated proteins and the results were analysed with two-way joining clustering analysis method (tw-JCA) from STATISTICA (StatSoft.Inc., 2002).

Table 1

Single amino acid mutations and the corresponding log-odd PAM250 value.

Original AA	Mutated AA	log-odd PAM250	Notation
D	N	2	D→N/2DN
E	Q	2	E→Q/2EQ
F	L	2	F→L /2FL
H	N	2	H→N/2HN
H	R	2	H→R/2HR
L	I	2	L→I/2LI
M	I	2	M→I/2MI
Q	D	2	Q→D/2QD
V	L	2	V→L/2VL
V	M	2	V→M/2VM
W	R	2	W→R/2WR
E	D	3	E→D/3ED
H	Q	3	H→Q/3HQ
K	R	3	K→R/3KR
M	L	4	M→L/4ML
V	I	4	V→I/4VI
Y	F	7	Y→F/7YF

Single amino acid mutations and the corresponding log-odd PAM250 value.

Results and discussions

Fifteen classification models were tested with the aim of finding the best GDA equation which is able to discriminate between proteins related with HBC and HCC. The attributes include 30 embedded/non-embedded star graph TIs obtained with S2SNet application and other 60 composed predictors, pTIs and dTIs. The values obtained for the training/predicting accuracies with the forward stepwise method are presented in Table 2 .

Table 2

Training/predicting accuracies of Cancer (C)/non-cancer (nC) models using embedded (E) and non-embedded (nE) star graph TIs, pTIs and dTIs.

Star graph type	Attributes	Train			Cross-validation			Total			Eq. vars.
		nC (%)	C (%)	Total (%)	nC (%)	C (%)	Total (%)	nC (%)	C (%)	Total (%)
nE	pTI	90.4	69.4	88.5	91.4	66.0	89.1	90.7	68.6	88.7	4
	TI, pTI	90.4	68.1	88.3	90.8	66.0	88.5	90.5	67.5	88.4	5
	dTI	86.0	79.9	85.4	87.0	74.5	85.9	86.2	78.5	85.5	2
	TI, dTI	88.1	74.3	86.9	88.9	72.3	87.4	88.3	73.8	87.0	4
	TI, pTI, dTI	91.1	66.0	88.8	91.8	61.7	89.1	91.3	64.9	88.9	6

E	pTIe	92.3	70.1	90.3	93.1	70.2	91.0	92.5	70.2	90.5	6
	TIe, pTIe	92.7	69.4	90.6	93.3	70.2	91.2	92.8	69.6	90.7	6
	dTIe	88.1	78.5	87.3	88.3	76.6	87.2	88.2	78.0	87.3	4
	TIe, dTIe	91.4	75.7	89.9	91.8	74.5	90.3	91.5	75.4	90.0	5
	TIe, pTIe, dTIe	93.1	68.1	90.8	93.3	66.0	90.8	93.1	67.5	90.8	8

nE and E	pTI, pTIe	90.2	70.1	88.4	91.2	68.1	89.1	90.5	69.6	88.6	4
	TI, TIe, pTI, pTIe	92.3	68.8	90.1	92.0	66.0	89.7	92.2	68.1	90.0	8
	dTI, dTIe	90.3	78.5	89.2	90.6	76.6	89.3	90.4	78.0	89.2	6
	TI, TIe, dTI, dTIe	90.9	72.9	89.3	91.4	72.3	89.7	91.1	72.8	89.4	7
	TI, TIe, pTI, pTIe, dTI, dTIe	92.3	68.8	90.1	92.2	70.2	90.3	92.3	69.1	90.2	8

Training/predicting accuracies of Cancer (C)/non-cancer (nC) models using embedded (E) and non-embedded (nE) star graph TIs, pTIs and dTIs. The forward stepwise selection variable method conjugated with the embedded TIs and dTIs provides the best results for our data set with values of correctly classified compounds of 89.9%, 90.3% and 90.0% for the training, CV and full sets, respectively, and using only six/five parameters/variables (Eq. (14)). The embedded TIs have the name of the non-embedded ones plus “e” as suffix. The simple linear mathematical form of the model has been chosen in the absence of prior information. where N is the number of cases (C and nC), R is the canonical regression coefficient, U is the Wilk's statistics, F is the Fisher's statistics and p is the p-level (probability of error). The above results are typically considered as excellent in the literature for LDA–QPDR/QSAR models (Castillo-Garit et al., 2008; Estrada and Molina, 2001; Marrero-Ponce et al., 2004; Morales et al., 2006; Vilar et al., 2008). In order to check the variation of this model with the training/CV sets, we carried on a CV study by using ten totally random sets, including the initial one from the actual model (with the same 75% training and 25% CV). The classification values are presented in Table S1 from the supplementary material and show an average of 90.2% for training and 89.2% for CV. These values demonstrate the stability of the model with the selection of the classification sets. In order to illustrate the performance of the approach when applied to a single set of cancer related proteins (e.g. either breast or colon), we obtained two equations, one for HBC and other for HCC. Therefore, we have to consider that the Eq. (14) represents an input-coded multi-target classification model that can evaluate if a protein is HBC or HCC-related by using the HBC or HCC average J e and X2e values (contained in the d J e and d X2 e differences). Eq. (14) can be reduced to two different equations, one for each type of cancer (HBC and HCC): The detailed classification results for each type of cancer obtained with Eqs. (14a), (14b) are presented in Table 3 .

Table 3

Accuracy of input-coded multi-target and individual HBC and HCC classification models based on the embedded TIs (TIe+dTIe and pTIe).

Eq.	Cancer	Correct	Incorrect	Accuracy (%)
TIe, dTIe
14	Both	307	1795	90.0
14a	HBC	168	880	91.8
14b	HCC	139	915	88.2

pTIe
15	Both	277	1825	90.5
15a	HBC	170	878	91.8
15b	HCC	107	947	89.2

Accuracy of input-coded multi-target and individual HBC and HCC classification models based on the embedded TIs (TIe+dTIe and pTIe). A similar input-coded multi-target classification model was obtained by using the forward stepwise method and the embedded pTIs and provides values of correctly classified compounds of 90.3, 91.0 and 90.5 for the training, CV and full sets, respectively (using seven/six parameters/variables) (Eq. (15)). In order to evaluate if a protein is HBC or HCC-related, it is necessary to use the HBC or HCC probability inside the pTIs products. The classification values obtained for the individual equations are presented in Table 3. The equations obtained are the following: Eqs. (14), (15) show similar results when the input data is containing probability of cancer (products with TIs) or the TIs averages for each type of cancer (differences with TIs). In general, in the case of embedded, non-embedded and both indices, we obtained better results with dTIs compared with the pTIs (not mixed with the original TIs). This difference can be explained by a superior recover of the cancer-related protein sequence information in the case of the differences between the original TIs and the average of them for each type of cancer (dTIs) compared with the products of the original TIs and the cancer type probability (pTIs). Thus, we can conclude that the average of star graph structure for each type of cancer (dTIs) is described better the actual QPDR model compared with the composition of the data sets for each type of cancer that generates the cancer probabilities. In addition, Table 2 shows that better results are obtained using the original TIs and the derived ones (pTIs and dTIs) compared with the isolated TIs/pTIs/dTIs. This difference can be explained be the fact that each set of indices can contains different parts of the protein information that is cancer-related. Therefore, the use of all these indices will sum all this information in a better QPDR model. Another interesting aspect is the type of the indices (original or derived from the original) that are more frequent in all models presented in Table S2 from supplementary material. Thus, we can observe the importance of the Wiener index (W) and Kier–Hall connectivity index X5 for the models based on the non-embedded TIs. The embedded TIs models contain more frequent the trace of the graph/sequence connectivity matrixes tr 3 and the non-trivial part of the Schultz TI S (W is based on the distance matrix, X5 and S on the degree matrix, and tr 3 on the connectivity matrix). The most important type of index that is present in both embedded and non-embedded TI equations is J, the Balaban distance connectivity index based on the node distance degree information. In order to compare two equations with the same number of TIs, we have chosen the embedded models with pTIe and {TIe, pTIe} that contain six variables and reduced the common terms (based on tr 3, S and J). Thus, we can observe that the addition of the TIe to the pTIe will shift the preference from the low order traces (p tr 0e, p tr 2e) and Kier–Hall index (p X2e) to high order trace (tr 5e), Harary number (He) and Gutman TI (S6e). The first embedded TIs & dTIs model was chosen to estimate the cancer probability for proteins mutants of non-cancer-related proteins. These values were analysed with tw-JCA using 61 mutated proteins and 17 types of single amino acid mutations. In the case of HBC, we obtained 215 data groups, called input blocks. To detect the larger variability regions (mutants) we computed a tw-JCA partition of input blocks (rearrange of blocks) setting the threshold value of variability at StDv/2 (see Fig. 2 ). The value obtained was 0.059. The 215 input blocks are regrouped, for similarity, into 11 output blocks (see Tables S3 and S4 in the supplementary material). We can observe that the proteins with number 24 to number 48 are very susceptible to become HBC-related proteins for all studied mutations. The plot corresponding to the reduced values of the reordered data matrix (Table S4) is presented in Fig. 3 . On the other hand, we carried out the same study for the HCC mutated proteins and found different susceptible proteins, with visible lower probability to be HCC-related (Fig. 4 ). The 184 input blocks were regrouped, for similarity, into 11 output blocks (StDv/2=0.050) (see Tables S5 and S6 in the supplementary material). The reduced data from Table S6 are presented as a plot in Fig. 5 . The tw-JCA partition obtained in this way is statistically significant as reported by other authors that used this method to reach similar goals (Ferino et al., 2008).

Fig. 2

Graphical representation of two-way joining cluster analysis of the HBC probability after the mutations.

Fig. 3

Graphical representation of reduced values of the reordered data matrix by tw–JCA method for HBC probability after the mutations.

Fig. 4

Graphical representation of two-way joining cluster analysis of the HCC probability after the mutations.

Fig. 5

Graphical representation of reduced values of the reordered data matrix by tw–JCA method for HCC probability after the mutations.

Graphical representation of two-way joining cluster analysis of the HBC probability after the mutations. Graphical representation of reduced values of the reordered data matrix by tw–JCA method for HBC probability after the mutations. Graphical representation of two-way joining cluster analysis of the HCC probability after the mutations. Graphical representation of reduced values of the reordered data matrix by tw–JCA method for HCC probability after the mutations. One interesting non-cancer chain protein is 1QRK B, the human coagulation factor XIII with strontium bound in the ion site (Fox et al., 1999), with eight single amino acid mutations that present HBC probability up to 71% as following: 70.8% for V→L, 68.8% for V→I, 62.0% for L→I, 59.3% for D→N, 58.3% for E→Q, 55.9% for F→L, 54.8% for E→D and 51.0% for V→M. The most persistent mutation (log-odd PAM250=4), valine (V) to isoleucine (M), can be considered as the most dangerous one. The main calcium/strontium binding site within each monomer involves the main chain oxygen atom of Ala-457, and also the side chains from residues Asn-436, Asp-438, Glu-485, and Glu-490. The mutations of Glu (E) in Q and D can affect the capacity of binding metals and the normal biological activity. This coagulation factor XIII is a transglutaminase which stabilizes blood clots by covalently cross-linking fibrin, being essential for normal haemostasis. FXIII deficiency due to the genetic mutations results in a life-long bleeding disorder with added complications in wound healing and tissue repair (Anwar et al., 1998). In addition, the abundant fibrinogen present in the tumor connective tissue might contribute to the structural integrity of breast or colon tumor tissues (Costantini et al., 1991; Takahashi et al., 2000; Yee et al., 1994). We can observe that, in general, the natural mutations with higher PAM250 values are less frequent even for 1QRK B (Y→F with PAM250 of 7 is absent) because we cannot create a direct relation between the PAM250 natural amino acid mutation frequency and the influence of the mutations in these types of cancer. The probability for a cancer-related protein to turn into a non-cancer one was studied too. For each type of cancer, ten HBC/HCC-related proteins where mutated using the same PAM250 values. The tw-JCA plots are presented in Fig. 6 (for HBC) and Fig. 7 (for HCC), and correspond to data in Tables S7 and S8 from the supplementary material. The results did not show important probability to obtain a HBC/HCC-related protein by using single PAM250 natural mutations. Activin beta E (INHBE, C_5) has the highest probability (around 50%) to turn into a HBC-related protein after almost all the mutations (Fig. 6 and Table S7).

Fig. 6

Graphical representation of two-way joining cluster analysis of the probability of the mutated HBC-related proteins to turn into non-cancer proteins.

Fig. 7

Graphical representation of two-way joining cluster analysis of the probability of the mutated HCC-related proteins to turn into non-cancer proteins.

Graphical representation of two-way joining cluster analysis of the probability of the mutated HBC-related proteins to turn into non-cancer proteins. Graphical representation of two-way joining cluster analysis of the probability of the mutated HCC-related proteins to turn into non-cancer proteins.

Conclusions

This study is proposing two cancer/non-cancer input-coded multi-target classification models for HBC and HCC using the star network TIs of the protein amino acid sequences. The results prove the excellent predictive ability (90.0%) of the simple and fast star network TIs and GDA statistics linear models in the case of the actual protein model. In addition, the prediction of cancer probability for mutated proteins was calculated. The human coagulation factor XIII (1QRK B), that normally do not generate HBC, if suffer several mutations, can become a HBC-related protein. This work can help in oncology proteomics or serve as a model for other studies. In addition, S2SNet application is demonstrating his capacity to transform simple protein sequences in TIs and to be the base of numerous protein studies.

59 in total

1. Predicting enzyme class from protein structure without alignments.

Authors: Paul D Dobson; Andrew J Doig
Journal: J Mol Biol Date: 2005-01-07 Impact factor: 5.469

2. Mixtures of tight-binding enzyme inhibitors. Kinetic analysis by a recursive rate equation.

Authors: P Kuzmic; K Y Ng; T D Heath
Journal: Anal Biochem Date: 1992-01 Impact factor: 3.365

3. A radial-distribution-function approach for predicting rodent carcinogenicity.

Authors: Aliuska Helguera Morales; Miguel Angel Cabrera Pérez; Maykel Pérez González
Journal: J Mol Model Date: 2006-01-19 Impact factor: 1.810

4. Novel 2D maps and coupling numbers for protein sequences. The first QSAR study of polygalacturonases; isolation and prediction of a novel sequence from Psidium guajava L.

Authors: Guillermín Agüero-Chapin; Humberto González-Díaz; Reinaldo Molina; Javier Varona-Santos; Eugenio Uriarte; Yenny González-Díaz
Journal: FEBS Lett Date: 2006-01-04 Impact factor: 4.124

5. Using pseudo amino acid composition to predict protein subnuclear location with improved hybrid approach.

Authors: F-M Li; Q-Z Li
Journal: Amino Acids Date: 2007-05-21 Impact factor: 3.520

6. Cell-PLoc: a package of Web servers for predicting subcellular localization of proteins in various organisms.

Authors: Kuo-Chen Chou; Hong-Bin Shen
Journal: Nat Protoc Date: 2008 Impact factor: 13.491

Review 7. Recent progress in protein subcellular location prediction.

Authors: Kuo-Chen Chou; Hong-Bin Shen
Journal: Anal Biochem Date: 2007-07-12 Impact factor: 3.365

8. Unified QSAR approach to antimicrobials. Part 3: first multi-tasking QSAR model for input-coded prediction, structural back-projection, and complex networks clustering of antiprotozoal compounds.

Authors: Francisco J Prado-Prado; Humberto González-Díaz; Octavio Martinez de la Vega; Florencio M Ubeira; Kuo-Chen Chou
Journal: Bioorg Med Chem Date: 2008-04-29 Impact factor: 3.641

9. Graphical rules for enzyme-catalysed rate laws.

Authors: K C Chou; S Forsén
Journal: Biochem J Date: 1980-06-01 Impact factor: 3.857

10. Prediction of apoptosis protein subcellular location using improved hybrid approach and pseudo-amino acid composition.

Authors: Ying-Li Chen; Qian-Zhong Li
Journal: J Theor Biol Date: 2007-05-18 Impact factor: 2.691

10 in total

1. Fragment-based optimization of small molecule CXCL12 inhibitors for antagonizing the CXCL12/CXCR4 interaction.

Authors: Joshua J Ziarek; Yan Liu; Emmanuel Smith; Guolin Zhang; Francis C Peterson; Jun Chen; Yongping Yu; Yu Chen; Brian F Volkman; Rongshi Li
Journal: Curr Top Med Chem Date: 2012 Impact factor: 3.295

2. Expression levels of miR-205 and miR-506 in colon cancer tissues and their relationships with clinicopathological features.

Authors: Nan Jing; Lili Yin; Junxi Sun; Zhongzheng Cao; Weizheng Mao
Journal: Oncol Lett Date: 2018-07-13 Impact factor: 2.967

3. Analysis of protein pathway networks using hybrid properties.

Authors: Lei Chen; Tao Huang; Xiao-He Shi; Yu-Dong Cai; Kuo-Chen Chou
Journal: Molecules Date: 2010-11-12 Impact factor: 4.411

4. Machine learning techniques for single nucleotide polymorphism--disease classification models in schizophrenia.

Authors: Vanessa Aguiar-Pulido; José A Seoane; Juan R Rabuñal; Julián Dorado; Alejandro Pazos; Cristian R Munteanu
Journal: Molecules Date: 2010-07-12 Impact factor: 4.411

5. Prediction of breast cancer proteins involved in immunotherapy, metastasis, and RNA-binding using molecular descriptors and artificial neural networks.

Authors: Andrés López-Cortés; Alejandro Cabrera-Andrade; José M Vázquez-Naya; Alejandro Pazos; Humberto Gonzáles-Díaz; César Paz-Y-Miño; Santiago Guerrero; Yunierkis Pérez-Castillo; Eduardo Tejera; Cristian R Munteanu
Journal: Sci Rep Date: 2020-05-22 Impact factor: 4.379

Review 6. Graphical representation and mathematical characterization of protein sequences and applications to viral proteins.

Authors: Ambarnil Ghosh; Ashesh Nandy
Journal: Adv Protein Chem Struct Biol Date: 2011 Impact factor: 3.507

7. The disposition of the LZCC protein residues in wenxiang diagram provides new insights into the protein-protein interaction mechanism.

Authors: Guo-Ping Zhou
Journal: J Theor Biol Date: 2011-06-22 Impact factor: 2.691

8. Three 3D graphical representations of DNA primary sequences based on the classifications of DNA bases and their applications.

Authors: Guosen Xie; Zhongxi Mo
Journal: J Theor Biol Date: 2010-10-20 Impact factor: 2.691

9. PTML Modeling for Pancreatic Cancer Research: In Silico Design of Simultaneous Multi-Protein and Multi-Cell Inhibitors.

Authors: Valeria V Kleandrova; Alejandro Speck-Planche
Journal: Biomedicines Date: 2022-02-18

10. Prediction of high anti-angiogenic activity peptides in silico using a generalized linear model and feature selection.

Authors: Jose Liñares Blanco; Ana B Porto-Pazos; Alejandro Pazos; Carlos Fernandez-Lozano
Journal: Sci Rep Date: 2018-10-24 Impact factor: 4.379

10 in total