
A QSAR study of some cyclobutenediones as CCR1 antagonists by artificial neural networks based on principal component analysis.

M Shahlaei¹, A Fassihi, L Saghaie, E Arkan, A Pourhossein.

Abstract

BACKGROUND AND PURPOSE OF THE STUDY: A quantitative structure-activity relationship (QSAR) model based on artificial neural networks (ANN) was developed to study the activities of 29 derivatives of 3-amino-4-(2-(2-(4-benzylpiperazin-1-yl)-2-oxoethoxy)phenylamino)cyclobutenedione as C-C chemokine receptor type 1 (CCR1) inhibitors.
METHODS: A feed-forward ANN with an error back-propagation learning algorithm was used for model building. The model was optimized with respect to the initial learning rate, learning momentum, number of epochs, and number of hidden neurons.
RESULTS: Good results were obtained, with a root mean square error (RMSE) and squared correlation coefficient (R²) of 0.189 and 0.906 for the training set and 0.103 and 0.932 for the prediction set, respectively.
CONCLUSION: The results reflect a nonlinear relationship between the principal components obtained from the calculated molecular descriptors and the inhibitory activities of the investigated molecules.


Keywords: PCA; Quantitative Structure Activity Relationship; feed-forward ANN; inhibitory activity

Year: 2011 PMID: 22615684 PMCID: PMC3304395

Source DB: PubMed Journal: Daru ISSN: 1560-8115 Impact factor: 3.117

INTRODUCTION

The chemokines are a class of small proteins that play a significant role in leukocyte trafficking during the immune response (1). CCR1, one of the chemokine receptors, is expressed on a number of human cells such as monocytes, macrophages, dendritic cells, and T cells (2). A large number of studies have provided strong evidence for a significant role of the chemokines RANTES (Regulated upon Activation, Normal T cell Expressed and presumably Secreted) and MIP-1α (macrophage inflammatory protein-1α) in chronic inflammatory diseases. Because MIP-1α and RANTES are agonists for CCR1, antagonists of this receptor may be helpful in the treatment of these diseases (3).

QSAR has become a well-established discipline in drug discovery research (4–6). The basis of such relationships is the assumption that the variation in the bioactivity of molecules, as expressed by pIC50, can be regressed against changes in molecular descriptors. Development of a QSAR model involves selecting the most informative independent variables to describe different sets of molecules and applying an algorithm, such as multiple linear regression or ANN, to construct the model. The advantage of ANN is its inherent power to reveal nonlinear relationships between independent and dependent variables in the derivation of QSAR models. In this study, a QSAR model based on ANN was developed to study the activities of 29 derivatives of 3-amino-4-(2-(2-(4-benzylpiperazin-1-yl)-2-oxoethoxy)phenylamino)cyclobutenedione as C-C chemokine receptor type 1 (CCR1) inhibitors (7).

METHODS

Biological and chemical data from 29 derivatives of 3-amino-4-(2-(2-(4-benzylpiperazin-1-yl)-2-oxoethoxy)phenylamino)cyclobutenedione were used in this study (Table 1) (7). All the structures were drawn and optimized using the semiempirical AM1 quantum-chemical method implemented in HyperChem (8).

Table 1a

The structures and biological activities of compounds

Compound	R₁	Observed pIC₅₀^a	Predicted pIC₅₀
1^b		7.585	7.802
2		6.455	6.321
3		7.107	7.125
4		6.568	6.628
5		7.013	7.117
6		5.630	5.739
7^b		5.536	5.234
8		6.657	6.661

In order to model the biological activities of the studied compounds, four classes of descriptors (constitutional, geometrical, topological, and functional group) were calculated using Dragon (9) on the minimum-energy conformations. The data set was divided into training and test sets using the Kennard and Stone algorithm (10). Principal component analysis (PCA) was used to compress the pool of descriptors into principal components (PCs) as new variables. A nonlinear regression model, an artificial neural network, was then constructed to relate the PCs to pIC50. A feed-forward ANN with an error back-propagation learning algorithm was applied for model building.
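The splitting and compression steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the descriptor matrix is random stand-in data, the function names `kennard_stone` and `pca_scores` are hypothetical, autoscaling before PCA is an assumption, and the 23/6 split simply mirrors the six compounds marked as test set in Table 1.

```python
import numpy as np

def kennard_stone(X, n_train):
    """Pick n_train (>= 2) rows of X with the Kennard-Stone max-min distance rule."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Start from the two most distant samples.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_train:
        # Distance from each remaining sample to its nearest selected sample.
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining

def pca_scores(X, n_components):
    """Return PC scores and % variance explained (autoscaling is an assumption)."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # guard against constant descriptors
    Z = (X - X.mean(axis=0)) / std
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    explained = 100.0 * s**2 / np.sum(s**2)
    return Z @ vt[:n_components].T, explained[:n_components]

# Stand-in data: 29 "compounds" x 50 "descriptors" (random, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(29, 50))
train_idx, test_idx = kennard_stone(X, n_train=23)
scores, explained = pca_scores(X, n_components=4)
print(len(train_idx), len(test_idx), scores.shape)
```

Whether the split is made in descriptor space or in PC space is not stated in the text; the sketch uses descriptor space.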

RESULTS AND DISCUSSION

Results

PCA performed on the data set gave 14 significant PCs (% variance explained > 1). These 14 PCs and their eigenvalues are shown in Table 2, and the next steps of the study were restricted to them. A plot of the first PC against the second PC showed that none of the compounds is an outlier (Fig. 1). The theory of artificial neural networks has been described in detail elsewhere (11); only some relevant remarks are presented here.

Figure 1

The first two components (PC1, and PC2) from the principal component analysis of the 29 studied molecules.

A back-propagation artificial neural network comprises three layers. The first, the input layer, has NI neurons; it receives the information (i.e., the inputs) and transfers it to all neurons in the next layer, the hidden layer, which has NH neurons. The neurons in the hidden layer calculate a weighted sum of the inputs, which is subsequently transformed by a linear or nonlinear function. The last layer is the output layer; its neurons handle the output from the network and calculate the response vector. Synapses connect the input layer to the hidden layer and the hidden layer to the output layer. The manner in which each node transforms its input depends on the weights and bias of the node, which are modifiable. The network was trained with the training set of molecules using a back-propagation algorithm, followed by conjugate gradient descent in the second stage (4–7). The root mean square error of cross-validation (RMSECV) was then employed as the criterion for selecting the optimum values of the various parameters (6). Figure 2 shows the effect of the number of PCs on the predictability of the developed model. On the basis of this figure, the ANN has the highest degree of predictability when the number of PCs is 4.

Figure 2

Optimization of the number of principal components (PCs) to enter the ANN.

The network parameters to be optimized are the learning rate, the number of neurons in the hidden layer, and the momentum. As shown in Figures 3A-C, the optimal values of these parameters are 1.2, 9, and 0.9, respectively.
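As a rough illustration of the three-layer architecture and of back-propagation with momentum, the following NumPy sketch trains a network with 4 inputs, 9 hidden neurons, and a single linear output. It is a sketch under stated assumptions, not the software used in the study: the tanh hidden transfer function, the weight initialization, the default learning rate of 0.01, and the synthetic data are all assumptions, and the second-stage conjugate gradient step mentioned in the text is omitted. Only the hidden-layer size (9) and momentum (0.9) are taken from the text.

```python
import numpy as np

def train_mlp(X, y, n_hidden=9, lr=0.01, momentum=0.9, epochs=3000, seed=0):
    """Batch back-propagation with momentum for a one-hidden-layer network."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n = len(X)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, 1))
    b2 = np.array([y.mean()])  # assumption: start the output bias at the mean activity
    params = [W1, b1, W2, b2]
    velocity = [np.zeros_like(p) for p in params]
    for _ in range(epochs):
        # Forward pass: weighted sums, tanh hidden layer, linear output.
        H = np.tanh(X @ W1 + b1)
        err = (H @ W2 + b2) - y
        # Backward pass: gradients of 0.5 * mean squared error.
        gW2 = H.T @ err / n
        gb2 = err.mean(axis=0)
        dH = (err @ W2.T) * (1.0 - H**2)
        gW1 = X.T @ dH / n
        gb1 = dH.mean(axis=0)
        # Momentum update, applied in place to each parameter array.
        for p, v, g in zip(params, velocity, [gW1, gb1, gW2, gb2]):
            v *= momentum
            v -= lr * g
            p += v
    return params

def predict(params, X):
    W1, b1, W2, b2 = params
    return (np.tanh(np.asarray(X, dtype=float) @ W1 + b1) @ W2 + b2).ravel()

def rmse(y_obs, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_obs) - np.asarray(y_pred)) ** 2)))

# Synthetic stand-in for 29 compounds described by 4 PC scores.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(29, 4))
y_demo = 7.0 + 0.5 * X_demo[:, 0] - 0.3 * np.tanh(X_demo[:, 1])
rmse_before = rmse(y_demo, predict(train_mlp(X_demo, y_demo, epochs=0), X_demo))
rmse_after = rmse(y_demo, predict(train_mlp(X_demo, y_demo), X_demo))
print(rmse_before, rmse_after)
```

The reported optimum learning rate of 1.2 depends on the scaling conventions of the original software, so it is not used as the default here.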

Figure 3

Optimization of the number of neurons in the hidden layer (A), momentum (B), and learning rate (C).

The pIC50 values calculated by the ANN are plotted against the experimental values in Figure 4 and reported in Table 1. As expected, the calculated values are in good agreement with the experimental values.

Figure 4

Calculated vs. experimental activity of the investigated compounds in training and test sets…

Various statistical criteria for the ANN model were calculated and are reported in Table 3 (5–8). The external predictability of the proposed model was tested using the test set. The satisfactory prediction of the inhibitory activities of the test set compounds demonstrates the efficacy of the QSAR model in predicting the activities of external molecules.

Table 3

Various statistical parameters for the developed PC-ANN model

	R²^a	RMSE^b	PRESS^c	(R²−R₀²)/R²	(R²−R′₀²)/R²	k^d	k′^f	R_m²
Training set	0.906	0.189	0.752	−0.102	−0.102	1.001	0.997	0.630
Test Set	0.932	0.103	0.388	−0.072	−0.070	1.003	0.994	0.690

^a R² = squared correlation coefficient

^b RMSE = root mean square error

^c PRESS = predicted error sum of squares for the training set
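Several of the quantities in Table 3 can be computed from observed and predicted activities as in the sketch below. The definitions follow one common convention and are assumptions here, since the text does not spell them out: R² is taken as the squared Pearson correlation, RMSE divides by the number of compounds, and k and k′ are the slopes of the regression lines through the origin (observed on predicted and predicted on observed). The function name `fit_statistics` and the example values are hypothetical.

```python
import numpy as np

def fit_statistics(y_obs, y_pred):
    """R2, RMSE, PRESS and through-origin slopes k, k' (one common convention)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_obs - y_pred
    press = float(np.sum(resid**2))            # sum of squared prediction errors
    rmse = float(np.sqrt(press / len(y_obs)))  # assumption: divide by n, not n - 1
    r = np.corrcoef(y_obs, y_pred)[0, 1]       # Pearson correlation
    cross = np.sum(y_obs * y_pred)
    return {
        "R2": float(r**2),
        "RMSE": rmse,
        "PRESS": press,
        "k": float(cross / np.sum(y_pred**2)),        # slope of y_obs = k * y_pred
        "k_prime": float(cross / np.sum(y_obs**2)),   # slope of y_pred = k' * y_obs
    }

# Hypothetical values, chosen only to make the arithmetic easy to follow.
stats = fit_statistics([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(stats)
```

In the Golbraikh and Tropsha criteria, slopes close to 1 (commonly 0.85 to 1.15) are taken as one sign of an acceptable model.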


Discussion

In the developed model, a fully connected, three-layer, feed-forward ANN trained with a back-propagation learning algorithm was used. The inputs of the network were the eigenvalue-ranked PCs. The number of PCs entering the network was varied from 1 to 14, and 4 PCs were selected as the network input because this number gave the lowest RMSECV at the network output. Since there are no exact theoretical principles for choosing the appropriate network topology, the adjustable parameters, such as the number of nodes in the hidden layer, the transfer function, and the learning rate, were optimized before the network was trained. The values resulting from the hidden layer are transferred to the last layer, which contains a single neuron representing the predicted activity. A linear transfer function was chosen for the output layer. Various ANN architectures were run with the four selected PCs as input. In each run, the architecture and parameters were optimized to reach the lowest RMSECV, which was taken as the performance measure of the resulting models. According to the criteria proposed by Tropsha and Roy (4–6) for testing the reliability and robustness of QSAR models, the obtained model is very predictive (Table 3).

As a final point, one could ask what the developed model means to medicinal chemists. As discussed above, the calculated PCs have meaning physicochemically, but they may be employed for building statistical models that help the medicinal chemist limit the number of compounds to be synthesized. For instance, a medicinal chemist can propose a training set composed of molecules that have the characteristics of two or more chemical classes with the smallest amount of similarity. The model can then be used to predict the activities of the proposed molecules. Therefore, the QSAR model was used to estimate the inhibitory activities of a few suggested compounds. The general structures of the four suggested compounds and their calculated activities are reported in Table 4. The suggested compounds are combinations of the most potent compounds of Table 1. The relatively high predicted activities of these compounds suggest further study, such as the synthesis of other compounds with such chemical structures.
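The selection of the number of PCs by lowest RMSECV, as described in the Discussion, can be sketched as follows. To keep the example short, an ordinary least-squares model stands in for the ANN, which is a deliberate substitution and not the method used in the study. Leave-one-out is assumed as the cross-validation scheme, the PC scores are computed once rather than inside each fold, and the data and function names are hypothetical.

```python
import numpy as np

def loo_rmsecv(X, y):
    """Leave-one-out RMSECV for a least-squares model with intercept (ANN stand-in)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        # Fit on all compounds except i.
        A = np.column_stack([np.ones(keep.sum()), X[keep]])
        coef, *_ = np.linalg.lstsq(A, y[keep], rcond=None)
        # Predict the held-out compound.
        errors[i] = y[i] - np.concatenate([[1.0], X[i]]) @ coef
    return float(np.sqrt(np.mean(errors**2)))

def select_n_components(scores, y, max_components):
    """Return the number of leading PCs with the lowest RMSECV, plus all values."""
    rmsecv = {m: loo_rmsecv(scores[:, :m], y) for m in range(1, max_components + 1)}
    best = min(rmsecv, key=rmsecv.get)
    return best, rmsecv

# Synthetic scores: the activity depends on the first two columns only.
rng = np.random.default_rng(0)
scores = rng.normal(size=(23, 6))
y = 7.0 + 1.0 * scores[:, 0] + 0.8 * scores[:, 1] + rng.normal(scale=0.05, size=23)
best, rmsecv = select_n_components(scores, y, max_components=6)
print(best, rmsecv)
```

The same loop structure applies when the inner model is a neural network; only the fit and predict steps change.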

Table 4

Structures and details of the proposed molecules as novel CCR1 inhibitors.

Compound	R	Predicted pIC₅₀
S1		8.112
S2		8.082
S3		7.962
S4		8.004


CONCLUSION

The main objective of this study was to establish a QSAR model that predicts the bioactivity of a series of 3-amino-4-(2-(2-(4-benzylpiperazin-1-yl)-2-oxoethoxy)phenylamino)cyclobutenedione derivatives as novel CCR1 antagonists without any prior knowledge of the system under study. Various theoretically calculated molecular descriptors were used to calculate PCs. The calculated PCs were used to model the relationship between the molecular structures of the studied compounds and the corresponding bioactivities. The study showed that using the calculated PCs as input variables can improve the predictive ability of neural networks. Moreover, the suggested QSAR model was based on a nonlinear ANN approach, which can be employed to simulate any kind of complex correlation or functional relationship in a given multivariable system; that is, the ANN approach is more appropriate for modeling where no clearly defined mathematical model of a system is available. Bioactivity is one of the most important properties of a given compound. Therefore, an accurate, well-organized, and intelligent QSAR model of bioactivity will be influential in drug design and development.

Table 1b

The structures and biological activities of compounds 9–13

Compound	R₂	R₃	Observed pIC₅₀	Predicted pIC₅₀
9	H	H	6.036	6.012
10	H	Br	7.602	7.243
11^b	H	F	6.795	6.462
12	H	Me	7.045	7.001
13	F	F	6.853	7.192

Table 1c

The structures and biological activities of compounds 14–27

Compound	R₄	Observed pIC₅₀	Predicted pIC₅₀
14	Et	7.142	7.118
15	Pr	6.769	7.097
16	CH₂Ph	6.096	6.083
17	H	8.000	7.976
18		6.795	6.786
19		7.200	7.203
20^b		7.698	7.970
21		7.744	7.693
22		7.193	7.192
23		7.920	7.713
24		7.301	7.273
25^b		7.823	7.426
26		7.376	6.794
27^b		7.585	7.798

Table 1d

The structures and biological activities of compounds 28–29

Compound	R₅	Observed pIC₅₀	Predicted pIC₅₀
28	H	8.154	8.248
29	Me	7.119	7.285

^a pIC50 = −log(IC50)

^b Compounds selected as test set
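The pIC50 definition in the footnote above can be checked with a short conversion sketch. It assumes a base-10 logarithm and IC50 expressed in mol/L, which is the usual convention but is not stated in the table; the function names are hypothetical.

```python
import math

def pic50_from_ic50(ic50_molar: float) -> float:
    """Return pIC50 = -log10(IC50), with IC50 assumed to be in mol/L."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_molar)

def ic50_from_pic50(pic50: float) -> float:
    """Inverse conversion: IC50 in mol/L from pIC50."""
    return 10.0 ** (-pic50)

# Under these assumptions an IC50 of 10 nM (1e-8 mol/L) corresponds to pIC50 = 8.
print(pic50_from_ic50(1e-8))
```

Under the same assumptions, the observed pIC50 of 8.000 for compound 17 would correspond to an IC50 of 10 nM.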

Table 2

The results of principal component analysis on the total descriptors.

Component	Eigenvalues	% of Variance Explained	Cumulative%
1	470.394	39.495	39.495
2	138.563	11.634	51.130
3	127.783	10.729	61.859
4	79.828	6.702	68.561
5	61.826	5.191	73.752
6	42.604	3.577	77.330
7	36.975	3.104	80.434
8	34.61	2.906	83.340
9	30.600	2.569	85.910
10	23.659	1.986	87.896
11	18.673	1.567	89.464
12	17.002	1.427	90.892
13	14.264	1.197	92.089
14	13.344	1.120	93.210
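The cumulative column of Table 2 and the "% variance explained > 1" criterion can be re-derived from the per-component percentages with the short check below. The values are copied from the table; the variable names are illustrative, and small differences in the last digit come from rounding in the published figures.

```python
from itertools import accumulate

# % of variance explained by PC1..PC14, copied from Table 2.
percent_variance = [39.495, 11.634, 10.729, 6.702, 5.191, 3.577, 3.104,
                    2.906, 2.569, 1.986, 1.567, 1.427, 1.197, 1.120]

# Running total, to compare with the "Cumulative%" column.
cumulative = list(accumulate(percent_variance))

# Number of components passing the ">1% of variance" criterion used in the text.
n_significant = sum(1 for p in percent_variance if p > 1.0)

for pc, (p, c) in enumerate(zip(percent_variance, cumulative), start=1):
    print(f"PC{pc:>2}  {p:7.3f}  {c:7.3f}")
print("significant PCs:", n_significant)
```

The four PCs later chosen as network inputs account for about 68.6% of the total variance.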


1. Validated QSAR analysis of some diaryl substituted pyrazoles as CCR2 inhibitors by various linear and nonlinear multivariate chemometrics methods.

Authors: Elham Arkan; Mohsen Shahlaei; Alireza Pourhossein; Kambiz Fakhri; Afshin Fassihi
Journal: Eur J Med Chem Date: 2010-04-28 Impact factor: 6.514

2. QSAR analysis for some diaryl-substituted pyrazoles as CCR2 inhibitors by GA-stepwise MLR.

Authors: Lotfollah Saghaie; Mohsen Shahlaei; Afshin Fassihi; Armin Madadkar-Sobhani; Mohammad B Gholivand; Alireza Pourhossein
Journal: Chem Biol Drug Des Date: 2010-11-30 Impact factor: 2.817

3. Structure-activity relationships of novel, highly potent, selective, and orally active CCR1 antagonists.

Authors: Yun Feng Xie; Kirk Lake; Kathleen Ligsay; Mallareddy Komandla; Ila Sircar; Gobi Nagarajan; Jian Li; Kui Xu; Jason Parise; Lisa Schneider; Ding Huang; Juping Liu; Kevin Dines; Naoki Sakurai; Miguel Barbosa; Rick Jack
Journal: Bioorg Med Chem Lett Date: 2007-04-05 Impact factor: 2.823

4. Identification of novel series of human CCR1 antagonists.

Authors: Yun Feng Xie; Ila Sircar; Kirk Lake; Mallareddy Komandla; Kathleen Ligsay; Jian Li; Kui Xu; Jason Parise; Lisa Schneider; Dingqiu Huang; Juping Liu; Naoki Sakurai; Miguel Barbosa; Rick Jack
Journal: Bioorg Med Chem Lett Date: 2007-09-25 Impact factor: 2.823

5. Species selectivity of a small molecule antagonist for the CCR1 chemokine receptor.

Authors: M Liang; M Rosser; H P Ng; K May; J G Bauman; I Islam; A Ghannam; P J Kretschmer; H Pu; L Dunning; R M Snider; M M Morrissey; J Hesselgesser; H D Perez; R Horuk
Journal: Eur J Pharmacol Date: 2000-02-11 Impact factor: 4.432

6. QSAR study of anthranilic acid sulfonamides as inhibitors of methionine aminopeptidase-2 using LS-SVM and GRNN based on principal components.

Authors: Mohsen Shahlaei; Razieh Sabet; Maryam Bahman Ziari; Behzad Moeinifard; Afshin Fassihi; Reza Karbakhsh
Journal: Eur J Med Chem Date: 2010-07-16 Impact factor: 6.514
