
QSAR study of p56(lck) protein tyrosine kinase inhibitory activity of flavonoid derivatives using MLR and GA-PLS.

Afshin Fassihi1, Razieh Sabet1.   

Abstract

Quantitative relationships between molecular structure and p56(lck) protein tyrosine kinase inhibitory activity of 50 flavonoid derivatives were established by MLR and GA-PLS methods. The different QSAR models revealed that substituent electronic descriptor (SED) parameters have a significant impact on the protein tyrosine kinase inhibitory activity of the compounds. Of the two statistical methods employed, GA-PLS gave superior results. The resulting GA-PLS model had a high statistical quality (R(2) = 0.74 and Q(2) = 0.61) for predicting the activity of the inhibitors. The models proposed in the present work are more useful in describing the QSAR of flavonoid derivatives as p56(lck) protein tyrosine kinase inhibitors than those provided previously.


Keywords:  Chemometrics; Flavonoid; Protein tyrosine kinase; QSAR; SED analysis

Year:  2008        PMID: 19325836      PMCID: PMC2635749          DOI: 10.3390/ijms9091876

Source DB:  PubMed          Journal:  Int J Mol Sci        ISSN: 1422-0067            Impact factor:   6.208


1. Introduction

The quantitative structure-activity relationship (QSAR) field gives medicinal chemists the ability to predict drug activity from mathematical equations that relate chemical structure to biological activity [1, 2]. These equations take the form y = Xb + e, in which a set of predictor variables (X) is related to a predicted variable (y) by a regression vector (b), with e the residual error [3]. Following the early QSAR studies of Hansch, who showed a correlation between biological activity and the octanol-water partition coefficient [2], it is now assumed that the sum of substituent effects on the steric, electronic and hydrophobic interactions of compounds with their receptor determines their biological activity [4-6]. The first step in constructing a QSAR model is to find one or more molecular descriptors that represent the variation in a structural property of the molecules by a number [7]. Nowadays a wide range of descriptors is used in QSAR studies; following the Karelson approach, they can be classified into categories such as constitutional, geometrical, topological, quantum and chemical, among others [8]. Several variable selection and modeling methods are available, including multiple linear regression (MLR), genetic algorithms (GA) and principal component or factor analysis (PCA/FA). The mathematical relationships between molecular descriptors and activity are used to identify the parameters affecting biological activity and/or to estimate the property of other molecules.

It is now well established that protein tyrosine kinases (PTKs) provide a central switching mechanism in cellular signal transduction pathways by catalyzing the transfer of the γ-phosphate of either ATP or GTP to specific tyrosine residues in certain protein substrates [9, 10].
This regulatory control plays a crucial role in signal transduction pathways that regulate several cellular functions under both normal and deregulated conditions [11-14]. PTKs are the intracellular effectors for many growth hormone receptors. Activated PTKs were first identified as the products of dominant viral transforming genes (oncogenes), which provided the early hypothesis connecting protein tyrosine phosphorylation with cell transformation; enough evidence is now available to suggest that inappropriate or elevated expression of PTKs contributes to the transformed state of cells in many human malignancies [15-19]. P56lck is a lymphoid-specific protein tyrosine kinase that is principally expressed in T lymphocytes [20]. The association of p56lck with the cytoplasmic tail of various cell surface receptors, as well as with intracellular targets of phosphorylation, suggests that this tyrosine kinase plays a central role in coordinating early signal transduction events [21]. On this basis, substances that can modulate the activity of PTKs are potentially effective therapeutic agents. The key step in the kinase mechanism of all PTKs is the recognition and binding of a nucleoside triphosphate (usually ATP) and an appropriate tyrosyl-containing substrate to the enzyme. Direct transfer of phosphate between the two molecules is the next step [22]. A variety of compounds can inhibit PTKs competitively with respect to nucleotide binding. Among such competitive inhibitors are flavonoids, a group of low molecular weight plant natural products that constitute one of the largest classes of naturally occurring polyphenolic compounds [23, 24].
This group of plant natural products is largely responsible for the colors of many fruits and flowers, and over 4,000 flavonoid pigments have been characterized and classified according to their chemical structure. Chemically they are C6-C3-C6 compounds in which the two C6 groups are substituted benzene rings and the C3 group is an aliphatic chain that contains a pyran ring. Flavonoids occur as O- or C-glycosides or in the “free” state as aglycones, with hydroxyl or methoxyl groups present on the aglycone. The flavonoids may be divided into seven types: flavones, flavonols, flavanones, chalcones, xanthones, isoflavones and biflavones. Flavonoids have gained wide interest as potential pharmacological agents, since some of the best sources of flavonoids are foods: apples, blueberries, bilberries, onions, soy products and tea. Furthermore, numerous medicinal plants contain therapeutic amounts of flavonoids and are used to treat a wide variety of disorders [25].

Here, we consider the inhibitory activity of flavonoids against the protein tyrosine kinase p56lck. Several QSAR studies have been reported on this class of molecules using different descriptors and different modeling methods. Thakur et al. described a QSAR study on flavonoid inhibitors of p56lck protein tyrosine kinase using only hydration energy and hydrophobic parameters [26]. Nikolovska-Coleska et al. treated a set of 104 derivatives with a standard linear regression technique using classical/quantum descriptors [27]. The same dataset was treated by Novic et al. with a counter-propagation neural network, again using classical/quantum descriptors [28]. Oblak et al. applied a wide variety of descriptors calculated with the CODESSA software to the same dataset [29]. A quantum chemical/classical QSAR study on a set of 75 flavonoids and closely related compounds, tested as p56lck protein tyrosine kinase and AR inhibitors, was carried out by Stefanic et al., and the structure-activity relationships obtained for the two enzyme systems were compared [30]. A comprehensive ab initio study of the 3D structures of some flavonoids was reported by Meyer [31]. Deeb et al. calculated nodal orientation with the program NODANGLE [32].

In the present paper, a QSAR study is carried out for a series of 50 flavonoid analogues with the ability to inhibit protein tyrosine kinase [32]. In a comprehensive study of the PTK system we used a very large descriptor set (more than 600 topological, geometrical, constitutional, functional group, electrostatic, quantum and chemical descriptors) and different analyses, namely Hansch, Free-Wilson and substituent electronic descriptor (SED) analyses, in order to compare the predictive ability of descriptors from different descriptor groups. Multiple linear regression (MLR) and genetic algorithm partial least squares (GA-PLS) were applied as modeling methods.

2. Results and Discussion

The structural features and biological activity of the studied compounds are listed in Table 1. Calculated descriptors for each molecule are summarized in Table 2.
Table 1.

Chemical structure of flavonoid derivatives used in this study and their experimental and predicted activity for protein kinase inhibition.

Chemical structure of flavonoid derivatives.

Compound | R | Experimental pIC50 (a) | Predicted pIC50 | REP (b)
1 | 5,7-OH,4′-NH2 | 5.13 | 4.7707 | −0.0753
2 | 3,5,7,3′,4′-OH | 4.88 | 4.9431 | 0.0128
3 | 3,7,3′,4′-OH | 4.86 | 4.7707 | −0.0187
4 | 5,7,4′-OH | 4.83 | 4.4356 | −0.0889
5 | 5,4′-OH | 4.80 | 4.2603 | −0.1267
6 | 6,3′-OH | 4.80 | 4.4242 | −0.0849
7 | 6-OH,5,7,4′-NH2 | 4.74 | 4.1061 | −0.1544
8 | 5,7-OH | 4.71 | 4.0895 | −0.1518
9 | 4′-OH,3′,5′-OCH3 | 4.57 | 4.2687 | −0.0706
10 | 5,7,3′,4′-OH | 4.46 | 4.4172 | −0.0097
11 | 7,3′-OH | 4.41 | 4.4358 | 0.0058
12 | 6-OH,5,7,3′-NH2 | 4.34 | 4.3681 | 0.0064
13 | 6-OMe,8,3′-NH2 | 4.25 | 4.1649 | −0.0204
14 | 6-OH,3′,4′,5′-OCH3 | 4.22 | 4.3591 | 0.0319
15 | 3,5,7,4′-OH,3′,5′-OCH3 | 4.16 | 4.1649 | 0.0012
16 | 3,5,7,3′,5′-OH | 4.00 | 3.9947 | −0.0013
17 | 6,4′-NH2 | 3.99 | 3.9613 | −0.0072
18 | 6,8,4′-NH2 | 3.97 | 3.9764 | 0.0016
19 | 6-OH,8,4′-NH2 | 3.93 | 3.9446 | 0.0037
20 | 6,4′-OH | 3.93 | 3.9247 | −0.0013
21 | 7,8,4′-OH,3′,5′-OCH3 | 3.92 | 3.8990 | −0.0054
22 | 8,4′-NH2 | 3.91 | 3.8994 | −0.0027
23 | 6,4′-OH,3′,5′-OCH3 | 3.89 | 3.9133 | 0.0060
24 | 7-OH,4′-NH2 | 3.86 | 3.8815 | 0.0056
25 | 7-OH,6,4′-NH2 | 3.85 | 3.8296 | −0.0053
26 | 7,4′-OH | 3.78 | 3.8621 | 0.0213
27 | 7,8,3′-OH | 3.75 | 3.6903 | −0.0162
28 | 6,3′-NH2 | 3.70 | 4.0228 | 0.0803
29 | 4′-NH2 | 3.68 | 4.1850 | 0.1207
30 | 5-OH,6,4′-NH2 | 3.65 | 3.9325 | 0.0718
31 | 3,5,7-OH | 3.53 | 3.9794 | 0.1129
32 | 5,4′-OH,7-OCH3 | 3.55 | 3.7315 | 0.0487
33 | 5,3′-OH | 3.50 | 4.1209 | 0.1507
34 | 7,8-OH | 3.50 | 3.4873 | −0.0036
35 | 5-OH,8,4′-NH2 | 3.49 | 3.6705 | 0.0492
36 | 7-OH,8,4′-NH2 | 3.48 | 3.6694 | 0.0516
37 | 7-OH | 3.47 | 3.8567 | 0.1003
38 | 6-OCH3,8,4′-NH2 | 3.43 | 3.6709 | 0.0683
39 | 7,8-OH,3′,4′,5′-OCH3 | 3.40 | 4.0058 | 0.1512
40 | 3-COOCH3,4′-OH | 3.36 | 3.7081 | 0.0939
41 | 4′-OH | 3.30 | 3.7081 | 0.1101
42 | 7-OH,6,3′-NH2 | 3.30 | 3.3419 | 0.0125
43 | 7-OH,6,8,4′-NH2 | 3.12 | 3.3419 | 0.0664
44 | 3-COOCH3,4′-NH2 | 3.09 | 3.3419 | 0.0754
45 | 3-COOH,7-OCH3,4′-OH | 2.99 | 3.3262 | 0.1011
46 | 7,4′-OH,3′,5′-OCH3 | 2.90 | 3.3262 | 0.1281
47 | 7-OH,6,8,4′-NO2 | 2.81 | 3.0674 | 0.0839
48 | 3-COOH,4′-OH | 2.80 | 3.0674 | 0.0872
49 | 5-OCH3,8,4′-NH2 | 2.79 | 3.0674 | 0.0904
50 | 7-OH,8,4′-NO2 | 2.73 | 3.3262 | 0.1793

(a) pIC50 = −log (IC50).

(b) REP = relative error of prediction.
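The two footnote quantities can be illustrated with a short sketch. The text does not give the REP formula; the definition used below, (predicted − experimental) / predicted, is an assumption that appears consistent with the tabulated values for several compounds. The function names and the IC50 unit (mol/L) are also illustrative assumptions.

```python
import math

def pic50(ic50_molar):
    """pIC50 = -log10(IC50); IC50 in mol/L is an assumption of this sketch."""
    return -math.log10(ic50_molar)

def rep(experimental, predicted):
    """Relative error of prediction, ASSUMED to be (predicted - experimental) / predicted."""
    return (predicted - experimental) / predicted

# (compound, experimental pIC50, predicted pIC50) copied from Table 1
rows = [(1, 5.13, 4.7707), (2, 4.88, 4.9431), (5, 4.80, 4.2603)]
for compound, exp_value, pred_value in rows:
    print(compound, round(rep(exp_value, pred_value), 4))
```

Comparing the printed values with the REP column is a quick way to check whether the assumed definition matches a given row.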

Table 2.

Brief description of some descriptors used in this study.

Descriptor typeMolecular Description
ConstitutionalMolecular weight, no. of atoms, no. of non-H atoms, no. of bonds, no. of heteroatoms, no. of multiple bonds (nBM), no. of aromatic bonds, no. of functional groups (hydroxyl, amine, aldehyde, carbonyl, nitro, nitroso, etc.), no. of rings, no. of circuits, no of H-bond donors, no of H-bond acceptors, no. of Nitrogen atoms (nN), chemical composition, sum of Kier-Hall electrotopological states (Ss), mean atomic polarizability (Mp), number of rotable bonds (RBN), mean atomic Sanderson electronegativity (Me), etc.
TopologicalMolecular size index, molecular connectivity indices (X1A, X4A, X2v, X1Av, X2Av, X3Av, X4Av), information content index (IC), Kier Shape indices, total walk count, path/walk-Randic shape indices (PW3, PW4, Zagreb indices, Schultz indices, Balaban J index (such as MSD) Wiener indices, topological charge indices, Sum of topological distances between F..F (T(F..F)), Ratio of multiple path count to path counts (PCR), Mean information content vertex degree magnitude (IVDM), Eigenvalue sum of Z weighted distance matrix (SEigZ), reciprocal hyper-detour index (Rww), Eigenvalue coefficient sum from adjacency matrix (VEA1), radial centric information index, 2D petijean shape index (PJI2), etc.
Geometrical3D petijean shape index (PJI3), Gravitational index, Balaban index, Wiener index, etc.
QuantumHighest occupied Molecular Orbital Energy (HOMO) , Lowest Unoccupied Molecular Orbital Energy (LUMO), Most positive charge (MPC), Least negative charge (LNC), Sum of squares of charges (SSC), Sum of square of positive charges (SSPC), Sum of square of negative charges (SSNC), Sum of positive charges (SUMPC), Sum of negative charges (SUMNC), Sum of absolute of charges (SAC), Total dipole moment (DMt), Molecular dipole moment at X-direction (DMX), Molecular dipole moment at Y-direction (DMY), Molecular dipole moment at Z-direction (DMZ), Electronegativity (χ= −0.5 (HOMO-LUMO)), Electrophilicity (ω= χ2/2 η) ,Hardness (η = 0.5 (HOMO+LUMO)), Softness (S=1/η).
Functional groupNumber of total tertiary carbons (nCt), Number of H-bond acceptor atoms (nHAcc), number of total hydroxyl groups (nOH), number of unsubstituted aromatic C(nCaH), number of ethers (aromatic) (nRORPh), etc.
ChemicalLogP (Octanol-water partition coefficient), Hydration Energy (HE), Polarizability (Pol), Molar refractivity (MR), Molecular volume (V), Molecular surface area (SA).
Substituent electronic descriptorsRMSQ (Root mean square error of charges), SPQ ( Sum of positive charges), SNQ ( Sum of negative charges), RMSDM (Root mean square of dipole moments at any Cartesian coordinate direction), TDM (Total dipole moment), FRMS (Root mean square force that any atom in constituent molecule see right before the optimization), FMAX (Maximum force on molecule), HOMO (Highest occupied molecular orbital), LUMO (Lowest unoccupied molecular orbital), HD (Hardness), SOF (Softness), EPH (Electrophilicity), EN (Electronegativity).

2.1. MLR analysis

In the first step, separate stepwise-selection MLR analyses were performed for each type of descriptor, and then an MLR equation was obtained from the pool of all calculated descriptors. The results are summarized in Table 3. The correlation coefficient (r2) matrix for the descriptors used in the different MLR equations is shown in Table 4. Collinear descriptors degrade the performance of MLR equations, and models built on them have lower predictive ability.
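A collinearity check of the kind summarized in Table 4 can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' procedure: the function name `collinear_pairs`, the 0.9 threshold and the descriptor names are all assumptions.

```python
import numpy as np

def collinear_pairs(X, names, threshold=0.9):
    """Return (name_i, name_j, r) for descriptor pairs with |r| >= threshold."""
    r = np.corrcoef(X, rowvar=False)          # descriptors are columns
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(r[i, j])))
    return pairs

# Synthetic descriptors for 50 hypothetical compounds
rng = np.random.default_rng(0)
d1 = rng.normal(size=50)
d2 = d1 + 0.05 * rng.normal(size=50)   # nearly a copy of d1, so collinear
d3 = rng.normal(size=50)               # independent of d1 and d2
X = np.column_stack([d1, d2, d3])

flagged = collinear_pairs(X, ["d1", "d2", "d3"])
print(flagged)
```

In practice one descriptor of each flagged pair would be dropped before fitting an MLR equation; which one to keep is a modeling choice not specified here.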
Table 3.

The results of MLR analysis with different types of descriptors.

No. | Descriptor source | MLR equation | N | R2 | SE | RMSCV | Q2 | F
E1 | Chemical | pIC50 = 4.893 (± 0.735) − 0.056 (± 0.017) HE − 0.007 (± 0.003) Mass | 50 | 0.40 | 0.55 | 0.58 | 0.32 | 13.82
E2 | Quantum | pIC50 = 6.362 (± 0.565) − 6.805 (± 1.505) MPC | 50 | 0.43 | 0.53 | 0.54 | 0.38 | 17.44
E3 | Constitutional | pIC50 = 3.139 (± 1.250) − 0.438 (± 0.100) nBM − 0.506 (± 0.205) AMW − 0.584 (± 0.266) nAB | 50 | 0.49 | 0.49 | 0.51 | 0.42 | 19.65
E4 | Topological | pIC50 = 17.242 (± 0.605) − 3.374 (± 0.545) IVDM − 53.95 (± 12.355) X1Av + 2.349 (± 0.696) ICR + 24.874 (± 9.569) PW4 + 73.575 (± 33.719) X4A | 50 | 0.72 | 0.38 | 0.48 | 0.58 | 30.13
E5 | Geometrical | pIC50 = −15.093 (± 3.339) + 19.450 (± 3.406) SPH − 0.010 (± 0.002) G(N...O) | 50 | 0.60 | 0.43 | 0.47 | 0.49 | 17.23
E6 | Functional group | pIC50 = 3.672 (± 0.123) − 0.414 (± 0.130) nNO2 − 1.098 (± 0.369) nOHt + 0.160 (± 0.058) nOH | 50 | 0.53 | 0.45 | 0.50 | 0.45 | 12.67
E7 | Hansch | pIC50 = 4.219 (± 0.289) − 0.615 (± 0.202) π5 + 1.462 (± 0.555) ℑR′3 − 1.379 (± 0.490) ℑR8 − 0.249 (± 0.111) L3 | 50 | 0.53 | 0.45 | 0.50 | 0.45 | 12.67
E8 | SED | pIC50 = −0.708 (± 1.228) − 9.570 (± 2.500) HOMOA3 + 1.092 (± 0.308) SNQ8 | 50 | 0.82 | 0.32 | 0.30 | 0.61 | 51.43
E9 | Molecular descriptor | pIC50 = −19.763 (± 4.304) − 4.785 (± 1.275) MPC + 25.113 (± 4.142) SPH + 0.849 (± 0.264) SNQ8 − 0.357 (± 0.136) L3 | 50 | 0.83 | 0.31 | 0.28 | 0.62 | 52.43
Table 4.

Correlation coefficient (r2) matrix for the descriptors of flavone derivatives used in the MLR equation.

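Descriptor | HE | Mass | MPC | nBM | AMW | nAB | ASP | G(N...O) | X1AV | ICR | PW4 | X4A | IVDM | nNO2 | nOHt | nOH | ℑR′3 | L3 | ℑR8 | π5 | pIC50
HE | 1 | −0.234 | 0.192 | 0.124 | −0.327 | 0.236 | −0.006 | 0.00 | 0.651 | 0.075 | −0.012 | 0.316 | 0.065 | 0.069 | 0.047 | −0.745 | −0.394 | 0.067 | −0.005 | 0.485 | −0.347
Mass | | 1 | 0.531 | 0.580 | 0.512 | 0.136 | −0.269 | 0.328 | −0.655 | 0.416 | 0.541 | −0.631 | 0.816 | 0.554 | 0.099 | 0.211 | 0.326 | 0.196 | 0.487 | 0.040 | −0.268
MPC | | | 1 | 0.953 | 0.715 | 0.366 | −0.233 | 0.623 | −0.539 | 0.304 | 0.050 | −0.329 | 0.904 | 0.876 | 0.259 | −0.227 | −0.286 | 0.289 | 0.595 | 0.156 | −0.547
nBM | | | | 1 | 0.778 | 0.165 | −0.094 | 0.725 | −0.624 | 0.390 | 0.016 | −0.325 | 0.937 | 0.972 | 0.114 | −0.196 | −0.211 | 0.125 | 0.687 | 0.193 | −0.498
AMW | | | | | 1 | 0.050 | −0.200 | 0.356 | 0.897 | 0.037 | 0.116 | −0.206 | 0.718 | 0.775 | 0.116 | 0.434 | 0.136 | 0.125 | 0.620 | 0.065 | −0.191
nAB | | | | | | 1 | −0.684 | −0.127 | 0.069 | −0.192 | 0.257 | −0.397 | 0.235 | −0.073 | 0.692 | −0.086 | −0.198 | 0.930 | −0.108 | 0.185 | −0.364
ASP | | | | | | | 1 | 0.294 | 0.155 | 0.538 | 0.532 | 0.388 | −0.221 | 0.069 | 0.369 | −0.273 | −0.201 | −0.768 | −0.039 | −0.098 | 0.269
G(N...O) | | | | | | | | 1 | −0.379 | 0.578 | 0.299 | 0.348 | 0.618 | 0.763 | −0.138 | −0.478 | −0.437 | −0.182 | 0.508 | 0.034 | −0.329
X1AV | | | | | | | | | 1 | −0.130 | −0.171 | 0.413 | −0.651 | −0.647 | −0.052 | −0.572 | −0.270 | −0.056 | −0.542 | 0.229 | 0.058
ICR | | | | | | | | | | 1 | −0.212 | −0.277 | 0.442 | 0.441 | −0.104 | −0.410 | −0.161 | −0.278 | 0.153 | 0.168 | −0.080
PW4 | | | | | | | | | | | 1 | −0.157 | 0.261 | −0.045 | 0.158 | 0.336 | 0.413 | 0.356 | 0.029 | −0.249 | 0.002
X4A | | | | | | | | | | | | 1 | −0.489 | −0.233 | −0.252 | −0.046 | −0.025 | −0.466 | −0.261 | −0.157 | 0.347
IVDM | | | | | | | | | | | | | 1 | 0.891 | 0.155 | −0.100 | −0.030 | 0.218 | 0.663 | 0.192 | −0.494
nNO2 | | | | | | | | | | | | | | 1 | −0.050 | −0.177 | −0.166 | −0.097 | 0.720 | 0.151 | −0.416
nOHt | | | | | | | | | | | | | | | 1 | 0.061 | −0.137 | 0.513 | −0.075 | 0.128 | −0.306
nOH | | | | | | | | | | | | | | | | 1 | 0.621 | 0.104 | −0.004 | −0.375 | 0.370
ℑR′3 | | | | | | | | | | | | | | | | | 1 | −0.070 | 0.008 | −0.014 | 0.315
L3 | | | | | | | | | | | | | | | | | | 1 | −0.143 | 0.085 | −0.259
ℑR8 | | | | | | | | | | | | | | | | | | | 1 | 0.224 | −0.367
π5 | | | | | | | | | | | | | | | | | | | | 1 | −0.451
pIC50 | | | | | | | | | | | | | | | | | | | | | 1
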
Table 3 lists the QSAR equations derived for the studied compounds using different sets of molecular descriptors. The first equation (E1) was obtained from the chemical descriptors. It shows a negative effect of hydration energy (HE) and molecular weight (Mass) on protein tyrosine kinase inhibitory activity. Equation E2 shows that, among the quantum descriptors, the most positive charge (MPC) has a negative effect on inhibitory activity, which points to Coulombic interactions between the ligands and the receptor. The negative sign of the MPC coefficient indicates that ligands with the lowest MPC interact with the receptor most efficiently. This suggests that there is probably a negative region in the receptor that produces Coulombic interactions with the ligand. Equation E3 shows the effect of the constitutional descriptors: average molecular weight (AMW), number of multiple bonds (nBM) and number of aromatic bonds (nAB) all have negative effects on inhibitory activity. Molecules with lower AMW show better inhibitory activity, and decreasing the number of multiple bonds enhances activity. The MLR equation obtained from the pool of topological descriptors (E4) shows a positive effect of mean information content on the distance equality (ICR), path/walk 4-Randic shape index (PW4) and average connectivity index chi-4 (X4A), and a negative effect of mean information content vertex degree magnitude (IVDM) and average valence connectivity index chi-1 (X1Av) on inhibitory activity. This equation describes the structure-activity relationship better than those obtained from the chemical, quantum and constitutional descriptors.
The effect of geometrical parameters on the protein tyrosine kinase inhibitory activity of the studied compounds is described by E5 of Table 3. It shows a positive effect of spherosity (SPH) and a negative effect of the sum of geometrical distances between N...O, i.e. G(N...O), on inhibitory activity. The effect of functional groups is described by equation E6 of Table 3. This three-parameter equation does not have a high statistical quality, which suggests that the inhibitory activity of the studied molecules does not depend strongly on the type of functional group itself, but rather on the structural changes induced by variations in the functional groups. The negative signs of nNO2 and nOHt indicate that molecules with fewer nitro groups (aliphatic) and tertiary alcohols (aliphatic) bind to the protein kinase more strongly. On the other hand, the number of hydroxyl groups (nOH) has a direct (positive) effect on the inhibitory activity of the compounds. The Hansch equation (E7) shows the importance of steric, electronic and lipophilic factors for inhibitory activity. These factors are described by L3 (length parameter of the C3 substituent), ℑR′3 and ℑR8 (Swain and Lupton field parameters of the C-R′3 and C-R8 substituents) and π5 (lipophilic parameter of the C5 substituent), respectively. The negative coefficient of π5 indicates that lipophilic substituents at R5 are not favorable for binding affinity. The equation also shows a positive effect of ℑR′3 and a negative effect of ℑR8 on the inhibitory activity of the compounds. In addition, the negative effect of L3 indicates that bulky groups at C3 decrease activity, because they hinder strong interaction between the ligands and the enzyme. The SED equation (E8) shows the importance of SED factors for protein tyrosine kinase inhibitory activity.
One of its parameters is the molecular orbital energy HOMOA3 (highest occupied molecular orbital parameter of the C3 substituent) and the other is SNQ8 (sum of negative charges parameter of the C8 substituent). According to the coefficients in Table 3, HOMOA3 has a negative effect and SNQ8 a positive effect on protein tyrosine kinase inhibitory activity. The last equation (E9) was obtained from all types of calculated descriptors. Stepwise selection and elimination of variables produced a four-parameter QSAR equation. This equation shows that geometrical (SPH), quantum (MPC), Hansch (L3) and SED (SNQ8) parameters are the major factors affecting the protein tyrosine kinase inhibitory activity of the compounds. Among these descriptors, MPC and L3 have negative effects and the others have positive effects on inhibitory activity.

2.2. Free-Wilson analysis

A simple Free-Wilson analysis (FWA) was used to indicate which substituents on ring B and the chromone moiety contribute to protein tyrosine kinase inhibitory activity and which detract from it [33]. As indicated in Table 1, the molecules used in this study have a phenyl ring (ring B) and a chromone moiety carrying different types of substituents in different positions. Three important substituents, methoxyl, hydroxyl and amine, were used in the calculations. The descriptor data matrix built for the FWA therefore has 44 rows (the number of molecules selected for FWA) and 24 columns (three substituents at eight substitution positions on the flavonoid structure). The elements of the descriptor data matrix are 1 or 0, indicating the presence or absence, respectively, of a given substituent in a specified position of a molecule. A two-parameter equation, Equation (1), was found between the activity data (y) and the Free-Wilson descriptor data matrix. Equation (1) indicates that the protein tyrosine kinase inhibitory activity of the studied compounds is directly affected by the presence of an electron-donating hydroxyl group in the meta position (R′3) of the phenyl ring, and most probably this part of the flavonoid molecule interacts with the catalytic domain of the enzyme. The same result was obtained by other researchers [27]. According to this equation, a methoxyl group on C-R5 detracts from the inhibitory activity.
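The 0/1 encoding described above can be sketched as follows. The eight position labels are read off the substitution patterns in Table 1 and the column ordering is arbitrary; both are assumptions of this sketch, as is the helper name `free_wilson_row`.

```python
import numpy as np

# Eight substitution positions (assumed from Table 1) and three substituent types
POSITIONS = ["3", "5", "6", "7", "8", "3'", "4'", "5'"]
GROUPS = ["OH", "OCH3", "NH2"]
COLUMNS = [(p, g) for p in POSITIONS for g in GROUPS]   # 8 x 3 = 24 indicator columns

def free_wilson_row(substituents):
    """Encode one molecule; `substituents` maps position -> group, e.g. {"5": "OH"}."""
    return np.array([1 if substituents.get(p) == g else 0 for p, g in COLUMNS])

# Compound 1 of Table 1: 5,7-OH, 4'-NH2
row = free_wilson_row({"5": "OH", "7": "OH", "4'": "NH2"})
print(len(COLUMNS), int(row.sum()))
```

Stacking one such row per molecule gives the indicator matrix, which can then be regressed against the activities to obtain a contribution for each substituent-position pair.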

2.3. GA-PLS analysis

In PLS analysis, the descriptor data matrix is decomposed into orthogonal matrices with an inner relationship between the dependent and independent variables. Therefore, unlike MLR analysis, PLS analysis avoids the multicollinearity problem among the descriptors. Because a minimal number of latent variables is used for modeling, PLS copes with noisy data better than MLR. A genetic algorithm was used to find the most suitable set of descriptors for PLS modeling; to this end, many GA-PLS runs were conducted using different initial populations. The data set (n = 50) was divided into two groups: a calibration set (n = 40) and a prediction set (n = 10). For the 40 calibration samples, the leave-one-out cross-validation procedure was used to find the optimum number of latent variables for each PLS model. The GA-PLS model with the best fitness contained 14 indices, four of them being those obtained by MLR. The PLS estimates of the coefficients for these descriptors are given in Figure 1. As can be seen, a combination of quantum, topological, geometrical and Hansch descriptors was selected by GA-PLS to account for the protein tyrosine kinase inhibitory activity of the flavonoid derivatives. The majority of these descriptors are topological indices. The resulting GA-PLS model possessed a high statistical quality (R2 = 0.74 and Q2 = 0.61). The predictive ability of the model was measured by applying it to the 10 molecules of the external test set. The squared correlation coefficient for prediction was 0.82 and the standard error of prediction was 0.30. The pIC50 values predicted by the GA-PLS model (from cross-validation or from the external prediction set), along with the corresponding relative errors of prediction (REP), are shown in Table 1. The small relative errors (within ± 0.40) support the accuracy of the proposed GA-PLS model for modeling the protein tyrosine kinase inhibitory activity of the studied flavonoid derivatives.
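The PLS step with leave-one-out selection of the number of latent variables can be sketched in plain NumPy. This is an illustrative single-response (PLS1) implementation on synthetic data, not the MATLAB toolbox routine used in the study. Choosing the component count by minimum PRESS is a simplification assumed here, and the genetic algorithm that selects descriptors is not shown. All function and variable names are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1 with NIPALS-style deflation; returns (x_mean, y_mean, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)        # weight vector
        t = Xk @ w                    # scores (latent variable)
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        qa = (yk @ t) / tt            # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - qa * t              # deflate y
        W.append(w); P.append(p); q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector in descriptor space
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean

def loo_press(X, y, n_components):
    """Leave-one-out prediction error sum of squares (PRESS)."""
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = pls1_fit(X[keep], y[keep], n_components)
        press += (y[i] - pls1_predict(model, X[i:i + 1])[0]) ** 2
    return press

# Synthetic calibration set: 40 samples, 6 descriptors
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
y = X @ np.array([0.8, -0.5, 0.3, 0.0, 0.0, 0.2]) + 0.1 * rng.normal(size=40)

press = [loo_press(X, y, a) for a in range(1, 7)]
best = int(np.argmin(press)) + 1
print("latent variables chosen:", best)
```

With as many components as descriptors, this PLS1 model reproduces the ordinary least-squares fit; using fewer components is what gives PLS its tolerance to collinear and noisy descriptors.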
Figure 1.

PLS regression coefficients for the variables used in GA-PLS model.

Comparison of the results obtained by the GA-PLS and MLR methods indicates a higher accuracy of the GA-PLS method in describing the inhibitory activity of flavonoid derivatives toward the protein tyrosine kinase enzyme. The difference in accuracy between the two regression methods used in this study is visualized in Figure 2 by plotting the predicted activity (by cross-validation) against the experimental values. Both linear models show the data scattered around a straight line with a slope close to one. The plot obtained with GA-PLS shows the lowest scatter, and the plot obtained by MLR analysis (from E9) ranks second in accuracy.
Figure 2.

Plots of the cross-validated predicted activity against the experimental activity for the QSAR models obtained by the MLR and GA-PLS methods.

To measure the significance of the 14 selected PLS descriptors for the protein tyrosine kinase inhibitory activity, the VIP was calculated for each descriptor [34]. The VIP analysis of the PLS equation is shown in Figure 3. It shows that HNar and TI2, which are topological indices, and SPH, which is a geometrical parameter, are the most important indices in the QSAR equation derived by PLS analysis. In addition, the quantum parameter HOMO and the Hansch parameter ℑR′3 were found to be moderately influential.
Figure 3.

Plot of variable importance in the projection (VIP) for the descriptors used in the GA-PLS model.

3. Methodology

3.1. Software

The two-dimensional structures of the molecules were drawn using Hyperchem 7.0 software. The final geometries were obtained with the semi-empirical AM1 method in the Hyperchem program. The molecular structures were optimized using the Polak-Ribiere algorithm until the root mean square gradient was 0.01 kcal mol−1. The resulting geometry was transferred into the Dragon program package, which was developed by the Milano Chemometrics and QSAR Group [35]. The z-matrix of the structures was provided by the software and transferred to the Gaussian 98 program. Complete geometry optimization was performed taking the most extended conformation as the starting geometry. Semi-empirical molecular orbital calculations (AM1) of the structures were performed using the Gaussian 98 program [36].

3.2. Activity data & descriptor generation

The biological data used in this study are the protein tyrosine kinase inhibitory activities, −log (IC50), of a set of 50 flavonoid analogues [32]. The structural features and biological activities of these compounds are listed in Table 1; the activities were used as the dependent variable in the subsequent QSAR analysis. A large number of molecular descriptors was calculated using Hyperchem, the Dragon package and Gaussian 98. Some chemical parameters, including molecular volume (V), molecular surface area (SA), hydrophobicity (Log P), hydration energy (HE) and molecular polarizability (MP), were calculated using the Hyperchem software. The Dragon software calculated different functional group, topological, geometrical and constitutional descriptors for each molecule. Gaussian 98 was employed to calculate different quantum chemical descriptors, including dipole moment (DM), local charges, and HOMO and LUMO energies. Hardness (η), softness (S), electronegativity (χ) and electrophilicity (ω) were calculated according to the method proposed by Thanikaivelan et al. [37]. Classical substituent constants, including the hydrophobic constant (π), the Hammett electronic constants (σ), the Taft field effect (FI), resonance (R) substituent constants and steric constants (molar refractivity MR and STERIMOL), were also used as descriptors in this study [38]. The calculated descriptors for each molecule are summarized in Table 2.
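The reactivity descriptors derived from the frontier orbital energies can be sketched as below. The formulas are the commonly used finite-difference (Koopmans-type) definitions, taken here as an assumption about what was computed. The softness convention S = 1/η follows Table 2, although some authors use 1/(2η). The orbital energies in the example are illustrative numbers, not values from the study.

```python
def reactivity_descriptors(e_homo, e_lumo):
    """Global reactivity descriptors from HOMO/LUMO energies (same energy unit for both)."""
    chi = -0.5 * (e_homo + e_lumo)      # electronegativity
    eta = 0.5 * (e_lumo - e_homo)       # hardness (half the HOMO-LUMO gap)
    softness = 1.0 / eta                # softness, convention S = 1/eta
    omega = chi ** 2 / (2.0 * eta)      # electrophilicity index
    return {"chi": chi, "eta": eta, "S": softness, "omega": omega}

# Illustrative orbital energies in eV (hypothetical values)
d = reactivity_descriptors(e_homo=-9.0, e_lumo=-1.0)
print(d)
```

For a bound molecule the HOMO lies below the LUMO, so the hardness defined this way is always positive.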

3.3. Data screening & model building

The selected descriptors from each class and the experimental data were analyzed by stepwise regression in SPSS (version 12.0). The calculated descriptors were collected in a data matrix whose rows and columns correspond to the molecules and the descriptors, respectively. Multiple linear regression (MLR) and partial least squares (PLS) were used to derive the QSAR equations, and feature selection was performed with a genetic algorithm (GA). The resulting models were validated by the leave-one-out cross-validation procedure (using MATLAB software) to check their predictability and robustness. However, this procedure did not produce good results, and therefore we used a genetic algorithm (GA-PLS) to select the best variables. PLS allows the construction of larger QSAR equations while still avoiding over-fitting and eliminating most variables. PLS is normally used in combination with cross-validation to obtain the optimum number of components [39, 40]. The PLS regression method used in this study was the NIPALS-based algorithm available in the chemometrics toolbox of MATLAB (version 7.1, MathWorks Inc.). The leave-one-out cross-validation procedure was used to obtain the optimum number of factors based on the Haaland and Thomas F-ratio criterion [41].
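The leave-one-out statistics reported for the MLR models (Q2 and RMSCV in Table 3) can be sketched as follows. The formulas used here are the usual textbook ones, Q2 = 1 − PRESS/SS_total and RMSCV = sqrt(PRESS/n), and are an assumption, since the text does not state them. The data are synthetic and all names are illustrative.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]

def r_squared(X, y):
    residuals = y - predict_mlr(fit_mlr(X, y), X)
    return 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

def loo_statistics(X, y):
    """Leave-one-out Q2 and RMSCV (assumed definitions, see lead-in)."""
    preds = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        preds[i] = predict_mlr(fit_mlr(X[keep], y[keep]), X[i:i + 1])[0]
    press = np.sum((y - preds) ** 2)
    q2 = 1.0 - press / np.sum((y - y.mean()) ** 2)
    rmscv = np.sqrt(press / len(y))
    return q2, rmscv

# Synthetic data set: 50 compounds, 3 descriptors
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = 4.0 + X @ np.array([0.5, -0.3, 0.2]) + 0.1 * rng.normal(size=50)

r2 = r_squared(X, y)
q2, rmscv = loo_statistics(X, y)
print(round(r2, 3), round(q2, 3), round(rmscv, 3))
```

For least-squares models, each left-out residual is at least as large in magnitude as the corresponding fitted residual, so Q2 defined this way never exceeds R2; a large gap between the two is a common sign of over-fitting.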

3.4. Variable importance in the projection (VIP)

In order to investigate the relative importance of the variables that appear in the final model obtained by the GA-PLS method, the variable importance in the projection (VIP) was employed [34]. VIP values reflect the importance of terms in the PLS model. According to Eriksson et al., X-variables (predictor variables) can be classified according to their relevance in explaining y (the predicted variable): VIP > 1.0 means highly influential, VIP < 0.8 means less influential, and 0.8 < VIP < 1.0 means moderately influential [8].
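For reference, a commonly used expression for the VIP of descriptor j in a single-response PLS model is given below. The notation is an assumption of this note rather than the paper's: p is the number of descriptors, A the number of latent variables, w_a the weight vector, t_a the score vector and q_a the y-loading of component a.

```latex
\mathrm{VIP}_j = \sqrt{\, p \,
  \frac{\sum_{a=1}^{A} \mathrm{SS}_a \left( w_{ja} / \lVert \mathbf{w}_a \rVert \right)^{2}}
       {\sum_{a=1}^{A} \mathrm{SS}_a} \,},
\qquad
\mathrm{SS}_a = q_a^{2}\, \mathbf{t}_a^{\top} \mathbf{t}_a ,
\qquad
\sum_{j=1}^{p} \mathrm{VIP}_j^{2} = p .
```

Because the squared VIP values sum to p, their mean is 1, which is why a VIP of about 1.0 serves as the natural dividing line between more and less influential descriptors.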

3.5. Substituent electronic descriptors (SED)

Electronic descriptors obtained from quantum chemical calculations have become very popular, and selecting a quantum chemical method (i.e., semi-empirical or ab initio) involves a trade-off between computational complexity and accuracy [42]. To simplify the quantum chemical calculations, Hemmateenejad et al. recently hypothesized that the calculations could be performed on the substituents instead of the whole molecular structures, and that the resulting electronic features could be used as electronic descriptors, which have found major popularity in QSAR/QSPR studies [43,44]. Hemmateenejad et al. proposed substituent electronic descriptors (SED) as an alternative to both substituent constants and molecular descriptors [43]. SED analysis was applied to each substituent in our study, and the calculated descriptors are listed in Table 2. They can be classified into three electronic categories: local charges, dipoles and orbital energies. Since most of the substituents are open-shell species (radicals in a doublet state), Gaussian 98 gives different energies for the two electronic populations, alpha (spin up) and beta (spin down). This provides the additional descriptors HOMOA, HOMOB, LUMOA, LUMOB, HDA, HDB, SOFA, SOFB, ENA, ENB, EPHA and EPHB, which stem from the two different electronic populations; the suffixes A and B stand for the alpha and beta populations, respectively. Therefore, a total of 26 electronic descriptors were calculated for each substituent.

4. Conclusions

Quantitative relationships between molecular structure and protein tyrosine kinase inhibitory activity of flavonoid derivatives were established by two chemometric methods: MLR and GA-PLS. The different QSAR models revealed that SED parameters have a significant impact on the protein tyrosine kinase inhibitory activity of the compounds. In this series, a significant role of topological and geometrical parameters in the inhibitory activity was also observed. Using the pool of all types of calculated descriptors, a new QSAR model was derived for these compounds. This model indicates the importance of quantum, geometrical, SED and Hansch parameters for protein tyrosine kinase inhibitory activity. A comparison between the two statistical methods employed indicated that GA-PLS gave superior results. The resulting GA-PLS model possessed a high statistical quality (R2 = 0.74 and Q2 = 0.61) for predicting the activity of the inhibitors. The models proposed in the present work are more useful in describing the QSAR of flavonoid derivatives as p56lck protein tyrosine kinase inhibitors than those proposed previously.