Amino substituted nitrogen heterocycle ureas as kinase insert domain containing receptor (KDR) inhibitors: Performance of structure-activity relationship approaches.

Hayriye Yilmaz, Natalia Sizochenko, Bakhtiyor Rasulev, Andrey Toropov, Yahya Guzel, Viktor Kuz'min, Danuta Leszczynska, Jerzy Leszczynski.

Abstract

A quantitative structure-activity relationship (QSAR) study was performed on a set of amino-substituted nitrogen heterocyclic urea derivatives. Two novel approaches were applied: (1) the simplified molecular input-line entry systems (SMILES) based optimal descriptors approach; and (2) the fragment-based simplex representation of molecular structure (SiRMS) approach. Comparison with the classic scheme of building up the model and balance of correlation (BC) for optimal descriptors approach shows that the BC scheme provides more robust predictions than the classic scheme for the considered pIC50 of the heterocyclic urea derivatives. Comparison of the SMILES-based optimal descriptors and SiRMS approaches has confirmed good performance of both techniques in prediction of kinase insert domain containing receptor (KDR) inhibitory activity, expressed as a logarithm of inhibitory concentration (pIC50) of studied compounds.
Copyright © 2015. Published by Elsevier B.V.

Keywords:  KDR inhibitors; QSAR; SMILES; SiRMS; amino-substituted nitrogen heterocyclic ureas; descriptors

Year:  2015        PMID: 28911371      PMCID: PMC9351780          DOI: 10.1016/j.jfda.2015.03.001

Source DB:  PubMed          Journal:  J Food Drug Anal            Impact factor:   6.157


1. Introduction

The kinase insert domain containing receptor (KDR), also known as VEGFR-2, is a receptor for vascular endothelial growth factors (VEGFs). It functions as a key regulator of angiogenesis, the process by which new capillaries are created from preexisting blood vessels [1]. Accordingly, interruption of VEGFR-2 signaling by small-molecule inhibitors of the VEGFR-2 kinase domain has been shown to be an attractive strategy in the treatment of cancer. In recent years, a novel series of amino-substituted nitrogen heterocyclic urea derivatives has been reported as inhibitors of KDR [2]. Quantitative structure–activity relationship (QSAR) methods are widely applied to find mathematical relationships between the chemical structure of a compound and its biological activity [3-17]. This technique was used here, on the basis of the available experimental data and calculated theoretical descriptors, to study inhibitory activity [6,10,17]. In the present study, predictive QSAR models were developed for a set of amino-substituted nitrogen heterocyclic ureas whose molecular structure is represented in simplified molecular input-line entry system (SMILES) notation. Two techniques were applied: the SMILES-based optimal descriptors approach implemented in CORrelations And Logic (CORAL) (http://www.insilico.eu/coral), and the simplex representation of molecular structure (SiRMS) approach [18].

2. Materials and methods

2.1. Dataset

For prediction of inhibitory binding affinities (pIC50, i.e., the negative logarithm of the half-maximal inhibitory concentration), data on 63 amino-substituted nitrogen heterocyclic ureas were collected from the existing literature [19].
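As a reminder of the scale used throughout, the conversion from a measured IC50 to pIC50 can be sketched as follows. The function name and the concentrations are illustrative assumptions; the paper does not state the units in which the source IC50 values were reported.

```python
import math

def pic50_from_nanomolar(ic50_nm: float) -> float:
    """Convert an IC50 given in nmol/L to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 M, so -log10(x * 1e-9) = 9 - log10(x)
    return 9.0 - math.log10(ic50_nm)

# Hypothetical concentrations, chosen only to show the scale
for ic50 in (1.0, 10.0, 1000.0):
    print(f"IC50 = {ic50:g} nM -> pIC50 = {pic50_from_nanomolar(ic50):.2f}")
```

On this scale a larger pIC50 means a more potent inhibitor: each unit corresponds to a tenfold lower IC50.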

2.2. Computational details

2.2.1. CORAL approach

There are three options for the selection of optimal descriptors in CORAL: (1) graph based; (2) SMILES based; and (3) hybrid descriptors, which are calculated using both the graph and SMILES approaches [20-23]. Two classes of graph invariants are available in CORAL: vertices and Morgan vertex degrees. In hydrogen-suppressed graphs (HSGs) and hydrogen-filled graphs, vertices represent chemical elements such as carbon, nitrogen, and oxygen. In graphs of atomic orbitals, vertices represent elements of the electronic structure, i.e., atomic orbitals such as 1s1, 2s2, 2p5, and 3d10 [24]. The optimal graph-based descriptor is the descriptor of correlation weights (DCW), calculated from the correlation weights of the graph invariants. Three topological invariants of the molecular graphs were involved in the current study: vertex degree (EC0); extended connectivity of first order (EC1); and extended connectivity of second order (EC2) [25].

The optimal SMILES-based descriptor is likewise built from correlation weights. Sk, SSk, and SSSk represent molecular fragments of one, two, and three SMILES elements, respectively; for example, if SMILES = Clc1ccccc1 then Sk = (Cl, c, 1, c, c, c, c, c, 1); SSk = (Clc, c1, cc, cc, cc, cc, cc, c1); SSSk = (Clc1, c1c, ccc, ccc, ccc, ccc, cc1). PAIR, NOSP, HALO, and BOND are global SMILES attributes, calculated from the SMILES string as a whole. These global attributes allow an additional discrimination of substances into separate classes, for example by the presence of nitrogen, oxygen, sulphur, and phosphorus (NOSP) or of fluorine, chlorine, and bromine (HALO) [24]. The BOND attribute reflects the presence or absence of three categories of chemical bonds: double, triple, and stereospecific. The coefficients α, β, γ, x, y, and z can be either 1 or 0: one (1) indicates that the SMILES attribute is involved in the calculation of DCW(Threshold), and zero (0) indicates that it is not.

Combinations of values of the different attributes make it possible to define various versions of SMILES-based optimal descriptors [20]. CORAL can also be used to build a hybrid model that combines SMILES-based and graph-based descriptors. The graph- and SMILES-based models are mathematical functions of the threshold (T) and the number of epochs (Nepoch) of the Monte Carlo optimization. The most predictive combination of T and Nepoch for a given split can be found by analyzing the results of calculations for several different splits of the data into training and test sets.
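The decomposition of a SMILES string into one-, two-, and three-element fragments can be sketched as below. This is a minimal illustration, not CORAL's implementation: the tokenizer only knows the two-letter halogens Cl and Br, and the windows are plain left-to-right slices. Table 3 lists an attribute such as c...1 but no 1...c, which suggests that CORAL stores a pair independently of reading order; the sketch does not attempt that normalization, so some of its windows are written differently from the listing in the text.

```python
import re

# Two-letter halogens first, then any single character (atom, ring digit, bracket, bond)
TOKEN = re.compile(r"Cl|Br|.")

def smiles_tokens(smiles: str) -> list[str]:
    """Split a SMILES string into elementary symbols (simplified tokenizer)."""
    return TOKEN.findall(smiles)

def windows(tokens: list[str], size: int) -> list[str]:
    """Join every run of `size` consecutive tokens into one fragment."""
    return ["".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]

tokens = smiles_tokens("Clc1ccccc1")
s_k = tokens               # single symbols
ss_k = windows(tokens, 2)  # pairs of symbols
sss_k = windows(tokens, 3) # triples of symbols

print(s_k)
print(ss_k)
print(sss_k)
```

For the nine symbols of Clc1ccccc1 this gives eight pairs and seven triples, the same counts as in the example above.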

2.2.2. SiRMS approach

In addition to the above mentioned approaches, the SiRMS technique [18] was also applied to calculate fragmentary 2D descriptors (fragments of the size 2–5). In the framework of SiRMS, any molecule can be represented as a system of different simplexes (fragments of fixed composition and topology). In previous studies this method provided good results for solving different “structure–activity” problems [26-29]. In the current study a 2D level of molecule representation was utilized to generate simplex fragments. During the first step, the connectivity of atoms in simplex, atom type, and bond nature were considered. For each property the range is created with four to seven intervals. In this study all atoms were divided into groups corresponding to their atomic refraction (A < 1.5 < B < 3 < C < 8 < D), partial charges (A < −0.5 < B < 0 < C < 0.5), electronegativity (A < 2.19 < B < 2.5 < C < 3 < D) and lipophilicity (A < −1 < B < −0.5 < C < −0.1 < D < 0.1 < E < 0.5 < F < 1 < G). The vertices of simplexes were marked by properties mentioned before. After the differentiation step, all molecules were divided into fragments and all possible simplexes were calculated. Finally, the number of simplexes of definite type (for example, A-B-D-G) was used as a descriptor.
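The labeling step can be sketched as a lookup of each atomic property value in the interval boundaries quoted above. Everything beyond those boundaries is an assumption made for illustration: the rule that a value lying exactly on a boundary goes to the upper interval, the use of Pauling electronegativities, and the four-atom fragment itself. A real SiRMS implementation also enumerates and canonicalizes all simplexes of a molecule, which is not shown.

```python
from bisect import bisect_right
from collections import Counter

# Interval boundaries quoted in the text
BOUNDS = {
    "electronegativity": [2.19, 2.5, 3.0],               # labels A..D
    "lipophilicity": [-1.0, -0.5, -0.1, 0.1, 0.5, 1.0],  # labels A..G
}

def label(prop: str, value: float) -> str:
    """Return the interval letter for a property value.

    Assumption: a value equal to a boundary is assigned to the upper interval.
    """
    return "ABCDEFG"[bisect_right(BOUNDS[prop], value)]

# Standard Pauling electronegativities; their use here is an assumption
PAULING = {"H": 2.20, "C": 2.55, "N": 3.04, "O": 3.44}

# Hypothetical four-atom simplex, labeled vertex by vertex
simplex = ("C", "N", "O", "H")
simplex_type = "-".join(label("electronegativity", PAULING[a]) for a in simplex)

# A descriptor is the number of simplexes of a given type found in the molecule
descriptor_counts = Counter([simplex_type])
print(simplex_type, descriptor_counts[simplex_type])
```

Grouping atoms into a few intervals per property keeps the number of distinct simplex types, and therefore the number of descriptors, manageable.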

3. Results and discussion

Table 1 contains the data on the best statistical quality of the models obtained by using the CORAL approach with molecular GRAPHS and molecular SMILES using their extended connectivity. In the current study models based on EC0 in the HSG and Sk, SSk in the NOSP, HALO, and PAIRS were selected as the best hybrid-based models. Statistical characteristics of the model for three splits of data obtained by the balance of correlations and by the classic scheme are reported in Table 1. These results were obtained with the threshold ranging from zero to three. How the number of epochs of the optimization influences the statistical quality of the model for the external test set was also studied. Fig. 1 shows the best model for pIC50 (Split 1, Probe 2, Threshold = 0).
Table 1

Statistical quality of models developed by the CORAL approach.

Columns: training set (n_t, r2_t, s_t, F_t), calibration set (n_c, r2_c, s_c, F_c), test set (n_v, r2_v, s_v, F_v).

Trshd  Nact  Probe  n_t  r2_t    s_t    F_t  n_c  r2_c    s_c    F_c  n_v  r2_v    s_v    F_v  Rm2

Split 1, balance of correlations
0      93    1      39   0.8506  0.313  211  13   0.9815  0.531  585  11   0.7537  0.497  28   0.7369
0      93    2      39   0.8522  0.312  213  13   0.9851  0.510  728  11   0.7873  0.465  33   0.7834
0      93    3      39   0.8496  0.314  209  13   0.9873  0.513  852  11   0.7502  0.500  27   0.7468
0      -     mean   -    0.8508  0.313  211  -    0.9846  0.518  722  -    0.7637  0.487  29   0.7557
1      90    1      39   0.8482  0.316  207  13   0.9887  0.506  962  11   0.7413  0.513  26   0.7320
1      90    2      39   0.8460  0.318  203  13   0.9850  0.517  723  11   0.7526  0.505  27   0.7408
1      90    3      39   0.8505  0.313  210  13   0.9871  0.516  840  11   0.7047  0.548  21   0.6737
1      -     mean   -    0.8482  0.316  207  -    0.9869  0.513  842  -    0.7329  0.522  25   0.7155
2      73    1      39   0.8329  0.331  184  13   0.9816  0.580  587  11   0.7218  0.527  23   0.7212
2      73    2      39   0.8403  0.324  195  13   0.9762  0.554  450  11   0.7309  0.519  24   0.7065
2      73    3      39   0.8294  0.335  180  13   0.9839  0.560  674  11   0.7245  0.528  24   0.6873
2      -     mean   -    0.8342  0.330  186  -    0.9806  0.565  570  -    0.7257  0.524  24   0.7050

Split 1, classic scheme
0      93    1      52   0.8648  0.295  320  -    -       -      -    11   0.7990  0.485  36   0.6607
0      93    2      52   0.8647  0.295  319  -    -       -      -    11   0.7137  0.591  22   0.5350
0      93    3      52   0.8630  0.297  315  -    -       -      -    11   0.7789  0.515  32   0.6551
0      -     mean   -    0.8642  0.295  318  -    -       -      -    -    0.7639  0.530  30   0.6169
1      90    1      52   0.8654  0.294  321  -    -       -      -    11   0.7616  0.535  29   0.6273
1      90    2      52   0.8652  0.294  321  -    -       -      -    11   0.7855  0.510  33   0.6356
1      90    3      52   0.8668  0.293  325  -    -       -      -    11   0.8001  0.531  36   0.6684
1      -     mean   -    0.8658  0.294  323  -    -       -      -    -    0.7824  0.525  33   0.6438
2      73    1      52   0.8588  0.301  304  -    -       -      -    11   0.7484  0.525  27   0.6310
2      73    2      52   0.8564  0.304  298  -    -       -      -    11   0.7655  0.497  29   0.6812
2      73    3      52   0.8544  0.306  293  -    -       -      -    11   0.7564  0.504  28   0.6847
2      -     mean   -    0.8565  0.304  299  -    -       -      -    -    0.7568  0.508  28   0.6656

Split 2, balance of correlations
0      29    1      42   0.8023  0.395  162  11   0.8870  0.439  71   10   0.7752  0.400  28   0.5824
0      29    2      42   0.8036  0.394  164  11   0.8848  0.425  69   10   0.7629  0.417  26   0.5875
0      29    3      42   0.8024  0.395  162  11   0.8879  0.438  71   10   0.5886  0.560  11   0.4300
0      -     mean   -    0.8028  0.395  163  -    0.8866  0.434  70   -    0.7089  0.459  22   0.5333
1      29    1      42   0.8027  0.395  163  11   0.8857  0.417  70   10   0.6891  0.479  18   0.5334
1      29    2      42   0.8002  0.397  160  11   0.8868  0.439  71   10   0.6689  0.496  16   0.5318
1      29    3      42   0.8025  0.395  163  11   0.8872  0.446  71   10   0.7280  0.438  21   0.5773
1      -     mean   -    0.8018  0.396  162  -    0.8866  0.434  70   -    0.6953  0.471  18   0.5475
2      28    1      42   0.6852  0.499  87   11   0.8873  0.389  71   10   0.7887  0.397  30   0.5450
2      28    2      42   0.6867  0.497  88   11   0.8870  0.389  71   10   0.6295  0.532  14   0.3921
2      28    3      42   0.6838  0.500  87   11   0.8898  0.404  73   10   0.7627  0.409  26   0.5067
2      -     mean   -    0.6852  0.499  87   -    0.8881  0.394  71   -    0.7270  0.446  23   0.4813

Split 2, classic scheme
0      29    1      53   0.8078  0.383  214  -    -       -      -    10   0.7227  0.500  21   0.5357
0      29    2      53   0.8057  0.385  211  -    -       -      -    10   0.7701  0.454  27   0.5577
0      29    3      53   0.8067  0.384  213  -    -       -      -    10   0.5627  0.633  10   0.3850
0      -     mean   -    0.8067  0.384  213  -    -       -      -    -    0.6852  0.529  19   0.4928
1      29    1      53   0.8081  0.382  215  -    -       -      -    10   0.7612  0.458  25   0.5240
1      29    2      53   0.8075  0.383  214  -    -       -      -    10   0.7213  0.498  21   0.5341
1      29    3      53   0.8078  0.383  214  -    -       -      -    10   0.7077  0.511  19   0.5139
1      -     mean   -    0.8078  0.383  214  -    -       -      -    -    0.7301  0.489  22   0.5240
2      28    1      53   0.7094  0.470  125  -    -       -      -    10   0.7628  0.474  26   0.5223
2      28    2      53   0.7123  0.468  126  -    -       -      -    10   0.7498  0.474  24   0.4727
2      28    3      53   0.7124  0.468  126  -    -       -      -    10   0.6805  0.540  17   0.4303
2      -     mean   -    0.7114  0.469  126  -    -       -      -    -    0.7311  0.496  22   0.4751

Split 3, balance of correlations
0      31    1      40   0.7755  0.372  131  13   0.9646  0.348  300  10   0.5978  0.844  12   0.5730
0      31    2      40   0.7762  0.371  132  13   0.9628  0.355  285  10   0.5654  0.896  10   0.5378
0      31    3      40   0.7734  0.373  130  13   0.9662  0.357  314  10   0.5860  0.857  11   0.5695
0      -     mean   -    0.7750  0.372  131  -    0.9645  0.354  300  -    0.5831  0.866  11   0.5601
1      31    1      40   0.7775  0.370  133  13   0.9641  0.361  295  10   0.5876  0.862  11   0.5638
1      31    2      40   0.7737  0.373  130  13   0.9659  0.355  312  10   0.5792  0.857  11   0.5701
1      31    3      40   0.7758  0.371  132  13   0.9603  0.333  266  10   0.5612  0.918  10   0.5226
1      -     mean   -    0.7757  0.371  131  -    0.9634  0.350  291  -    0.5760  0.879  11   0.5522
2      28    1      40   0.6032  0.494  58   13   0.9515  0.273  216  10   0.6680  0.910  16   0.4494
2      28    2      40   0.6039  0.494  58   13   0.9529  0.269  223  10   0.6675  0.910  16   0.4503
2      28    3      40   0.6069  0.492  59   13   0.9511  0.264  214  10   0.6727  0.896  16   0.4597
2      -     mean   -    0.6047  0.493  58   -    0.9518  0.269  217  -    0.6694  0.905  16   0.4532

Split 3, classic scheme
0      29    1      53   0.8125  0.338  221  -    -       -      -    10   0.6292  0.777  14   0.6130
0      29    2      53   0.8131  0.338  222  -    -       -      -    10   0.6174  0.794  13   0.5942
0      29    3      53   0.8119  0.339  220  -    -       -      -    10   0.6407  0.756  14   0.6104
0      -     mean   -    0.8125  0.338  221  -    -       -      -    -    0.6291  0.776  14   0.6058
1      29    1      53   0.8125  0.338  221  -    -       -      -    10   0.6233  0.789  13   0.6047
1      29    2      53   0.8130  0.338  222  -    -       -      -    10   0.6139  0.795  13   0.5857
1      29    3      53   0.8120  0.339  220  -    -       -      -    10   0.6307  0.769  14   0.6062
1      -     mean   -    0.8125  0.338  221  -    -       -      -    -    0.6226  0.785  13   0.5989
2      28    1      53   0.6817  0.441  109  -    -       -      -    10   0.7703  0.735  27   0.5530
2      28    2      53   0.6835  0.440  110  -    -       -      -    10   0.7601  0.756  25   0.5410
2      28    3      53   0.6846  0.439  111  -    -       -      -    10   0.7428  0.788  23   0.5216
2      -     mean   -    0.6833  0.440  110  -    -       -      -    -    0.7577  0.759  25   0.5385

Rows labeled "mean" give the average over the three probes; "-" marks a column that does not apply (the classic scheme has no calibration set).

The values shown in bold in the original table (Split 1, balance of correlations, threshold 0, probe 2) belong to the best selected model.

c = calibration set; F = Fisher ratio; n = number of compounds in the set; Nact = number of SMILES attributes involved in building up a model; Probe = number of the Monte Carlo optimization run; r = correlation coefficient; s = standard error; t = training set; Trshd = threshold; v = test (validation) set.
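The summary row that follows each group of three probes in Table 1 appears to be the arithmetic mean of the probe values. A short check for Split 1, balance of correlations, threshold 0, using the test-set r2 and Rm2 values from the table (variable names are illustrative):

```python
from statistics import mean

# Split 1, balance of correlations, threshold 0: test-set values for probes 1 to 3
r2_test = [0.7537, 0.7873, 0.7502]
rm2 = [0.7369, 0.7834, 0.7468]

mean_r2_test = round(mean(r2_test), 4)
mean_rm2 = round(mean(rm2), 4)
print(mean_r2_test, mean_rm2)
```

Both results, 0.7637 and 0.7557, agree with the summary row printed for threshold 0.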

Fig. 1

Graphical representation of the model calculated with Equation 4 (CORAL).

For a model with good external predictability, the cross-validation coefficient should be > 0.5. For the model developed here, the average value for the external set of 11 compounds is about 0.70, which is quite satisfactory. Equation 4 describes a satisfactory model in view of two features: (1) the standard error for the external set is close to that for the training set, and (2) there are no influential outliers in either the training or the test set; all considered chemicals possess inhibitory activity. Biological activity is related to the presence of molecular fragments with different roles: some increase it, some reduce it, and some have no effect. These fragments can be distinguished by the optimization procedure. The approach relies on the correlation coefficient between the descriptor, calculated from the correlation weights (CW), and the inhibitory activity. Experimental pIC50 values and those calculated with Equation 4 are listed in Table 2. Table 3 contains the CW used for calculation with Equation 4. SAk is a symbol in SMILES notation. The columns Ntrain, Ncalib, and Ntest give the distribution of each structural attribute over the subtraining, calibration, and test sets.
Table 2

Experimental and calculated values of the activity pIC50 for 63 amino-substituted nitrogen heterocyclic ureas (CORAL).

Set  ID  DCW     Exp    Calc   Exp–Calc  SMILES
+    1   69.700  7.190  7.331  −0.141    n1[nH]c2c(c1N)c(ccc2)c1ccc(cc1)NC(=O)Nc1ccccc1
+    4   72.280  7.640  7.624  0.016     n1[nH]c2c(c1N)c(ccc2)c1ccc(cc1)NC(=O)Nc1cccc(c1)F
+    7   76.495  7.920  8.103  −0.183    n1[nH]c2c(c1N)c(ccc2)c1ccc(cc1)NC(=O)Nc1ccc(cc1)C
+    8   77.901  8.220  8.263  −0.043    n1[nH]c2c(c1N)c(ccc2)c1ccc(cc1)NC(=O)Nc1cccc(c1)CC
+    9   77.552  8.100  8.223  −0.123    n1[nH]c2c(c1N)c(ccc2)c1ccc(cc1)NC(=O)Nc1cccc(c1)Cl
+    10  69.940  7.440  7.358  0.082     n1[nH]c2c(c1N)c(ccc2)c1ccc(cc1)NC(=O)Nc1cc(ccc1)Br
+    11  77.679  8.000  8.238  −0.238    n1[nH]c2c(c1N)c(ccc2)c1ccc(cc1)NC(=O)Nc1cccc(c1)C(F)(F)F
+    12  68.989  7.260  7.250  0.010     n1[nH]c2c(c1N)c(ccc2)c1ccc(cc1)NC(=O)Nc1cccc(c1)O
+    13  72.783  8.400  7.681  0.719     n1[nH]c2c(c1N)c(ccc2)c1ccc(cc1)NC(=O)Nc1cc(ccc1F)C
+    14  75.566  8.400  7.998  0.402     n1[nH]c2c(c1N)c(ccc2)c1ccc(cc1)NC(=O)Nc1ccc(c(c1)C)F
+    15  74.699  7.440  7.899  −0.459    n1[nH]c2c(c1N)c(ccc2)c1ccc(cc1)NC(=O)Nc1ccc(c(c1)F)C
+    16  73.656  7.050  7.781  −0.731    n1[nH]c2c(c1N)c(ccc2)c1ccc(cc1)NC(=O)Nc1cc(ccc1F)C(F)(F)F
+    17  68.176  7.960  7.158  0.802     n1n(c2c(c1N)c(ccc2)c1ccc(cc1)NC(=O)Nc1cccc(c1)C)C
+    19  60.506  5.510  6.287  −0.777    n1n(c2c(c1N)c(ccc2)c1ccc(cc1)NC(=O)Nc1cc(ccc1F)C)CCOC
+    20  80.405  8.520  8.547  −0.027    n1[nH]c2c(c1N)c(ccc2C)c1ccc(cc1)NC(=O)Nc1cc(ccc1)C
+    21  76.195  7.590  8.069  −0.479    n1[nH]c2c(c1N)c(ccc2OC)c1ccc(cc1)NC(=O)Nc1cccc(c1)C
+    22  78.237  8.300  8.301  −0.001    n1[nH]c2c(c1N)c(ccc2F)c1ccc(cc1)NC(=O)Nc1cccc(c1)C
+    25  71.920  7.460  7.583  −0.123    n1[nH]c2c(c1N)c(ccc2OCCN(CC)CC)c1ccc(cc1)NC(=O)Nc1cccc(c1)C
+    27  72.061  7.600  7.599  0.001     n1[nH]c2c(c1N)c(ccc2OCCN1CCCC1=O)c1ccc(cc1)NC(=O)Nc1cc(ccc1)C
+    28  72.237  7.680  7.619  0.061     n1[nH]c2c(c1N)c(ccc2OCCOC)c1ccc(cc1)NC(=O)Nc1cccc(c1)C
+    29  61.583  6.410  6.409  0.001     n1[nH]c2c(c1N)c(ccc2CNN1CCOCC1)c1ccc(cc1)NC(=O)Nc1cc(ccc1)C
+    31  73.293  7.890  7.739  0.151     n1[nH]c2c(c1N)c(ccc2OCCOC)c1ccc(cc1)NC(=O)Nc1cc(ccc1)Cl
+    32  71.308  7.680  7.514  0.166     n1[nH]c2c(c1N)c(ccc2OCCOC)c1ccc(cc1)NC(=O)Nc1c(ccc(c1)C)F
+    34  65.316  7.210  6.833  0.377     n1[nH]c2c(c1N)c(ccc2OCCN1CCOCC1)c1ccc(cc1)NC(=O)Nc1c(ccc(c1)C)F
+    35  48.445  4.920  4.917  0.003     n1n(c2c(c1NC(=O)C)c(ccc2)c1ccc(cc1)NC(=O)Nc1cc(ccc1)C)C
+    37  68.721  7.140  7.220  −0.080    c1c(c2n(n1)c(c(cn2)c1ccc(cc1)NC(=O)Nc1cc(ccc1)C)N)Br
+    38  72.939  7.330  7.699  −0.369    c1cc2n(n1)c(c(cn2)c1ccc(cc1)NC(=O)Nc1cc(ccc1)C)N
+    39  72.105  7.660  7.604  0.056     c1c(c2n(n1)c(c(cn2)c1ccc(cc1)NC(=O)Nc1cc(ccc1)C)N)c1ccccc1
+    42  75.226  7.960  7.959  0.001     c1c(c2n(n1)c(c(cn2)c1ccc(cc1)NC(=O)Nc1cc(ccc1)C)N)c1ccsc1
+    44  73.421  7.770  7.754  0.016     c1c(c2n(n1)c(c(cn2)c1ccc(cc1)NC(=O)Nc1cc(ccc1)C)N)c1ccc2c(c1)OCO2
+    46  79.131  8.400  8.403  −0.003    c1c(c2n(n1)c(c(cn2)c1ccc(cc1)NC(=O)Nc1cc(ccc1)C)N)c1nn(cc1)C
+    47  58.251  6.030  6.031  −0.001    c1c(c2n(n1)c(c(cn2)c1ccc(cc1)NC(=O)Nc1cc(ccc1)C)N)C#N
+    49  65.586  7.000  6.864  0.136     c1c(c2n(n1)c(c(cn2)c1ccc(cc1)NC(=O)Nc1cc(ccc1)C)N)C(=O)NC
+    50  59.955  6.100  6.224  −0.124    c1c(c2n(n1)c(c(cn2)c1ccc(cc1)NC(=O)Nc1cc(ccc1)C)N)C(=O)N1CCN(CC1)C
+    54  73.255  8.050  7.735  0.315     c1cc2n(n1)c(c(cn2)c1ccc(cc1)NC(=O)Nc1cc(ccc1)C(F)(F)F)N
+    55  70.138  7.360  7.381  −0.021    c1cc2n(n1)c(c(cn2)c1ccc(cc1)NC(=O)Nc1cc(ccc1)Cl)N
+    57  71.148  7.850  7.496  0.354     c1cc2n(n1)c(c(cn2)c1ccc(cc1)NC(=O)Nc1c(ccc(c1)C(F)(F)F)F)N
+    58  78.680  8.520  8.351  0.169     c1c(c2n(n1)c(c(cn2)c1ccc(cc1)NC(=O)Nc1cc(ccc1)C(F)(F)F)N)c1cn(nc1)C
+    61  76.573  8.220  8.112  0.108     c1c(c2n(n1)c(c(cn2)c1ccc(cc1)NC(=O)Nc1cc(c(cc1)F)C(F)(F)F)N)c1cn(nc1)C
−    2   76.495  8.520  8.103  0.417     n1[nH]c2c(c1N)c(ccc2)c1ccc(cc1)NC(=O)Nc1cccc(c1)C
−    3   70.364  7.090  7.407  −0.317    n1[nH]c2c(c1N)c(ccc2)c1ccc(cc1)NC(=O)Nc1ccccc1F
−    5   72.280  7.170  7.624  −0.454    n1[nH]c2c(c1N)c(ccc2)c1ccc(cc1)NC(=O)Nc1ccc(cc1)F
−    6   70.813  7.060  7.458  −0.398    n1[nH]c2c(c1N)c(ccc2)c1ccc(cc1)NC(=O)Nc1ccccc1C
−    26  73.014  7.510  7.708  −0.198    n1[nH]c2c(c1N)c(ccc2OCCN1CCCC1)c1ccc(cc1)NC(=O)Nc1cccc(c1)C
−    30  65.923  5.920  6.902  −0.982    n1[nH]c2c(c1N)c(ccc2CN1CCN(CC1)C)c1ccc(cc1)NC(=O)Nc1cc(ccc1)C
−    33  70.991  7.130  7.478  −0.348    n1[nH]c2c(c1N)c(ccc2OCCN(CC)CC)c1ccc(cc1)NC(=O)Nc1c(ccc(c1)C)F
−    40  72.643  7.410  7.666  −0.256    c1cc2n(n1)c(c(c(n2)C)c1ccc(cc1)NC(=O)Nc1cc(ccc1)C)N
−    43  74.085  7.800  7.829  −0.029    c1c(c2n(n1)c(c(cn2)c1ccc(cc1)NC(=O)Nc1cc(ccc1)C)N)C1CC1
−    48  73.238  7.600  7.733  −0.133    c1c(c2n(n1)c(c(cn2)c1ccc(cc1)NC(=O)Nc1cc(ccc1)C)N)C(=O)OCC
−    53  65.435  5.800  6.847  −1.047    c1c(c2n(n1)c(c(cn2)c1ccc(cc1)NC(=O)Nc1ccccc1)N)Br
−    56  71.142  7.130  7.495  −0.365    c1cc2n(n1)c(c(cn2)c1ccc(cc1)NC(=O)Nc1c(ccc(c1)C)F)N
−    60  76.567  8.300  8.111  0.189     c1c(c2n(n1)c(c(cn2)c1ccc(cc1)NC(=O)Nc1c(ccc(c1)C)F)N)c1cn(nc1)C
#    18  57.426  6.220  5.937  0.283     n1n(c2c(c1N)c(ccc2)c1ccc(cc1)NC(=O)Nc1cccc(c1)C)CCO
#    23  69.665  8.000  7.327  0.673     n1[nH]c2c(c1N)c(ccc2Br)c1ccc(cc1)NC(=O)Nc1cccc(c1)C
#    24  69.109  7.420  7.264  0.156     n1[nH]c2c(c1N)c(ccc2OCCN(C)C)c1ccc(cc1)NC(=O)Nc1cccc(c1)C
#    36  51.535  5.490  5.268  0.222     n1n(c2c(c1N(C)C)c(ccc2)c1ccc(cc1)NC(=O)Nc1cc(ccc1)C)C
#    41  73.770  7.820  7.794  0.026     c1c(c2n(n1)c(c(cn2)c1ccc(cc1)NC(=O)Nc1cc(ccc1)C)N)c1cccs1
#    45  75.269  8.050  7.964  0.086     c1c(c2n(n1)c(c(cn2)c1ccc(cc1)NC(=O)Nc1cc(ccc1)C)N)c1cnccc1
#    51  61.312  6.300  6.378  −0.078    c1c(c2n(n1)c(c(cn2)c1ccc(cc1)NC(=O)Nc1cc(ccc1)C)N)C(=O)NCCN(CC)CC
#    52  68.721  6.090  7.220  −1.130    c1c(c2n(n1)c(c(cn2)c1ccc(cc1)NC(=O)Nc1ccc(cc1)C)N)Br
#    59  75.563  8.400  7.997  0.403     c1c(c2n(n1)c(c(cn2)c1ccc(cc1)NC(=O)Nc1cc(ccc1)Cl)N)c1cn(nc1)C
#    62  73.281  8.000  7.738  0.262     c1c(c2n(n1)c(c(cn2)c1ccc(cc1)NC(=O)Nc1cc(ccc1)F)N)c1cn(nc1)C
#    63  66.390  6.770  6.955  −0.185    c12c(c(c(cc1)c1ccc(cc1)NC(=O)Nc1cccc(c1)C)N)nccn2

CORAL = CORrelations And Logic; SMILES = simplified molecular input-line entry systems; DCW = descriptor of correlation weights.

Exp and Calc are experimental and calculated pIC50; “+”, “−”, and “#” are indicators for the training, calibration, and test sets, respectively.
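The calculated values in Table 2 are a linear function of DCW, so the coefficients of the underlying one-variable model can be recovered from the tabulated pairs. The sketch below fits a straight line to six (DCW, Calc) pairs copied from Table 2. The coefficients it prints are estimates obtained from rounded table values, not numbers quoted from the paper, and the function names are illustrative.

```python
# (DCW, calculated pIC50) pairs copied from Table 2 (compound IDs 1, 4, 7, 19, 20, 35)
pairs = [
    (69.700, 7.331), (72.280, 7.624), (76.495, 8.103),
    (60.506, 6.287), (80.405, 8.547), (48.445, 4.917),
]

def fit_line(points):
    """Ordinary least squares for y = c0 + c1 * x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    c1 = sxy / sxx
    return my - c1 * mx, c1

c0, c1 = fit_line(pairs)

def predict(dcw: float) -> float:
    """Calculated pIC50 for a given DCW under the recovered line."""
    return c0 + c1 * dcw

# Compound 1: experimental pIC50 7.190, DCW 69.700
residual = 7.190 - predict(69.700)
print(f"pIC50 ~ {c0:.3f} + {c1:.4f} * DCW; Exp - Calc for compound 1: {residual:.3f}")
```

The recovered slope is close to 0.114 and the intercept close to −0.585, and the residual for compound 1 reproduces the tabulated Exp–Calc value of −0.141 to within rounding.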

Table 3

Correlation weights for calculation of DCW (SMILES) used in Eq. (1).

SAk           CW(SAk)   Ntrain  Ncalib  Ntest
#...........  −2.44150  1       0       0
(...(.......  7.79887   6       0       0
(...........  −0.95894  39      13      11
++++B2--B3==  −1.52444  1       0       0
++++F---B2==  0.26281   14      5       1
++++F---N===  −0.16006  14      5       1
++++F---O===  0.20794   14      5       1
++++Cl--N===  0.33213   3       0       1
++++Cl--O===  1.13663   3       0       1
++++Br--B2==  −0.13863  2       1       2
++++Br--N===  −2.68450  2       1       2
++++Br--O===  0.03606   2       1       2
++++Cl--B2==  2.87200   3       0       1
++++N---B2==  8.05269   39      13      11
++++N---B3==  −3.19050  1       0       0
++++N---O===  9.31931   39      13      11
++++N---S===  −0.19131  1       0       1
++++O---B2==  8.84956   39      13      11
++++O---B3==  −4.31250  1       0       0
++++O---S===  0.99619   1       0       1
++++S---B2==  −2.30469  1       0       1
1...(.......  3.79988   39      13      11
1...........  −0.79406  39      13      11
2...(.......  1.41125   29      10      8
2...........  −1.97956  39      13      11
2...1.......  0.99500   0       0       1
=...(.......  1.90125   39      13      11
=...........  0.57331   39      13      11
=...1.......  5.67387   1       0       0
C...#.......  −3.43750  1       0       0
C...(.......  0.29106   39      13      11
C...........  −0.42569  39      13      11
C...1.......  1.26863   4       4       0
C...2.......  5.18650   2       1       0
C...C.......  1.87500   10      5       3
BOND10000000  9.25481   38      13      11
BOND11000000  −2.22175  1       0       0
F...(.......  −0.57613  14      4       1
F...........  −0.28706  14      5       1
F...1.......  4.16887   3       1       0
F...2.......  7.23338   1       0       0
EC0-C...1...  2.28525   29      10      11
EC0-C...2...  −0.04388  39      13      11
EC0-C...3...  −2.05969  39      13      11
EC0-C...4...  7.18350   6       0       0
EC0-F...1...  −1.51262  14      5       1
EC0-Br..1...  −0.00681  2       1       2
EC0-Cl..1...  1.99719   3       0       1
EC0-N...1...  15.74219  38      13      10
EC0-N...2...  −1.69731  39      13      11
EC0-N...3...  −3.86419  21      9       9
EC0-O...1...  −5.31931  39      13      11
EC0-O...2...  1.61137   10      3       1
EC0-s...2...  1.09856   1       0       1
H...........  1.07331   22      7       2
Br..(.......  −0.09175  2       1       2
Br..........  −1.51944  2       1       2
Br..2.......  1.00200   0       0       1
Cl..(.......  −3.56550  3       0       1
Cl..........  0.43450   3       0       1
N...#.......  −2.11037  1       0       0
N...(.......  −0.65725  39      13      11
N...........  −1.94531  39      13      11
N...1.......  −2.28325  26      7       4
N...C.......  −2.36919  39      13      11
N...N.......  −3.82512  1       0       0
O...(.......  2.47275   39      13      11
O...........  −2.50881  39      13      11
O...2.......  3.87100   8       2       1
O...=.......  3.97175   39      13      11
O...C.......  −1.99800  10      3       2
NOSP11000000  9.06450   39      13      11
[...........  1.98438   22      7       2
[...1.......  3.74019   22      7       2
[...H.......  2.99800   22      7       2
c...(.......  1.69150   39      13      11
c...........  0.98838   39      13      11
c...1.......  0.26863   39      13      11
c...2.......  0.58394   39      13      11
c...N.......  9.94150   39      13      11
c...[.......  4.05669   22      7       2
c...c.......  −1.39544  39      13      11
n...(.......  1.18850   17      6       9
n...........  −0.35256  39      13      11
n...1.......  6.86037   39      13      10
n...2.......  3.49619   14      6       7
n...H.......  0.54488   22      7       2
n...[.......  1.73238   22      7       2
n...c.......  1.68369   14      5       7
n...n.......  −1.56450  1       0       0
s...........  −0.36238  1       0       1
s...1.......  1.00200   0       0       1
s...c.......  0.79387   1       0       1

CW = correlation weight; DCW = descriptor of correlation weights; SMILES = simplified molecular input-line entry systems.

The Ntrain, Ncalib, and Ntest are the frequencies of SAk in the training, calibration, and test sets, respectively.
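The training-set frequencies in Table 3 can be related to the Nact column of Table 1. Assuming that an attribute is active when it occurs at least Threshold times in the training set (an assumption about the rule, checked here only against the reported counts), the 93 Ntrain values as read from Table 3 give the following numbers of active attributes for Split 1:

```python
# Training-set frequencies (Ntrain) of the 93 attributes, in the row order of Table 3
N_TRAIN = [
    1, 6, 39, 1, 14, 14, 14, 3, 3, 2,
    2, 2, 3, 39, 1, 39, 1, 39, 1, 1,
    1, 39, 39, 29, 39, 0, 39, 39, 1, 1,
    39, 39, 4, 2, 10, 38, 1, 14, 14, 3,
    1, 29, 39, 39, 6, 14, 2, 3, 38, 39,
    21, 39, 10, 1, 22, 2, 2, 0, 3, 3,
    1, 39, 39, 26, 39, 1, 39, 39, 8, 39,
    10, 39, 22, 22, 22, 39, 39, 39, 39, 39,
    22, 39, 17, 39, 39, 14, 22, 22, 14, 1,
    1, 0, 1,
]

def n_active(threshold: int) -> int:
    """Number of attributes seen at least `threshold` times in training (assumed rule)."""
    return sum(1 for n in N_TRAIN if n >= threshold)

for t in (0, 1, 2):
    print(f"threshold {t}: {n_active(t)} active attributes")
```

The counts 93, 90, and 73 coincide with the Nact values reported for thresholds 0, 1, and 2 of Split 1 in Table 1, which supports this reading of the threshold.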

The results obtained by SiRMS are summarized in Table 4. In model 1S the fragments representing tetratomic bonded simplexes were used. In model 2S tetratomic unbound simplexes were used. In model 3S unbound fragments of the size 2–5 were used. Each model consists of nine descriptors. As seen in Table 4, all models have similar statistical characteristics. Despite this, it is necessary to consider the second model for further interpretation since the first and the third models do not distinguish structural isomers. Thus, nine significant descriptors were combined into four groups: type of atom, lipophilicity, van-der-Waals interactions, and partial charges. The relative influences (%) are presented in Fig. 2.
Table 4

Summarized statistical evaluation of each model developed by the SiRMS approach.

Model (split)  R2(training)  s(training)  q2    s(cross-validation)  R2(test)  s(test)
1S             0.86          0.31         0.81  0.37                 0.75      0.47
2S             0.84          0.33         0.79  0.39                 0.70      0.50
3S             0.82          0.35         0.76  0.42                 0.72      0.49

q = LOO cross-validation coefficient; R2 = correlation coefficient; s = standard error; SiRMS = simplex representation of molecular structure.
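For readers unfamiliar with the leave-one-out (LOO) coefficient, the sketch below shows how q2 is commonly computed: each compound is predicted by a model fitted without it, and the summed squared prediction errors (PRESS) are compared with the total sum of squares. A one-descriptor least-squares model and made-up data are used purely for illustration; the SiRMS models in Table 4 use nine descriptors, and the regression method behind them is not described in this section.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = c0 + c1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    c1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - c1 * mx, c1

def loo_q2(xs, ys):
    """q2 = 1 - PRESS / SS_total, each point predicted by a model fitted without it."""
    press = 0.0
    for i in range(len(xs)):
        x_rest = xs[:i] + xs[i + 1:]
        y_rest = ys[:i] + ys[i + 1:]
        c0, c1 = fit_line(x_rest, y_rest)
        press += (ys[i] - (c0 + c1 * xs[i])) ** 2
    my = sum(ys) / len(ys)
    ss_total = sum((y - my) ** 2 for y in ys)
    return 1.0 - press / ss_total

# Hypothetical descriptor and activity values, not taken from the paper
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [5.1, 5.9, 7.2, 7.8, 9.1, 9.9]
q2 = loo_q2(x, y)
print(f"q2 = {q2:.3f}")
```

Because every prediction is made for a compound the model has not seen, q2 is normally somewhat lower than the training R2, as is the case for all three models in Table 4.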

Fig. 2

Diagram of relative influence (%) of various groups of SiRMS descriptors.

Three descriptors of atom type reflect differences among functional groups located in the same place of the molecule. The descriptor of partial charges describes differences for aromatic substitution. Lipophilicity reflects the impact of nonaromatic connectors between aromatic parts of molecules. A set of van-der-Waals-related descriptors includes four descriptors. They describe the influence of aromatic substitution, and the impact of functional groups. A plot of experimentally determined versus predicted log values is presented in Fig. 3.
Fig. 3

Plot of experimental (observed) versus predicted log values, SiRMS approach.

It can be noted that both approaches applied in this study (SMILES-based optimal descriptors and SiRMS) deliver good performance in prediction of KDR inhibitory activity by amino-substituted heterocyclic urea derivatives. As seen in Table 2 and Table 4, both approaches display similar results on average.

4. Conclusion

A structure–activity relationship analysis was performed for a set of amino-substituted nitrogen heterocyclic urea derivatives. Two approaches were applied: the SMILES-based optimal descriptors approach (CORAL) and the fragment-based SiRMS approach. For the SMILES-based optimal descriptors approach, three different splits of the experimental data into subtraining, calibration, and test sets were examined. Comparison of the classic scheme of building up the model with the balance of correlation (BC) scheme shows that the BC scheme gives more robust predictions of the pIC50 of the studied compounds than the classic scheme. The SiRMS approach was examined for three different splits of the descriptor set. Comparison of the SMILES-based optimal descriptors and SiRMS approaches has confirmed the good performance of both in predicting the KDR inhibitory activity (pIC50) of amino-substituted nitrogen heterocyclic urea derivatives. Both methods are quite fast and reliable and possess comparable statistical quality.