
QSAR study of some 5-methyl/trifluoromethoxy-1H-indole-2,3-dione-3-thiosemicarbazone derivatives as anti-tubercular agents.

M Shahlaei, A Fassihi, A Nezami.

Abstract

In the present study, quantitative relationships between molecular structure and anti-tubercular activity were established for a series of 5-methyl/trifluoromethoxy-1H-indole-2,3-dione-3-thiosemicarbazone derivatives. The application of an efficient linear method, principal component regression (PCR), to the quantitative structure-activity relationships of the studied compounds is demonstrated in detail. Components produced by principal component analysis were used as the input for linear model development. The results indicate a linear relationship between the principal components obtained from the molecular descriptors and the inhibitory activity of this set of molecules. The PCR model explained at most 73% of the variance in the activity of the molecules. The performance of the developed model was tested by several validation methods.

Keywords:  Anti-tuberculosis activity; Principal component analysis; QSAR; Thiosemicarbazone derivatives

Year:  2009        PMID: 21589807      PMCID: PMC3093630     

Source DB:  PubMed          Journal:  Res Pharm Sci        ISSN: 1735-5362


INTRODUCTION

Tuberculosis (TB) is an infectious disease with a high mortality rate worldwide. According to the World Health Organization, one-third of the world’s population was infected with Mycobacterium tuberculosis in 2006 (1). Other statistics indicate that an estimated 8 million new cases appear annually, 95% of which occur in developing countries, and that almost 2 million of the sufferers are believed to die from the disease (2,3). A large number of infected people carry the latent form of TB, which is a potential source of the disease for future generations. The HIV epidemic has accelerated the TB pandemic and has increased the probability of death from TB. The emergence of tuberculosis resistant to multiple first-line drugs such as isoniazid, rifampicin, ethambutol, streptomycin and pyrazinamide has made the disease hard to cure (4–6). This obstacle to treatment, together with the prevalence figures, highlights the need to design novel compounds that are more potent, less prone to resistance and have fewer side effects. The design of new anti-tubercular drugs may be accomplished using computational methods, which are generally based on two different approaches. The first, which seems the more ideal, designs new ligands to inhibit a known biochemical target by considering its structural features. The second proposes novel biologically active compounds after statistical analysis of the structures of known drugs that have the desired biological activity. The first approach is called structure-based drug design, and the other is referred to as ligand-based drug design (LBDD) (7). Quantitative structure-activity relationship (QSAR) analysis is an LBDD method. As one of the most powerful techniques for predicting the bioactivity of molecules, QSAR starts only from the molecular structure information of previously reported active molecules.
Such a data mining approach offers important insight into the relationships between the molecular structure and biological activity of the investigated compounds by means of statistical models. Various methods have been applied to construct QSAR models, including linear and nonlinear regression methods (8,9). Here we investigate the quantitative structure-activity relationships of a series of 49 derivatives of 5-methyl/trifluoromethoxy-1H-indole-2,3-dione-3-thiosemicarbazone reported in the literature as anti-tubercular compounds. Principal component analysis was employed for feature extraction, and its output was used as the input of linear regression, i.e. principal component regression (PCR). The resulting model was then validated using various model validation techniques.

MATERIALS AND METHODS

Data preparation

All calculations were performed on a Pentium IV personal computer (CPU at 2.6 GHz) running the Windows XP operating system. All biological data used here were taken from the literature (10). The general structures and structural details of the compounds are reported in Table 1.
Table 1

General structures and structural details of 5-methyl/trifluoromethoxy-1H-indole-2,3-dione-3-thiosemicarbazone derivatives used in this study.

Compound  R1     R2           R3    Type
1         CH3    CH3CH=CH2    H     A
2         CH3    C4H9         H     A
3         CH3    C6H5CH2      H     A
4         CH3    4-FC6H4      H     A
5a        CH3    2-BrC6H4     H     A
6         CH3    3-BrC6H4     H     A
7         CF3O   4-NO2C6H4    H     A
8a        CF3O   CH3          H     A
9         CF3O   C2H5         H     A
10        CF3O   CH3CH=CH2    H     A
11        CF3O   C4H9         H     A
12        CF3O   cycl-C6H11   H     A
13a       CF3O   C6H5CH2      H     A
14        CF3O   C6H5         H     A
15a       CF3O   4-CH3C6H4    H     A
16        CF3O   4-CH3OC6H4   H     A
17        CF3O   4-FC6H4      H     A
18        CF3O   4-ClC6H4     H     A
19        CF3O   4-BrC6H4     H     A
20        CH3    2-BrC6H4     CH3   A
21        CF3O   C2H5         CH3   A
22        CF3O   CH3CH=CH2    CH3   A
23        CF3O   C4H9         CH3   A
24        CF3O   cycl-C6H11   CH3   A
25        CF3O   C6H5CH2      CH3   A
26a       CF3O   4-CH3C6H4    CH3   A
27        CF3O   4-FC6H4      CH3   A
28        CF3O   4-ClC6H4     CH3   A
29        CF3O   4-BrC6H4     CH3   A
30        -      CH3          -     B
31a       -      C2H5         -     B
32        -      CH3CH=CH2    -     B
33        -      C4H9         -     B
34        -      cycl-C6H11   -     B
35a       -      C6H5CH2      -     B
36        -      C6H5         -     B
37a       -      4-CH3C6H4    -     B
38        -      4-CH3OC6H4   -     B
39        -      4-FC6H4      -     B
40        -      4-ClC6H4     -     B
41        -      4-BrC6H4     -     B
42        -      4-NO2C6H4    -     B
43        CH3    C2H5         H     A
44        CH3    cycl-C6H11   H     A
45a       CH3    C6H5         H     A
46        CH3    4-CH3C6H4    H     A
47        CH3    4-ClC6H4     H     A
48        CH3    4-BrC6H4     H     A
49        CH3    C6H5         CH3   A

a: Molecules assigned by the Kennard and Stone algorithm as the test set.

pIC50 (log 1/IC50) is the dependent variable that characterizes the biological activity in the developed QSAR model. The structures of the molecules were drawn and optimized using Hyperchem 7.0 software (11). Geometries were optimized with the semi-empirical AM1 method and the Polak-Ribiere algorithm until the root mean square gradient reached 0.01 kcal/mol. The resulting geometries were transferred into the Dragon program, version 2.1 (developed by the Milano Chemometrics and QSAR Group), to calculate descriptors (12). A large number of different theoretical descriptors were calculated for each molecule. The classes and names of the calculated descriptors are listed in Table 2. These four classes of descriptors were selected to ease the calculations. All model development calculations were performed within the MATLAB (version 7.1, MathWorks, Inc.) environment.
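The dependent variable defined above, pIC50 = log(1/IC50), can be illustrated with a minimal Python sketch. It assumes a base-10 logarithm and an IC50 expressed in mol/L; the source does not state the concentration units, so both the units and the example value below are assumptions for illustration only.

```python
import math

def pic50(ic50_molar: float) -> float:
    """Return pIC50 = log10(1 / IC50) = -log10(IC50), with IC50 in mol/L (assumed)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_molar)

# Hypothetical value, not taken from the paper: an IC50 of 10 micromolar.
example_ic50 = 10e-6
print(round(pic50(example_ic50), 3))
```

A larger pIC50 corresponds to a lower IC50, i.e. a more potent compound, which is why the transformed value is convenient as a regression target.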
Table 2

Some descriptors used in model building.

Descriptor type   Molecular descriptors
Constitutional    Molecular weight, no. of atoms, no. of non-H atoms, no. of bonds, no. of heteroatoms, no. of multiple bonds (nBM), no. of aromatic bonds, no. of functional groups (hydroxyl, amine, aldehyde, carbonyl, nitro, nitroso, etc.), no. of rings, no. of circuits, no. of H-bond donors, no. of H-bond acceptors, no. of nitrogen atoms (nN), chemical composition, sum of Kier-Hall electrotopological states (Ss), mean atomic polarizability (Mp), number of rotatable bonds (RBN), etc.
Topological       Molecular size index, molecular connectivity indices (X1A, X4A, X2v, X1Av, X2Av, X3Av, X4Av), information content index (IC), Kier shape indices, total walk count, path/walk-Randic shape indices (PW3, PW4), Zagreb indices, Schultz indices, Balaban J index (such as MSD), Wiener indices, topological charge indices, sum of topological distances between F..F (T(F..F)), ratio of multiple path count to path counts (PCR), mean information content vertex degree magnitude (IVDM), eigenvalue sum of Z weighted distance matrix (SEigZ), reciprocal hyperdetour index (Rww), eigenvalue coefficient sum from adjacency matrix (VEA1), radial centric information index, 2D Petitjean shape index (PJI2), etc.
Geometrical       3D Petitjean shape index (PJI3), gravitational index, Balaban index, Wiener index, etc.
Functional group  Number of total tertiary carbons (nCt), number of H-bond acceptor atoms (nHAcc), number of total hydroxyl groups (nOH), number of unsubstituted aromatic C (nCaH), number of ethers (aromatic) (nRORPh), etc.
In this study, the Kennard and Stone algorithm was employed for assigning the training and test sets (13). This method of data splitting has two advantages: the training set molecules map the measured region of the input variable space completely with respect to the induced metric, and the test molecules all fall inside the measured region.

Principal component analysis (PCA) was employed to compress the pool of calculated descriptors into principal components (PCs) as new variables. PCR is a multiple linear regression method that uses the score matrix as the new variables for building the model. After the model is built, its validation is a crucial part of any QSAR procedure. In other words, after the regression coefficients (b) are calculated by the least squares method, they are used to predict the activity of the external test set. For example, if $X_c$ and $X_t$ are the matrices of factors for the calibration and test sets, respectively, the activity matrices $Y_c$ and $Y_t$ for the calibration and test sets can be obtained using the following equations:

$$Y_c = b X_c$$
$$Y_t = b X_t$$

The matrix $b$ is calculated from the calibration data by least squares.

The statistical quality of the generated QSAR models was evaluated using leave-one-out cross-validation and parameters such as the predicted residual sum of squares (PRESS) and the root mean square error (RMSE). Leave-one-out cross-validation is a well-known and accepted method for assessing the reliability of QSAR models. In this method, a number of modified data sets (equal to the number of studied molecules) is generated by removing one molecule in each case. For each new data set, a model is generated using the modeling procedure applied in the study. Each model is examined by evaluating its power to predict the bioactivity of the deleted molecule. This process is repeated until a complete set of predicted bioactivities for all the investigated molecules is obtained. The predictive ability is evaluated by the cross-validation coefficient, calculated using the following equation:

$$R_{CV}^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$

where $y_i$ is the observed activity of molecule $i$, $\hat{y}_i$ is its activity predicted by the model built without it, and $\bar{y}$ is the mean observed activity. The root mean square error of cross-validation (RMSECV) of the developed models is reported as well.

Some criteria for the predictive ability of a model were suggested by Tropsha. If these criteria are satisfied, it can be concluded that the model is predictive (14). These criteria include:

$$R_{CV}^2 > 0.5$$
$$R^2 > 0.6$$
$$\frac{R^2 - R_0^2}{R^2} < 0.1 \quad \text{or} \quad \frac{R^2 - R_0'^2}{R^2} < 0.1$$
$$0.85 \le k \le 1.15 \quad \text{or} \quad 0.85 \le k' \le 1.15$$

where $R$ is the correlation coefficient of the regression between the predicted and observed activities of the compounds in the training and test sets, $R_0^2$ is the correlation coefficient for the regression of predicted versus observed activities through the origin, $R_0'^2$ is the correlation coefficient for the regression of observed versus predicted activities through the origin, and $k$ and $k'$ are the slopes of the respective regression lines through the origin. Details of the definitions of $R_0^2$, $R_0'^2$, $k$ and $k'$ are presented in the literature (14). In addition, according to Roy and Roy (15), it is necessary to study the difference between the values of $R^2$ and $R_0^2$. They suggested the following modified form of $R^2$:

$$R_m^2 = R^2 \left( 1 - \sqrt{\left| R^2 - R_0^2 \right|} \right)$$

If the $R_m^2$ value for the given model is >0.5, it indicates good external predictability of the developed model.
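The validation statistics described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' MATLAB code. The function names are hypothetical, and the through-origin quantities (k, k', R0², R'0²) follow one common formulation, which is an assumption here because the source defers their exact definitions to references (14) and (15).

```python
import math

def mean(values):
    return sum(values) / len(values)

def q2(y_obs, y_loo):
    """Cross-validated coefficient: 1 - PRESS / total sum of squares."""
    ybar = mean(y_obs)
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_loo))
    total = sum((o - ybar) ** 2 for o in y_obs)
    return 1.0 - press / total

def external_stats(y_obs, y_pred):
    """R2, through-origin slopes and coefficients, and Rm2 (assumed formulation)."""
    mo, mp = mean(y_obs), mean(y_pred)
    pairs = list(zip(y_obs, y_pred))
    sxy = sum((o - mo) * (p - mp) for o, p in pairs)
    sxx = sum((o - mo) ** 2 for o in y_obs)
    syy = sum((p - mp) ** 2 for p in y_pred)
    r2 = sxy ** 2 / (sxx * syy)                  # squared Pearson correlation
    dot = sum(o * p for o, p in pairs)
    k = dot / sum(p * p for p in y_pred)         # observed on predicted, through origin
    k_prime = dot / sum(o * o for o in y_obs)    # predicted on observed, through origin
    r0_2 = 1.0 - sum((o - k * p) ** 2 for o, p in pairs) / sxx
    r0p_2 = 1.0 - sum((p - k_prime * o) ** 2 for o, p in pairs) / syy
    rm2 = r2 * (1.0 - math.sqrt(abs(r2 - r0_2)))
    return {"r2": r2, "k": k, "k_prime": k_prime,
            "r0_2": r0_2, "r0p_2": r0p_2, "rm2": rm2}

def passes_tropsha(q2_value, s):
    """Check the thresholds quoted in the text against a stats dictionary."""
    ratio_ok = ((s["r2"] - s["r0_2"]) / s["r2"] < 0.1
                or (s["r2"] - s["r0p_2"]) / s["r2"] < 0.1)
    slope_ok = 0.85 <= s["k"] <= 1.15 or 0.85 <= s["k_prime"] <= 1.15
    return q2_value > 0.5 and s["r2"] > 0.6 and ratio_ok and slope_ok

# Hypothetical observed and predicted activities, for illustration only.
observed = [3.0, 4.0, 5.0, 6.0]
predicted = [3.1, 3.9, 5.2, 5.8]
stats = external_stats(observed, predicted)
print(passes_tropsha(0.8, stats), round(stats["rm2"], 3))
```

Keeping the thresholds in a single function makes it easy to see which criterion fails when a model is rejected.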

RESULTS

After deleting the zero-variance columns of the X block, PCA was carried out on the pool of all descriptors. Among the generated PCs, only the 9 highest-ranked by eigenvalue were selected for subsequent model building. Table 3 shows that PCA gives 9 significant PCs (% variance explained >1), which together explain 95.17% of the variance in the original descriptor data matrix. For each of these PCs, Table 3 lists the eigenvalue, the percent of variance explained, and the cumulative percent of variance. The subsequent studies were therefore restricted to these 9 PCs and to the selection of their best subset for the linear regression. A plot of the first PC against the second showed that none of the compounds used in this study was an outlier, although 5 clusters existed within the data set (Fig. 1).
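The PCA step described above (removal of zero-variance columns, autoscaling, and retention of PCs that explain more than 1% of the variance) can be sketched as follows. This is a minimal NumPy illustration on synthetic data, not the authors' MATLAB workflow; the helper names `autoscale` and `pca` and the random descriptor matrix are assumptions made for the example.

```python
import numpy as np

def autoscale(X):
    """Drop zero-variance columns, then mean-center and scale to unit variance."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0, ddof=1)
    keep = std > 0
    Xk = X[:, keep]
    return (Xk - Xk.mean(axis=0)) / std[keep]

def pca(Xs):
    """PCA of an autoscaled matrix via SVD: scores, eigenvalues, % variance."""
    U, s, _ = np.linalg.svd(Xs, full_matrices=False)
    eigvals = s ** 2 / (Xs.shape[0] - 1)      # eigenvalues of the covariance matrix
    pct = 100.0 * eigvals / eigvals.sum()     # percent variance explained per PC
    scores = U * s                            # PC scores, one column per component
    return scores, eigvals, pct

# Synthetic stand-in for a descriptor matrix: 49 molecules, 20 descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(49, 20))
X[:, 5] = X[:, 0] + 0.1 * rng.normal(size=49)   # a nearly collinear descriptor
X[:, 7] = 1.0                                   # a zero-variance descriptor

Xs = autoscale(X)
scores, eigvals, pct = pca(Xs)
n_significant = int(np.sum(pct > 1.0))          # PCs explaining more than 1%
cumulative = np.cumsum(pct)
print(n_significant, np.round(cumulative[:3], 2))
```

Because the score columns are mutually orthogonal, they can be used as regression inputs without the collinearity present in the raw descriptors.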
Table 3

Eigenvalues of calculated PCs, % of explained variances and cumulative variances.

PC No.  Eigenvalue  % variance explained  Cumulative variance (%)
1       195.0       60.70                 60.70
2       39.46       12.26                 72.96
3       27.92       8.672                 81.63
4       12.28       3.814                 85.45
5       8.755       2.719                 88.16
6       8.281       2.572                 90.73
7       5.879       1.826                 92.56
8       4.508       1.400                 93.96
9       3.887       1.207                 95.17
Fig. 1

The first two components (PC1, and PC2) from the principal component analysis of the 49 considered molecules.

When factor scores were used as the predictor parameters in a multiple regression equation with the stepwise selection method of PCR, the following equation was obtained:

$$\mathrm{pIC}_{50} = 14.021\,(\pm 2.143) + 12.210\,(\pm 3.216)\,f_1 + 1.453\,(\pm 0.320)\,f_3$$
$$R^2 = 0.83, \quad \mathrm{S.E.} = 0.20, \quad F = 20.05$$

where F is the F statistic of the ANOVA and S.E. is the standard error of the resulting model. The above equation shows high equation statistics (81% explained variance in pIC50 data). Since factor scores were used instead of selected descriptors, and each factor score contains information from different descriptors, loss of information is avoided, and the quality of the PCR equation is better than that of similar equations such as those derived from MLR.

The cross-validation method was used to evaluate the robustness of the proposed model. In leave-one-out cross-validation, one object at a time was eliminated, and PCR was then performed on the remainder of the training set. The activity of the left-out object was predicted using this regression model. This procedure was repeated until each compound in the calibration set had been left out once. The optimum number of factors was selected with respect to RMSECV and the root mean square error of calibration, and 2 PCs were selected as the optimum number of PCs.

To evaluate the predictive power of the generated PCR model, the optimized model was applied to predict the pIC50 values of all compounds in the calibration and prediction sets. The calculated pIC50 for each molecule and the relative error of prediction are summarized in Table 4. The small relative errors (absolute values of at most 0.124) support the accuracy of the proposed PCR model for modeling the anti-tubercular activity of the studied compounds.
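The leave-one-out procedure for PCR described above can be sketched as follows. This is a simplified NumPy illustration on synthetic data. It regresses on the first `n_pcs` components rather than reproducing the stepwise selection of factors used in the paper, and the function names and data are assumptions made for the example.

```python
import numpy as np

def fit_pcr(X, y, n_pcs):
    """Autoscale X, project onto the first n_pcs loadings, fit least squares."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    Xs = (X - mu) / sd
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    P = Vt[:n_pcs].T                                   # loadings of retained PCs
    T = np.column_stack([np.ones(len(y)), Xs @ P])     # intercept + scores
    b, *_ = np.linalg.lstsq(T, y, rcond=None)          # regression coefficients
    return mu, sd, P, b

def predict_pcr(model, X):
    """Apply the training-set scaling and loadings, then the regression."""
    mu, sd, P, b = model
    T = np.column_stack([np.ones(X.shape[0]), ((X - mu) / sd) @ P])
    return T @ b

def rmsecv_loo(X, y, n_pcs):
    """Leave one object out, refit PCR on the rest, predict the left-out object."""
    idx = np.arange(len(y))
    errors = []
    for i in idx:
        train = idx != i
        model = fit_pcr(X[train], y[train], n_pcs)
        errors.append(y[i] - predict_pcr(model, X[i:i + 1])[0])
    return float(np.sqrt(np.mean(np.square(errors))))

# Synthetic data: 40 training objects, 10 descriptors driven by 2 latent factors.
rng = np.random.default_rng(1)
t = rng.normal(size=(40, 2))
X = t @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(40, 10))
y = 4.5 + 0.8 * t[:, 0] + 0.1 * rng.normal(size=40)

for n_pcs in (1, 2, 3):
    print(n_pcs, round(rmsecv_loo(X, y, n_pcs), 3))
```

Refitting the scaling and the loadings inside every fold keeps the left-out object from influencing the model that predicts it, which is the point of the cross-validation.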
Table 4

Calculated activities by PCR and their relative error of prediction (REP).

Compound  Experimental activity  Predicted activity  REP
1         3.743                  3.853               0.029
2         5.379                  5.124               -0.050
3         3.827                  3.554               -0.077
4         4.866                  4.983               0.024
5         4.538                  4.667               0.028
6         4.941                  4.776               -0.035
7         4.882                  4.669               -0.046
8         5.115                  5.555               0.079
9         5.197                  5.544               0.063
10        4.262                  4.554               0.064
11        3.478                  3.854               0.097
12        4.955                  4.459               -0.111
13        3.847                  3.454               -0.114
14        4.459                  4.836               0.078
15        4.476                  4.433               -0.010
16        4.875                  4.433               -0.100
17        4.900                  4.654               -0.053
18        4.873                  4.739               -0.028
19        4.424                  4.763               0.071
20        4.260                  4.654               0.085
21        4.223                  4.530               0.068
22        4.564                  4.674               0.024
23        3.390                  3.557               0.047
24        3.704                  3.640               -0.018
25        4.137                  4.718               0.123
26        3.178                  3.433               0.074
27        5.063                  5.153               0.018
28        4.768                  4.454               -0.071
29        5.213                  5.460               0.045
30        4.805                  4.406               -0.091
31        4.679                  4.455               -0.050
32        4.999                  4.652               -0.074
33        3.909                  3.763               -0.039
34        4.741                  4.630               -0.024
35        5.000                  4.775               -0.047
36        4.315                  4.634               0.069
37        4.862                  4.394               -0.107
38        4.619                  4.451               -0.038
39        4.633                  4.123               -0.124
40        4.608                  4.620               0.003
41        4.593                  4.535               -0.013
42        4.175                  4.285               0.026
43        4.157                  4.436               0.063
44        5.152                  5.439               0.053
45        4.968                  4.529               -0.097
46        4.847                  4.453               -0.089
47        4.598                  4.440               -0.036
48        3.645                  3.439               -0.060
49        3.332                  3.237               -0.029
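The source does not give a formula for the relative error of prediction (REP). The tabulated values appear consistent with (predicted − experimental) / predicted, which is an inference from the numbers and not a definition stated by the authors. A short Python check on a few rows of Table 4:

```python
def rep(experimental, predicted):
    """Relative error of prediction, inferred form: (pred - exp) / pred."""
    return (predicted - experimental) / predicted

# (compound, experimental, predicted, tabulated REP) copied from Table 4.
rows = [
    (1, 3.743, 3.853, 0.029),
    (2, 5.379, 5.124, -0.050),
    (12, 4.955, 4.459, -0.111),
    (25, 4.137, 4.718, 0.123),
    (39, 4.633, 4.123, -0.124),
]

for compound, exp_value, pred_value, tabulated in rows:
    computed = round(rep(exp_value, pred_value), 3)
    print(compound, computed, tabulated)
```

Dividing by the experimental value instead gives about -0.100 for compound 12 and 0.140 for compound 25, which does not match the table, so the predicted value seems to be the denominator.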
Experimental versus predicted pIC50 values of the training and test sets, obtained by PCR modeling, are shown graphically in Fig. 2.
Fig. 2

Plot of calculated vs. experimental activity of investigated compounds in training and test sets.

Results of various statistical criteria and figures of merit for this model for the two subsets of molecules, i.e. the training and test sets, are reported in Table 5. The external predictability of a proposed model is generally tested using test sets and $R^2$. The satisfactory prediction of the inhibitory activity of the test set compounds demonstrates the efficacy of the QSAR model in predicting the activities of external molecules. Moreover, the low values of RMSE and PRESS for the training and test sets add to the statistical significance of the developed model. Besides, on the basis of the criteria recommended by Tropsha and the $R_m^2$ of Roy, the obtained model is predictive (Table 5).
Table 5

Statistical parameters obtained for the developed model for anti-tubercular activity of the investigated compounds.

Parameter              Training set  Test set
N                      40            9
$R^2$                  0.735         0.762
RMSE                   0.284         0.324
PRESS                  3.274         0.946
$R^2_{LOOCV}$          0.792         -
$RMSE_{LOOCV}$         0.207         -
$R^2_{L5OCV}$          0.746         -
$RMSE_{L5OCV}$         0.255         -
$(R^2 - R_0^2)/R^2$    -0.008        -0.041
$(R^2 - R_0'^2)/R^2$   -0.009        -0.041
$k$                    1.023         0.987
$k'$                   1.012         0.99
$R_m^2$                0.800         0.711
$R^2$ adjusted         0.713         -

N: number of objects in the data set; $R^2$: correlation coefficient of experimental and predicted activities; RMSE: root mean square error; PRESS: predicted error sum of squares; $R^2_{LOOCV}$: correlation coefficient of leave-one-out cross-validation; $RMSE_{LOOCV}$: root mean square error of leave-one-out cross-validation.
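As a consistency check, the test-set PRESS and RMSE in Table 5 can be recomputed from the Table 4 values of the nine compounds flagged as test set in Table 1. The sketch below assumes that the compound numbers in the two tables refer to the same molecules, which the source implies but does not state. PRESS is taken as the sum of squared residuals and RMSE as the square root of PRESS divided by N.

```python
import math

# Test-set compounds (flagged "a" in Table 1): compound -> (experimental, predicted),
# values copied from Table 4.
test_set = {
    5: (4.538, 4.667),
    8: (5.115, 5.555),
    13: (3.847, 3.454),
    15: (4.476, 4.433),
    26: (3.178, 3.433),
    31: (4.679, 4.455),
    35: (5.000, 4.775),
    37: (4.862, 4.394),
    45: (4.968, 4.529),
}

# Sum of squared prediction errors over the test set.
press = sum((exp_value - pred_value) ** 2 for exp_value, pred_value in test_set.values())
# Root mean square error over the N = 9 test compounds.
rmse = math.sqrt(press / len(test_set))

print(round(press, 3), round(rmse, 3))
```

The recomputed values, about 0.944 for PRESS and 0.324 for RMSE, are close to the 0.946 and 0.324 reported for the test set in Table 5; the small difference in PRESS is plausibly due to rounding of the tabulated activities.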

DISCUSSION

To determine the effects of the structural features of the studied derivatives on their anti-tubercular activity, QSAR model development was performed with various calculated molecular descriptors (8). Because of the large number of calculated descriptors, PCA was employed to solve the collinearity problem among them, and the PCs were used as new variables for model building. Before the PC analysis, the descriptors calculated by Dragon must first be pre-processed by autoscaling. Suppose $X_{i,j}$ is the column mean-centered and scaled matrix of descriptors for i samples and j descriptors, and $y_{i,1}$ is the matrix of activities (pIC50). After the principal components are generated from $X_{i,j}$, a new matrix containing the scores of the PCs is created. These scores are then used as new variables for regression. Scores as new variables possess two interesting properties (8): (i) They are sorted so that the information content (variance explained) decreases from the first PC to the last. As a result, the last PCs can be deleted, since they carry little useful information. (ii) The PCs are orthogonal, so the correlation problem that exists in the pool of descriptors calculated in this study is solved.

In order to evaluate the final model, the data set was divided into training and external prediction (test) sets. Almost 20% of the molecules (9 out of 49) were selected as the external test set. The training set plays an important role in determining the properties of the model. The best situation at this stage of model building is a split that guarantees that the training set and the test set each cover the total space occupied by the original data set. The possibility of overfitting is increased when more similar molecules are selected as the training set. Hence, an ideal split of the data set is one in which each object in the test set is close to at least one object in the training set. One of the best methods for data splitting is the Kennard and Stone algorithm. After the molecules were divided into training and test sets with this algorithm, the regression models were built using the calibration set.

In order to obtain a linear relationship with the independent variables, the logarithms of the inverse of the biological activity (log 1/IC50) of the 49 molecules were used. Consideration of the different statistical parameters indicated that the developed QSAR model could explain and predict 73% and 76% of the variance in pIC50 in the training and test set data, respectively. The plot of the PCR results shows low scatter, with no systematic error. As shown in Table 5, $R^2$, which indicates the goodness of fit of the proposed model, was obtained for three sets, and its high value indicates a good fit between the PCs and the anti-tubercular activities of the investigated compounds predicted by the developed PCR model. This shows the high predictability of the proposed model.
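The Kennard and Stone selection discussed above can be sketched in pure Python. This is a minimal illustration of the usual max-min form of the algorithm, assuming Euclidean distance and that the selected objects form the training set while the remainder form the test set; the function name and the toy coordinates are assumptions made for the example.

```python
import math

def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by the Kennard-Stone max-min rule."""
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of points")
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Start with the two most distant points.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = set(range(n)) - {first, second}
    # Repeatedly add the point farthest from its nearest already-selected point.
    while len(selected) < n_select:
        nxt = max(sorted(remaining),
                  key=lambda r: min(dist[r][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected

# Toy coordinates standing in for PC scores of five molecules.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
training = kennard_stone(points, 3)
test = sorted(set(range(len(points))) - set(training))
print(training, test)
```

Because the extremes of the data are picked first, the objects left for the test set lie inside the region spanned by the training set, which matches the property described in the text.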

CONCLUSION

Quantitative relationships between the molecular structure and the inhibitory activity of a series of 5-methyl/trifluoromethoxy-1H-indole-2,3-dione-3-thiosemicarbazone derivatives as anti-tubercular agents were established using a collection of calculated descriptors including topological, geometrical, constitutional, and functional group descriptors. It was found that a properly chosen and designed PCR model could represent the dependence of the anti-tubercular activity of these derivatives on the PCs extracted from the geometrical, topological, and other calculated descriptors. The optimized principal component regression method could model the linear relationship between the pIC50 values and the PCs.
REFERENCES

Review 1.  Prospects for new antitubercular drugs.

Authors:  Ken Duncan; Clifton E Barry
Journal:  Curr Opin Microbiol       Date:  2004-10       Impact factor: 7.934

2.  Validated QSAR analysis of some diaryl substituted pyrazoles as CCR2 inhibitors by various linear and nonlinear multivariate chemometrics methods.

Authors:  Elham Arkan; Mohsen Shahlaei; Alireza Pourhossein; Kambiz Fakhri; Afshin Fassihi
Journal:  Eur J Med Chem       Date:  2010-04-28       Impact factor: 6.514

3.  Synthesis and in vitro and in vivo antimycobacterial activity of isonicotinoyl hydrazones.

Authors:  Dharmarajan Sriram; Perumal Yogeeswari; Kasinathan Madhu
Journal:  Bioorg Med Chem Lett       Date:  2005-10-15       Impact factor: 2.823

4.  Synthesis and antituberculosis activity of 5-methyl/trifluoromethoxy-1H-indole-2,3-dione 3-thiosemicarbazone derivatives.

Authors:  Ozlen Güzel; Nilgün Karali; Aydin Salman
Journal:  Bioorg Med Chem       Date:  2008-08-27       Impact factor: 3.641

5.  Application of PC-ANN and PC-LS-SVM in QSAR of CCR1 antagonist compounds: a comparative study.

Authors:  Mohsen Shahlaei; Afshin Fassihi; Lotfollah Saghaie
Journal:  Eur J Med Chem       Date:  2010-01-28       Impact factor: 6.514

Review 6.  Tuberculosis.

Authors:  Thomas R Frieden; Timothy R Sterling; Sonal S Munsiff; Catherine J Watt; Christopher Dye
Journal:  Lancet       Date:  2003-09-13       Impact factor: 79.321

7.  The neglected global tuberculosis problem: a report of the 1992 World Congress on Tuberculosis.

Authors:  D E Snider; J R La Montagne
Journal:  J Infect Dis       Date:  1994-06       Impact factor: 5.226

Review 8.  New drug candidates and therapeutic targets for tuberculosis therapy.

Authors:  Ying Zhang; Katrin Post-Martens; Steven Denkin
Journal:  Drug Discov Today       Date:  2006-01       Impact factor: 7.851
