Literature DB >> 35552494

QSAR study of phenolic compounds and their anti-DPPH radical activity by discriminant analysis.

Ang Lu1, Shi-Meng Yuan2, Huai Xiao2,3, Da-Song Yang2, Zhi-Qiong Ai1, Qi-Yan Li4,3, Yu Zhao2,3, Zhuang-Zhi Chen5, Xiu-Mei Wu6,7.   

Abstract

Phenolic compounds (PCs) could be applied to reduce reactive oxygen species (ROS) levels, and are used to prevent and treat diseases related to oxidative stress. QSAR study was applied to elucidate the relationship between the molecular descriptors and physicochemical properties of polyphenol analogues and their DPPH radical scavenging capability, to guide the design and discovery of highly-potent antioxidant substances more efficiently. PubMed database was used to collect 99 PCs with antioxidant activity, whereas, 105 negative PCs were found in ChEMBL database; their molecular descriptors were generated with Python's Rdkit package. While the molecular descriptors significantly related to the antioxidant activity of PCs were filtered by t-test. The prediction QSAR model was then established by discriminant analysis, and the obtained model was verified by the back-substitution and Leave-One-Out cross-validation methods along with heat map. It was revealed that the anti-DPPH radical activity of PCs was correlated with the drug-likeness and molecular fingerprints, physicochemical, topological, constitutional and electronic property. The established QSAR model could explicitly predict the antioxidant activity of polyphenols, thus were applicable to evaluate the potential of candidates as antioxidants.
© 2022. The Author(s).

Entities:  

Mesh:

Substances:

Year:  2022        PMID: 35552494      PMCID: PMC9098848          DOI: 10.1038/s41598-022-11925-y

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.996


Introduction

Oxidative stress (OS) is the imbalance of redox reactions, which leads to the production of reactive oxygen species (ROS)[1]. The accumulation of ROS in the body or cells causes cytotoxic reaction, thus inducing a variety of pathological injuries, such as cardiovascular diseases, diabetes, tumors and other chronic diseases[2-6]. Therefore, it is feasible to initiate disease treatment with antioxidants to reverse this imbalance by reducing ROS level[7]. In this context, various measures to increase the use of antioxidants have been employed, and source from nature has become an important source of research and development of green and safe antioxidants[5]. In recent years, a number of nature antioxidant substances, including crude extracts, bioactive partitions, chemical entities, have been discovered from insects[8-12]. As a continuous study of medicinal insects, five new phenolic compounds (PCs) with DPPH radical scavenging activity were isolated from the medicinal insect Blaps rynchopetera (Fairmaire), a local Chinese medicine, and their antioxidant activities were equivalent to vitamin C[13]. However, difficulties in separation and enrichment of above-mentioned PCs impeded further development of these lead compounds[14]. The design and synthesis of analogues of PCs from B. rynchopetera, would be a feasible alternative to more efficiently and economically prepare antioxidants. Quantitative structure–activity relationship (QSAR) study is a powerful in-silico method in terms of design and discovery of bioactive compounds[15]. The presented work herein aimed to establish predictive QSAR model of PCs and their antioxidant activity with their 2D-structures and physicochemical properties as potential predictors. Multiple linear regression and discriminant analysis are mature multivariate statistical methods with reported application in the establishment of QSAR models[16-19]. Multiple linear regression requires assumptions of homoscedasticity and independent errors, which would likely be violated by data derived from different investigations. Although discriminant analysis could not provide the concrete predicted values of antioxidant effect, it could determine the probability of compounds’ classification without the requirements of modeling assumptions as multiple linear regression does[20,21]. Thus, compounds from different sources could be incorporated together to fit QSAR model. Therefore, discriminant analysis was adopted to establish predictive models of PCs’ molecular characteristics and their antioxidant activity for further molecular design for the discovery and development of efficient antioxidants.

Data preparation and methods

The strategy of modeling and the selection of molecular descriptors were shown in Scheme 1: (1) Literature retrieval and data collection; (2) Filtering and generating molecular descriptors; (3) Model establishment and fitting evaluation.
Scheme 1

The red area is the process of data mining, yellow is the process of filtering molecular descriptor, and blue is the process of model fitting evaluation.

The red area is the process of data mining, yellow is the process of filtering molecular descriptor, and blue is the process of model fitting evaluation.

Collection and preparation of compounds

DPPH (Compound 1) radical scavenging assay is a valid and widely-accepted method for evaluating molecular antioxidant activities. The antioxidant activity was calculated by the rate of DPPH scavenging as Formula (1), and the 50% inhibitory concentration (IC50) or 50% effective concentration (EC50) values were regarded as indicator for molecular antioxidant activity. where A0 and A are the absorbances in the absence and in the presence of antioxidant, respectively. The absorbance reads for substance could be varied experiment by experiment. Compounds with values of IC50/EC50 greater than 300 μM are hardly worthy for further development, therefore they would not be treated as lead compounds. Thus, 300 μM was applied as the cut point to differentiate positive and negative samples. Phenolic compounds (PCs) were derivatives with hydroxy-containing substitutions on aromatic ring of phenol (Compound 2). The key words of DPPH, phenolic compounds, and IC50/EC50 values were applied for literature search in the PubMed database. Meanwhile, phenol was used as the keyword to search for negative samples in ChEMBL database. The overall quality control exclusive criteria for positive samples included the followings: (1) a clear positive control could not be found in the original literature; (2) the IC50/EC50 values of the same positive control substances from different articles were obviously inconsistent. The exclusive criteria for negative samples were: (1) the IC50/EC50 values were less than 300 μM in the literature; (2) compound’s DPPH scavenging activity was described though without generalizable IC50/EC50 values.

Determination of molecular descriptors

The 200 molecular descriptors of well-defined, easy-interpretable, and most-often used in QSAR were generated by applying Python's Rdkit package[22]. These molecular descriptors mainly fall into the following categories namely: drug-likeness and molecular fingerprints, physicochemical, topological, constitutional and electronic property[23]. The semi-constant (more than 80% data with the same value) descriptors were excluded to eliminate redundant information and increase power-of-test, consequently, a total of 122 descriptors were retained for later QSAR modeling.

QSAR model development

The performance of the obtained models mainly depends on their modeling descriptors. The model developed with improper descriptors would lead to either over-fitting or predict weakly and would be useless[23]. The t-test was then used to initially filter molecular descriptors that were less relevant with antioxidant activity. The primary objective of discriminant analysis was to build QSAR models that could predict the antioxidant activity of unknown compounds more reliably and precisely. These two statistical analysis methods were performed in SPSS 25.0, and the significance level was set as 0.05.

Evaluation of model fitting

Two validation techniques along with heat map were applied to assess the accuracy and robustness of the established QSAR model. The following analyses were carried out with SPSS 25.0 and Microsoft® Excel® 2019. Back-substitution method By comparing the predicted classification of the discriminant function and the actual classification, the correct discriminant proportion of the classification function was calculated. Jackknife (Leave-One-Out cross-validation) (i) Individual sample was sequentially treated as predicative subset, and established the discriminant function with the remaining N-1 samples as training subset; (ii) Judging the correctness of predictive classification of predicative subset; (iii) Repeat the above two steps N times; (iv) Generating the correct classification proportion. The error discriminant proportion (Formula (2)) of 10% or 20% was most often used as the standard to evaluate the established model, that is, those error discriminant proportion in validation check less than the standard suggestion satisfactory established model[20]. Furthermore, Kappa consistency coefficients (Formula (3)) were employed to evaluate the stability of the model. where, PN is the number of predicted negative samples, and AP is the number of actual positive samples. where, P0 is observed coincidence rate, and Pe is chance coincidence rate. (3) The performance of established discriminant models on the randomly selected samples consisted each of 10 positive and 10 negative samples respectively were visually presented by heat map.

Results

Structural molecular data

In total 140 PCs with unambiguous DPPH radical scavenging activity were collected from PubMed database, after sifting through chemicals and rearranging duplicated labels for compounds with the same structure, 99 PCs were eventually included. Meanwhile, keeping phenol as the search keyword, negative samples in ChEMBL database, with a total of 8721 compounds were presented. To make the sample size of negative subset comparable to positive one, after sifting through the first 1200 pieces of data through exclusion criteria, 105 PCs were finally selected as negative subset (Scheme 1). The chemical structures, IC50 values and literature source of 99 positive samples, whereas the chemical structures of 105 negative PCs were available in the supplemental document.

Initially filtered arguments

The t-test was performed to opt 87 molecular descriptors significantly related to the antioxidant activity of PCs (P < 0.05) (Table 1). Among them, 4 molecular descriptors belonged to drug-likeness and molecular fingerprints, whereas 35, 5, 13 and 30 molecular descriptors belonged to physicochemical property, electronic property, topological property, and constitutional property, respectively.
Table 1

Results of filtering molecular descriptors by t-test.

DescriptorstTypeaDescriptorstTypeaDescriptorstTypea
qed− 10.570DFPEOE_VSA144.316PHeavyAtomCount4.415C
MinAbsPartialCharge8.807EPEOE_VSA26.152PNHOHCount6.640C
FpDensityMorgan2− 4.704DFPEOE_VSA32.684PNOCount6.429C
FpDensityMorgan3− 4.755DFPEOE_VSA6− 6.030PNumAromaticCarbocycles4.053C
FpDensityMorgan1− 4.376DFPEOE_VSA82.058PNumAromaticRings3.794C
HeavyAtomMolWt4.695PPEOE_VSA92.629PNumHAcceptors6.024C
MaxAbsEStateIndex6.917TSMR_VSA17.549PNumHDonors7.201C
MaxAbsPartialCharge− 7.485ESMR_VSA105.525PNumHeteroatoms6.315C
MaxEStateIndex6.917TSMR_VSA3− 4.382PNumSaturatedHeterocycles− 2.706C
MinPartialCharge7.485ESMR_VSA4− 2.525PNumSaturatedRings− 2.230C
ExactMolWt4.331PSMR_VSA5− 2.577PMolMR2.875P
MolWt4.330PSMR_VSA6− 5.399Pfr_Al_COO3.677C
NumValenceElectrons3.988ESMR_VSA99.010Pfr_Ar_OH10.104C
MinEStateIndex− 7.975TSlogP_VSA118.878Pfr_COO3.849C
MinAbsEStateIndex− 10.317TSlogP_VSA22.459Pfr_COO23.849C
MaxPartialCharge8.807ESlogP_VSA32.541Pfr_C_O8.317C
BertzCT6.238CSlogP_VSA5− 2.360Pfr_C_O_noCOO7.579C
Chi04.749PSlogP_VSA84.451Pfr_NH0− 5.262C
Chi0n2.881PTPSA7.602Pfr_Ndealkylation2− 4.482C
Chi0v2.792PEState_VSA13.270Tfr_aldehyde3.849C
Chi14.148PEState_VSA106.841Tfr_allylic_oxid4.613C
HallKierAlpha− 9.187PEState_VSA212.192Tfr_aryl_methyl− 2.603C
Kappa13.852PEState_VSA33.701Tfr_benzene4.053C
Kappa22.233PEState_VSA4− 5.536Tfr_bicyclic2.508C
Kappa32.431PEState_VSA7− 2.701Tfr_ester2.899C
LabuteASA3.827PEState_VSA8− 6.401Tfr_ether2.749C
PEOE_VSA15.858PEState_VSA94.202Tfr_phenol10.104C
PEOE_VSA119.775PVSA_EState96.962Tfr_phenol_noOrthoHbond9.761C
PEOE_VSA123.818PFractionCSP3− 8.698Cfr_piperdine− 3.833C

All the P values were less than 0.05.

aThe types of molecular descriptors: “DF” denotes drug-likeness and molecular fingerprints “P” denotes physicochemical property, “T” denotes topological property, “C” denotes constitutional property, “E” denotes electronic property.

Results of filtering molecular descriptors by t-test. All the P values were less than 0.05. aThe types of molecular descriptors: “DF” denotes drug-likeness and molecular fingerprints “P” denotes physicochemical property, “T” denotes topological property, “C” denotes constitutional property, “E” denotes electronic property.

Establishment of QSAR model

Discriminant analysis was applied to establish QSAR model by incorporating above-mentioned 87 molecular descriptors. Estate_VSA2 and FractionCSP3 were first two included in Model 1 and 2 (Step 1 and 2 in stepwise discriminant analysis), but removed from final model (Final model was obtained in 20th step.). The 16 descriptors included in the final model (P < 0.001) were listed in Table 2 with the sequency of inclusion of model fitting.
Table 2

Results of discriminant analysis.

Independent VariableMolecular descriptorsLambdaPCoefficients of positive samplesCoefficients of negative samples
X1qed0.408 < 0.00147.69162.455
X2MinAbsPartialCharge0.337 < 0.00119.218− 52.885
X3FpDensityMorgan20.177 < 0.001− 67.637− 112.116
X4FpDensityMorgan30.172 < 0.00114.34530.362
X5FpDensityMorgan10.239 < 0.001151.179200.918
X6MinAbsEStateIndex0.337 < 0.00123.98743.389
X7Kappa20.303 < 0.0010.3532.401
X8PEOE_VSA60.229 < 0.0010.4160.820
X9SMR_VSA40.258 < 0.001− 0.458− 0.078
X10SlogP_VSA50.182 < 0.001− 0.481− 0.614
X11VSA_EState90.196 < 0.0012.3531.871
X12NOCount0.211 < 0.001− 9.660− 5.499
X13fr_C_O_noCOO0.248 < 0.001− 10.962− 14.661
X14fr_allylic_oxid0.380 < 0.001− 2.070− 3.966
X15fr_aryl_methyl0.167 < 0.0012.7193.980
X16fr_ester0.275 < 0.001− 7.376-0.809
Results of discriminant analysis. Combined Tables 1 and 2, it could be found that in the final QSAR model, the molecular descriptor of drug-likeness and molecular fingerprints included qed (X1), FpDensityMorgan2 (X3), FpDensityMorgan3 (X4) and FpDensityMorgan1 (X5); those of physicochemical property contained Kappa2 (X7), PEOE_VSA6 (X8), SMR_VSA4 (X9) and SlogP_VSA5 (X10); those of topological property comprised of MinAbsEStateIndex (X6) and VSA_EState9 (X11); those of constitutional property consisted of NOCount (X12), fr_C_O_noCOO (X13), fr_allylic_oxid (X14), fr_aryl_methyl (X15) and fr_ester (X16); and the electronic property was represented by MinAbsPartialCharge (X2). The classification discriminant functions (DF1, DF2) were therefore generated based on estimation of corresponding β values (Table 2). where, y = 1 means the belongingness of positive subset, y = 2 means the belongingness of negative subset. In addition, Fig. 1 illustrated the correlation between molecular descriptors and DPPH radical scavenging activity of phenolic compounds. It was revealed that the antioxidant activity of PCs against DPPH was related to all types of molecular descriptors including drug-likeness and molecular fingerprints, physicochemical, topological, constitutional and electronic property.
Figure 1

Relationship between antioxidant activity and molecular descriptors of PCs.

Relationship between antioxidant activity and molecular descriptors of PCs. As shown in Table 3, the classification results of back-substitution method were as following: 99 cases were positive samples, and 96 cases were determined as positive subset, with the correct discrimination proportion of 96.97%; As for the negative subset, all of the 105 chemical identities were determined negative, with the correct discrimination proportion of 100.00%. Furthermore, the Kappa consistency coefficient of 0.971 suggested high consistence between the predictive classification and actual classification. Jackknife cross-validation was employed to further evaluate the stability of the discriminant functions. The correct discriminant proportions were 95.96% and 99.05% for positive and negative samples, respectively (Table 4). The Kappa consistency coefficient calculated based on jackknife results was 0.951.
Table 3

Classification results of back-substitution method.

Predicted group membershipActual classificationTotal
Positive samples (%)Negative sample (%)
Positive samples96 (96.97%)0 (0%)96
Negative sample3 (3.03%)105 (100.00%)108
Total99105204

.McNemar’s test: P = 0.250.

Table 4

Classification results of jackknife cross-validation method.

Predicted group membershipActual classificationTotal
Positive samples (%)Negative sample (%)
Positive samples95 (95.96%)1 (0.95%)96
Negative sample4 (4.04%)104 (99.05%)108
Total99105204

McNemar’s test: P = 0.375.

Classification results of back-substitution method. .McNemar’s test: P = 0.250. Classification results of jackknife cross-validation method. McNemar’s test: P = 0.375. The results of both validation methods supported the accuracy and robustness of the obtained QSAR model. The heat map further visually validated the established QSAR model. As shown in Fig. 2, the color of the first and third molecular descriptors of the positive samples basically tend to be green, while the color of the negative samples tend to be red. The color of the two kinds of samples was opposite to the former on the fourth and fifth kinds of descriptors. Whereas the second kind of descriptor could distinguish the two kinds of samples effectively.
Figure 2

Heat map of the molecular descriptors matrix.

Heat map of the molecular descriptors matrix.

Discussion

Positive samples were composed of PCs with definite DPPH radical scavenging activity, so it was easier to collect chemical structures with specific IC50/EC50 values in PubMed database. However, it was not applicable to search PCs without anti-DPPH radical activity. Therefore, the strategy of data mining to establish training sets was designed as the positive samples searched in PubMed database and negative samples collected in ChEMBL database. In addition to that the assumption of homoscedasticity and uncorrelated residuals might be violated in positive subset, the negative subset could not be included in modeling fitting of multiple linear regression due to the lack of concrete IC50/EC50 values. Therefore, stepwise linear discriminant analysis was employed to establish more robust QSAR model. In the point of view of drug discovery and development, the potential of molecular druggability demonstrated by positive bioactivity is more important than predicting their IC50/EC50 values. According to the molecular descriptors in the final model, each single type of chemical descriptors was indispensable to predict the DPPH scavenging activity of PCs, since at least one descriptor was included for corresponding type. The application of bond dissociation energy (BDE) on the antioxidant of compounds based on Gaussian methods was extensively reported[25-27]. However, inconsistent results might be found for the same compound when using different algorithm, which impeded the generalization of BDE in terms of the molecular antioxidant prediction. In addition, we had attempted to establish QSAR models with the physicochemical properties and/or electronic properties of studied compounds as independent variables, but failed in obtaining a robust and reliable model. The success of drug development not only depend on molecular bioactivity. The toxicity and pharmacokinetics of the hits also play important role and should be considered[24]. The drug-likeness and molecular fingerprints could well reflect certain similarities of drug in constitutional and physicochemical properties of related to compounds’ absorption, distribution, metabolism, excretion or toxicity (ADMET) properties. Compared to other bioactivity prediction model with single independent variable like BDE and our previous modeling experiments, it could be reasonably projected that establishment of the discriminant functions was based on the RDkit integrated molecular descriptors and possessed more accurate and robust prediction since the valuation of molecular bioactivities have been fully explained by the 5 types of chemical descriptors. Furthermore, the QSAR model established based on the discriminant function could predict the antioxidant activity of compounds relatively effectively, but it could not be judged chemical mechanism of antioxidant of individual compound. However, since the included descriptors contain 5 aspects of indicators, therefore, it could be reasonably assumed that the phenolic compounds with positive antioxidant activity display their bioactivity based on various mechanisms. The error discriminant proportions found in both back-substitution method (0% and 3.03%, Table 3) and Leave-One-Out cross-validation method (4.04% and 0.95%, Table 4) were all less than 10%, hence the goodness of fit of the obtained QSAR model was satisfactory. According to the results shown in Table 3, the validity of established QSAR model was outstanding with Youden index close to 1 (Sensitivity = 0.9697, Specificity = 1). Heat map, a vivid visualization method, was employed to qualitatively and objectively reflect the correlation between inclusive molecular descriptors and corresponding chemicals. Figure 2 illustrated the adequateness of the established QSAR model by manifestly different color patterns between the positive and negative samples. The presented discriminant equations could provide in silico screening for the DPPH scavenging activity of PCs with new structures prior to in vitro and in vivo bioassays, in turn to form more economic and efficient drug development protocol. Inevitably, the study results confined the application to DPPH scavenging activity. To explore molecular antioxidant activities of other types such as superoxide anion radical scavenging activity and ferric ion reducing antioxidant power, it calls for additional investigations based on the same strategy as presented.

Conclusions

The DPPH radical scavenging activity of PCs was related with 5 types of chemical descriptors namely drug-likeness and molecular fingerprints, physicochemical, topological, constitutional and electronic property. The reported model could serve as templates of QSAR for various parent nuclear compounds with different bioactivities. Supplementary Information.
  26 in total

1.  QSAR modeling of carcinogenic risk using discriminant analysis and topological molecular descriptors.

Authors:  Joseph F Contrera; Philip Maclaughlin; Lowell H Hall; Lemont B Kier
Journal:  Curr Drug Discov Technol       Date:  2005-06

2.  Honeybee (Apis cerana) vitellogenin acts as an antimicrobial and antioxidant agent in the body and venom.

Authors:  Hee Geun Park; Kwang Sik Lee; Bo Yeon Kim; Hyung Joo Yoon; Yong Soo Choi; Kyeong Yong Lee; Hu Wan; Jianhong Li; Byung Rae Jin
Journal:  Dev Comp Immunol       Date:  2018-04-03       Impact factor: 3.636

3.  Multivariate QSAR analysis of a skin sensitization database.

Authors:  M T Cronin; D A Basketter
Journal:  SAR QSAR Environ Res       Date:  1994       Impact factor: 3.000

4.  QSAR modeling the toxicity of pesticides against Americamysis bahia.

Authors:  Lu Yang; Yinghuan Wang; Jing Chang; Yifan Pan; Ruojin Wei; Jianzhong Li; Huili Wang
Journal:  Chemosphere       Date:  2020-05-27       Impact factor: 7.086

5.  Bioassay-guided preparative separation of angiotensin-converting enzyme inhibitory C-flavone glycosides from Desmodium styracifolium by recycling complexation high-speed counter-current chromatography.

Authors:  Ying-Qi Zhang; Jian-Guang Luo; Chao Han; Jin-Fang Xu; Ling-Yi Kong
Journal:  J Pharm Biomed Anal       Date:  2014-09-30       Impact factor: 3.935

6.  Effect of crude extract of Bombyx mori coccoons in hyperlipidemia and atherosclerosis.

Authors:  Mir Mahdi Ali; Sarasa Bharati A Arumugam
Journal:  J Ayurveda Integr Med       Date:  2011-04

Review 7.  Computational studies of free radical-scavenging properties of phenolic compounds.

Authors:  Petko Alov; Ivanka Tsakovska; Ilza Pajeva
Journal:  Curr Top Med Chem       Date:  2015       Impact factor: 3.295

8.  Oxidative Stress in Disease and Aging: Mechanisms and Therapies 2016.

Authors:  Claudio Cabello-Verrugio; Felipe Simon; Capucine Trollet; Juan F Santibañez
Journal:  Oxid Med Cell Longev       Date:  2017-01-26       Impact factor: 6.543

9.  Five New Phenolic Compounds with Antioxidant Activities from the Medicinal Insect Blaps rynchopetera.

Authors:  Huai Xiao; Tian-Peng Yin; Jian-Wei Dong; Xiu-Mei Wu; Qing Luo; Jian-Rong Luo; Le Cai; Zhong-Tao Ding
Journal:  Molecules       Date:  2017-08-04       Impact factor: 4.411

10.  Structure of dihydrochalcones and related derivatives and their scavenging and antioxidant activity against oxygen and nitrogen radical species.

Authors:  Alexandre L A Bentes; Rosivaldo S Borges; Waldinei R Monteiro; Luiz G M de Macedo; Cláudio N Alves
Journal:  Molecules       Date:  2011-02-21       Impact factor: 4.411

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.