
Correlation between the structure and skin permeability of compounds.

Ruolan Zeng1, Jiyong Deng2, Limin Dang1, Xinliang Yu3.   

Abstract

A three-descriptor quantitative structure-activity/toxicity relationship (QSAR/QSTR) model was developed for the skin permeability of a sufficiently large data set consisting of 274 compounds, by applying a support vector machine (SVM) together with a genetic algorithm. The optimal SVM model has a coefficient of determination R² of 0.946 and a root mean square (rms) error of 0.253 for the training set of 139 compounds, and an R² of 0.872 and an rms error of 0.302 for the test set of 135 compounds. Compared with other models reported in the literature, our SVM model shows better statistical performance while being validated on a larger test set. Applying an SVM algorithm to develop a nonlinear QSAR model for skin permeability was therefore feasible.


Year:  2021        PMID: 33980965      PMCID: PMC8115152          DOI: 10.1038/s41598-021-89587-5

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Introduction

Modeling the penetration of manmade and naturally derived chemicals through human skin is of great importance for the pharmaceutical and cosmetic industries, as well as for toxicology and the risk assessment of environmental and occupational hazards. Measuring the skin permeability of chemicals experimentally is time-consuming and expensive. Further, there are many ethical challenges associated with human and animal testing for the assessment of skin permeability[1,2]. Quantitative structure–activity/toxicity relationship (QSAR/QSTR) models[3-6] can be used to predict the physicochemical properties of compounds, even of those that have not been synthesized.

Several QSAR studies have addressed the skin permeability of chemicals, expressed as the logarithm of the skin permeability coefficient, log Kp. Patel et al. developed QSAR models for the skin permeability of 158 chemicals with multiple linear regression (MLR) analysis[7]. Their four-descriptor model fits the data well, with a coefficient of determination R² of 0.90. Fujiwara et al. proposed MLR QSARs for the skin permeability of 94 structurally diverse compounds[8]. The models obtained from ten data sets of skin permeability have an average R² of 0.815. Magnusson et al. introduced a regression model (R² = 0.760) for the skin permeability of 269 compounds[9]. They found that molecular weight was the main determinant of log Kp and that the QSAR model improved when other descriptors, such as melting point and hydrogen-bond acceptor capability, were added. Chauhan and Shakya built a QSAR model for skin permeability from a training set of 150 compounds through partial least-squares regression[10]. The model, with an R² of 0.936 for the training set, was validated with a test set of 53 compounds; the root mean square (rms) error and R² for the test set were 0.670 and 0.542, respectively. Xu et al. proposed an expanded version of a linear free-energy relationship model for the skin permeability of complex chemical mixtures[11]. The model (R² = 0.70) showed a better fit and predictive power than the simple model (R² = 0.21). Chen et al. generated an MLR model for skin permeability with four molecular descriptors[12]. The model has an R² of 0.858 for the training set (85 compounds) and 0.839 for the test set (21 compounds), which is acceptable accuracy.

All of the QSAR models cited above were obtained with linear techniques. Generally, nonlinear QSAR models show better statistical performance than linear ones because of the nonlinear correlation between molecular physicochemical properties and structure descriptors. Neely et al. constructed a nonlinear artificial neural network (ANN) model for the skin permeability of 160 molecular structures[13]. The ANN model (10-3-7-1), based on ten descriptors and two hidden layers, had an absolute-average percentage deviation, rms error, and R of 8.0%, 0.34, and 0.93, respectively. Khajeh and Modarress introduced a nonlinear QSAR model for the skin permeability of 283 compounds with an adaptive neuro-fuzzy inference system (ANFIS), a hybrid of an ANN and a fuzzy inference system[14]. The ANFIS model was based on a training set of 225 compounds and validated with a test set of 58 compounds; the R² values for the two sets were 0.899 and 0.890, respectively. The model has good predictive ability, although the data set contains nine duplicated compounds. ANN algorithms may easily fall into a local minimum and suffer from slow convergence[15]. The support vector machine (SVM) algorithm is based on the principle of structural risk minimization. SVMs can effectively avoid local optima and have particular advantages for practical problems involving limited training samples and high-dimensional, nonlinear data.

The aim of this study was to develop a nonlinear SVM QSAR model for the skin permeability of a sufficiently large data set consisting of 274 compounds.

Materials and methods

Khajeh and Modarress reported 283 compounds and their experimental log Kp values[14]. After careful investigation, we found that the sample listed as p-Chlorobenzene should be 1-chloro-4-nitrobenzene and that 4-Chloro-4-phenylenediamine should be 4-Chloro-m-phenylenediamine. There are no counterions or organometallics in the data set. The molecular weights of the 283 compounds were calculated with ChemDraw Ultra 8.0 in ChemOffice 2004. Molecules with the same molecular weight were checked carefully to identify duplicates. Nine compounds appear twice: 4-phenylenediamine (1,4-benzenediamine), 4-hydroxynitrobenzene (4-nitrophenol), methylhydroxybenzoate (methyl 4-hydroxybenzoate), 1,2-benzenediamine (2-phenylenediamine), 2-naphthol (naphthalene-2-ol), 2-nitro-1,4-phenylenediamine (2-nitro-4-phenylenediamine), 1-nonanol (Nonanol), 4-chloro-1,3-phenylenediamine (4-Chloro-m-phenylenediamine), and 1-heptanol (Heptanol). After these duplicates were deleted, 274 compounds remained. Table S1 in “Supplementary Materials” lists their SMILES structures and log Kp values. The skin permeability coefficients Kp are in units of cm/h, and the log Kp values range from −6.10 to −0.76.

The Kennard-Stone algorithm[16] was used to split the compounds into a training set (139 compounds) and a test set (135 compounds). The training set was used to adjust model parameters and train QSAR models, and the test set was used to validate the models. ChemDraw Ultra 8.0 in ChemOffice 2004 was used to generate the structures of the 274 compounds, which were converted into three-dimensional structures with Chem3D Ultra 8.0 and optimized with the semi-empirical AM1 method in MOPAC. Dragon 6.0[17] was used to calculate 4885 molecular descriptors for each compound. After descriptors that were constant or had pairwise correlation coefficients above 0.90 were deleted, 1820 descriptors (including Neoplastic-80) remained for descriptor selection.
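The Kennard-Stone split mentioned above can be illustrated with a minimal sketch. It assumes Euclidean distance in descriptor space, which the text does not specify, and the function name `kennard_stone` and the toy data are hypothetical. The rule starts from the two most distant samples and then repeatedly adds the sample whose nearest already-selected neighbour is farthest away.

```python
import math


def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by the Kennard-Stone rule."""
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(points)")
    # Pairwise Euclidean distances (assumed metric, not stated in the text).
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Start with the two points that are farthest apart.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda ij: dist[ij[0]][ij[1]])
    selected = [i0, j0]
    remaining = [i for i in range(n) if i not in (i0, j0)]
    while len(selected) < n_select:
        # Pick the candidate whose nearest selected neighbour is farthest away.
        nxt = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


# Toy descriptor matrix (made-up values, one row per compound).
descriptors = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
train_idx = kennard_stone(descriptors, 3)
test_idx = [i for i in range(len(descriptors)) if i not in train_idx]
print(train_idx, test_idx)
```

Because the selected samples span the descriptor space, the remaining samples used for testing tend to lie inside the region covered by the training set.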
Stepwise MLR analysis in IBM SPSS Statistics 19 was performed to select the optimal subset of descriptors and develop MLR models. For nonlinear regression, SVM algorithms map the input variables into a high-dimensional feature space, in which linear regression analysis is carried out[18,19]. For the sample data, the regression function is given by Eq. (1). The optimal regression function is obtained by solving the minimization problem of Eq. (2), subject to the constraints of Eqs. (3–4). In SVM regression, the ε-insensitive loss function, Eq. (5), is employed to minimize the training error, which turns Eq. (1) into Eq. (6). By applying a kernel function k(x, y), Eq. (6) can be expressed as Eq. (7). The Gaussian radial basis function (RBF), Eq. (8), was used as the kernel in this work. The SVM parameters C and γ greatly affect prediction performance; both were optimized with the genetic algorithm. In this study, the LibSVM toolbox[20] running on the MATLAB platform was used to develop the models; it can be downloaded freely from https://www.csie.ntu.edu.tw/~cjlin/libsvm/.
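For reference, the textbook ε-insensitive support vector regression formulation that this passage describes can be written as follows. The symbols (w, b, φ, ξ, α) are the conventional ones and are assumptions here; the authors' own notation and equation numbering may differ.

```latex
\begin{align*}
f(\mathbf{x}) &= \langle \mathbf{w}, \varphi(\mathbf{x}) \rangle + b
  && \text{regression function} \\
\min_{\mathbf{w},\, b,\, \xi,\, \xi^{*}} \quad
  & \tfrac{1}{2} \lVert \mathbf{w} \rVert^{2}
    + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^{*} \right)
  && \text{primal problem} \\
\text{s.t.} \quad
  & y_i - f(\mathbf{x}_i) \le \varepsilon + \xi_i,
  \quad f(\mathbf{x}_i) - y_i \le \varepsilon + \xi_i^{*},
  \quad \xi_i,\ \xi_i^{*} \ge 0
  && \text{constraints} \\
L_{\varepsilon}\bigl(y, f(\mathbf{x})\bigr)
  &= \max\bigl(0,\ \lvert y - f(\mathbf{x}) \rvert - \varepsilon \bigr)
  && \varepsilon\text{-insensitive loss} \\
f(\mathbf{x}) &= \sum_{i=1}^{n} \left( \alpha_i - \alpha_i^{*} \right)
  k(\mathbf{x}_i, \mathbf{x}) + b
  && \text{kernel expansion} \\
k(\mathbf{x}, \mathbf{y})
  &= \exp\bigl( -\gamma \lVert \mathbf{x} - \mathbf{y} \rVert^{2} \bigr)
  && \text{Gaussian RBF kernel}
\end{align*}
```

In this form C trades model flatness against training errors larger than ε, and γ sets the width of the RBF kernel, which is why both parameters have a strong effect on prediction performance.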

Results and discussion

After stepwise MLR analysis in IBM SPSS Statistics 19 of the skin permeability log Kp of the 274 compounds against the 1820 descriptors, a three-descriptor QSAR model was obtained, which includes A log P, X3v, and Neoplastic-80. The Ghose–Crippen–Viswanadhan octanol–water partition coefficient (A log P) is based on the A log P model[21] and is calculated by:

$$A \log P = \sum_i n_i a_i \qquad (9)$$

where $n_i$ is the number of atoms of type i and $a_i$ is the corresponding hydrophobicity constant. Previous works have shown that A log P is positively correlated with the skin permeability log Kp. In this work, A log P was converted to a new descriptor, cos²[(4.31 + A log P)/8.66]. A regression of log Kp on cos²[(4.31 + A log P)/8.66] for the 274 compounds resulted in Eq. (10) and its statistical parameters, where n is the number of samples in the training set, R² is the coefficient of determination, R²adj is the adjusted R², se is the standard error of the estimate, and F is the Fisher ratio. Figure 1 shows the correlation between cos²[(4.31 + A log P)/8.66] and log Kp. The descriptor cos²[(4.31 + A log P)/8.66] (or A log P) describes the hydrophobic character of a compound and is related to log Kp.
Figure 1

Plot of the descriptor cos²[(4.31 + A log P)/8.66] versus log Kp, generated by OriginPro 7.5 SR1.

Connectivity indices are used widely in QSARs. They are based on the H-depleted molecular graph, whose vertices are the non-hydrogen atoms, and are correlated with the number of connected non-hydrogen atoms[17]. In the general formula for calculating connectivity indices, Eq. (11), n is the number of vertices; k, an integer ranging from 0 to 5, is the order of the index, with the sum running over all kth-order paths present in the molecular graph; and δ is the vertex degree. Valence connectivity indices (Xkv) account for the presence of heteroatoms in the molecule, as well as of double and triple bonds, by replacing the vertex degree with the valence vertex degree. The valence connectivity index of order 3, X3v, describes molecular size and shape. By correlating log Kp with the two descriptors, cos²[(4.31 + A log P)/8.66] and X3v, we obtained regression Eq. (12). Compared with Eq. (10), the quality of Eq. (12) improved noticeably when the descriptor X3v was added. Figure 2 shows the correlation between the experimental log Kp and the log Kp calculated with Eq. (12). As illustrated in Fig. 2, two samples, ouabain (No. 5 in Table S1) and fluocinonide (No. 11), had larger prediction errors for log Kp. Thus, more molecular descriptors should be added.
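The path connectivity index described above can be sketched in a few lines. This is an illustrative implementation of the usual Kier-Hall path definition (sum over all paths of a given order of the product of vertex degrees raised to −1/2), not Dragon's code; the function name and the n-butane example are assumptions. Passing valence vertex degrees instead of simple degrees would give the valence variant, such as X3v for order 3.

```python
def path_connectivity_index(adjacency, degrees, order):
    """Path connectivity index of the given order for an H-depleted graph.

    adjacency: dict mapping each non-hydrogen atom to its bonded neighbours
    degrees:   dict mapping each atom to its (simple or valence) vertex degree
    order:     number of bonds in each path
    """
    total = 0.0

    def extend(path):
        nonlocal total
        if len(path) == order + 1:
            # Count each undirected path once by keeping one orientation.
            if order == 0 or path[0] < path[-1]:
                product = 1.0
                for atom in path:
                    product *= degrees[atom]
                total += product ** -0.5
            return
        for neighbour in adjacency[path[-1]]:
            if neighbour not in path:
                extend(path + [neighbour])

    for atom in adjacency:
        extend([atom])
    return total


# n-Butane as a hydrogen-depleted graph: C0-C1-C2-C3.
butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
simple_degrees = {0: 1, 1: 2, 2: 2, 3: 1}
chi1 = path_connectivity_index(butane, simple_degrees, 1)
chi3 = path_connectivity_index(butane, simple_degrees, 3)
print(round(chi1, 3), round(chi3, 3))
```

For a saturated hydrocarbon such as n-butane the simple and valence degrees coincide, so the example gives the same value for both variants; heteroatoms and multiple bonds change the valence degrees and therefore the index.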
Figure 2

Plot of experimental versus calculated log Kp with Eq. (12), generated by OriginPro 7.5 SR1.

The descriptor Neoplastic-80, the Ghose–Viswanadhan–Wendoloski antineoplastic-like index at the qualifying range that covers approximately 80% of the drugs studied, depends on A log P and reflects molecular polarity and hydrophobicity[17]. Neoplastic-80 equals 1 for a molecule that has a benzene ring, heterocyclic ring, aliphatic amine, carboxamide group, alcoholic hydroxyl group, carboxy ester and/or keto group when its A log P value is in the range of −1.5 to 4.7, its molar refractivity is 43–128, its molecular weight is 180–470, and its total number of atoms is 21–63; otherwise Neoplastic-80 equals zero. A molecule with a larger Neoplastic-80 value might have a smaller log Kp value. Regression analysis between log Kp of the 274 compounds and the three descriptors stated above resulted in Eq. (13). The correlation coefficient R of 0.945 in Eq. (13) was slightly higher than the 0.942 of the model[13]. Moreover, Eq. (13) predicts the skin permeability log Kp accurately, including for the two samples (Nos. 5 and 11 in Table S1 in “Supplementary Materials”) stated above: Fig. 3 shows no samples with obviously larger errors. When A log P itself, together with X3v and Neoplastic-80, was used to develop the MLR model, the correlation coefficient R was only 0.939, lower than the 0.945 of Eq. (13). Thus the three descriptors, cos²[(4.31 + A log P)/8.66], X3v, and Neoplastic-80, shown in Table S1 in “Supplementary Materials”, were used to develop QSAR models.
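The Neoplastic-80 rule as stated in the text can be written as a small indicator function. This is a sketch of the description above only, not of Dragon's implementation: the function and argument names are hypothetical, the range bounds are assumed to be inclusive, and detection of the qualifying functional groups is left to the caller as a boolean flag.

```python
def neoplastic80(alogp, molar_refractivity, molecular_weight, n_atoms,
                 has_qualifying_group):
    """Return 1 if all qualifying ranges from the text are met, else 0.

    has_qualifying_group: True if the molecule contains at least one of the
    listed groups (benzene ring, heterocyclic ring, aliphatic amine,
    carboxamide, alcoholic hydroxyl, carboxy ester, keto group).
    """
    in_range = (-1.5 <= alogp <= 4.7                    # A log P
                and 43 <= molar_refractivity <= 128     # molar refractivity
                and 180 <= molecular_weight <= 470      # molecular weight
                and 21 <= n_atoms <= 63)                # total atom count
    return 1 if (has_qualifying_group and in_range) else 0


# Made-up property values, chosen only to exercise both outcomes.
print(neoplastic80(2.0, 80.0, 300.0, 40, True))   # all ranges satisfied
print(neoplastic80(2.0, 80.0, 150.0, 40, True))   # molecular weight too low
```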
Figure 3

Plot of experimental versus calculated log Kp with Eq. (13), generated by OriginPro 7.5 SR1.

A correlation analysis between the skin permeability log Kp of the 139 compounds in the training set and the three descriptors resulted in Eq. (14) (i.e., the MLR model), whose coefficients are those listed in Table 1:

$$\log K_p = 2.068 - 6.515\,\cos^2[(4.31 + A \log P)/8.66] - 0.722\,X3v - 0.168\,\text{Neoplastic-80} \qquad (14)$$

The characteristics of the molecular descriptors in the MLR model are listed in Table 1. As can be observed in Table 1, the three descriptors, cos²[(4.31 + A log P)/8.66], X3v, and Neoplastic-80, were all significant and contributed to log Kp, because their significance values (P values) are less than 0.05. In addition, their variance inflation factors (VIF) were far less than ten, suggesting that the three descriptors describe different structural factors affecting the skin permeability log Kp. The t-test can be used to measure the significance of a descriptor's contribution to a molecular physicochemical property: the higher the absolute value of the t statistic, the greater the significance of the descriptor. According to the t-test values in Table 1, the absolute values increase in the sequence X3v (9.750), Neoplastic-80 (14.248), and cos²[(4.31 + A log P)/8.66] (31.625), and the significance of the descriptors increases in the same sequence.
Table 1

Characteristics of molecular descriptors in MLR model.

Descriptor                     Coefficient   Std. error   t-test     P-value   VIF
Constant                       2.068         0.145        14.221     0.000
cos²[(4.31 + A log P)/8.66]    −6.515        0.206        −31.625    0.000     1.102
X3v                            −0.722        0.074        −9.750     0.000     1.420
Neoplastic-80                  −0.168        0.012        −14.248    0.000     1.442
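As an illustration of how the coefficients in Table 1 combine into a prediction, the sketch below evaluates the MLR model for given descriptor values. The function names are hypothetical, the cosine argument is assumed to be in radians, and the descriptor values in the example are invented.

```python
import math


def cos2_alogp(alogp):
    """Transformed hydrophobicity descriptor cos^2[(4.31 + A log P)/8.66]."""
    return math.cos((4.31 + alogp) / 8.66) ** 2   # argument assumed in radians


def mlr_log_kp(alogp, x3v, neoplastic80):
    """Predicted log Kp (Kp in cm/h) from the Table 1 coefficients."""
    return (2.068
            - 6.515 * cos2_alogp(alogp)
            - 0.722 * x3v
            - 0.168 * neoplastic80)


# Hypothetical descriptor values, chosen only to exercise the function.
example = mlr_log_kp(alogp=2.0, x3v=1.0, neoplastic80=0.0)
print(round(example, 3))
```

With this transformation, the predicted log Kp increases with A log P as long as (4.31 + A log P)/8.66 stays between 0 and π/2, consistent with the positive correlation between A log P and log Kp noted in the text.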
The MLR model was further used to predict the skin permeability log Kp of the 135 compounds in the test set. The correlation coefficient R for the test set was 0.928. The rms errors for the training set, test set and total set were 0.343, 0.302, and 0.323, respectively. The predicted log Kp values are illustrated in Fig. 4 and listed in Table S1 in “Supplementary Materials”.
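The two statistics quoted here, the rms error and the correlation coefficient R, can be computed as in the following sketch. It assumes the rms error divides by the number of samples n rather than by the degrees of freedom, which the text does not state; the function names and sample values are illustrative.

```python
import math


def rms_error(observed, predicted):
    """Root mean square error, dividing by the number of samples (assumed)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    return s_op / math.sqrt(s_oo * s_pp)


# Made-up log Kp values for three compounds.
observed = [-1.0, -2.0, -3.0]
predicted = [-1.5, -2.0, -2.5]
print(round(rms_error(observed, predicted), 3), round(pearson_r(observed, predicted), 3))
```

The example also shows why both statistics are reported: the predictions are perfectly correlated with the observations, yet the rms error is not zero.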
Figure 4

Plot of experimental versus predicted log Kp with Eq. (14), generated by OriginPro 7.5 SR1.

The three molecular descriptors in Eq. (14) were used as input variables to develop SVM models for the skin permeability log Kp from the training set of 139 compounds, using the LibSVM toolbox on the MATLAB R2014a platform. A genetic algorithm was adopted to optimize the SVM parameters C and γ under the following conditions: the search range of C was [0, 1000], the search range of γ was [0, 10], the m in m-fold cross-validation was 5, the maximum number of generations was 200, the maximum population size was 20, and the ε in the ε-insensitive loss function was 0.001. The optimization gave C = 7.2906 and γ = 1.7200, with an internal correlation coefficient of 0.82 based on the leave-one-out (LOO) cross-validation method. The optimal SVM model was further validated with the test set of 135 compounds. The SVM prediction results are listed in Table S1 in “Supplementary Materials” and illustrated in Fig. 5. The coefficient of determination R² and rms error were 0.946 and 0.253 for the training set of 139 compounds, 0.872 and 0.302 for the test set of 135 compounds, and 0.925 and 0.270 for the total set. The rms errors of the SVM model for the training set and total set (0.253 and 0.270) were lower than those of Eq. (14), the MLR model in this study (0.343 and 0.323), while the test-set rms error was the same (0.302) for both models. Therefore, there were nonlinear relationships between the skin permeability log Kp and the molecular descriptors used.
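The parameter search described above can be sketched with a simple real-coded genetic algorithm. This is not the authors' MATLAB code: the operators (elitism, tournament selection, blend crossover, Gaussian mutation) and the mutation rate are assumptions. The function `toy_cv_error` is a hypothetical stand-in for the 5-fold cross-validated error of an SVM trained with a given C and γ. Only the search ranges, population size, and generation count come from the text.

```python
import random

C_RANGE = (0.0, 1000.0)     # search range for C stated in the text
GAMMA_RANGE = (0.0, 10.0)   # search range for gamma stated in the text


def clip(value, bounds):
    low, high = bounds
    return max(low, min(high, value))


def genetic_search(fitness, pop_size=20, generations=200, seed=0):
    """Minimise fitness(C, gamma) with a simple real-coded GA."""
    rng = random.Random(seed)
    pop = [(rng.uniform(*C_RANGE), rng.uniform(*GAMMA_RANGE))
           for _ in range(pop_size)]
    best = min(pop, key=lambda ind: fitness(*ind))
    history = [fitness(*best)]
    for _ in range(generations):
        children = [best]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            # Tournament selection of two parents.
            p1 = min(rng.sample(pop, 3), key=lambda ind: fitness(*ind))
            p2 = min(rng.sample(pop, 3), key=lambda ind: fitness(*ind))
            # Blend crossover: random convex combination of the parents.
            w = rng.random()
            c = w * p1[0] + (1 - w) * p2[0]
            g = w * p1[1] + (1 - w) * p2[1]
            # Gaussian mutation with an assumed 20% probability.
            if rng.random() < 0.2:
                c += rng.gauss(0, 0.05 * (C_RANGE[1] - C_RANGE[0]))
                g += rng.gauss(0, 0.05 * (GAMMA_RANGE[1] - GAMMA_RANGE[0]))
            children.append((clip(c, C_RANGE), clip(g, GAMMA_RANGE)))
        pop = children
        best = min(pop, key=lambda ind: fitness(*ind))
        history.append(fitness(*best))
    return best, history


def toy_cv_error(c, gamma):
    # Hypothetical smooth error surface with an arbitrary minimum at (100, 2);
    # a real run would train an SVM and return its cross-validated error.
    return ((c - 100.0) / 1000.0) ** 2 + ((gamma - 2.0) / 10.0) ** 2


best, history = genetic_search(toy_cv_error)
print(best, history[-1])
```

Keeping the best individual in every generation means the best error found never gets worse, which is convenient when each fitness evaluation is an expensive cross-validation run.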
Figure 5

Plot of experimental versus predicted log Kp with SVM model, generated by OriginPro 7.5 SR1.

The SVM model was further evaluated with the criteria of Golbraikh and Tropsha[22]. These criteria involve the external correlation coefficient; R0² and R0′², the determination coefficients of the predicted vs. the observed values and of the observed vs. the predicted values, respectively; k and k′, the slopes of the regression lines of the predicted vs. the observed values and of the observed vs. the predicted values; the average value of the training set; and the observed and predicted activities. Our SVM model satisfied the validation criteria[22,23]. The coefficient of determination R² (= 0.946) in this study is higher than the R² values of 0.90[7], 0.815[8], 0.760[9], 0.936[10], 0.70[11], 0.858[12], and 0.93[13]. In addition, the rms errors of the training set, test set and total set from the ANFIS model of Khajeh and Modarress, which dealt with the 283 samples, were 0.318, 0.308, and 0.316, respectively[14], which are greater than the rms errors (0.253, 0.302, and 0.270, respectively) from our SVM model. Compared with other models reported in the literature[9-14], our SVM model shows better statistical performance while being validated on a larger test set.
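A check of this kind can be sketched as follows. The thresholds used (R² > 0.6, a relative difference between R² and R0² or R0′² below 0.1, and a slope between 0.85 and 1.15) are the commonly cited form of the Golbraikh and Tropsha criteria and are assumptions here, as are the through-origin formulas for k, k′, R0² and R0′²; the exact expressions used by the authors may differ. The function name and sample values are hypothetical.

```python
def golbraikh_tropsha(observed, predicted):
    """Evaluate commonly cited external-validation criteria (assumed form)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    pairs = list(zip(observed, predicted))
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in pairs)
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    r2 = s_op ** 2 / (s_oo * s_pp)

    # Slopes of the regression lines forced through the origin.
    cross = sum(o * p for o, p in pairs)
    k = cross / sum(o * o for o in observed)          # predicted ~ k * observed
    k_prime = cross / sum(p * p for p in predicted)   # observed ~ k' * predicted

    # Determination coefficients of the through-origin regressions.
    r0_sq = 1 - sum((p - k * o) ** 2 for o, p in pairs) / s_pp
    r0p_sq = 1 - sum((o - k_prime * p) ** 2 for o, p in pairs) / s_oo

    checks = {
        "r2 > 0.6": r2 > 0.6,
        "relative R0 difference < 0.1":
            (r2 - r0_sq) / r2 < 0.1 or (r2 - r0p_sq) / r2 < 0.1,
        "slope within 0.85-1.15":
            0.85 <= k <= 1.15 or 0.85 <= k_prime <= 1.15,
    }
    return {"r2": r2, "k": k, "k_prime": k_prime, "r0_sq": r0_sq,
            "r0p_sq": r0p_sq, "checks": checks, "passed": all(checks.values())}


# Made-up observed and predicted log Kp values.
observed = [-1.0, -2.0, -3.0, -4.0]
predicted = [-1.1, -1.9, -3.2, -3.9]
result = golbraikh_tropsha(observed, predicted)
print(result["passed"], round(result["r2"], 3), round(result["k"], 3))
```

The slope conditions matter because a high R² alone does not guarantee agreement: predictions that are perfectly but inversely or disproportionately correlated with the observations would still have R² near 1.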

Conclusions

A three-descriptor SVM model with SVM parameters C of 7.2906 and γ of 1.7200 was successfully built, by means of a genetic algorithm, for the skin permeability log Kp of a sufficiently large data set consisting of 274 compounds. The SVM model has rms errors of 0.253 for the training set (139 compounds), 0.302 for the test set (135 compounds), and 0.270 for the total set (274 compounds). Compared with other QSARs for the skin permeability log Kp reported in the literature, our SVM model shows better statistical performance while being validated on a larger test set. There were nonlinear relationships between the skin permeability log Kp and the molecular descriptors used. Applying an SVM algorithm to develop a nonlinear QSAR model for skin permeability was therefore reasonable.

Supplementary Table S1.
  13 in total

1.  Quantitative structure-activity relationships (QSARs) for the prediction of skin permeation of exogenous chemicals.

Authors:  Hiren Patel; Wil ten Berge; Mark T D Cronin
Journal:  Chemosphere       Date:  2002-08       Impact factor: 7.086

2.  Modelling skin permeability in risk assessment--the future.

Authors:  D Fitzpatrick; J Corish; B Hayes
Journal:  Chemosphere       Date:  2004-06       Impact factor: 7.086

3.  Molecular size as the main determinant of solute maximum flux across the skin.

Authors:  Beatrice M Magnusson; Yuri G Anissimov; Sheree E Cross; Michael S Roberts
Journal:  J Invest Dermatol       Date:  2004-04       Impact factor: 8.551

4.  Predicting skin permeability from complex chemical mixtures: incorporation of an expanded QSAR model.

Authors:  G Xu; J M Hughes-Oliver; J D Brooks; R E Baynes
Journal:  SAR QSAR Environ Res       Date:  2013-06-14       Impact factor: 3.000

5.  Linear and nonlinear quantitative structure-property relationship modelling of skin permeability.

Authors:  A Khajeh; H Modarress
Journal:  SAR QSAR Environ Res       Date:  2013-10-03       Impact factor: 3.000

6.  Deep learning driven QSAR model for environmental toxicology: Effects of endocrine disrupting chemicals on human health.

Authors:  SungKu Heo; Usman Safder; ChangKyoo Yoo
Journal:  Environ Pollut       Date:  2019-07-06       Impact factor: 8.071

7.  Nonlinear quantitative structure-property relationship modeling of skin permeation coefficient.

Authors:  Brian J Neely; Sundararajan V Madihally; Robert L Robinson; Khaled A M Gasem
Journal:  J Pharm Sci       Date:  2009-11       Impact factor: 3.534

8.  Predicting chemically-induced skin reactions. Part II: QSAR models of skin permeability and the relationships between skin permeability and skin sensitization.

Authors:  Vinicius M Alves; Eugene Muratov; Denis Fourches; Judy Strickland; Nicole Kleinstreuer; Carolina H Andrade; Alexander Tropsha
Journal:  Toxicol Appl Pharmacol       Date:  2015-01-03       Impact factor: 4.219

9.  Evaluating Molecular Properties Involved in Transport of Small Molecules in Stratum Corneum: A Quantitative Structure-Activity Relationship for Skin Permeability.

Authors:  Chen-Peng Chen; Chan-Cheng Chen; Chia-Wen Huang; Yen-Ching Chang
Journal:  Molecules       Date:  2018-04-15       Impact factor: 4.411

10.  Prediction of Depuration Rate Constants for Polychlorinated Biphenyl Congeners.

Authors:  Xinliang Yu
Journal:  ACS Omega       Date:  2019-09-12