
Quantitative Structure Activity Relationship Models for the Antioxidant Activity of Polysaccharides.

Zhiming Li1, Kaiying Nie1, Zhaojing Wang1, Dianhui Luo1.   

Abstract

In this study, quantitative structure-activity relationship (QSAR) models for the antioxidant activity of polysaccharides were developed with the 50% effective concentration (EC50) as the dependent variable. To establish optimum QSAR models, multiple linear regression (MLR), support vector machines (SVM) and artificial neural networks (ANN) were used, and 11 molecular descriptors were selected. The optimum QSAR model for predicting EC50 of DPPH-scavenging activity consisted of four major descriptors. The MLR model gave EC50 = 0.033Ara - 0.041GalA - 0.03GlcA - 0.025PC + 0.484 and fitted the training set with R = 0.807. The ANN model improved on this for both the training set (R = 0.96, RMSE = 0.018) and the test set (R = 0.933, RMSE = 0.055), which indicated that it was more accurate than the SVM and MLR models for predicting the DPPH-scavenging activity of polysaccharides. Sixty-seven compounds were used for predicting EC50 of the hydroxyl radical scavenging activity of polysaccharides. The MLR model gave EC50 = 0.12PC + 0.083Fuc + 0.013Rha - 0.02UA + 0.372. A comparison of the models indicated that the ANN model (R = 0.944, RMSE = 0.119) was also the best one for predicting the hydroxyl radical scavenging activity of polysaccharides. The MLR and ANN models showed that Ara and GalA appeared critical in determining EC50 of DPPH-scavenging activity, and that Fuc, Rha, uronic acid and protein content had a great effect on the hydroxyl radical scavenging activity of polysaccharides. The antioxidant activity of polysaccharides was usually high in the MW range of 4000-100000, and it could be affected simultaneously by other polysaccharide properties, such as uronic acid and Ara.


Year:  2016        PMID: 27685320      PMCID: PMC5042491          DOI: 10.1371/journal.pone.0163536

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

During normal metabolism, oxygen and non-oxygen free radicals are continuously produced, and low concentrations of free radicals play a crucial role in regular physiological functions [1-5]. However, many diseases and conditions, such as cardiovascular diseases, diabetes, aging and cancer, can be induced by unregulated overproduction of free radicals [6-8]. Thus, it is essential to develop natural and effective antioxidants [9]. Previous reports revealed that many natural polysaccharides possess potent free radical scavenging activities and can be used as potential antioxidants [10-11]. Because suitable data sources are lacking, it is rarely possible to obtain a large quantity of experimental data, and so the relationship between the bioactivities and the properties of polysaccharides has seldom been studied with predictive models [12]. The quantitative structure-activity relationship (QSAR) model, which uses relevant molecular physico-chemical properties to predict important treatment responses, is considered an alternative to experimental evaluation [13]. It has gained increasing attention, and a variety of QSAR methods have been developed for water treatment process selection, membrane separation, adsorption, etc. [14-15]. To date, QSAR models for predicting the bioactivities of polysaccharides have seldom been developed. One study reported the relationship between monosaccharide composition ratio and macrophage stimulatory activity using a predictive modeling approach [12]. To provide theoretical support for applications of polysaccharides from natural products, the main aim of this work was to establish reliable soft measurement models to predict performance and to study the relationship between polysaccharide properties and antioxidant activities by QSAR.
In our QSAR studies, the multiple linear regression (MLR) method and two nonlinear methods, artificial neural network (ANN) and support vector machine (SVM), were used.

Materials and Methods

Data set

Published studies have shown that the antioxidant activity of polysaccharides is related to many factors, including monosaccharide composition [16], uronic acid (UA), molecular weight (MW), protein content (PC) and sulfate group content [17]. In the data selection, we chose natural purified polysaccharides without sulfate groups to study QSAR models for predicting antioxidant activities of polysaccharides. A diverse set of polysaccharides and their antioxidant activities was collected from published papers [18-45]. Antioxidant activities of polysaccharides were represented by the 50% effective concentration (EC50). To set up a more reliable model, we selected 141 compounds. The detailed publication lists with the corresponding antioxidant activities and compounds are given in the data tables. The parameters were normalized by taking the base-2 logarithm, and MW was divided by 10000 in the normalization process. A training data set was used to develop each model. A test set, which was never included during model development, was used to validate the predictive power of the model [46-47]. The training set and test set were chosen at random.
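The random partition described above can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function name `random_split`, the fixed seed and the use of Python's `random` module are assumptions, and the sizes (74 compounds, 22 of them held out) are those reported for the DPPH data set in the Results.

```python
import random


def random_split(n_compounds, n_test, seed=0):
    """Randomly partition compound numbers 1..n_compounds into train and test lists."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is reproducible
    numbers = list(range(1, n_compounds + 1))
    test = sorted(rng.sample(numbers, n_test))  # held-out compounds, drawn without replacement
    held_out = set(test)
    train = [c for c in numbers if c not in held_out]  # the rest is used for fitting
    return train, test


train, test = random_split(n_compounds=74, n_test=22)
print(len(train), len(test))  # 52 22
```

The test compounds are drawn once and never touched during fitting, which is what allows them to serve as an independent check of predictive power.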

Descriptors

The structure of a polysaccharide is complex and can be represented by a variety of descriptors. However, polysaccharides are composed mainly of monosaccharides joined together by glycosidic bonds, and this composition is essential to their bioactivities, so we used monosaccharide composition as descriptors. The following descriptors of monosaccharide composition were considered for modeling EC50 values in the MLR, ANN and SVM analyses: rhamnose (Rha), arabinose (Ara), mannose (Man), glucose (Glc), galactose (Gal), fucose (Fuc), xylose (Xyl), ribose (Rib), glucuronic acid (GlcA) and galacturonic acid (GalA). Usually, gas chromatography (GC) and high-performance liquid chromatography (HPLC) were performed for the identification and quantification of monosaccharide composition. In HPLC analysis, GlcA and GalA could not be identified. In that case, total uronic acid (UA) could be determined by other methods, such as the sulfuric acid carbazole method, so UA was also used as a descriptor in our models. PC and MW were also adopted as descriptors. STATISTIC.10 was used to establish the SVM, MLP and ANN models, and the figures were drawn using RStudio (Version 0.99.902, 2009-2016 RStudio, Inc.).

Linear model generation

There are two main approaches for choosing a descriptor subset in MLR: filter methods and wrapper methods. A filter method ranks and filters the descriptors to generate the best subset before training. In a wrapper method, by contrast, the learning algorithm is wrapped into the selection procedure [48]. In MLR, we used the wrapper method around the target learning algorithm. Only the training data set was used for selecting descriptors. First, we employed a bidirectional search, that is, a combination of forward and backward search. Then we assessed the selected descriptors on the target learning algorithm. In the learning process, we used 10-fold cross-validation. In the stepwise MLR analysis, we selected the training descriptor sets and then established a linear model [49].

Artificial neural network and Support vector machines

An artificial neural network (ANN) is appropriate for modeling nonlinear relationships. Many reviews of ANN research and its application in QSAR studies are available [49-51]. In this study, we employed a multi-layer perceptron (MLP) [52] in the form of a three-layer back-propagation (BP) network. The back-propagation ANN used supervised learning, and the network was trained by minimizing the squared error of its output. The first step in training the model was to fix the number of layers and the number of neurons in each layer. The second step was to optimize the learning rate and the momentum parameter. The input layer was composed of eleven neurons, one for each of the eleven descriptors chosen. The output layer had one neuron, i.e. the EC50 value of the antioxidant activity. The logistic function was applied in all layers. In the hidden layer, we varied the number of neurons to obtain the lowest RMSE and the highest correlation coefficient. We used 30% of the training data set for verification, which served to prevent overfitting. All optimization was carried out with 10-fold cross-validation [53]. Support vector machines (SVM) were originally developed for classification problems and have since been used to solve nonlinear regression estimation. SVM has demonstrated much success in QSAR and quantitative structure-property relationship (QSPR) studies [54-57]. We selected the epsilon-SVM method, which is the most commonly used in QSPR and QSAR studies, and optimized the value of the kernel parameter g (gamma) [53].
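The network layout described above (eleven inputs, one hidden layer, one output, logistic activations throughout) can be sketched as a forward pass. This is an illustrative sketch only: the weights are hypothetical random values rather than trained ones, the hidden size of 7 is the value reported later for the DPPH model, and the function names are assumptions. Because a logistic output lies in (0, 1), the EC50 targets would have to be scaled into that range; the text does not state how this was done.

```python
import math
import random


def logistic(z):
    """Logistic (sigmoid) activation, applied in every layer."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of an input -> hidden -> output network with logistic units."""
    hidden = [
        logistic(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return logistic(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


N_INPUTS, N_HIDDEN = 11, 7  # 11 descriptors; 7 hidden neurons (DPPH model)
rng = random.Random(0)      # hypothetical initial weights, not trained values
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
b_hidden = [0.0] * N_HIDDEN
w_out = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
b_out = 0.0

descriptors = [0.0] * N_INPUTS  # placeholder input vector (assumed already normalized)
output = forward(descriptors, w_hidden, b_hidden, w_out, b_out)
print(0.0 < output < 1.0)  # True: a logistic output always lies in (0, 1)
```

Training would then adjust the weights to minimize the squared error between this output and the scaled EC50 value, which is the step the back-propagation procedure performs.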

Validation techniques and model performance evaluation

We used a 10-fold cross-validation technique. This procedure divided the data set into 10 folds (groups), built the model using 9 of the folds, and tested it on the remaining fold. The procedure was repeated until each of the 10 folds had served as the test group. The root mean square error (RMSE) was calculated for each fold, averaged, and then used to evaluate the predictive performance of the three models.
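The fold rotation and RMSE averaging described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the "model" is a mean-only placeholder standing in for MLR, ANN or SVM, and the sample values are the experimental EC50 values listed for compounds 1-20 of the DPPH data set.

```python
import math
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def cross_validated_rmse(y, k=10, seed=0):
    """Average held-out RMSE over k folds for a mean-only placeholder model."""
    scores = []
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train_y = [v for i, v in enumerate(y) if i not in held_out]
        prediction = sum(train_y) / len(train_y)  # placeholder "model": training mean
        scores.append(rmse([y[i] for i in fold], [prediction] * len(fold)))
    return sum(scores) / len(scores)


# Experimental EC50 values of compounds 1-20 (DPPH data set).
ec50 = [1.104, 0.794, 0.412, 2.97, 1.98, 0.12, 0.31, 0.21, 0.88, 0.697,
        1.19, 0.47, 0.86, 1.27, 3.92, 0.878, 0.169, 0.048, 2.3, 1.08]
print(round(cross_validated_rmse(ec50), 3))
```

Each sample is held out exactly once, so the averaged RMSE reflects performance on data the model did not see during fitting.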

Results and Discussion

Models for the DPPH scavenging activity of polysaccharides

The data were divided at random into two parts, a training set and a test set. The entire data set included 74 compounds. A test set of 22 compounds was chosen randomly from it, and the remaining 52 compounds were used as the training set. Compounds number 4, 5, 7, 9, 17, 20, 25, 30, 31, 32, 33, 34, 36, 38, 44, 54, 57, 62, 63, 64, 69 and 73 were selected as the test set, and the rest of the compounds formed the training set. The test set and training set are given in Table 1. The data distribution of the parameters is shown in Fig 1. The data distribution was uniform, and no other single variable had a distribution close to that of the EC50 values (-6, 2). The shapes of the data distributions of EC50 and Ara were similar, which indicated that there was a certain relation between them. Apart from MW, all other quantities were components of the polysaccharides, so the relationship with MW was modeled separately.
Table 1

Polysaccharides data set with descriptors and EC50 values of the DPPH scavenging activity.

No | Name(a) | Rha(b) | Ara(c) | Man(d) | Glc(e) | Gal(f) | Fuc(g) | Xyl(h) | GlcA(i) | GalA(j) | UA(k) | PC(l) | EC50 | Refs
1S1 (Glu)0022.1328.3313.89031.484.0304.0301.10418
2S1 (Visco)0063.0110.3716.0604.75.8705.8700.79418
3GBP50S246.7042.2011.10000000.41219
4LBP-s80062.913.22.912.53.84.70017.070.692.9720
5LBP-s75056.819.23.610.24.95.30035.691.31.9820
6WB10012.559.49.7611.76.6400000.1221
7WB2007.72624.728.413.200000.3121
8WB3009.718.624.529.917.300000.2121
9IOP404.44.29.84014.53.39.69.74.652.40.8822
10IOP609.75.6832.212.6022.94.74.42.23.20.69722
11IOP80115.69.731.38.5025.32.83.91.54.61.1922
12FUP-10009.816.78083.4100000.4723
13CLP-23.32.114.5482804.10023.591.480.8624
14CLP-3008.65629.4060017.060.951.2724
15TYAP-1078.9805.7410.604.6800003.9225
16PV-P1024.21.98.39.7055.90.83.53.41.220.87826
17PV-P23.615.714.41621.6028.70.35.45.74.220.16926
18PV-P36.116.516.111.213.3036.80.27.98.17.090.04826
19Control-EPS6.114.620.420.724.201400019.752.327
20Control-IPS13.17.22836.7190600017.571.0827
21Tween 80-IPS13.32.46.973.411.402.600016.720.7427
22Tween 80-IPS21.45.218.360.912.102.100015.610.8427
23CPSI00277300000000.2328
24G110.91.26.252.514.9014.3004.26.490.3429
25G212.20.84.95616.209.9007.455.110.5629
26G312.23.83.250.212.5018.1001.923.620.8729
27P111.430.31.59.244.403.200000.6230
28P210.422.13.111.253.10000001.0730
29CP1.215.67.528.224.705.44.812.617.47.570.0931
30SCG019.934.4315.3760.270000000.732
31PNMP205.7828.6214.4241.577.242.3700000.329716
32PNMP303.4526.5821.5536.428.443.5600000.151616
33GLP60003.285.98.21.5000000.978
34GLP80009.479.45.41.1000000.728
35GLP004.886.56.11.2000000.98
36LLPs-D6.832.739.219.2358.190.573.2500000.3833
37LLPs-L5.0319.396.0722.8237.457.042.2100000.9933
38SMWP-1002734110280000.530.1334
39EAP40-12.6303646.7914.58000000.330.2835
40EAP60-13.372.282.8943.6137.67010.180000.480.5235
41CMP-14.20095.800000001.1536
42GPA10.421.210.613.827.52.2014.89.523.043.750.0837
43GPA20.815.68.21821.41.61.61814.832.794.380.0637
44GPA33.87.56.334.316.31.33.124.33.127.015.530.0337
45Ac-CP12.516.45.617.627.501.82.62615.787.250.0644838
46Ac-CP22.815.3614.326.2022.930.525.996.930.0782938
47Ac-CP32.215.65.69.829.901.13.432.427.437.090.0780438
48CP1.215.67.528.124.805.44.812.616.147.570.0983738
49WFPs5.218.53.515.921.307.43.324.928.100.00739
50APs-2-14.68032.324.2021.109.89.81.90.454540
51APs-3-11.52.8035.134016.727.97.91.30.224340
52PTPS-36.8226.2213.8310.2339.343.210.350040.6613.271.7241
53PTPS-515.9820.8415.296.0840.331.680.150040.4419.961.4541
54PSS-EPS8.27.72435.315.409.400020.191.49742
55UKLOxa5.510.26.111.328.40.37.226.54.53100.054643
56UKLK13.66.15.810.99.41.85.036.52.99.400.13643
57UKLK42.56.62.68.36.61.164.47.10.87.900.602343
58UKSOxa5.69.911.818.716.90.49.424.3327.300.016543
59UKSOxa-PG8.615.77.112.426.10.515.410.2414.200.375143
60UKSK13.541.96.23.60.776.52.80.82.600.17743
61UKSK43.68.73.78.36.20.9661.90.72.600.003843
62PMBOxa8.811.99.523.818.30.78.416.32.318.600.021743
63PMBOxa-PG6.712.410.326.625.50.86.7831100.14443
64PMBK14.813.34.817.213.71.537.55.31.97.200.314343
65PMBK42.418.42.39.98.92.651.930.63.600.654743
66AMBOxa10.914.54.225.914.90.56.116.86.22300.018443
67AMBOxa-PG17.422.645.916.60.67.120.8525.800.353343
68AMBK124.61.932.210.71.844.21.51.12.600.109343
69AMBK43.227.21.317.16.52.140.61.30.7201.420343
70PS10.790.6960.5132.662.352.98000001.2144
71PS210.965.8136.1626.9214.554.521.0400000.7344
72PS348.5510.737.3511.4113.854.623.4500000.6744
73WKCP-N02.22091.955.830000000.6145
74WKHP-N012.9073.711001.342.4503.201.0845

(a) name from reference; (b) rhamnose; (c) arabinose; (d) mannose; (e) glucose; (f) galactose; (g) fucose; (h) xylose; (i) glucuronic acid; (j) galacturonic acid; (k) uronic acid; (l) protein content

Fig 1

Data distribution of parameter.


MLR results

In this study, the training data set of 52 compounds was used. A stepwise linear regression analysis was used to determine the relationship between the dependent variable EC50 and the independent variables uronic acid (UA), protein content (PC) and monosaccharide composition (Rha, Ara, Man, Glc, Gal, Fuc, Xyl, GlcA and GalA). The regression analysis was implemented with forward stepwise selection. In this procedure, the independent variable most correlated with the dependent variable is chosen first, and then the independent variable most correlated with the remaining variance in the dependent variable is selected. Independent variables were added in this way until R-squared (R2) no longer changed, at a significance of at least 80%. Accordingly, the variables Ara, GalA, GlcA and PC were included in the regression model. The relationship between the matrix of parameters and EC50 is shown in Fig 2. In this matrix scatter plot, one variable is used as the abscissa and another as the ordinate, and all points are plotted. The diagonal shows that the distributions of the data were all similar in shape. Fig 3 shows the correlation between the model parameters and EC50; the proportions for Ara, GalA and GlcA were 0.51, 0.39 and 0.35, respectively, which indicated that they had the greatest effect on EC50. Fig 3 also shows that EC50 had a positive correlation with Ara and PC and a negative correlation with GalA and GlcA, which was consistent with the model given in the equation. The regression equation (Eq 1), obtained through the statistical analysis, was as follows:

EC50 = 0.033Ara - 0.041GalA - 0.03GlcA - 0.025PC + 0.484 (1)

Because the effect of UA on EC50 was small, UA was not added to the model equation. The linear model selected four major relevant descriptors and gave a stable model with R = 0.807 and RMSE = 0.423.
Fig 2

Correlation between the matrix of parameters and EC50 value of the DPPH scavenging activity.

Fig 3

Proportion of the parameters affecting the EC50 value of the DPPH scavenging activity.

In the model, the R value was 0.807 (p < 0.001), the fit indicators of the model were acceptable, the model was consistent with the data structure, and Ara, GalA, GlcA and PC were significantly correlated with EC50. The EC50 values of the training and test sets predicted with the MLR equation are given in Table 2. Predicted and experimental EC50 values for the two sets are plotted in Fig 4. Most of the data were distributed from 0 to 1.5, and some negative predicted values appeared in the lower left corner. The experimental values corresponding to these negative predictions were between 0 and 0.2, which was acceptable. The points were distributed on both sides of the fitted line, and most points of the test set fell among those of the training set, which illustrated that the multiple regression model established on the training set predicted the values of the test set well. The linear model was applied to the 22 compounds of the test set, which were never used in model building. The result showed R = 0.872, RMSE = 0.361 and p = 1.245E-7, which indicated a significant correlation. MLR established the relationship between the dependent variable EC50 and the independent variables of polysaccharide properties. The results showed that the statistics for the MLR equation were good, and the equation also offered some insight into how polysaccharide properties influence the DPPH-scavenging activity of polysaccharides.
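As a worked illustration of how the MLR equation produces a prediction and a residue, the sketch below evaluates the DPPH equation stated in the Abstract (EC50 = 0.033Ara - 0.041GalA - 0.03GlcA - 0.025PC + 0.484) for compound 3 (GBP50S2), whose Ara, GalA, GlcA and PC values in Table 1 are all zero. The function and variable names are hypothetical, and the coefficients are the rounded ones printed in the paper, so small differences from Table 2 (which lists 0.483627 for this compound) are expected.

```python
# Rounded coefficients of the DPPH MLR equation as printed in the Abstract.
DPPH_COEFFICIENTS = {"Ara": 0.033, "GalA": -0.041, "GlcA": -0.03, "PC": -0.025}
DPPH_INTERCEPT = 0.484


def predict_ec50_dpph(descriptors):
    """Evaluate the linear model; descriptors missing from the dict count as 0."""
    return DPPH_INTERCEPT + sum(
        coef * descriptors.get(name, 0.0) for name, coef in DPPH_COEFFICIENTS.items()
    )


# Compound 3 (GBP50S2): the four model descriptors are all zero in Table 1.
gbp50s2 = {"Ara": 0.0, "GalA": 0.0, "GlcA": 0.0, "PC": 0.0}
experimental = 0.412

predicted = predict_ec50_dpph(gbp50s2)
residue = experimental - predicted  # Table 2 uses residue = experimental - predicted
print(round(predicted, 3), round(residue, 3))  # 0.484 -0.072
```

A compound with none of the four descriptors present is predicted at the intercept, which is why several rows of Table 2 share the same MLR prediction.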
Table 2

Experimental and predicted values of EC50 for the DPPH-scavenging activity of polysaccharides using MLR, ANN and SVM models.

No | Name(a) | Exp | MLR Predict | MLR Residue | SVM Predict | SVM Residue | ANN Predict | ANN Residue
1 | S1 (Glu) | 1.104 | 0.361582 | 0.742418 | 0.366201 | 0.737799 | 0.953944 | 0.150056
2 | S1 (Visco) | 0.794 | 0.305859 | 0.488141 | 0.706948 | 0.087052 | 0.956870 | -0.162870
3 | GBP50S2 | 0.412 | 0.483627 | -0.071627 | 0.608142 | -0.196142 | 0.376554 | 0.035446
4 | LBP-s80 | 2.97 | 2.575070 | 0.394930 | 1.958210 | 1.011790 | 3.430658 | -0.460658
5 | LBP-s75 | 1.98 | 2.388860 | -0.408860 | 1.868621 | 0.111379 | 2.222439 | -0.242439
6 | WB1 | 0.12 | 0.483627 | -0.363627 | 0.367456 | -0.247456 | 0.395514 | -0.275514
7 | WB2 | 0.31 | 0.483630 | -0.173630 | 0.040901 | 0.269099 | 0.181648 | 0.128352
8 | WB3 | 0.21 | 0.483627 | -0.273627 | 0.013756 | 0.196244 | 0.138245 | 0.071755
9 | IOP40 | 0.88 | 0.200140 | 0.679860 | 0.123591 | 0.756409 | 0.508640 | 0.371360
10 | IOP60 | 0.697 | 0.425514 | 0.271486 | 0.291228 | 0.405772 | 0.814391 | -0.117391
11 | IOP80 | 1.19 | 0.537777 | 0.652223 | 0.378394 | 0.811606 | 0.955650 | 0.234350
12 | FUP-1 | 0.47 | 0.483627 | -0.013627 | 0.274257 | 0.195743 | 0.145075 | 0.324925
13 | CLP-2 | 0.86 | 0.589228 | 0.270772 | 0.723493 | 0.136507 | 0.863565 | -0.003565
14 | CLP-3 | 1.27 | 0.506954 | 0.763046 | 0.649755 | 0.620245 | 1.104942 | 0.165058
15 | TYAP-1 | 3.92 | 3.088460 | 0.831540 | 2.438504 | 1.481496 | 3.770749 | 0.149251
16 | PV-P1 | 0.878 | 1.145072 | -0.267072 | 0.681146 | 0.196854 | 0.592526 | 0.285474
17 | PV-P2 | 0.169 | 0.876230 | -0.707230 | 0.577149 | -0.408149 | 0.551492 | -0.382492
18 | PV-P3 | 0.048 | 0.874380 | -0.826380 | 0.516508 | -0.468508 | 0.321103 | -0.273103
19 | Control-EPS | 2.3 | 1.450109 | 0.849891 | 1.258621 | 1.041379 | 2.311131 | -0.011131
20 | Control-IPS1 | 1.08 | 1.152520 | -0.072520 | 1.093392 | -0.013392 | 1.781373 | -0.701373
21 | Tween 80-IPS1 | 0.74 | 0.973340 | -0.233340 | 0.935877 | -0.195877 | 0.419414 | 0.320586
22 | Tween 80-IPS2 | 0.84 | 1.038431 | -0.198431 | 0.979781 | -0.139781 | 0.776261 | 0.063739
23 | CPSI | 0.23 | 0.483627 | -0.253627 | 0.747002 | -0.517002 | 0.339270 | -0.109270
24 | G1 | 0.34 | 0.682566 | -0.342566 | 0.618879 | -0.278879 | 0.678410 | -0.338410
25 | G2 | 0.56 | 0.635490 | -0.075490 | 0.619038 | -0.059038 | 0.912242 | -0.352242
26 | G3 | 0.87 | 0.697843 | 0.172157 | 0.614961 | 0.255039 | 0.920268 | -0.050268
27 | P1 | 0.62 | 1.482949 | -0.862949 | 1.139900 | -0.519900 | 0.912532 | -0.292532
28 | P2 | 1.07 | 1.212505 | -0.142505 | 0.991519 | 0.078481 | 0.774189 | 0.295811
29 | CP | 0.09 | 0.525923 | -0.435923 | 0.348891 | -0.258891 | 0.112901 | -0.022901
30 | SCG | 0.7 | 1.140940 | -0.440940 | 1.009837 | -0.309837 | 0.745078 | -0.045078
31 | PNMP2 | 0.3297 | 0.674260 | -0.344560 | 0.595812 | -0.266112 | 0.513151 | -0.183451
32 | PNMP3 | 0.1516 | 0.597410 | -0.445810 | 0.504217 | -0.352617 | 0.423722 | -0.272122
33 | GLP60 | 0.97 | 0.483630 | 0.486370 | 0.704868 | 0.265132 | 0.650600 | 0.319400
34 | GLP80 | 0.72 | 0.483630 | 0.236370 | 0.697402 | 0.022598 | 0.519819 | 0.200181
35 | GLP | 0.9 | 0.483627 | 0.416373 | 0.718195 | 0.181805 | 0.627824 | 0.272176
36 | LLPs-D | 0.38 | 0.573660 | -0.193660 | 0.623545 | -0.243545 | 0.573738 | -0.193738
37 | LLPs-L | 0.99 | 1.123127 | -0.133127 | 0.794887 | 0.195113 | 0.608499 | 0.381501
38 | SMWP-1 | 0.13 | 0.496640 | -0.366640 | 0.528723 | -0.398723 | 0.605999 | -0.475999
39 | EAP40-1 | 0.28 | 0.491730 | -0.211730 | 0.688481 | -0.408481 | 0.330165 | -0.050165
40 | EAP60-1 | 0.52 | 0.570610 | -0.050610 | 0.593314 | -0.073314 | 0.667831 | -0.147831
41 | CMP-1 | 1.15 | 0.483627 | 0.666373 | 0.800434 | 0.349566 | 0.948406 | 0.201594
42 | GPA1 | 0.08 | 0.440122 | -0.360122 | 0.275073 | -0.195073 | 0.109224 | -0.029224
43 | GPA2 | 0.06 | -0.041681 | 0.101681 | -0.019261 | 0.079261 | 0.043427 | 0.016573
44 | GPA3 | 0.03 | 0.004720 | 0.025280 | 0.116442 | -0.086442 | 0.087376 | -0.057376
45 | Ac-CP1 | 0.06448 | 0.065797 | -0.001317 | -0.003399 | 0.067879 | 0.022510 | 0.041970
46 | Ac-CP2 | 0.07829 | -0.170541 | 0.248831 | -0.094824 | 0.173114 | 0.017536 | 0.060754
47 | Ac-CP3 | 0.07804 | -0.249175 | 0.327215 | -0.117474 | 0.195514 | 0.017260 | 0.060780
48 | CP | 0.09837 | 0.525923 | -0.427553 | 0.343823 | -0.245453 | 0.125297 | -0.026927
49 | WFPs | 0.007 | -0.019404 | 0.026404 | 0.007187 | -0.000187 | 0.016533 | -0.009533
50 | APs-2-1 | 0.4545 | 0.395343 | 0.059157 | 0.257761 | 0.196739 | 0.074447 | 0.380053
51 | APs-3-1 | 0.2243 | 0.225857 | -0.001557 | 0.200810 | 0.023490 | 0.147594 | 0.076706
52 | PTPS-3 | 1.72 | 1.674231 | 0.045769 | 1.524571 | 0.195429 | 1.532898 | 0.187102
53 | PTPS-5 | 1.45 | 1.661066 | -0.211066 | 1.636562 | -0.186562 | 1.546828 | -0.096828
54 | PSS-EPS | 1.497 | 1.233340 | 0.263660 | 1.142511 | 0.354489 | 1.906331 | -0.409331
55 | UKLOxa | 0.0546 | -0.165611 | 0.220211 | 0.036641 | 0.017959 | 0.072630 | -0.018030
56 | UKLK1 | 0.136 | 0.369956 | -0.233956 | 0.332098 | -0.196098 | 0.164552 | -0.028552
57 | UKLK4 | 0.6023 | 0.453730 | 0.148570 | 0.185824 | 0.416476 | 0.087298 | 0.515002
58 | UKSOxa | 0.0165 | -0.047842 | 0.064342 | 0.094666 | -0.078166 | 0.070345 | -0.053845
59 | UKSOxa-PG | 0.3751 | 0.529760 | -0.154660 | 0.335927 | 0.039173 | 0.159281 | 0.215819
60 | UKSK1 | 0.177 | 0.498201 | -0.321201 | 0.232651 | -0.055651 | 0.102440 | 0.074560
61 | UKSK4 | 0.0038 | 0.684537 | -0.680737 | 0.368651 | -0.364851 | 0.231762 | -0.227962
62 | PMBOxa | 0.0217 | 0.288880 | -0.267180 | 0.250147 | -0.228447 | 0.105432 | -0.083732
63 | PMBOxa-PG | 0.144 | 0.528240 | -0.384240 | 0.417461 | -0.273461 | 0.559193 | -0.415193
64 | PMBK1 | 0.3143 | 0.684450 | -0.370150 | 0.393583 | -0.079283 | 0.306324 | 0.007976
65 | PMBK4 | 0.6547 | 0.975208 | -0.320508 | 0.549260 | 0.105440 | 0.626754 | 0.027946
66 | AMBOxa | 0.0184 | 0.200785 | -0.182385 | 0.172081 | -0.153681 | 0.053022 | -0.034622
67 | AMBOxa-PG | 0.3533 | 0.395625 | -0.042325 | 0.279658 | 0.073642 | 0.083612 | 0.269688
68 | AMBK1 | 0.1093 | 0.545151 | -0.435851 | 0.353599 | -0.244299 | 0.546221 | -0.436921
69 | AMBK4 | 1.4203 | 1.312850 | 0.107450 | 0.859646 | 0.560654 | 1.691224 | -0.270924
70 | PS1 | 1.21 | 0.506384 | 0.703616 | 0.824544 | 0.385456 | 1.219299 | -0.009299
71 | PS2 | 0.73 | 0.675246 | 0.054754 | 0.649997 | 0.080003 | 0.467911 | 0.262089
72 | PS3 | 0.67 | 0.837512 | -0.167512 | 0.612208 | 0.057792 | 0.702244 | -0.032244
73 | WKCP-N | 0.61 | 0.556840 | 0.053160 | 0.816849 | -0.206849 | 0.776698 | -0.166698
74 | WKHP-N | 1.08 | 0.834885 | 0.245115 | 0.884773 | 0.195227 | 1.335107 | -0.255107

(a) name from reference

Fig 4

A comparison of experimental vs predicted EC50 using MLR method.


ANN results

Polysaccharide properties were the input layer nodes of the neural network, and the EC50 value of the DPPH-scavenging activity was the output layer node. The number of nodes had a great influence on the test results. The optimization was done with 10-fold cross-validation, and 30% of the training data were used for validation. The number of neurons in the hidden layer was optimized by varying it from 4 to 14, and it is worth mentioning that the initially selected value of 7 was optimal. The selected network adopted the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm, which is still regarded as the best quasi-Newton algorithm. When the entire training data set was used to train the network with the optimized parameters, it gave R = 0.96 and RMSE = 0.018. The experimental and predicted EC50 values for the training data using the ANN model are plotted in Fig 5. With the experimental value as the abscissa and the predicted value as the ordinate, the points were distributed on both sides of the fitted line from 0 to 1.5, uniformly and close to one another. Judging from the density of the points along both axes, the fit was excellent. The predicted EC50 values for the training and test data are given in Table 2. The test set was used for prediction and gave R = 0.933 and RMSE = 0.055.
Fig 5

A comparison of experimental vs predicted EC50 using ANN method.

SVM results

We selected the radial basis function (RBF) kernel for SVM modeling. The best values of the parameters C, g and ε were selected using 10-fold cross-validation, an SVM model was obtained by training on the whole training set, and the model was then applied to the test set. We optimized the SVM parameters by varying their values systematically on the training set and calculating the RMSE of the model; the parameter value that gave the lowest RMSE was selected. The regularization parameter C controls the trade-off between maximizing the margin and minimizing the training error. If the value of C is too small, insufficient stress is placed on fitting the training data. To have a stable learning procedure, a large value of C should be set first [57]. To find an optimal value of C, the RMSE of the SVM model was calculated for different C values, and C = 9 was selected as the optimal value. With the selected parameters (g = 0.091, ε = 0.1, C = 9), the final model was trained on the whole training set, and EC50 of the DPPH-scavenging activity was predicted. The EC50 values predicted by this model are shown in Fig 6 and Table 2. The statistical parameters of this model were R = 0.851 and RMSE = 0.151 for the training set, and R = 0.865 and RMSE = 0.144 for the test set.
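To make the roles of g and ε concrete, the sketch below evaluates an RBF kernel and the ε-insensitive loss with the reported values g = 0.091 and ε = 0.1. It assumes the common convention K(x, y) = exp(-g * ||x - y||^2); the software used in the study may parameterize the kernel differently, and the function names are hypothetical.

```python
import math

GAMMA = 0.091   # kernel parameter g reported for the DPPH model
EPSILON = 0.1   # width of the insensitive tube reported for the DPPH model


def rbf_kernel(x, y, gamma=GAMMA):
    """RBF kernel under the assumed convention exp(-gamma * squared distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def epsilon_insensitive_loss(observed, predicted, epsilon=EPSILON):
    """Errors smaller than epsilon cost nothing; larger errors cost the excess."""
    return max(0.0, abs(observed - predicted) - epsilon)


print(rbf_kernel([1.0, 0.0], [1.0, 0.0]))   # 1.0 for identical points
print(epsilon_insensitive_loss(1.0, 1.05))  # 0.0: the error is inside the tube
```

A larger g makes the kernel decay faster with distance, so the model becomes more local, while C (not shown here) weighs the summed loss against the flatness of the fitted function.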
Fig 6

A comparison of experimental vs predicted EC50 using SVM method.

Comparison of MLR, ANN and SVM models

The statistical parameters obtained from the models for the training and test sets are shown in Table 3. The error estimates were used to evaluate model performance: RMSE was lower for the nonlinear models (SVM, ANN) generated by the machine learning methods than for multiple linear regression. The correlation coefficients (R) given by the SVM and ANN models for the training set were also higher than that of multiple linear regression. These results indicated that the nonlinear SVM and ANN models performed better than the linear MLR model in predicting the DPPH-scavenging activity of polysaccharides. The comparison of the nonlinear models demonstrated that the ANN model predicted the relationship between polysaccharide properties and the DPPH-scavenging activity more accurately for the training data set, as was evident from its lower RMSE (0.018) and higher R (0.96). The ANN model was also the best one in the prediction of the test set.
Table 3

Comparison of MLR, ANN and SVM models for the DPPH scavenging activity of polysaccharides.

Method | Parameter | Training set | Test set
MLR | R | 0.807 | 0.872
MLR | RMSE | 0.423 | 0.361
ANN | R | 0.96 | 0.933
ANN | RMSE | 0.018 | 0.055
SVM | R | 0.851 | 0.865
SVM | RMSE | 0.151 | 0.144
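The two statistics compared in Table 3 can be computed from paired experimental and predicted values as sketched below. The sketch assumes that R is the Pearson correlation coefficient between experimental and predicted EC50, which the text does not state explicitly, and the function names are hypothetical. The sample values are the experimental EC50 and MLR predictions of compounds 1-5 in Table 2, so the printed numbers illustrate the calculation only and are not expected to match Table 3.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    covariance = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return covariance / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


# Compounds 1-5 of Table 2: experimental EC50 and MLR-predicted EC50.
experimental = [1.104, 0.794, 0.412, 2.97, 1.98]
mlr_predicted = [0.361582, 0.305859, 0.483627, 2.575070, 2.388860]

r_value = pearson_r(experimental, mlr_predicted)
rmse_value = rmse(experimental, mlr_predicted)
print(round(r_value, 3), round(rmse_value, 3))
```

R measures how well predictions track the ordering and spread of the experimental values, while RMSE measures the typical size of the error, so a model can score well on one and less well on the other.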

Effect of MW on the scavenging activity of DPPH radical

Molecular weight is regarded as an important indicator of the antioxidant activity of polysaccharides [20], so a separate analysis was used to evaluate the relationship between MW and the antioxidant activity of polysaccharides. Because the MW of the polysaccharides varied widely, from 2250 to 538500 (Table 4), MW was normalized before the analysis by taking its base-8 logarithm. The data are shown in Table 4 [58-66].
Table 4

MW and EC50 values of the DPPH scavenging activity.

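Name(a) | EC50 | Mw | Refs | Name | EC50 | Mw | Refs | Name | EC50 | Mw | Refs
PS | 6.20 | 225000 | 58 | CLP-3 | 1.27 | 60143 | 24 | CMP-1 | 1.15 | 4300 | 36
PSPO-1a | 1.43 | 18000 | 59 | TYAP-1 | 3.92 | 115000 | 25 | PPM | 1.80 | 22000 | 64
LBP-80 | 5.33 | 70600 | 20 | TYAP-2 | 4.11 | 479000 | 25 | PPE | 3.06 | 38000 | 64
LBP-s75 | 1.98 | 71700 | 20 | TYAP-3 | 2.64 | 403000 | 25 | GPA1 | 0.08 | 19600 | 37
LBP-s50 | 4.96 | 538500 | 20 | PS1-1 | 6.81 | 67400 | 62 | GPA2 | 0.06 | 10600 | 37
BSFP-1 | 7.40 | 13300 | 60 | PS1-2 | 4.56 | 15400 | 62 | GPA3 | 0.03 | 6700 | 37
WB2 | 0.31 | 28000 | 21 | PS2-1 | 2.53 | 12100 | 62 | AAP-2A | 0.15 | 2252 | 65
WB3 | 0.21 | 19000 | 21 | PNMP1 | 0.72 | 28400 | 16 | RNLP I | 0.20 | 14900 | 66
SP1 | 3.20 | 9192 | 61 | PNMP2 | 0.33 | 31500 | 16 | WKCP-N | 0.61 | 9600 | 45
FUP-1 | 0.47 | 41000 | 23 | PNMP3 | 0.15 | 26100 | 16 | WKHP-N | 1.08 | 113400 | 45
CLP-1 | 1.69 | 78754 | 24 | AAP | 3.29 | 27700 | 63 | WKHP-A | 3.34 | 169600 | 45
CLP-2 | 0.86 | 51257 | 24 | EAP80-2 | 1.32 | 65313 | 35 | | | |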

(a) name from reference

We used the EC50 value as the horizontal coordinate and established the correlation between EC50 and MW. As shown in Fig 7, EC50 decreased as MW decreased, which indicated that polysaccharides with smaller MW could have stronger DPPH free radical scavenging activity. This result was in accord with those reported in the literature [20, 59]. Fig 7 also contained some points that did not follow this trend, such as TYAP-3 and BSFP-1. BSFP-1 had a smaller MW but a relatively large EC50 value [60], which may be because BSFP-1 contained no UA. TYAP-3 had a larger MW, but its EC50 value was smaller. The reason may be that Ara accounted for 45.82% of TYAP-3 [25]. Fig 7 showed that when EC50 ranged from 0 to 2, the value on the Y axis was 0-5.5, which indicated that MW was between 4000 and 100000.
Fig 7

Correlation scatter plots of EC50 and MW.

According to the above results, we could conclude that the antioxidant activity of polysaccharides was usually higher in the MW range of 4000-100000. However, MW was not the only factor, and the antioxidant activity could also be affected by other polysaccharide properties, such as UA and Ara.
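The link between the normalized axis of Fig 7 and the quoted MW range can be checked with a short calculation. The sketch assumes the base-8 logarithm stated for the MW analysis is applied to the raw MW value; the function names are hypothetical.

```python
import math


def normalize_mw(mw, base=8):
    """Base-8 logarithm of the molecular weight (assumed to be applied to raw MW)."""
    return math.log(mw, base)


def mw_from_normalized(value, base=8):
    """Invert the normalization to recover the molecular weight."""
    return base ** value


lower = normalize_mw(4000)
upper = normalize_mw(100000)
print(round(lower, 2), round(upper, 2))  # 3.99 5.54
```

Under this assumption the upper end of the range, MW = 100000, corresponds to about 5.5 on the normalized axis, in line with the axis value quoted for Fig 7, while MW = 4000 corresponds to about 4.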

Models for the hydroxyl radicals scavenging activity of polysaccharides

To model the relationship between monosaccharide composition and the hydroxyl radical scavenging activity, the entire data set of 67 compounds was divided into two sets [67-82]. The test set and the training set are given in Table 5.
Table 5

Polysaccharides data set with descriptors and their EC50 values of the hydroxyl radicals scavenging activity.

No | Name(a) | Rha(b) | Ara(c) | Man(d) | Glc(e) | Gal(f) | Fuc(g) | Xyl(h) | GlcA(i) | GalA(j) | UA(k) | PC(l) | EC50 | Refs
1PS-SI027.318.29.145.4000000.50.2167
2CBP-10035.912.851.30000000.63868
3GBP50S246.7042.2011.10000000.48219
4pMTPS-3017.341.628.312.60000001.969
5USEP40-17.958.4237.3417.9428.360000000.37670
6USEP70-114.477.7837.2721.8518.640000000.52470
7IOP404.44.29.84014.53.39.69.74.652.40.5822
8IOP609.75.6832.212.6022.94.74.42.23.20.4622
9CLP-103.67.960.226.401.90015.841.433.6824
10CLP-23.32.114.5482804.10023.591.481.2924
11CLP-3008.65629.4060017.060.952.824
12GPS-244.720.903.610.8019.900000.06971
13P70-1005618260000000.54865
14PS1-10089.57.33.2000001.671.1462
15PS1-20071.13.725.2000001.860.4862
16PS2-10052.72816.90002.403.850.3662
17O.ficus-indica -p15.345.5039.200000000.631872
18G110.91.26.252.514.9014.3004.26.491.8829
19G212.20.84.95616.209.9007.455.111.4129
20P111.430.31.59.244.403.200002.3830
21P210.422.13.111.253.10000000.9830
22CP1.215.67.528.224.705.44.812.607.570.3731
23SSP II-a8.9438.7402.1831.47002.3316.34000.778273
24PNMP205.7828.6214.4241.577.242.3700000.711774
25PNMP303.4526.5821.5536.428.443.5600000.433674
26LLPs-D6.832.739.219.2358.190.573.2500000.6133
27LLPs-L5.0319.396.0722.8237.457.042.2100000.9233
28SMWP-1002734110280000.531.0834
29GRMP100031.50068.500000.147216
30EAP40-12.6303646.7914.58000000.330.9535
31EAP60-13.372.282.8943.6137.67010.180000.481.4935
32EAP80-21.2206.7321.6455.5610.394.460000.141.8435
33PS-24.1717.3318.6535.1419.1105.5900000.8975
34EUPS-28.8315.7712.3943.9411.1507.9200001.3675
35CMP-14.20095.800000000.6536
36EPS-10010.6845.40000019.864.8476
37EPS-20032.757.310.90000020.32.6976
38IPS-100598.532.60000033.971.3276
39IPS-20042.219.8380000020.381.5876
40IPS-30027.272.80000001.91.9176
41PPM0069.17.823.10000001.9964
42GPA10.421.210.613.827.52.2014.89.523.043.750.2237
43GPA20.815.68.21821.41.61.61814.832.794.380.2137
44GPA33.87.56.334.316.31.33.124.33.127.015.530.237
45RCP-II9.821.307.933.809.3017.923.600.9677
46AAP-2A825.7049.3170000000.02265
47TPC012.7011.25.4033.827.10000.10178
48TPC-1021.21626.36.4017.300302.80.18479
49TPC-2026.413.937.50000047.63.80.15879
50TPC-3037.2014.98.3023.10051.840.09379
51GO-224.20025.8000500001.1380
52GO-324.50014.100061.40000.9380
53GO-422.4000.700076.90000.780
54RNLP I10.151.73.522.38.803.6006.7101.7466
55PSCK2-2412.443.30036.400424.701.581
56PSCK2-3511.345.72.5035.50006.6404.881
57APs-1-11.48.1068.20022.30003.10.209240
58APs-2-14.68032.324.2021.109.801.90.196740
59APs-3-11.52.8035.134016.727.901.30.171540
60WSEPS014.5031.940.600130000.0782
61CT-EPS11.47.4194013.508.700014.871.6242
62PSS-EPS8.27.72435.315.409.400020.191.11942
63PSS-DEPS3.35.625.531.529.804.300026.473.52242
64CT-IPS2.16.21859.790500025.068.82842
65PSS-IPS1.76.98.673.1504.700010.820.77942
66PS210.965.8136.1626.9214.554.521.0400000.9844
67PS348.5510.737.3511.4113.854.623.4500000.6644

a: name from reference
b: rhamnose
c: arabinose
d: mannose
e: glucose
f: galactose
g: fucose
h: xylose
i: glucuronic acid
j: galacturonic acid
k: uronic acid
l: protein content

We selected five relevant descriptors for the MLR model, which gave a stable model: EC50 = 0.12PC + 0.083Fuc + 0.013Rha - 0.02UA + 0.372 (R = 0.664, RMSE = 1.149, F = 8.268, p < 5.17E-5). According to the model, PC, Fuc, Rha and UA correlated significantly with the EC50 of hydroxyl radical scavenging activity; the corresponding correlation coefficients are shown in Table 6.
Table 6

Correlation matrix showing inter-correlation among various parameters and EC50 of the hydroxyl radicals scavenging activity.

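| | EC50 | PC | Fuc | Rha | UA |
|---|---|---|---|---|---|
| EC50 | 1.000000 | | | | |
| PC^a | 0.515359 | 1.000000 | | | |
| Fuc^b | 0.270504 | -0.134435 | 1.000000 | | |
| Rha^c | -0.093930 | -0.125825 | -0.017084 | 1.000000 | |
| UA^d | -0.126494 | -0.028296 | 0.167403 | -0.180576 | 1.000000 |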

a: protein content
b: fucose
c: rhamnose
d: uronic acid
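The MLR equation reported above for hydroxyl radical scavenging can be evaluated directly once the four descriptors are known. The following is a minimal sketch: the coefficients are copied from the reported model, while the function name, the dictionary layout and the sample values are illustrative assumptions and do not come from the paper. Descriptor values are assumed to be on the same scale as in Table 5.

```python
# Coefficients copied from the reported MLR model for hydroxyl radical scavenging:
# EC50 = 0.12*PC + 0.083*Fuc + 0.013*Rha - 0.02*UA + 0.372
MLR_COEFFS = {"PC": 0.12, "Fuc": 0.083, "Rha": 0.013, "UA": -0.02}
MLR_INTERCEPT = 0.372


def predict_ec50_hydroxyl(descriptors):
    """Return the MLR-predicted EC50 for a dict of descriptor values.

    Descriptors missing from the dict are treated as 0
    (an assumption of this sketch, not a rule from the paper).
    """
    return MLR_INTERCEPT + sum(
        coeff * descriptors.get(name, 0.0) for name, coeff in MLR_COEFFS.items()
    )


# Hypothetical sample, not taken from Table 5.
sample = {"PC": 2.0, "Fuc": 1.0, "Rha": 10.0, "UA": 5.0}
print(round(predict_ec50_hydroxyl(sample), 3))
```

Because the reported training-set fit is modest (R = 0.664), predictions from this linear form should be read as rough estimates only.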

The statistical parameters of the MLR, ANN and SVM models for the training set and the test set are shown in Table 7. With a lower RMSE and a higher R value, the nonlinear ANN model outperformed the MLR and SVM models in predicting the hydroxyl radical scavenging activity of polysaccharides.
Table 7

Comparison of MLR, ANN and SVM models for the hydroxyl radicals scavenging activity of polysaccharides.

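| Method | Parameter | Training set | Test set |
|---|---|---|---|
| MLR | R | 0.664 | 0.523 |
| MLR | RMSE | 1.149 | 1.117 |
| ANN | R | 0.944 | 0.857 |
| ANN | RMSE | 0.119 | 0.257 |
| SVM | R | 0.836 | 0.767 |
| SVM | RMSE | 0.751 | 0.645 |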
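The two statistics compared in Table 7 can be computed from paired observed and predicted EC50 values. The sketch below assumes the standard definitions (Pearson correlation coefficient for R, root mean square error for RMSE); the paper does not spell out its formulas here, so this is an assumption, and the function names and sample data are hypothetical.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical EC50 values, used only to illustrate the calculation.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(pearson_r(observed, predicted), rmse(observed, predicted))
```

A model is preferred when R is closer to 1 and RMSE is closer to 0 on both the training set and the test set, which is the criterion applied in the comparison above.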

Sensitivity analysis from ANN

The results of the sensitivity analysis for the two ANN models are shown in Table 8. A higher sensitivity coefficient indicates that the descriptor has more influence on the antioxidant activity of polysaccharides. The results indicated that Ara and GalA had a great effect on DPPH-scavenging activity, and that PC, UA and GalA had a great effect on hydroxyl radical scavenging activity, which was consistent with the MLR results.
Table 8

Sensitivity analysis from ANN models.

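| Sensitivity coefficients | Ara^a | GalA^b | PC^c | GlcA^d | UA^e | Glc^f | Xyl^g | Man^h | Gal^i | Fuc^j | Rha^k |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DPPH-scavenging activity | 6.48 | 3.4 | 3.25 | 2.98 | 2.61 | 1.52 | 1.29 | 1.23 | 1.11 | 1.11 | 0.97 |
| Hydroxyl radicals scavenging activity | 1.85 | 4.76 | 7.37 | 1.21 | 3.78 | 3.24 | 1.08 | 1.49 | 2.9 | 3.44 | 1.35 |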

a: arabinose
b: galacturonic acid
c: protein content
d: glucuronic acid
e: uronic acid
f: glucose
g: xylose
h: mannose
i: galactose
j: fucose
k: rhamnose
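The ranking implied by Table 8 can be reproduced by sorting the descriptors by their sensitivity coefficients. In this sketch the coefficients are transcribed from Table 8; the variable names, the assay keys and the helper function are illustrative assumptions, and the code only ranks the published values rather than recomputing the sensitivity analysis itself.

```python
# Descriptor order and coefficients transcribed from Table 8.
DESCRIPTORS = ["Ara", "GalA", "PC", "GlcA", "UA", "Glc",
               "Xyl", "Man", "Gal", "Fuc", "Rha"]
SENSITIVITY = {
    "DPPH": [6.48, 3.4, 3.25, 2.98, 2.61, 1.52, 1.29, 1.23, 1.11, 1.11, 0.97],
    "hydroxyl": [1.85, 4.76, 7.37, 1.21, 3.78, 3.24, 1.08, 1.49, 2.9, 3.44, 1.35],
}


def top_descriptors(assay, k=3):
    """Return the k descriptors with the largest sensitivity coefficients."""
    pairs = zip(DESCRIPTORS, SENSITIVITY[assay])
    ranked = sorted(pairs, key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]


for assay in SENSITIVITY:
    print(assay, top_descriptors(assay))
```

With k = 3 this lists Ara, GalA and PC for the DPPH assay and PC, GalA and UA for the hydroxyl radical assay, matching the descriptors singled out in the text.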


Conclusions

To establish quantitative structure-activity relationship (QSAR) models for the antioxidant activity of polysaccharides, MLR, SVM and ANN methods were used, with polysaccharide properties (UA, PC, monosaccharide composition, MW) selected as descriptors. The MLR models for predicting the EC50 of DPPH-scavenging activity and of hydroxyl radical scavenging activity each consisted of four major descriptors: EC50 = 0.033Ara - 0.041GalA - 0.03GlcA - 0.025PC + 0.484 and EC50 = 0.12PC + 0.083Fuc + 0.013Rha - 0.02UA + 0.372, respectively. A comparison of the models indicated that the ANN model (R = 0.96, RMSE = 0.018) predicted the DPPH-scavenging activity of polysaccharides more accurately than the SVM and MLR models. The ANN model (training set R = 0.944, RMSE = 0.119) was also the best for predicting the hydroxyl radical scavenging activity of polysaccharides. According to the MLR and ANN models, Ara and GalA were the most critical descriptors for DPPH-scavenging activity, and PC, UA and GalA had a great effect on hydroxyl radical scavenging activity.
Polysaccharides with MW in the range 4000–100000 usually showed higher DPPH-scavenging activity, but the antioxidant activity could simultaneously be affected by other polysaccharide properties. These results may provide new insights into the complex relationship between polysaccharide structure and bioactivity: the antioxidant activity of a polysaccharide can be predicted with the established models once its monosaccharide composition ratios and MW have been determined. It is worth noting that polysaccharides with a high GalA content could exhibit significant antioxidant activity, possibly because they carry the functional group -COOH. Functional groups such as -COOH, CH3CO- and -SH have been reported as good electron or hydrogen donors, which might be related to the antioxidant activity of polysaccharides [5].
The antioxidant activity of polysaccharides has also been found to correlate with complex structural features such as glycosidic linkages, branching ratios and microstructure. The polysaccharide properties used here are not sufficient to describe the fine structure of a polysaccharide, and more precise structure-function relationships remain to be explored.