Varun Kumar Ojha, Konrad Jackowski, Ajith Abraham, Václav Snášel.
Abstract
Prediction of the dissolution rates of poly(lactic-co-glycolic acid) (PLGA) micro- and nanoparticles plays a significant role in the pharmaceutical and medical industries. Predicting the PLGA dissolution rate is crucial for drug manufacturing; therefore, a model that predicts it could be beneficial. PLGA dissolution is influenced by numerous factors (features), and counting the known features leads to a dataset with 300 features. This large number of features and the high redundancy within the dataset make accurate prediction very difficult. In this study, dimensionality reduction techniques were applied in order to simplify the task and eliminate irrelevant and redundant features. A heterogeneous pool of several regression algorithms was independently tested and evaluated. In addition, several ensemble methods were tested in order to improve prediction accuracy. The empirical results revealed that the proposed evolutionary weighted ensemble method offered the lowest margin of error and significantly outperformed the individual algorithms and the other ensemble techniques.
Keywords: ensemble; feature selection; protein dissolution; regression models
Year: 2015 PMID: 25709436 PMCID: PMC4327564 DOI: 10.2147/IJN.S71847
Source DB: PubMed Journal: Int J Nanomedicine ISSN: 1176-9114
The PLGA dataset description
| Sl No | Group name | No of features | Description |
|---|---|---|---|
| 1 | Protein descriptors | 85 | Describe the type of molecules and proteins used |
| 2 | Formulation characteristics | 17 | Describe properties such as molecular weight and particle size |
| 3 | Plasticizer | 98 | Describe properties such as the fluidity of the material used |
| 4 | Emulsifier | 99 | Describe properties related to stabilizing the product and extending the shelf life of the pharmaceutical product |
| 5 | Time in days | 1 | Dissolution time in days |
| 6 | % of molecules dissolved | 1 | Output (prediction target) |
Abbreviations: PLGA, poly(lactic-co-glycolic acid); Sl, serial; No, number.
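The group sizes in the dataset table account for the 300 input features mentioned in the abstract (85 + 17 + 98 + 99 + 1 = 300), with the percentage of molecules dissolved as the single output. The sketch below checks that sum and maps each group to a column range. The contiguous, table-ordered column layout is a hypothetical assumption for illustration, not the dataset's documented file format.

```python
# Feature-group sizes as listed in the PLGA dataset table (inputs only).
FEATURE_GROUPS = {
    "Protein descriptors": 85,
    "Formulation characteristics": 17,
    "Plasticizer": 98,
    "Emulsifier": 99,
    "Time in days": 1,
}

def group_column_ranges(groups):
    """Map each group to a half-open [start, end) column range.

    Assumes (hypothetically) that the columns are stored contiguously in
    the same order as the table rows.
    """
    ranges, start = {}, 0
    for name, size in groups.items():
        ranges[name] = (start, start + size)
        start += size
    return ranges

n_inputs = sum(FEATURE_GROUPS.values())  # 85 + 17 + 98 + 99 + 1
ranges = group_column_ranges(FEATURE_GROUPS)
print(n_inputs, ranges)
```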
Figure 1 Evolutionary weighted ensemble algorithm.
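The individual steps of the evolutionary weighted ensemble in Figure 1 are not spelled out in this text, so the following is an assumption-based illustration, not the authors' algorithm. It assumes the ensemble output is a weighted average of the base regressors' predictions and that the non-negative weights are evolved by a simple elitist (mu + lambda) evolution strategy with Gaussian mutation that minimizes RMSE on held-out predictions. The function names, population size, mutation scale, and toy predictions are all hypothetical.

```python
import math
import random

def rmse(pred, target):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))

def combine(weights, base_preds):
    """Weighted average of base-model predictions (one prediction list per model)."""
    total = sum(weights)
    n = len(base_preds[0])
    return [sum(w * preds[i] for w, preds in zip(weights, base_preds)) / total
            for i in range(n)]

def evolve_weights(base_preds, target, pop_size=20, generations=100, sigma=0.1, seed=0):
    """Evolve non-negative ensemble weights that minimize ensemble RMSE.

    Elitist (mu + lambda) evolution strategy (an assumed choice): every parent
    produces one Gaussian-mutated child, and the pop_size fittest individuals
    from parents + children survive to the next generation.
    """
    rng = random.Random(seed)
    n_models = len(base_preds)

    def fitness(w):
        return rmse(combine(w, base_preds), target)

    # Weights are kept strictly positive so the weighted average is always defined.
    population = [[rng.random() + 1e-6 for _ in range(n_models)] for _ in range(pop_size)]
    for _ in range(generations):
        children = [[max(1e-6, w + rng.gauss(0.0, sigma)) for w in parent]
                    for parent in population]
        population = sorted(population + children, key=fitness)[:pop_size]
    best = population[0]
    total = sum(best)
    return [w / total for w in best]  # normalized so the weights sum to 1

# Toy validation predictions from three hypothetical base regressors.
target = [10.0, 20.0, 30.0, 40.0, 50.0]
base_preds = [
    [10.0, 20.0, 30.0, 40.0, 50.0],  # model A: unbiased
    [14.0, 24.0, 34.0, 44.0, 54.0],  # model B: biased by +4
    [12.0, 22.0, 32.0, 42.0, 52.0],  # model C: biased by +2
]
weights = evolve_weights(base_preds, target)
print(weights, rmse(combine(weights, base_preds), target))
```

Because survivors are chosen from parents and children together, the best RMSE in the population can never get worse from one generation to the next; on this toy data the weight mass shifts toward the unbiased model A.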
Parameter settings of the regression models used in the feature selection and feature extraction experiments
| Predictor | Parameters |
|---|---|
| GPReg | RBF kernel, gamma = 1.0 |
| LReg | – |
| MLP | Three-layer MLP, hidden layer nodes = 50, learning rate = 0.3, momentum rate = 0.2 |
| SMOReg | Polynomial kernel, epsilon = 0.001, tolerance = 0.001 |
| REP tree | Maximum depth = no restriction |
Abbreviations: GPReg, Gaussian process regression; RBF, radial basis function; LReg, linear regression; MLP, multilayer perceptron; SMOReg, sequential minimal optimization regression; REP, reduced error pruning; –, no such parameter.
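Two of the settings above can be made concrete: the gamma of the RBF kernel used by GPReg and the learning and momentum rates of the MLP. This sketch assumes the common parameterization k(x, y) = exp(-gamma ||x - y||^2) and the classical momentum update for gradient descent; the software actually used in the study may define either one slightly differently. The function names are hypothetical.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2) (assumed parameterization)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def momentum_step(weight, grad, velocity, learning_rate=0.3, momentum=0.2):
    """One classical-momentum gradient-descent update for a single weight.

    velocity <- momentum * velocity - learning_rate * grad
    weight   <- weight + velocity
    """
    velocity = momentum * velocity - learning_rate * grad
    return weight + velocity, velocity

# With gamma = 1.0, points one unit apart have similarity exp(-1).
print(rbf_kernel([0.0], [1.0]))

# Two successive steps with a constant gradient of 1.0: the momentum term
# makes the second step (0.36) larger than the first (0.3).
w, v = momentum_step(1.0, 1.0, 0.0)
w, v = momentum_step(w, 1.0, v)
print(w, v)
```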
Experimental results on 10-CV datasets prepared from distinct random partitions of the complete dataset using feature selection techniques (identification of the regression model)
| Selection method | Selected features | GPReg | LReg | MLP | REP | SMOReg |
|---|---|---|---|---|---|---|
| No selection | 300 | 16.81 | 17.07 | 18.57 | 13.05 | 17.95 |
| BFE | 1 | 27.47 | 26.61 | 28.33 | 24.37 | 26.97 |
| BFE | 5 | 17.11 | 23.45 | 23.11 | 14.23 | 23.38 |
| CFS | 5 | 20.80 | 25.08 | 22.41 | 18.31 | 25.42 |
| Class-MLP-greedy | 7 | 17.96 | 25.03 | 22.26 | 14.96 | 25.35 |
| BFE | 10 | 15.93 | 19.98 | 21.00 | 13.19 | 19.53 |
| Class-MLP-BFS | 15 | 15.88 | 22.90 | 16.83 | 13.91 | 24.23 |
| Wrapper-GPReg-greedy | 15 | 14.88 | 20.22 | 15.20 | 13.34 | 20.86 |
| Class-GPReg-BFS | 16 | 18.46 | 23.07 | 19.71 | 14.19 | 23.69 |
| Class-GPReg-greedy | 19 | 15.06 | 19.05 | 15.61 | 14.03 | 19.68 |
| Wrapper-MLP-greedy | 19 | 16.44 | 24.01 | 20.42 | 14.26 | 24.85 |
| Wrapper-LReg-greedy | 24 | 15.91 | 17.46 | 17.03 | 13.54 | 18.02 |
| BFE | Optimal | 15.71 | 17.85 | 17.82 | 13.90 | 17.88 |
| Class-LReg-BFS | 31 | 15.95 | 16.92 | 15.63 | 14.00 | 17.58 |
| Class-LReg-greedy | 37 | 16.31 | 17.14 | 16.27 | 14.02 | 17.69 |
Notes: Values are the averages of ten RMSE values.
The optimal numbers of attributes for the GPReg, LReg, MLP, REP, and SMOReg regression models are 18, 32, 31, 31, and 30, respectively.
Abbreviations: 10-CV, ten-fold cross-validation; RMSE, root mean square error; GPReg, Gaussian process regression; LReg, linear regression; MLP, multilayer perceptron; REP, reduced error pruning; SMOReg, sequential minimal optimization regression; No, number; BFE, backward feature elimination; CFS, correlation-based feature selection; BFS, best fit search; wrapper, wrapper feature selection; greedy, greedy search; class, classifier-based feature selection.
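BFE appears in the feature selection table both with fixed subset sizes and with an "Optimal" size chosen per model. The procedure is not detailed here, so this is a minimal sketch of generic greedy backward elimination, assuming a wrapper-style scorer (for example, the mean 10-CV RMSE of a regression model trained on the candidate subset). The scorer `toy_error`, the feature names, and the function signature are hypothetical.

```python
from typing import Callable, FrozenSet, List, Tuple

def backward_feature_elimination(
    features: List[str],
    error_of: Callable[[FrozenSet[str]], float],
    min_features: int = 1,
) -> List[Tuple[int, float, FrozenSet[str]]]:
    """Greedy backward elimination.

    Starts from the full feature set and repeatedly drops the single feature
    whose removal yields the lowest error, recording (size, error, subset)
    after every step.
    """
    current = frozenset(features)
    history = [(len(current), error_of(current), current)]
    while len(current) > min_features:
        candidates = [current - {f} for f in sorted(current)]
        current = min(candidates, key=error_of)
        history.append((len(current), error_of(current), current))
    return history

# Hypothetical scorer: features "a" and "b" are relevant, the rest redundant.
RELEVANT = frozenset({"a", "b"})

def toy_error(subset: FrozenSet[str]) -> float:
    # Each missing relevant feature costs 10; each redundant feature kept costs 1.
    return 10 * len(RELEVANT - subset) + len(subset - RELEVANT)

history = backward_feature_elimination(["a", "b", "c", "d", "e"], toy_error)
# The "optimal" subset size is the one with the lowest recorded error.
size, err, best_subset = min(history, key=lambda h: h[1])
print(size, err, sorted(best_subset))
```

Each step scores every remaining feature once, so eliminating down from 300 features requires on the order of 300^2 / 2 model evaluations, which is why wrapper-based elimination is expensive on this dataset.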
Experimental results on 10-CV datasets prepared from distinct random partitions of the complete dataset using feature extraction techniques
| Type | Feature extraction method | Regression model | Mean (dim 5) | VAR (dim 5) | Mean (dim 10) | VAR (dim 10) | Mean (dim 20) | VAR (dim 20) | Mean (dim 30) | VAR (dim 30) | Mean (dim 50) | VAR (dim 50) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Linear method | PCA | GPReg | 28.88 | 1.62 | 27.22 | 3.00 | 24.80 | 3.85 | 19.82 | 2.49 | 16.08 | 3.16 |
| | | LReg | 29.55 | 1.74 | 29.22 | 1.70 | 27.73 | 2.21 | 23.93 | 1.63 | 17.17 | 2.79 |
| | | MLP | 30.36 | 3.36 | 29.77 | 6.37 | 26.58 | 3.98 | 19.89 | 2.27 | 13.59 | 1.56 |
| | | SMOReg | 30.14 | 3.17 | 29.78 | 3.62 | 27.95 | 2.67 | 24.31 | 1.89 | 17.66 | 3.09 |
| | FA | GPReg | 29.23 | 1.77 | 28.56 | 2.67 | 28.31 | 3.34 | 28.30 | 3.42 | 28.26 | 3.31 |
| | | LReg | 29.97 | 1.77 | 29.97 | 1.77 | 29.97 | 1.77 | 29.97 | 1.77 | 29.98 | 1.82 |
| | | MLP | 30.64 | 2.02 | 30.50 | 1.91 | 31.01 | 1.83 | 30.93 | 2.30 | 30.91 | 0.77 |
| | | SMOReg | 30.28 | 3.45 | 30.28 | 3.45 | 30.26 | 3.37 | 30.29 | 3.44 | 30.28 | 3.46 |
| Non-linear method | Kernel PCA | GPReg | 28.60 | 1.68 | 27.08 | 2.12 | 24.96 | 1.96 | 24.32 | 2.17 | 22.81 | 4.43 |
| | | LReg | 29.31 | 1.52 | 28.05 | 1.78 | 25.35 | 2.05 | 25.17 | 2.23 | 22.98 | 4.27 |
| | | MLP | 29.81 | 3.57 | 29.65 | 7.94 | 27.07 | 4.09 | 25.97 | 5.52 | 25.27 | 8.49 |
| | | SMOReg | 29.43 | 1.41 | 28.68 | 1.65 | 25.90 | 1.70 | 25.79 | 2.00 | 23.24 | 4.76 |
| | MDS | GPReg | 28.91 | 2.17 | 28.73 | 2.47 | 28.41 | 3.16 | 28.24 | 3.17 | 28.16 | 3.27 |
| | | LReg | 29.56 | 1.86 | 29.21 | 2.08 | 29.19 | 2.08 | 29.11 | 1.92 | 29.14 | 2.04 |
| | | MLP | 30.42 | 3.71 | 29.38 | 4.11 | 29.93 | 3.10 | 30.01 | 4.53 | 29.98 | 4.42 |
| | | SMOReg | 29.98 | 2.62 | 29.64 | 2.55 | 29.64 | 2.76 | 29.66 | 2.85 | 29.65 | 2.89 |
Note: Mean and variance (VAR) are computed over the ten RMSE values obtained; "dim" is the reduced dimension.
Abbreviations: 10-CV, ten-fold cross-validation; RMSE, root mean square error; PCA, principal component analysis; FA, factor analysis; MDS, multidimensional scaling; GPReg, Gaussian process regression; LReg, linear regression; MLP, multilayer perceptron; SMOReg, sequential minimal optimization regression.
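Each Mean and VAR pair in the feature extraction table summarizes ten RMSE values from ten-fold cross-validation. A minimal sketch of how such a pair can be computed follows, assuming shuffled folds and the sample (n - 1) variance, since the text does not state which variance was used. The `mean_predictor` baseline and the synthetic data are hypothetical stand-ins for the regression models and the PLGA dataset.

```python
import math
import random
import statistics

def rmse(pred, target):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))

def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validated_rmse(X, y, fit, k=10, seed=0):
    """Return the k per-fold test RMSE values.

    fit(X_train, y_train) must return a predict(x) callable (hypothetical interface).
    """
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(X[i]) for i in test_idx]
        scores.append(rmse(preds, [y[i] for i in test_idx]))
    return scores

def mean_predictor(X_train, y_train):
    """Baseline 'model' that always predicts the training-set mean."""
    mu = statistics.mean(y_train)
    return lambda x: mu

# Synthetic data standing in for the real dataset.
rng = random.Random(1)
X = [[rng.random()] for _ in range(100)]
y = [50.0 * row[0] + rng.gauss(0.0, 5.0) for row in X]
scores = cross_validated_rmse(X, y, mean_predictor)
print(f"mean RMSE = {statistics.mean(scores):.2f}, VAR = {statistics.variance(scores):.2f}")
```

Swapping `statistics.variance` for `statistics.pvariance` gives the population variance instead; with only ten scores the two differ by a factor of 10/9.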
Figure 2 Results of the feature extraction experiment for a reduced dimension of 30 features: comparison of the regression models using (A) average RMSE and (B) variance.
Abbreviations: RMSE, root mean square error; ICA, independent component analysis; PCA, principal component analysis; FA, factor analysis; kPCA, kernel PCA; MDS, multidimensional scaling; GPReg, Gaussian process regression; LReg, linear regression; MLP, multilayer perceptron; SMOReg, sequential minimal optimization regression.
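In the feature extraction table, PCA gives the lowest mean RMSE at reduced dimensions of 20, 30, and 50. To illustrate what extracting k components involves, here is a dependency-free sketch that finds the leading eigenvectors of the sample covariance matrix by power iteration with deflation and projects the data onto them. This solver is swapped in for illustration only; the study's software, and any feature scaling it applied, are not specified here. The function names and toy data are hypothetical.

```python
import math
import random

def pca_fit(X, n_components, n_iter=200, seed=0):
    """Top principal components of X via power iteration with deflation.

    Returns (mean, components); each component is a unit-length list.
    """
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d) of the centered data.
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(n_iter):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left; keep the current unit direction
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance (eigenvalue) along v.
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        components.append(v)
        # Deflation: remove the explained variance so the next pass finds the next component.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return mean, components

def pca_transform(X, mean, components):
    """Project rows of X onto the components (reduced dimension = len(components))."""
    return [[sum((row[j] - mean[j]) * c[j] for j in range(len(mean))) for c in components]
            for row in X]

# Toy 2-D data stretched along the (1, 1) direction with small alternating noise.
X = [[t + 0.1 * (-1) ** i, t - 0.1 * (-1) ** i] for i, t in enumerate(range(-10, 11))]
mean, comps = pca_fit(X, n_components=2)
Z = pca_transform(X, mean, comps)
print(comps[0], len(Z), len(Z[0]))
```

On the PLGA data the same idea would reduce 300 input columns to the 5 to 50 dimensions listed in the table, though a production implementation would use an optimized eigen- or SVD solver rather than pure-Python loops.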
Summary of the results obtained from each regression model, including the ensemble techniques used