Hossam M Zawbaa, Jakub Szlȩk, Crina Grosan, Renata Jachowicz, Aleksander Mendyk.
Abstract
Poly-lactide-co-glycolide (PLGA) is a copolymer of lactic and glycolic acid. Drug release from PLGA microspheres depends not only on polymer properties but also on drug type, particle size, microsphere morphology, release conditions, and other factors. Selecting a subset of relevant properties for PLGA is a challenging machine learning task, as there are over three hundred features to consider. In this work, we formulate the selection of critical attributes for PLGA as a multiobjective optimization problem that aims to minimize the error of predicting the dissolution profile while reducing the number of attributes selected. Four bio-inspired optimization algorithms (antlion optimization, a binary version of antlion optimization, grey wolf optimization, and social spider optimization) are used to select the optimal feature set for predicting the dissolution profile of PLGA. The LASSO algorithm is also used for comparison. Crucial variables are selected under the assumption that predictability and model simplicity are of equal importance to the final result. During the feature selection process, a set of input variables is employed to find the minimum generalization error across different predictive models and their settings/architectures. The methodology is evaluated using predictive modeling, for which various tools are chosen: Cubist, random forests, artificial neural networks (monotonic MLP, deep learning MLP), multivariate adaptive regression splines, classification and regression trees, and hybrid systems of fuzzy logic and evolutionary computations (fugeR). The experimental results are compared with those reported by Szlȩk et al. We obtain a normalized root mean square error (NRMSE) of 15.97% versus 15.4%, with fewer selected input features (nine versus eleven).
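The two objectives named in the abstract (prediction error and number of selected attributes) can be combined into one score for a candidate feature mask. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function names `cv_nrmse` and `fitness`, the weights `w_error` and `w_size`, the ordinary least squares stand-in model, and the range-based normalization of the error are all hypothetical choices made for illustration.

```python
import numpy as np

def cv_nrmse(X, y, mask, k=10, seed=0):
    """k-fold cross-validated NRMSE (as a fraction of the observed range) for the
    features switched on in `mask`. Ordinary least squares is a stand-in model."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return 1.0  # hypothetical penalty: an empty feature set cannot predict
    idx = np.random.default_rng(seed).permutation(len(y))
    preds = np.empty(len(y))
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        # Design matrices with an intercept column, restricted to selected features.
        A_train = np.column_stack([np.ones(train.size), X[np.ix_(train, cols)]])
        A_test = np.column_stack([np.ones(fold.size), X[np.ix_(fold, cols)]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        preds[fold] = A_test @ coef
    rmse = np.sqrt(np.mean((y - preds) ** 2))
    return rmse / (y.max() - y.min())

def fitness(X, y, mask, w_error=0.5, w_size=0.5):
    """Weighted sum of prediction error and fraction of features kept (lower is better).
    Equal default weights are an assumption, not values taken from the paper."""
    size_fraction = np.count_nonzero(mask) / len(mask)
    return w_error * cv_nrmse(X, y, mask) + w_size * size_fraction
```

A search algorithm would call `fitness` on each candidate mask and keep the masks with the lowest score; swapping the least squares model for any of the predictive tools listed in the abstract only requires changing the body of the cross-validation loop.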
Year: 2016 PMID: 27315205 PMCID: PMC4912096 DOI: 10.1371/journal.pone.0157610
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Construction of the data set.
Parameter setting for the bio-inspired methods.
| Parameter | Value(s) |
|---|---|
| K for cross validation | 10 |
| No. of search agents | 8 |
| No. of iterations | 100 |
| Problem dimension | Number of features in the data |
| Search domain in binary algorithms | {0, 1} |
| Search domain in continuous algorithms | [0, 1] |
| | 0.99 |
| | 0.01 |
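The search-domain rows of the table can be illustrated with a short sketch of how agent positions might be initialized. The agent and iteration counts come from the table; the helper names (`init_continuous`, `init_binary`, `to_mask`) and the 0.5 threshold used to turn a continuous position into a feature mask are assumptions for illustration and are not stated in the source.

```python
import numpy as np

N_AGENTS = 8        # "No. of search agents" from the table
N_ITERATIONS = 100  # "No. of iterations" from the table

def init_continuous(n_features, rng):
    """Agent positions in the continuous search domain [0, 1]."""
    return rng.uniform(0.0, 1.0, size=(N_AGENTS, n_features))

def init_binary(n_features, rng):
    """Agent positions in the binary search domain {0, 1}."""
    return rng.integers(0, 2, size=(N_AGENTS, n_features))

def to_mask(position, threshold=0.5):
    """Map a continuous position to a boolean feature mask.
    The 0.5 threshold is a hypothetical choice."""
    return np.asarray(position) >= threshold
```

Each row of the returned arrays is one search agent and each column is one candidate feature, so the problem dimension equals the number of features in the data.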
PLGA features selected by the optimization algorithms.
| FS method | No. Inputs | Input Labels |
|---|---|---|
| | 8 | |
| | 12 | |
| | 20 | |
| | 9(1) | |
| | 9(2) | |
| | 9(3) | |
| | 11 | |
| | 12 | |
| | 15 | |
| | 18 | |
| | 24 | |
| | 25 | |
| | 26 | |
| | 8 | |
| | 13 | |
| | 19 | |
| | 22 | |
Fig 2. Graph showing frequencies of occurrence together with connections between selected vectors of features.
Connections are drawn when a variable is present in at least two subsets of features.
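The quantities behind such a graph can be sketched as follows: count how often each feature occurs across the selected subsets, and link two subsets whenever they share a feature. This is one plausible reading of the caption, not the authors' plotting code; the function names and the toy subsets in the usage note are hypothetical.

```python
from collections import Counter
from itertools import combinations

def feature_frequencies(subsets):
    """Count in how many subsets each feature occurs."""
    return Counter(f for features in subsets.values() for f in set(features))

def shared_feature_edges(subsets):
    """Map each pair of subset names to the features the two subsets share.
    Pairs with nothing in common get no edge."""
    edges = {}
    for (a, fa), (b, fb) in combinations(subsets.items(), 2):
        common = set(fa) & set(fb)
        if common:
            edges[(a, b)] = common
    return edges
```

For example, with the made-up input `{"A": ["time", "pH"], "B": ["time"]}`, the feature `time` has frequency 2 and the pair `("A", "B")` is connected through it.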
NRMSE for input vectors selected by bio-inspired algorithms.
| FS method | No. Inputs | Cubist | Mon-mlp | MARS | CART | RF | fugeR |
|---|---|---|---|---|---|---|---|
| | 8 | 22.45 | 24.55 | 25.90 | 27.64 | 21.95 | - |
| | 12 | 25.95 | 25.15 | 25.30 | 26.13 | 22.19 | - |
| | 20 | 18.73 | 20.20 | 21.45 | 21.30 | 16.33 | 20.15 |
| | 9(1) | 21.20 | 20.63 | 21.28 | 24.17 | 18.81 | - |
| | 9(2) | 18.26 | 17.31 | 20.86 | 20.99 | 15.97 | 18.09 |
| | 9(3) | 22.60 | 21.88 | 22.15 | 23.80 | 19.79 | - |
| | 11 | 19.40 | 19.35 | 21.37 | 24.25 | 18.70 | - |
| | 12 | 17.26 | 18.17 | 21.41 | 22.33 | 16.56 | 18.73 |
| | 15 | 19.30 | 18.88 | 20.64 | 20.10 | 16.73 | 19.10 |
| | 18 | 20.65 | 18.58 | 21.34 | 22.21 | 17.63 | - |
| | 24 | 20.30 | 22.30 | 21.40 | 22.88 | 17.90 | - |
| | 25 | 20.04 | 19.29 | 19.10 | 20.50 | 15.86 | 19.10 |
| | 26 | 17.32 | 22.22 | 21.74 | 19.97 | 16.22 | - |
| | 8 | 30.49 | 31.12 | 30.97 | 32.60 | 28.89 | - |
| | 13 | 27.09 | 25.82 | 26.45 | 25.60 | 24.86 | - |
| | 19 | 22.78 | 17.90 | 21.74 | 21.70 | 17.15 | - |
| | 22 | 22.78 | 18.48 | 21.74 | 21.59 | 17.20 | - |
Results for BALO 9(2), trained and tested on 10cv data sets.
| Algorithm | NRMSE | R2 |
|---|---|---|
| | 18.26 | 0.611 |
| | 17.31 | 0.652 |
| | 16.87 | 0.655 |
| | 18.09 | 0.612 |
| | 15.97 | 0.692 |
| | 19.97 | 0.571 |
| | 19.10 | 0.591 |
Comparison of selected features by BALO and obtained by Szlȩk et al. [10].
| BALO 9in(2) | Szlȩk et al. [10] |
|---|---|
| No. of hydrogen atoms – protein descriptor | Szeged index – protein descriptor |
| No. of heteroaromatic rings – protein descriptor | pI – protein descriptor |
| No. of rings – protein descriptor | Quaternary structure of macromolecule: 1 = monomer, 2 = dimer – protein descriptor |
| No. of heteroaliphatic rings – protein descriptor | Lactide-to-glycolide ratio in polymer – formulation characteristics |
| PVA inner phase concentration (%) – formulation characteristics | PVA inner phase concentration (%) – formulation characteristics |
| Heteroaliphatic ring count – plasticizer descriptor | PVA outer phase concentration (%) – formulation characteristics |
| LogD at pH 9 – plasticizer descriptor | Encapsulation rate (%) – formulation characteristics |
| No. of donor atoms – emulsifier descriptor | Mean particle size (μm) – formulation characteristics |
| Time (days) – assay conditions | Dissolution pH – assay conditions |
| | Production method: 1 = w/o/w, 2 = s/o/w, 3 = s/o/o, 4 = spray-dried – formulation characteristics |
| | Time (days) – assay conditions |
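The overlap between the two columns can be checked with plain set operations. The sketch below transcribes the feature names from the table; the variable names and the use of the Jaccard index as an overlap measure are illustrative choices, not part of the source.

```python
# Feature names transcribed from the comparison table (descriptor category omitted,
# except where needed to tell two ring-count features apart).
balo_features = {
    "No. of hydrogen atoms",
    "No. of heteroaromatic rings",
    "No. of rings",
    "No. of heteroaliphatic rings",
    "PVA inner phase concentration (%)",
    "Heteroaliphatic ring count (plasticizer)",
    "LogD at pH 9",
    "No. of donor atoms",
    "Time (days)",
}
szlek_features = {
    "Szeged index",
    "pI",
    "Quaternary structure of macromolecule",
    "Lactide-to-glycolide ratio in polymer",
    "PVA inner phase concentration (%)",
    "PVA outer phase concentration (%)",
    "Encapsulation rate (%)",
    "Mean particle size (um)",
    "Dissolution pH",
    "Production method",
    "Time (days)",
}

shared = balo_features & szlek_features  # features selected by both approaches
jaccard = len(shared) / len(balo_features | szlek_features)  # overlap in [0, 1]
```

Only two features appear in both columns: the PVA inner phase concentration and time. The remaining BALO selections are molecular descriptors of the protein, plasticizer, and emulsifier, whereas the comparison set leans on formulation characteristics.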
Fig 3. Examples of predicted vs. observed dissolution profiles.
The RF model was used to predict the profiles.