Yilong Yang, Zhuyifan Ye, Yan Su, Qianqian Zhao, Xiaoshan Li, Defang Ouyang.
Abstract
Current pharmaceutical formulation development still relies strongly on the traditional trial-and-error methods of pharmaceutical scientists. This approach is laborious, time-consuming and costly. Recently, deep learning has been widely applied in many challenging domains because of its capability for automatic feature extraction. The aim of the present research is to apply deep learning methods to predict pharmaceutical formulations. In this paper, two types of dosage forms, oral fast disintegrating films (OFDF) and oral sustained release matrix tablets (SRMT), were chosen as model systems. Evaluation criteria suitable for pharmaceutics were applied to assess the performance of the models. Moreover, an automatic dataset selection algorithm was developed to select representative data as validation and test datasets. Six machine learning methods were compared with deep learning. Results showed that the accuracies of both deep neural networks were at least 80% and higher than those of the other machine learning models, indicating that deep learning predicted pharmaceutical formulations well. In summary, deep learning, combined with an automatic data splitting algorithm and evaluation criteria suited to pharmaceutical formulation data, was developed for the prediction of pharmaceutical formulations for the first time. The cross-disciplinary integration of pharmaceutics and artificial intelligence may shift the paradigm of pharmaceutical research from experience-dependent studies to data-driven methodologies.
Keywords: ANNs, artificial neural networks; APIs, active pharmaceutical ingredients; Automatic dataset selection algorithm; DNNs, deep neural networks; Deep learning; ESs, expert systems; FDA, U.S. Food and Drug Administration; HPMC, hydroxypropyl methylcellulose; MAE, mean absolute error; MD-FIS, the Maximum Dissimilarity algorithm with the small group filter and representative initial set selection; MLR, multiple linear regression; OFDF, oral fast disintegrating films; Oral fast disintegrating films; Oral sustained release matrix tablets; PLSR, partial least squares regression; Pharmaceutical formulation; QSAR, quantitative structure activity relationships; QbD, quality by design; RF, random forest; RMSE, root mean squared error; SRMT, sustained release matrix tablets; SVM, support vector machine; Small data; k-NN, k-nearest neighbors
Year: 2018 PMID: 30766789 PMCID: PMC6362259 DOI: 10.1016/j.apsb.2018.09.010
Source DB: PubMed Journal: Acta Pharm Sin B ISSN: 2211-3835 Impact factor: 11.413
Recent progress of machine learning in formulation design.
| Machine learning techniques | Formulation | Ref. |
|---|---|---|
| Hybrid expert system with ANNs | Hard gelatin capsule formulations | |
| Expert system (SeDeM Diagram) | Orally disintegrating tablets | |
| Expert system with ANNs | Osmotic pump tablets | |
| Ontology-based expert system | Immediate release tablets | |
| ME_expert 2.0 | Microemulsions formulations | |
| Fuzzy logic-based expert system | Freeze-dried formulations | |
| Cubist and Random Forest | Cyclodextrin formulations | |
Figure 1. The workflow of the MD-FIS algorithm.
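The workflow itself is not reproduced in this record, so the following is only a minimal sketch of the plain maximum dissimilarity idea that the MD-FIS name builds on: repeatedly pick the sample farthest from everything already picked. It does not implement the small group filter or the representative initial set selection, whose details are not given here. The function name `max_dissimilarity_select`, the `initial_index` parameter, the Euclidean distance, and the toy points are all assumptions made for illustration.

```python
import math

def max_dissimilarity_select(points, n_select, initial_index=0):
    """Greedy max-min selection: each new pick maximizes its distance
    to the nearest already-selected point (hypothetical helper)."""
    if not 0 < n_select <= len(points):
        raise ValueError("n_select must be between 1 and len(points)")
    selected = [initial_index]
    # min_dist[i] = distance from point i to its nearest selected point
    min_dist = [math.dist(p, points[initial_index]) for p in points]
    while len(selected) < n_select:
        candidates = [i for i in range(len(points)) if i not in selected]
        nxt = max(candidates, key=lambda i: min_dist[i])
        selected.append(nxt)
        # Update nearest-selected distances with the newly added point
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected

# Toy feature vectors (made up, not data from the paper)
formulations = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (0.5, 0.0)]
held_out = max_dissimilarity_select(formulations, 3)
```

In this toy case the selection starts at index 0, then takes the farthest point (index 2), then the point farthest from both (index 1). How the authors choose the starting set and filter small groups may differ substantially from this sketch.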
Results of the conventional machine learning models and the deep neural network on the OFDF training, validation and test sets.
| Machine learning technique | Training accuracy (%) | Training RMSE | Training MAE | Validation accuracy (%) | Validation RMSE | Validation MAE | Test accuracy (%) | Test RMSE | Test MAE |
|---|---|---|---|---|---|---|---|---|---|
| MLR | 90.11 | 0.0671 | 0.0508 | 60.00 | 0.1311 | 0.0999 | 65.00 | 0.1778 | 0.1183 |
| PLSR | 76.92 | 0.0917 | 0.0705 | 70.00 | 0.1136 | 0.0835 | 70.00 | 0.0970 | 0.0705 |
| SVM | 79.12 | 0.1136 | 0.0711 | 70.00 | 0.1308 | 0.0959 | 75.00 | 0.1039 | 0.0795 |
| ANN | 74.73 | 0.1140 | 0.0809 | 70.00 | 0.1105 | 0.0846 | 70.00 | 0.0959 | 0.0772 |
| RF | 84.62 | 0.0775 | 0.0567 | 80.00 | 0.0917 | 0.0721 | 70.00 | 0.1068 | 0.0774 |
| k-NN | 80.22 | 0.0975 | 0.0649 | 75.00 | 0.1025 | 0.0727 | 75.00 | 0.0877 | 0.0608 |
| DNN | 97.80 | 0.0420 | 0.0307 | 80.00 | 0.0842 | 0.0705 | 80.00 | 0.0714 | 0.0565 |
Results of the conventional machine learning models and the deep neural network on the SRMT training, validation, and test sets.
| Machine learning technique | Training accuracy (%) | Training RMSE | Training MAE | Validation accuracy (%) | Validation RMSE | Validation MAE | Test accuracy (%) | Test RMSE | Test MAE |
|---|---|---|---|---|---|---|---|---|---|
| MLR | 52.38 | 0.1356 | 0.1031 | 35.00 | 0.1212 | 0.1042 | 25.00 | 0.2182 | 0.1685 |
| PLSR | 55.24 | 0.1446 | 0.1066 | 55.00 | 0.1175 | 0.0961 | 45.00 | 0.1609 | 0.1203 |
| SVM | 60.95 | 0.1568 | 0.1013 | 50.00 | 0.1170 | 0.0960 | 45.00 | 0.1559 | 0.1147 |
| ANN | 57.14 | 0.1330 | 0.0998 | 50.00 | 0.1389 | 0.1137 | 50.00 | 0.1497 | 0.1124 |
| RF | 76.19 | 0.0975 | 0.0692 | 55.00 | 0.1308 | 0.1045 | 55.00 | 0.1170 | 0.0908 |
| k-NN | 64.76 | 0.1229 | 0.0825 | 45.00 | 0.1526 | 0.1264 | 40.00 | 0.1565 | 0.1306 |
| DNN | 99.05 | 0.0335 | 0.0237 | 80.00 | 0.0967 | 0.0660 | 80.00 | 0.0902 | 0.0673 |
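The three metrics reported in the two results tables can be sketched as follows. RMSE and MAE follow their standard definitions. The accuracy criterion is not spelled out in this record, so the sketch assumes that a prediction counts as correct when its absolute error is within a fixed tolerance, in the spirit of the ±10 s band mentioned in the Figure 3 caption. The function name `regression_metrics`, the `tolerance` parameter, and the sample values are hypothetical.

```python
import math

def regression_metrics(y_true, y_pred, tolerance):
    """Return (accuracy %, RMSE, MAE) for paired experimental and
    predicted values. Accuracy here is an assumed criterion: the share
    of predictions whose absolute error is within `tolerance`."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    n = len(errors)
    accuracy = 100.0 * sum(e <= tolerance for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(errors) / n
    return accuracy, rmse, mae

# Made-up normalized values for illustration, not data from the paper
experimental = [0.20, 0.50, 0.80, 0.40]
predicted = [0.25, 0.50, 0.60, 0.40]
accuracy, rmse, mae = regression_metrics(experimental, predicted, tolerance=0.1)
```

Under this assumed criterion, accuracy and RMSE can disagree: a model may have a low average error yet miss the tolerance band on several formulations, which is one reason to report all three values side by side as the tables do.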
Figure 2. Comparison of the experimental and the deep learning-predicted disintegration times of the formulations in the OFDF test set.
f2 values between the experimental and the deep learning-predicted cumulative drug release curves of the formulations in the SRMT test set.
| Formulation | f2 | Formulation | f2 |
|---|---|---|---|
| 1 | 77.42 | 11 | 65.72 |
| 2 | 63.35 | 12 | 90.05 |
| 3 | 64.84 | 13 | 57.05 |
| 4 | 67.21 | 14 | 41.91 |
| 5 | 59.75 | 15 | 55.06 |
| 6 | 50.85 | 16 | 65.84 |
| 7 | 77.77 | 17 | 51.08 |
| 8 | 30.39 | 18 | 49.57 |
| 9 | 44.56 | 19 | 64.42 |
| 10 | 74.47 | 20 | 59.35 |
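For readers unfamiliar with the f2 similarity factor, it is conventionally defined as f2 = 50 · log10(100 / sqrt(1 + mean squared difference)), where the differences are taken between two cumulative release profiles at matching time points. The sketch below applies that standard definition. The function name `f2_similarity` and the example profiles are illustrative, and it is an assumption that the table values were computed over the 2, 4, 6 and 8 h time points named in the figure captions.

```python
import math

def f2_similarity(reference, test):
    """Standard f2 similarity factor between two cumulative release
    profiles (percent released) sampled at the same time points."""
    if len(reference) != len(test) or not reference:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(reference)
    mean_sq_diff = sum((r - t) ** 2 for r, t in zip(reference, test)) / n
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + mean_sq_diff))

# Made-up release percentages at four time points, not data from the paper
experimental_curve = [30.0, 50.0, 70.0, 85.0]
identical = f2_similarity(experimental_curve, experimental_curve)
shifted = f2_similarity(experimental_curve, [r + 10.0 for r in experimental_curve])
```

Identical profiles give f2 = 100, and a uniform 10-point difference gives a value just under 50, which is why f2 values of 50 or more are commonly read as indicating similar release profiles.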
Figure 3. Relationship between the experimental and the deep learning-predicted values of the disintegration time on the OFDF training, validation and test sets. The dotted line indicates experimental values ±10 s.
Figure 4. Relationship between the experimental and the deep learning-predicted values of the cumulative drug release percentages on the SRMT training set. Panels A, B, C and D show the values at 2, 4, 6 and 8 h, respectively.
Figure 5. Relationship between the experimental and the deep learning-predicted values of the cumulative drug release percentages on the SRMT validation set. Panels A, B, C and D show the values at 2, 4, 6 and 8 h, respectively.
Figure 6. Relationship between the experimental and the deep learning-predicted values of the cumulative drug release percentages on the SRMT test set. Panels A, B, C and D show the values at 2, 4, 6 and 8 h, respectively.