Ye Lim Jung, Hyoung Sun Yoo, JeeNa Hwang.
Abstract
New drug development offers a very high return on success, but the success rate is extremely low. Pharmaceutical companies have tried various strategies to increase the success rate of drug development, but this goal has been difficult to achieve. In this study, we developed a machine learning model that can guide effective decision-making at the planning stage of new drug development. The Drug Development Recommendation (DDR) model presented here is a hybrid model for recommending and/or predicting drug groups suitable for development by individual pharmaceutical companies. It combines association rule learning, collaborative filtering (CF), and content-based filtering (CBF) approaches for enterprise-customized recommendations. For content-based filtering with a random forest classification algorithm, the accuracy and area under the curve (AUC) were 78% and 0.74, respectively. The DDR model was also applied to predict the success probability of companies developing Coronavirus disease 2019 (COVID-19) vaccines: the higher the predicted score from the DDR model, the further the COVID-19 vaccine development had progressed through the clinical phases. Although our approach has limitations that should be addressed, it makes scientific as well as industrial contributions in that the DDR model can support rational decision-making prior to initiating drug development by considering not only technical aspects but also company-related variables.
Keywords: COVID-19 vaccine development prediction; Decision support model; Drug development recommendation; Hybrid recommender system; Pharmaceutical portfolio management
Year: 2022 PMID: 35283560 PMCID: PMC8902892 DOI: 10.1016/j.eswa.2022.116825
Source DB: PubMed Journal: Expert Syst Appl ISSN: 0957-4174 Impact factor: 6.954
Summary of the predictor variables of drug features and company profiles.
| Category | Variable | Type | Unit or levels | Mean | Median | Min. | Max. | N |
|---|---|---|---|---|---|---|---|---|
| Drug features | Mode of Administration | Multi-label | Buccal, implant, inhalation, injection, intranasal, lingual, oral, topical, transdermal, vaginal | | | | | |
| Drug features | Market size of drug class | Continuous | Million USD | 10,730 | 4,527 | 69 | 98,331 | 77 |
| Company profiles | Country/Region of Incorporation | Multi-label | 34 countries | | | | | 367 |
| Company profiles | Company Type | Multi-label | Assets/Products, Government Institution, Private Company, Public Company, Public Investment Firm | | | | | 386 |
| Company profiles | Stock Exchange | Multi-label | 40 exchanges | | | | | 275 |
| Company profiles | Number of marketed drugs | Continuous | ea | 5 | 2 | 1 | 70 | 957 |
| Company profiles | Number of Employees | Continuous | person | 5,549 | 440 | 4 | 132,200 | 359 |
| Company profiles | Year Founded | Continuous | year | 1975 | 1992 | 1678 | 2018 | 347 |
| Company profiles | Market Capitalization | Continuous | Million USD | 11,809 | 835 | 4 | 344,850 | 280 |
| Company profiles | Total Revenue | Continuous | Million USD | 2,231 | 133 | 0.03 | 72,209 | 369 |
| Company profiles | Total Equity | Continuous | Million USD | 2,294 | 162 | −1,805 | 70,109 | 369 |
| Company profiles | R&D Expense | Continuous | Million USD | 379 | 26 | 0 | 10,944 | 369 |
| Company profiles | Total Liabilities | Continuous | Million USD | 3,082 | 132 | 0.31 | 94,586 | 362 |
| Company profiles | Return on Assets | Continuous | % | −5 | 2 | −216 | 64 | 350 |
| Company profiles | Return on Equity | Continuous | % | −39 | 5 | −2,550 | 126 | 334 |
| Company profiles | Gross Margin | Continuous | % | 53 | 64 | −276 | 184 | 341 |
| Company profiles | Earnings from Cont. Ops. Margin | Continuous | % | −17 | 6 | −282 | 389 | 297 |
| Company profiles | Total Revenues 1 Yr Growth | Continuous | % | 530 | 8 | −99 | 71,068 | 335 |
| Company profiles | Total Debt/Equity | Continuous | % | 108 | 38 | 0 | 4,063 | 246 |
| Company profiles | Total Asset Turnover | Continuous | – | 0.530 | 0.462 | 0.001 | 3.650 | 358 |
| Company profiles | Current Ratio | Continuous | – | 4.223 | 2.625 | 0.139 | 39.900 | 360 |
| Company profiles | Number of Total Professionals Profiled | Continuous | person | 35 | 27 | 1 | 192 | 345 |
Of the 957 companies, those for which information was not available were excluded from the summary statistics.
Market size is based on the global market in 2018.
Company profiles are based on the values for 2017.
The countries and stock exchanges included in the dataset are listed in Tables S1 and S2 of the Supplementary material.
Fig. 1. The framework of the DDR model for new drug development planning. The model consists of three main parts: data input, model building, and output of recommendations/predictions. Data on the drug development portfolios of pharmaceutical companies, company profiles, and drug features are entered to build the DDR model. Individual association rule learning, CF, and CBF models are constructed within the DDR model, and their recommendation scores are combined by a weighted linear combination. The DDR model outputs recommendations for enterprise-specific drug development and predictions of the companies with a high probability of success in developing a particular drug.
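The weighted linear combination mentioned in the caption can be illustrated with a minimal sketch. The score values, the key names `arl`, `cf`, and `cbf`, and the choice to normalize the weights so they sum to one are all assumptions made for illustration; the source does not state the actual weights or the normalization used.

```python
from typing import Dict


def combine_scores(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted linear combination of per-model recommendation scores.

    Weights are divided by their sum (an assumption of this sketch), so the
    result stays on the same scale as the input scores.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * scores[name] for name in weights) / total_weight


# Hypothetical, already-normalized scores for one (company, drug class) pair.
example_scores = {"arl": 0.20, "cf": 0.50, "cbf": 0.80}

# Equal weighting of the three component models.
equal = combine_scores(example_scores, {"arl": 1.0, "cf": 1.0, "cbf": 1.0})

# Weighting the CBF component 100%, so only its score contributes.
cbf_only = combine_scores(example_scores, {"arl": 0.0, "cf": 0.0, "cbf": 1.0})

print(round(equal, 3), round(cbf_only, 3))
```

Setting a component's weight to zero removes it from the total, which is one way a single approach could be used on its own.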
Summary statistics for the input dataset of the DDR model (Summary of the company-drug counting matrix).
| Number of pharmaceutical companies | Number of drug classes | Drug classes per company: Mean (SD) | Drug classes per company: Median (IQR) | Drug classes per company: Min. | Drug classes per company: Max. |
|---|---|---|---|---|---|
| 957 | 77 | 3.655 (5.121) | 2 (3) | 1 | 42 |
A drug was counted more than once if it corresponded to multiple drug class codes or if several companies were involved in its development.
Summary statistics for the discovered 1,834 association rules on drug development. The length of rules indicates the number of drug classes included in the discovered rules.
| Statistic | Length of rules | Support | Confidence | Lift |
|---|---|---|---|---|
| Mean | 4.510 | 0.036 | 0.906 | 5.002 |
| Median | 4.000 | 0.033 | 0.909 | 5.151 |
| Min. | 2.000 | 0.031 | 0.800 | 1.908 |
| Max. | 7.000 | 0.090 | 1.000 | 11.584 |
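Support, confidence, and lift follow their standard definitions in association rule learning. The sketch below computes them for a single rule over a toy set of company portfolios. The drug class labels `A`, `B`, `C` and the five portfolios are hypothetical and are not taken from the study data.

```python
from typing import FrozenSet, List, Tuple


def rule_metrics(
    portfolios: List[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(portfolios)
    n_antecedent = sum(1 for p in portfolios if antecedent <= p)
    n_consequent = sum(1 for p in portfolios if consequent <= p)
    n_both = sum(1 for p in portfolios if (antecedent | consequent) <= p)

    support = n_both / n  # share of portfolios containing every item in the rule
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    lift = confidence / (n_consequent / n) if n_consequent else 0.0
    return support, confidence, lift


# Hypothetical portfolios: each set holds the drug classes one company develops.
portfolios = [
    frozenset({"A", "B", "C"}),
    frozenset({"A", "B"}),
    frozenset({"A", "C"}),
    frozenset({"B"}),
    frozenset({"C"}),
]

support, confidence, lift = rule_metrics(portfolios, frozenset({"A"}), frozenset({"B"}))
print(round(support, 3), round(confidence, 3), round(lift, 3))
```

A lift above 1 means the consequent appears more often alongside the antecedent than it does overall. If support is measured over all 957 companies, which is an assumption here, the minimum support of 0.031 in the table would correspond to roughly 30 company portfolios.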
Fig. 2. (a) Confidence-support plot of the 1,834 generated association rules on drug development. The confidence and support values of each rule are shown as a scatter plot. (b) A grouped matrix of the top 20 association rules by lift, with antecedent groups (LHS) as columns and consequents (RHS) as rows. The lift value decreases from top to bottom and from left to right (i.e., the color of the bubble changes from red to grey in descending order).
Comparison of the accuracy of the algorithms applied in the CF approach using the metrics of RMSE and MAE.
| Algorithm | RMSE | MAE |
|---|---|---|
| UBCF | 0.963 | 0.739 |
| IBCF | 1.210 | 0.928 |
| SVD | 0.962 | 0.777 |
| SVDF | 0.937 | 0.781 |
| Random | 1.204 | 0.918 |
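In the table, the lowest RMSE (SVDF) and the lowest MAE (UBCF) belong to different algorithms. The following sketch, using made-up rating values, shows why the two metrics can rank models differently: RMSE squares each error and so penalizes a single large miss more heavily than MAE does.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical ratings: three exact predictions and one large miss.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 8.0]

rmse_value = rmse(actual, predicted)
mae_value = mae(actual, predicted)
print(rmse_value, mae_value)
```

Here the single error of 4 gives an MAE of 1.0 but an RMSE of 2.0.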
Fig. 3. Comparison of the ROC curves of the algorithms applied in the CF approach. The numbers from 1 to 20 indicate the number of recommended items (drug classes).
Comparison of the evaluation metrics of accuracy, sensitivity, specificity, and AUC of the algorithms applied in the CBF approach.
| Classifier | Accuracy | Sensitivity | Specificity | AUC |
|---|---|---|---|---|
| Decision tree | 0.653 | 0.798 | 0.645 | 0.721 |
| Random forest | 0.775 | 0.708 | 0.779 | 0.744 |
| SVM | 0.826 | 0.625 | 0.838 | 0.731 |
| kNN | 0.742 | 0.617 | 0.749 | 0.683 |
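Accuracy, sensitivity, and specificity are linked through the share of positive cases: accuracy = p × sensitivity + (1 − p) × specificity. The sketch below computes the three metrics from a hypothetical confusion matrix and then back-solves p from the rounded random forest row. It assumes all three values in a row were measured on the same evaluation set, which the source does not state, so the result is only indicative.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def implied_positive_share(accuracy: float, sensitivity: float, specificity: float) -> float:
    """Solve accuracy = p*sensitivity + (1 - p)*specificity for p."""
    return (specificity - accuracy) / (specificity - sensitivity)


# Hypothetical confusion matrix with few positive cases.
toy = classification_metrics(tp=7, fp=20, tn=70, fn=3)

# Rounded random forest values from the table above.
rf_share = implied_positive_share(accuracy=0.775, sensitivity=0.708, specificity=0.779)
print(toy, round(rf_share, 3))
```

Under these assumptions the positive class makes up only a small share of the cases, so accuracy tracks specificity closely. That is consistent with SVM showing the highest accuracy while random forest shows the highest AUC.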
Fig. 4. (a) The ROC curve for the random forest-based CBF model. (b) The variable importance for the success of drug development obtained from the random forest-based CBF model.
Fig. 5. Scaled heatmap of the recommendations by pharmaceutical company and drug class generated by the DDR model. The map shows that the priorities of the drug classes recommended for development differ between companies. The x-axis indicates the drug class and the y-axis indicates the company. Company names are anonymized. The color of each cell represents the normalized total recommendation score (S) obtained from the DDR model.
Comparison of the degree of advancement in the clinical trial phase with the prediction scores (S) obtained from the DDR model. In this case, the DDR model was generated with the CBF approach weighted 100%. A ‘Phase advanced’ value of 0 indicates no progress in the clinical trial phase during the analysis period. Likewise, a ‘Phase advanced’ value of 3 indicates three phases of progress in the clinical trial, such as from preclinical to phase 2.
| Phase advanced | Prediction score (mean) | Number of companies |
|---|---|---|
| 0 | 0.029 | 41 |
| 1 | 0.159 | 23 |
| 2 | 0.188 | 14 |
| 3 | 0.411 | 8 |
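The table values can be checked with a short calculation. The sketch below sums the company counts, computes the company-weighted mean of the group scores, and confirms that the mean score rises with each additional phase advanced. The weighted mean is derived here from the rounded table values and is not a figure reported in the source.

```python
# (phase advanced, mean prediction score, number of companies) from the table above.
rows = [
    (0, 0.029, 41),
    (1, 0.159, 23),
    (2, 0.188, 14),
    (3, 0.411, 8),
]

total_companies = sum(n for _, _, n in rows)

# Company-weighted mean of the group means (derived, not reported in the source).
overall_mean = sum(score * n for _, score, n in rows) / total_companies

# True if each group's mean score exceeds that of the previous group.
scores = [score for _, score, _ in rows]
is_monotonic = all(a < b for a, b in zip(scores, scores[1:]))

print(total_companies, round(overall_mean, 3), is_monotonic)
```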