Jian-Yu Shi, Jia-Xin Li, Ke Gao, Peng Lei, Siu-Ming Yiu.
Abstract
BACKGROUND: Drug combination is one effective approach for treating complex diseases. However, determining combinative drug pairs through clinical trials remains costly, so computational approaches are used to identify potential drug pairs in advance. Existing computational approaches have two shortcomings: (i) they lack an effective integration of heterogeneous features, which makes training time-consuming and can even produce an over-fitted classifier; and (ii) they predict potential combinations only among known drugs that already have known combinations, which cannot meet the demand of realistic screening, where the focus is on potential combinative pairs among newly arriving drugs that have no approved combination with any other drug.
Year: 2017 PMID: 29072137 PMCID: PMC5657064 DOI: 10.1186/s12859-017-1818-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Three scenarios in predicting drug combinations. Nodes are drugs: known drugs are labeled ‘A’ to ‘E’ and new drugs are labeled ‘x’ and ‘y’. Solid edges denote approved combinations between drug pairs. Dotted lines show the three prediction scenarios, S1, S2 and S3. The drugs involved in the prediction are filled with colors.
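The three scenarios can be told apart by whether each drug in a candidate pair already has an approved combination. The following is a minimal Python sketch, assuming that S1 pairs two known drugs, S2 pairs a known drug with a new one, and S3 pairs two new drugs; the caption implies this mapping but does not spell it out. The function name `scenario` and the toy edge list are hypothetical and are not taken from the figure.

```python
def scenario(pair, approved_pairs):
    """Return 'S1', 'S2' or 'S3' for a candidate pair (assumed mapping, see text)."""
    # A drug counts as "known" if it appears in at least one approved combination.
    known = {drug for approved in approved_pairs for drug in approved}
    # Count how many drugs in the candidate pair are new (no approved combination).
    n_new = sum(1 for drug in pair if drug not in known)
    return ("S1", "S2", "S3")[n_new]


# Hypothetical toy network: edges among known drugs; 'x' and 'y' are new drugs.
approved = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]

print(scenario(("A", "C"), approved))  # both drugs are known
print(scenario(("A", "x"), approved))  # one drug is new
print(scenario(("x", "y"), approved))  # both drugs are new
```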
Fig. 2 Flowchart of predicting drug combinations by integrating heterogeneous sources of drug data. A candidate drug pair is input into three classifier models, each trained on one kind of drug-pair feature: DDI, DTI or SE. The confidence scores that these classifiers report for the pair being a potential combination are then integrated with the pair's ATC-based similarity entry. The average of these scores is taken as the final confidence score, indicating how likely the pair is to be a combinative drug pair.
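The averaging step in this flowchart can be sketched in a few lines of Python. This is an illustration only: the function name `fuse_average`, the assumption that all four inputs lie in [0, 1], and the example scores are hypothetical and do not come from the study.

```python
from statistics import mean


def fuse_average(ddi_score, dti_score, se_score, atc_similarity):
    """Average three classifier confidences with the ATC-based similarity entry."""
    scores = [ddi_score, dti_score, se_score, atc_similarity]
    # Assumption: every input is already scaled to the interval [0, 1].
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
    return mean(scores)


# Made-up confidences for a single candidate pair.
final_score = fuse_average(0.8, 0.6, 0.9, 0.7)
print(round(final_score, 3))
```

A plain mean weights every source equally, so a single weak feature type can pull the final score down as much as a strong one can lift it.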
Fig. 3 Comparison of the original SE feature vectors and the PCA-processed SE feature vectors: (a) AUC values and (b) AUPR values in the three scenarios. In each pair of bars, the left bar is the result generated by the original SE feature and the right bar is the result generated by the PCA-processed SE feature.
Comparison when using individual features and fusion schemes
| | S1 | | S2 | | S3 | |
|---|---|---|---|---|---|---|
| | AUC | AUPR | AUC | AUPR | AUC | AUPR |
| DDI | 0.816 | 0.621 | 0.694 | 0.343 | 0.706 | 0.382 |
| DTI | 0.727 | 0.539 | 0.737 | 0.275 | 0.609 | 0.210 |
| SE | 0.871 | 0.717 | 0.818 | 0.542 | 0.707 | 0.411 |
| ATC | 0.792 | 0.393 | 0.773 | 0.378 | 0.708 | 0.422 |
| Average* | | | | | | |
| Direct* | | | | | | |
| Greedy* | | | | | | |
Asterisks (*) mark the three fusion schemes. The bold entries highlight the results achieved by the fusion schemes.
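AUC, one of the two metrics reported in these tables, can be read as the probability that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair. A minimal pure-Python sketch of that definition follows; `auc` is a hypothetical helper name and the scores are made up, not data from the study.

```python
def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up confidence scores for two positive and two negative drug pairs.
print(auc([0.9, 0.8], [0.1, 0.85]))
```

The pairwise loop is quadratic in the number of instances, which is fine for illustration; rank-based formulations give the same value more efficiently on large sets.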
Predicting performance with different classifiers
| | S1 | | S2 | | S3 | |
|---|---|---|---|---|---|---|
| | AUC | AUPR | AUC | AUPR | AUC | AUPR |
| LR | | | | 0.635 | 0.809 | 0.592 |
| SVM_Linear | 0.904 | 0.639 | 0.856 | 0.470 | 0.720 | 0.373 |
| SVM_RBF | 0.938 | | 0.904 | | | |
LR is logistic regression; SVM_Linear and SVM_RBF are SVMs with a linear kernel and an RBF kernel, respectively. When training the SVMs, the cost parameter is fixed at 100 and the RBF kernel parameter γ is set to 0.02, 0.05 and 0.001 in S1, S2 and S3, respectively. The bold entries highlight the best results.
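For orientation, γ controls how quickly the RBF kernel value decays as two feature vectors move apart. The sketch below assumes the common parameterization K(u, v) = exp(-γ‖u - v‖²) used by LIBSVM-style implementations; the note does not say which SVM library was used, and the feature vectors here are invented.

```python
import math

# Values quoted in the table note; the dictionary itself is only illustrative.
GAMMA = {"S1": 0.02, "S2": 0.05, "S3": 0.001}
COST = 100  # cost parameter, shared by the linear and RBF SVMs


def rbf_kernel(u, v, gamma):
    """RBF kernel under the assumed form exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


# Two invented binary feature vectors at squared distance 2.
u, v = [1, 0, 1], [0, 0, 0]
for name, gamma in GAMMA.items():
    print(name, rbf_kernel(u, v, gamma))
```

Under this form, the smaller γ quoted for S3 gives a flatter kernel, so more distant drug pairs still count as similar.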
Estimated Separability of positive and negative instances using different features
| | DDI | ATC | SE | DTI |
|---|---|---|---|---|
| Separability | 0.6370 | 0.7218 | 0.8065 | 0.5822 |