Mei Liu, Yonghui Wu, Yukun Chen, Jingchun Sun, Zhongming Zhao, Xue-wen Chen, Michael Edwin Matheny, Hua Xu.
Abstract
OBJECTIVE: Adverse drug reaction (ADR) is one of the major causes of failure in drug development. Severe ADRs that go undetected until the post-marketing phase of a drug often lead to patient morbidity. Accurate prediction of potential ADRs is required in the entire life cycle of a drug, including early stages of drug design, different phases of clinical trials, and post-marketing surveillance.
Year: 2012 PMID: 22718037 PMCID: PMC3392844 DOI: 10.1136/amiajnl-2011-000699
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Figure 1. Overview of the proposed framework for drug surveillance. Different combinations of features can be used for different phases of drug surveillance. Chemical structures and relevant proteins of drugs can be combined to predict potential adverse drug reactions (ADRs) in the early phase of drug development. As drug indication and other ADRs become available, they can be integrated with chemical and biological information for post-market surveillance.
Data features integrated in this study
| Feature type | Specific feature | Source | Dimension |
| Chemical | Substructures | PubChem | 881 |
| Biological | Targets | DrugBank | 786 |
| | Transporters | DrugBank | 72 |
| | Enzymes | DrugBank | 111 |
| | Pathways | KEGG | 173 |
| Phenotypic | Treatment indications | SIDER | 869 |
| | Other side effects | SIDER | 1384 |
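The way the feature blocks in the table combine into named feature sets such as 'chem+bio' can be sketched as follows. This is a minimal illustration, not the authors' code: the dimensions are copied from the table, while the dictionary layout, the function names, and the assumption that each block is a list of 0/1 values are hypothetical.

```python
# Block sizes copied from the feature table; grouping into three families
# (chem, bio, pheno) follows the "Feature type" column.
FEATURE_DIMS = {
    "chem": {"substructures": 881},
    "bio": {"targets": 786, "transporters": 72, "enzymes": 111, "pathways": 173},
    "pheno": {"indications": 869, "other_side_effects": 1384},
}


def feature_set_dim(name):
    """Return the vector length of a feature set such as 'chem+bio'."""
    return sum(sum(FEATURE_DIMS[family].values()) for family in name.split("+"))


def build_vector(drug_blocks, name):
    """Concatenate one drug's blocks for the requested families.

    `drug_blocks` is a hypothetical mapping: family -> list of 0/1 values
    (binary encoding is an assumption made for this sketch).
    """
    vector = []
    for family in name.split("+"):
        block = drug_blocks[family]
        expected = sum(FEATURE_DIMS[family].values())
        if len(block) != expected:
            raise ValueError(f"{family}: expected {expected} values, got {len(block)}")
        vector.extend(block)
    return vector


# Vector lengths implied by the table for three of the feature sets.
for name in ("chem", "chem+bio", "chem+bio+pheno"):
    print(name, feature_set_dim(name))
```

Under these assumptions the early-phase set 'chem+bio' has 2023 dimensions and the full set 'chem+bio+pheno' has 4276.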
Clinical validation examples of cerivastatin and rofecoxib
| UMLS CUI | Known ADRs in SIDER | Chem | Chem+bio | Chem+bio+pheno |
| Cerivastatin (Baycol) | | | | |
| C0035410 | Rhabdomyolysis | No | Yes | Yes |
| C0026848 | Myopathy | No | Yes | Yes |
| C0027121 | Myositis | No | Yes | Yes |
| C0231528 | Myalgia | Yes | Yes | Yes |
| C0026821 | Muscle cramps | No | Yes | Yes |
| C0011633 | Dermatomyositis | No | No | No |
| C0027080 | Myoglobinuria | No | No | No |
| | Group above ADRs | Yes | Yes | Yes |
| Rofecoxib (Vioxx) | | | | |
| C0027051 | Myocardial infarction | No | No | Yes |
| C0008031 | Chest pain | No | No | Yes |
| C0004238 | Atrial fibrillation | No | No | No |
| C0018802 | Congestive heart failure | No | No | No |
| | Group above ADRs | Yes | Yes | Yes |
ADR, adverse drug reaction; Bio, biological property; Chem, chemical structure; CUI, concept unique identifier; Pheno, phenotypic property; UMLS, unified medical language system.
Figure 2. Receiver operating characteristic curves in fivefold cross-validation for various feature sets using support vector machine: (1) chemical structures, ‘chem’; (2) biological properties, ‘bio’; (3) phenotypic properties, ‘pheno’; (4) chemical and biological properties, ‘chem+bio’; (5) chemical and phenotypic properties, ‘chem+pheno’; (6) chemical, biological, and phenotypic properties, ‘chem+bio+pheno’.
Feature comparison—performance of SVM over all versus common ADRs
| Feature set | ADR_All | | | | ADR_50+ | | | |
| | AUC | ACC | Precision | Recall | AUC | ACC | Precision | Recall |
| Chem | 0.9054 | 0.9538 | 0.4337 | 0.4925 | 0.7659 | 0.8268 | 0.4539 | 0.5569 |
| Bio | 0.9069 | 0.9543 | 0.4324 | 0.5043 | 0.7729 | 0.8287 | 0.4666 | 0.5521 |
| Pheno | 0.9542 | 0.9678 | 0.6607 | 0.6460 | 0.9175 | 0.8891 | 0.6933 | 0.7142 |
| Chem+bio | 0.9098 | 0.9551 | 0.4623 | 0.5008 | 0.7849 | 0.8327 | 0.4776 | 0.5728 |
| Chem+pheno | 0.9526 | 0.9669 | 0.6488 | 0.6443 | 0.9141 | 0.8857 | 0.6757 | 0.7215 |
| Chem+bio+pheno | 0.9524 | 0.9669 | 0.6617 | 0.6306 | 0.9138 | 0.8856 | 0.6750 | 0.7227 |
ADR_All covers all ADRs; ADR_50+ covers the common ADRs, those caused by at least 50 drugs. AUC, ACC, precision, and recall are micro-averaged across the ADRs in the corresponding dataset.
ACC, accuracy; ADR, adverse drug reaction; AUC, area under the receiver operating characteristic curve; Bio, biological property; Chem, chemical structure; Pheno, phenotypic property; SVM, support vector machine.
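Micro-averaging, as used for the metrics in the table, pools the per-ADR confusion counts before computing each score. The sketch below shows one common way to do this for accuracy, precision, and recall. It is an assumption that the study pooled counts exactly this way, and the function name, data layout, and toy labels are hypothetical.

```python
def micro_average(y_true, y_pred):
    """Micro-averaged accuracy, precision, and recall.

    y_true, y_pred: dict mapping an ADR name to a list of 0/1 labels,
    one entry per drug (hypothetical layout for this sketch).
    """
    tp = fp = fn = tn = 0
    # Pool the confusion counts over every (drug, ADR) pair.
    for adr, truth in y_true.items():
        for t, p in zip(truth, y_pred[adr]):
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels, invented for illustration only: one rare and one common ADR.
y_true = {"rare_adr": [1, 0, 0, 0], "common_adr": [1, 1, 1, 0]}
y_pred = {"rare_adr": [0, 0, 0, 0], "common_adr": [1, 1, 0, 1]}
scores = micro_average(y_true, y_pred)
print(scores)
```

Because true negatives enter accuracy but not precision or recall, a dataset dominated by negative drug-ADR pairs can show high accuracy alongside much lower precision and recall.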
Figure 3. Receiver operating characteristic curves in fivefold cross-validation on various feature sets for common adverse drug reactions using support vector machine: (1) chemical structures, ‘chem’; (2) biological properties, ‘bio’; (3) phenotypic properties, ‘pheno’; (4) chemical and biological properties, ‘chem+bio’; (5) chemical and phenotypic properties, ‘chem+pheno’; (6) chemical, biological, and phenotypic properties, ‘chem+bio+pheno’.
Figure 4. Receiver operating characteristic curves for method comparison. KNN, K-nearest neighbor; LR, logistic regression; NB, naïve Bayes; RF, random forest; SVM, support vector machine.
Algorithm comparison using the full feature set over all versus common ADRs
| Method | ADR_All | | | | ADR_50+ | | | |
| | AUC | ACC | Precision | Recall | AUC | ACC | Precision | Recall |
| LR | 0.9102 | 0.9486 | 0.4152 | 0.5671 | 0.7648 | 0.8023 | 0.5321 | 0.6908 |
| NB | 0.9116 | 0.9527 | 0.3537 | 0.6302 | 0.8627 | 0.8431 | 0.3929 | 0.7214 |
| KNN | 0.9161 | 0.9595 | 0.5300 | 0.5787 | 0.8508 | 0.8530 | 0.5633 | 0.6401 |
| RF | 0.9491 | 0.9653 | 0.6310 | 0.6250 | 0.9052 | 0.8784 | 0.6522 | 0.7057 |
| SVM | 0.9524 | 0.9669 | 0.6617 | 0.6306 | 0.9141 | 0.8857 | 0.6750 | 0.7227 |
The full feature set here refers to chemical + biological + phenotypic properties. ADR_All covers all ADRs; ADR_50+ covers the common ADRs, those caused by at least 50 drugs. AUC, ACC, precision, and recall are micro-averaged across the ADRs in the corresponding dataset.
ACC, accuracy; ADR, adverse drug reaction; AUC, area under the receiver operating characteristic curve; KNN, K-nearest neighbor; LR, logistic regression; NB, naïve Bayes; RF, random forest; SVM, support vector machine.
Figure 5. Overlap of the true positive predictions using CHEM (chemical structure), BIO (biological properties), or PHENO (phenotypic properties) features.
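An overlap of true positive predictions across three feature sets, as summarized in Figure 5, can be computed with plain set operations. The sketch below is illustrative only: the function name and region labels are hypothetical, and the drug and ADR identifiers are placeholders rather than results from the study.

```python
def venn_regions(chem, bio, pheno):
    """Split three sets of true-positive (drug, ADR) pairs into Venn regions."""
    return {
        "chem_only": chem - bio - pheno,
        "bio_only": bio - chem - pheno,
        "pheno_only": pheno - chem - bio,
        "chem_bio": (chem & bio) - pheno,
        "chem_pheno": (chem & pheno) - bio,
        "bio_pheno": (bio & pheno) - chem,
        "all_three": chem & bio & pheno,
    }


# Placeholder pairs, invented for illustration only.
chem_tp = {("drug_a", "adr_1"), ("drug_a", "adr_2")}
bio_tp = {("drug_a", "adr_1"), ("drug_b", "adr_1")}
pheno_tp = {("drug_a", "adr_1"), ("drug_a", "adr_2"), ("drug_c", "adr_3")}

regions = venn_regions(chem_tp, bio_tp, pheno_tp)
for label, pairs in regions.items():
    print(label, len(pairs))
```

The seven regions are disjoint, so their sizes sum to the number of distinct pairs correctly predicted by at least one feature set.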