Xiangeng Wang, Yanjing Wang, Zhenyu Xu, Yi Xiong, Dong-Qing Wei.
Abstract
The Anatomical Therapeutic Chemical (ATC) classification system proposed by the World Health Organization is a widely accepted drug classification scheme in both academia and industry. It is a multilabel system that categorizes drugs into multiple classes according to their therapeutic, pharmacological, and chemical attributes. In this study, we adopted a data-driven network-based label space partition (NLSP) method to predict the ATC classes of a given compound within the multilabel learning framework. The proposed method, ATC-NLSP, is trained on similarity-based features such as chemical-chemical interaction and the structural and fingerprint similarities of a compound to other compounds belonging to the different ATC categories. The NLSP method trains a predictor for each label cluster (clusters may intersect) detected by community detection algorithms and takes the ensemble of the predicted labels for a compound as the final prediction. Experimental evaluation based on the jackknife test on the benchmark dataset demonstrated that our method boosted the absolute true rate, the most stringent evaluation metric in this study, from 0.6330 to 0.7497 in comparison to the state-of-the-art approaches. Moreover, the community structures of the label relation graph were detected through the label propagation method. The advantage of multilabel learning over single-label models was shown by label-wise analysis. Our study indicated that the proposed method ATC-NLSP, which adopts ideas from the network research community and captures the correlation of labels in a data-driven manner, is the top-performing model in the ATC prediction task. We believe that the power of NLSP remains to be unleashed for multilabel learning tasks in drug discovery. The source code is freely available at https://github.com/dqwei-lab/ATC.
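The partition-then-ensemble idea described in the abstract can be illustrated with a minimal sketch. It assumes the label clusters are already given, and it uses a toy nearest-neighbour label-powerset learner as a stand-in for the base classifiers; the class names `NearestNeighbourPowerset` and `LabelSpacePartitionEnsemble` are hypothetical and are not taken from the authors' code.

```python
from typing import FrozenSet, Iterable, List, Sequence, Set

Vector = Sequence[float]


class NearestNeighbourPowerset:
    """Toy base learner (an assumption): returns the label subset of the
    closest training sample under squared Euclidean distance."""

    def fit(self, X: Sequence[Vector], Y: Sequence[Iterable[str]]):
        self.X = [tuple(x) for x in X]
        self.Y = [frozenset(y) for y in Y]
        return self

    def predict_one(self, x: Vector) -> FrozenSet[str]:
        def dist(i: int) -> float:
            return sum((a - b) ** 2 for a, b in zip(self.X[i], x))

        return self.Y[min(range(len(self.X)), key=dist)]


class LabelSpacePartitionEnsemble:
    """One predictor per label cluster; the final prediction is the union."""

    def __init__(self, clusters: Iterable[Iterable[str]]):
        # Clusters may overlap, as the abstract allows.
        self.clusters: List[FrozenSet[str]] = [frozenset(c) for c in clusters]
        self.models: List[NearestNeighbourPowerset] = []

    def fit(self, X: Sequence[Vector], Y: Sequence[Iterable[str]]):
        self.models = []
        for cluster in self.clusters:
            # Restrict every label set to the labels of this cluster.
            restricted = [set(y) & cluster for y in Y]
            self.models.append(NearestNeighbourPowerset().fit(X, restricted))
        return self

    def predict_one(self, x: Vector) -> Set[str]:
        predicted: Set[str] = set()
        for model in self.models:
            predicted |= model.predict_one(x)
        return predicted


# Toy data: two features, three labels, two overlapping clusters.
X_train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
Y_train = [{"A"}, {"A", "B"}, {"C"}, {"B", "C"}]
ensemble = LabelSpacePartitionEnsemble([{"A", "B"}, {"B", "C"}]).fit(X_train, Y_train)
prediction = ensemble.predict_one((0.1, 0.9))
```

In practice the toy learner would be replaced by a stronger multilabel classifier, and the clusters would come from community detection on the label graph rather than being written by hand.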
Keywords: drug classification; label correlation; label propagation; label space partition; multilabel classification
Year: 2019 PMID: 31543820 PMCID: PMC6739564 DOI: 10.3389/fphar.2019.00971
Source DB: PubMed Journal: Front Pharmacol ISSN: 1663-9812 Impact factor: 5.810
Figure 1. Label correlation landscape. (A) Heatmap of the pairwise Cramér’s V statistics for all labels. (B) UpSet visualization of label intersections. The horizontal bars show the number of drugs per ATC category, and the vertical bars show the number of drugs per ATC category intersection.
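For readers unfamiliar with the statistic in panel (A), the following sketch computes Cramér's V between two label indicator columns from their contingency table. It uses the plain, uncorrected formula; whether the figure was produced with a bias-corrected variant is not stated here, so treat that choice as an assumption. The function name `cramers_v` is illustrative.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cramers_v(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Uncorrected Cramér's V: sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))   # observed cell counts
    rows = Counter(x)            # row marginals
    cols = Counter(y)            # column marginals
    k = min(len(rows), len(cols))
    if k < 2:
        return 0.0               # a constant column carries no association
    chi2 = 0.0
    for r, n_r in rows.items():
        for c, n_c in cols.items():
            expected = n_r * n_c / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


# Two labels that always co-occur versus two unrelated labels.
v_identical = cramers_v([1, 1, 0, 0], [1, 1, 0, 0])
v_independent = cramers_v([1, 1, 0, 0], [1, 0, 1, 0])
```

A value near 1 indicates that two ATC labels almost always appear together (or almost never do), while a value near 0 indicates no association.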
Comparison with other state-of-the-art multilabel predictors.
| Method | DL | Aiming | Coverage | Accuracy | Absolute true | Hamming loss |
|---|---|---|---|---|---|---|
| EnsANet_LR ⊕ DO | Yes | 0.7957 | 0.8335 | 0.7778 | 0.7090 | Not available |
| EnsANet_LR ⊕DO | Yes | 0.9011 | 0.7162 | 0.7232 | 0.6871 | |
| EnsLIFT | No | 0.7818 | 0.7577 | 0.7121 | 0.6330 | |
| iATC-mHyb | No | 0.7191 | 0.7146 | 0.7132 | 0.6675 | |
| Chen et al. | No | 0.5076 | 0.7579 | 0.4938 | 0.1383 | |
| iATC-mISF | No | 0.6783 | 0.6710 | 0.6641 | 0.6098 | |
| NLSP-ERT-LPA | No | 0.7948 | 0.7691 | 0.7578 | 0.7213 | 0.03817 |
| NLSP-RF-LPA | No | 0.8072 | 0.7889 | 0.7778 | 0.7489 | |
| NLSP-SVM-LPA | No | 0.7844 | 0.7529 | 0.7370 | 0.6925 | 0.04322 |
| NLSP-XGB-LPA | No | | | | | 0.03429 |
| NLSP-MLP-LPA | No | 0.7958 | 0.7858 | 0.7591 | 0.7090 | 0.04032 |
DL denotes whether this model is a deep learning-based method.
The bold value stands for the best value of each metric.
These models are trained on a modified benchmark dataset, whose metrics are not comparable to our model.
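The five columns of the comparison table can be computed from true and predicted label sets. The sketch below assumes the set-based definitions commonly used in the ATC prediction literature (aiming as a precision-like ratio, coverage as a recall-like ratio, accuracy as intersection over union, absolute true as exact match, and Hamming loss as symmetric difference over the number of labels); the function name and the zero-division conventions are assumptions.

```python
from typing import Dict, Sequence, Set


def multilabel_metrics(
    true_sets: Sequence[Set[str]],
    pred_sets: Sequence[Set[str]],
    n_labels: int,
) -> Dict[str, float]:
    """Average the per-sample set-based metrics over all samples."""
    if len(true_sets) != len(pred_sets) or not true_sets:
        raise ValueError("inputs must be non-empty and of equal length")
    totals = {"aiming": 0.0, "coverage": 0.0, "accuracy": 0.0,
              "absolute_true": 0.0, "hamming_loss": 0.0}
    for true, pred in zip(true_sets, pred_sets):
        inter = len(true & pred)
        union = len(true | pred)
        # Convention (assumption): an empty denominator contributes 0,
        # except accuracy, where two empty sets count as a perfect match.
        totals["aiming"] += inter / len(pred) if pred else 0.0
        totals["coverage"] += inter / len(true) if true else 0.0
        totals["accuracy"] += inter / union if union else 1.0
        totals["absolute_true"] += 1.0 if true == pred else 0.0
        totals["hamming_loss"] += (union - inter) / n_labels
    n = len(true_sets)
    return {name: value / n for name, value in totals.items()}


# Two drugs, 14 first-level ATC categories.
scores = multilabel_metrics(
    true_sets=[{"A", "B"}, {"C"}],
    pred_sets=[{"A"}, {"C"}],
    n_labels=14,
)
```

Absolute true is the strictest of the five because a sample scores only when the predicted set equals the true set exactly; a single missing or extra label gives 0 for that sample.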
Figure 2. Label relation graph. Different colors stand for different communities. The line width represents the weight between two labels. Communities are detected by the multiple asynchronous label propagation method, and the weight represents the frequency of label co-occurrence.
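The construction behind the label relation graph can be sketched in two steps: count how often each pair of labels co-occurs, then run asynchronous label propagation on the weighted graph. This is a simplified single-run variant that yields a non-overlapping partition; the function names, the tie-breaking rule, and the fixed seed are assumptions rather than the authors' settings.

```python
import random
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set

Graph = Dict[str, Dict[str, int]]


def cooccurrence_graph(label_sets: Iterable[Set[str]]) -> Graph:
    """Edge weight = number of samples in which both labels appear."""
    graph: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for labels in label_sets:
        for a, b in combinations(sorted(labels), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    # Labels that never co-occur with another label do not become nodes.
    return {node: dict(nbrs) for node, nbrs in graph.items()}


def label_propagation(graph: Graph, seed: int = 0, max_sweeps: int = 100) -> List[Set[str]]:
    """Asynchronous label propagation with weighted neighbour votes."""
    rng = random.Random(seed)
    community = {node: node for node in graph}  # each node starts alone
    nodes = sorted(graph)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)
        changed = False
        for node in nodes:
            votes: Dict[str, float] = defaultdict(float)
            for nbr, weight in graph[node].items():
                votes[community[nbr]] += weight
            if not votes:
                continue
            top = max(votes.values())
            best = sorted(c for c, v in votes.items() if v == top)
            if community[node] in best:
                continue  # keep the current community on ties
            community[node] = rng.choice(best)
            changed = True
        if not changed:
            break
    groups: Dict[str, Set[str]] = defaultdict(set)
    for node, comm in community.items():
        groups[comm].add(node)
    return sorted(groups.values(), key=lambda g: sorted(g))


label_sets = [{"A", "B"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"D", "E"}, {"D", "E"}]
graph = cooccurrence_graph(label_sets)
communities = label_propagation(graph)
```

Because label propagation is randomized, repeated runs on a real label graph can return different partitions, which is one motivation for running it multiple times.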
Label-wise analysis of best-performing multilabel learning model.
| Predictive label | Accuracy | Specificity | Recall | F1 score | AUC | Evaluation method |
|---|---|---|---|---|---|---|
| Alimentary tract and metabolism | 0.9269 | 0.7312 | 0.7549 | 0.7406 | 0.9550 | 10 × 10-fold CV |
| Blood and blood forming organs | 0.9793 | 0.7754 | 0.5644 | 0.6430 | 0.9493 | 10 × 10-fold CV |
| Cardiovascular system | 0.9490 | 0.8371 | 0.8274 | 0.8306 | 0.9752 | 10 × 10-fold CV |
| Dermatologicals | 0.9403 | 0.7966 | 0.6038 | 0.6845 | 0.9472 | 10 × 10-fold CV |
| Genitourinary system and sex hormones | 0.9691 | 0.8148 | 0.6682 | 0.7294 | 0.9539 | 10 × 10-fold CV |
| Systemic hormonal preparations, excluding sex | | 0.8227 | 0.7605 | 0.7816 | 0.9940 | 10 × 10-fold CV |
| Anti-infectives for systemic use | 0.9793 | | | | | 10 × 10-fold CV |
| Antineoplastic and immunomodulating agents | 0.9792 | 0.8683 | 0.7724 | 0.8126 | 0.9804 | 10 × 10-fold CV |
| Musculoskeletal system | 0.9820 | 0.8707 | 0.7836 | 0.8209 | 0.9842 | 10 × 10-fold CV |
| Nervous system | 0.9511 | 0.8581 | 0.8913 | 0.8733 | 0.9825 | 10 × 10-fold CV |
| Antiparasitic products, insecticides and repellents | 0.9863 | 0.8312 | 0.7358 | 0.7714 | 0.9803 | 10 × 10-fold CV |
| Respiratory system | 0.9573 | 0.8432 | 0.7516 | 0.7923 | 0.9720 | 10 × 10-fold CV |
| Sensory organs | 0.9492 | 0.8206 | 0.6367 | 0.7140 | 0.9487 | 10 × 10-fold CV |
| Various | 0.9717 | 0.7681 | 0.6997 | 0.7241 | 0.9703 | 10 × 10-fold CV |
| Cardiovascular system | 0.8947 | Not available | | | | 100 × bootstrapping |
| Cardiovascular system | 0.7712 | | | | | Test set |
| SuperPred | 0.676 | | | | | Jackknife |
The bold value stands for the best value of each metric.
The mean accuracy of flattened 850 ATC classes.
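A label-wise analysis treats each ATC category as its own binary problem. The sketch below derives per-label scores from confusion counts using the standard textbook definitions; it is an illustration only, and it does not claim to reproduce how each column of the table above was computed. AUC is omitted because it needs continuous scores rather than predicted label sets. The function name `labelwise_scores` is hypothetical.

```python
from typing import Dict, Sequence, Set


def labelwise_scores(
    true_sets: Sequence[Set[str]],
    pred_sets: Sequence[Set[str]],
    label: str,
) -> Dict[str, float]:
    """Binary scores for one label, from its confusion counts."""
    tp = fp = tn = fn = 0
    for true, pred in zip(true_sets, pred_sets):
        actual, predicted = label in true, label in pred
        if actual and predicted:
            tp += 1
        elif actual:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0  # convention: 0 when undefined

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    f1 = ratio(2 * tp, 2 * tp + fp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Five drugs scored for label "C": tp=2, fn=1, tn=1, fp=1.
true_sets = [{"C"}, {"C"}, {"C"}, {"A"}, {"A"}]
pred_sets = [{"C"}, {"C"}, {"A"}, {"A"}, {"A", "C"}]
c_scores = labelwise_scores(true_sets, pred_sets, "C")
```

For rare categories, accuracy is dominated by true negatives, so recall and F1 are usually more informative when comparing labels.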