Chaokun Yan, Zhihao Suo, Jianlin Wang, Ge Zhang, Huimin Luo.
Abstract
The Anatomical Therapeutic Chemical (ATC) classification system is a drug classification scheme proposed by the World Health Organization that is widely used for drug screening, repositioning, and similarity research. The ATC system assigns ATC codes to drugs based on their anatomical, pharmacological, therapeutic, and chemical properties. Predicting the ATC code of a given drug helps to understand its indications and potential toxicity, thus promoting its therapeutic use and accelerating its development. In this article, we propose an end-to-end model, DACPGTN, to predict the ATC code of a given drug. DACPGTN constructs composite features of drugs, diseases, and targets from diverse biomedical information. Inspired by the Graph Transformer Network, we learn potential novel interactions among drugs, diseases, and targets from the known interactions to construct drug-target-disease heterogeneous networks containing comprehensive interaction information. Based on the constructed composite features and the learned heterogeneous networks, we employ a graph convolutional network to generate embeddings of the drug nodes, which are then used for the multi-label learning task in drug discovery. Experiments on the benchmark datasets demonstrate that the proposed DACPGTN model achieves better prediction performance than existing methods. The source code of our method is available at https://github.com/Szhgege/DACPGTN.
Keywords: drug ATC code; drug discovery; graph transformer network; interaction information; multi-label classification
Year: 2022 PMID: 35721178 PMCID: PMC9198367 DOI: 10.3389/fphar.2022.907676
Source DB: PubMed Journal: Front Pharmacol ISSN: 1663-9812 Impact factor: 5.988
FIGURE 1. Benchmark dataset label information analysis.
The 1749 drug compounds in the benchmark dataset are broken down into 14 ATC classes.
| Subset | Name | Number of Drugs |
|---|---|---|
| | Alimentary tract and metabolism | 221 |
| | Blood and blood forming organs | 44 |
| | Cardiovascular system | 287 |
| | Dermatologicals | 182 |
| | Genitourinary system and sex hormones | 127 |
| | Systemic hormonal preparations, excluding sex hormones and insulins | 68 |
| | Anti-infectives for systemic use | 273 |
| | Antineoplastic and immunomodulating agents | 129 |
| | Musculo-skeletal system | 91 |
| | Nervous system | 382 |
| | Antiparasitic products, insecticides and repellents | 48 |
| | Respiratory system | 189 |
| | Sensory organs | 222 |
| | Various | 45 |
| | Total number of virtual drugs | 2308 |
| | Total number of structurally different drugs | 1749 |
The number of virtual drugs is calculated as follows: when a drug belongs to two different classes at the same time, it is counted as two virtual drugs. If a drug belongs to three different classes at the same time, it is counted as three virtual drugs, and so on.
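The counting rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the drug IDs, the class letters, and the function name `count_virtual_drugs` are hypothetical toy values chosen for the example.

```python
from collections import Counter


def count_virtual_drugs(drug_classes):
    """Count per-class membership and the total number of virtual drugs.

    drug_classes maps a drug ID to the set of ATC classes it belongs to.
    A drug that belongs to k classes contributes k virtual drugs.
    """
    per_class = Counter()
    for classes in drug_classes.values():
        per_class.update(classes)
    total_virtual = sum(per_class.values())
    return per_class, total_virtual


# Hypothetical toy data: three structurally different drugs.
toy = {
    "drug_a": {"C"},            # one class    -> 1 virtual drug
    "drug_b": {"C", "N"},       # two classes  -> 2 virtual drugs
    "drug_c": {"A", "D", "S"},  # three classes -> 3 virtual drugs
}

per_class, total_virtual = count_virtual_drugs(toy)
print(len(toy), total_virtual)  # 3 structurally different drugs, 6 virtual drugs
```

Applied to the benchmark dataset, the same rule gives 2308 virtual drugs for 1749 structurally different drugs, which is the sum of the per-class counts in the table.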
Statistics of the benchmark dataset used in this study.
| Dataset | Drugs | Targets | Diseases |
|---|---|---|---|
| | 1749 | 982 | 355 |
| **Interactions** | **Drug-Target** | **Drug-Disease** | **Target-Disease** |
| | 6,370 | 1,285 | 288 |
FIGURE 2. Overall framework of DACPGTN. The feature information of different biomedical entities is integrated into a composite feature matrix that serves as the node feature input of the prediction module (Part A). The graph transformer layer extracts potential interaction information between different biomedical entities from the set of heterogeneous networks (Part B). The prediction stage combines the composite feature matrix with the learned potential interaction information networks to produce the prediction results (Part C).
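Part B can be pictured with the soft edge-type selection used in Graph Transformer Networks: a channel forms a softmax-weighted combination of the candidate adjacency matrices twice, and the product of the two combinations scores two-hop connections that are absent from every input network. The pure-Python sketch below illustrates that idea only. The toy matrices, the weights, the use of an identity matrix as an extra candidate, and all function names are assumptions for the example and are not taken from the DACPGTN source code; degree normalization is omitted.

```python
import math


def softmax(weights):
    """Numerically stable softmax over a list of floats."""
    m = max(weights)
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def soft_select(adjacencies, weights):
    """Convex combination of same-sized square adjacency matrices."""
    alphas = softmax(weights)
    n = len(adjacencies[0])
    return [[sum(a * adj[i][j] for a, adj in zip(alphas, adjacencies))
             for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def gt_layer(adjacencies, w1, w2):
    """One channel of one layer: multiply two soft-selected adjacencies."""
    q1 = soft_select(adjacencies, w1)
    q2 = soft_select(adjacencies, w2)
    return matmul(q1, q2)


# Toy heterogeneous graph with node order [drug, target, disease].
drug_target = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
target_disease = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# Hypothetical weights that favour drug->target first, then target->disease.
learned = gt_layer(
    [drug_target, target_disease, identity],
    w1=[10.0, 0.0, 0.0],
    w2=[0.0, 10.0, 0.0],
)
print(round(learned[0][2], 3))  # drug->disease score, close to 1
```

In a trained model the weights would be learned by gradient descent rather than fixed by hand, and each channel would have its own pair of weight vectors.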
DACPGTN model parameter settings.
| Parameter | Detailed Settings |
|---|---|
| Number of Graph Transformer layers | 1 |
| Number of channels | 2 |
| Training epochs | 250 |
| Learning rate | 0.005 |
| Weight decay | 0.001 |
| Number of GCN layers | 1 |
| Feature Input dim | 300 |
| GCN Output dim | 150 |
| FC1 | 150 |
| FC2 | 128 |
| FC3 | 64 |
| FC4 | 14 |
| Dropout | 0.2 |
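For reproduction purposes, the settings in the table can be collected in a single immutable configuration object. The sketch below is one possible encoding: the class name `DACPGTNConfig` and its field names are hypothetical, and the table does not state how the layer widths connect, so the code only records the values and checks that the last fully connected layer matches the 14 ATC classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DACPGTNConfig:
    """Hyperparameters copied from the parameter table (field names are hypothetical)."""
    gt_layers: int = 1            # Number of Graph Transformer layers
    channels: int = 2             # Number of channels
    epochs: int = 250             # Training epochs
    learning_rate: float = 0.005
    weight_decay: float = 0.001
    gcn_layers: int = 1           # Number of GCN layers
    feature_dim: int = 300        # Feature input dim
    gcn_out_dim: int = 150        # GCN output dim
    fc_dims: tuple = (150, 128, 64, 14)  # FC1..FC4 as listed in the table
    dropout: float = 0.2


NUM_ATC_CLASSES = 14  # number of first-level ATC classes in the benchmark

cfg = DACPGTNConfig()
# The final layer width should equal the number of labels being predicted.
assert cfg.fc_dims[-1] == NUM_ATC_CLASSES
print(cfg)
```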
Comparison with other ATC Code multi-label classifiers (10 × 10-fold CV).
| Classifier | Aiming | Coverage | Accuracy | Absolute True | Absolute False |
|---|---|---|---|---|---|
| DACPGTN | 0.8543 | 0.8517 | 0.8320 | 0.7902 | 0.0241 |
| CGATCPred | 0.7864 | 0.8022 | 0.7711 | 0.7290 | 0.0338 |
| iATC-NRAKEL | 0.7744 | 0.8020 | 0.7550 | 0.6947 | 0.0376 |
| iATC-mISF | 0.7094 | 0.7127 | 0.7036 | 0.6306 | 0.0244 |
| ML-KNN | 0.7293 | 0.7071 | 0.6861 | 0.6300 | 0.0433 |
| ML-RandomForest | 0.6723 | 0.6533 | 0.6471 | 0.6187 | 0.0368 |
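The five column names match the metrics commonly used for multi-label ATC classifiers. The exact formulas are not reproduced in this excerpt, so the sketch below assumes the usual set-based definitions: Aiming averages the overlap divided by the predicted set size, Coverage divides by the true set size, Accuracy divides by the union size, Absolute True is the fraction of exact matches, and Absolute False is the average number of wrong labels divided by the number of classes (lower is better). The function name and the toy label sets are hypothetical.

```python
def multilabel_metrics(true_sets, pred_sets, num_labels):
    """Compute the five set-based multi-label metrics under the assumed definitions."""
    n = len(true_sets)
    sums = {"aiming": 0.0, "coverage": 0.0, "accuracy": 0.0,
            "absolute_true": 0.0, "absolute_false": 0.0}
    for true, pred in zip(true_sets, pred_sets):
        inter = len(true & pred)
        union = len(true | pred)
        # Guard against empty sets so that no division by zero occurs.
        sums["aiming"] += inter / len(pred) if pred else 0.0
        sums["coverage"] += inter / len(true) if true else 0.0
        sums["accuracy"] += inter / union if union else 1.0
        sums["absolute_true"] += 1.0 if true == pred else 0.0
        sums["absolute_false"] += (union - inter) / num_labels
    return {name: value / n for name, value in sums.items()}


# Hypothetical toy example with three drugs and 14 possible classes.
true_sets = [{"A"}, {"C", "N"}, {"S"}]
pred_sets = [{"A"}, {"C"}, {"S", "D"}]

m = multilabel_metrics(true_sets, pred_sets, num_labels=14)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

In this toy case the second drug has a missed label and the third has a spurious one, so Aiming and Coverage both drop to 2.5/3 while Absolute True is only 1/3, which shows why Absolute True is the strictest of the five.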
FIGURE 3. Boxplot of the Absolute True and Accuracy values of DACPGTN over 10 repetitions of 10-fold cross-validation.
FIGURE 4. Selection of the GCN output dimension.
Experimental results of single-source interaction information.
| Classifier | Aiming | Coverage | Accuracy | Absolute True | Absolute False |
|---|---|---|---|---|---|
| DACPGTN-Disease | 0.8442 | 0.8437 | 0.8231 | 0.7782 | 0.02516 |
| DACPGTN-Target | 0.8327 | 0.8307 | 0.8051 | 0.7536 | 0.02875 |
Results of the new-drug prediction experiment.
| Interactions | Aiming | Coverage | Accuracy | Absolute True | Absolute False |
|---|---|---|---|---|---|
| None-Disease | 0.8458 | 0.8443 | 0.8233 | 0.7802 | 0.0250 |
| None-Target | 0.8439 | 0.8423 | 0.8206 | 0.7764 | 0.0252 |
| None-Target-Disease | 0.8406 | 0.8376 | 0.8175 | 0.7747 | 0.0258 |
ATC classes inferred by the DACPGTN model for eight drugs.
| Drug ID | Chemical Name | Original ATC Class | Inferred ATC Class | Evidence |
|---|---|---|---|---|
| D00302 | Dipyridamole | | | KEGG/CTD |
| D02070 | Homatropine methylbromide | | | KEGG/DrugBank |
| D00768 | Carisoprodol | | | DrugBank/CTD |
| D00652 | Brinzolamide | | | KEGG |
| D00131 | Disulfiram | | | KEGG/CTD |
| D01192 | Olopatadine hydrochloride | | | CTD |
| D00314 | Etidronate disodium | | | CTD |
| D00525 | Pilocarpine | | | CTD |
*This symbol indicates that evidence can be found to support the chemical belonging to the ATC class.