
Broad-Spectrum Profiling of Drug Safety via Learning Complex Network.

Ke Liu¹, Ruo-Fan Ding¹, Han Xu², Yang-Mei Qin¹, Qiu-Shun He¹, Fei Du¹, Yun Zhang¹, Li-Xia Yao³, Pan You⁴, Yan-Ping Xiang², Zhi-Liang Ji¹,⁵.

Abstract

Drug safety is a severe clinical pharmacology and toxicology problem that causes immense medical and social burdens every year. Unfortunately, a reproducible method to assess drug safety systematically and quantitatively is still missing. In this study, we developed an advanced machine learning model for de novo drug safety assessment by solving the multilayer drug-gene-adverse drug reaction (ADR) interaction network. For the first time, drug safety was assessed across a broad landscape of 1,156 distinct ADRs. We also designed a parameter, ToxicityScore, to quantify overall drug safety. Moreover, we determined the association strength for each of 3,807,631 gene-ADR interactions, which provides clues for the mechanistic exploration of ADRs. For convenience, we deployed the model as a web service, ADRAlert-gene, at http://www.bio-add.org/ADRAlert/. In summary, this study offers insights into prioritizing safe drug therapy. It helps reduce the attrition rate of new drug discovery by providing a reliable ADR profile at the early preclinical stage.


Year: 2020 PMID: 31868917 PMCID: PMC7325315 DOI: 10.1002/cpt.1750

Source DB: PubMed Journal: Clin Pharmacol Ther ISSN: 0009-9236 Impact factor: 6.875

WHAT IS THE CURRENT KNOWLEDGE ON THE TOPIC?
The adverse drug reaction (ADR) is a severe clinical pharmacology/toxicology problem that has caused immense medical and social burdens. It is also one of the two major causes of failure in new drug discovery. Unfortunately, current mechanistic understanding and profiling of drug safety are still limited in both concept and methodology.

WHAT QUESTION DOES THIS STUDY ADDRESS?
Most current experimental and computational methods focus primarily on drug toxicity in cells or tissues/organs rather than on adverse consequences in the clinic. Furthermore, a general, repeatable, and systematic method to study the mechanism of ADRs is still needed.

WHAT DOES THIS STUDY ADD TO OUR KNOWLEDGE?
This study implements a new method for broad‐spectrum ADR profiling. It suggests a gene‐based systematic exploration of the ADR mechanism that is applicable to most ADRs.

HOW MIGHT THIS CHANGE CLINICAL PHARMACOLOGY OR TRANSLATIONAL SCIENCE?
This study provides new insights into clinical pharmacology. It may enhance the success rate of new drug discovery by reducing drug safety problems in clinical trials. It also helps prioritize clinical pharmacogenetic tests for safe drug therapy.

Drug safety is a severe clinical problem in drug therapy, causing immense medical and social burdens around the world every year. Drug safety research seeks to explain why a particular drug causes side effects or adverse drug reactions (ADRs) in a particular patient.
The World Health Organization (WHO) defines an ADR as “responses to a medicine which is noxious and unintended, and which occurs at doses normally used in man.” Severe ADRs (SADRs) often lead to hospitalization, prolonged hospital stays, increased cost of care, disability, and even death.1, 2, 3 Fatal ADRs were reported to account for more than 100,000 deaths in US hospitals in 1994.4 Since 1999, ADR‐related mortality has increased significantly, at a rate of 0.58% per year.5 In addition, ADRs accounted for about 24% of all failures in clinical trials of new drugs, second only to insufficient efficacy.6 It is therefore extremely important to monitor and assess drug safety throughout the life cycle of drug development, from early drug discovery to postmarket surveillance.7

Conventionally, both in vitro and in vivo tests are undertaken before clinical trials to rapidly remove highly toxic drugs. Nevertheless, > 20% of drug candidates still fail in clinical trials because of poor toxicity profiles.8 Toxicity information collected from cell experiments, such as MTT assays, and from animal studies cannot be fully transferred to humans unless it is interpreted prudently. More reliable ADR profiles are usually created in clinical trials among small but carefully recruited patient populations and through large‐scale postmarket surveillance, both of which are costly and time‐consuming and may sometimes compromise patients’ treatment and satisfaction. Even so, the ADR profiles in drug labels can still be incomplete or inaccurate owing to medical complexity.9

As a complement to experimental assays, computational methods have been developed for drug toxicity evaluation. For instance, a number of computer programs have been built upon quantitative structure‐activity relationships for high‐throughput assessment of drug toxicity endpoints such as hepatotoxicity, nephrotoxicity, genotoxicity, and oncogenicity.
Typical applications include DEREK,10 TOPKAT,11 COMPACT,12 MULTICASE,13 HazardExpert,14 and OncoLogic. Note that drug toxicity and ADRs are two linked but distinct concepts. Drug toxicity is typically determined under different doses and exposure times to assess the damaging effects of a given chemical on cells or organs at the early drug discovery stage, whereas ADRs are undesired clinical consequences observed during drug therapy. Nevertheless, a chemical structure itself may possess hidden features that cause cell/organ toxicity and eventually induce ADRs. Hence, linking chemical features to ADRs provides a feasible way to investigate ADRs.15 Some machine learning algorithms, such as decision trees, have been applied to determine the chemical, physical, and structural properties of drugs that predispose to ADRs.16, 17, 18

From a systems biology view, the interactions between a drug (or its metabolites) and proteins or pathways are likely the driving force behind drug‐induced adverse events. In recent years, different research groups have tried to depict ADRs through undesired drug‐protein interactions, perturbation of metabolic pathways, or organ malfunction.19 For instance, SePreSA built a chemical‐protein interactome to predict SADRs.20 LaBute et al. used logistic regression models on top of molecular docking for ADR prediction.21 Huang et al.22 proposed a framework for predicting ADR profiles by integrating protein‐protein interaction networks with drug structures. Liu et al.23 combined the chemical structures, biological properties, and phenotypic characteristics of drugs for ADR prediction using machine learning. Cami et al.24 developed predictive pharmacosafety networks that combined safety, taxonomic, and biological information on specific drugs and adverse events. Pan et al.25 used a high‐throughput docking program to create proteome‐wide drug‐off‐target interaction profiles for the network understanding of SADRs.
It has been proposed that target/pathway‐based methods might outperform ligand‐based methods in ADR assessment because they circumvent the hurdle of linking the drug itself or its metabolites to specific ADRs.26 However, full drug‐target interaction profiles are usually hard to obtain with current experimental and computational technologies. Although high‐throughput screening systems can use robotics to test the interactions between drugs and protein targets automatically and quickly, such screening is usually limited to well‐established protein panels, such as kinases and receptors. As a result, target‐based ADR studies have often been demonstrated on only a few ADRs of interest.

Recent advances in systems pharmacology and toxicology use microarrays and next‐generation sequencing to simultaneously monitor large‐scale gene expression changes in response to external chemical stimuli. By comparing gene expression profiles across cell lines, drug efficacies in different tumor therapies can be evaluated.27 This suggests an opportunity to link biological effects (i.e., gene expression changes) directly to clinical outcomes (i.e., ADRs) upon drug treatment,19 leaving the intermediate drug‐target/pathway interactions as a black box. Such a strategy takes advantage of systems biology in the network depiction of ADRs while bypassing the difficulty of acquiring a full profile of the drug‐target interactions underlying ADRs. In this study, we construct a multilayer complex drug‐gene‐ADR interaction network by integrating heterogeneous, multiscale, and historical data. Upon this network, we develop an advanced machine learning model for de novo ADR prediction by statistically building the full spectrum of gene‐ADR associations in a retrospective manner. Finally, we use the weighted gene‐ADR associations to explore ADR mechanisms systematically.

MATERIALS AND METHODS

Hypothesis

In this study, we built a model on top of a multilayer drug‐gene‐ADR complex network (Figure 1) based on the following assumptions. ADRs sometimes occur as a clinical consequence of drug treatment. The occurrence of ADRs could be attributed to any one or a combination of mechanisms, such as overdose, poor pharmacokinetics, drug‐drug interaction, off‐target effects, and so on. Underlying these mechanisms are cascades of molecular events triggered by drug treatment, including abnormal protein activity, disturbance of biological pathways, and dysfunction of organs. Unfortunately, in most cases it is hard to determine the driving molecular events and the ADR mechanisms that follow from them. However, gene expression also changes upon drug treatment, as a biological outcome of protein or pathway disturbance. In particular, gene expression changes occur mostly in the focal tissues where ADRs happen. Hence, associations exist between gene expression changes and the occurrence of adverse reactions. Compared with hard‐to‐acquire drug‐protein interactions, drug‐gene interactions are easy to determine, thanks to the wide application of transcriptome technologies. The problem then shifts to building the logical associations between gene expression changes and ADRs. Taking advantage of machine learning, we can determine the association strength for every gene‐ADR pair by statistically solving the known drug‐gene‐ADR relations (Figure 1). By doing so, we leave the exact molecular mechanisms underlying the ADRs as a black box. Once the gene‐ADR associations are adequately represented and measured in the model, evaluating an ADR profile from the drug‐regulated genes becomes feasible.

Figure 1

The hypothesis of broad‐spectrum adverse drug reaction (ADR) assessment via solving the multilayer drug‐gene‐ADR interaction network. We profiled drug safety by statistically solving the multilayer drug‐gene‐ADR interaction network on the basis of the following hypothesis. ADRs sometimes happen during drug treatment. The adverse effects are the integrated results of multiple mechanisms of action (MOAs), such as overdose, poor pharmacokinetics, unexpected drug‐drug interaction, off‐target interaction, and so on. Beneath these, the MOAs could be driven by abnormality of proteins, disturbance of pathways, dysfunction of organs, and so on. In most cases, the exact mechanisms are hard to determine by current experimental or computational methods. However, genes change their expression at the same time as a biological outcome of drug treatment, so some association should exist between gene expression change and ADR occurrence. On this basis, we bypass the exact MOAs, leaving them as a black box, and instead link gene expression perturbations directly to ADRs via machine learning of the complex drug‐gene‐ADR trilateral relationship.

Data sources and data processing

To build the model, we incorporated a number of relations in the multilayer interaction network from various sources, including drug‐ADR relations, drug‐gene regulations, gene‐gene interactions, ADR concurrence, and so on.

Drug‐ADR relations

We derived the drug‐ADR relations from the Adverse Drug Reaction Classification System (ADReCS version 1.4; http://bioinf.xmu.edu.cn/ADReCS).28 ADReCS is a comprehensive ADR ontology database that offers both standardization and hierarchical classification of ADR terms by integrating information from multiple resources, such as MedDRA,29 WHO‐ART,30 DailyMed, and SIDER2.31 ADReCS version 1.4 covers a total of 6,778 standard ADR terms, 1,378 marketed drugs, and 196,194 nonredundant drug‐ADR pairs from drug labels. However, limited by the availability of known drug‐gene relations, the current model included only 365 drugs and 1,156 ADR high level terms (HLTs). In MedDRA and ADReCS, each HLT represents a group of related ADRs on the basis of anatomy, pathology, physiology, etiology, or function.

Drug‐gene relations

We derived 106,739 literature‐documented human drug‐gene relations from the Comparative Toxicogenomics Database (CTD; http://ctdbase.org).32 The direction and strength of drug‐gene regulations were downloaded when available. As a complementary source, we also extracted drug‐gene relations from the Library of Integrated Network‐based Cellular Signatures (LINCS; http://www.lincsproject.org).33, 34, 35 LINCS was elaborately designed to determine how perturbations such as drug treatment affect gene expression across multiple cell and perturbation types.36 In this study, we obtained normalized gene expression profiles (Z‐scores) from 14 cell line experiments in which cells were treated with drugs at a concentration of 10 μM for 6 hours. These experimental conditions were selected to acquire as many drug‐gene relations as possible under the constraints of data availability. For each drug treatment, genes differentially expressed relative to the untreated control, with moderated Z‐scores ≥ 2 or ≤ −2 in at least two experiments, were taken as reliable signature genes for the drug (i.e., drug‐regulated genes). For modeling, we adopted only the consensus drug‐gene pairs that had the same regulation direction (upregulation or downregulation) in all experiments. To weight the regulation strength of a consensus drug‐gene pair, we took the maximum positive Z‐score across all experiments for upregulation, or the minimum negative Z‐score for downregulation. Ultimately, we obtained 25,274 drug‐gene relations from LINCS. The drug‐gene relations mined from LINCS are more comprehensive than those from the scientific literature (as collected in the CTD) or from individual experiments deposited in ArrayExpress37 and the Gene Expression Omnibus (GEO).38 The inclusion of LINCS data may therefore improve the integrity and reliability of the drug‐gene‐ADR trilateral network.
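The LINCS filtering rules described above (|Z| ≥ 2 in at least two experiments, a consistent regulation direction, and the extreme Z‐score as the weight) can be sketched in Python as follows. This is a minimal sketch, not the authors' pipeline: the input layout (one list of moderated Z‐scores per drug‐gene pair) and the function names are hypothetical, and we assume that direction consistency is checked over the experiments in which the gene passed the cutoff.

```python
from typing import Dict, List, Optional, Tuple

Z_CUTOFF = 2.0  # |moderated Z-score| threshold stated in the text
MIN_HITS = 2    # the gene must pass the cutoff in at least two experiments


def consensus_regulation(z_scores: List[float]) -> Optional[float]:
    """Return a signed regulation weight for one drug-gene pair, or None.

    z_scores holds one moderated Z-score per experiment (cell line).
    Assumption: direction consistency is checked over the experiments in
    which the gene passed the |Z| >= 2 cutoff.
    """
    hits = [z for z in z_scores if abs(z) >= Z_CUTOFF]
    if len(hits) < MIN_HITS:
        return None
    if all(z > 0 for z in hits):
        return max(hits)  # upregulation: maximum positive Z-score
    if all(z < 0 for z in hits):
        return min(hits)  # downregulation: minimum negative Z-score
    return None  # conflicting directions, so not a consensus pair


def drug_gene_relations(
    profiles: Dict[Tuple[str, str], List[float]]
) -> Dict[Tuple[str, str], float]:
    """Keep only consensus (drug, gene) pairs, mapped to their signed weight."""
    relations = {}
    for pair, z_scores in profiles.items():
        weight = consensus_regulation(z_scores)
        if weight is not None:
            relations[pair] = weight
    return relations


# Usage with made-up Z-scores for a hypothetical drug:
profiles = {
    ("drug_x", "GENE1"): [2.5, 3.1, 0.4],    # up in two experiments -> 3.1
    ("drug_x", "GENE2"): [-2.2, 2.6, -3.0],  # mixed directions -> dropped
    ("drug_x", "GENE3"): [2.1, 1.5],         # only one hit -> dropped
}
print(drug_gene_relations(profiles))  # {('drug_x', 'GENE1'): 3.1}
```

Returning None for non‐consensus pairs keeps the filtering and the weighting in one place, so a pair either enters the network with a signed weight or not at all.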

Gene‐gene relations

We obtained quantitative gene‐gene relations from the GeneMANIA Cytoscape plugin.39 GeneMANIA measures gene‐gene relations using a guilt‐by‐association approach over publicly available biological big data, which include multiple molecular interaction networks (protein‐protein, protein‐DNA, genetic interaction, pathway, co‐expression, colocalization, and protein domain similarity) from multiple organisms.40, 41 In this study, we measured gene‐gene relations quantitatively using all Homo sapiens interaction networks and incorporated relations with weight ≥ 0.001 into the model.

ADR‐ADR concurrence

ADR concurrence information was determined from the ADReCS data. We calculated the concurrence rate w_{ij} between a pair of ADR HLTs, A_i (consisting of x preferred terms) and A_j (consisting of y preferred terms), from the following quantities: D_{A_i}, the number of drugs inducing A_i; D_{A_j}, the number of drugs inducing A_j; D_{PT_m}, the number of drugs inducing preferred term PT_m of A_i; D_{PT_n}, the number of drugs inducing preferred term PT_n of A_j; D_{PT_m ∩ PT_n}, the number of drugs inducing both PT_m and PT_n; D_{PT_m ∪ PT_n}, the number of drugs inducing either PT_m or PT_n; and D, the total number of drugs in the model. Only ADR‐ADR pairs with concurrence ≥ 0.01 were used in the model. Eventually, we built the model using 38,761 drug‐ADR relations, 20,867 drug‐gene relations, 19,229 gene‐gene interactions, 6,195 ADR‐ADR pairs, 365 drugs, 8,571 genes, and 1,156 distinct ADR HLTs.

Construction of the machine learning model

The weighted Bayesian model

We denote the three partners of the drug‐gene‐ADR complex network as the drug set D = {D_1, D_2, …, D_m}, the gene set G = {G_1, G_2, …, G_n}, and the ADR set A = {A_1, A_2, …, A_k}. In this trilateral relationship, we assume that a drug in D may induce multiple ADRs in A and, conversely, that an ADR in A may be induced by multiple drugs in D. Likewise, a drug in D may regulate multiple genes in G, and a gene in G may be regulated by multiple drugs in D. In previous work, we constructed an unweighted naïve Bayesian model prototype42 for ADR prediction, assuming that the drug‐gene‐ADR trilateral relations were all independent, unweighted, and directionless. Such an assumption is too idealized; in the real world, the elements of the molecular network interact with one another. Therefore, in this study we substantially improved the model as follows. Consider an ADR induced by the expression change of a single gene G_i (G_i ∈ G). The set of drugs that regulate G_i (i.e., the drug‐gene pairs) is denoted D_{G_i} (D_{G_i} ⊆ D), and the subset of drugs that regulate G_i and thereby lead to ADR A_j (A_j ∈ A) is denoted D_{G_iA_j} (D_{G_iA_j} ⊆ D_{G_i}). The posterior probability of A_j induced by the expression change of G_i alone (ignoring gene‐gene regulation and ADR concurrence), denoted P(A_j|G_i), is calculated from four weights: the regulation strength of each drug in D_{G_iA_j} on G_i (as described in the section above); the frequency of A_j under treatment with that drug; the regulation strength of each drug in D_{G_i} on G_i; and the frequency of any ADR in A under treatment with that drug. In the real world, an ADR can be induced by the expression changes of multiple genes, denoted as the input gene set G* = {G_1, G_2, …, G_t} (G* ⊆ G). Accordingly, the probability of A_j triggered by G*, denoted P(A_j|G*), is calculated by combining the single‐gene probabilities P(A_j|G_i)′ of the genes G_i in G*.
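The paragraph above names four weights for the single‐gene probability but does not show how they are combined. As an illustration only, the Python sketch below assumes one plausible reading: the regulation‐weighted frequency of the ADR among drugs that regulate the gene and induce the ADR, divided by the regulation‐weighted frequency of any ADR among all drugs that regulate the gene. This is not the published equation; the use of absolute Z‐scores as weights, the dictionary layout, and the function name are hypothetical.

```python
from typing import Dict


def single_gene_probability(
    gene: str,
    adr: str,
    regulation: Dict[str, Dict[str, float]],  # drug -> {gene: regulation strength}
    adr_freq: Dict[str, Dict[str, float]],    # drug -> {ADR: observed frequency}
    any_adr_freq: Dict[str, float],           # drug -> frequency of any ADR
) -> float:
    """Hypothetical estimate of P(ADR | gene); not the published equation.

    Numerator: regulation-weighted frequency of `adr` over drugs that
    regulate `gene` and induce `adr`. Denominator: regulation-weighted
    frequency of any ADR over all drugs that regulate `gene`.
    """
    # Drugs that regulate the gene.
    drugs_for_gene = [d for d, genes in regulation.items() if gene in genes]
    # Subset of those drugs that also induce the ADR.
    drugs_for_pair = [d for d in drugs_for_gene if adr in adr_freq.get(d, {})]
    # Absolute values because LINCS-derived strengths are signed Z-scores (assumption).
    numerator = sum(abs(regulation[d][gene]) * adr_freq[d][adr] for d in drugs_for_pair)
    denominator = sum(abs(regulation[d][gene]) * any_adr_freq[d] for d in drugs_for_gene)
    return numerator / denominator if denominator > 0 else 0.0


# Usage with made-up data for three hypothetical drugs:
regulation = {"drug_a": {"G1": 3.0}, "drug_b": {"G1": -2.0}, "drug_c": {"G2": 2.5}}
adr_freq = {"drug_a": {"rash": 0.05}, "drug_b": {"nausea": 0.10}}
any_adr_freq = {"drug_a": 0.30, "drug_b": 0.40, "drug_c": 0.20}
print(round(single_gene_probability("G1", "rash", regulation, adr_freq, any_adr_freq), 4))  # 0.0882
```

A gene that no drug regulates gets probability 0 rather than raising an error, which keeps the downstream multi‐gene combination simple.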

Incorporation of gene‐gene regulation and ADR concurrence

In the real world, genes interact with one another, and ADRs often occur together. We denote the genes that interact with G_i as G_{G_i} (G_{G_i} ⊆ G, G_i ∈ G*, where G* is the input gene set), and the ADRs that occur concurrently with A_j as A_{A_j} (A_{A_j} ⊆ A, A_j ∈ A). When gene‐gene regulation and ADR concurrence are incorporated into the model, the probability of A_j triggered by G*, denoted P(A_j), is calculated from the weight of each G_i‐G_l interaction (G_l ∈ G_{G_i}), the weight of each A_j‐A_q concurrence (A_q ∈ A_{A_j}), and the single‐gene probabilities P(A_j|G_i)′. The probability P(A_j) can also be taken as the estimated frequency, or occurrence rate, of ADR A_j.

Normalization of ADR probability by adjusting the bias caused by drug‐gene regulations

We observed that the estimated frequency was affected by the number of effective input genes. Hence, we performed a retrospective ADR prediction for 324 drugs and analyzed how the detection rate (DR) and false‐positive rate (FPR) changed with the number of input genes, ranging from 1 to 25. The results suggested that three or more effective genes could yield an average DR of ≥ 70% at the estimated frequency threshold of 1%. The more effective input genes, the higher the average DR, but also the higher the FPR (Figure 2). Therefore, to eliminate the bias caused by the number of input genes and, at the same time, to bring the estimated ADR frequency onto the same scale as the observed ADR occurrence rate, we empirically corrected the estimated frequency P(A|G) of an ADR A given the input gene set G to a normalized value, P(A|G)_norm, which depends on t, the number of effective input genes used for ADR assessment.

Figure 2

The model performance by number of drug‐gene regulations. (a) Average detection rates (DRs) and “false‐positive” rates (FPRs) (above the frequency threshold of 1%) as a function of the number of drug‐gene relations used in model prediction. (b) Model performance evaluated by the receiver operating characteristic (ROC) curve on both the internal and external datasets.

Model evaluation

In this study, we adopted the conventional 10‐fold cross‐validation strategy to evaluate model performance on the basis of 365 marketed drugs. All drugs were randomly divided into 10 folds (subsamples) of nearly equal size, each with a similar distribution across 14 Anatomical Therapeutic Chemical (ATC) categories (Supplementary Figure ). Of the 10 folds, 9 were used for model training and 1 was retained as the validation dataset for model testing. The cross‐validation process was repeated 10 times, so that each of the 10 folds was used as the validation data exactly once. We also extracted 18 external trial drugs from the Clinical Trials database (http://clinicaltrials.gov) using two criteria: (i) the trial drugs are recorded in neither the ADReCS database nor the DrugBank database, and (ii) the drugs have at least three documented drug‐gene regulations. Unlike most deterministic methods, which make true/false classifications, the Bayesian model of this study outputs a probability for every ADR in a broad spectrum of 1,156 distinct ADR HLTs. Therefore, we adopted the DR to measure the proportion of known ADRs predicted by the model:

$$\mathrm{DR} = \frac{\mathrm{PKA}}{\mathrm{KA}}$$

where PKA is the number of known ADRs predicted by the model and KA is the number of known ADRs. In addition, we adopted metrics such as accuracy, sensitivity, specificity, the receiver operating characteristic (ROC) curve, and the area under the ROC curve (AUC) for model evaluation.
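A minimal Python sketch of this evaluation set‐up follows: fold assignment that keeps the ATC‐category composition similar across the 10 folds, the train/validation rotation in which each fold validates once, and DR = PKA/KA at a chosen frequency threshold. The text does not say how the balanced folds were drawn, so the round‐robin stratification, the function names, and the input dictionaries are our assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple


def stratified_folds(drug_atc: Dict[str, str], k: int = 10, seed: int = 0) -> List[List[str]]:
    """Split drugs into k folds with similar ATC-category composition.

    Drugs of each category are shuffled and dealt round-robin across folds,
    continuing from where the previous category stopped, so fold sizes differ
    by at most one.
    """
    rng = random.Random(seed)
    by_category: Dict[str, List[str]] = defaultdict(list)
    for drug, atc in drug_atc.items():
        by_category[atc].append(drug)
    folds: List[List[str]] = [[] for _ in range(k)]
    position = 0
    for atc in sorted(by_category):
        drugs = sorted(by_category[atc])
        rng.shuffle(drugs)
        for drug in drugs:
            folds[position % k].append(drug)
            position += 1
    return folds


def cross_validation_splits(folds: List[List[str]]) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (training drugs, validation drugs); each fold validates exactly once."""
    for i, validation in enumerate(folds):
        training = [drug for j, fold in enumerate(folds) if j != i for drug in fold]
        yield training, validation


def detection_rate(predicted: Dict[str, float], known: Set[str], threshold: float = 0.0) -> float:
    """DR = PKA / KA.

    An ADR counts as predicted when its estimated frequency is > 0 and
    >= threshold, matching the "> 0" and ">= 0.1%" settings of Table 1.
    """
    if not known:
        raise ValueError("DR is undefined for a drug without known ADRs")
    called = {adr for adr, freq in predicted.items() if freq > 0 and freq >= threshold}
    return len(called & known) / len(known)


# Usage with made-up predictions for one hypothetical drug:
predicted = {"rash": 0.05, "nausea": 0.0005, "headache": 0.0}
known = {"rash", "nausea", "headache", "fever"}
print(detection_rate(predicted, known))         # 0.5
print(detection_rate(predicted, known, 0.001))  # 0.25
```

Because the model outputs a probability for every ADR, the threshold is what turns the output into a set of "predicted" ADRs; raising it trades detection rate for fewer newly predicted ADRs, as Table 1 shows.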

The ToxicityScore for summarizing drug safety

In this study, we introduced a new parameter, the ToxicityScore, to summarize the toxicity effects of a drug. This parameter evaluates overall drug safety by integrating information on both the ADR occurrence rate (here, the estimated ADR frequency) and ADR severity. The ToxicityScore overcomes a shortcoming of current methods for drug safety assessment in early drug discovery, which rely on a few preset toxicity features, such as hepatic and kidney toxicity. Beforehand, we assigned ADRs to five severity grades (mild, moderate, severe, life‐threatening, and death) according to the Common Terminology Criteria for Adverse Events (CTCAE) version 4.0.43, 44 The CTCAE is a predominant system for describing the severity of adverse events commonly encountered in clinical trials. Here, we manually determined the severity grade of each ADR HLT used in the model according to the CTCAE guideline. Because each HLT may consist of multiple preferred terms with different severity grades, we assigned each HLT the lowest grade of its most frequent preferred term (the one induced by the most drugs). Examples of ADR severity assignment are given in Table . To differentiate the grades quantitatively, we further assigned each ADR a severity score S of 1, 10, 100, 1,000, or 10,000 for grades 1 to 5, respectively. The ToxicityScore of a drug is then computed from, for each ADR A_i in the drug’s ADR list A = {A_1, A_2, …, A_n}, its estimated or observed occurrence rate F_i and its severity score S_i. The estimated frequency F_i comes from the model prediction.
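The text gives the severity scores (1 to 10,000 for grades 1 to 5) and names the inputs, occurrence rate F and severity score S, but the combining equation is not shown here. The Python sketch below assumes the simplest form consistent with those definitions, a frequency‐weighted sum of severity scores over the drug's ADRs, with frequencies expressed as fractions and an optional frequency threshold. Treat the formula, its scaling, and the function names as hypothetical.

```python
from typing import Dict

# Severity score per CTCAE grade as stated in the text: grades 1-5 -> 1, 10, 100, 1,000, 10,000.
SEVERITY_SCORE = {grade: 10 ** (grade - 1) for grade in range(1, 6)}


def toxicity_score(
    adr_frequency: Dict[str, float],  # ADR -> estimated or observed rate (fraction)
    adr_grade: Dict[str, int],        # ADR -> CTCAE grade (1-5)
    threshold: float = 0.0,
) -> float:
    """Assumed form: sum of frequency x severity score over the drug's ADRs.

    ADRs with frequency 0 or below `threshold` are skipped.
    """
    score = 0.0
    for adr, freq in adr_frequency.items():
        if freq > 0 and freq >= threshold:
            score += freq * SEVERITY_SCORE[adr_grade[adr]]
    return score


# Usage with made-up values (fractions, so 0.10 means 10%):
freqs = {"adr_x": 0.10, "adr_y": 0.0089, "adr_z": 0.0005}
grades = {"adr_x": 3, "adr_y": 4, "adr_z": 1}
print(round(toxicity_score(freqs, grades, threshold=0.001), 2))  # 18.9
```

The tenfold step between grades means a single rare but life‐threatening ADR can outweigh many common mild ones, which is the intent of weighting by severity.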

RESULTS

The performance of the advanced machine learning model

Following conventional practice, we evaluated model performance using both the internal and the external datasets. Internal 10‐fold cross‐validation yielded an average DR of 91.70% when all predicted ADRs were included (frequency threshold > 0; Table 1). When the ADR frequency threshold was set to 0.1%, the threshold for “less common” ADRs defined by Wooten,45 the average DR dropped to 75.10%. This drop resulted from excluding rare known ADRs with occurrence rates < 0.1%. At the same time, the threshold eliminated about 65.26% of potential false positives (newly predicted ADRs). The independent test on 18 external real‐world trial drugs extracted from the Clinical Trials database yielded an average DR of 98.22% with all predictions, or 84.38% at an estimated occurrence rate ≥ 0.1%. In addition, we evaluated model performance using several metrics often adopted for deterministic models. At the frequency threshold of 0.1%, the model achieved an accuracy of 81.11% and 75.59% for the internal and external datasets, respectively, with corresponding AUC values of 78.91% and 79.85% (Table 1). Note that these metrics may be underestimated because rare ADRs (estimated frequency < 0.1%), which do occur in practice, were excluded. Overall, both the internal and external evaluations support the robustness of the model in ADR prediction.

Table 1

Performance of the advanced machine learning model

Metric	Internal validation (n = 365), frequency > 0	Internal validation (n = 365), frequency ≥ 0.1%	External validation (n = 18), frequency > 0	External validation (n = 18), frequency ≥ 0.1%
Accuracy	52.54%	81.11%	34.50%	75.59%
Sensitivity	90.00%	70.62%	96.41%	76.44%
Specificity	48.78%	82.17%	25.98%	75.47%
AUC	82.71%	78.91%	82.83%	79.85%
Average DR	91.70%	75.10%	98.22%	84.38%
Average FPR	51.04%	17.56%	73.76%	23.99%

AUC, area under the curve; DR, detection rate; FPR, false‐positive rate.


The web service of ADRAlert‐gene for rapid drug safety profiling

For user convenience, we deployed the well‐established model as a web service, ADRAlert‐gene, at http://www.bio-add.org/ADRAlert/ or its mirror site at http://bioinf.xmu.edu.cn/ADRAlert/gene. We built the server on a Linux + Tomcat architecture and developed interactive user interfaces using JavaScript. ADRAlert‐gene provides not only de novo assessment of drug safety but also quantitative measures of gene‐ADR relations. Details of accessing ADRAlert‐gene are described in the Supplementary Information.

ToxicityScore is a suitable parameter for evaluating overall drug safety quantitatively

In this study, we introduced a novel parameter, the ToxicityScore, to summarize overall drug safety across a broad spectrum of 1,156 distinct ADR HLTs. To evaluate how reliably the ToxicityScore represents overall drug safety, we performed statistical analyses on the ToxicityScores of 432 selected drugs (20 over‐the‐counter drugs and 412 prescription drugs) covering 1,058 distinct known ADRs. The average ToxicityScores of over‐the‐counter drugs for their known ADRs, determined by the observed ADR occurrence rate and by the estimated ADR frequency, were 45.19 and 56.41, respectively (Figure 3b,d). These values were lower than those of the prescription drugs, 60.91 and 65.50, respectively (Figure 3b,d). We also observed a significant difference in the value ranges of ToxicityScores among drugs of different ATC categories (Figure 3a,c). The ATC classification system was formulated by the WHO to divide active substances into categories according to the organ system on which they act and their therapeutic, pharmacological, and chemical properties. For instance, antineoplastic and immunomodulating agents had higher ToxicityScores (104.57 on average, determined by the observed ADR occurrence) than drugs in the alimentary tract and metabolism category (21.15 on average). This finding indicates that drug candidates with relatively high ToxicityScores may still have the chance to enter the market if they meet the needs of critically ill patients. Moreover, when we compared the ToxicityScores of the 432 selected drugs by ATC category, the scores determined by the estimated and observed ADR frequencies were well correlated (R = 0.94, P < 10⁻⁶; Figure 3a,c). This result partially supports the feasibility and reliability of the ToxicityScore in the evaluation of drug safety.
Therefore, the ToxicityScores of currently marketed drugs can serve as a suitable reference for selecting “safe” drug candidates according to their indications in the early drug discovery stage.

Figure 3

Statistics of ToxicityScores by drug type. ToxicityScores of different Anatomical Therapeutic Chemical (ATC) drug types, determined by the observed occurrence rate (a) and the estimated occurrence rate (c) of known adverse drug reactions (ADRs). ToxicityScores of 20 over‐the‐counter drugs and 412 prescription drugs, determined by the observed occurrence rate (b) and the estimated occurrence rate (d) of known ADRs. The colored box represents 75% of the data, the black line the median value, and the asterisk the mean value.

Example applications

Example 1: Identification of novel ADRs for known drugs

Lansoprazole is commonly used to inhibit gastric acid production. Searching “lansoprazole” via the “From Drug” view returns 8 signature genes from the LINCS project33, 34, 35 and 32 lansoprazole‐interacting genes from the CTD.32 Based on these regulated genes, the server predicted 368 ADRs for lansoprazole at the frequency threshold of 0.1%. The 368 predicted ADRs cover all 329 ADRs documented in the drug label in the ADReCS database, giving a DR of 100%. Notably, most of the 329 known ADRs have higher estimated frequencies than the remaining 39 novel ADRs. Of these 39 novel ADRs, 15, including heart failures, mental disorders, and non‐site‐specific injuries, were found in recent ADR records of lansoprazole treatment in the Side Effect Resource (SIDER 4.1),31 DailyMed (updated by February 2017), and the ClinicalTrials.gov database (by January 2019).

Example 2: De novo ADR assessment

Levosalbutamol is a short‐acting β2 adrenergic receptor agonist used in the treatment of asthma and chronic obstructive pulmonary disease. As of October 2019, levosalbutamol had been tested against asthma in phase III and IV clinical trials. As documented in the Clinical Trials database, levosalbutamol might cause 71 different ADRs across 217 studies covering a few hundred selected patients. In addition, mining of the US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS)46 identified 114 potential levosalbutamol‐ADR associations in 306 reports. Using seven levosalbutamol‐regulated genes (RFC2, LOXL1, PCSK1N, BID, HOXA10, LSR, and HSPA4L) extracted from LINCS,33, 34, 35 ADRAlert‐gene predicted a total of 225 ADRs via the “From Gene” view at the frequency threshold of 0.1%. Of the 71 observed ADRs, 36 (about 50.70%) were predicted by the server with comparatively high frequency (3.84% on average). Of the 114 possible levosalbutamol‐induced ADRs mined from FAERS, 72 (about 63.16%) were predicted by the server. The remaining 138 predicted ADRs were either novel ADRs not yet reported because of the limitations of trial data, or potential false positives. In practice, the clinical safety of a new molecular entity can be evaluated simply by comparing its calculated ToxicityScore against those of marketed drugs, especially drugs with the same indication (e.g., the same ATC category; Figure 3). If the new molecular entity has a substantially higher ToxicityScore, particularly one involving severe ADRs with estimated frequency ≥ 0.1%, further clinical trials should be conducted prudently. In this example, the server evaluated levosalbutamol with a ToxicityScore of 60.870 at the frequency threshold of 0.1%, much higher than the average of 18.78 for marketed drugs in the Respiratory System category.
Two common (estimated frequency ≥ 10%) severe ADRs (“cardiac signs and symptoms” and “circulatory collapse and shock”) and one less common (estimated frequency about 0.89%) life‐threatening ADR, “sepsis,” may account for the high ToxicityScore. All three severe ADRs have been reported in the Clinical Trials database and FAERS. Therefore, more attention should be paid to levosalbutamol safety in future clinical trials.
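The recovery percentages and the screening rule above can be reproduced with a short Python sketch. The ToxicityScore is taken as a given input rather than recomputed. The encoding of the rule is a hypothetical illustration: the text does not quantify “substantially higher,” so the `ratio` cut‐off of 2 is an assumption, as is reading the rule as requiring both a high score and at least one frequent severe ADR; the `Prediction` type, its `severe` flag, and the function names are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    adr: str
    frequency: float  # estimated frequency as a fraction (0.001 = 0.1%)
    severe: bool      # hypothetical severity flag supplied by the caller


def percent_recovered(hits: int, total: int) -> float:
    """Percentage of observed ADRs that were also predicted."""
    return 100.0 * hits / total


def needs_caution(score: float, class_mean: float, preds: List[Prediction],
                  ratio: float = 2.0, threshold: float = 0.001) -> bool:
    """Flag a compound for cautious follow-up.

    ASSUMPTION: "substantially higher" is encoded as score >= ratio * class_mean,
    and the compound must also have a severe ADR with frequency >= threshold.
    """
    much_higher = score >= ratio * class_mean
    has_frequent_severe = any(p.severe and p.frequency >= threshold for p in preds)
    return much_higher and has_frequent_severe


if __name__ == "__main__":
    print(f"{percent_recovered(36, 71):.2f}%")   # 50.70%
    print(f"{percent_recovered(72, 114):.2f}%")  # 63.16%
    sepsis = Prediction("sepsis", 0.0089, severe=True)
    print(needs_caution(60.870, 18.78, [sepsis]))  # True
```

With the numbers above, 60.870 exceeds twice the class mean of 18.78, and “sepsis” (about 0.89%, life‐threatening) passes the 0.1% threshold, so the sketch flags levosalbutamol.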

Example 3: Mechanistic understanding of ADRs by network analysis of gene‐ADR associations

Allergic conditions (ACs; ADReCS ID: 10.01.03) is a group of ADRs that includes allergic reactions, hypersensitivity, asthma, Stevens‐Johnson syndrome, and others. To date, the molecular mechanisms underlying ACs have not been fully explored. By searching “Allergic conditions” or “10.01.03” via the “Gene‐ADR From ADR” view, we extracted 8,571 AC‐associated genes (3,852 upregulated, 757 downregulated, and 3,962 both upregulated and downregulated). We selected 563 comparatively strong AC‐associated genes (association strength ≥ 0.025) for further mechanistic study; of these, 424 were upregulated and 139 were downregulated. We mapped these 563 genes against the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database,47, 48 which identified 209 AC‐associated pathways. Among them, we designated 34 major AC‐associated pathways (each containing at least six AC‐associated genes), of which about one‐third (12 pathways) are signal transduction or immune system pathways (Figure 4a). The 209 AC‐associated pathways contain a total of 225 AC‐associated genes. Eight of these (ALOX5, DNMT1, HLA‐B, IFNG, IL15, ITGB2, KNG1, and TBXA2R) have literature evidence from the CTD32 and the Online Mendelian Inheritance in Man (OMIM)49 supporting their association with allergic symptoms. On the basis of the 225 KEGG‐mapped genes, we constructed a gene‐gene interaction network (Figure 4b) and identified 5 hub genes (CXCL10, HUWE1, ITGB2, RPS23, and SELL) that met two criteria: a connection degree ≥ 8 and adjacency to two or more known AC‐related genes. These hub genes are potential major players in drug‐induced allergic reactions.
Among them, ITGB2 was previously reported as a biomarker for monitoring dysregulated allergic responses.50 CXCL10 showed significantly higher concentrations in allergic patients than in healthy subjects.51 The HUWE1 protein may function as an E3 ubiquitin ligase involved in modulating allergic responses.52
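The selection steps above (an association‐strength cut‐off, a minimum gene count per pathway, and hub genes defined by degree and known‐gene neighbours) can be sketched in plain Python. The 0.025, six‐gene, degree‐8 and two‐neighbour thresholds come from the text and the edge‐weight cut‐off of 0.001 from the Figure 4 legend; the function names, the dictionary inputs, and the toy genes are hypothetical. The published network was built with the GeneMANIA Cytoscape plugin, not with this code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def strong_genes(strength: Dict[str, float], cutoff: float = 0.025) -> Set[str]:
    """Genes whose gene-ADR association strength is >= cutoff."""
    return {g for g, s in strength.items() if s >= cutoff}


def major_pathways(pathway_members: Dict[str, Set[str]], genes: Set[str],
                   min_genes: int = 6) -> Set[str]:
    """Pathways containing at least min_genes of the selected genes."""
    return {p for p, members in pathway_members.items() if len(members & genes) >= min_genes}


def build_graph(edges: Iterable[Tuple[str, str, float]],
                min_weight: float = 0.001) -> Dict[str, Set[str]]:
    """Undirected adjacency sets, keeping only edges with weight > min_weight."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b, w in edges:
        if w > min_weight and a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def hub_genes(graph: Dict[str, Set[str]], known: Set[str],
              min_degree: int = 8, min_known: int = 2) -> Set[str]:
    """Genes with degree >= min_degree that neighbour >= min_known known genes."""
    return {g for g, nbrs in graph.items()
            if len(nbrs) >= min_degree and len(nbrs & known) >= min_known}


if __name__ == "__main__":
    # Hypothetical toy network: one gene linked to eight others, two of them known markers.
    edges = [("HUB", f"G{i}", 0.01) for i in range(8)]
    edges.append(("G0", "G1", 0.0005))  # dropped: weight not > 0.001
    graph = build_graph(edges)
    print(hub_genes(graph, known={"G0", "G1"}))  # {'HUB'}
    print(sorted(strong_genes({"A": 0.03, "B": 0.01})))  # ['A']
```

Adjacency sets keep the known‐neighbour test a single set intersection; a graph library such as NetworkX would serve equally well for larger networks.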

Figure 4

Mechanistic understanding of allergic conditions (ACs; ADReCS ID: 10.01.03). (a) The KEGG pathways associated with ACs; the number of genes mapped into each pathway is given in parentheses. (b) The AC‐associated gene interaction network, constructed with the GeneMANIA Cytoscape plugin. Green circles denote the AC‐associated genes identified in this study, and orange circles denote known gene markers of ACs. Node size is proportional to the connectivity degree of the node, and edge width represents the weight of the gene‐gene interaction. The network consists of 89 genes and 127 gene‐gene interactions with weight > 0.001.

DISCUSSION

This work introduces an advanced machine learning model and its web service, ADRAlert‐gene, for rapid drug safety assessment by learning a complex drug‐gene‐ADR network. Compared with prior methods, it has several advantages. (i) To our knowledge, ADRAlert‐gene is the only method that provides broad‐spectrum ADR profiling, allowing drug safety to be reviewed across a landscape of 1,156 distinct ADRs. (ii) Unlike many target‐based models that rely heavily on a complete chemical‐protein interaction profile for reliable ADR prediction, ADRAlert‐gene requires only a representative drug‐gene regulation profile (i.e., genes significantly differentially expressed upon drug treatment), which is easier to obtain with current transcriptome technologies. (iii) ADRAlert‐gene is up to date and is the first tool to provide comprehensive information beyond simple ADR prediction. The ToxicityScore quantitatively integrates ADR number, ADR frequency, and ADR severity to measure overall drug safety; this matters because high‐incidence ADRs and SADRs usually receive the most attention in the clinic and in new drug discovery. (iv) Finally, ADRAlert‐gene offers a new way to reveal major gene players in ADRs by quantifying gene‐ADR associations genome‐wide. Such a quantitative gene‐ADR association profile is hard to obtain with conventional molecular technologies. We also acknowledge several limitations of the model. First, the current version of ADRAlert‐gene was partially built on drug‐gene regulations derived from the LINCS project. The experimental design and data quality of the LINCS project are themselves debatable, because it was not designed specifically for drug toxicity research. In particular, most cells used in the LINCS project are tumor cell lines rather than the primary normal cells in which most ADRs may occur.
Second, drug‐treated transcriptomes measured on various cell lines are likely to differ from those of individual patients, even though recent research suggested that transcriptomes of human cell lines can faithfully reflect human responses.53 Third, drug‐gene relations mined from heterogeneous cell transcriptomes are not always consistent, a problem that awaits improved data analysis and new technologies. Finally, the model was developed under a monotherapy assumption. In reality, most patients are on polypharmacy regimens and may experience ADRs caused by drug‐drug interactions. Drug‐drug interactions are so complex that they are hard to measure quantitatively: multiple factors may contribute to the resulting ADRs, including the combination of drug dosages, the order and interval of polypharmacy treatment, and the administration routes of the drugs. Therefore, the current version of the model does not yet integrate drug‐drug interaction information into drug safety evaluation. Further work is needed to improve the model. Nevertheless, ADRAlert‐gene can serve as a powerful, practical, and economical tool for drug safety profiling. It helps reduce the attrition rate of new drug discovery by offering a reliable ADR profile at the early preclinical stage. It also provides a shortcut to a network‐level understanding of the molecular mechanisms underlying ADRs. In particular, identifying ADR‐associated genes allows the potential molecular causes of an ADR to be targeted directly. This could accelerate precision medicine, because genetic testing could then quickly and directly build pharmacogenetic profiles of different patients’ responses to the same drug. Therefore, we believe that ADRAlert‐gene will benefit both the clinical pharmacology and toxicology communities.
It will be especially useful for drug design, studies of ADR mechanisms, and individualized drug therapy.

Funding

This work was supported by the National Natural Science Foundation of China (grant numbers 31671362 and 31271405).

Conflicts of Interest

All authors declared no competing interests for this work.

Author Contributions

K.L., L.X.Y., R.F.D., and Z.L.J. wrote the manuscript. Z.L.J. designed and supervised the experiment. K.L., H.X., Y.M.Q., Q.S.H., R.F.D., P.Y., F.D., Y.Z., and Y.P.X. performed the research. K.L. analyzed the data.

49 in total

1. Factors associated with prolonged hospital stay in a geriatric ward of a university hospital in Japan.

Authors: Taro Kojima; Masahiro Akishita; Yumi Kameyama; Kiyoshi Yamaguchi; Hiroshi Yamamoto; Masato Eto; Yasuyoshi Ouchi
Journal: J Am Geriatr Soc Date: 2012-06

2. Ten-year trends in hospital admissions for adverse drug reactions in England 1999-2009.

Authors: Tai-Yin Wu; Min-Hua Jen; Alex Bottle; Mariam Molokhia; Paul Aylin; Derek Bell; Azeem Majeed
Journal: J R Soc Med Date: 2010-06

3. Rapid Assessment of Adverse Drug Reactions by Statistical Solution of Gene Association Network.

Authors: Yan-Ping Xiang; Ke Liu; Xian-Ying Cheng; Cheng Cheng; Fang Gong; Jian-Bo Pan; Zhi-Liang Ji
Journal: IEEE/ACM Trans Comput Biol Bioinform Date: 2015 Jul-Aug

4. International Commission for Protection Against Environmental Mutagens and Carcinogens. Use of SAR in computer-assisted prediction of carcinogenicity and mutagenicity of chemicals by the TOPKAT program.

Authors: K Enslein; V K Gombar; B W Blake
Journal: Mutat Res Date: 1994-02-01

5. Predicting adverse drug reaction profiles by integrating protein interaction networks with drug structures.

Authors: Liang-Chin Huang; Xiaogang Wu; Jake Y Chen
Journal: Proteomics Date: 2013-01

6. Patient-reported outcomes and the evolution of adverse event reporting in oncology.

Authors: Andy Trotti; A Dimitrios Colevas; Ann Setser; Ethan Basch
Journal: J Clin Oncol Date: 2007-11-10

7. The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function.

Authors: David Warde-Farley; Sylva L Donaldson; Ovi Comes; Khalid Zuberi; Rashad Badrawi; Pauline Chao; Max Franz; Chris Grouios; Farzana Kazi; Christian Tannus Lopes; Anson Maitland; Sara Mostafavi; Jason Montojo; Quentin Shao; George Wright; Gary D Bader; Quaid Morris
Journal: Nucleic Acids Res Date: 2010-07

8. Differential requirement for CD18 in T-helper effector homing.

Authors: Seung-Hyo Lee; Joseph E Prince; Muhammad Rais; Farrah Kheradmand; Felix Shardonofsky; Huifang Lu; Arthur L Beaudet; C Wayne Smith; Lynn Soong; David B Corry
Journal: Nat Med Date: 2003-09-14

9. GeneMANIA prediction server 2013 update.

Authors: Khalid Zuberi; Max Franz; Harold Rodriguez; Jason Montojo; Christian Tannus Lopes; Gary D Bader; Quaid Morris
Journal: Nucleic Acids Res Date: 2013-07

10. ArrayExpress update--simplifying data submissions.

Authors: Nikolay Kolesnikov; Emma Hastings; Maria Keays; Olga Melnichuk; Y Amy Tang; Eleanor Williams; Miroslaw Dylag; Natalja Kurbatova; Marco Brandizi; Tony Burdett; Karyn Megy; Ekaterina Pilicheva; Gabriella Rustici; Andrew Tikhonov; Helen Parkinson; Robert Petryszak; Ugis Sarkans; Alvis Brazma
Journal: Nucleic Acids Res Date: 2014-10-31