Predicting Human Clinical Outcomes Using Mouse Multi-Organ Transcriptome.

Satoshi Kozawa¹, Fumihiko Sagawa¹, Satsuki Endo¹, Glicia Maria De Almeida², Yuto Mitsuishi², Thomas N Sato³.

Abstract

Approximately 90% of pre-clinically validated drugs fail in clinical trials owing to unanticipated clinical outcomes, costing over several hundred million US dollars per drug. Despite such critical importance, translating pre-clinical data to clinical outcomes remains a major challenge. Herein, we designed a modality-independent and unbiased approach to predict clinical outcomes of drugs. The approach exploits their multi-organ transcriptome patterns induced in mice and a unique mouse-transcriptome database "humanized" by machine learning algorithms and human clinical outcome datasets. The cross-validation with small-molecule, antibody, and peptide drugs shows effective and efficient identification of the previously known outcomes of 5,519 adverse events and 11,312 therapeutic indications. In addition, the approach is adaptable to deducing potential molecular mechanisms underlying these outcomes. Furthermore, the approach identifies previously unsuspected repositioning targets. These results, together with the fact that it requires no prior structural or mechanistic information of drugs, illustrate its versatile applications in the drug development process.

Keywords: Biocomputational Method; Bioinformatics; Biological Sciences; Computational Bioinformatics; Pharmacoinformatics

Year: 2020 PMID: 31928967 PMCID: PMC7033637 DOI: 10.1016/j.isci.2019.100791

Source DB: PubMed Journal: iScience ISSN: 2589-0042

Introduction

Unexpected adverse events (AEs) and/or a lack of the expected efficacy in human subjects cause drug development to fail. The average success rate of drug candidates through clinical trials is 13.8% (Wong et al., 2019). This low success rate costs drug developers (e.g., pharmaceutical and biotech companies) US$161M–1.8B per drug candidate (Morgan et al., 2011), leading to drug price hikes and rising medical costs. Hence, effective prediction of clinical outcomes from pre-clinical studies would improve the success rate of drug development and reduce drug prices and medical costs. A major impediment in drug development is the unpredictability of human outcomes on the basis of pre-clinical results obtained using cells, organs, and/or animal models. To overcome this problem, numerous technological solutions, both experimental and computational, are currently being pursued. The use of human cells and organs as in vitro and in vivo models is among such approaches. Human cells such as human induced pluripotent stem cells (human iPSCs) (Elitt et al., 2018, Ko and Gelb, 2014, Meseguer-Ripolles et al., 2018) and organ(s)-on-a-chip consisting of human cells (Oleaga et al., 2016, Rezaei Kolahchi et al., 2016) are currently utilized in drug screening and in vitro toxicity assays. As in vivo models, partially humanized mouse models, such as those in which the liver is nearly 100% composed of human cells, are exploited (Tateno et al., 2004). In addition to such experimental approaches, computational tools have also been developed and used. In particular, applications of machine learning algorithms for predicting clinical outcomes are in fashion (Shah et al., 2019, Vamathevan et al., 2019). They exploit big datasets representing structural and functional features of drugs and their target information. Although both the experimental and computational approaches have shown some success and promise, these existing approaches have certain limitations.
The existing machine learning approaches require prior knowledge about the characteristics (such as structure) of drugs and their mechanisms of action (such as target molecules). In addition, many of the computational approaches are specialized for drugs of a specific modality such as small-molecule compounds. Hence, they have difficulty dealing with mixtures of compounds (i.e., mixed structures and mechanisms). The existing experimental approaches are often “biased,” as the assay systems are designed according to what the testers want to examine. For example, the testers need to know which organ(s) or phenotype(s) to examine for the drug effect(s) prior to designing the experiment(s). Furthermore, these presently available experimental and computational approaches fail to recapitulate the organismal-level, body-wide drug effects in humans. Hence, an approach that recapitulates organismal biology but does not require any prior knowledge about the drug structure or mechanisms (i.e., modality-independent and unbiased) is expected to advance and facilitate the drug development process. At the whole-body level, drugs could act not only on one or a few specific organs but also on multiple organs, showing complex effects (Berger and Iyengar, 2011). Drugs could also act on one organ that influences the functions of other organs via inter-organ cross talk (Droujinine and Perrimon, 2013, Droujinine and Perrimon, 2016). Hence, measuring the drug effects on a large number of organs may provide useful features that could be exploited in predicting the drug effects at the whole-body level. However, such an assay is possible only with animal models, not with human subjects. We previously conducted multi-organ multi-model transcriptome studies and identified model-specific and organ-specific transcriptome patterns in mice (Kozawa et al., 2018).
Furthermore, weighted correlation network analysis (WGCNA) of gene expression across multiple organs identified a number of putative organ-to-organ cross talks, some of which we genetically validated (Kozawa et al., 2018). On the basis of this previous study, we considered that such multi-organ transcriptome patterns could be exploited as unique features of drugs at the whole-body level to predict their clinical outcomes. Furthermore, if successful, the approach would provide a modality-independent and unbiased system requiring no prior knowledge of the structures/mechanisms of drugs, as the only requirement is that the drugs can be administered to mice. In the current study, we develop such a system and show its performance. We design a hybrid approach of experimental and machine-learning methods that utilizes the multi-organ transcriptome patterns induced by drugs in mice to predict their clinical outcomes in humans. We evaluate its prediction performance for both AEs and therapeutic indications by cross-validation. In addition, we test whether the approach could be adapted to deducing molecular and cellular mechanisms underlying such predictions. Furthermore, we illustrate the possibility of applying this approach to identifying drug repositioning targets.

Results

“Humanizing” Mouse Datasets

The concept of our approach is described in Figure 1. First, we generate transcriptome data across 24 organs from mice to which we administer each drug with known clinical outcomes. Next, for each human clinical criterion (e.g., AE, therapeutic indication), we train a machine learning model with the 24-organ transcriptome data induced by the drug in mice and its known outcomes in humans. The outcome data are classified according to the demographic profiles (e.g., sex, age) of the individuals. Consequently, the drug-induced 24-organ mouse transcriptome patterns are associated with the individualized human clinical outcomes in the machine learning model/database, hence referred to as “humanized Mouse-DataBase, individualized (hMDB-i).” To predict the clinical outcomes of a new drug candidate X, we generate the 24-organ transcriptome data from mice to which X is administered. These data are used as the input to hMDB-i to predict X's putative individualized clinical outcomes (i.e., the output).

Figure 1

Conceptual Framework of hMDB-i

See text for the description.

AEs Prediction by Support Vector Machine Algorithm

We generated the 24-organ transcriptome datasets with 15 drugs of various modalities (i.e., small molecules, antibody, and peptide) with diverse known therapeutic indications (see Transparent Methods, Figure 1 and Table S1). The incidence of AEs reported for each drug in each sex and each age group was compiled from the US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS). For each AE (a total of 5,519 AEs are reported for one or more of the 15 drugs), we trained a support vector machine (SVM) algorithm with the drug-induced 24-organ transcriptome patterns to predict the individualized outcomes. We performed cross-validation to evaluate the effectiveness of this approach in predicting the outcomes of the AEs (see Transparent Methods, Figure 2A). We first trained an SVM algorithm with the 24-organ mouse transcriptome data of 14 drugs. We then input the 24-organ mouse transcriptome data of the omitted drug as “unseen drug data” (see Transparent Methods, Figure 2A). We performed such evaluations for all 15 drugs by omitting one drug at a time from the training data. We used an SVM classifier to predict whether each AE would be observed for each drug in each sex and age group (see Transparent Methods). The accuracy, precision, recall, and F-measure scores of the AE predictions for each sex and age group for each drug are summarized in Figure 2B and Table S2.
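The hold-out scheme described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes scikit-learn's `SVC`, a single flattened feature vector per drug, a linear kernel, and synthetic data. The feature count, the labels, and the helper name `leave_one_drug_out` are placeholders introduced here.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-ins (assumptions): 15 drugs, each summarized by one
# flattened 24-organ feature vector; 200 features is an arbitrary placeholder.
rng = np.random.default_rng(0)
n_drugs, n_features = 15, 200
X = rng.normal(size=(n_drugs, n_features))
# Hypothetical labels for ONE adverse event in ONE sex/age group:
# 1 = reported for the drug, 0 = not reported.
y = np.array([1, 0] * 7 + [1])


def leave_one_drug_out(X, y):
    """Predict each drug's label from a model trained on the other drugs."""
    predictions = np.empty_like(y)
    for held_out in range(len(y)):
        train = np.arange(len(y)) != held_out
        if np.unique(y[train]).size < 2:
            # An SVM classifier needs both classes in training; fall back to
            # the only label seen (a choice made for this sketch only).
            predictions[held_out] = y[train][0]
            continue
        model = SVC(kernel="linear")  # kernel choice is an assumption
        model.fit(X[train], y[train])
        predictions[held_out] = model.predict(X[held_out:held_out + 1])[0]
    return predictions


predicted = leave_one_drug_out(X, y)
print(predicted)
```

In the study, one such model would be trained per AE and per sex/age group; the sketch covers a single AE/group combination.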

Figure 2

Summary of the AE Prediction Results by SVM/hMDB-i

(A) Hold-out scheme.

(B) Confusion matrix summarizing the AE prediction results for each drug. Scores of accuracy (Accuracy), precision (Precision), recall (Recall), and F-measure (F measure) for each drug (Drug) are indicated by bars (light blue) for each age group (Age) and sex (Sex). The scale of the scores (0.0–1.0) is indicated at the bottom. The black filled boxes across all scores indicate that no AEs are reported for this drug in this sex/age group. When the denominator is 0 in the F-measure calculation, the score is incalculable and hence shown as a black filled box.

(C) The prediction results of death event for each drug (Drug) are summarized for each age group (Age) and sex (Sex) using FAERS. The reported outcomes (Reported) are shown as orange and light-blue filled boxes indicating death event is reported and not-reported, respectively. The prediction results (Predicted) are shown as orange and light-blue filled boxes indicating death event is predicted to occur and not-to-occur, respectively. The age groups are 10: 10–19 years old, 20: 20–29 years old, 30: 30–39 years old, 40: 40–49 years old, 50: 50–59 years old, 60: 60–69 years old, 70: 70–79 years old, 80: 80–89 years old, 90: 90–99 years old, All: 10–99 years old. F, females; M, males; APAP, Acetaminophen; Dox, doxycycline; EMPA, empagliflozin; Repatha, evolocumab. The raw data are available as Table S2.

(D) The confusion matrix showing the prediction result using SIDER4.1 reference database. The raw data are available as Table S3.

The result shows that the accuracy scores are over 0.7 for the 200 of 264 sex/age-groups across the 15 drugs and over 0.9 for the 52 sex/age-groups. This indicates that more than 70% of the AEs predicted to either appear or not appear for the drug are indeed reported or not reported, respectively, in these sex/age-groups.
The precision scores are over 0.5 for 156 of the 264 sex/age groups, indicating that more than 50% of the AEs predicted to appear for the drug are indeed reported in these sex/age groups. If the question is whether an AE predicted to appear for a drug is indeed reported in at least one of the sex/age groups, 13 drugs (alendronate, acetaminophen, aripiprazole, cisplatin, clozapine, doxycycline, empagliflozin, lenalidomide, olanzapine, evolocumab [Repatha], risedronate, sofosbuvir, teriparatide) show precision scores of over 0.5 (see the “All” row for each drug in Figure 2B), indicating that more than 50% of the AEs predicted to appear for these drugs are indeed reported in some sex/age groups. Furthermore, for nine drugs (alendronate, acetaminophen, aripiprazole, cisplatin, clozapine, doxycycline, lenalidomide, olanzapine, teriparatide), precision scores of 0.9–1.0 are found in at least one of the sex/age groups. The recall score indicates how many of the reported AEs are indeed predicted for the drug in each sex/age group. The recall scores are relatively low across all drugs and sex/age groups, indicating that the approach misses many of the reported AEs. However, for some drugs (asenapine and lurasidone), recall scores of 0.7–1.0 are found for certain sex/age groups. Sample size differences among sex/age groups and the number of reports for each AE may significantly affect the accuracy, precision, and recall scores. In fact, the number of reported AEs for asenapine and lurasidone is significantly smaller than that for the other drugs, and their precision is lower. In FAERS, many of the AEs (e.g., diarrhea, abdominal pain) are common among many drugs. The outcomes of rare but serious AEs (SAEs) are often more important to know in advance of clinical trials. Hence, we assessed the prediction performance on SAEs. The top of such an SAE hierarchy is “death event” (Sonawane et al., 2018).
The result shows that the correct outcomes of death event are predicted for 148 of 204 sex/age groups across the 15 drugs (Figure 2C). Death events are reported in a total of 141 sex/age groups, and our prediction approach missed their incidence in only 12 groups (Figure 2C). To examine potential influences of reference-specific errors, biases, and/or confounders that are often found in real-world data such as FAERS (Harpaz et al., 2013, Tatonetti et al., 2012), we validated our prediction with another reference, Side Effect Resource (SIDER) (http://sideeffects.embl.de). Several differences between FAERS and SIDER must be noted. SIDER4.1 does not include evolocumab (Repatha). The outcomes reported in SIDER4.1 are not classified according to sex/age groups. The side effects (SEs) and AEs are not necessarily represented by the same terms. Considering these differences, we examined the outcome predictions for 14 drugs (i.e., 15 drugs minus evolocumab) without the sex/age group classifications. In addition, the SEs/AEs common to both reference databases were analyzed. The result, 0.677–0.827 accuracy, 0.023–0.523 precision, and 0.571–0.963 recall, is comparable with that obtained with FAERS (Figure 2D; the full raw data for the confusion matrix are available as Table S3), further supporting the effectiveness of our method. Next, we evaluated the necessity of the multi-organ transcriptome. Figure 3 shows the SAEs whose outcomes are correctly predicted for each drug by the 24-organ transcriptome (Figure 3 and Table S4). The predictions of these SAEs by individual-organ transcriptome data in the cross-validation scheme show variable results among different organs (Figure 3). Although this result indicates that some of the individual-organ datasets are sufficient to predict the correct outcomes, which individual-organ dataset(s) is (are) necessary depends on which SAE is to be predicted (Figure 3). Hence, it appears beneficial to collect all 24-organ transcriptome data.
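The four scores reported in this section follow from the usual confusion-matrix counts. The sketch below uses the standard definitions and returns `None` when a denominator is 0, mirroring the "incalculable" cells in the figures. The function name `confusion_scores` and the example labels are hypothetical.

```python
def confusion_scores(reported, predicted):
    """Return accuracy, precision, recall, and F-measure.

    A score whose denominator is 0 is returned as None (incalculable).
    """
    pairs = list(zip(reported, predicted))
    tp = sum(1 for r, p in pairs if r and p)          # reported and predicted
    fp = sum(1 for r, p in pairs if not r and p)      # predicted, not reported
    fn = sum(1 for r, p in pairs if r and not p)      # reported, missed
    tn = sum(1 for r, p in pairs if not r and not p)  # neither

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    if precision is None or recall is None or precision + recall == 0:
        f_measure = None
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical example: five AEs for one drug in one sex/age group.
scores = confusion_scores(reported=[1, 1, 0, 0, 1], predicted=[1, 0, 0, 1, 1])
print(scores)
```

A low precision combined with a high recall, as seen for some drugs here, means that few reported AEs are missed at the cost of many predicted AEs that are not reported.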

Figure 3

The Comparison of the Prediction Results by the 24-Organ and Individual-Organ Transcriptome Data

Each serious adverse event (SAE) is described. The reported outcomes (Reported) are shown as orange and light-blue filled boxes indicating the SAE is reported and not-reported, respectively. The prediction results (Predicted) are shown as orange and light-blue filled boxes indicating the SAE is predicted to occur and not-to-occur, respectively. The prediction results with the 24-organ (Predicted by all 24 organs) and individual-organ (Predicted by organ name) data are shown. AdrenalG, adrenal gland; BM, bone marrow; WAT, gonadal white adipose tissue; ParotidG, parotid gland; PituitaryG, pituitary gland; SkMuscle, skeletal muscle; ThyroidG, thyroid gland. The age groups are 10: 10–19 years old, 20: 20–29 years old, 30: 30–39 years old, 40: 40–49 years old, 50: 50–59 years old, 60: 60–69 years old, 70: 70–79 years old, 80: 80–89 years old, 90: 90–99 years old. F, females; M, males; APAP, Acetaminophen; Dox, doxycycline; EMPA, empagliflozin; Repatha, evolocumab. The result is also included in Table S4.

AE Prediction by Random Forest Algorithm

We tested another machine learning algorithm, random forest (RF), to predict the AE outcomes. We compared the prediction results for death event outcomes (Figure 4 and Table S5). The result shows that SVM and RF provide the same predictions for all sex/age groups (Figure 4A). The accuracy, precision, recall, and F-measure scores calculated for both algorithms are similar for all 15 drugs (Figure 4B). For all drugs except asenapine, empagliflozin, and lurasidone, all accuracy/precision/recall scores are 0.5–1.0. Although the accuracy and precision scores for asenapine, empagliflozin, and lurasidone are 0.2–0.4, the recall scores for these three drugs by both SVM and RF are 1.0 (asenapine, empagliflozin) and 0.667 (lurasidone), indicating that the occurrence of death event for these three drugs is efficiently predicted across all sex/age groups (i.e., the possible occurrence of death event is not missed). The results indicate that the two algorithms are equivalent in predicting AE outcomes.

Figure 4

The Comparison of Death Event Prediction Results by SVM and RF Algorithms

(A) The prediction results of death event for each drug by SVM and RF algorithms are summarized for each age group (Age) and sex (Sex). The reported outcomes (Reported) are shown as orange and light-blue filled boxes indicating death event is reported and not-reported, respectively. The prediction results (Predicted by SVM/RF) are shown as orange and light-blue filled boxes indicating death event is predicted to occur and not-to-occur, respectively. The age groups are 10: 10–19 years old, 20: 20–29 years old, 30: 30–39 years old, 40: 40–49 years old, 50: 50–59 years old, 60: 60–69 years old, 70: 70–79 years old, 80: 80–89 years old, 90: 90–99 years old. F, females; M, males.

(B) The prediction results for each drug by SVM and RF are summarized as a confusion matrix. Scores of accuracy (Accuracy), precision (Precision), recall (Recall), and F-measure (F measure) for each drug (Drug) are indicated by bars (light blue) for each age group (Age) and sex (Sex). The scale of the scores (0.0–1.0) is indicated at the bottom. APAP, Acetaminophen; Dox, doxycycline; EMPA, empagliflozin; Repatha, evolocumab. The results of the RF prediction are also available as Table S5.

(C) Majority decision approach for death event prediction. “Voting” (i.e., majority decision) by both SVM and RF methods using the 24-organ and all individual-organ transcriptome data is conducted. The reported outcomes (Reported) are shown as orange filled boxes indicating that death event is reported. The prediction results with the 24-organ data (Predicted by SVM with all 24 organs) are shown as light-blue filled boxes indicating that “death event” is predicted not to occur. The majority decisions (Majority decision) obtained by counting all votes are shown as orange filled boxes voting for death event to occur. The numbers of “votes” for “to-occur” and “not-to-occur” (Predicted results) are indicated by the numbers.

AE Prediction by Majority Decision Framework

Although the performance of the two algorithms (SVM and RF) was equivalent (Figure 4), we obtained differing results with the 24-organ and the individual-organ approaches (Figures 2 and 3). In the individual-organ approach, the choice of individual-organ dataset(s) was critical for predicting the correct outcomes (Figure 3). Hence, it may be beneficial to conduct predictions with both the 24-organ and all individual-organ datasets to maximize the effectiveness of the hMDB-i approach. For this purpose, we designed a majority decision framework and evaluated its effectiveness for predicting death event outcomes with alendronate and lenalidomide in the female/20s group, where the SVM approach with the 24-organ transcriptome dataset alone failed to predict its occurrence (see Methods for the detailed description, Figure 4C). The result shows that this majority decision framework effectively predicts the correct outcomes (Figure 4C).
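The voting step can be illustrated with a short sketch. It assumes each (algorithm, dataset) pair casts one boolean vote and that a strict majority decides. The tie rule, the function name `majority_decision`, and the organ labels are assumptions made here, not details from the Methods.

```python
def majority_decision(votes):
    """Count boolean votes (True = event predicted to occur).

    Returns (decision, votes_to_occur, votes_not_to_occur).
    """
    to_occur = sum(1 for vote in votes.values() if vote)
    not_to_occur = len(votes) - to_occur
    # Tie handling is an assumption of this sketch: a tie is not called "occur".
    return to_occur > not_to_occur, to_occur, not_to_occur


# Hypothetical votes keyed by (algorithm, dataset). In the study the datasets
# would be the 24-organ set plus every individual organ; three placeholder
# organs are used here to keep the example short.
votes = {
    ("SVM", "all 24 organs"): False,
    ("RF", "all 24 organs"): False,
    ("SVM", "organ A"): True,
    ("RF", "organ A"): True,
    ("SVM", "organ B"): True,
    ("RF", "organ B"): True,
    ("SVM", "organ C"): True,
    ("RF", "organ C"): False,
}
decision, n_occur, n_not = majority_decision(votes)
print(decision, n_occur, n_not)
```

In this toy case the 24-organ models both vote "not-to-occur", yet the individual-organ votes overturn them, which is the situation shown in Figure 4C.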

AE Prediction by Link-Prediction Framework

Next, we tested the power of hMDB-i for the prediction of AE outcomes in a different framework. The relationship between adverse effects and therapeutic indications has previously been exploited for clinical outcome predictions (Zhang et al., 2013). Hence, we examined whether such data could be utilized to enhance the prediction capability of hMDB-i. For this purpose, we devised a link-prediction (LP) framework as shown in Figure 5A (see Transparent Methods for the detailed description). Three training datasets are used: (1) the 24-organ transcriptome data of each of the 15 drugs, (2) AEs reported for each of all drugs in FAERS, and (3) all AEs reported in FAERS for each of all indications in FAERS (Figure 5A). In the LP framework, a one-class SVM algorithm is used to train the models, and the presence/absence of a link from an untrained drug (Drug Candidate X in Figure 5A) to each AE is determined on the basis of its 24-organ transcriptome pattern (see Transparent Methods for the detailed description).

Figure 5

AE Prediction by LP Framework

(A) LP framework to predict AEs. See the text for the description.

(B) The comparison of all AE predictions by the SVM/hMDB-i and LP approaches. The results are summarized as a confusion matrix. Scores of accuracy (Accuracy), precision (Precision), recall (Recall), and F-measure (F measure) for each drug (Drug) are indicated by bars (light blue) for each method (Methods). The scale of the scores (0.0–1.0) is indicated at the bottom. SVM, SVM/hMDB-i framework; LP, LP framework. The raw data are also available as Table S6.

(C) The list of SAEs that are correctly predicted for alendronate, clozapine, and evolocumab (Repatha) by the LP but not by SVM/hMDB-i framework. The reported outcomes (Reported) are shown as orange and light-blue filled boxes indicating the SAE is reported and not-reported, respectively. The prediction results (Predicted by SVM/LP) are shown as orange and light-blue filled boxes indicating the SAE is predicted to occur and not-to-occur, respectively. The results are also available as Table S6.

The LP prediction of all AEs for three drugs, alendronate, clozapine, and evolocumab (Repatha), was conducted, and the result is summarized as a confusion matrix (Figure 5B, Table S6). Although the accuracy and precision scores are slightly better with hMDB-i alone (SVM), the LP framework (LP) improves on hMDB-i in the recall scores (Figure 5B). This result indicates two properties of these two approaches: (1) the AE outcomes predicted by hMDB-i alone are more likely to be observed than those predicted with the LP framework; (2) the LP framework is superior to hMDB-i alone in not missing the occurrence of potential AEs. Examples of this superior property of the LP framework are shown in Figure 5C, which lists some of the reported SAEs that are missed by hMDB-i alone but captured by the LP framework. The full list can be found in Table S6.

Biological Insights into the AE Mechanisms

Insights into the biological mechanisms underlying AEs provide opportunities for designing strategies to reduce the incidence of AEs during the drug development process. For this purpose, the RF algorithm is useful, as it calculates the feature importance of organ-gene datasets for the correct outcome predictions. Tables 1 and 2 show the organs and cellular/molecular pathways that were found to be important for predicting the correct outcomes of the death event for males in their 50s for aripiprazole. The top 5 and top 8 organ-pathway combinations identified by REACTOME (Table 1) and KEGG (Table 2), respectively, are shown. The results for this and other drugs and age/sex groups are summarized in Tables S7 and S8. The result indicates the possibility that hMDB-i is also useful for deducing mechanisms underlying the predicted AE outcomes.

Table 1

Enriched Pathways in REACTOME

Organ	Pathway	Entities FDR
Stomach	Detoxification of reactive oxygen species	0.00038
AdrenalG	PPARA activates gene expression	0.00062
AdrenalG	Regulation of lipid metabolism by Peroxisome proliferator-activated receptor alpha (PPARalpha)	0.00062
AdrenalG	Lipophagy	0.00282
WAT	Chaperone mediated autophagy	0.00282

The top 5 organ-pathway combinations for aripiprazole/male/50s are shown.

Table 2

Enriched Pathways in KEGG

Organ	Pathway	P, Adjusted
Spleen	Glycosphingolipid biosynthesis—ganglio series	0.00174
Eye	Salivary secretion	0.00904
AdrenalG	PPAR signaling pathway	0.01008
Stomach	Glutathione metabolism	0.01032
Stomach	Thyroid hormone synthesis	0.01032
Stomach	Arachidonic acid metabolism	0.01032
ThyroidG	Proteasome	0.01066
ThyroidG	Epstein-Barr virus infection	0.02666

The top 8 organ-pathway combinations for aripiprazole/male/50s are shown.
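The feature-importance ranking that this section relies on can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it uses scikit-learn's `RandomForestClassifier` and its `feature_importances_` attribute, synthetic data, hypothetical gene names, and an "organ:gene" naming scheme invented for the example. The pathway enrichment step (REACTOME, KEGG) is not reproduced.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
organs = ["Stomach", "AdrenalG", "WAT"]       # a subset, for illustration
genes = ["gene1", "gene2", "gene3", "gene4"]  # hypothetical gene names
feature_names = [f"{organ}:{gene}" for organ in organs for gene in genes]

# Synthetic data (assumption): 15 drugs, one row each, binary outcome labels.
X = rng.normal(size=(15, len(feature_names)))
y = np.array([1, 0] * 7 + [1])
X[:, 0] += 3.0 * y  # make "Stomach:gene1" informative on purpose

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Importance per organ-gene feature, ranked from most to least important.
importance = dict(zip(feature_names, forest.feature_importances_))
ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)

# Summed importance per organ, a simple way to see which organs matter.
organ_importance = {organ: 0.0 for organ in organs}
for name, value in importance.items():
    organ_importance[name.split(":")[0]] += value

for name, value in ranked[:5]:
    print(f"{name}\t{value:.3f}")
print(organ_importance)
```

The highly ranked genes of each organ would then be the natural input to a pathway enrichment tool, which is presumably how organ-pathway pairs such as those in Tables 1 and 2 arise.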

Predicting the Outcome Dynamics over Ages

Information regarding the quantitative differences in AE occurrence among sex/age groups would be beneficial for designing efficient clinical trials. It allows for selecting target subjects with a lower chance of experiencing an AE in the trial. Hence, we next evaluated the performance for predicting quantitative changes in the incidence of AE outcomes (see Transparent Methods for the detailed description, Figure 6, Table S9). For this purpose, we applied support vector regressor (SVR) and random forest regressor (RFR) algorithms to the hMDB-i framework. The number of death event reports changes over age to variable degrees for each drug (“Reported” in Figure 6). A sex difference also appears to exist (F versus M in Figure 6). The comparison of the predictions by both algorithms (SVR and RFR) with the reported (Reported) incidence (death/all reported AEs) shows that both algorithms efficiently capture the general trends of the quantitative changes in death event incidence over age for drug/sex combinations such as alendronate/female, doxycycline/male, and aripiprazole/female (Figure 6, Table S9). On the other hand, the predictions for other drugs (e.g., clozapine: both sexes, empagliflozin: female, teriparatide: female) appear less efficient (Figure 6, Table S9). Although further improvements are necessary, the result suggests that the approach could be exploited to select specific sex/age groups as target subjects in clinical trials of at least some drugs.
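The regression variant can be sketched in the same leave-one-drug-out style. The example assumes scikit-learn's `SVR` and `RandomForestRegressor` with default-like settings, synthetic features, and hypothetical report counts. Only the target definition (death reports divided by all reported AEs) comes from the text above; whether the authors used this exact hold-out scheme for regression is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n_drugs, n_features = 15, 50  # 50 features is an arbitrary placeholder
X = rng.normal(size=(n_drugs, n_features))

# Hypothetical report counts for ONE sex/age group.
death_reports = rng.integers(0, 50, size=n_drugs)
all_reports = death_reports + rng.integers(50, 500, size=n_drugs)
incidence = death_reports / all_reports  # death / all reported AEs


def leave_one_drug_out(make_model, X, y):
    """Predict each drug's incidence from a model trained on the others."""
    predicted = np.empty(len(y))
    for held_out in range(len(y)):
        train = np.arange(len(y)) != held_out
        model = make_model()
        model.fit(X[train], y[train])
        predicted[held_out] = model.predict(X[held_out:held_out + 1])[0]
    return predicted


svr_pred = leave_one_drug_out(lambda: SVR(), X, incidence)
rfr_pred = leave_one_drug_out(
    lambda: RandomForestRegressor(n_estimators=100, random_state=0),
    X,
    incidence,
)
print(np.round(svr_pred, 3))
print(np.round(rfr_pred, 3))
```

Repeating this for each age group and sex would give one predicted incidence curve per drug, comparable to the "Reported" curves in Figure 6.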

Figure 6

Prediction of Quantitative Dynamics of Death Events Over Age

The results of predicting quantitative dynamics of death events at each age group for each drug are summarized. The results by SVM regressor (SVM) and RF regressor (RF) are shown. The age groups are 30: 30–39 years old, 40: 40–49 years old, 50: 50–59 years old, 60: 60–69 years old, 70: 70–79 years old, 80: 80–89 years old. The results are also available as Table S9.

Predicting Therapeutic Indications

Identifying appropriate therapeutic indications (TIs) is also critical in drug development. Hence, we evaluated the utility of the multi-organ transcriptome datasets for predicting TIs. For this purpose, we adapted the LP framework (Figure 7A). Three training datasets are used: (1) the 24-organ transcriptome data of each of the 15 drugs, (2) all indications reported for each of the 15 drugs in FAERS, and (3) the incidence of all AEs reported in FAERS for each of the indications in FAERS (Figure 7A). In the LP framework, a one-class SVM algorithm is used to train the models, and the presence/absence of a link between an untrained drug (Drug Candidate X in Figure 7A) and each TI is determined on the basis of the drug's 24-organ transcriptome data (see Transparent Methods for the detailed description, Figure 7A).

Figure 7

Prediction of Therapeutic Indications

(A) LP framework to predict therapeutic indications. See text for the detailed description.

(B) Cross-validation with FAERS and SIDER references. The hold-out scheme is shown at the top. The results are shown as a confusion matrix. The scores are indicated as light blue bars. Scores of precision and F-measure for cisplatin and lenalidomide are incalculable (black filled cells) as both TP and FP scores are 0. The results are also available as Tables S10 (FAERS) and S12 (SIDER).

(C) Repositioning prediction. The scheme is shown at the top. The results with FAERS and SIDER references are shown as a confusion matrix. The scores are indicated as light blue bars. TP, true positive; TN, true negative; FP, false positive; FN, false negative. The scale of the scores (0.0–1.0) is indicated at the bottom. FPs as potential repositioning targets are highlighted in yellow. The results are also available as Tables S11 (FAERS) and S13 (SIDER).

We evaluated the performance of this framework by omitting the data of one drug from all training datasets at a time and predicting the potential TIs of the omitted drug, repeating this for all 15 drugs (Figure 7B). The result is shown as a confusion matrix (Figure 7B), and the full list is available as Table S10. The accuracy scores are high (>0.78) for all 15 drugs (Figure 7B), indicating that more than 78% of the predicted indications and non-indications are indeed reported or not reported, respectively, as indications for each drug. The recall scores are also high (>0.8) for alendronate, aripiprazole, asenapine, clozapine, empagliflozin, lurasidone, olanzapine, evolocumab (Repatha), risedronate, sofosbuvir, and teriparatide (Figure 7B), indicating that over 80% of the reported indications are predicted for these 11 drugs by the method. The recall score of doxycycline is 0.527 (Figure 7B), indicating that approximately 50% of the reported indications are predicted for this drug. The recall scores are low for acetaminophen (0.141), cisplatin (0), and lenalidomide (0) (Figure 7B), indicating that the method fails to capture many reported indications for these drugs, as reflected by the relatively larger number of false-negatives (FNs) as compared with that of true-positives (TPs) (Figure 7B). Only acetaminophen (APAP) shows a high precision score (1.000), and all the others show low precision scores (<0.35). Both cisplatin and lenalidomide show 0 TP and 0 FP; thus, the precision and F-measure scores could not be calculated (Figure 7B). The low precision scores for many drugs are mainly due to a large number of false-positives (FPs) as compared with that of TPs (Figure 7B). The result illustrates a possibly useful application of this LP framework to the prediction of drug TIs (see further in Discussion).
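The scores discussed above follow the standard confusion-matrix definitions, which the minimal sketch below spells out, including the case where precision and F-measure are incalculable because both TP and FP are 0. The function name and the example counts are invented for illustration and are not taken from Table S10.

```python
def confusion_scores(tp, tn, fp, fn):
    """Compute accuracy, recall, precision and F-measure from counts.

    A score whose denominator is 0 is returned as None, which mirrors
    the "incalculable" cells for drugs with TP = FP = 0.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else None
    recall = tp / (tp + fn) if (tp + fn) else None
    precision = tp / (tp + fp) if (tp + fp) else None
    if precision is None or recall is None or precision + recall == 0:
        f_measure = None
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f_measure": f_measure}

# Invented counts: many FPs give high recall but low precision.
many_fp = confusion_scores(tp=8, tn=950, fp=40, fn=2)
# Invented counts: no positive prediction at all (TP = FP = 0).
no_positive = confusion_scores(tp=0, tn=95, fp=0, fn=5)
print(many_fp)
print(no_positive)
```

The second case shows why a drug can have a recall of 0 and a high accuracy at the same time: with no positive predictions, every reported indication is an FN, while the many true negatives dominate the accuracy.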

Application to Drug Repositioning

We also evaluated the utility of the multi-organ transcriptome datasets for drug repositioning. In the scheme described in Figure 7B, the FP TIs (yellow highlight in Figure 7B; the full list is available as Table S10) could include repositioning targets for the drugs (see further in Discussion). However, in drug repositioning, the drug of interest is an existing drug rather than an untrained candidate (drug X). Hence, we used the same LP framework without omitting the data of any drug, i.e., the datasets of the drug that is the prediction target are also included in the training datasets (Figure 7C). The result is shown as a confusion matrix (Figure 7C), and the full list is available as Table S11. The result shows an increased number of TPs and a decreased number of FNs for all drugs, resulting in improved recall scores (Figure 7C). In this scheme, both accuracy and recall scores for all drugs are 0.770–1.000 (Figure 7C), indicating that the approach can capture over 77% of both reported and non-reported indications. The precision scores for all drugs remain relatively low owing to the large number of FPs as compared with that of TPs (Figure 7C and Table S11). The indications found in FP (yellow highlight in Figure 7C; the full list is available as Table S11) could include repositioning targets (see further in Discussion). As the number of FPs is relatively large, calculating the decision function value of each indication found in FP in both schemes could facilitate the ranking of the indications according to their likelihood as repositioning targets. Hence, the decision function values of all indications in both Figures 7B and 7C are shown in Tables S10 and S11. These tables could serve as a useful reference for selecting potential repositioning targets for further investigational evaluations (see further in Discussion).

We also validated our indication predictions with SIDER version 4.1. The result, 0.402–0.998 accuracy and 0.135–1 recall (omitting cisplatin and lenalidomide for the reason described above for the FAERS result in the leave-one-out cross-validation), is comparable with that with FAERS (Figures 7B and 7C; the full and raw data for the confusion matrix are available as Tables S12 and S13). The precision (0.003–0.069) appears slightly lower than that with FAERS (Figures 7B and 7C), owing to the larger number of FPs. This may be caused by the larger total number of indications reported in FAERS (11,312 indications) as compared with that in SIDER (2,088 indications), contributing to the reduced precision with the SIDER reference.
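The ranking of FP indications by decision function value can be sketched as follows. This is a minimal illustration under stated assumptions: a decision value above a threshold of 0 is taken to mean a predicted drug-indication link, the function name is hypothetical, and the indication names and values are invented placeholders rather than entries from Tables S10 or S11.

```python
def rank_fp_candidates(decision_values, reported, threshold=0.0):
    """Rank false-positive indications by decreasing decision value.

    decision_values: {indication: decision function value} for one drug.
    reported: set of indications already reported for that drug.
    An indication counts as predicted when its value exceeds `threshold`
    (an assumed convention); predicted but unreported ones are the FPs.
    """
    false_positives = [(name, value)
                       for name, value in decision_values.items()
                       if value > threshold and name not in reported]
    return sorted(false_positives, key=lambda item: item[1], reverse=True)

# Invented placeholder values for a single drug.
toy_values = {
    "reported indication": 1.2,   # predicted and reported: a TP
    "indication A": 0.9,          # predicted, not reported: an FP
    "indication B": 0.3,          # predicted, not reported: an FP
    "indication C": -0.4,         # not predicted
}
ranked = rank_fp_candidates(toy_values, reported={"reported indication"})
print(ranked)
```

Sorting by decision value keeps the full FP list available while putting the candidates that the model links most strongly to the drug at the top.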

Side-by-Side Comparisons of Our Methods to Other Multi-Features-Based Prediction Methods

Some other in silico approaches for predicting SEs/AEs have been previously reported (Pauwels et al., 2011, Wang et al., 2016). In particular, Wang et al. reported an SE/AE prediction method that combines multiple features (cell morphological features, MACCS chemical fragments, L1000 landmark genes) with adverse drug reaction labels (http://maayanlab.net/SEP-L1000/index.html#) (Wang et al., 2016). We compared the SE/AE prediction performance of our method with theirs side by side (Figure 8A, Table S14). Of the 15 drugs that we analyzed, 9 drugs (alendronate, aripiprazole, acetaminophen, asenapine, cisplatin, clozapine, doxycycline, lenalidomide, olanzapine) are covered by their method. Their method is applicable only to small-molecule drugs; hence, SEs/AEs of biologics/antibody drugs (evolocumab/Repatha in our study) or peptide drugs (teriparatide in our study) can be predicted only by our method, illustrating the advantage of our drug modality-independent approach.

Figure 8

Side-by-Side Comparisons of Our Methods with Other Multi-Features-Based Prediction Methods

The comparisons are shown as Venn diagrams for SE/AE (A) and TI (B) predictions. The description of the diagrams is shown as “Legend” diagram located at the right bottom in each panel. The number of the SEs/AEs and TIs predicted only by our method (A: hMDB, B: hMDB/LP), only by the existing methods (A: multiple-features/L1000, B: fingerprints/targets/Interactions), and by both methods are indicated accordingly in the diagrams. Our methods (A: hMDB, B: hMDB/LP) are compared with multiple-features/L1000 method (A) and fingerprints/targets/Interactions method (B), respectively. The predictions for the drugs that are not represented or cannot be predicted by the other methods are indicated as 0. Of SEs/AEs and TIs predicted only by our methods, those labeled in SIDER, FAERS, and the processed FAERS databases are also indicated. SEs for evolocumab/Repatha do not exist in SIDER; hence, it is indicated as 0. The table forms of the Venn diagrams and their full searchable raw data are available as Tables S14 (A) and S15 (B).

We performed side-by-side comparisons of our method with theirs for the nine drugs represented by both methods (Figure 8A, Table S14). With seven drugs (alendronate, acetaminophen, asenapine, cisplatin, clozapine, doxycycline, olanzapine), our method detects far more SEs/AEs labeled in SIDER or the processed FAERS (Figure 8A, Table S14). Even with the other two drugs (aripiprazole, lenalidomide), our method identifies a large number of SEs/AEs (aripiprazole: 93 in SIDER and 37 in the processed FAERS, lenalidomide: 71 in SIDER and 50 in the processed FAERS) that are missed by their method (Figure 8A, Table S14).

Prediction of TIs of small-molecule drugs on the basis of their structural information, targets, and interactions has been previously reported (Li and Lu, 2012). We performed side-by-side comparisons of our method with theirs in predicting TIs (Figure 8B, Table S15). Of the 15 drugs that we studied, 6 drugs (alendronate, acetaminophen, cisplatin, clozapine, doxycycline, risedronate) are covered by their method. As with the SE/AE prediction, our method is modality free; hence, we can predict TIs of biologics/antibody drugs (evolocumab/Repatha) and peptide drugs (teriparatide), which their small-molecule-based prediction method cannot. This again illustrates the advantage of our drug modality-independent approach. For the six drugs represented by both methods, our method detects far more TIs labeled in SIDER or FAERS (Figure 8B, Table S15).
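The Venn-diagram counts used in these comparisons reduce to set operations, as in the minimal sketch below. The function name and the toy outcome terms are invented for illustration; they are not the contents of Tables S14 or S15.

```python
def venn_counts(ours, theirs, reference):
    """Count outcomes predicted by one method only, by both, and the
    subset of our-only predictions that are labeled in a reference
    database (e.g. SIDER or FAERS)."""
    ours, theirs, reference = set(ours), set(theirs), set(reference)
    only_ours = ours - theirs
    return {
        "only_ours": len(only_ours),
        "only_theirs": len(theirs - ours),
        "both": len(ours & theirs),
        "only_ours_in_reference": len(only_ours & reference),
    }

# Invented toy predictions for a single drug.
toy_ours = {"nausea", "headache", "rash", "dizziness"}
toy_theirs = {"rash", "insomnia"}
toy_reference = {"nausea", "rash", "dizziness"}
counts = venn_counts(toy_ours, toy_theirs, toy_reference)
print(counts)
```

For a drug that the other method does not cover, passing an empty set for `theirs` yields 0 for both "only_theirs" and "both", consistent with how such drugs are indicated as 0 in Figure 8.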

Discussion

The difficulty of predicting human clinical outcomes during pre-clinical studies is the major hindrance in drug development. This is mainly due to differences in outcomes between animal models and humans. To overcome this problem, numerous experimental and computational approaches are currently used (Elitt et al., 2018, Ko and Gelb, 2014, Meseguer-Ripolles et al., 2018, Oleaga et al., 2016, Rezaei Kolahchi et al., 2016, Shah et al., 2019, Tateno et al., 2004, Vamathevan et al., 2019). Some approaches use transcriptome and/or other omics data to predict clinical outcomes (Ho et al., 2016). However, they use only cells and limited types of organs/tissues, and none exploits whole-body-level biological features such as body-wide multi-organ transcriptome data for clinical outcome prediction. In the current study, we exploit multi-organ transcriptome datasets in mice to predict human clinical outcomes (Figure 1). We “humanize” the mouse data by training machine learning models with multi-organ transcriptome data derived from the mouse and human clinical datasets (Figure 1). Our evaluation illustrates the advantages and effectiveness of this approach in several applications to the outcome prediction of both AEs and TIs (Figures 2, 3, 4, 5, 6, and 7). Several other in silico approaches predicting SEs/AEs and TIs have been previously reported (Li and Lu, 2012, Pauwels et al., 2011, Wang et al., 2016). The side-by-side comparisons of our methods with theirs show that our method detects far more SEs/AEs and TIs than theirs with most of the 15 drugs evaluated here (Figure 8). Even for the drugs for which the two methods are comparable (aripiprazole and lenalidomide in the SE/AE prediction), our method detected outcomes that are missed by theirs (Figure 8A). These results illustrate the usefulness of exploiting multi-organ transcriptome datasets in predicting drug SEs/AEs and TIs.

Potential associations of SEs with diseases have been computed on the basis of drug fingerprints/targets and their known SEs/potential disease indications and reported as an integrated table (https://astro.temple.edu/~tua87106/druganalysis.html) (Zhang et al., 2013). Although the authors suggest that drug-repositioning candidates could be identified by using this table, they do not explicitly tabulate specific drug-repositioning targets or validate them. In this paper, we exploit the potential SE/AE-TI association and combine it with our body-wide multi-organ transcriptome approach in the LP framework to compute drug TIs and explicitly tabulate them (Figure 7). Furthermore, we evaluate the effectiveness and the efficiency of our method by cross-validation (Figure 7B).

There are two major advantages of our approach over the other in silico prediction methods. The first is its drug-modality independence. This is illustrated by the effective predictions for small-molecule (alendronate, acetaminophen, aripiprazole, asenapine, cisplatin, clozapine, doxycycline, empagliflozin, lenalidomide, lurasidone, olanzapine, risedronate, sofosbuvir), antibody (evolocumab/Repatha), and peptide (teriparatide) drugs. In principle, our approach is expected to be as effective for other modalities such as nucleic acid, gene, and cell therapeutics and also for their mixtures, as the approach does not use any structural features of the drugs for the predictions. The other advantage is that the approach requires no prior knowledge about the molecular or cellular targets of the drugs or their mechanisms of action. Hence, the only requirement of our approach is that the test drug can be administered to the mouse.

In our approach, effective prediction is critically dependent on identifying common transcriptome features between the drugs in the models and the target/test drug (i.e., drug X). The 15-drug group in the current study consists of drugs with diverse AE and indication patterns. Despite such complex clinical outcome patterns, the datasets from such a small number of drugs are surprisingly sufficient to effectively predict the outcomes of many AEs and TIs for most of the drugs in the cross-validation studies (Figures 2, 3, 4, 5, 6, 7, and 8). This may be due to the presence of many possibly useful features in the multi-organ transcriptome data that could not be identified in the transcriptome or other omics data of cells and limited types of organs/tissues.

Although the use of the 24-organ transcriptome data (i.e., the hMDB-i approach) is generally effective in predicting AEs, we find that the performance differs depending on whether all 24-organ data are used together or as individual-organ datasets (Figure 3). We also observed differential performance depending on which of the 24 individual-organ datasets is used (Figure 3). To address these issues, we suggest that a majority decision approach could be beneficial and illustrate such examples (Figure 4C). Furthermore, using the hMDB-i approach without or within the LP framework appears to influence the prediction results (Figures 5B and 5C). The accuracy and precision scores are better without the LP framework, but the recall scores are better with the LP framework. Hence, it may be beneficial to select the framework (i.e., with/without the LP framework) according to the objective of the prediction (e.g., to be more confident about the predicted AEs versus to capture all possible AEs).

Drug repositioning is a strategy to develop an existing drug for an additional therapeutic indication(s) (Pushpakom et al., 2019). The advantage is that the properties (such as pharmacokinetics) of such drugs are already determined. Furthermore, their safety in humans is established; hence, phase I trials could often be bypassed (Pushpakom et al., 2019). These advantages of drug repositioning result in overall cost and time savings in development, as compared with new drug development (Pushpakom et al., 2019). We examined the potential utility of our approach to predict TIs for drug repositioning (Figure 7). The potential therapeutic targets for the repositioning of each drug are those that are labeled as “FP” in Figures 7B and 7C and Tables S10 and S11. The examination of the list identified many common FP indications shared among drugs with the same or related mechanisms of action (Tables S10 and S11). For alendronate and risedronate, both bisphosphonate medications for treating osteoporosis, Addison's disease and amyloidosis are among those predicted as FP targets with high decision function values. For clozapine and aripiprazole, both atypical anti-psychotic medications antagonizing related neurotransmitter receptors, coagulopathy and ulcer are examples of such FP targets. Although their investigational validation must await future clinical trials, such targets shared among related medications could be among the leading candidates for repositioning.

In the study of predicting TIs, we show that no true-positive targets were identified for either cisplatin or lenalidomide (Figure 7B). On the other hand, including these two drugs in the training data for the same prediction framework resulted in improved identification of the true-positive targets (Figure 7C). These results may suggest that the gene expression patterns of these two drugs include distinguishing features that are unique to each of them; hence, such features could not be exploited by the scheme shown in Figure 7B (i.e., the data of these drugs are omitted from the training data for their predictions). This point is supported by the fact that the inclusion of the transcriptome data of these two drugs in the training datasets leads to effective outcome predictions for the other drugs (Figure 7B). Furthermore, such unique features can be effectively exploited when included in the training data, resulting in the successful identification of the true-positives (Figure 7C).

In drug development and clinical trials, it would be important and beneficial to understand the mechanisms of the outcomes. Such mechanistic understanding could serve as a basis for further improvement of drug designs and for designing pre-clinical and clinical studies. Furthermore, the availability of the mechanisms facilitates the regulatory approval processes and helps meet ethical responsibilities by providing an explanation to the study subjects and patients. Using the RF algorithm allows us to identify specific molecular/cellular pathways that contributed to the AE predictions of the drug; examples are shown in Tables 1 and 2, and the full lists are available in Tables S7 and S8. Although such pathways are those modulated by the drugs in mice and may not necessarily be the same in humans, many are conserved across animal species. Hence, they could serve, at the least, as clues for deducing possible mechanisms underlying the AEs and/or for generating working hypotheses for further experimental and clinical validation studies.

The current study illustrates the utility of applying body-wide multi-organ transcriptome data obtained from mice to predict human clinical outcomes. The further inclusion of other drugs' multi-organ transcriptome and clinical outcome data in hMDB-i, and of other relevant data such as individuals' genotypes and lifestyle information, as training datasets may improve the current performance, where some limitations are observed with some drugs and/or some outcomes. With such refinements in the training datasets, further improvements in the individualized predictions are expected in the future, contributing to the realization of the ultimate virtual precision medicine.

Limitations of the Study

In the prediction study of therapeutic indications, we used all therapeutic indications associated with each drug in FAERS as training data. Thus, no distinction can be made between effective and ineffective treatments. Furthermore, all indications for which each drug was used are included as the training data for that particular drug, i.e., indications for which other drug(s) was/were used for the purpose of treatment and drug X was simply an accompanying drug were included as “reported” therapeutic indications for drug X in the training data. For example, acetaminophen is used to attenuate pain associated with osteoporosis; hence, both pain and osteoporosis were included as indications for this drug. Furthermore, we also used all AE labels associated with each drug and/or indication. Hence, no distinction was made between causative relationships and simple associations. Currently, using FAERS, it is difficult to make such distinctions for all drugs in a systematic manner. Some distinction may be possible by using other public databases such as clinicaltrials.gov, but the number of reports is limited and not sufficient for our machine-learning approach. Despite such compromise, the fact that our approach predicts many human clinical outcomes according to the quality of the input training data indicates that, with further refinement of the input training datasets, it is reasonable to expect that our method will generate more sophisticated predictions. In fact, we validated our predictions with a drug-label-based reference, SIDER. The results show that both SE/AE and TI predictions with SIDER are comparable with those with FAERS (Figures 2D, 7B, and 7C, Tables S3, S12, and S13). These results further support the effectiveness and usefulness of our approach.

Owing to the experimental cost necessary to perform the 24-organ transcriptome analyses, the current study is limited to the evaluation of 15 drugs. There are 3,732 approved drugs (https://www.drugbank.ca/stats), and it would be extremely expensive to perform the 24-organ transcriptome analyses with most, if not all, of these drugs. With the current cost of the transcriptome analyses, we estimate that such cost would be US$20–30 million. This is expensive at the scale of a single laboratory but possibly realistic as a consortium effort. Considering the high cost of drug development, this might be a worthy investment. In fact, we may not need all drug data, as we show that even the 15-drug datasets were sufficient for many of the outcome predictions, despite their diverse modalities and TIs (Figures 2, 3, 4, 5, 6, 7, and 8).

Methods

All methods can be found in the accompanying Transparent Methods supplemental file.

22 in total

Review 1. The cost of drug development: a systematic review.

Authors: Steve Morgan; Paul Grootendorst; Joel Lexchin; Colleen Cunningham; Devon Greyson
Journal: Health Policy Date: 2011-01-21 Impact factor: 2.980

2. Data-driven prediction of drug effects and interactions.

Authors: Nicholas P Tatonetti; Patrick P Ye; Roxana Daneshjou; Russ B Altman
Journal: Sci Transl Med Date: 2012-03-14 Impact factor: 17.956

3. Drug-induced adverse events prediction with the LINCS L1000 data.

Authors: Zichen Wang; Neil R Clark; Avi Ma'ayan
Journal: Bioinformatics Date: 2016-04-01 Impact factor: 6.937

4. Performance of pharmacovigilance signal-detection algorithms for the FDA adverse event reporting system.

Authors: R Harpaz; W DuMouchel; P LePendu; A Bauer-Mehren; P Ryan; N H Shah
Journal: Clin Pharmacol Ther Date: 2013-02-11 Impact factor: 6.875

Review 5. Drug repurposing: progress, challenges and recommendations.

Authors: Sudeep Pushpakom; Francesco Iorio; Patrick A Eyers; K Jane Escott; Shirley Hopper; Andrew Wells; Andrew Doig; Tim Guilliams; Joanna Latimer; Christine McNamee; Alan Norris; Philippe Sanseau; David Cavalla; Munir Pirmohamed
Journal: Nat Rev Drug Discov Date: 2018-10-12 Impact factor: 84.694

6. Predicting drug side-effect profiles: a chemical fragment-based approach.

Authors: Edouard Pauwels; Véronique Stoven; Yoshihiro Yamanishi
Journal: BMC Bioinformatics Date: 2011-05-18 Impact factor: 3.169

Review 7. Defining the interorgan communication network: systemic coordination of organismal cellular processes under homeostasis and localized stress.

Authors: Ilia A Droujinine; Norbert Perrimon
Journal: Front Cell Infect Microbiol Date: 2013-11-19 Impact factor: 5.293

8. The Body-wide Transcriptome Landscape of Disease Models.

Authors: Satoshi Kozawa; Ryosuke Ueda; Kyoji Urayama; Fumihiko Sagawa; Satsuki Endo; Kazuhiro Shiizaki; Hiroshi Kurosu; Glicia Maria de Almeida; Sharif M Hasan; Kiyokazu Nakazato; Shinji Ozaki; Yoshinori Yamashita; Makoto Kuro-O; Thomas N Sato
Journal: iScience Date: 2018-03-29

9. Serious Adverse Drug Events Reported to the FDA: Analysis of the FDA Adverse Event Reporting System 2006-2014 Database.

Authors: Kalyani B Sonawane; Ning Cheng; Richard A Hansen
Journal: J Manag Care Spec Pharm Date: 2018-07

Review 10. Artificial intelligence and machine learning in clinical development: a translational perspective.

Authors: Pratik Shah; Francis Kendall; Sean Khozin; Ryan Goosen; Jianying Hu; Jason Laramie; Michael Ringel; Nicholas Schork
Journal: NPJ Digit Med Date: 2019-07-26