
Synergistic drug combinations from electronic health records and gene expression.

Yen S Low1, Aaron C Daugherty2, Elizabeth A Schroeder2, William Chen1, Tina Seto3, Susan Weber3, Michael Lim4, Trevor Hastie4,5, Maya Mathur6, Manisha Desai6, Carl Farrington2, Andrew A Radin2, Marina Sirota2, Pragati Kenkare7, Caroline A Thompson7, Peter P Yu7, Scarlett L Gomez5,8, George W Sledge9, Allison W Kurian5,9, Nigam H Shah1.   

Abstract

OBJECTIVE: Using electronic health records (EHRs) and biomolecular data, we sought to discover drug pairs with synergistic repurposing potential. EHRs provide real-world treatment and outcome patterns, while complementary biomolecular data, including disease-specific gene expression and drug-protein interactions, provide mechanistic understanding.
METHOD: We applied Group Lasso INTERaction NETwork (glinternet), an overlap group lasso penalty on a logistic regression model, with pairwise interactions to identify variables and interacting drug pairs associated with reduced 5-year mortality using EHRs of 9945 breast cancer patients. We identified differentially expressed genes from 14 case-control human breast cancer gene expression datasets and integrated them with drug-protein networks. Drugs in the network were scored according to their association with breast cancer individually or in pairs. Lastly, we determined whether synergistic drug pairs found in the EHRs were enriched among synergistic drug pairs from gene-expression data using a method similar to gene set enrichment analysis.
RESULTS: From EHRs, we discovered 3 drug-class pairs associated with lower mortality: anti-inflammatories and hormone antagonists, anti-inflammatories and lipid modifiers, and lipid modifiers and obstructive airway drugs. The first 2 pairs were also enriched among pairs discovered using gene expression data and are supported by molecular interactions in drug-protein networks and preclinical and epidemiologic evidence.
CONCLUSIONS: This is a proof-of-concept study demonstrating that a combination of complementary data sources, such as EHRs and gene expression, can corroborate discoveries and provide mechanistic insight into drug synergism for repurposing.
© The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association.

Keywords:  breast cancer; combination therapies; drug discovery; drug interactions; drug repurposing; electronic health records

Year:  2017        PMID: 27940607      PMCID: PMC6080645          DOI: 10.1093/jamia/ocw161

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


INTRODUCTION

Electronic health records (EHRs) reflect real-world treatment patterns including polypharmacy, offering a unique opportunity to study drug-associated outcomes for drug safety and repurposing efforts. Molecular data, such as gene expression and drug-protein interactions, offer possible mechanistic insight into drug-disease relationships. These 2 types of data strongly complement each other, for example, in assessing the repurposing potential of existing drug combinations. Prior studies have mainly focused on discovering adverse effects of single or combined drugs (ie, drug-drug interactions) or repurposing single drugs, such as metformin for breast cancer. Although there have been mixed results in replicating metformin’s apparent anticancer benefit, preliminary results from ongoing clinical trials appear promising. Beyond repurposing individual drugs, combinations of drugs may yield adjuvant therapeutic effects or allow lower doses to achieve the same therapeutic effects while minimizing the undesirable side effects triggered at higher doses. Drugs can interact with each other such that the bioavailability of one drug is increased or prolonged (pharmacokinetic interaction) or the target receptor or pathway is modulated to elicit a stronger therapeutic response (pharmacodynamic interaction). Additionally, finding beneficial combinations of approved or investigational drugs can save considerable cost and time, because some safety assessments have already been performed. Such multidrug synergism is currently discovered experimentally through large-scale target screening and theoretically through reasoning based on known pharmacokinetic or pharmacodynamic interactions. This study demonstrates the novel use of both EHRs and molecular data to discover and validate pairs of drugs whose combined therapeutic effect on mortality among breast cancer patients appears to be greater than that of the individual drugs alone.
Our approach for eliciting beneficial pairs of drugs is a first step toward discovering more complex multidrug combinations that can optimize the use of existing drugs.

METHODS

Analysis of electronic health records

EHR data sources

We used Oncoshare, a breast cancer database linking long-term survival outcomes from the California Cancer Registry with EHRs detailing patient, tumor, and treatment information from a tertiary cancer care center (Stanford Hospital) and a neighboring community health system (Palo Alto Medical Foundation, PAMF). Oncoshare followed 14 885 female patients (at least 18 years old) who had a breast cancer diagnosis in the registry and sought treatment at Stanford Hospital or PAMF between January 2000 and April 2013. To determine 5-year mortality, we included only patients who were followed for at least 5 years starting from the index date of breast cancer diagnosis or who died within the follow-up period. By excluding patients who were followed <5 years (ie, diagnosed after April 2008), we minimized the loss to follow-up; this process was already facilitated through statewide mortality tracking by the California Cancer Registry. We extracted 1531 demographic, tumor (eg, stage, hormone subtype), and treatment (eg, prescriptions, radiotherapy, diagnostic imaging) variables. Data and methodological details on Oncoshare can be found in Kurian et al. Individual drugs were analyzed as generic ingredients according to RxNorm as well as aggregated into 95 drug classes according to the Anatomical Therapeutic Chemical (ATC) system (Supplementary Table S1). Missing values of demographic and tumor variables were coded as “unknown” and analyzed as a separate category if they affected at least 10% of the cohort; otherwise (if <10% were missing), they were replaced by mode imputation (for categorical variables) or mean imputation (for continuous variables). To examine concomitant drug exposures, we set up a data matrix in which each row is an exposure period for every unique drug combination (Figure 1A). This matrix contains 171 940 unique exposure periods derived from 9945 eligible patients.
A cumulative exposure variable measures the duration for which patients were exposed to that drug combination during their follow-up time. Drugs and drug classes used for fewer than 14 days (cumulative) or present in less than 0.5% of the exposure periods were removed, leaving 294 drugs (Supplementary Table S1) for analysis of 43 071 possible pairwise combinations. Variables entered into the logistic regression included all demographic, tumor, and treatment variables. We examined associations at both the individual drug level and the ATC drug class level.
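The exposure-period matrix described above can be illustrated with a minimal Python sketch. It assumes a hypothetical input format of (drug, start day, end day) intervals per patient, with the end day exclusive; the function name `exposure_periods` and the toy data are illustrative only and do not come from the study. The last line simply checks the arithmetic behind the 43 071 pairwise combinations of 294 retained drugs.

```python
from math import comb


def exposure_periods(intervals):
    """Split one patient's drug intervals into periods with a constant drug set.

    intervals: list of (drug, start_day, end_day), end exclusive (assumed format).
    Returns a list of (start_day, end_day, frozenset_of_drugs).
    """
    # Every start or end day is a point where the set of active drugs may change.
    cuts = sorted({day for _, start, end in intervals for day in (start, end)})
    periods = []
    for lo, hi in zip(cuts, cuts[1:]):
        active = frozenset(
            drug for drug, start, end in intervals if start <= lo and hi <= end
        )
        if active:  # skip gaps in which no drug was taken
            periods.append((lo, hi, active))
    return periods


# Hypothetical patient: a statin for 90 days with an NSAID overlapping days 30-60.
patient = [("simvastatin", 0, 90), ("naproxen", 30, 60)]
for period in exposure_periods(patient):
    print(period)

# Unordered pairs among the 294 retained drugs and drug classes.
print(comb(294, 2))  # 43071
```

In this toy case the overlap produces three periods, and only the middle one carries the concomitant exposure to both drugs. The cumulative-exposure and frequency filters (14 days, 0.5% of periods) are not shown.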
Figure 1.

Method overview of (A) scoring EHR-based synergistic drug pairs, (B) scoring gene expression–based synergistic drug pairs, and (C) gene set enrichment analysis–like analysis of enrichment of EHR-based drug class pairs among gene expression–based drug pairs.

Synergism score from EHR

To identify potentially interesting associations between 5-year mortality and pairwise combinations between drugs and drug classes, we used lasso regularization on a logistic regression model with pairwise interactions (Equation 1). Drug interactions were modeled as statistical interactions. Here, we used Group-Lasso INTERaction NETwork (glinternet), an overlap group lasso designed to select a pairwise interaction effect β_ij along with its constituting main effects β_i and β_j. A main effect β_i refers to the contribution of an independent variable x_i toward the response (the log odds of 5-year mortality, log[p/(1 − p)], where p is the probability of 5-year mortality) while ignoring all other independent variables. An interaction effect β_ij arises when 2 independent variables, x_i and x_j, influence each other’s contribution toward the response. For example, although drug i and drug j are individually associated with an outcome by β_i and β_j, respectively, when used together, they may modify each other’s contribution toward the response such that the combined response (β_i + β_j + β_ij) is not simply the sum of their individual parts (β_i + β_j). An interaction effect is termed synergistically beneficial when the combined effect is more negatively associated with mortality than the most negative association of the individual drugs. An interaction effect is synergistically adverse when the combined effect is more positively associated with mortality than the maximum positive association of the individual drugs. An interaction may also be antagonistic when the combined effect is closer to null than either drug’s effect. Note that these terms describe the net association of the drug combination relative to that of the individual drugs, not the sign of the interaction coefficient.
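As an illustration of these definitions, the sketch below assumes the usual form of a logistic model with pairwise interactions, logit(p) = β_0 + Σ_i β_i x_i + Σ_{i<j} β_ij x_i x_j, and labels a pair from its 3 coefficients. The function name and the exact tie-breaking between categories are assumptions made for this example, not the authors' implementation.

```python
def classify_interaction(beta_i, beta_j, beta_ij):
    """Label a drug pair from its main effects and interaction coefficient.

    Coefficients are on the log-odds scale for 5-year mortality, so negative
    values mean lower mortality. The thresholds follow one reading of the
    definitions in the text (an assumption).
    """
    combined = beta_i + beta_j + beta_ij  # net association when both drugs are used
    if combined < 0 and combined < min(beta_i, beta_j):
        return "synergistically beneficial"
    if combined > 0 and combined > max(beta_i, beta_j):
        return "synergistically adverse"
    if abs(combined) < min(abs(beta_i), abs(beta_j)):
        return "antagonistic"
    return "unclassified"


# Hypothetical coefficients, chosen only to exercise each branch.
print(classify_interaction(-0.2, -0.1, -0.3))  # synergistically beneficial
print(classify_interaction(0.2, 0.1, 0.3))     # synergistically adverse
print(classify_interaction(-0.4, -0.3, 0.6))   # antagonistic
# A positive interaction coefficient can still give a beneficial net effect.
print(classify_interaction(-0.5, -0.5, 0.3))   # synergistically beneficial
```

The last call shows why the labels depend on the net combined association rather than on the sign of β_ij alone.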

Modeling implementation

Interactions involving categorical and continuous variables were handled differently in the glinternet R package (version 3.1.0). For each categorical variable (eg, tumor stage), all possible levels (0 to IV, unknown) and their pairwise interactions with another variable (eg, received zoledronate or not) were considered in a group lasso. Modeling parameters were set to select up to 300 interaction terms for computational tractability. We set aside a 10% hold-out set for model validation and a 10% tuning set for tuning the lasso penalty factor, λ. After obtaining the optimal λ from the 10% tuning set by 3-fold cross-validation, we trained the regression model on the full non–hold-out set. Finally, trained models were then validated on the 10% hold-out set. All reported performance measures (eg, sensitivity, specificity, area under curve) were from validating the models in the hold-out set. We generated 95% confidence intervals (CIs) for the beta coefficients of the regression model by bootstrapping 500 times, fitting a regression model to each bootstrapped sample. Bootstrap resampling was performed at the patient level instead of the exposure period level to account for within-patient correlated periods. In other words, patients were randomly sampled with replacement such that all their exposure periods were also sampled together. Each bootstrap sample also maintained the case-control ratio (at the patient level) and had approximately the same number of periods as the original training sample. This generated 500 different values for each beta coefficient, where the 2.5th and 97.5th percentiles were taken as the limits of the 95% CI.
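The patient-level bootstrap can be sketched as follows. This is a simplified illustration: `fit` stands in for refitting the regression model and returning one coefficient, the data structures are hypothetical, and the percentile lookup uses a simple index rule. The sketch preserves the case-control ratio at the patient level but does not separately enforce the approximately equal number of periods mentioned above.

```python
import random


def patient_bootstrap_ci(periods_by_patient, is_case, fit, n_boot=500, seed=0):
    """Percentile 95% CI from a patient-level (cluster) bootstrap.

    periods_by_patient: dict patient_id -> list of exposure-period rows
    is_case: dict patient_id -> True if the patient died within 5 years
    fit: callable taking a list of rows and returning one coefficient
    """
    rng = random.Random(seed)
    cases = [p for p in periods_by_patient if is_case[p]]
    controls = [p for p in periods_by_patient if not is_case[p]]
    estimates = []
    for _ in range(n_boot):
        # Resample cases and controls separately to keep the case-control ratio.
        sampled = rng.choices(cases, k=len(cases)) + rng.choices(controls, k=len(controls))
        # A sampled patient contributes all of their exposure periods together.
        rows = [row for p in sampled for row in periods_by_patient[p]]
        estimates.append(fit(rows))
    estimates.sort()
    lower = estimates[int(0.025 * (n_boot - 1))]
    upper = estimates[int(0.975 * (n_boot - 1))]
    return lower, upper


def mean_value(rows):
    # Placeholder for model fitting: here each row is just a number.
    return sum(rows) / len(rows)


toy_periods = {"p1": [0.2, 0.4], "p2": [0.1], "p3": [0.9, 0.7, 0.8], "p4": [0.3]}
toy_case = {"p1": False, "p2": False, "p3": True, "p4": False}
print(patient_bootstrap_ci(toy_periods, toy_case, mean_value))
```

Sampling patients rather than individual periods keeps correlated periods from the same patient together, which is the point of the design described in the text.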

Analysis of gene expression data

Molecular data sources

From ArrayExpress and Gene Expression Omnibus, 14 gene expression datasets from breast tissue of patients matched to healthy controls, or of tumor tissue matched to normal tissue within the same patient, were appropriately formatted for use in the analysis (Supplementary Table S2). Raw data were downloaded and normalized using Robust Multi-array Average (RMA) (R package Affy) after low-quality samples were removed by ArrayQualityMetrics. When raw data were unavailable, processed data were used instead. For microarray data, GeoDE was used to identify significant differentially expressed genes. For RNA-seq data, raw reads were downloaded and quality trimmed (trimGalore), and transcripts quantified (kallisto). Default settings were used for all packages.

Breast cancer association score of drug pairs from gene expression data

Differentially expressed genes were mapped to proteins using UniProt identifiers. Differentially expressed proteins in breast cancer, drugs linked by drug-protein interactions (DPIs; DrugBank.ca v4.0), and proteins linked by protein-protein interactions (PPIs; Dr PIAS) were integrated in a network (Figure 1B). Inclusion of PPI data in this network captures potentially relevant drug-protein relationships in which a drug’s direct interacting protein, or target, may not itself be differentially expressed, but may have altered activity in breast cancer (eg, drug interacting with a transcriptional regulator). Additionally, including PPI can improve the reproducibility of molecular models of cancer. Drugs were scored according to: (1) the number of proteins differentially expressed in breast cancer with which that drug’s targets interact, (2) the confidence and directionality of those interactions, and (3) the consistency of differential protein expression across individual breast cancer datasets. Higher scores indicate increased molecular association with breast cancer and potential therapeutic efficacy. After scoring drugs individually, scores were assigned to over 10 million possible drug pairs. Synergistically beneficial pairs were defined as those in which the union of the 2 drugs’ protein interactions resulted in a higher association score compared to the maximum of either drug’s individual score. Scores for all nonsynergistic pairs were set to 0.
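The pair-scoring rule in the last 2 sentences can be sketched in Python. The scoring function here is a placeholder (a sum of hypothetical per-protein weights); the study's actual score also reflects interaction confidence, directionality, and cross-dataset consistency, which are not modeled. Drug and protein names are invented for the example.

```python
from itertools import combinations


def association_score(proteins, weight):
    # Placeholder scoring: sum of hypothetical per-protein weights.
    return sum(weight.get(p, 0.0) for p in proteins)


def pair_synergy_scores(drug_proteins, weight):
    """Score every drug pair; pairs that do not beat the better single drug get 0."""
    scores = {}
    for a, b in combinations(sorted(drug_proteins), 2):
        union_score = association_score(drug_proteins[a] | drug_proteins[b], weight)
        best_single = max(
            association_score(drug_proteins[a], weight),
            association_score(drug_proteins[b], weight),
        )
        # Synergistic only if the union of interactions scores strictly higher.
        scores[(a, b)] = union_score if union_score > best_single else 0.0
    return scores


# Hypothetical drug-protein interactions and protein weights.
drug_proteins = {
    "drugA": {"P1", "P2"},
    "drugB": {"P2", "P3"},
    "drugC": {"P2"},
}
weight = {"P1": 1.0, "P2": 2.0, "P3": 0.5}
print(pair_synergy_scores(drug_proteins, weight))
```

Here only drugA with drugB is scored as synergistic, because drugC adds no protein interaction that its partners lack.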

Enrichment of EHR-derived drug pairs among gene expression–derived drug pairs

Using a method similar to gene set enrichment analysis (GSEA), we determined the enrichment of drug pairs coming from classes identified as synergistically interacting from the EHR among the highest scoring drug pairs identified using gene expression. First, all synergistic drug pairs identified using gene expression were ranked by their synergism score (Figure 1C, shaded area). Then, starting with the highest ranked gene expression–derived drug pair, a cumulative sum (Figure 1C, Enrichment score) was either increased (if the pair consisted of drugs present in EHR-derived synergistic class pairs) or decreased (if both drugs were not present in the EHR-derived synergistic drug classes). The value added to the cumulative sum was proportional to the drug pair’s breast cancer association score, while the value subtracted was dependent on the number of total drug pairs examined, such that the cumulative sum was normalized between −1 and 1. For drug pairs with tied synergism scores, the value computed for all tied pairs was added to or subtracted from the cumulative sum at the first drug pair in the tie. Subsequent pairs in the tie did not affect the cumulative sum. A raw enrichment score was derived based on the maximum deviation of the cumulative sum from 0. To determine statistical significance, we obtained a median baseline from 10 000 bootstrap samples of random drug pairs. A normalized enrichment score (NES) ratio (ie, raw enrichment score divided by median baseline) greater than 1 with low P value indicates significant enrichment.
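The running-sum statistic can be sketched along the lines of the standard weighted GSEA running sum, which matches the description above: hits add an amount proportional to their score, misses subtract a constant that depends on the number of pairs, and the sum stays between −1 and 1. This sketch is an assumption about the general form only; it omits the special handling of tied scores, and all names and values are illustrative.

```python
def enrichment_score(ranked_pairs, scores, hit_set):
    """Maximum deviation from 0 of a GSEA-like running sum.

    ranked_pairs: drug pairs sorted by descending gene-expression score
    scores: dict pair -> gene-expression association score
    hit_set: pairs whose drugs fall in the EHR-derived synergistic classes
    """
    hit_total = sum(scores[p] for p in ranked_pairs if p in hit_set)
    n_miss = sum(1 for p in ranked_pairs if p not in hit_set)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("need at least one scored hit and one miss")
    running = 0.0
    best = 0.0
    for pair in ranked_pairs:
        if pair in hit_set:
            running += scores[pair] / hit_total  # step up, weighted by score
        else:
            running -= 1.0 / n_miss  # constant step down
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list of 4 pairs, labeled by letter for brevity.
scores = {"a": 4.0, "b": 3.0, "c": 2.0, "d": 1.0}
ranked = sorted(scores, key=scores.get, reverse=True)
print(enrichment_score(ranked, scores, {"a", "b"}))  # hits at the top: close to 1
print(enrichment_score(ranked, scores, {"c", "d"}))  # hits at the bottom: close to -1
```

A raw score near 1 means the EHR-derived pairs are concentrated among the highest-scoring gene expression pairs.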

RESULTS AND DISCUSSION

Study cohort

To discover synergistic drug combinations from EHRs, we used a final study cohort (Table 1 and Figure 1A) consisting of 9945 patients who either died within 5 years starting from the index date of breast cancer diagnosis (1212 cases) or were followed for at least 5 years (8733 controls).
Table 1. Patients who died within (cases) or survived (controls) 5 years of breast cancer diagnosis

| Patient characteristic | Cases/Dead (n = 1212): N or mean | % or SD | Controls/Alive (n = 8733): N or mean | % or SD | Total (n = 9945): N or mean | % or SD |
|---|---|---|---|---|---|---|
| Age (a) | | | | | | |
| <40 | 121 | 10% | 787 | 9% | 908 | 9% |
| 40–49 | 221 | 18% | 2403 | 28% | 2624 | 26% |
| 50–59 | 251 | 21% | 2490 | 29% | 2741 | 28% |
| 60–69 | 219 | 18% | 1794 | 21% | 2013 | 20% |
| ≥70 | 400 | 33% | 1259 | 14% | 1659 | 17% |
| Year of diagnosis | | | | | | |
| 2000–2003 | 334 | 28% | 2988 | 34% | 3322 | 33% |
| 2004–2006 | 366 | 30% | 3222 | 37% | 3588 | 36% |
| 2007–2009 | 402 | 33% | 2523 | 29% | 2925 | 29% |
| 2010–2011 | 110 | 9% | 0 | 0% | 110 | 1% |
| Race | | | | | | |
| White/unknown | 997 | 82% | 7109 | 81% | 8106 | 82% |
| Black | 62 | 5% | 192 | 2% | 254 | 3% |
| Asian/Pacific islander | 152 | 13% | 1423 | 16% | 1575 | 16% |
| Native American | <10 | 0.1% | <10 | 0.1% | <10 | 0.1% |
| Married (a) | 661 | 55% | 5813 | 67% | 6474 | 65% |
| Socioeconomic status (a) | | | | | | |
| Lowest 20% | 74 | 6% | 266 | 3% | 340 | 3% |
| 21st–40th percentile | 142 | 12% | 607 | 7% | 749 | 8% |
| 41st–60th percentile | 174 | 14% | 975 | 11% | 1149 | 12% |
| 61st–80th percentile | 245 | 20% | 1739 | 20% | 1984 | 20% |
| Top 20% | 577 | 48% | 5146 | 59% | 5723 | 58% |
| Hormone receptor subtype | | | | | | |
| ER+ only | 98 | 8% | 612 | 7% | 710 | 7% |
| ER+/PR+ and HER2+ | 115 | 9% | 819 | 9% | 934 | 9% |
| HER2+ only | 101 | 8% | 362 | 4% | 463 | 5% |
| PR+ only | 420 | 35% | 4406 | 50% | 4826 | 49% |
| TNBC | 264 | 22% | 589 | 7% | 853 | 9% |
| Unknown | 214 | 18% | 1945 | 22% | 2159 | 22% |
| Stage (a) | | | | | | |
| Stage 0 | 49 | 4% | 1736 | 20% | 1785 | 18% |
| Stage I | 219 | 18% | 3260 | 37% | 3479 | 35% |
| Stage II | 351 | 29% | 2576 | 29% | 2927 | 29% |
| Stage III | 262 | 22% | 548 | 6% | 810 | 8% |
| Stage IV | 252 | 21% | 89 | 1% | 341 | 3% |
| Unknown | 79 | 7% | 524 | 6% | 603 | 6% |
| Grade (a) | | | | | | |
| Grade I | 101 | 8% | 1714 | 20% | 1815 | 18% |
| Grade II | 321 | 26% | 3451 | 40% | 3772 | 38% |
| Grade III | 527 | 43% | 2031 | 23% | 2558 | 26% |
| Grade IV | 43 | 4% | 501 | 6% | 544 | 5% |
| Unknown | 220 | 18% | 1036 | 12% | 1256 | 13% |
| Ductal tumor (a) | 1033 | 85% | 7459 | 85% | 8492 | 85% |
| Behavior of tumor (a) | | | | | | |
| In situ | 62 | 5% | 2058 | 24% | 2120 | 21% |
| Malignant | 1150 | 95% | 6675 | 76% | 7825 | 79% |
| Bilateral (a) | 1164 | 96% | 8586 | 98% | 9750 | 98% |
| Lymph vascular invasion (a) | 35 | 3% | <10 | 0.1% | 41 | 0.4% |
| Comorbidities (a) | | | | | | |
| Myocardial infarction | <10 | 0.7% | 17 | 0.2% | 26 | 0.3% |
| Congestive heart failure | 15 | 1.2% | 11 | 0.1% | 26 | 0.3% |
| Peripheral vascular disease | 26 | 2% | 28 | 0.3% | 54 | 0.5% |
| Cerebrovascular disease | 34 | 3% | 66 | 0.8% | 100 | 1% |
| Dementia | <10 | 0.1% | <10 | 0.01% | <10 | 0.02% |
| Chronic obstructive pulmonary disease | 74 | 6% | 215 | 2% | 289 | 3% |
| Rheumatic disorders | <10 | 0.6% | 15 | 0.2% | 22 | 0.2% |
| Peptic ulcer disease | <10 | 0.0% | <10 | 0.01% | <10 | 0.01% |
| Liver, mild | <10 | 0.7% | <10 | 0.08% | 16 | 0.2% |
| Liver, severe | <10 | 0.5% | <10 | 0.02% | <10 | 0.08% |
| Diabetes (uncomplicated) | 25 | 2% | 44 | 0.5% | 69 | 0.7% |
| Diabetes (complicated) | <10 | 0.7% | <10 | 0.09% | 17 | 0.2% |
| Plegia | <10 | 0.0% | <10 | 0.03% | <10 | 0.03% |
| Renal disease | 17 | 1.4% | <10 | 0.1% | 26 | 0.3% |
| Malignancy | 286 | 24% | 1584 | 18% | 1870 | 19% |
| Metastasis | 61 | 5% | 57 | 1% | 118 | 1% |
| HIV | <10 | 0.4% | 10 | 0.1% | 15 | 0.2% |
| Charlson Comorbidity Score (a) | 2.4 | 2.6 | 1.1 | 1.4 | 1.2 | 1.7 |

(a) At time of diagnosis. ER: estrogen receptor; PR: progesterone receptor; HER2: human epidermal growth factor receptor 2; TNBC: triple-negative breast cancer. Small values are stated as “<10” for privacy purposes, in accordance with California Cancer Registry guidelines.

Main factors associated with survival from EHR

Comparing cases against controls, our logistic models, using 5-year mortality as the binary response, achieved satisfactory classification performance (90% area under the curve, 40% sensitivity, 99% specificity) on a 10% hold-out validation set. Consistent with well-established breast cancer prognostic factors, the main factors associated with lower mortality identified in our model (Figure 2 and Supplementary Table S3) are lower stage and living in a neighborhood or census block in the top 20% of socioeconomic status in California. In contrast, factors such as advanced stage, older age at diagnosis, and the triple-negative breast cancer subtype were associated with higher mortality.
Figure 2.

Odds ratios of factors (excluding pairwise interactions) most associated with 5-year mortality (see also Supplementary Table S3).

Synergistic interactions from EHRs

Variables that consistently formed synergistic interactions associated with lower mortality (nodes with mostly blue edges, Figure 3) coincided with the main effects associated with lower mortality described above (eg, being diagnosed at a lower stage, having a higher socioeconomic status). In contrast, variables that consistently formed synergistic interactions associated with higher mortality (nodes with mostly red edges, Figure 3) coincided with the main effects associated with higher mortality (eg, older age at diagnosis, advanced stage). In addition to patient and tumor characteristics that synergistically influence mortality, we identified drug pairs that are synergistically associated with lower mortality (Table 2 and Figure 3, bold blue edges).
Figure 3.

Variables (nodes) that synergistically interact such that they are associated with lower mortality (blue edges) or higher mortality (red edges, also see Table 2). Variable nodes that tend to have synergistically beneficial interactions (blue edges) also tend to be factors associated with lower mortality (eg, Stage I), while those with synergistically risky interactions (red) tend to be risk factors on their own (eg, Stage IV). Nodes are grouped together (eg, by categorical level, ATC class) to facilitate visual comparison within a group (eg, Stages I and II have many synergistically beneficial interactions while Stages III and IV have many synergistically risky interactions). Case studies described in the Discussion section are highlighted with thicker edges.

Table 2. Synergistic drug pairs discovered

Overall:
Nasal_preparations + lactate
hormone antagonists and related agents + vitamins
anti-inflammatory and antirheumatic products + lipid_modifying_agents
drugs_for_obstructive_airway_diseases + lipid_modifying_agents
hormone_antagonists_and_related_agents + anti-inflammatory_and_anti-rheumatic_products

ER or PR without HER2 expression:
Tretinoin + epinephrine
ondansetron + pantoprazole
tazobactam + lansoprazole
lidocaine + atropine
hydrocodone + ondansetron
anti-estrogens + ondansetron
aromatase_inhibitors + granisetron
mupirocin + ergocalciferol
naloxone + heparin
glucose + aspirin
meperidine + glucose
hydrocodone + glucose
anti-metabolites + glucose
cephalexin + hydrochlorothiazide
fentanyl + hydrochlorothiazide
nitrofurantoin + lisinopril
celecoxib + losartan
tretinoin + clobetasol
meperidine + dexamethasone
fentanyl + dexamethasone
tretinoin + phenazopyridine
letrozole + amoxicillin
hydrocodone + amoxicillin
anti-metabolites + cephalexin
naloxone + cefazolin
celecoxib + tretinoin
glycopyrrolate + tretinoin
neostigmine + propofol
hydrocodone + bupivacaine
lidocaine + bupivacaine
naloxone + fentanyl
iso_sulfan_blue + fentanyl
acetic_acid_derivatives_and_related_substances + fentanyl
ciprofloxacin + guaifenesin
naloxone + hydrocodone
nitrogen_mustard_analogues + hydrocodone
lactate + simethicone
docusate + simethicone

HER2 expression:
Heparin + famotidine
rocuronium + aprepitant
acetaminophen + prednisone
venlafaxine + atorvastatin
morphine + promethazine
trazodone + promethazine
hydrocodone + promethazine
cefazolin + dexamethasone
immunostimulants + dexamethasone
colony_stimulating_factors + dexamethasone
escitalopram + clindamycin
venlafaxine + clindamycin
propofol + estradiol
venlafaxine + estradiol
rocuronium + estradiol
propofol + phenazopyridine
bupivacaine + phenazopyridine
escitalopram + phenazopyridine
mometasone + phenazopyridine
neostigmine + phenazopyridine
rocuronium + phenazopyridine
acetaminophen + doxorubicin
escitalopram + propofol
venlafaxine + propofol
mometasone + propofol
escitalopram + bupivacaine
venlafaxine + bupivacaine
olopatadine + bupivacaine
mometasone + bupivacaine
desonide + bupivacaine
neostigmine + bupivacaine
anti-estrogens + acetaminophen
venlafaxine + escitalopram
neostigmine + escitalopram
olopatadine + venlafaxine
mometasone + venlafaxine
desonide + venlafaxine
neostigmine + venlafaxine
rocuronium + olopatadine
neostigmine + mometasone
neostigmine + desonide

TNBC:
Fentanyl + metoclopramide
metronidazole + hydrochlorothiazide
naproxen + simvastatin
valacyclovir + simvastatin
venlafaxine + simvastatin
fluticasone + simvastatin
rocuronium + simvastatin
doxorubicin + dexamethasone
estradiol + naproxen
valacyclovir + naproxen
fluticasone + naproxen
fluticasone + estradiol
anti-inflammatory_and_anti-rheumatic_products + sulfamethoxazole
fluticasone + azithromycin
venlafaxine + valacyclovir
mometasone + valacyclovir
rocuronium + valacyclovir
hydrocodone + acetaminophen
fluticasone + venlafaxine
thyroxine + mometasone
rocuronium + mometasone
rocuronium + fluticasone
aromatase_inhibitors + rocuronium

Subgroup analysis by molecular subtype

We analyzed synergistic variable interactions in patients stratified by molecular subtype, given their varied prognoses and drug utilization (Table 2 and Supplementary Figure S1). In the estrogen receptor or progesterone receptor positive group, our model identified synergistic pairs of antiestrogens or aromatase inhibitors with antiemetics (eg, ondansetron, granisetron), possibly due to the increased tolerance afforded by the antiemetics. Among human epidermal growth factor receptor 2–positive patients, who often have worse prognoses than other hormone-sensitive subtypes, several synergistic pairs included phenazopyridine, which might have been prescribed to relieve urethral discomfort from aggressive estrogen suppression. Rediscovering such coprescription patterns known to alleviate side effects suggests that our approach can uncover beneficial combinations. Note that while these combinations are associated with reduced mortality, causality cannot be determined. Several synergistic interactions were replicated in the molecular subtypes and the overall cohort. Lipid modifiers (C10, including statins, eg, simvastatin) paired with either anti-inflammatory agents (M01, which includes nonsteroidal anti-inflammatory drugs [NSAIDs], eg, naproxen) or drugs for obstructive airways (R03, eg, fluticasone) were associated with reduced mortality in both the overall cohort and the TNBC subtype group.

Synergistic drug pairs from gene expression data

In an orthogonal approach, we identified 8966 differentially expressed proteins from breast cancer gene expression data. These proteins were then associated with 7686 drugs via a DPI database (DrugBank). These data were then used to construct a molecular network. From this network, a synergistic breast cancer association score was calculated for all possible pairs of drugs in the DPI data; these were then ranked in descending order (see the shaded histograms scaled to the right axis in Figures 4A and B).
Figure 4.

Enrichment analysis of EHR-based synergistic drug class pairs (A) anti-inflammatories/antirheumatics with lipid modifiers, (B) anti-inflammatories/antirheumatics with hormone antagonists, and (C) lipid modifiers and drugs for obstructed airways among gene expression–based synergistic drug pairs. All possible pairs of drugs from DrugBank v. 4.0 were scored on their association with genes differentially expressed in breast cancer (shaded area). A GSEA-based analysis was then performed to score the enrichment of pairs of drugs derived from the respective EHR-based classes (derived drug pairs represented by black vertical lines, running enrichment represented by red bold line) and compared to a randomly sampled null distribution (10 000 iterations) to assess significance and fold enrichment.

Next we determined whether this gene expression approach identified the same synergistic drug classes as the EHR (Table 2). To do so, we used a GSEA-based enrichment method to quantify the enrichment of EHR synergistic classes among gene expression synergistic pairs. Figure 4A shows that the 528 drug pairs derived from one pair of synergistic classes identified using EHR (anti-inflammatories/antirheumatics paired with lipid modifiers, black bars) tended to also be high-scoring gene expression–based drug pairs (shaded histograms with scale on the right axis). Specifically, drug pairs derived from these EHR-based classes were 4.4 times more enriched among high-scoring gene expression–based pairs compared to 10 000 randomly selected sets of 528 drug pairs (P < 0.0001). In Figure 4B, anti-inflammatories/antirheumatics paired with hormone antagonists also received high gene expression–based scores, driving a slight enrichment (about 1.1-fold over random sets of 396 drug pairs, P = 0.164).
Finally, although many drug pairs derived from the synergistic EHR classes of lipid modifiers with drugs for obstructive airways also scored high based on gene expression, a large number of drug pairs corresponding to these classes were not synergistic based on gene expression, resulting in no enrichment (NES < 1, Figure 4C). Therefore, for 2 out of 3 EHR-based synergistic drug class pairs, NES > 1 suggests that these pairs also tended to be high scoring based on breast cancer gene expression association. Furthermore, the molecular networks comprising gene expression, drug-protein, and protein-protein interactions used to derive gene expression–based scores provide mechanistic hypotheses for the observed synergism of the EHR-based pairs. These pairs are also supported by preclinical and epidemiologic studies. The third drug class pair discovered using EHR (lipid modifiers and drugs for obstructive airways) was not enriched among gene expression–based pairs. As this study focuses on synergistically beneficial interactions, we will only discuss in detail the 2 synergistically beneficial drug class pairs uncovered by both EHR and gene expression data.
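The enrichment test described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the weighting of hits by absolute score, the normalization by the mean positive null score, and the add-one P value are all assumptions borrowed from common GSEA practice. It assumes nonzero scores and a set smaller than the full list of pairs.

```python
import random

def enrichment_score(scores, in_set):
    """GSEA-style running-sum statistic (assumed form, not the paper's code).

    scores: list of pair scores, higher = stronger association.
    in_set: set of indices of the drug pairs of interest.
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hit_total = sum(abs(scores[i]) for i in in_set)  # assumed nonzero
    miss_total = len(scores) - len(in_set)           # assumed nonzero
    running = best = 0.0
    for i in order:
        if i in in_set:
            running += abs(scores[i]) / hit_total    # step up on a hit
        else:
            running -= 1.0 / miss_total              # step down on a miss
        if abs(running) > abs(best):
            best = running
    return best

def enrichment_vs_random(scores, in_set, n_iter=10000, seed=0):
    """Compare the observed score with random index sets of the same size."""
    rng = random.Random(seed)
    in_set = set(in_set)
    observed = enrichment_score(scores, in_set)
    null = [
        enrichment_score(scores, set(rng.sample(range(len(scores)), len(in_set))))
        for _ in range(n_iter)
    ]
    # Add-one empirical P value (a convention assumed here).
    p_value = (1 + sum(es >= observed for es in null)) / (n_iter + 1)
    # Normalize by the mean of the positive null scores (assumed definition).
    positive = [es for es in null if es > 0]
    nes = observed / (sum(positive) / len(positive)) if positive else float("nan")
    return observed, nes, p_value
```

Calling `enrichment_vs_random(scores, indices_of_class_pairs)` returns the observed score, a normalized score, and an empirical P value. Under these assumptions, a normalized score above 1 corresponds to the NES > 1 reading used in the text.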

Synergistically beneficial pair 1: anti-inflammatory agents and lipid modifiers

Drugs belonging to the first pair of synergistic drug classes identified using EHR data (anti-inflammatory agents, especially NSAIDs, paired with lipid modifiers, especially statins) have been proposed as a general regimen for chemoprevention. While no benefit specifically against breast cancer has been reported for this combination, there is a growing body of epidemiological evidence supporting a synergistic anticancer benefit of NSAIDs with statins, especially against colorectal and prostate cancer. In addition, preclinical studies suggest plausible anticancer mechanisms of these drugs individually, with NSAIDs functioning as aromatase inhibitors and statins inhibiting breast cancer cell growth and proliferation. It has been suggested that, in combination, NSAIDs and statins inhibit cell growth and promote apoptosis, possibly by inducing the tumor suppressor RhoB and inhibiting the Akt pathway, key targets in tumorigenesis. Using our breast cancer network model, we identified frequent protein interactions with this pair of drug classes that corroborate this epidemiological and preclinical evidence (Supplementary Table S4). Both transcription factor AP-1 (which interacts with several anti-inflammatories/antirheumatics) and CCAAT/enhancer-binding protein beta (which interacts with several lipid modifiers) influence breast cancer cell senescence and apoptosis. A drug combination that targets these proteins simultaneously may therefore elicit stronger effects on cell death or proliferation. Drug synergism could also be achieved when one drug influences the efficacy of a second drug. For example, expression levels of insulin receptor substrate 1 (which interacts with several anti-inflammatory/antirheumatic drugs) can predict patient responses to chemotherapeutic or hormonal breast cancer therapies. Inhibiting AP-1 also potentiates hormonal therapies.
These associations suggest that proteins targeted by anti-inflammatory agents may participate in synergistic combinations with hormone antagonists (see below) and possibly other anticancer therapies.

Synergistically protective pair 2: anti-inflammatory agents and anticancer hormone antagonists

While anti-inflammatory drugs are known to help patients better tolerate hormone therapy’s undesirable side effects until endocrine responsiveness is elicited, there is evidence in the literature suggesting joint anticancer action between anti-inflammatory agents and hormone antagonists. Many anti-inflammatories inhibit cyclooxygenase-2, which in turn inhibits aromatase that is otherwise required for estrogen production. By combining a cyclooxygenase-2 inhibitor (NSAID or coxib) with a hormone antagonist like an aromatase inhibitor, synergistic regulation of hormone production may halt or slow mammary tumorigenesis. Clinical trials have shown that the combination of celecoxib and exemestane is slightly better than or equivalent to exemestane monotherapy. Benefits include longer periods of stable disease (tumor shrinkage or no new lesions) and reduced tumor expression of proliferation-associated genes. However, the increased cardiovascular risk associated with celecoxib has raised concerns about its risk-benefit ratio. Although synergistic pairs of anti-inflammatory/antirheumatic agents and hormone antagonists were only slightly enriched among all synergistic pairs identified based on gene expression, analysis of proteins that frequently interact with drugs in these classes suggests possible molecular mechanisms to explain their observed synergy (Supplementary Table S5). For example, genetic knock-down of caveolin-1, which interacts with several anti-inflammatory drugs, renders breast tumors hypersensitive to estrogen. Simultaneous inhibition of caveolin-1 may therefore enhance the efficacy of antiestrogen therapies. Another protein linked to multiple anti-inflammatory drugs, tristetraprolin, interacts with progesterone, estrogen, and androgen receptors. Reducing protein levels of tristetraprolin in breast cancer cell lines augments hormonal effects on cell growth and proliferation, possibly rendering cells more sensitive to hormone antagonist therapies.
These molecular links could support the predicted synergistic efficacy of anti-inflammatory/antirheumatic drugs and hormone antagonists in breast cancer treatment.

Study limitations

We acknowledge that the effect sizes of the synergism discussed above are small (beta coefficients of interaction terms: 0.004–0.05) despite statistical significance (P < 0.05) in EHR and enrichment in molecular data and supporting evidence from the literature. Some of the drug pairs discovered (Figure 3 and Table 2) may represent intentional concurrent usage rather than actual mechanistic synergism. Disambiguating between the 2 is challenging when using observational data, as the intent is not stated. For example, deliberate concurrent use may be in order to overcome resistance to a single therapy or to relieve side effects (eg, venlafaxine to improve tolerance to aromatase inhibitors) or to treat coincidental conditions (eg, venlafaxine for psychological distress, prevalent in 30–50% of breast cancer patients). Nevertheless, one key advantage of our approach of formulating pairwise interaction effects is the simultaneous discovery of multiple interaction effects. This generates multiple hypotheses for in-depth evaluation by drug screening in cell lines and animal models, as well as by subsequent observational studies and clinical trials. Our approach also discovered synergistically adverse drug pairs (ie, adverse drug-drug interactions, Figure 3), which we did not discuss in detail, given our focus here on the synergistically beneficial ones that could potentially be repurposed. One disadvantage is the risk of false discoveries, especially when correlated pairs could be falsely detected as interaction effects. To minimize false discoveries, we bootstrapped samples and reran our models 500 times to empirically generate 95% CI, in an attempt to address the variance associated with the beta estimates but not necessarily the bias inherent in penalized regressions such as glinternet. While there are sophisticated bootstrapping procedures designed to reduce the bias, estimating the CIs for penalized regression remains an active area of research. 
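The bootstrap step described above can be sketched as follows. This is a simplified stand-in, not the glinternet procedure used in the study: it assumes two binary drug exposures and a binary outcome, and it estimates the interaction coefficient of an unpenalized, saturated logistic model directly from cell counts. The function names and the 0.5 continuity correction are illustrative assumptions.

```python
import math
import random

def interaction_log_odds(rows):
    """Interaction coefficient b12 of the saturated logistic model
    logit P(died) = b0 + b1*a + b2*b + b12*a*b, computed from counts as
    log(odds11 * odds00 / (odds10 * odds01)).

    rows: iterable of (drug_a, drug_b, died) tuples with 0/1 entries.
    Every cell starts at 0.5 (Haldane-Anscombe correction, an assumption
    made here) so an empty cell in a resample cannot divide by zero.
    """
    counts = {(a, b, y): 0.5 for a in (0, 1) for b in (0, 1) for y in (0, 1)}
    for a, b, y in rows:
        counts[(a, b, y)] += 1

    def log_odds(a, b):
        return math.log(counts[(a, b, 1)] / counts[(a, b, 0)])

    return log_odds(1, 1) + log_odds(0, 0) - log_odds(1, 0) - log_odds(0, 1)

def bootstrap_ci(rows, stat, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample patients with replacement,
    recompute the statistic, and read off the empirical quantiles."""
    rng = random.Random(seed)
    rows = list(rows)
    n = len(rows)
    estimates = sorted(
        stat([rows[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

A negative interaction estimate whose interval excludes zero would correspond to lower mortality odds under joint use than the two main effects predict. As the text notes, resampling of this kind addresses the variance of the estimate, not the bias introduced by a penalized fit.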
The purported breast cancer benefit of metformin could also be used as a positive control for testing our method on monotherapy drugs. Metformin, on its own, was not associated with lower mortality. We did, however, find a borderline benefit (hazard ratio: 0.86 [0.52–1.00]) in a separate lasso Cox regression survival analysis without pairwise interactions (Supplementary Table S6). While a Cox regression model with the same overlap group lasso (as used in glinternet here) was an attractive alternative, the current implementation of glinternet supports only logistic regression. A survival analysis setup with time-varying exposures could also account for the temporal sequence of the drug exposures, which was not considered here. However, Cox regression models are prone to time-dependent biases (eg, immortal time bias), and some studies have indeed questioned whether metformin’s benefit could have arisen from such biases. Patients might also have received care, including the drugs studied, outside of Stanford Hospital and PAMF. Such instances and other supporting information, undocumented in our data source, may result in unmeasured confounding, a known limitation of EHR-based studies. Nevertheless, we tried to obtain the most comprehensive clinical details possible in our choice of Oncoshare, which links EHRs from Stanford University and the neighboring PAMF community health service with the California Surveillance, Epidemiology, and End Results Program registry and other supporting services such as Oncotype DX. Another limitation is the use of predefined drug classes, which may be overly broad and heterogeneous (eg, drugs for obstructive airways, R03). We repeated the analysis (Supplementary Table S7), aggregating drugs to ATC drug classes of various granularities, and pair 1 between anti-inflammatory agents and, in particular, aromatase inhibitors was replicated.
On the other hand, overly specific subclasses containing rarely used drugs may also pose problems, as we did not always observe synergism among the more granular subclasses (Supplementary Table S7). Our molecular analysis was limited primarily by gene expression and drug data availability. In terms of gene expression, only 14 datasets were of sufficient quality and contained appropriate case and control samples for differential gene expression analysis. This precluded us from performing separate meta-analyses of breast cancer molecular subtypes. Information on drugs is similarly limited to those with reported protein interactions, which may be additionally restricted to anticipated interactions based on a drug’s class and approved indications. For example, many specific pairs of EHR-based synergistic drugs lack reported protein interaction information in DrugBank, but have protein interaction information in DrugBank at the drug class level. Although DrugBank is one of several widely used sources of drug information, alternate sources could have been explored. Similarly, protein-protein interactions may not be fully documented or validated, or may vary in their biological relevance (eg, some interactions were discovered in yeast 2-hybrid assays that are less relevant to breast cancer pathology). Despite these limitations, gene expression–based ranking of synergistic drug pairs provides an alternative data source to validate pairs discovered from EHRs. The consistency of the results from multiple data sources and analysis methods should increase the robustness of our findings.

Conclusions and Implications

This is a proof-of-concept study demonstrating that searching for statistical interactions can discover drug pairs that moderate each other’s effects. Such an approach has also been used to discover epistatic interactions among genes. Much of the published literature on drug interactions has focused on adverse drug-drug interactions instead of potentially beneficial interactions for drug repurposing. Here, we report 3 synergistically therapeutic pairs of drug classes associated with lower 5-year mortality in patients with breast cancer. Of the 3 synergistically protective pairs, 2 were supported by analysis of gene expression data of breast cancer patients, biological plausibility, preclinical models, and epidemiologic evidence in the literature. The glinternet analysis of EHRs we presented is scalable to drug combinations of 2 or more. As demonstrated, coupling with orthogonal analysis of gene expression data can corroborate the EHR-based findings and reveal protein interactions that may relate to the mechanism driving drug synergism. This study further demonstrates the translational potential of existing data sources such as real-world patient EHRs and gene expression databases. The multidrug combinations uncovered can be computationally prioritized to help direct preclinical research and, if promising, undergo clinical trial validation, repurposing, and optimization of existing drugs for maximum therapeutic benefit.

Funding

This work was supported in part by National Library of Medicine grant R01 LM011369 (to NS), National Institute of General Medical Sciences grant R01 GM101430 (to NS), National Science Foundation grant DMS-1407548 (to TH), and National Institutes of Health grant R01-EB001988-15 (to TH). This work was also supported by the Breast Cancer Research Foundation (to AK, GS, NS); the Susan and Richard Levy Gift Fund (to AK); the Suzanne Pride Bryan Fund for Breast Cancer Research (to AK); the Regents of the University of California’s California Breast Cancer Research Program (16OB-0149 and 19IB-0124) (to AK); and the Stanford University Developmental Research Fund (to AK). The project was supported by National Institutes of Health Clinical and Translational Science award number UL1 RR025744. The collection of cancer incidence data used in this study was supported by the California Department of Health Services as part of the statewide cancer reporting program mandated by California Health and Safety Code Section 103885; the National Cancer Institute’s Surveillance, Epidemiology, and End Results Program under contracts HHSN261201000140C (to the Cancer Prevention Institute of California), HHSN261201000035C (to the University of Southern California), and HHSN261201000034C (to the Public Health Institute); and the Centers for Disease Control and Prevention’s National Program of Cancer Registries, under agreement 1U58 DP000807-01 (to the Public Health Institute). The ideas and opinions expressed herein are those of the authors, and endorsement by the University or State of California, the California Department of Health Services, the National Cancer Institute, or the Centers for Disease Control and Prevention or their contractors and subcontractors is not intended nor should be inferred. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests

The authors have declared that no competing interests exist.

Contributions

Conceived and designed the experiments: YL, NS, AK, GS. Analyzed the data: YL, AD, WC. Contributed data: TS, PK, SG, SW, CT, PY, AD. Contributed tools: MD, ML, TH, MM, AD, AR, CF. Wrote the paper: YL, NS, AK, AD, ES, MD.

Data Sharing

Availability of patient data to researchers is subject to the Institutional Data Access/Ethics Committees of Stanford University, Palo Alto Medical Foundation, and the California Cancer Registry. Supplementary materials include the following: drugs and their ATC drug classes considered for analysis (Supplementary Table S1), gene expression datasets (Supplementary Table S2), main effects and interaction effects significantly associated with 5-year mortality and their odds ratios (Supplementary Table S3), breast cancer–related proteins that most frequently interact with lipid/NSAID pairs (Supplementary Table S4), breast cancer–related proteins that most frequently interact with NSAID/hormone antagonist pairs (Supplementary Table S5), results of Cox regression survival analysis (Supplementary Table S6), results of alternative analysis with more granular ATC drug classes (Supplementary Table S7), and drug utilization profiles by molecular subtypes (Supplementary Figure S1).

SUPPLEMENTARY MATERIAL

Supplementary material is available at Journal of the American Medical Informatics Association online.