Literature DB >> 29263829

Mining the transcriptome for rare disease therapies: a comparison of the efficiencies of two data mining approaches and a targeted cell-based drug screen.

A J Mears1, S C Schock1, J Hadwen1,2, S Putos1, D Dyment1,3, K M Boycott1,3, Alex MacKenzie1,2.   

Abstract

Most monogenic diseases can be viewed as conditions caused by dysregulated protein activity; therefore, drugs can be used to modulate gene expression, and thus protein level, possibly conferring clinical benefit. When considering repurposing drugs for loss of function diseases, there are three classes of genetic disease amenable to an increase of function; haploinsufficient dominant diseases, those secondary to hypomorphic recessive alleles, and conditions with rescuing paralogs. This therapeutic model then brings the questions: how frequently do such clinically useful drug-gene interactions occur and what is the most rapid and efficient route by which to identify them. Here we compare three approaches: (1) mining of pre-existing system-wide transcriptomal datasets such as Connectivity Map; (2) utilization of a proprietary causal reasoning engine knowledge base; and, (3) a targeted drug screen using clinically accepted agents tested against normal human fibroblasts. We have determined the validation rate of these approaches for 76 diseases (i.e., in vitro fibroblast mRNA increase); for the Connectivity Map, approximately 5% of tested putative drug-gene interactions validated, for causal reasoning engine knowledge base the rate was 10%, and for the targeted drug screen 9%. The degree of overlap between these methodologies was low suggesting they are complementary not redundant approaches to identify putative drug-gene interactions. Although the validation rate was low, a number of drug-gene interactions were successfully identified and are now being investigated for protein induction and in vivo effect. This analysis establishes potentially valuable therapeutic leads as well as useful benchmarks for the thousands of currently untreatable rare genetic conditions.

Entities:  

Year:  2017        PMID: 29263829      PMCID: PMC5677978          DOI: 10.1038/s41525-017-0018-3

Source DB:  PubMed          Journal:  NPJ Genom Med        ISSN: 2056-7944            Impact factor:   8.617


Introduction

The estimated 7000 monogenic diseases although individually rare are major contributors to human morbidity and mortality collectively affecting approximately 2% of the global population.[1] For example, rare diseases account for nearly twice the aggregate number of years that lives are shortened by diabetes and almost four times those due to infections.[2] Rare genetic diseases thus represent a dramatic unmet diagnostic and therapeutic need. Roughly half of the 7000 rare monogenic diseases have been genetically characterized.[1, 3] The remaining diseases are being solved largely due to the work of national and international consortia such as Care4Rare Canada (http://care4rare.ca/) and the Centers for Mendelian Genomics (3) with collaborative platforms such as the Matchmaker Exchange (http://www.matchmakerexchange.org/.[4] However, the development of rare disease treatment lags far behind the rate of rare disease diagnosis; approximately 500 therapies have been approved for rare diseases (Europe and USA combined[5]). Moreover, the rate of drug development for rare diseases is slow, due to factors such as extreme disease rarity and obscure disease pathogenesis. One alternative to the costly and time-intensive drug discovery process is to repurpose clinically approved compounds for treatment of rare diseases.[6] Often, rare monogenic diseases can be viewed as dosage problems caused by supra-physiologic or infra-physiologic levels of functional gene-product, usually altering protein activity. Under this model, moderating the dosage problem by modulating mRNA and thus protein levels is a potential strategy to repurpose drugs for rare monogenic diseases. This may involve the upregulation of mutated recessive disease genes encoding proteins with residual enzymatic activity (so called hypomorphic alleles), of genes that functionally recapitulate mutated recessive disease genes (rescuing paralog[7]) or of genes that cause disease when haploinsufficient. Conversely, pathogenically increased gene dosage (e.g., gain of function dominant mutations, gene duplication) may be countered by gene downregulation. The gene-dosage therapeutic model relies on the identification of therapeutically relevant drug-gene interactions in which a clinically approved drug modulates mRNA and protein levels. Several approaches to identify such drug-gene interactions exist, yet assessment of and comparison between these techniques have not been conducted. Extensive pharmacologic transcriptome datasets (nearly 4000 representing approximately 2 million samples) are accessible through the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/ [8]). One of the largest sets of data is provided by Connectivity Map (CMAP) which generated microarray transcriptome profiles for 3 different cancer cell lines (leukemia (HL60), breast (MCF7), and prostate (PC3)) treated with over 1200 drugs.[9, 10] Another bioinformatic-based approach to identify putative drug-gene interactions is literature-mining; Pfizer has developed a proprietary advanced algorithm for text-mining called the Causal Reasoning Engine (CRE),[11, 12] which operates against a knowledge base integrating causal interactions from several sources. We have employed the latter two approaches as well as an in-house drug screen on normal human fibroblasts using a curated publicly available library of FDA approved compounds to establish the frequency with which such drug-gene interactions can be first identified and then validated (modulation of mRNA in cell culture), as well as the degree of overlap between these three approaches.

Results

Identifying potential therapeutic targets

We set out to identify a subset of rare diseases that were potentially amenable to mRNA (and by extension protein) modulation. We reviewed the rare disease databases OMIM [3] and Orphanet [13] to identify diseases with a potential mRNA target: (1) autosomal dominant rare diseases caused by haploinsufficiency; (2) recessive diseases where there is evidence that hypomorphic alleles encode proteins with residual function that localize normally in the cell; and (3) diseases where there exists another gene that may functionally recapitulate the disease gene’s function (rescuing paralog). Next, an expert clinical group (KMB, DD, and clinical members of the FORGE Canada Consortium, [14]) further reduced this list by looking for: (1) the existence of a pre-symptomatic period and/or the possibility of reversibility; (2) degree of unmet medical need; and (3) known Canadian patients affected with the disorder to arrive at a list of 76 diseases for study (Table 1).
Table 1

The 76 diseases and associated 75 genes analyzed in this study (classified as haploinsufficient; paralog rescue, or hypomorph categories)

GeneDisorderOMIM#Fibroblast screen?CMAP data?CNS disease?
HAPLOINSUFFICIENT
AFG3L2Spinocerebellar ataxia type 28610246yesyesyes
ATP1A2Familial hemiplegic migraine type 2602481noyesyes
COL6A1Bethlem myopathy158810yesyesno
CSF1RHereditary diffuse leukoencephalopathy with spheroids221820noyesyes
GRNFrontotemporal lobar degeneration with ubiquitin-positive inclusions607485noyesyes
ITPR1Spinocerebellar ataxia type 15606658yesyesyes
MAPTDementia, frontotemporal, with or without parkinsonism600274noyesyes
MPZCharcot-Marie-Tooth disease type 1B118200noyesyes
NKX2-1Chorea, hereditary benign118700noyesyes
OPA1Optic atrophy type 1605290yesyesyes
PMP22Hereditary neuropathy with liability to pressure palsies162500yesyesyes
SCN1ADravet syndrome607208yesyesyes
SLC2A1GLUT1 deficiency612126yesyesyes
SMAD3Familial thoracic aneurysm/ Loeys Dietz syndrome type 3613795yesyesno
SPASTHereditary spastic paraparesis type 4182601yesyesyes
PARALOG RESCUE ( paralog target )
ABCD1 (ABCD2)X-adrenoleukodystrophy300100noyesyes
DDHD2 (DDHD1)Complex hereditary spastic paraplegia (SPG 54)615033yesnoyes
FBN1 (FBN2)Marfan syndrome154700yesyesno
LIMS2 (LIMS1)Limb girdle muscular dystrophy with cardiomyopathy type, 2W616827yesyesno
SLC39A8 (SLC39A14)Congenital disorder of glycosylation, type IIn616721yesyesyes
HYPOMORPH
ACADVLACADVL deficiency (VLCAD)201475noyesno
AGPAT2Lipodystrophy, congenital generalized, type 1608594noyesno
AGXTPrimary hyperoxaluria259900noyesno
ALDH18A1Cutis laxa, autosomal recessive, type IIIA219150noyesyes
AMACRApha-methylacyl-CoA racemase deficinecy614307noyesyes
ARSAMetachromatic leukodystrophy250100yesyesyes
ASAH1Farber disease228000yesyesyes
Spinal muscular atrophy with progressive myoclonic epilepsy159950
ASLArgininosuccinic aciduria207900noyesyes
ASPACanavan disease271900noyesyes
ATP7AOccipital horn syndrome304150yesyesno
ATP7BWilson disease277900yesyesyes
BCKDHAMaple syrup urine disease (intermediate), type Ia248600yesyesyes
BCKDHBMaple Syrup urine disease (Intermediate), type Ib248600yesyesyes
BSCL2Lipodystrophy, congenital generalized, type 2269700yesyesno
CLN3Ceroid lipofuscinosis type 3 (Batten disease)204200yesyesyes
CPT2Carnitine palmitoyltransferase II deficiency255110yesyesno
CTSAGalactosialidosis256540yesyesyes
DDHD2Complex hereditary spastic paraplegia (SPG 54)615033noyesyes
EIF2B5Central hypomyelination and vanishing white matter disease603896yesyesyes
ETFAGlutaric acidemia type 2231680yesnono
FHFumarase deficiency606812yesyesyes
FKRPLimb girdle muscular dystrophy type 5C607155noyesno
GAAGlycogen storage disease type 2/Pompe232300yesyesno
GALCKrabbe disease245200yesyesyes
GALNSMucopolysaccharidosis IVA612222yesyesno
GALTGalactosemia230400yesyesno
GBE1Glycogen storage disease type IV232500yesyesno
GLB1Mucopolysaccharidosis type IVB (Morquio)253010yesyesno
GM1-gangliosidoses Type 3230650
GUSBMucopolysaccharidoses VII253220yesyesyes
HARSUsher syndrome614504yesyesyes
HEXATay Sachs disease272800yesyesyes
HEXBSandhoff disease268800yesyesyes
HPRT1Kelley-Seegmiller syndrome300323yesyesno
HSD11B2Mineralocorticoid Excess218030noyesno
HSD17B4D-bifunctional protein deficiency261515yesyesyes
IDSMucopolysaccharidoses II309900yesyesyes
IDUAMucopolysaccharidoses I607014noyesyes
MAN2B1Alpha-mannosidosis type I248500yesyesyes
MUTMethylmalonic aciduria, mut type251000yesyesyes
NEU1Sialidosis type 1256550yesyesyes
OTCOrnithine transcarbamylase deficiency300461noyesyes
PHYHAdult Refsum disease266500yesyesyes
PLP1PLP1-related disorders312080noyesyes
PMM2Congenital disorder of glycosylation type 1C212065yesyesyes
POLR3ALeukodystrophy, hypomyelinating, 7607694yesnoyes
PPT1Ceroid lipofuscinosis type1600722yesyesyes
SACSARSACS270550yesyesyes
SBDSShwachman-Bodian-Diamond syndrome260400yesnono
SCARB2Action myoclonus renal failure syndrome254900yesyesyes
SGSHMucopolysaccharidoses III252900noyesyes
SLC16A2Allan-Herndon-Dudley syndrome300523noyesyes
SLC52A2Brown-Vialetto-VanLaere Sensory neuropathy614707noyesyes
SLC6A8Creatine transporter deficiency300352noyesyes
SUMF1Multiple sulfatase deficiency272200yesnoyes
TYMPMitochondrial DNA depletion syndrome type 1603041yesyesyes
METABOLIC BYPASS
NEU1Tay Sachs disease272800yesyesyes
The 76 diseases and associated 75 genes analyzed in this study (classified as haploinsufficient; paralog rescue, or hypomorph categories) Of the 76 diseases (associated with 75 different genes), 57 can be caused by hypomorphic mutations (in 55 genes) that might be improved by over-expressing the mutated partially functional protein. We believe that the upregulation of mRNA/protein is a credible treatment modality in such cases given that a modest increase of enzyme activity in a recessive disease may have a profound impact clinically (28). Care was taken to avoid diseases in which the majority of associated mutations resulted in mislocalization or 0% activity. Fifteen of the diseases are haploinsufficient autosomal dominantly inherited conditions caused by a mutation in a single allele; these diseases likely represent the most promising class for pharmacologic gene induction with the remaining non-mutated allele serving as the drug target; we are aware of at least two examples employing this approach in the literature.[15, 16] In this instance, care has been taken to identify conditions where there is good evidence that the causal mutation is null and does not encode a protein with either dominant negative or gain of function properties. In 5 of the 76 diseases, our goal was the over-expression of a paralog; the concept of treating a genetic disease by modulating expression of a gene that functionally compensates (at least partly) the mutation gene is a well-known but comparatively unused therapeutic approach exemplified by SMN2 induction in spinal muscular atrophy (7). In the final instance of Tay Sachs disease, the neuraminidase genes sit on a metabolic bypass pathway which may moderate the clinical severity (29) although recent work has called this interpretation into question (30).

Mining of CMAP for drug conferred gene induction

Broad differences of mRNA “responsiveness” to the drugs were observed when mining the CMAP data; in general drugs such as histone deacetylase inhibitors (HDACi’s) induced the most genes while others had little impact on the transcriptome (Supplementary Table 1). The response of 70 of the 75 target genes (Table 1) to the 149 CMAP drugs also found in the 310 compounds that were later tested in our fibroblast screen was extracted from the CMAP dataset; only two genes failed to be increased by at least one drug. Of the 10,430 possible drug-gene interactions (70 genes × 149 drugs) recorded in CMAP, 970 demonstrated induction (9.3%) in at least one of the three cancer cell lines tested. It should be noted the significant majority of these hits were singletons (89%) and a much smaller fraction displayed induction in two (10%) or all three cell lines (1%; see Table 2; Supplementary Table 2; columns LC-LK).
Table 2

Assessment in the fibroblast screen, presence in Connectivity map database and CNS involvement

GENEDRUGValidation responseCMAPCREBKFScreen
ATP1A2Calcitriol[++]zn/a
Biperiden++xn/a
ITPR1Dexamethasone++xz
Dasatanib[++]z
Ethacrynic acid[++]1
SCN1ABisacodyl[++]x
SLC2A1Buspirone++x
Metformin[++]z
Deferoxamine[++]xx
Fluphenazine[++]x
SMAD3Isotretinoin ++ xx1
Fluphenazine ++ xxx
SPASTNaltrexone[++]xx
Fluphenazine ++ xx
FBN2Dexamethasone ++ z
LIMS1Isotretinoin[++]x
SLC39A14Bisacodyl[++]x
Acetylcysteine+z
AGPAT2Mexiletine+xn/a
AGXTTheophylline ++ xn/a
Trazodone ++ xn/a
AMACRFluphenazine+xxn/a
ASAH1Fluphenazine ++ xx
ATP7ABisacodyl[++]x
BCKDHBChlorpropamide+x
CLN3Bisacodyl[++]x
CTSABisacodyl[++]x
GALCBisacodyl[++]x
Mexiletine+x
GALNSBisacodyl[++]x
GUSBIdarubicin+1
HEXAIdarubicin+1
Bisacodyl[++]x
HEXBBisacodyl[++]x
HPRT1Chlorpropamide+x
HSD11B2Calcitriol+zn/a
IDSBisacodyl[++]x
Naltrexone[++]x
Fluphenazine ++ x
IDUABuspirone+xn/a
MAN2B1Idarubicin+1
Dacarbazine ++ x
NEU1Bisacodyl[++]x
Fluphenazine ++ xxx
SGSHMexiletine+xn/a
TYMPCalcitriol[++]z
CSF1RDexamethasone+zn/a

++ = robust response (>75% induction),+ = modest induction (40–74%)

[++] = induction only observed with high dose of drug (therapeutic dose)

x = response observed in one cell line, xx = two cell lines, xxx = three cell lines

z = identified by causal reasoning engine algorithm

1 = identified by fibroblast screen

n/a = not applicable as screen was not performed on this gene

The remaining blank cells indicate that a gene-drug interaction was not demonstrated/predicted based on CMAP, CREKB or FScreen.

Assessment in the fibroblast screen, presence in Connectivity map database and CNS involvement ++ = robust response (>75% induction),+ = modest induction (40–74%) [++] = induction only observed with high dose of drug (therapeutic dose) x = response observed in one cell line, xx = two cell lines, xxx = three cell lines z = identified by causal reasoning engine algorithm 1 = identified by fibroblast screen n/a = not applicable as screen was not performed on this gene The remaining blank cells indicate that a gene-drug interaction was not demonstrated/predicted based on CMAP, CREKB or FScreen.

Causal reasoning engine knowledge base mining

The medical literature mining algorithm to identify drug-gene relationships was initially set at Edge 2 low stringency settings (see material and methods) but generated so many leads that there was little value in terms of identifying true interactions (data not shown). At the more stringent level focused at exclusively “Edge 1” effects of the 23,250 potential drug-gene interactions (75 genes × 310 drugs), 119 different putative interactions were identified (0.6%; Supplementary Table 2; column LF).

Cell based screening

310 drugs were tested for their impact on 51 target genes in normal human fibroblasts as outlined in methods. Based on the stringent statistical criteria used for identifying putative positive drug-gene interactions (Z-score > 1.65 for both pools of five containing a given drug), of 15,810 interactions tested in our cell-based screen, 61 putative interactions were identified (0.4%; Supplementary Table 2, column LG).

Validation of hits

Validation of a subset of the hits identified by one or more of the three methods (685/970 for CMAP, 85/119 for CRE; 55/61 for the cell-based screen) was undertaken in human skin fibroblasts by incubating with single drug (2  uM and drug doses closer to therapeutic levels) for 8 h followed by qRT-PCR measurement of target mRNA. In total, of the 685 CMAP hits tested, 34 validated in fibroblasts (5%), for the 85 CRE hits tested, 9 validated (11%) and of the 56 fibroblast screen hits tested, 5 validated (9%). Overall, 47 different drug-gene interactions validated (Table 2), and in only two cases; isotretinoin vs. SMAD3 (CMAP and fibroblast screen; Fig. 1) and dexamethasone vs. ITPR1 (CMAP and CRE) was the same interaction identified by more than one method (Fig. 2). Of these 47, 13 demonstrated a robust response with therapeutic levels (serum concentration) of the drug, another 13 showed a modest response, and 21 displayed a response only when the drug dose was very high (typically 10-fold or greater than therapeutic levels). Furthermore, 19 drugs accounted for these 47 responses (see Table 2). Although transcriptionally active compounds including the vitamin A analog isotretinoin (2 genes), dexamethasone (3 genes), calcitriol (3 genes) were among the agents which upregulated mRNAs, some on the transcriptomic modulating list were unexpected; suprapharmacologic levels of the laxative bisacodyl induced a full 11 target mRNAs while the antipsychotic phenothiazine, fluphenazine upregulated 7. Given one of the most robust responses was observed with isotretinoin and SMAD3, we next attempted to see if there was a corresponding protein induction; a greater than 2-fold induction was seen with 100–500 nM isotretinoin (Fig. 3).
Fig. 1

a Deconvolution of drug pool associated with a SMAD3 induction in fibroblasts. qPCR data displaying relative expression of SMAD3 in response to drug pool 20 and its individual component drugs. All drugs are at 2 uM concentration in dimethyl sulfoxide. The pool response was successfully validated and single drug analysis revealed that this pool response was due to the drug Isotretinoin. b A dose response SMAD3 mRNA Isotretinoin in fibroblasts

Fig. 2

Venn diagram showing the overlap of the validated drug-gene interactions identified by the three methods. Thirty four of 685 CMAP hits tested, 9 of 85 CREKB hits and 5 of 55 hits identified in the fibroblast cell-based screen validated by qRT-PCR measurement of target mRNA in single drug dosing (2 uM) of fibroblasts for 8 h. (Table 2)

Fig. 3

a Sample western blot for SMAD3 protein in NHF treated with various concentrations of isotretinoin (13-cis-retinoic acid; 13-CRA) for 16 h. b Quantified SMAD3 protein expression. Error bars represent SEM. n = 4. Heat shock cognate protein 70 (HSC70) *p < 0.05

a Deconvolution of drug pool associated with a SMAD3 induction in fibroblasts. qPCR data displaying relative expression of SMAD3 in response to drug pool 20 and its individual component drugs. All drugs are at 2 uM concentration in dimethyl sulfoxide. The pool response was successfully validated and single drug analysis revealed that this pool response was due to the drug Isotretinoin. b A dose response SMAD3 mRNA Isotretinoin in fibroblasts Venn diagram showing the overlap of the validated drug-gene interactions identified by the three methods. Thirty four of 685 CMAP hits tested, 9 of 85 CREKB hits and 5 of 55 hits identified in the fibroblast cell-based screen validated by qRT-PCR measurement of target mRNA in single drug dosing (2 uM) of fibroblasts for 8 h. (Table 2) a Sample western blot for SMAD3 protein in NHF treated with various concentrations of isotretinoin (13-cis-retinoic acid; 13-CRA) for 16 h. b Quantified SMAD3 protein expression. Error bars represent SEM. n = 4. Heat shock cognate protein 70 (HSC70) *p < 0.05

Discussion

The era of next generation DNA sequencing based rare disease gene identification has served to underline the comparative dearth of effective rare disease therapies; fewer than 500 for the thousands of disorders; the expense of many is such that payers around the world are struggling and the potential to fund many more is clearly in doubt. New means of identifying inexpensive rare genetic disease therapies are clearly needed. As one possible solution, rare diseases are approached here as problems of gene dosage; conditions resulting from too little or too much of a given protein activity. We set out to establish the frequency with which drug-gene interactions (in this case, pharmacologic upregulation of mRNA) can be first identified and then validated as a possible therapeutic approach to attenuate this pathogenic dysregulation for a targeted group of rare diseases. We searched an extensive transcriptomal database (Connectivity Map (CMAP); 149 drugs; 70 genes), used a text mining platform (causal reasoning engine; 310 drugs; 75 genes) and conducted a targeted drug screen on normal human fibroblasts (310 drugs; 51 genes). Forty-seven examples of drug induction of potential rare disease modifying genes were identified in this fashion. Although we assessed 75 genes/76 disorders, it is clear that there exist many others that could be assessed, in particular the so called hypomorphs. In theory any recessive disorder with missense mutations that preserve both the correct protein localization and some function could be a potential target. Broadly speaking 28% of 7000 genetic disorders are recessive (http://ec.europa.eu/health/rare_diseases/orphanet/report_en) and approximately 70 percent are missense.[17] It is not known what fraction of these preserves some function and correct cellular targeting but even it were as low as 25% there are over three hundred such disorders and this may be an underestimate. In terms of the initial identification of potentially significant drug-gene interactions, the CMAP had, by far, the greatest proportional yield of the three approaches; 9.3% (970/10,430) were potentially positive compared with only 0.6% (119/23,250) and approximately 0.4% (61/15,810) for the CREKB mining and cell-based assay, respectively. It should be noted that although the CMAP appeared much more effective than the other two approaches in identifying putative positives, an interaction was scored as a positive if just one cell line (of the three tested) showed a response; if only those that showed a response in all three lines were counted, the yield dropped to 1%, much closer to the values obtained for the other two platforms. The proportion of the putative positive drug-gene interactions which were next validated (qRT-PCR amplification of the specific mRNA from normal human fibroblasts grown in 2 uM drug) was more consistent; approximately 5% of cases for the CMAP, 11% of the CREKB drug-gene interactions and 9% for our targeted drug screen validated (Table 2; Supplementary Table 2; columns LC-LK). These relatively low validation rates likely devolve from a number of factors; in the case of CMAP, the cells were transformed and not primary lineages and the drug concentration was comparatively high (10 uM). The CREKB showed the highest validation rate at 11%, closely followed by the targeted drug screen (9%). Nonetheless approximately 90% of putative hits did not validate. In the case of CREKB, the inference of a direct drug-gene interaction is based on medical literature encompassing a wide range of tissue and cell types with different drug concentrations and timing; thus the one in ten validation rate might be expected. With regard to the fibroblast screen, the 9% validation rate was unanticipated; it may be that the combination of drugs in pools of 5 to achieve the greatest throughput resulted in unpredictable synergistic multi-drug effects that increased the false positive rate. Conversely, it is also likely that repression of gene expression by individual drugs may have masked drug pool induction events resulting in false negatives. Nevertheless, 47 different drug-gene interactions were identified by the three different methods, showing that these approaches can be used in a complementary manner. The methodologies themselves have significant differences; CREKB is mined from a medical literature often directed toward specific diseases, drug classes and specific readouts while the system wide gene expression data found in CMAP are from cells that are transformed with all the genetic anomalies that that may entail. Interestingly if one looked exclusively at the CMAP drug-gene interactions in which all three cell lines were positive, 18% (2/9) validated; the highest yield by a significant margin. Finally, the normal human skin fibroblasts used in the FDA screen while a technically tractable cell line express only approximately two thirds of the transcriptome over 80% of which encode proteins expressed in all tissues (i.e., “housekeeping proteins”).[18] The mining of existing computational datasets for drug repurposing has been explored by other groups. In particular, a number of reports using the CMAP have been published, focusing on more common conditions such as dyslipidemia,[19] pain,[20] cancer,[21] inflammatory bowel disease,[22] cachexia,[23] osteoporosis,[24] and alopecia.[25] A recently published systematic assessment of the CMAP mining for drug repurposing for cancer and other common diseases showed predictive utility particularly for cancer.[26] Our study is the second of which we are aware that uses the CMAP to study rare genetic diseases [27] and unique in that it utilizes single gene levels rather than gene signatures as the therapeutic target. Similarly the causal reasoning engine platform [12] has been used by a number of groups to look for established drugs with activity against cancer and pain;[28] it has not to our knowledge been used to look for gene induction or applied to rare diseases. Having identified 47 possible leads, we shall next assess for a protein upregulation reflecting mRNA induction; we have shown this for a number of conditions (e.g., SLC2A1 by deferasirox, SMAD3 by isotretinoin; Fig. 3) and would anticipate that, despite a considerable attrition, a significant proportion of the remainder shall as well.[29, 30] Following this assessment in vivo analysis for induction in the tissue of greatest pathophysiologic relevance will be undertaken. Finally, although the concept of increasing an underexpressed protein to treat rare genetic disease is intuitively attractive, insight into the biology of the disease process will also be critical for the next stage of assessment. For example Loeys Dietz type III aortopathy despite being caused by haploinsufficiency of the TGFβ responsive SMAD3 transcription factor,[31] counterintuitively results in an induction of the TGFβ axis.[32] Treatment of young SMAD3 ± mice with SMAD3 inducing retinoic acid thus appears only to add to TGFβ activation and not to improve but possibly exacerbate their aneurysmal phenotype. The timing of the repletion (earlier rather than later) appears critical to the possibility of benefit (E. MacFarlane; personal communication). In another example, autosomal-recessive intellectual disability with cerebellar atrophy syndrome (also known as congenital disorder of glycosylation, type II) is caused by a comparative manganese deficiency resulting from mutation of the manganese transporter gene SLC39A8.[33] We anticipated induction of the paralogous manganese transporting SLC39A14 would be beneficial at the outset of this study, but recent work has shown that loss of this transporter results in manganese accumulation.[34] The gene is an exporter of manganese via hepatic metabolism and its upregulation might actually only worsen the manganese depletion observed in the syndrome; downregulation of the SLC39A14 may therefore be the desired outcome. It is clear that after a promising drug gene induction is observed, validation in the appropriate animal disease model is critical. In conclusion, two in silico approaches and a directed drug screen configured to identify putative drug-gene (mRNA) interactions for rare genetic diseases has provided new putative target drugs that may be further tested for in vivo protein induction as a prelude to potential use as repurposed rare disease therapeutics. Our experience with pharmacologic modulation of gene activity will help frame expectations for those pursuing this path and the data presented here will serve as a useful resource for those studying specific diseases and genes.

Methods

Drug selection

Although the Connectivity Map employed 1300 compounds [9, 10] and the CREKB subtends ~ 450,000 causal relations (12), for purposes of tractability the definition of a subset of compounds was needed. Given that repurposed medications for rare disorders will likely have to be given for life, 310 drugs were therefore selected for testing from the Screen-well v2 FDA approved drug library (Enzo Life Sciences) using overall safety profile and tolerability as desirable characteristics (Supplementary Fig. 1). The impact of these 310 drugs on 75 genes as elicited by two and, where possible, all three approaches therefore comprised the central focus of our work.

Mining of existing CMAP datasets

Broad Institute’s CMAP (https://www.broadinstitute.org/cmap/) is a database of drug-gene interactions employing the Affymetrix GeneChip U133-A microarray-based transcriptomic analysis of mRNA extracted from three cell lines (HeLa, PC3, and MCF7) individually treated with ~ 1300 drugs for 6 h at a concentration of 10 µM.[9, 10] All drugs demonstrating mRNA induction for each of the genes represented in this dataset were identified (70/75 target genes; Table 1). These were then cross-referenced with our FDA drug panel identifying 149 drugs that were utilized in the CMAP screen.

Mining the CREKB

Given the extensive and increasingly detailed biomedical literature, text-mining has become a promising approach with which to identify putative drug-gene interactions.[35] The Pfizer CREKB integrates microarray transcriptomic data with causal statements derived from the biomedical literature (Ingenuity and Selventa “knowledge bases”) to infer upstream molecular mechanisms mediating observed gene expression changes.[11, 12] The underlying knowledge base (CREKB) was used to predict pathway targets that would impact the expression of the disease-related mRNA for our target 75 disease genes for the 310 drugs under test. The interactions were then classified by the proximity of their relationship. Edge 1 effects refer to a direct link, whereas edge 2 is inferred from indirect relationships (e.g., when considering drug A and gene C, a direct impact of A on C is an edge 1 effect while drug A affecting gene B which regulates gene C is an edge 2 effect.) However, edge 2 effects were extremely common therefore only edge 1 direct links were scored. This approach was able to distinguish directionality in terms of the relationship, i.e., up-regulation of gene targets by drugs vs. repression.

Fibroblast-based drug screening

Pooled drugs (5 per well) at 2 uM each were utilized; drugs were assessed twice in the screen, and never pooled with the same drugs (Supplementary Table 1). Dosing was conducted on normal human fibroblasts at an early passage (10) grown to 100% confluence in 10 cm dishes to minimize differential responses devolving from the cycling state of the cell. Cells were collected after an 8 h treatment via trypsin and RNA isolated by RNeasy MICRO with on-column DNAsing (QIAGEN). Minimal toxicity was observed by light microscopy at 8 h. cDNA was synthesized via the iScript Advanced system (BIO-RAD) then transferred to a 384-well plate and qPCR performed using the iQ Supermix (BIO-RAD) on a CFX-384 instrument (BIO-RAD). Each sample was represented in triplicate wells (technical replicates). In total, qPCR was performed on 51 different target genes of interest (Supplementary Table 2; column B). The remaining genes were not tested due to very low/negligible expression in fibroblasts. In addition GAPDH and YWHAZ were also run to determine relative stability of signal and cDNA concentration. Relatively low signals (Ct values > 27) were observed for 20/53 genes analyzed. The drug screen was configured so that a single 384-well qPCR plate contained all 128 drug pools (5 drugs per pool) comprising our library in triplicate; thus a single gene was measured in each of the 53 qPCR plates (51 genes tested). Each 384 well plate thus had the same array of drug pools; it was noted that some pools had tendency to either consistently up or downregulate irrespective of the gene being tested. Although the source of the “hot” and “cold” drug pools is unclear, a plate correction was conducted; an individual drug pool-gene reading was normalized against the geometric mean of all genes tested for the corresponding drug pool in all other plates (i.e., 53 qPCR Ct values per drug pool). Z-scores were then calculated for each gene dataset. Each drug of the 310 drug panel was represented twice in the pools of five; an mRNA level demonstrating a Z-score greater than 1.65 (p value < 0.05) for a given drug in both pools was identified as a putative hit warranting additional investigation and validation.

Western blot validation of isotretinoin

Following treatment of fibroblasts (NHF) with isotretinoin, cells were lysed using radioimmunoprecipitation assay buffer supplemented with protease and phosphatase inhibitors by sonicating for 30 s followed by a resting period of 30 s, repeated for a total of 8 min using a water bath sonicator (DiaMed Transsonic T460). Samples were then centrifuged at 4 °C for 40 min and protein was quantified using a Bradford Protein Assay (Bio-Rad #500-0006). 40 μg of protein was loaded on 11% acrylamide gels and run at 80 V for 30 min, then 120 V for 1 h before being transferred with a Bio-Rad Semi-Dry Transfer system (Amersham Biosciences TE 77 Semi-Dry Transfer Unit) onto nitrocellulose membranes (Bio-Rad, #162-0115) for 1 h 15 min at 65 milliAmps/gel. Membranes were blocked in PBS/Tween (0.05% Tween-20) with 5% dried skimmed milk for 1 h. Anti-SMAD3 antibody (Abcam #ab40854) was used at 1:3000 in PBS/Tween (0.05% Tween-20) with 5% dried skimmed milk powder overnight at 4 °C. Anti-rabbit secondary antibody was used at 1:5000 (Cell Signaling Technology, #7074S) in PBS/Tween (0.05% Tween-20) with 5% dried skimmed milk powder at room temperature for 1 h. Both primary and secondary antibody washes were followed by three 15-minute washes with PBS/Tween (0.05% Tween-20). Antigen detection was carried out using Clarity (Bio-Rad, #170-5061) according to the manufacturer’s instructions. Loading control anti-HSC70 (Santa Cruz Biotechnology, #sc-7298) was used at a concentration of 1:2000 in PBS/Tween (0.05% Tween-20) with 5% dried skimmed milk powder at room temperature for 1 h. Anti-mouse secondary antibody was used at 1:5000 (Cell Signaling Technology, #7076S) in PBS/Tween (0.05% Tween-20) with 5% dried skimmed milk powder at room temperature for 1 h. Blots were quantified using ImageJ 1.48 v software.

Availability of data, materials and methods

The CMAP database is available at: https://www.broadinstitute.org/cmap/ The Pfizer CREKB was made available to us through collaboration. Supplementary Table 1 Supplementary Table 2
  34 in total

1.  Causal reasoning on biological networks: interpreting transcriptional changes.

Authors:  Leonid Chindelevitch; Daniel Ziemek; Ahmed Enayetallah; Ranjit Randhawa; Ben Sidders; Christoph Brockel; Enoch S Huang
Journal:  Bioinformatics       Date:  2012-02-21       Impact factor: 6.937

2.  Expression of human skin-specific genes defined by transcriptomics and antibody-based profiling.

Authors:  Per-Henrik D Edqvist; Linn Fagerberg; Björn M Hallström; Angelika Danielsson; Karolina Edlund; Mathias Uhlén; Fredrik Pontén
Journal:  J Histochem Cytochem       Date:  2014-11-19       Impact factor: 2.479

3.  Human gene mutation database-a biomedical information and research resource.

Authors:  M Krawczak; E V Ball; I Fenton; P D Stenson; S Abeysinghe; N Thomas; D N Cooper
Journal:  Hum Mutat       Date:  2000       Impact factor: 4.878

4.  mRNA expression signatures of human skeletal muscle atrophy identify a natural compound that increases muscle mass.

Authors:  Steven D Kunkel; Manish Suneja; Scott M Ebert; Kale S Bongers; Daniel K Fox; Sharon E Malmberg; Fariborz Alipour; Richard K Shields; Christopher M Adams
Journal:  Cell Metab       Date:  2011-06-08       Impact factor: 27.287

5.  Connectivity Map-based discovery of parbendazole reveals targetable human osteogenic pathway.

Authors:  Andrea M Brum; Jeroen van de Peppel; Cindy S van der Leije; Marijke Schreuders-Koedam; Marco Eijken; Bram C J van der Eerden; Johannes P T M van Leeuwen
Journal:  Proc Natl Acad Sci U S A       Date:  2015-09-29       Impact factor: 11.205

6.  FORGE Canada Consortium: outcomes of a 2-year national rare-disease gene-discovery project.

Authors:  Chandree L Beaulieu; Jacek Majewski; Jeremy Schwartzentruber; Mark E Samuels; Bridget A Fernandez; Francois P Bernier; Michael Brudno; Bartha Knoppers; Janet Marcadier; David Dyment; Shelin Adam; Dennis E Bulman; Steve J M Jones; Denise Avard; Minh Thu Nguyen; Francois Rousseau; Christian Marshall; Richard F Wintle; Yaoqing Shen; Stephen W Scherer; Jan M Friedman; Jacques L Michaud; Kym M Boycott
Journal:  Am J Hum Genet       Date:  2014-06-05       Impact factor: 11.025

7.  Correlations between RNA and protein expression profiles in 23 human cell lines.

Authors:  Marcus Gry; Rebecca Rimini; Sara Strömberg; Anna Asplund; Fredrik Pontén; Mathias Uhlén; Peter Nilsson
Journal:  BMC Genomics       Date:  2009-08-07       Impact factor: 3.969

8.  How is mRNA expression predictive for protein expression? A correlation study on human circulating monocytes.

Authors:  Yanfang Guo; Peng Xiao; Shufeng Lei; Feiyan Deng; Gary Guishan Xiao; Yaozhong Liu; Xiangding Chen; Liming Li; Shan Wu; Yuan Chen; Hui Jiang; Lijun Tan; Jingyun Xie; Xuezhen Zhu; Songping Liang; Hongwen Deng
Journal:  Acta Biochim Biophys Sin (Shanghai)       Date:  2008-05       Impact factor: 3.848

9.  Autosomal-Recessive Intellectual Disability with Cerebellar Atrophy Syndrome Caused by Mutation of the Manganese and Zinc Transporter Gene SLC39A8.

Authors:  Kym M Boycott; Chandree L Beaulieu; Kristin D Kernohan; Ola H Gebril; Aziz Mhanni; Albert E Chudley; David Redl; Wen Qin; Sarah Hampson; Sébastien Küry; Martine Tetreault; Erik G Puffenberger; James N Scott; Stéphane Bezieau; André Reis; Steffen Uebe; Johannes Schumacher; Robert A Hegele; D Ross McLeod; Marina Gálvez-Peralta; Jacek Majewski; Vincent T Ramaekers; Daniel W Nebert; A Micheil Innes; Jillian S Parboosingh; Rami Abou Jamra
Journal:  Am J Hum Genet       Date:  2015-12-03       Impact factor: 11.025

10.  Defective Connective Tissue Remodeling in Smad3 Mice Leads to Accelerated Aneurysmal Growth Through Disturbed Downstream TGF-β Signaling.

Authors:  I van der Pluijm; N van Vliet; J H von der Thusen; J L Robertus; Y Ridwan; P M van Heijningen; B S van Thiel; M Vermeij; S E Hoeks; R M G B Buijs-Offerman; H J M Verhagen; R Kanaar; A M Bertoli-Avella; J Essers
Journal:  EBioMedicine       Date:  2016-09-10       Impact factor: 8.143

View more
  5 in total

1.  Anisomycin Activates Utrophin Upregulation Through a p38 Signaling Pathway.

Authors:  Jeremiah Hadwen; Faraz Farooq; Luke Witherspoon; Sarah Schock; Kevin Mongeon; Alex MacKenzie
Journal:  Clin Transl Sci       Date:  2018-06-07       Impact factor: 4.689

2.  RNA sequencing analysis reveals increased expression of interferon signaling genes and dysregulation of bone metabolism affecting pathways in the whole blood of patients with osteogenesis imperfecta.

Authors:  Lidiia Zhytnik; Katre Maasalu; Ene Reimann; Aare Märtson; Sulev Kõks
Journal:  BMC Med Genomics       Date:  2020-11-23       Impact factor: 3.063

3.  Stakeholders' views on drug development: the congenital disorders of glycosylation community perspective.

Authors:  Maria Monticelli; Rita Francisco; Sandra Brasil; Dorinda Marques-da-Silva; Tatiana Rijoff; Carlota Pascoal; Jaak Jaeken; Paula A Videira; Vanessa Dos Reis Ferreira
Journal:  Orphanet J Rare Dis       Date:  2022-07-30       Impact factor: 4.303

4.  Separating the Wheat from the Chaff: The Use of Upstream Regulator Analysis to Identify True Differential Expression of Single Genes within Transcriptomic Datasets.

Authors:  Jeremiah Hadwen; Sarah Schock; Faraz Farooq; Alex MacKenzie; Julio Plaza-Diaz
Journal:  Int J Mol Sci       Date:  2021-06-11       Impact factor: 5.923

Review 5.  Artificial Intelligence (AI) in Rare Diseases: Is the Future Brighter?

Authors:  Sandra Brasil; Carlota Pascoal; Rita Francisco; Vanessa Dos Reis Ferreira; Paula A Videira; And Gonçalo Valadão
Journal:  Genes (Basel)       Date:  2019-11-27       Impact factor: 4.096

  5 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.