
Use of genome-wide association studies for cancer research and drug repositioning.

Jizhun Zhang1, Kewei Jiang1, Liang Lv1, Hui Wang2, Zhanlong Shen1, Zhidong Gao1, Bo Wang1, Yang Yang1, Yingjiang Ye1, Shan Wang1.   

Abstract

Although genome-wide association studies (GWAS) have identified many risk loci associated with colorectal cancer, the molecular basis of these associations is still unclear. We aimed to infer biological insights and highlight candidate genes of interest within GWAS risk loci. We used an in silico pipeline based on functional annotation, quantitative trait loci mapping of cis-acting genes, PubMed text-mining, protein-protein interaction studies, genetic overlaps with cancer somatic mutations and knockout mouse phenotypes, and functional enrichment analysis to prioritize the candidate genes at the colorectal cancer risk loci. Based on these analyses, we observed that these genes were the targets of approved therapies for colorectal cancer, and suggested that drugs approved for other indications may be repurposed for the treatment of colorectal cancer. This study highlights the use of publicly available data as a cost-effective solution to derive biological insights, and provides empirical evidence that the molecular basis of colorectal cancer can provide important leads for the discovery of new drugs.

Year:  2015        PMID: 25803826      PMCID: PMC4372357          DOI: 10.1371/journal.pone.0116477

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Since the advent of high-density single nucleotide polymorphism (SNP) genotyping arrays, researchers have used genome-wide association studies (GWAS) to identify numerous loci associated with a multitude of diseases. The vast majority of SNPs identified by GWAS (approximately 88%) lie within intergenic or intronic regions [1,2]. GWAS have also enabled the discovery of many genetic variants associated with colorectal cancer (CRC). The next step is to identify the genes affected by the causal variants, which would allow the risk SNPs to be translated into meaningful insights on pathogenesis. Most reports have simply implicated the gene nearest to a GWAS hit as the target of the functional variant, without any supporting evidence [1]. The identification of expression quantitative trait loci (eQTL) has been proposed as a promising method to find candidate genes associated with disease risk [3,4]. It should be noted, however, that an eQTL provides only indirect evidence of a link between genotype and gene transcription [1]. To our knowledge, there is no established method for identifying these target genes, although doing so is key to understanding the mechanism by which GWAS variants act. We therefore propose a bioinformatics pipeline that prioritizes the most likely candidate genes using several biological data sets. Seven criteria were adopted, of which the widely used eQTL criterion is only one.
One way of accelerating the translation of GWAS data into clinical benefit is to use the results to identify new indications for existing molecules. GWAS may be used to construct drug-related networks that aid drug repositioning. Although GWAS do not directly identify most existing drug targets, there are several reasons to expect that new targets will nevertheless be discovered using these data [5,6].
Initial results of drug repurposing studies using network analysis are encouraging and suggest directions for future development [7]. By integrating rheumatoid arthritis genetic findings with the catalog of drugs approved for rheumatoid arthritis and other diseases, Okada et al. provided empirical data indicating that genetic approaches may be useful for supporting genetics-driven drug discovery efforts in complex human traits [8]. In the present study, we used an in silico pipeline to systematically integrate data on CRC risk loci from a variety of databases for CRC biology and drug discovery.

Materials and Methods

An overview of the study design is illustrated in Fig. 1. Biological candidate genes were obtained from GWAS-identified CRC risk loci. Next, the genetic data were integrated with the results of statistical analyses, computational approaches, and publicly available large data sets to prioritize the obtained genes, and propose new targets for drug treatments.
Fig 1

An overview of the study design.

One hundred and forty-seven candidate genes were obtained from 50 CRC risk loci. A bioinformatics pipeline was developed for the prioritization of these candidate genes. Seven criteria were used to score the genes: (1) CRC risk missense variant; (2) cis-eQTL; (3) PubMed text mining; (4) PPI; (5) cancer somatic mutation; (6) knockout mouse phenotype; and (7) functional enrichment. Extent of overlap with target genes for approved CRC drugs was also assessed.

CRC risk loci from GWAS

We downloaded CRC risk SNPs from the National Human Genome Research Institute (NHGRI) GWAS catalogue database on January 31, 2014 [2].

Biological candidate genes from CRC risk loci

Risk SNPs tag the haplotypes on which the functional variants reside; therefore, the next step was to identify their target genes. The snp2gene tool annotates risk SNPs with their surrounding genes on the basis of both physical proximity and linkage disequilibrium [9]. We used snp2gene to identify the candidate genes for each GWAS SNP. For each gene in a risk locus, we also evaluated whether it was the gene nearest to the CRC risk SNP.

Prioritization of candidate genes

We devised a bioinformatics pipeline that uses several biological data sets to prioritize the most likely candidate genes. Firstly, functional annotations for the CRC risk SNPs were obtained with ANNOVAR [10]. Trait-associated variants are enriched within chromatin marks, particularly H3K4me3 [11], and H3K4me3 data from 34 cell types have previously been used to fine-map associated SNPs and identify causal variation [12]. We therefore evaluated whether the CRC risk SNPs and SNPs in linkage disequilibrium with them (r² > 0.80) overlapped with H3K4me3 peaks in these 34 cell types. The H3K4me3 data were obtained from the National Institutes of Health Roadmap Epigenomics Mapping Consortium, and the overlap was evaluated by a permutation procedure with 10^5 iterations [12]. We also identified genes for which the CRC risk SNPs, or SNPs in linkage disequilibrium with them (r² > 0.80), were annotated as missense variants.
Secondly, we assessed cis-expression quantitative trait locus (cis-eQTL) effects using data from 5,311 European subjects in a study of peripheral blood mononuclear cells (PBMCs) [13]. Westra et al. made a browser available for all significant cis-eQTLs detected at a false-discovery rate of 0.50 (http://genenetwork.nl/bloodeqtlbrowser/). In their study, eQTLs were deemed cis-eQTLs when the distance between the SNP chromosomal position and the probe midpoint was less than 250 kb. The eQTLs were mapped using Spearman’s rank correlation on imputed genotype dosages; the resulting correlations were converted to P values, and the corresponding z scores were weighted by the square root of the sample size. Evaluating the cis-eQTL genes of the risk SNPs required only the list of risk SNPs as input. When a CRC risk SNP was not available in the eQTL data set, we instead used the result for the best proxy SNP, that is, the SNP in linkage disequilibrium with the highest r² value (r² > 0.80).
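The cis-eQTL lookup rule described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names `is_cis` and `pick_lookup_snp`, the input structures, and the SNP identifiers in the usage note are hypothetical, and only the 250 kb window and the r2 > 0.80 threshold are taken from the text.

```python
from typing import Dict, Optional, Set

CIS_WINDOW_BP = 250_000  # SNP-to-probe-midpoint distance limit stated in the text
R2_THRESHOLD = 0.80      # linkage-disequilibrium cut-off stated in the text


def is_cis(snp_pos: int, probe_midpoint: int) -> bool:
    """True when a SNP and a probe midpoint on the same chromosome are < 250 kb apart."""
    return abs(snp_pos - probe_midpoint) < CIS_WINDOW_BP


def pick_lookup_snp(
    risk_snp: str,
    eqtl_snps: Set[str],
    proxies: Dict[str, float],
) -> Optional[str]:
    """Choose which SNP to query in the eQTL data set.

    eqtl_snps: SNPs present in the eQTL data (hypothetical input).
    proxies:   proxy SNP -> r2 with the risk SNP (hypothetical input).
    Returns the risk SNP itself if present, otherwise the available proxy
    with the highest r2 above the threshold, otherwise None.
    """
    if risk_snp in eqtl_snps:
        return risk_snp
    candidates = [
        (r2, snp)
        for snp, r2 in proxies.items()
        if r2 > R2_THRESHOLD and snp in eqtl_snps
    ]
    if not candidates:
        return None
    # max() on (r2, snp) tuples selects the highest r2 first
    return max(candidates)[1]
```

For instance, with placeholder identifiers, `pick_lookup_snp("rsRISK", {"rsP1", "rsP2"}, {"rsP1": 0.85, "rsP2": 0.97})` selects `"rsP2"`, because the risk SNP itself is absent from the eQTL data and `rsP2` is the proxy with the highest r2.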
Thirdly, we used Gene Relationships Among Implicated Loci (GRAIL) to evaluate the degree of relatedness among the genes within disease regions. GRAIL is a tool that examines the relationships between genes in different disease-associated loci: given many genomic regions or SNPs associated with CRC, it searches for similarities in the published scientific text among the associated genes [14]. A p value of <0.05 was considered significant. To avoid publications that reported on, or were influenced by, the disease regions discovered in recent scans, we used only PubMed abstracts published before December 2006, prior to the recent surge of GWAS papers identifying novel associations; this avoids a strong bias towards the genes closest to the associated SNPs [14].
Next, we used the Disease Association Protein-Protein Link Evaluator (DAPPLE) to assess whether there was significant physical connectivity among the proteins encoded by the candidate genes, based on protein-protein interactions (PPI) reported in the literature. DAPPLE takes a list of seed SNPs and converts them into genes based on overlap. The hypothesis behind DAPPLE is that genetic variation affects a limited set of underlying mechanisms that are detectable by PPI [15]. A p value of <0.05 was considered significant.
Next, we obtained cancer somatic mutation genes from the Catalogue of Somatic Mutations in Cancer (COSMIC) database [16], and downloaded knockout mouse phenotype labels and gene information from the Mouse Genome Informatics (MGI) database [17] on April 8, 2014. We defined all genes included in the CRC risk loci as CRC risk genes, and evaluated their overlap with the genes carrying registered somatic mutations for each cancer phenotype, and with the human orthologs of knockout mouse genes for each phenotype label. A hypergeometric distribution test was used for the overlap analyses, with significance at a p value of <0.05.
Finally, we performed functional enrichment analysis to investigate whether the genes affected by the SNPs were enriched for specific functional categories or pathways. The DAVID Bioinformatics Resources, which include Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Online Mendelian Inheritance in Man (OMIM) annotations, were used for the analyses [18]. Results were considered significant at a p value of <0.05.
We scored each gene by counting how many of the following selection criteria it satisfied: (1) genes with missense variants; (2) cis-eQTL genes of risk SNPs; (3) genes prioritized by PubMed text mining; (4) genes prioritized by PPI network; (5) cancer somatic mutation genes; (6) genes prioritized by associated knockout mouse phenotypes; and (7) genes prioritized by functional enrichment analysis. Correlations between the candidate gene prioritization criteria were evaluated by Pearson correlation analysis. Provided that the correlations were weak, each gene was scored by the number of criteria that were met (scores ranged from 0–7 for each gene). Genes with a score of ⩾2 were defined as ‘biological risk genes’.
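The scoring rule above can be illustrated with a short sketch. It is not the authors' implementation: the criterion keys and function names are hypothetical labels chosen here. The flags for CCND2 (at rs3217810) and PTPN23 (at rs8180040) are copied from Table 2, while `GENE_X` is an invented placeholder that meets only one criterion.

```python
from typing import Dict, Iterable, Set

# Hypothetical short keys for the seven criteria listed in the text
CRITERIA = (
    "missense",
    "cis_eqtl",
    "text_mining",
    "ppi",
    "somatic_mutation",
    "knockout_mouse",
    "functional_enrichment",
)


def score_gene(criteria_met: Iterable[str]) -> int:
    """Count how many of the seven criteria a gene satisfies (0 to 7)."""
    return len(set(criteria_met) & set(CRITERIA))


def biological_risk_genes(
    gene_criteria: Dict[str, Set[str]], threshold: int = 2
) -> Dict[str, int]:
    """Return the genes whose score reaches the threshold (>= 2 in the text)."""
    scores = {gene: score_gene(met) for gene, met in gene_criteria.items()}
    return {gene: s for gene, s in scores.items() if s >= threshold}


example = {
    # Flags taken from the Table 2 rows for these two genes
    "CCND2": {"cis_eqtl", "text_mining", "somatic_mutation",
              "knockout_mouse", "functional_enrichment"},
    "PTPN23": {"missense", "functional_enrichment"},
    # Invented placeholder gene with a single criterion
    "GENE_X": {"text_mining"},
}

print(biological_risk_genes(example))  # {'CCND2': 5, 'PTPN23': 2}
```

The equal-weight sum is only reasonable when the criteria are not strongly correlated, which is why the text checks the pairwise correlations before scoring.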

Drug validation and discovery

If human genetics can validate drug targets, it can also be used to identify whether drugs currently approved for other indications could be used for the treatment of CRC. We present here an analysis of one potential application of GWAS data: drug repositioning. We obtained drug target genes and corresponding drug information from DrugBank [19] and the Therapeutic Targets Database (TTD) [20] on Oct 18, 2013. We selected drug target genes that had pharmacological activity and were effective in human orthologous models, and that were annotated with any approved, clinical trial or experimental drug. The drug target genes annotated to CRC drugs were manually extracted by professional oncologists. To reduce the false-positive rate, only drugs in current clinical use, as listed in the NCCN clinical practice guidelines for colorectal cancer, were included in the study [21]. We extracted genes in direct PPI with the biological CRC risk genes using Protein Interaction Network Analysis 2 (PINA2), which integrates six well-known manually curated PPI databases [22]. We then evaluated whether the protein products of the identified biological risk genes, or of any genes in direct PPI with them, were targets of approved CRC drugs or of drugs approved for other indications. Let x be the set of biological CRC risk genes and genes in direct PPI with them (n_x genes), y be the set of genes whose protein products are the direct targets of approved CRC drugs (n_y genes), and z be the set of genes whose protein products are the direct targets of all approved drugs (n_z genes). We defined n_x∩y and n_x∩z as the numbers of genes overlapping between x and y and between x and z, respectively. Hypergeometric distribution tests were used for the overlap analyses, and a p value of <0.05 was considered statistically significant.
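The overlap test can be sketched with the standard library alone. This is a generic upper-tail hypergeometric probability, offered as an illustration rather than as the authors' exact procedure: the function name is hypothetical, and the size of the background gene universe (N below) is not specified in the Methods, so it has to be supplied as an assumption by whoever runs the test.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: size of the background gene universe (ASSUMPTION: not given in the text)
    K: number of 'marked' genes in the universe, e.g. drug target genes (n_y or n_z)
    n: number of genes drawn, e.g. risk genes plus direct-PPI genes (n_x)
    k: observed overlap, e.g. the number of genes shared by x and y
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the exact probabilities of every overlap size from k up to the maximum
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return favourable / total


# Toy example: universe of 10 genes, 5 marked, 5 drawn, all 5 drawn are marked
p = hypergeom_upper_tail(k=5, N=10, K=5, n=5)
print(p)  # 1/252, about 0.004
```

Exact integer arithmetic via `math.comb` avoids the rounding problems that floating-point factorials would cause for gene universes of several thousand.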

Results

In the present study, 50 CRC-associated SNPs were obtained from the NHGRI GWAS catalogue (Table 1), and 140 genes were obtained based on proximity and linkage disequilibrium using snp2gene (S1 Table).
Table 1

Summary of 50 colorectal cancer GWAS risk alleles obtained from National Human Genome Research Institute.

Risk SNPs Region p-Value OR 95% CI
rs10505477 8q24.2 9E-7 1.13 [1.08-1.19]
rs6983267 8q24.2 8E-28 1.2 [1.16-1.24]
rs4939827 18q21.1 1E-9 1.06 [1.03-1.09]
rs10795668 10p14 4E-6 1.27 [1.14-1.40]
rs16892766 8q23 2E-7 1.14 [1.08-1.19]
rs3802842 11q23.1 8E-7 1.18 [1.11-1.27]
rs4779584 15q13 1E-7 1.14 [1.08-1.18]
rs4939827 18q21.1 8E-9 1.28 [1.18-1.39]
rs4939827 18q21.1 2E-8 1.18 [1.11-1.25]
rs6983267 8q24.2 6E-10 1.11 [1.08-1.15]
rs7014346 8q24.2 3E-13 1.12 [1.10-1.16]
rs10411210 19q13.1 2E-7 1.14 [1.06-1.19]
rs4444235 14q22 7E-10 1.07 [1.04-1.10]
rs961253 20p12 1E-14 1.27 [1.16-1.39]
rs9929218 16q22 3E-11 1.17 [1.12-1.23]
rs10936599 3q26.2 4E-8 1.35 [1.20-1.49]
rs11169552 12q13.1 3E-6 1.47 [1.25-1.72]
rs4925386 20q13.3 4E-7 1.24 [1.14-1.34]
rs6687758 1q41 2E-10 1.12 [1.08-1.16]
rs6691170 1q41 1E-8 1.1 [1.06-1.12]
rs6983267 8q24.2 9E-26 1.19 [1.15-1.23]
rs7758229 6q25.3 7E-11 1.24 [1.17-1.33]
rs11632715 15q13.3 3E-6 1.37 [1.20-1.56]
rs16969681 15q13.3 9E-7 1.13 [1.08-1.19]
rs1957636 14q22.2 2E-10 1.12 [1.09-1.16]
rs16892766 8q23 4E-10 1.17 [1.11-1.22]
rs3802842 11q23.1 3E-6 1.13 [1.08-1.20]
rs4779584 15q13 2E-10 1.12 [1.18-1.16]
rs4939827 18q21.1 6E-6 1.11 [1.06-1.15]
rs7315438 12q24.2 3E-18 1.27 [1.20-1.34]
rs1321311 6p21.2 1E-10 1.11 [1.08-1.15]
rs3824999 11q13.4 1E-10 1.1 [1.07-1.13]
rs5934683 Xp22.2 2E-10 1.09 [1.05-1.11]
rs7972465 12q13.13 8E-7 1.18 [1.11-1.27]
rs10911251 1q25 2E-6 1.28 [1.16-1.43]
rs11903757 2q32.3 3E-6 1.06 [0.88-1.29]
rs13130787 4q22 3E-8 1.09 [1.06-1.13]
rs17094983 14q23 1E-11 1.13 [1.09-1.18]
rs1912453 1q23 5E-7 1.12 [1.08-1.19]
rs2057314 6q22.1 7E-6 1.1 [1.05-1.14]
rs2128382 8q24.2 4E-8 1.16 [1.10-1.22]
rs3217810 12p13.3 9E-6 1.07 [1.04-1.11]
rs3217901 12p13.3 3E-7 1.09 [1.06-1.13]
rs3802842 11q23.1 8E-6 1.11 [1.06-1.16]
rs4779584 15q13 5E-8 1.18 [1.11-1.25]
rs4813802 20p12 2E-8 1.18 [1.11-1.24]
rs4939827 18q21.1 4E-7 1.14 [1.08-1.20]
rs59336 12q24.2 3E-8 1.04 [1.04-1.10]
rs6983267 8q24.2 8E-10 1.11 [1.08-1.15]
rs10774214 12p13.3 2E-6 1.28 [1.15-1.41]
rs10774214 12p13.3 3E-6 1.28 [1.12-1.42]
rs1665650 10q25.3 5E-10 1.17 [1.11-1.23]
rs2423279 20p12 6E-8 1.2 [1.12-1.28]
rs647161 5q31.1 2E-9 1.09 [1.06-1.12]
rs10114408 9q22.3 5E-10 1.17 [1.11-1.23]
rs10879357 12q21.1 4E-6 1.27 [1.15-1.41]
rs17730929 4q13.2 4E-7 1.11 [1.06-1.15]
rs367615 5q21 3E-6 1.08 [1.04-1.11]
rs39453 7p15.3 4E-10 1.08 [1.05-1.10]
rs4591517 3p24.3 1E-9 1.08 [1.06-1.11]
rs9365723 6q25.3 1E-12 1.16 [1.09-1.27]
rs12548021 8p12 3E-6 1.25 [1.14-1.39]
rs3104964 8q22.1 4E-7 1.09 [1.06-1.13]
rs8180040 3p21.3 5E-7 1.23 [1.14-1.34]

Functional annotations of CRC risk SNPs

Most SNPs (62%) were located in intergenic regions (S2 Table). Two SNPs were identified in linkage disequilibrium with missense SNPs (r² > 0.80; S3 Table). Next, we assessed 50 CRC risk loci for enriched epigenetic chromatin marks [12]. Of the 34 cell types investigated, we observed a significant enrichment of CRC risk alleles with H3K4me3 peaks in rectal mucosa cells (p = 0.00014 and 0.00023, respectively) (S4 Table).

Cis-expression quantitative trait loci (cis-eQTL)

Using the cis-eQTL data obtained from the PBMC study [13], we found that 13 risk SNPs showed cis-eQTL effects (p < 0.0016 and FDR < 0.5) (S5 Table).

PubMed text-mining

Twenty-four genes were prioritized based on data obtained by PubMed text mining using GRAIL with gene-based p < 0.05 [14] (S6 Table).

Protein-protein interaction (PPI)

Two genes were prioritized by PPI network using gene-based DAPPLE with p < 0.05 [15] (S7 Table).

Cancer somatic mutation

Among the 522 genes with registered somatic mutations obtained from the COSMIC database [16], a significant overlap was observed with genes associated with non-hematological cancers (5/6, P = 2.41E-05) (S8 Table).

Knockout mouse phenotype

We evaluated overlap with genes implicated in knockout mouse phenotypes[17]. Among the 30 categories of phenotypes, we observed nine categories significantly enriched with CRC risk genes (p < 0.05), led by craniofacial phenotype (S9 Table).

Functional enrichment analysis

GO analysis indicated that a few genes were enriched in three categories (S10 Table); two with KEGG pathways observed in cancer (p = 0.002), and one with small cell lung cancer (p = 0.011) were functionally related (S11 Table). Functional analysis by OMIM demonstrated enriched gene sets in colorectal diseases (S12 Table).
Based on these findings, we applied the following seven criteria to prioritize each of the 140 genes from the 50 CRC risk loci: (1) genes with CRC risk missense variant (n = 2); (2) cis-eQTL genes (n = 13); (3) genes prioritized by PubMed text mining (n = 24); (4) genes prioritized by PPI (n = 2); (5) cancer somatic mutation genes (n = 6); (6) genes prioritized by associated knockout mouse phenotypes (n = 40); and (7) genes prioritized by functional enrichment analysis (n = 31). Because these criteria showed weak correlations with each other (R² < 0.48; S13 Table), each gene was scored by the number of criteria that were met (scores ranged from 0–7 for each gene). Thirty-five genes (25.2%) had a score of ⩾2 and were defined as ‘biological risk genes’ (S1 Fig.). Three loci included multiple biological CRC risk genes (for example, ROS1 and GOPC at rs2057314) (Table 2).
Table 2

Biological genes in the CRC risk loci with a score≥2.

SNP | Gene | Nearest gene from risk SNP | Missense variant | cis-eQTL | PubMed text-mining | PPI | Cancer somatic mutation genes | Knockout mouse phenotype | Function | Score
rs3217810 CCND2 1 - 1 1 - 1 1 1 5
rs10774214 CCND2 1 - - 1 - 1 1 1 4
rs1321311 CDKN1A 1 - 1 1 - - 1 1 4
rs4444235 BMP4 1 - 1 1 - - 1 1 4
rs9929218 CDH1 1 - - 1 - 1 1 1 4
rs10114408 BARX1 1 - - 1 - - 1 1 3
rs10411210 RHPN2 1 - - 1 - - 1 1 3
rs10911251 LAMC1 1 - - 1 - - 1 1 3
rs10911251 LAMC2 - - - 1 - - 1 1 3
rs13130787 ATOH1 1 - - 1 - - 1 1 3
rs39453 CYCS 1 - 1 - - - 1 1 3
rs4813802 BMP2 1 - - 1 - - 1 1 3
rs4925386 LAMA5 1 - 1 1 - - - 1 3
rs4939827 SMAD7 1 - - 1 - - 1 1 3
rs59336 TBX3 1 - - 1 - - 1 1 3
rs647161 PITX1 - - - 1 - - 1 1 3
rs7315438 TBX3 1 - - 1 - - 1 1 3
rs11169552 ATF1 - - - - - 1 1 - 2
rs11632715 GREM1 - - - 1 - - - 1 2
rs12548021 DUSP4 1 - - - - - 1 1 2
rs16969681 GREM1 - - - 1 - - - 1 2
rs17094983 DACT1 1 - - 1 - - 1 - 2
rs1912453 RGS4 - - - - - - 1 1 2
rs1957636 CDKN3 1 - - 1 - - - 1 2
rs2057314 ROS1 - - - - - 1 1 - 2
rs2057314 GOPC 1 - - - - 1 - 1 2
rs367615 MAN2A1 1 - - - - - 1 1 2
rs3802842 C11orf53 - - - 1 1 - - - 2
rs4779584 GREM1 - - - 1 - - - 1 2
rs7758229 SLC22A3 1 - - - - - 1 1 2
rs8180040 PTPN23 - 1 - - - - - 1 2
rs8180040 PTH1R - - - - - - 1 1 2
rs8180040 SETD2 - - - - - 1 1 - 2
rs9365723 SYNJ2 1 - 1 - - - - 1 2
rs9929218 CDH3 - - - 1 - - 1 - 2
To provide empirical support for the pipeline, we analyzed the gene scores. Genes with higher biological scores were more likely to be the gene nearest to the risk SNP (62.8% for gene score ⩾ 2, 24% for gene score < 2; p < 0.001). Meanwhile, rectal mucosa cells showed a significantly higher proportion of overlap with H3K4me3 peaks than other cell types.
Finally, we evaluated the potential role of genetics in drug discovery for the treatment of CRC. Hypergeometric distribution tests were used for the overlap analyses. We obtained 11303 gene pairs from curated PPI databases, and 871 drug target genes corresponding to approved, clinical trial or experimental drugs for human diseases (S14 Table). For the sake of reliability, only CRC drugs used in first-line therapy were included in the study; eight target genes of approved CRC drugs were included (S15 Table). The 31 biological CRC risk genes were expanded with 553 genes in direct PPI with them to form the PPI network (S2 Fig. and S3 Fig.). We found an overlap with 5 of the 8 drug target genes of approved CRC drugs (5/8 vs 70/871, 12.09-fold enrichment, P = 0.00013). All 871 drug target genes (regardless of disease indication) overlapped with 70 genes from the PPI network, a 1.55-fold enrichment over that expected by chance alone (p = 0.00012), but 7.78-fold lower than the enrichment for currently approved CRC drugs (p = 1.78 × 10^-5). Examples of approved CRC therapies identified by this analysis included irinotecan, regorafenib, and cetuximab (Fig. 2).
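The fold-enrichment figures can be checked with simple arithmetic. The sketch below is illustrative only and rests on two assumptions that the text does not state explicitly: that the chance expectation is the fraction of a background set occupied by the 31 risk genes plus the 553 direct-PPI genes (the count given in the S2 Fig. caption), and that the 11303 figure serves as the size of that background. The function name is hypothetical.

```python
def fold_enrichment(hits: int, total: int, ref_hits: int, ref_total: int) -> float:
    """Ratio of the proportion hits/total to the reference proportion ref_hits/ref_total."""
    return (hits / total) / (ref_hits / ref_total)


NETWORK_GENES = 31 + 553  # risk genes + direct-PPI genes (S2 Fig. caption)
BACKGROUND = 11303        # ASSUMPTION: treated here as the background gene count

# Approved CRC drug targets (5 of 8) versus all drug targets (70 of 871)
crc_vs_all_drugs = fold_enrichment(5, 8, 70, 871)

# Each drug-target set versus the assumed chance expectation
crc_vs_chance = fold_enrichment(5, 8, NETWORK_GENES, BACKGROUND)
all_drugs_vs_chance = fold_enrichment(70, 871, NETWORK_GENES, BACKGROUND)

print(round(crc_vs_all_drugs, 2))  # 7.78
print(crc_vs_chance, all_drugs_vs_chance)
```

Under these assumptions the three ratios come out close to the reported 7.78-, 12.09- and 1.55-fold values, and the first is simply the quotient of the other two.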
Fig 2

Summary of connections between risk SNPs, biological candidate genes from each risk locus, genes from the PPI network and approved CRC drugs.

Black lines indicate connections.

The correlation of drugs approved for other diseases with the biological CRC risk genes was also assessed. One example of drug repositioning (Fig. 3) is the use of crizotinib, a drug approved for non-small cell lung cancer, for the treatment of CRC [16]. Arsenic trioxide, vorinostat, dasatinib, estramustine, and tamibarotene are also promising candidate drugs for the treatment of CRC (Fig. 4).
Fig 3

Connections between risk SNP, biological CRC genes and drugs indicated for other diseases.

Fig 4

Detailed summary of the connections between risk SNPs, biological candidate genes from each risk locus, genes from the PPI network and drugs indicated for other diseases.

Discussion

GWAS have identified numerous disease-associated genetic variants. However, significant obstacles have hampered our ability to identify the genes affected by causal variants and to elucidate the mechanisms by which genotype influences phenotype. Most reports have simply implicated the gene nearest to a GWAS hit without substantial evidence [1]. This study prioritized the most likely target genes, identifying a total of 31 biological CRC risk genes. Although biological CRC risk genes are more likely to be causal genes, this still needs confirmation by basic molecular studies using advanced technologies. Edwards et al. provided a pipeline for follow-up studies, which includes fine mapping of risk SNPs, prioritization of putative functional SNPs, and in vitro and in vivo experimental verification of predicted molecular mechanisms for identifying the targeted genes [1].
GWAS are criticized for their lack of clinical translation because of the small effect sizes of the identified variants. However, individually small effect sizes do not necessarily preclude clinical utility. Sanseau et al. proposed the use of GWAS for drug repositioning, which is regarded as a promising strategy in translational medicine [6]. For example, 3-hydroxy-3-methylglutaryl-coenzyme A reductase is the target of statins, a well-known class of cholesterol-lowering medications, and SNPs within this gene were unambiguously associated with low-density lipoprotein cholesterol levels in GWAS data [6]. Their study included all GWAS-associated genes selected from the GWAS catalog and did not specifically address CRC drugs. In the present study, we focused on repurposing drugs for CRC based on the prioritization of candidate genes in GWAS-identified loci. For example, crizotinib, arsenic trioxide, vorinostat, dasatinib, estramustine, and tamibarotene are promising repurposed drugs for CRC.
Although further investigations are necessary to confirm the results of this study, we believe that the drugs identified could be promising candidates for the treatment of CRC. GWAS data are useful not only for providing insights into the biology of diseases, but may also translate these leads into profitable opportunities in drug development. However, GWAS data do not provide detailed pathophysiological information; hence, the newly identified uses of old drugs may possibly reflect side effects [23]. Successful repurposing of a drug requires combining results from the published literature and clinical research.
Although this study has a number of strengths, it also has some limitations. Firstly, data from the PBMC study were used for the cis-eQTL analysis. Although eQTLs identified in one tissue type may be a useful surrogate for studying the genetics of gene expression in another tissue [24], tissue-specific eQTLs are probably more useful for understanding the pathogenesis of CRC [25]. Secondly, of the 34 cell types investigated, we observed a significant enrichment of risk SNPs in H3K4me3 peaks only in rectal mucosa cells; the enrichment was not significant in colon mucosa cells.
In this study, we integrated genetic data with statistical analysis, computational approaches, and publicly available large data sets to prioritize candidate genes and propose new targets for CRC drug treatment. We believe that the target genes and drugs selected by this approach could be promising leads in the development of candidate drugs for the treatment of CRC, although further investigations are warranted to confirm these results.

Supporting Information

S1 Fig. Histogram distribution of gene scores. Thirty-five genes with a score of ⩾2 were defined as ‘biological risk genes’. (TIF)

S2 Fig. Overlap of 31 biological genes plus 553 genes in direct PPI with them and drug target genes. We found an overlap with 5 of the 8 drug target genes of approved CRC drugs (12.09-fold enrichment, p = 1.78 × 10^-5). All 871 drug target genes (regardless of disease indication) overlapped with 70 genes from the PPI network, indicating a 1.55-fold higher enrichment than expected by chance alone (p = 1.20 × 10^-4), but 7.78-fold lower than the enrichment for CRC drugs (p = 1.30 × 10^-4). (TIF)

S3 Fig. PPI network of biological CRC risk genes and drug target genes. Pink: drug target genes; Orange: CRC risk genes; Cyan: direct PPI genes in PINA2 database. (TIF)

S1 Table. Summary of 140 candidate genes based on proximity and linkage disequilibrium. (XLSX)

S2 Table. Risk SNPs annotated by ANNOVAR. (XLSX)

S3 Table. Missense variant in linkage disequilibrium (r² > 0.8) with risk single nucleotide polymorphisms annotated by ANNOVAR. (XLSX)

S4 Table. Overlap of colorectal cancer risk single nucleotide polymorphisms with H3K4me3 peaks in cells. (DOCX)

S5 Table. cis-expression quantitative trait loci of colorectal cancer risk single nucleotide polymorphisms. (XLSX)

S6 Table. Genes prioritized by PubMed text mining. (XLSX)

S7 Table. Genes prioritized by protein-protein interaction network. (XLSX)

S8 Table. Overlap of colorectal cancer risk genes with cancer somatic mutation genes. (XLSX)

S9 Table. Genes prioritized by knockout mouse phenotype using hypergeometric distribution test. (DOCX)

S10 Table. Genes prioritized by GO enrichment analysis. (XLSX)

S11 Table. Genes prioritized by KEGG enrichment analysis. (XLSX)

S12 Table. Genes prioritized by OMIM enrichment analysis. (XLSX)

S13 Table. Correlations of biological candidate gene prioritization criteria. (XLSX)

S14 Table. A list of drug target genes. (DOCX)

S15 Table. Summary of approved drugs for colorectal cancer and target genes. (DOCX)
  25 in total

1.  Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources.

Authors:  Da Wei Huang; Brad T Sherman; Richard A Lempicki
Journal:  Nat Protoc       Date:  2009       Impact factor: 13.491

Review 2.  The Therapeutic Target Database: an internet resource for the primary targets of approved, clinical trial and experimental drugs.

Authors:  Xin Liu; Feng Zhu; Xiaohua Ma; Lin Tao; Jingxian Zhang; Shengyong Yang; Yuquan Wei; Yu Zong Chen
Journal:  Expert Opin Ther Targets       Date:  2011-05-28       Impact factor: 6.902

3.  ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data.

Authors:  Kai Wang; Mingyao Li; Hakon Hakonarson
Journal:  Nucleic Acids Res       Date:  2010-07-03       Impact factor: 16.971

4.  Polymorphic cis- and trans-regulation of human gene expression.

Authors:  Vivian G Cheung; Renuka R Nayak; Isabel Xiaorong Wang; Susannah Elwyn; Sarah M Cousins; Michael Morley; Richard S Spielman
Journal:  PLoS Biol       Date:  2010-09-14       Impact factor: 8.029

Review 5.  Mapping complex disease traits with global gene expression.

Authors:  William Cookson; Liming Liang; Gonçalo Abecasis; Miriam Moffatt; Mark Lathrop
Journal:  Nat Rev Genet       Date:  2009-03       Impact factor: 53.242

6.  Proteins encoded in genomic regions associated with immune-mediated disease physically interact and suggest underlying biology.

Authors:  Elizabeth J Rossin; Kasper Lage; Soumya Raychaudhuri; Ramnik J Xavier; Diana Tatar; Yair Benita; Chris Cotsapas; Mark J Daly
Journal:  PLoS Genet       Date:  2011-01-13       Impact factor: 5.917

7.  COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer.

Authors:  Simon A Forbes; Nidhi Bindal; Sally Bamford; Charlotte Cole; Chai Yin Kok; David Beare; Mingming Jia; Rebecca Shepherd; Kenric Leung; Andrew Menzies; Jon W Teague; Peter J Campbell; Michael R Stratton; P Andrew Futreal
Journal:  Nucleic Acids Res       Date:  2010-10-15       Impact factor: 16.971

8.  PINA v2.0: mining interactome modules.

Authors:  Mark J Cowley; Mark Pinese; Karin S Kassahn; Nic Waddell; John V Pearson; Sean M Grimmond; Andrew V Biankin; Sampsa Hautaniemi; Jianmin Wu
Journal:  Nucleic Acids Res       Date:  2011-11-08       Impact factor: 16.971

9.  Identifying relationships among genomic disease regions: predicting genes at pathogenic SNP associations and rare deletions.

Authors:  Soumya Raychaudhuri; Robert M Plenge; Elizabeth J Rossin; Aylwin C Y Ng; Shaun M Purcell; Pamela Sklar; Edward M Scolnick; Ramnik J Xavier; David Altshuler; Mark J Daly
Journal:  PLoS Genet       Date:  2009-06-26       Impact factor: 5.917

10.  DrugBank: a knowledgebase for drugs, drug actions and drug targets.

Authors:  David S Wishart; Craig Knox; An Chi Guo; Dean Cheng; Savita Shrivastava; Dan Tzur; Bijaya Gautam; Murtaza Hassanali
Journal:  Nucleic Acids Res       Date:  2007-11-29       Impact factor: 16.971

