Rachel J J Elands, Colinda C J M Simons, Mona Riemenschneider, Aaron Isaacs, Leo J Schouten, Bas A Verhage, Kristel Van Steen, Roger W L Godschalk, Piet A van den Brandt, Monika Stoll, Matty P Weijenberg.
Abstract
Data from genome-wide association studies (GWAS) suggest that single nucleotide polymorphisms (SNPs) associated with complex diseases or traits tend to co-segregate in regions of low recombination that harbour functionally linked gene clusters. This phenomenon allows a limited number of SNPs to be selected from GWAS repositories for large-scale studies investigating shared mechanisms between diseases. For example, we were interested in shared mechanisms between adult-attained height and post-menopausal breast cancer (BC) and colorectal cancer (CRC) risk, because height is a risk factor for these cancers, though likely not a causal factor. Using SNPs from public GWAS repositories at p-values < 1 × 10⁻⁵ and a genomic sliding window of 1 megabase pair, we identified SNP clusters including at least one SNP associated with height and one SNP associated with either post-menopausal BC or CRC risk (or both). SNPs were annotated to genes using HapMap and GRAIL and analysed for significantly overrepresented pathways using ConsensusPathDB. Twelve clusters including 56 SNPs annotated to 26 genes were prioritised because these included at least one height- and one BC risk- or CRC risk-associated SNP annotated to the same gene. Annotated genes were involved in Indian hedgehog signalling (p-value = 7.78 × 10⁻⁷) and several cancer site-specific pathways. This systematic approach identified a limited number of clustered SNPs, which pinpoint potential shared mechanisms linking the complex phenotypes height, post-menopausal BC and CRC.
Year: 2017 PMID: 28117334 PMCID: PMC5259777 DOI: 10.1038/srep41034
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Flow diagram with overview of SNP selection methodology and the corresponding results.
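The selection step summarised in the abstract (p-value threshold, 1 megabase window, at least one height SNP plus one cancer SNP per cluster) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the `Snp` record, the function name `find_shared_clusters`, the example identifiers and positions, and the rule that a window is anchored at the first SNP of each cluster are all assumptions made for the sketch.

```python
from collections import defaultdict
from typing import NamedTuple


class Snp(NamedTuple):
    rsid: str        # hypothetical identifier, not a real rsID
    chrom: str
    pos: int         # base-pair position on the chromosome
    phenotype: str   # "H" (height), "BC" or "CRC"
    p_value: float


def find_shared_clusters(snps, window_bp=1_000_000, p_threshold=1e-5):
    """Group SNPs into windows and keep windows shared by height and cancer."""
    # Keep only SNPs below the GWAS p-value threshold, grouped by chromosome.
    by_chrom = defaultdict(list)
    for snp in snps:
        if snp.p_value < p_threshold:
            by_chrom[snp.chrom].append(snp)

    clusters = []
    for chrom in sorted(by_chrom):
        current = []
        for snp in sorted(by_chrom[chrom], key=lambda s: s.pos):
            # Assumption: a cluster spans at most window_bp from its first SNP.
            if current and snp.pos - current[0].pos > window_bp:
                clusters.append(current)
                current = []
            current.append(snp)
        if current:
            clusters.append(current)

    def is_shared(cluster):
        phenotypes = {s.phenotype for s in cluster}
        return "H" in phenotypes and bool(phenotypes & {"BC", "CRC"})

    return [c for c in clusters if is_shared(c)]


# Made-up input data, for illustration only.
example = [
    Snp("snp_a", "2", 1_000_000, "H", 1e-8),
    Snp("snp_b", "2", 1_400_000, "BC", 1e-6),
    Snp("snp_c", "2", 5_000_000, "H", 1e-9),
    Snp("snp_d", "6", 2_000_000, "CRC", 1e-4),  # fails the p-value threshold
    Snp("snp_e", "6", 2_100_000, "H", 1e-7),
]
clusters = find_shared_clusters(example)
print([[s.rsid for s in c] for c in clusters])  # [['snp_a', 'snp_b']]
```

Anchoring each window at its first SNP keeps every cluster within 1 Mb end to end; a gap-based rule (chaining neighbours less than 1 Mb apart) would be an equally plausible reading of "sliding window" and could produce longer clusters.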
Overview of the prioritised SNP clusters in which at least one height and one post-menopausal breast or colorectal cancer risk-associated SNP were annotated to the same gene as based on either HapMap or GRAIL, complemented by the SNP-annotation to biological regulatory function information and gene-annotation to enriched pathway and gene ontology categories.
| SNP ID (GWAS catalogue) | Phenotype | Genomic region (Ensembl Genome Browser release 81) | Chromosome and cytogenetic band (Ensembl Genome Browser release 81) | LD tag | RegulomeDB score | Cis-eQTL, transcription factor binding (RegulomeDB), mapped gene (HapMap 37) and annotated gene (GRAIL) | Gene ontology terms #1 to #3 (ConsensusPathDB) | Pathway (ConsensusPathDB) |
|---|---|---|---|---|---|---|---|---|
| rs13387042 | BC | intergenic | 2q | 0 | 0 | | ✓ ✓ | |
| rs2553026 | H | enhancer | | 0 | 0 | NA | | |
| rs1351164 | H | intron | | 0 | 0 | NA | | |
| rs16857609 | BC | intron | | 0 | 5 | NA | | |
| rs6435999 | H | intron | | 0 | 0 | | | |
| rs3791950 | H | intergenic | | 0 | 2b | | | |
| rs10187066 | H | intron | | 0 | 1f | | | Hedgehog signalling |
| rs12470505* | H | upstream gene | | 1 | 1f | | ✓ ✓ ✓ | Hedgehog signalling |
| rs1052483* | H | exon | | 1 | 1f | | ✓ ✓ ✓ | Hedgehog signalling |
| rs6724465* | H | intron | | 1 | 1f | | ✓ ✓ ✓ | Hedgehog signalling |
| rs16859517 | H | intergenic | | 0 | 5 | | ✓ ✓ ✓ | Hedgehog signalling |
| rs9790517 | BC | intron | 4q | 0 | 0 | | ✓ ✓ | |
| rs10010325 | H | intron | | 0 | 0 | | ✓ ✓ | |
| rs6855629 | H | intron | | 0 | 6 | | | |
| rs526896* | H | intergenic | 5q | 1 | 5 | | ✓ ✓ | |
| rs31198* | H | intron | | 1 | 5 | | ✓ ✓ | |
| rs647161 | CRC | intron | | 0 | 5 | | ✓ ✓ | |
| rs1047014 | H | upstream gene | 6p | 0 | 5 | | ✓ ✓ | |
| rs16882214 | BC | intergenic | | 0 | 0 | NA | | |
| rs2322633 | H | intron | 6q | 0 | 6 | | | |
| rs310405 | H | intergenic | | 0 | 0 | NA | | |
| rs17530068 | BC | intergenic | | 0 | 0 | | | |
| rs961764 | H | intergenic | 6q | 0 | 0 | | | |
| rs2057314 | CRC | intron | | 0 | 4 | | | |
| rs9285425 | H | intron | | 0 | 5 | | | |
| rs3757318* | BC | intron | 6q | 1 | 2c | | | |
| rs3734805* | BC | 3 prime UTR variant | | 1 | 0 | | | |
| rs2046210 | BC | intergenic | | 0 | 1f | | | |
| rs9383938 | BC | intron | | 0 | 5 | | | |
| rs543650 | H | intron | | 0 | 0 | | ✓ ✓ ✓ | |
| rs9383951 | BC | intergenic | | 0 | 4 | | ✓ ✓ ✓ | |
| rs2982712 | H | intron | | 0 | 0 | | ✓ ✓ ✓ | |
| rs10114408 | CRC | intergenic | 9q | 0 | 6 | | ✓ ✓ | |
| rs1257763 | H | intergenic | | 0 | 0 | | ✓ ✓ | |
| rs16910061 | H | upstream gene | | 0 | 5 | | ✓ ✓ | |
| rs473902 | H | intron | | 0 | 3a | | ✓ ✓ ✓ | Hedgehog signalling |
| rs10512248 | H | promoter | | 0 | 6 | | ✓ ✓ ✓ | Hedgehog signalling |
| rs2025151 | H | enhancer | | 0 | 1f | | ✓ ✓ | |
| rs10816533 | H | promoter | | 0 | 1f | | ✓ ✓ | |
| rs704010 | BC | intron | 10q | 0 | 2b | | ✓ ✓ | |
| rs7916441* | H | enhancer | | 1 | 5 | | ✓ ✓ | |
| rs780151* | H | intron | | 1 | 5 | | ✓ ✓ | |
| rs12355688 | BC | exon | | 0 | 4 | | ✓ ✓ | |
| rs2145998* | H | intergenic | | 1 | 5 | | ✓ ✓ | |
| rs941873* | H | promoter flanking | | 1 | 4 | | ✓ ✓ | |
| rs2588809 | BC | intron | 14q | 0 | 0 | | | |
| rs1570106 | H | intron | | 0 | 0 | | | |
| rs999737 | BC | intron | | 0 | 6 | | | |
| rs961253 | CRC | intergenic | 20p | 0 | 5 | | ✓ | |
| rs967417* | H | intergenic | | 1 | 6 | | ✓ ✓ ✓ | Hedgehog signalling |
| rs2145270* | H | intergenic | | 1 | 0 | | ✓ ✓ ✓ | Hedgehog signalling |
| rs2145272* | H | intergenic | | 1 | 3a | | ✓ ✓ ✓ | Hedgehog signalling |
| rs4813802 | CRC | promoter | | 0 | 3a | | ✓ ✓ ✓ | Hedgehog signalling |
| rs139909 | H | intron | 22q | 0 | 2b | | ✓ ✓ | |
| rs5757949 | H | intron | | 0 | 5 | | | |
| rs6001930 | BC | intron | | 0 | 5 | | | |
Abbreviations: eQTL, expression quantitative trait loci; GWAS, genome-wide association study; LD, linkage disequilibrium; NA, data not available in GWAS catalogue; SNP, single nucleotide polymorphism.
(a) SNPs with the highest level of regulatory evidence were prioritised, indicated by the footnote (a). In cases where the regulatory evidence was equal, SNPs in high LD were prioritised according to the most significant p-value.
(b) Phenotype specifies whether a SNP derived from the GWAS catalogues by Hindorff et al. (ref. 14) and Johnson and O’Donnell (ref. 15) is associated with height (H), breast cancer risk (BC) or colorectal cancer risk (CRC).
(c) An LD tag equal to one denotes two or more SNPs within the same cluster that are in high LD (r² > 0.7).
(d) RegulomeDB score for the putative regulatory function of a SNP.
(e) Genes for which the SNP is a cis-eQTL according to RegulomeDB. Cis-eQTLs are SNPs that are associated with mRNA expression of one or more nearby genes.
(f) Known transcription factor proteins that bind to the genomic coordinates of the SNP according to RegulomeDB.
(g) SNPs were annotated to a gene using the physical mapping of a SNP to a gene according to HapMap.
(h) Gene annotations using GRAIL (http://software.broadinstitute.org/mpg/grail/) were based on gene relationships among the complete set of SNPs listed in this table (S1 Table). In GRAIL, SNPs are annotated to genes by integrating the genomic location of a SNP derived from HapMap release 22 with the biological data of a SNP obtained through text-mining using PubMed 2014. GRAIL was set to correct for biases introduced by variable gene size when annotating the SNPs to genes. Large genes are more likely to have significant SNPs, and thus have a higher probability of being included in the regions that are being tested (Book: Computational Methods for Genetics of Complex Traits).
(i) Check-marks indicate whether the GRAIL gene annotation for a particular SNP contributed to the finding that the top three gene ontology terms, i.e. (#1) regulation of biosynthetic process (GO:0009889), (#2) regulation of macromolecule metabolic process (GO:0060255), and (#3) epithelial cell proliferation (GO:0050673), were overrepresented in the total set of gene annotations from GRAIL (overrepresentation analyses were performed using ConsensusPathDB).
(j) Indicates whether a gene mapped to a SNP is annotated to the overrepresented Indian hedgehog signalling pathway according to ConsensusPathDB.
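The prioritisation rule in the first footnote (strongest regulatory evidence first, ties broken by the most significant p-value) can be illustrated with a short sketch. It assumes the usual RegulomeDB category ordering, in which 1a is the strongest evidence and 6 the weakest, and it treats any other value (such as the 0 that appears in the table) as having no usable evidence. The function names, the tuple layout and the example values are hypothetical.

```python
# RegulomeDB categories from strongest to weakest evidence (assumed ordering).
REGULOME_ORDER = [
    "1a", "1b", "1c", "1d", "1e", "1f",
    "2a", "2b", "2c", "3a", "3b", "4", "5", "6",
]
RANK = {score: i for i, score in enumerate(REGULOME_ORDER)}


def evidence_rank(score: str) -> int:
    """Lower rank means stronger evidence; unknown scores rank last."""
    cleaned = score.replace(" ", "").lower()
    return RANK.get(cleaned, len(REGULOME_ORDER))


def prioritise(ld_group):
    """Pick one SNP from a group of SNPs in high LD (r² > 0.7).

    ld_group holds (snp_id, regulome_score, p_value) tuples. The SNP with the
    strongest regulatory evidence wins; ties go to the smallest p-value.
    """
    return min(ld_group, key=lambda snp: (evidence_rank(snp[1]), snp[2]))


# Made-up identifiers and p-values, for illustration only.
group = [
    ("snp_a", "1f", 3e-9),
    ("snp_b", "1 f", 1e-12),  # same category as snp_a, smaller p-value
    ("snp_c", "5", 1e-20),    # smallest p-value but weaker evidence
]
best = prioritise(group)
print(best[0])  # snp_b
```

Sorting on a `(rank, p_value)` tuple keeps the two criteria strictly ordered, so a very small p-value never overrides weaker regulatory evidence.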
Overrepresented pathways in prioritised SNP selection (a).
| Pathways | Set size | Number of genes from set in annotated gene list | Genes | p-value | q-value | Pathway source |
|---|---|---|---|---|---|---|
| Overrepresented pathways using the genes annotated to the prioritised set of SNPs associated with height, post-menopausal breast and colorectal cancer risk. | | | | | | |
| Hedgehog signalling pathway | 52 | 4 | | 7.78 × 10−7 | 2.08 × 10−5 | KEGG |
| Hedgehog signalling pathway | 16 | 3 | | 1.49 × 10−6 | 2.08 × 10−5 | WikiPathways |
| Hedgehog | 25 | 3 | | 6.06 × 10−6 | 5.56 × 10−5 | NetPath |
| Ligand-receptor interactions | 8 | 2 | | 5.76 × 10−5 | 3.77 × 10−4 | Reactome |
| Basal cell carcinoma | 55 | 3 | | 6.73 × 10−5 | 1.63 × 10−3 | KEGG |
| HH-Core | 19 | 2 | | 3.48 × 10−4 | 1.63 × 10−3 | SignaLink |
| Signalling events mediated by the Hedgehog family | 23 | 2 | | 5.14 × 10−4 | 2.06 × 10−3 | PID |
| Hedgehog 'on' state | 42 | 2 | | 1.41 × 10−3 | 4.93 × 10−3 | Reactome |
| Hedgehog signalling events mediated by Gli proteins | 50 | 2 | | 2.24 × 10−3 | 6.97 × 10−3 | PID |
| Endochondral ossification | 64 | 2 | | 3.83 × 10−3 | 1.07 × 10−3 | WikiPathways |
| TGF-beta signalling pathway | 80 | 2 | | 5.96 × 10−3 | 1.48 × 10−3 | KEGG |
| Signalling by Hedgehog | 87 | 2 | | 6.41 × 10−3 | 1.48 × 10−3 | Reactome |
| Class B/2 (Secretin family receptors) | 88 | 2 | | 6.87 × 10−3 | 1.48 × 10−3 | Reactome |
| Overrepresented pathways using the genes annotated to the prioritised SNPs associated with height and post-menopausal breast cancer risk. | | | | | | |
| Hedgehog signalling pathway | 16 | 2 | | 1.13 × 10−4 | 1.35 × 10−3 | WikiPathways |
| Hedgehog | 25 | 2 | | 2.81 × 10−4 | 1.68 × 10−3 | NetPath |
| Hedgehog signalling pathway | 52 | 2 | | 1.18 × 10−3 | 4.70 × 10−3 | KEGG |
| Signalling by | 153 | 2 | | 9.02 × 10−3 | 2.19 × 10−2 | Reactome |
| Androgen receptor | 149 | 2 | | 9.14 × 10−3 | 2.19 × 10−2 | NetPath |
| Overrepresented pathways using the genes annotated to the prioritised SNPs associated with height and colorectal cancer risk. | | | | | | |
| Hedgehog signalling pathway | 52 | 2 | | 2.81 × 10−4 | 6.34 × 10−4 | KEGG |
| Basal cell carcinoma | 55 | 2 | | 2.53 × 10−4 | 6.34 × 10−4 | KEGG |
Abbreviations: SNP, single nucleotide polymorphism.
(a) Overrepresented pathways were retrieved using the SNP-gene annotations from GRAIL.
(b) The p-values are corrected for multiple testing using the false discovery rate method and are shown as q-values.
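For readers unfamiliar with overrepresentation analysis, the two quantities reported in the table can be sketched as follows. The sketch assumes a one-sided hypergeometric test for the p-value and the Benjamini-Hochberg procedure for the q-value; the source states only that a false discovery rate method was used, so both choices are assumptions. The background size of 20,000 genes is also hypothetical, so the printed p-value is illustrative and is not expected to reproduce the tabulated one.

```python
from math import comb


def hypergeom_sf(k, background, set_size, list_size):
    """P(X >= k): chance of seeing at least k pathway genes in the gene list.

    background: number of genes that could have been drawn (assumed value)
    set_size:   number of genes in the pathway
    list_size:  number of genes in the annotated gene list
    """
    upper = min(set_size, list_size)
    favourable = sum(
        comb(set_size, i) * comb(background - set_size, list_size - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(background, list_size)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Set size 52 and overlap 4 are taken from the first table row, and 26 genes
# from the abstract; the background of 20,000 genes is a hypothetical value.
p = hypergeom_sf(k=4, background=20_000, set_size=52, list_size=26)
print(f"illustrative p-value: {p:.3g}")

# Made-up p-values to show the correction step.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Exact integer binomials from `math.comb` avoid the rounding problems that floating-point factorials would cause at genome-scale background sizes.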
Top ten most significantly overrepresented gene-ontology terms in prioritised SNP selection (a).
| GO terms | Set size | Number of genes from set in annotated gene list | p-value | q-value | Sub-analysis: height and breast cancer risk | Sub-analysis: height and colorectal cancer risk |
|---|---|---|---|---|---|---|
| GO:0009889 regulation of biosynthetic process | 4061 | 15 | 4.85 × 10−6 | 6.21 × 10−4 | ✓ | |
| GO:0060255 regulation of macromolecule metabolic process | 5358 | 16 | 2.85 × 10−5 | 1.80 × 10−3 | ✓ | |
| GO:0050673 epithelial cell proliferation | 323 | 5 | 3.29 × 10−5 | 3.30 × 10−2 | ✓ | ✓ |
| GO:0048754 branching morphogenesis of an epithelial tube | 170 | 4 | 4.55 × 10−5 | 1.80 × 10−3 | ✓ | ✓ |
| GO:0090304 nucleic acid metabolic process | 4893 | 15 | 5.61 × 10−5 | 1.80 × 10−3 | ✓ | |
| GO:0016070 RNA metabolic process | 4339 | 14 | 7.48 × 10−5 | 1.81 × 10−3 | ✓ | |
| GO:0061138 morphogenesis of a branching epithelium | 202 | 4 | 8.47 × 10−5 | 1.81 × 10−3 | ✓ | |
| GO:0048732 gland development | 407 | 5 | 9.38 × 10−5 | 3.30 × 10−3 | ✓ | ✓ |
| GO:0060322 head development | 678 | 6 | 10.40 × 10−4 | 3.30 × 10−3 | ✓ | |
| GO:0001763 morphogenesis of a branching structure | 213 | 4 | 10.50 × 10−4 | 3.30 × 10−3 | ✓ | |
Abbreviations: GO, gene ontology; SNP, single nucleotide polymorphism.
(a) Overrepresentation analyses for GO terms were performed using the SNP-gene annotations from GRAIL.
(b) The p-values are corrected for multiple testing using the false discovery rate method and are available as q-values.
(c) The check-mark indicates which of the top 10 GO terms from the main GO overrepresentation analysis were also present in separate analyses for breast and colorectal cancer risk.