Nima C Emami, Linda Kachuri, Travis J Meyers, Rajdeep Das, Joshua D Hoffman, Thomas J Hoffmann, Donglei Hu, Jun Shan, Felix Y Feng, Elad Ziv, Stephen K Van Den Eeden, John S Witte.
Abstract
Here we train cis-regulatory models of prostate tissue gene expression and impute expression transcriptome-wide for 233,955 European ancestry men (14,616 prostate cancer (PrCa) cases, 219,339 controls) from two large cohorts. Among 12,014 genes evaluated in the UK Biobank, we identify 38 associated with PrCa, many replicating in the Kaiser Permanente RPGEH. We report the association of elevated TMPRSS2 expression with increased PrCa risk (independent of a previously reported risk variant) and with increased tumoral expression of the TMPRSS2:ERG fusion oncogene in The Cancer Genome Atlas, suggesting a novel germline-somatic interaction mechanism. Three novel genes, HOXA4, KLK1, and TIMM23, additionally replicate in the RPGEH cohort. Furthermore, four genes, MSMB, NCOA4, PCAT1, and PPP1R14A, are associated with PrCa in a trans-ethnic meta-analysis (N = 9117). Many genes exhibit evidence for allele-specific transcriptional activation by PrCa master regulators (including androgen receptor) in Position Weight Matrix, ChIP-Seq, and Hi-C experimental data, suggesting common regulatory mechanisms for the associated genes.
Year: 2019 PMID: 31308362 PMCID: PMC6629701 DOI: 10.1038/s41467-019-10808-7
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 TWAS experimental design and comparison of reference panel model performance. (a) Experimental design for TWAS study of prostate cancer risk. (b) Scatter plot comparison of the cross-validated performance r² for 1884 gene expression models derived from GTEx prostate data (N = 87 subjects) vs. the training dataset for the present study (N = 471). In addition to a linear regression line and 95% confidence interval, marginal histograms and density curves are included for both the x-axis (training data model performance) and y-axis (GTEx model performance), with the minimum and mean r² values also labeled. Performance r² was computed based on in-sample cross-validation in each respective dataset. (c) Scatter plot comparison of the out-of-sample model performance for models derived from GTEx vs. the training dataset. Both sets of models were applied to a TCGA normal prostate tissue dataset (N = 45) to measure the relationship between observed and imputed expression for 1753 genes. The correlation (Spearman’s rho) between imputed and observed expression is illustrated in red, while the mean squared error of the predictions is illustrated in violet, both with marginal density curves.
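The out-of-sample comparison in panel c can be illustrated with a minimal sketch: expression is imputed as a weighted sum of genotype dosages, then compared with observed expression using Spearman's rho and mean squared error. The variant names, weights, dosages, and observed values below are hypothetical, and the linear form of the predictor is an assumption suggested by the regression models described in the captions, not a reproduction of the authors' pipeline.

```python
def impute_expression(weights, dosages, intercept=0.0):
    """Predict expression per subject as intercept + sum(weight * dosage)."""
    return [
        intercept + sum(w * subject.get(snp, 0.0) for snp, w in weights.items())
        for subject in dosages
    ]

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def mean_squared_error(predicted, observed):
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)

# Hypothetical model weights and dosages (0, 1, or 2 copies of the effect allele).
weights = {"snp_a": 0.5, "snp_b": -0.2}
dosages = [
    {"snp_a": 0, "snp_b": 0},
    {"snp_a": 1, "snp_b": 0},
    {"snp_a": 2, "snp_b": 1},
    {"snp_a": 1, "snp_b": 2},
]
observed = [1.0, 1.6, 2.1, 1.2]  # hypothetical measured expression

imputed = impute_expression(weights, dosages)
print(spearman(imputed, observed), mean_squared_error(imputed, observed))
```

Rank correlation and squared error answer different questions: a model can order subjects correctly while being poorly calibrated in absolute terms, which is presumably why the figure reports both.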
Fig. 2 TWAS Discovery Associations. Two Manhattan plots depicting the transcriptome-wide associations with prostate cancer risk for genes with a positive direction of effect (“Risk Genes”, top) and genes with a negative direction of effect (“Protective Genes”, bottom) in the UK Biobank discovery cohort (N = 7963 prostate cancer cases, 189,218 male controls). For both Manhattan plots, the associations (logistic regression −log10(p-value), y-axis) are plotted against the chromosome and position (x-axis) of the transcription start site of a given gene, with non-significant genes on odd and even chromosomes colored in alternating shades. Thresholds for significant (p < 4.16 × 10^−6) and suggestive (4.16 × 10^−6 < p < 4.16 × 10^−5) associations are illustrated by dashed gray lines, and genes nominally significant (p < 0.05) or unreplicated in the Kaiser Permanente RPGEH replication cohort are illustrated as red triangles and pink circles, respectively.
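The significance threshold quoted in this caption is consistent with a Bonferroni correction over the 12,014 genes evaluated in the UK Biobank, and the suggestive band extends one order of magnitude above it. The sketch below reproduces that arithmetic; the family-wise error rate of 0.05 is an assumption that matches the reported value, and the `classify` helper is illustrative only.

```python
# Number of genes tested in the UK Biobank discovery analysis (from the abstract).
N_GENES = 12_014
ALPHA = 0.05  # assumed family-wise error rate

bonferroni = ALPHA / N_GENES   # threshold for a significant association
suggestive = 10 * bonferroni   # upper bound of the suggestive band

def classify(p):
    """Label a discovery p-value using the two thresholds above."""
    if p < bonferroni:
        return "significant"
    if p < suggestive:
        return "suggestive"
    return "not significant"

print(f"{bonferroni:.3g} {suggestive:.3g}")
# Discovery p-values for TMPRSS2 and KLK1 as reported in the summary table.
print(classify(2.42e-9), classify(7.71e-6))
```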
Discovery and replication analysis summary statistics for significant and suggestive genes
| Gene | Discovery (UK Biobank) Beta (SE); P | Replication (KP) Beta (SE); P | Model r² | Locus | Meta-analysis P |
|---|---|---|---|---|---|
| MSMB | −1.63 (0.12); 2.97 × 10^−41 | −1.48 (0.14); 1.68 × 10^−25 | 0.124 | 10q11.22 | 7.00 × 10^−65 |
| NCOA4 | 0.75 (0.06); 1.34 × 10^−38 | 0.66 (0.06); 6.50 × 10^−25 | 0.402 | 10q11.22 | 1.53 × 10^−61 |
| HNF1B | 2.03 (0.16); 5.89 × 10^−36 | 1.76 (0.19); 1.50 × 10^−20 | 0.145 | 17q12 | 1.50 × 10^−54 |
| AGAP7 | 1.21 (0.12); 2.05 × 10^−24 | 0.60 (0.10); 7.88 × 10^−9 | 0.204 | 10q11.22 | 1.90 × 10^−28 |
| POU5F1B | 3.64 (0.44); 8.40 × 10^−17 | 3.42 (0.53); 1.11 × 10^−10 | 0.033 | 8q24.21 | 6.44 × 10^−26 |
| C19orf48 | 2.95 (0.39); 2.46 × 10^−14 | 2.04 (0.40); 2.50 × 10^−7 | 0.150 | 19q13.33 | 1.34 × 10^−19 |
| KLK15 | 1.65 (0.23); 1.26 × 10^−12 | 1.22 (0.27); 4.57 × 10^−6 | 0.056 | 19q13.33 | 6.05 × 10^−17 |
| PCAT1 | −1.28 (0.18); 5.01 × 10^−12 | −1.41 (0.21); 1.85 × 10^−11 | 0.072 | 8q24.21 | 6.47 × 10^−22 |
| TMPRSS2 | 0.50 (0.08); 2.42 × 10^−9 | 0.24 (0.08); 3.33 × 10^−3 | 0.154 | 21q22.3 | 3.84 × 10^−10 |
| FAM57A | −0.50 (0.08); 4.23 × 10^−9 | −0.26 (0.10); 7.49 × 10^−3 | 0.376 | 17p13.3 | 5.69 × 10^−10 |
| PPP1R14A | 1.80 (0.31); 9.99 × 10^−9 | 1.48 (0.37); 6.07 × 10^−5 | 0.206 | 19q13.2 | 3.31 × 10^−12 |
| ZFP36L2 | −4.06 (0.74); 4.26 × 10^−8 | −3.39 (0.87); 9.71 × 10^−5 | 0.035 | 2p21 | 2.10 × 10^−11 |
| BHLHA15 | 1.80 (0.33); 5.18 × 10^−8 | 0.79 (0.28); 4.24 × 10^−3 | 0.067 | 7q21.3 | 1.34 × 10^−8 |
| GEMIN4 | −2.16 (0.41); 1.39 × 10^−7 | −1.45 (0.48); 2.65 × 10^−3 | 0.080 | 17p13.3 | 2.52 × 10^−9 |
| STK25 | 4.97 (1.02); 9.85 × 10^−7 | 3.80 (1.01); 1.76 × 10^−4 | 0.100 | 2q37.3 | 9.82 × 10^−10 |
| KLK1 | 0.36 (0.08); 7.71 × 10^−6 | 0.31 (0.07); 6.24 × 10^−6 | 0.143 | 19q13.33 | 2.27 × 10^−10 |
| HOXA4 | −5.71 (1.31); 1.43 × 10^−5 | −1.89 (0.94); 0.04 | 0.067 | 7p15.2 | 3.13 × 10^−5 |
| VPS53 | −2.30 (0.53); 1.68 × 10^−5 | −1.40 (0.51); 5.79 × 10^−3 | 0.259 | 17p13.3 | 6.90 × 10^−7 |
| TIMM23 | 3.31 (0.79); 2.77 × 10^−5 | 3.46 (0.93); 1.89 × 10^−4 | 0.080 | 10q11.22 | 2.01 × 10^−8 |
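This excerpt does not state how the discovery and replication estimates were combined. One common choice is a fixed-effects inverse-variance-weighted meta-analysis, sketched below under that assumption using the MSMB estimates from the table. Because the tabulated betas and standard errors are rounded to two decimals, the resulting p-value should only be expected to land near the reported one, not to match it.

```python
import math

def ivw_meta(estimates):
    """Fixed-effects inverse-variance-weighted meta-analysis.

    estimates: list of (beta, se) pairs, one per cohort.
    Returns (pooled beta, pooled SE, two-sided p-value from a normal z-test).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided normal tail probability; erfc stays accurate for very small p.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p

# MSMB: discovery (UK Biobank) and replication (KP) beta and SE from the table.
beta, se, p = ivw_meta([(-1.63, 0.12), (-1.48, 0.14)])
print(f"beta={beta:.3f} se={se:.3f} p={p:.2e}")
```

Weighting by 1/SE² gives the larger, more precise cohort more influence on the pooled estimate, which is why the pooled beta sits closer to the discovery value.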
Fig. 3 TWAS analysis conditional upon prostate cancer risk GWAS variants and correlation between imputed TMPRSS2 expression and observed ERG expression in TCGA prostate tumors. (a) Comparison of the associations in the UK Biobank discovery cohort before (red or pink) and after (blue) adjusting a gene’s association (y-axis, −log10(p-value)) for the genotypes at the previously reported lead variant for an adjacent prostate cancer risk GWAS locus. When the lead variant was not present in the imputed UK Biobank genotype dataset, a suitable proxy (r² > 0.8 in 1000 Genomes Phase III EUR) was used if available. The p-value threshold for Bonferroni-corrected significance (logistic regression p < 4.16 × 10^−6) is illustrated by a dashed black line, and the suggestive p-value threshold by a dashed grey line. Genes nominally significant (p < 0.05) or unreplicated in the Kaiser Permanente RPGEH replication cohort are illustrated as red triangles and pink circles, respectively. (b) Scatter plot illustrating the relationship between imputed expression of TMPRSS2 in normal prostate tissue as predicted by germline cis-eQTL genotypes (x-axis) and observed tumoral expression of ERG (y-axis) in prostate cancer cases from The Cancer Genome Atlas (TCGA). Data are colored by TMPRSS2:ERG (T2E) fusion status for T2E-positive (orange, N = 101) and T2E-negative (green, N = 161) subjects, as inferred from paired-end RNA-Seq data. Linear regression lines and 95% confidence intervals illustrate the respective means and trends for T2E-positive and T2E-negative subjects.
Replicated genes with eQTLs in or tagging VCaP ChIP-Seq transcription factor binding sites
| Gene | VCaP ChIP-Seq TFBS | Variant(s) (hg19 position) |
|---|---|---|
| AGAP7 | AR | rs58186870 (chr10:51812898), rs58677292 (chr10:51812896), rs56106241 (chr10:51812825) |
| BHLHA15 | AR | rs6975156 (chr7:97925533), rs7789380 (chr7:97956179), rs10953245 (chr7:97855461) |
| C19orf48 | AR | rs11665748^a (chr19:51354396), rs78177998^a (chr19:51345263), rs2659051^a (chr19:51345567), rs11665698 (chr19:51354410) |
| FAM57A | AR | rs461251^a (chr17:619161), rs684232^a (chr17:618964) |
| GEMIN4 | AR | rs461251^a (chr17:619161), rs684232^a (chr17:618964) |
| KLK1 | AR | rs11084033^a (chr19:51353954) |
| KLK15 | AR | rs78177998^a (chr19:51345263) |
| NCOA4 | AR | rs12571566 (chr10:51813068), rs61848292 (chr10:51813024), rs12569965 (chr10:51813070) |
| PCAT1 | SPDEF | rs1516942 (chr8:128019902), rs28615829 (chr8:128018204), rs7844107^a (chr8:128023385), rs73351621 (chr8:128014414), rs9693379 (chr8:128022940), rs78316206^a (chr8:128019308), rs2035637^a (chr8:128023058), rs17830059 (chr8:128016372), rs73351629 (chr8:128018465), rs16901898 (chr8:128015091) |
| PPP1R14A | AR | rs73034946 (chr19:38460492) |
| STK25 | AR | rs56390510^a (chr2:242274488) |
| TMPRSS2 | AR | rs56095453^a (chr21:42893807), rs8134378 (chr21:42893757), rs8134657 (chr21:42893907) |
| VPS53 | AR | rs461251^a (chr17:619161), rs684232^a (chr17:618964) |
^a Directly modeled eQTL variants in VCaP ChIP-Seq TFBS. The remaining variants are in LD (r² ≥ 0.8 in 1000 Genomes Phase III EUR) with a modeled eQTL variant.
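The LD criterion in the footnote (r² of at least 0.8) can be illustrated with a minimal sketch that computes the squared Pearson correlation between the genotype dosages of two variants. This genotype-based r² is a common approximation to haplotype-based LD; the dosage vectors below are hypothetical and do not come from 1000 Genomes, and the function names are illustrative.

```python
def ld_r2(dosages_1, dosages_2):
    """Squared Pearson correlation between two variants' genotype dosages."""
    n = len(dosages_1)
    m1 = sum(dosages_1) / n
    m2 = sum(dosages_2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(dosages_1, dosages_2))
    v1 = sum((a - m1) ** 2 for a in dosages_1)
    v2 = sum((b - m2) ** 2 for b in dosages_2)
    return cov * cov / (v1 * v2)

def tags(dosages_1, dosages_2, threshold=0.8):
    """True if the two variants meet the r² threshold used in the footnote."""
    return ld_r2(dosages_1, dosages_2) >= threshold

# Hypothetical dosages (copies of the alternate allele) for six subjects.
modeled_eqtl = [0, 1, 2, 1, 0, 2]
tfbs_variant = [0, 1, 2, 0, 0, 2]

print(round(ld_r2(modeled_eqtl, tfbs_variant), 3), tags(modeled_eqtl, tfbs_variant))
```

Because r² is unsigned, two variants whose alleles are perfectly anti-correlated also score 1.0; which allele pairs with which has to be read from the haplotypes themselves.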
Fig. 4 Comparison of variant effect on androgen receptor (AR) TFBS affinity and modeled eQTL effect on gene expression levels. (a) Illustration of the relationship between the effect of variant rs9979885 (orange) on prostatic TMPRSS2 expression levels (βeQTL), estimated from elastic net regression, and the effect of rs8134378 (teal) on AR binding (pBinding). In determining predictors of TMPRSS2 levels in normal prostate tissue, the penalized regression model selects rs9979885, a perfect LD proxy for rs8134378. As depicted by binding motif allele frequencies in the AR TFBS motif sequence logo and previously validated experimentally, the rs8134378-G allele significantly improves the affinity of AR binding in comparison to the rs8134378-A allele, substantially improving the probability of AR occupancy (pBinding = 0.006 vs. 0.187, using TRANSFAC vertebrate matrix V$AR_01, in comparison to human promoter background) according to sTRAP transcription factor affinity prediction modeling. Likewise, the rs9979885-C allele, in total linkage disequilibrium (LD r² = 1.0 in 1000 Genomes Phase III EUR) with rs8134378-G, is predicted to increase expression of TMPRSS2 (located on the reverse strand of chromosome 21) in comparison to the rs9979885-T allele. The correlation between the alleles estimated to increase transcription factor binding and gene expression reflects the model’s biologically relevant and mechanistic ascertainment of the effect of AR binding on TMPRSS2 expression. (b) Illustration of the relationship between the effect of variant rs142470094 (orange) on prostatic AGAP7 expression (βeQTL) and the effect of rs58677292 (teal) on AR binding (pBinding). As depicted by the AR TFBS motif sequence logo, the rs58677292-T allele significantly improves the affinity of AR binding in comparison to the rs58677292-C allele, increasing the probability of AR occupancy (pBinding = 0.009 vs. 0.225, using TRANSFAC vertebrate matrix V$AR_01) according to sTRAP modeling. Likewise, the rs142470094-A allele, in high linkage disequilibrium (LD r² = 0.801 in 1000 Genomes Phase III EUR) with rs58677292-T, is predicted to increase expression of AGAP7 (located on the reverse strand of chromosome 10) in comparison to the rs142470094-ATG indel, suggesting that AGAP7 may be regulated in part by genetic effects on androgen receptor binding.