
Pleiotropy-guided transcriptome imputation from normal and tumor tissues identifies candidate susceptibility genes for breast and ovarian cancer.

Siddhartha P Kar1,2, Daniel P C Considine3, Jonathan P Tyrer4, Jasmine T Plummer5,6, Stephanie Chen5,6, Felipe S Dezem5,6, Alvaro N Barbeira7, Padma S Rajagopal8, Will T Rosenow9, Fernando Moreno10, Clara Bodelon11, Jenny Chang-Claude12,13, Georgia Chenevix-Trench14, Anna deFazio15,16, Thilo Dörk17, Arif B Ekici18,19, Ailith Ewing20,21, George Fountzilas22, Ellen L Goode23, Mikael Hartman24,25, Florian Heitz26,27, Peter Hillemanns28, Estrid Høgdall29,30, Claus K Høgdall31, Tomasz Huzarski32,33, Allan Jensen34, Beth Y Karlan35, Elza Khusnutdinova36,37, Lambertus A Kiemeney38, Susanne K Kjaer29,39, Rüdiger Klapdor28, Martin Köbel40, Jingmei Li24,41, Clemens Liebrich42, Taymaa May43, Håkan Olsson44, Jennifer B Permuth45, Paolo Peterlongo46, Paolo Radice47, Susan J Ramus48,49, Marjorie J Riggan50, Harvey A Risch51, Emmanouil Saloustros52, Jacques Simard53, Lukasz M Szafron54, Linda Titus55, Cheryl L Thompson56, Robert A Vierkant57, Stacey J Winham58, Wei Zheng59, Jennifer A Doherty60, Andrew Berchuck61, Kate Lawrenson5,62, Hae Kyung Im7, Ani W Manichaikul9,63, Paul D P Pharoah3,4, Simon A Gayther5, Joellen M Schildkraut64.   

Abstract

Familial, sequencing, and genome-wide association studies (GWASs) and genetic correlation analyses have progressively unraveled the shared or pleiotropic germline genetics of breast and ovarian cancer. In this study, we aimed to leverage this shared germline genetics to improve the power of transcriptome-wide association studies (TWASs) to identify candidate breast cancer and ovarian cancer susceptibility genes. We built gene expression prediction models using the PrediXcan method in 681 breast and 295 ovarian tumors from The Cancer Genome Atlas and 211 breast and 99 ovarian normal tissue samples from the Genotype-Tissue Expression project and integrated these with GWAS meta-analysis data from the Breast Cancer Association Consortium (122,977 cases/105,974 controls) and the Ovarian Cancer Association Consortium (22,406 cases/40,941 controls). The integration was achieved through application of a pleiotropy-guided conditional/conjunction false discovery rate (FDR) approach in the setting of a TWAS. This identified 14 candidate breast cancer susceptibility genes spanning 11 genomic regions and 8 candidate ovarian cancer susceptibility genes spanning 5 genomic regions at conjunction FDR < 0.05 that were >1 Mb away from known breast and/or ovarian cancer susceptibility loci. We also identified 38 candidate breast cancer susceptibility genes and 17 candidate ovarian cancer susceptibility genes at conjunction FDR < 0.05 at known breast and/or ovarian susceptibility loci. The 22 genes identified by our cross-cancer analysis represent promising candidates that further elucidate the role of the transcriptome in mediating germline breast and ovarian cancer risk.


Year:  2021        PMID: 34317694      PMCID: PMC8312632          DOI: 10.1016/j.xhgg.2021.100042

Source DB:  PubMed          Journal:  HGG Adv        ISSN: 2666-2477


Introduction

The last three decades have witnessed major advances in our understanding of the shared inherited genetic basis of breast and ovarian cancer. The identification of rare inherited mutations in BRCA1 (MIM: 113705)[1] and BRCA2 (MIM: 600185)[2] that confer high risks of developing both breast and ovarian cancer has directly opened up the identification of oncogenic mechanisms leading to the development of poly ADP ribose polymerase inhibitor therapy.[3] The findings from genome-wide association studies (GWASs) have demonstrated that there is a strong genetic correlation between breast and ovarian cancer[4] and have identified several genomic regions containing common (minor allele frequency > 1%) variants that confer risk of developing both breast and ovarian cancer.[5,6] Transcriptome-wide association studies (TWASs) represent the latest study design for the identification of disease-associated susceptibility genes. TWASs involve establishing robust multi-variant models for the component of somatic (normal or tumor) gene expression that is regulated by germline genetic variation in a smaller dataset where both germline genotype and somatic transcriptomic data are available. These models are then used to impute the germline genetically regulated component of gene expression into a larger GWAS dataset where measured gene expression is unavailable but that offers significantly improved power to identify genes associated with disease risk where such risk may be mediated by expression. Moving from single variants (GWASs) to genes (TWASs) as the unit of association reduces the multiple testing burden. 
The use of gene expression provides a readily accessible readout of the functional basis of the identified association in contrast to GWAS-identified risk variants that predominantly reside in non-coding regions of the genome.[7] PrediXcan is a method developed recently for conducting TWASs.[8] TWAS methods have been applied to single cancer types before, including breast cancer[9,10] and ovarian cancer.[11,12] Here we present an application of PrediXcan, and indeed broadly of TWASs, in the pleiotropic cross-cancer setting. We used the normal and tumor breast- and ovary-specific gene expression and matched germline genotype datasets to generate tissue-specific PrediXcan models and first imputed these models into GWAS data for the corresponding cancers (i.e., from breast-tissue-derived models into breast cancer GWASs and likewise for the ovarian models). We then imputed models across cancer types (i.e., from breast-tissue-derived models into ovarian cancer GWASs and vice versa). Finally, we implemented a powerful conjunction false discovery rate (FDR) approach[13,14] that has been applied previously to GWASs,[15-18] but not to TWASs, to leverage the combined GWAS sample of over 145,000 breast and ovarian cancer cases. We identify candidate breast and ovarian cancer susceptibility genes in regions not previously implicated by GWAS or TWAS analyses of these cancers.

Material and methods

Matched germline genotype: normal/tumor gene expression datasets

We used data for 211 normal breast tissue samples and 99 normal ovarian tissue samples from the Genotype-Tissue Expression (GTEx) project (version 7 release).[19] Germline genotypes in the GTEx data had been called from whole-genome sequencing (Illumina HiSeq X), and gene expression was profiled using RNA-sequencing (Illumina TruSeq). We also used data from 681 breast cancer[20] and 295 high-grade serous ovarian cancer (HGSOC)[21] cases from The Cancer Genome Atlas (TCGA) network. Germline genotypes in the TCGA data had been called from genotyping arrays (Affymetrix SNP 6.0), and gene expression was profiled using RNA-sequencing (Illumina HiSeq 2000). Imputation of TCGA germline genotypes using the 1000 Genomes version 5 reference panel was performed as described previously.[22,23] TCGA sample sizes reported here refer to only those samples that had >95% European ancestry. Ancestry was estimated using the Local Ancestry in adMixed Populations tool (LAMP version 2.5).[24] Downstream PrediXcan modeling (described below) used variants imputed with quality > 0.8 that had a minor allele frequency > 5% in TCGA datasets.

Genome-wide association datasets

Summary statistics from genome-wide association meta-analyses were obtained from the Breast Cancer Association Consortium (BCAC)[22] and the Ovarian Cancer Association Consortium (OCAC).[23] The breast cancer susceptibility data were based on 122,977 cases and 105,974 controls, including 21,468 estrogen receptor (ER)-negative cases. The ovarian cancer susceptibility data were based on 22,406 epithelial ovarian cancer cases and 40,941 controls, including 13,037 HGSOC cases. We harmonized the signs of the effect size estimates and aligned them to the same effect allele in the breast and ovarian cancer GWAS datasets. We retained 9,530,997 variants with minor allele frequency > 1% and imputation quality > 0.4 in both datasets for S-PrediXcan analyses. All individuals in these studies were of genetically inferred European ancestry.
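The allele harmonization step described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function name and the example alleles are hypothetical, and the sketch assumes both datasets report alleles on the same strand, so strand flips are not handled.

```python
def align_effect(ref_effect, ref_other, effect, other, beta):
    """Express `beta` relative to the reference dataset's effect allele.

    Returns the (possibly sign-flipped) effect size, or None when the two
    datasets do not report the same pair of alleles for the variant.
    Assumption: alleles are on the same strand in both datasets.
    """
    if (effect, other) == (ref_effect, ref_other):
        return beta            # already aligned
    if (effect, other) == (ref_other, ref_effect):
        return -beta           # effect allele swapped: flip the sign
    return None                # allele mismatch: drop the variant


# Hypothetical variant: the breast cancer data use A as the effect allele,
# the ovarian cancer data report the effect for G.
aligned = align_effect("A", "G", "G", "A", 0.12)
```

After this step, a positive effect size in either dataset refers to the same allele, which is what allows gene-level associations to be compared across the two cancers.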

PrediXcan model development and S-PrediXcan analyses

We built genetically regulated gene expression prediction models using the elastic net regularization approach implemented in PrediXcan and validated these models using tenfold cross-validation.[8] Essentially, this generates a list of variants for each gene where model construction is successful and each variant in the list is assigned a weight reflecting its influence on its target gene expression. Genes with models where the nested tenfold cross-validated correlation between predicted and actual levels of expression was >10% (predictive performance r2 > 0.01) and p value of the correlation test was <0.05 were retained. These models were adjusted for the latent determinants of gene expression variation (referred to hereafter as PEER factors), which were identified using the Probabilistic Estimation of Expression Residuals (PEER; version 1.3) method.[25] We adjusted for 60 and 45 PEER factors for TCGA breast and ovarian cancer data, respectively. The choice of these numbers is a function of sample size and consistent with recommendations.[8,25] ESR1 expression was also included as a covariate in the construction of breast cancer models to account for ER status and its influence on the expression of individual genes. For the GTEx version 7 datasets, we downloaded pre-computed PrediXcan models from predictdb.org. Our pipeline for processing the TCGA datasets, including the application of PEER factors, was designed to be consistent with the pipeline used to generate the pre-computed GTEx PrediXcan models. 
S-PrediXcan refers to the application of the PrediXcan gene expression models, specifically the variant weights from elastic net combined into multi-variant gene-level instruments, to GWAS summary statistics and has been described in detail before.[8] The variance of a gene’s expression that was explained by the SNPs in its model was calculated as W′ × G × W (where W is the vector of SNP weights in a gene’s model, W′ is its transpose, and G is the covariance matrix of those SNPs).
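As a rough illustration of these two quantities, the sketch below computes the quadratic form W′ × G × W and a gene-level z score from variant-level GWAS z scores. The z-score expression is the approximation commonly given for S-PrediXcan (each variant's GWAS z score weighted by its model weight and by the ratio of the variant's standard deviation to that of predicted expression); it is stated here as an assumption rather than taken from this text. Function names and numbers are illustrative only.

```python
import math


def explained_variance(weights, cov):
    """Quadratic form W' G W for one gene's model.

    weights: list of SNP weights (W); cov: SNP covariance matrix (G)
    as a list of rows, in the same SNP order as `weights`.
    """
    n = len(weights)
    return sum(weights[i] * cov[i][j] * weights[j]
               for i in range(n) for j in range(n))


def gene_z_score(weights, cov, gwas_z):
    """Approximate gene-level z score from variant-level GWAS z scores.

    Assumed form: sum_l w_l * (sd_l / sd_gene) * z_l, where sd_l is the
    square root of the l-th diagonal entry of G and sd_gene is the
    square root of W' G W.
    """
    sd_gene = math.sqrt(explained_variance(weights, cov))
    total = sum(w * math.sqrt(cov[i][i]) * z
                for i, (w, z) in enumerate(zip(weights, gwas_z)))
    return total / sd_gene


# Toy two-SNP model (all values invented for illustration).
W = [0.5, -0.2]
G = [[0.4, 0.1],
     [0.1, 0.3]]
var_g = explained_variance(W, G)   # 0.1 - 0.02 + 0.012 = 0.092
```

With a single-variant model, the gene-level z score reduces to the variant's GWAS z score multiplied by the sign of its weight, which is a useful sanity check on any implementation.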

Conditional and conjunction FDR analyses

We obtained p values for association of predicted expression of each gene with breast cancer risk and with ovarian cancer risk. We then computed the FDR for gene-breast cancer risk association conditional on gene-ovarian cancer risk association (denoted conditional FDR[Breast Cancer∣Ovarian Cancer]). This is the probability that a gene is not associated with breast cancer risk given that its p values for association with both breast cancer risk and ovarian cancer risk are as small as or smaller than those observed. The analogous conditional FDR for gene-ovarian cancer risk association was also calculated (conditional FDR[Ovarian Cancer∣Breast Cancer]). Finally, the conjunctional FDR estimate, which is conservatively defined as the maximum of the two conditional FDR values, was computed. This process minimizes the effect of a single phenotype (in this case, breast or ovarian cancer) driving the shared association signal. It allows the power of pleiotropic associations to be tapped for genetic discovery, unlike a traditional FDR approach that is informed solely by the distribution of p values for a single phenotype. We used the R implementation of the conditional FDR method. The conditional and conjunctional FDR method has been described extensively elsewhere[13-18] but not applied before to the TWAS setting. The overall study design is summarized in Figure 1.
Figure 1.

Overview of datasets and analyses in this study

Flowchart providing an overview of the datasets used and the various steps in the analysis. GTEx, Genotype-Tissue Expression project; TCGA, The Cancer Genome Atlas; GWAS, genome-wide association study; FDR, false discovery rate.
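The conditional and conjunction FDR calculation described in this section can be sketched as follows. This is not the R implementation used in the study; it is a simplified empirical estimator, assuming that the conditional FDR for a gene is its primary p value divided by the empirical proportion of genes with a primary p value at least as small, among genes whose second-trait p value is at least as small. Function names and the toy p values are hypothetical.

```python
def conditional_fdr(p_primary, p_second):
    """Empirical conditional FDR of the primary trait given the second trait.

    For gene g: p_primary[g] * #{genes with p_second <= p_second[g]}
                / #{genes with p_primary <= p_primary[g]
                    and p_second <= p_second[g]},
    capped at 1. The denominator is never zero because gene g counts itself.
    """
    out = []
    for pi, pj in zip(p_primary, p_second):
        n_second = sum(1 for q in p_second if q <= pj)
        n_both = sum(1 for a, b in zip(p_primary, p_second)
                     if a <= pi and b <= pj)
        out.append(min(1.0, pi * n_second / n_both))
    return out


def conjunction_fdr(p_a, p_b):
    """Per-gene maximum of the two conditional FDR values."""
    return [max(x, y) for x, y in zip(conditional_fdr(p_a, p_b),
                                      conditional_fdr(p_b, p_a))]


# Toy gene-level p values for four genes (invented for illustration).
p_bc = [0.001, 0.2, 0.5, 0.9]
p_oc = [0.002, 0.6, 0.01, 0.8]
conj = conjunction_fdr(p_bc, p_oc)
```

Taking the maximum means a gene passes the conjunction threshold only if it passes both conditional thresholds, which is why a strong signal in one cancer alone is not sufficient.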

Fine-mapped candidate causal risk variant datasets

We examined the overlap between variants in the breast gene expression prediction models and a published list of fine-mapped candidate causal risk variants for breast cancer.[26] This was done to follow up genes that we identified in genomic regions that are known to be associated with breast cancer risk under the intuition that gene-level association signals identified by S-PrediXcan that demonstrate such overlap with fine-mapped variants are likely being driven by the GWAS association signal in the same region. Briefly, Fachal et al.[26] fine-mapped 150 known breast cancer susceptibility regions using dense genotype data on women participating in the BCAC and in the Consortium of Investigators of Modifiers of BRCA1/2 (CIMBA). Stepwise multinomial logistic regression was used to identify independent association signals in each region. Credible causal variants within each signal were defined as being within a 100-fold likelihood of the top conditional variant to delineate the variants driving the GWAS associations in each region. We adopted a similar analytic strategy for the ovarian cancer dataset from OCAC. Each genomic region with a genome-wide significant (p < 5 × 10^−8) variant was explored to identify additional independent association signals. All variants within a given genomic region were jointly analyzed to evaluate the simultaneous effects of multiple variants, using a 1 Mb window centered on the most significant variant, in stepwise conditional models. Given the presence of a genome-wide significant variant in the region, the prior probability of an additional risk variant in the same region is higher than in a region without a genome-wide significant lead variant; therefore, we used a threshold of p < 1 × 10^−5 to identify additional independent association signals. 
All variants in each region were ranked by the likelihood of association with ovarian cancer based on p values. The likelihood of each variant was then compared with the likelihood of the lead variant in the region based on the primary association analysis for primary signals and the conditional association analysis for conditional signals. Variants with odds > 1:100 compared with the lead variant (corresponding to a p value 100 times larger than the most significant p value[27]) were selected as credible causal variants.
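The selection rule for credible causal variants can be sketched in a few lines. The sketch relies only on the p value correspondence stated above (a variant is retained if its p value is at most 100 times the lead variant's p value) and does not reproduce the likelihood calculations of the actual analysis. The function name and variant identifiers are hypothetical.

```python
def credible_variants(pvalues, ratio=100.0):
    """Return variants whose p value is within `ratio` times the lead p value.

    pvalues: dict mapping variant identifier -> association p value for one
    signal in one region (primary or conditional analysis).
    """
    lead_p = min(pvalues.values())
    return sorted(v for v, p in pvalues.items() if p <= ratio * lead_p)


# Toy region with placeholder identifiers and invented p values.
region = {"var1": 1e-12, "var2": 5e-11, "var3": 2e-10, "var4": 1e-6}
selected = credible_variants(region)   # lead p = 1e-12, cutoff = 1e-10
```

For conditional signals, the same rule would be applied to p values from the conditional association analysis rather than the primary one.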

Results

Development of tissue/tumor-specific gene expression prediction models

We built genetically regulated gene expression predictor models using matched germline genotype and tumor gene expression data from TCGA by applying elastic net regularization as implemented in the PrediXcan software. Genes with models where the nested tenfold cross-validated correlation between predicted and actual levels of expression was >10% (predictive performance r2 > 0.01) and p value of the correlation test was <0.05 were retained in line with best practice quality control recommendations by the developers of PrediXcan.[8] We constructed and evaluated predictor models that met these criteria for 4,457 genes based on 681 TCGA breast tumor samples and for 2,705 genes based on 295 TCGA ovarian tumor samples. We obtained pre-computed genetically regulated gene expression predictor models that met the same criteria (predictive performance r2 > 0.01; correlation test p < 0.05) in matched germline genotype and normal tissue gene expression data from the GTEx Project. Specifically, the pre-computed data included 5,274 genes modeled based on 211 GTEx breast tissue samples and 3,034 genes modeled based on 99 GTEx ovarian tissue samples. The variance of a gene’s expression explained by SNPs in its model was, on average, lower in tumors and higher in normal tissues (mean [standard deviation] for TCGA breast cancer: 0.04 [0.07]; TCGA ovarian cancer: 0.05 [0.06]; GTEx breast: 0.09 [0.09]; and GTEx ovary: 0.15 [0.13]), likely reflecting the relatively smaller influence of germline genetic variation on tumor gene expression compared to its impact on normal tissue gene expression. Prediction performance, measured as the cross-validated correlation between each tissue model’s predicted expression and the gene’s measured expression, was, in general, substantially better for the normal tissue models than for the tumor tissue models (Figure S1).

Imputation of gene expression into GWAS and pleiotropy-guided FDR control

We used the GTEx normal breast-tissue-derived prediction models to impute genetically regulated gene expression in a genome-wide association meta-analysis involving 122,977 breast cancer cases and 105,974 controls using S-PrediXcan. We tested for association between imputed gene expression and breast cancer risk. We also used the same GTEx breast-tissue-based models to impute gene expression in a genome-wide association meta-analysis of 22,406 ovarian cancer cases and 40,941 controls and test for association between imputed expression and ovarian cancer risk. For these two steps, we applied the conditional FDR method to the S-PrediXcan gene-level association p values to correct for testing 5,274 genes in each analysis. This yielded two conditional FDR values: one for association with breast cancer risk given association with ovarian cancer risk and the other for association with ovarian cancer risk given association with breast cancer risk. Finally, we took the larger of the two values for each gene as a conservative estimate of its conjunction FDR to identify candidate breast cancer susceptibility genes at conjunction FDR < 0.05. We refer to these genes as candidate breast cancer susceptibility genes because they were identified on the basis of gene expression predictor models derived from breast tissue. However, the conditional-conjunction FDR analysis effectively borrowed information from pleiotropic associations with inherited susceptibility to a second cancer type (in this case ovarian cancer) in addition to the primary cancer type (breast cancer), and these genes may be considered as risk genes for the second cancer as well. 
These steps were repeated for three other ordered combinations of datasets: TCGA breast tumor tissue-breast cancer GWAS-ovarian cancer GWAS to identify candidate breast cancer susceptibility genes; GTEx normal ovarian tissue-ovarian cancer GWAS-breast cancer GWAS and TCGA ovarian tumor tissue-ovarian cancer GWAS-breast cancer GWAS to identify candidate ovarian cancer susceptibility genes. We also replaced the overall breast cancer GWASs and all invasive ovarian cancer GWASs used in the four dataset combinations described above with ER-negative breast cancer GWASs (21,468 cases/105,974 controls) and HGSOC GWASs (13,037 cases/40,941 controls), respectively. This helped identify additional candidate breast and ovarian cancer susceptibility genes driven by subtype-specific associations at conjunction FDR < 0.05. For each gene, coverage was defined as the percentage of the variants included in its expression prediction model that were also captured in the genome-wide association meta-analysis. The coverage was ≥80% for at least 93% of the genes in each of the four matched germline genotype and normal or tumor gene expression datasets used to build the predictor models, indicating that for most genes, most of the corresponding model variants available were used. In each ordered analytic combination of datasets (e.g., GTEx normal breast tissue-breast cancer GWAS-ovarian cancer GWAS) we observed that, in general, for progressively smaller S-PrediXcan p values of the second cancer type, the true discovery rate for association with the primary cancer type approached 100% at progressively larger S-PrediXcan p values for the primary cancer type (Figure 2; Figure S2). This was consistent with substantial shared gene-level associations for breast and ovarian cancer risk and these shared signals being tapped by the conditional-conjunction FDR method to power candidate susceptibility gene discovery.
Figure 2.

True discovery rate of S-PrediXcan associations for each cancer stratified by associations with the other cancer

True discovery rate against the negative logarithm (base 10) of the p value for each cancer for subsets of genes based on strength of association with the other cancer. The y axis of each plot is the true discovery rate, which is defined as 1 – conditional FDR (cFDR). For a given ordered analytic combination of datasets (e.g., GTEx normal breast tissue as transcriptome reference panel-breast cancer GWAS-ovarian cancer GWAS, plotted in the upper left corner) we observed that, in general, for progressively smaller S-PrediXcan p values of the second cancer type (indicated by the key “Threshold p” next to each plot), the true discovery rate (y axis) for association with the primary cancer type approached 100% at progressively larger S-PrediXcan p values for the primary cancer type (x axis; negative logarithm [base 10] of the p values). Only p values > 10^−6 are plotted on the x axis. BC, overall breast cancer risk; OC, all invasive ovarian cancer risk.
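The per-gene coverage metric defined in this section (the percentage of a gene's model variants that are also present in the GWAS meta-analysis) can be computed as in the following minimal sketch. The function name and variant identifiers are hypothetical and are not taken from the study's code.

```python
def coverage_percent(model_variants, gwas_variants):
    """Percentage of a gene's model variants that are present in the GWAS data."""
    model = set(model_variants)
    if not model:
        raise ValueError("gene model contains no variants")
    return 100.0 * len(model & set(gwas_variants)) / len(model)


# Toy gene model with five variants, four of which were captured in the GWAS.
cov = coverage_percent(["v1", "v2", "v3", "v4", "v5"],
                       {"v1", "v2", "v3", "v4", "x9"})
```

A gene with low coverage is imputed from only part of its model, so its gene-level association statistic may be less reliable than that of a fully covered gene.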

Identification of candidate breast cancer and ovarian cancer susceptibility genes

We identified 14 candidate breast cancer susceptibility genes at the conjunction FDR < 0.05 threshold (Table 1; Table S1). The 14 genes were distributed between 11 genomic regions >1 Mb apart from each other (Table 1). These genes have not been reported as susceptibility genes in any prior TWAS of breast cancer risk and are >1 Mb away from published genome-wide significant lead variants for breast cancer susceptibility.[28] For ovarian cancer, we identified 8 candidate susceptibility genes at conjunction FDR < 0.05 (Table 2; Table S2). The 8 genes were located across 5 genomic regions >1 Mb apart from each other (Table 2). These genes have not been reported as candidate risk genes in any previously reported TWASs of ovarian cancer risk and are >1 Mb away from published genome-wide significant lead variants for ovarian cancer susceptibility.[23]
Table 1.

Candidate breast cancer susceptibility genes identified by pleiotropy-guided S-PrediXcan analysis

| Gene | Genomic region | p value OC | p value BC | Conditional FDR OC∣BC | Conditional FDR BC∣OC | Conjunction FDR |
|---|---|---|---|---|---|---|
| Transcriptome reference panel: GTEx breast (normal)∣primary GWAS: overall BC risk (second GWAS: all invasive OC risk) | | | | | | |
| ZSCAN29 | 15q15.3 | 1.8E–04 | 9.1E–04 | 4.1E–04 | 6.0E–03 | 6.0E–03 |
| STRCP1 | 15q15.3 | 1.6E–03 | 1.8E–03 | 4.1E–03 | 1.9E–02 | 1.9E–02 |
| AC011330.5 | 15q15.3 | 5.4E–04 | 2.5E–03 | 1.8E–03 | 2.2E–02 | 2.2E–02 |
| STRC | 15q15.3 | 1.4E–04 | 3.8E–03 | 6.1E–04 | 2.2E–02 | 2.2E–02 |
| ZNF276 | 16q24.3 | 3.7E–06 | 4.7E–03 | 2.4E–05 | 2.2E–02 | 2.2E–02 |
| RGS19 | 20q13.33 | 1.1E–03 | 5.4E–03 | 4.3E–03 | 4.3E–02 | 4.3E–02 |
| RNFT1 | 17q23.1 | 2.4E–04 | 8.7E–03 | 1.3E–03 | 4.7E–02 | 4.7E–02 |
| C15orf65 | 15q21.3 | 2.2E–03 | 5.9E–03 | 8.2E–03 | 4.7E–02 | 4.7E–02 |
| Transcriptome reference panel: TCGA breast (tumor)∣primary GWAS: overall BC risk (second GWAS: all invasive OC risk) | | | | | | |
| GMNC | 3q28 | 2.6E–03 | 1.2E–03 | 6.1E–03 | 2.0E–02 | 2.0E–02 |
| ESRP2 | 16q22.1 | 1.9E–02 | 9.6E–04 | 4.3E–02 | 3.3E–02 | 4.3E–02 |
| BHLHA15 | 7q21.3 | 8.5E–05 | 7.0E–03 | 5.5E–04 | 4.9E–02 | 4.9E–02 |
| SCGB1D2 | 11q12.3 | 3.5E–04 | 5.5E–03 | 2.0E–03 | 4.9E–02 | 4.9E–02 |
| Transcriptome reference panel: TCGA breast (tumor)∣primary GWAS: ER-negative BC risk (second GWAS: HGSOC risk) | | | | | | |
| ETAA1 | 2p14 | 3.0E–03 | 1.5E–03 | 2.0E–02 | 2.0E–02 | 2.0E–02 |
| ATP8B4 | 15q21.2 | 1.6E–03 | 2.2E–03 | 1.5E–02 | 2.4E–02 | 2.4E–02 |

Abbreviations: BC, breast cancer; OC, ovarian cancer; FDR, false discovery rate; ER, estrogen receptor; HGSOC, high-grade serous ovarian cancer.

Table 2.

Candidate ovarian cancer susceptibility genes identified by pleiotropy-guided S-PrediXcan analysis.

| Gene | Genomic region | p value OC | p value BC | Conditional FDR OC∣BC | Conditional FDR BC∣OC | Conjunction FDR |
|---|---|---|---|---|---|---|
| Transcriptome reference panel: GTEx ovary (normal)∣primary GWAS: all invasive OC risk (second GWAS: overall BC risk) | | | | | | |
| STRCP1 | 15q15.3 | 7.2E–04 | 6.4E–05 | 3.1E–03 | 8.5E–05 | 3.1E–03 |
| CPNE1 | 20q11.22 | 1.2E–03 | 7.2E–05 | 5.0E–03 | 9.9E–05 | 5.0E–03 |
| AC011330.5 | 15q15.3 | 1.7E–03 | 2.6E–05 | 5.8E–03 | 4.5E–05 | 5.8E–03 |
| CCNE1 | 19q12 | 1.9E–03 | 3.2E–03 | 1.4E–02 | 4.4E–03 | 1.4E–02 |
| CATSPER2P1 | 15q15.3 | 4.8E–03 | 1.9E–04 | 1.8E–02 | 4.1E–04 | 1.8E–02 |
| UQCC1 | 20q11.22 | 3.8E–03 | 2.5E–03 | 2.8E–02 | 4.7E–03 | 2.8E–02 |
| Transcriptome reference panel: TCGA ovary (tumor)∣primary GWAS: all invasive OC risk (second GWAS: overall BC risk) | | | | | | |
| CPNE1 | 20q11.22 | 2.0E–03 | 9.0E–05 | 2.0E–02 | 4.9E–04 | 2.0E–02 |
| Transcriptome reference panel: GTEx ovary (normal)∣primary GWAS: HGSOC risk (second GWAS: ER-negative BC risk) | | | | | | |
| CCNE1 | 19q12 | 1.7E–03 | 2.0E–04 | 5.9E–03 | 1.5E–03 | 5.9E–03 |
| STRCP1 | 15q15.3 | 9.2E–03 | 3.2E–04 | 3.1E–02 | 3.9E–03 | 3.1E–02 |
| HEATR3 | 16q12.1 | 4.3E–03 | 3.1E–02 | 4.6E–02 | 4.4E–02 | 4.6E–02 |
| Transcriptome reference panel: TCGA ovary (tumor)∣primary GWAS: HGSOC risk (second GWAS: ER-negative BC risk) | | | | | | |
| THSD7A | 7p21.3 | 1.5E–03 | 1.2E–02 | 2.8E–02 | 4.3E–02 | 4.3E–02 |

Abbreviations: BC, breast cancer; OC, ovarian cancer; FDR, false discovery rate; ER, estrogen receptor; HGSOC, high-grade serous ovarian cancer.

Candidate breast cancer and ovarian cancer susceptibility genes at known GWAS loci

We identified 38 candidate breast cancer susceptibility genes that were located within 1 Mb of a published lead variant associated at genome-wide significance with breast cancer risk (Table S3).[28] Four of the 38 genes have also been reported in previously published TWASs (Table S3).[9,10] The 38 genes were spread across 12 genomic regions >1 Mb apart from each other. Overlaying fine-mapped candidate causal breast cancer risk variants on breast gene expression predictor model variants showed that for 21/38 (55%) genes, the prediction model variants included at least one fine-mapped candidate causal variant (Tables S3 and S4). This suggested that, for these genes, the GWAS association signal was driving the S-PrediXcan signal. We also identified three additional genes that were >1 Mb away from known GWAS loci that have previously been reported as TWAS loci for breast cancer risk (Table S3).[9,10] For ovarian cancer, we identified 17 candidate susceptibility genes that were located within 1 Mb of a published lead variant associated at genome-wide significance with ovarian cancer risk (Table S5).[23] Six of these genes have also been reported in a previously published TWAS for ovarian cancer (Table S5).[11,12] The 17 genes span 5 different genomic regions >1 Mb apart. Overlaying fine-mapped candidate causal ovarian cancer risk variants onto the ovarian gene expression predictor model variants showed that for 12/17 (71%) genes, the prediction model variants included at least one fine-mapped candidate causal variant (Tables S5 and S6), suggesting that for these genes the GWAS association signal underpinned the S-PrediXcan signal.

Discussion

In this study, we used the conditional and conjunctional FDR as a tool to systematically improve the power of breast cancer and ovarian cancer candidate susceptibility gene discovery in a PrediXcan-based TWAS. While gene expression prediction models based on multiple tissue types have been the more common approach to improving TWAS power,[11,29] the conditional/conjunction FDR approach gains power through the incorporation of multiple related GWAS datasets into a TWAS analysis. We investigated the shared inherited genetic basis of these two cancer types by integrating normal and tumor-tissue-specific transcriptomic datasets with large-scale genome-wide association meta-analysis findings for susceptibility to breast cancer and ovarian cancer. We identified 11 genomic regions associated with breast cancer risk and five regions linked to ovarian cancer risk. We identified 14 candidate breast cancer susceptibility genes (Table 1). Many of these genes have a strong biological rationale for involvement in breast carcinogenesis and are in or near genomic regions associated with other cancer types or potential cancer risk factors. For example, the ZNF276 (MIM: 608460) intronic variant rs12925026 is associated at genome-wide significance with non-melanoma skin cancer.[30] ZNF276 overlaps FANCA (MIM: 607139) in a tail-to-tail manner.[31] The genetically regulated predictor model for ZNF276 expression was fit using gene expression measured in GTEx breast tissues, but neither this dataset nor any of the other datasets could capture a predictor model for FANCA expression. 
FANCA encodes one of eight subunits that together form the core Fanconi Anemia (FA) complex that repairs blockages in DNA replication due to cross-linking.[32] Several members of the FA family of proteins have been implicated in breast and ovarian cancer predisposition, including BRCA1 (FANCS), BRCA2 (FANCD1), BRIP1 (MIM: 605882) (FANCJ), PALB2 (MIM: 610355) (FANCN), RAD51C (MIM: 602774) (FANCO), and FANCM (MIM: 609644), and it is possible that FANCA may represent another or possibly the true target breast cancer susceptibility gene in this region, given this biological function and its overlap with ZNF276.[32,33] ZNF276 in its own right has also been implicated as a candidate tumor suppressor gene in breast cancer,[31] and consistent with this potential tumor suppressor function we observed that lower ZNF276 expression was associated with increased breast cancer risk. Other candidate breast cancer susceptibility genes we identified include ESRP2 (MIM: 612960), which encodes an epithelial cell-specific regulator of splicing of the breast cancer susceptibility gene FGFR2 (MIM: 176943)[34,35] and SCGB1D2 (MIM: 615061), which encodes lipophilin B, which is known to be expressed in both breast and ovarian tumors.[36] Lipophilin B is tightly co-expressed with and forms a covalent complex with Mammaglobin A encoded by SCGB2A2, the gene next to SCGB1D2.[36] Mammaglobin A may be used to detect disseminated or circulating tumor cells and is under investigation as a potential immunotherapeutic target in breast cancer.[37] However, we were unable to develop gene expression prediction models for SCGB2A2 in breast normal or tumor tissues. BHLHA15 (MIM: 608606) encodes an estrogen-regulated transcription factor that is required to maintain mammary gland differentiation in mice,[38] and we found that decreased BHLHA15 expression was associated with greater susceptibility to breast cancer. 
ETAA1 (MIM: 613196) harbors lead variants associated at genome-wide significance with pancreatic cancer[39] and the hormone-related traits of age at menopause[40] and male-pattern baldness.[41] It encodes an activator of ATR kinase that accumulates at DNA damage sites and promotes replication fork progression and integrity.[42] Breast cancer is closely linked to DNA damage repair defects, and, in the presence of DNA damage, further loss of ETAA1 has been shown to be synthetically lethal for the cell, suggesting that ETAA1 expression may be essential for tumorigenesis on a background of DNA damage.[43] In keeping with this observation, we noted that elevated ETAA1 expression was associated with increased breast cancer risk. While our pleiotropy-guided transcriptome imputation study was ongoing, a genome-wide association meta-analysis for breast cancer risk that was performed in parallel identified lead variants rs79518236 (184 kb from BHLHA15) and rs9712235 (244 kb from ETAA1) at genome-wide significance only on addition of 10,407 breast cancer cases and 7,815 controls to the Michailidou et al.[44] dataset used here. There were no known GWAS associations for breast cancer risk in these regions until the larger GWAS meta-analysis, and our concomitant identification of the same regions using gene expression imputation into a smaller GWAS underscores the power of leveraging expression data to bolster genetic discovery. We identified 8 candidate ovarian cancer susceptibility genes (Table 2). As with breast cancer, there is strong support for a role of several genes in ovarian cancer pathogenesis, and many of these genes are in regions of the genome that harbor pleiotropic associations with other cancer types. 
Variants immediately upstream of CCNE1 (MIM: 123837) are associated at genome-wide significance with bladder cancer risk.[45] CCNE1 amplification is believed to be an early event in the development of ovarian cancer[46] and is a frequent somatic event in HGSOCs that do not carry homologous recombination DNA repair pathway defects.[47] CCNE1 amplification is also associated with poor prognosis in triple-negative breast tumors,[48] and it is worth noting that we observed the stronger conjunction FDR association signal for CCNE1 in the pleiotropy-informed analysis that was based on the HGSOC and ER-negative breast cancer susceptibility GWAS datasets (Table 2). However, we noted that increased CCNE1 expression was associated with decreased HGSOC (and ER-negative breast cancer) risk. This paradoxical direction of risk effect may be explained by the fact that CCNE1 amplification is less common and the loss of homologous recombination (HR) pathway function is far more common in ovarian cancer, and, in the absence of a functional HR pathway, CCNE1 is known to be essential for the developing tumor cell to survive.[49] This study suggests a role for CCNE1 in conferring ovarian cancer risk. Intronic variants in HEATR3 (MIM: 614951) are associated at genome-wide significance with glioma in European ancestry individuals[50] and with squamous cell esophageal carcinoma in East Asian ancestry individuals.[51] HEATR3 was also identified by a TWAS of glioma susceptibility.[52] Intronic variants in THSD7A (MIM: 612249) are associated with epithelial ovarian cancer risk in East Asians,[53] albeit not at genome-wide significance (lead variant rs10260419 p = 1 × 10^−7). Gene expression prediction models derived from breast and ovarian tissues both implicated the 15q15.3 region as a breast and ovarian cancer susceptibility region on imputation with these models into the breast and ovarian cancer GWAS data. 
Our analysis suggested several genes in this region (Tables 1 and 2), with the pseudogene STRCP1 as the only common gene across breast and ovarian tissues. STRCP1 overlaps the protein coding gene STRC (MIM: 606440), also identified in the breast-tissue-based analysis (Table 1), and variants in STRC have previously been associated with lung cancer risk (lung cancer lead variant rs35028925 p = 2 × 10⁻⁶).[54]

In this analysis, we chose to label the identified genes as candidate breast cancer susceptibility genes if they were identified on integrating the GTEx or TCGA breast expression prediction models with the breast cancer GWASs and incorporating pleiotropic information from the ovarian cancer GWASs and vice versa for candidate ovarian cancer susceptibility genes. However, application of the conjunction FDR over and above the conditional FDR in principle identified genes associated with both cancer types by tapping into GWAS data from both cancers. Therefore, in a sense, all these genes may well be regarded as candidate breast and ovarian cancer susceptibility genes. Moreover, in our pleiotropy-guided study design, the ovarian cancer dataset, in a sense, served as a replication dataset for the breast cancer findings and vice versa, which was particularly important given the lack of adequately powered and truly independent breast and ovarian cancer datasets outside of the datasets used in this study.[55]

We identified 38 candidate breast cancer susceptibility genes and 17 candidate ovarian cancer susceptibility genes in regions previously implicated by GWASs for breast cancer and ovarian cancer, respectively (Tables S3 and S5). The identification of a large number of genes in these regions is unsurprising, given that GWAS associations are the key determinant of the S-PrediXcan signal.
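The dependence of the S-PrediXcan signal on the underlying GWAS associations can be made concrete with a minimal sketch of the published summary-statistic approximation, in which the gene-level z-score is a weighted sum of variant-level GWAS z-scores. The function name `spredixcan_z`, its arguments, and any inputs passed to it are illustrative assumptions and are not taken from the MetaXcan code base; the sketch also omits practical steps such as allele harmonization and handling of variants missing from the GWAS.

```python
import numpy as np


def spredixcan_z(weights, gwas_z, snp_cov):
    """Approximate gene-level z-score from GWAS summary statistics.

    weights : prediction-model weights w_l for the variants in one gene model
    gwas_z  : GWAS z-scores (beta / se) for the same variants, same order
    snp_cov : covariance matrix of those variants in a reference panel
    (All three inputs are hypothetical; real models ship these in databases.)
    """
    w = np.asarray(weights, dtype=float)
    z = np.asarray(gwas_z, dtype=float)
    cov = np.asarray(snp_cov, dtype=float)

    # Per-variant standard deviation (sigma_l) from the covariance diagonal.
    sigma_l = np.sqrt(np.diag(cov))

    # Variance of the predicted expression: sigma_g^2 = w' * Cov * w.
    var_g = float(w @ cov @ w)
    if var_g <= 0:
        raise ValueError("predicted expression has zero variance")

    # Z_g ~ sum_l w_l * (sigma_l / sigma_g) * z_l
    return float(np.sum(w * sigma_l * z) / np.sqrt(var_g))
```

For a gene model that contains a single variant, this expression reduces to the GWAS z-score of that variant, signed by the model weight, which is why established GWAS regions tend to yield many significant genes.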
However, we were able to take advantage of fine-scale mapping data generated by the Breast and Ovarian Cancer Association Consortia to separately pinpoint those genes where a fine-mapped candidate causal GWAS risk variant was incorporated in the PrediXcan model, suggesting that it drives the gene-based association. Overall, we found this to be the case for 60% of the candidate susceptibility genes identified by PrediXcan in the breast and ovarian cancer susceptibility regions identified by GWASs.

Comprehensive functional follow-up of the 19p13.11 breast and ovarian cancer GWAS region suggests that ABHD8 and ANKLE1 are the most likely targets in this region.[5] While there was no overlap between S-PrediXcan model variants for ABHD8 and ANKLE1 and fine-mapped risk variants in this region, S-PrediXcan did detect both genes as candidate causal susceptibility genes, with ANKLE1 being the only gene that made the cut in both breast and ovarian tissues, suggesting that S-PrediXcan applied to pleiotropic gene-dense regions such as 19p13.11 does help short-list the key targets even in the absence of overlap with fine-mapped variants.

A total of 21/38 breast and 13/17 ovarian cancer candidate susceptibility genes in the published GWAS regions were clustered at 17q21.31, reflecting the unique long-distance linkage disequilibrium structure of this region.[56] This phenomenon has also led to clustering of associations at 17q21.31 in previous TWASs of breast or ovarian cancer risk.[9,11]

Gene expression prediction models in this study were built using genomic data from women with genetically inferred European ancestry. The predictive performance of these models in a non-European ancestry cohort was not evaluated. Thus, a key limitation of this study is the potential lack of generalizability of these models to non-European ancestry cohorts.
Recent analyses suggest that default TWAS models trained in large datasets such as GTEx suffer from a significant reduction in prediction accuracy, particularly in individuals of African ancestry, when compared to those of European ancestry.[57] There is an urgent and compelling need for trans-ancestry datasets that drive TWAS in diverse ancestral cohorts.

In conclusion, the powerful combination of pleiotropic breast and ovarian cancer GWAS datasets with transcriptome imputation from normal and tumor breast and ovarian tissues identified a total of 16 genomic loci (22 genes) associated with breast and ovarian cancer risks. Fine-mapping in larger GWAS datasets and deeper laboratory-based functional follow-up studies of these loci and candidate genes have the potential to provide fresh insights into the common biological underpinnings of breast and ovarian cancer.
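
The conditional and conjunction FDR logic referred to in the discussion above can be illustrated with a minimal sketch based on the empirical definitions used in the pleiotropy-informed FDR literature. The conditional FDR for a gene is its p value in the primary trait divided by the empirical distribution of primary-trait p values among genes at least as significant in the second trait, and the conjunction FDR is the larger of the two conditional FDRs. The function names are assumptions for illustration; this is not the code used in this study, and it omits refinements used in practice, such as smoothing of the empirical distribution.

```python
import numpy as np


def conditional_fdr(p_primary, p_conditional):
    """Empirical conditional FDR of the primary trait given the second trait.

    Both inputs are per-gene TWAS p values in the same gene order
    (hypothetical inputs; names are illustrative only).
    """
    p1 = np.asarray(p_primary, dtype=float)
    p2 = np.asarray(p_conditional, dtype=float)
    cfdr = np.empty_like(p1)
    for i in range(p1.size):
        # Genes at least as significant as gene i in the conditioning trait.
        in_stratum = p2 <= p2[i]
        # Empirical conditional CDF of the primary p value in that stratum.
        # Gene i is always in its own stratum, so this is never zero.
        ecdf = np.mean(p1[in_stratum] <= p1[i])
        cfdr[i] = min(1.0, p1[i] / ecdf)
    return cfdr


def conjunction_fdr(p_a, p_b):
    """Conjunction FDR: the larger of the two conditional FDRs per gene."""
    return np.maximum(conditional_fdr(p_a, p_b), conditional_fdr(p_b, p_a))
```

Because the conjunction FDR takes the maximum of the two conditional values, a gene passes a threshold such as 0.05 only when it is supported in both directions, which is the sense in which each cancer dataset acts as a check on the other.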

Data and code availability

All datasets analyzed in this study are publicly available:
Genome-wide summary genetic association statistics from BCAC are available at: http://bcac.ccge.medschl.cam.ac.uk/bcacdata/oncoarray/oncoarray-and-combined-summary-result/gwas-summary-results-breast-cancer-risk-2017/.
Genome-wide summary genetic association statistics from OCAC are available at: https://www.ebi.ac.uk/gwas/downloads/summary-statistics (please search the GWAS catalog at the link above using the study accession numbers GCST004415 for the overall ovarian cancer and GCST004480 for the HGSOC datasets).
PrediXcan prediction models trained on the GTEx version 7 data (including breast and ovarian tissues) are available here: https://zenodo.org/record/3572799.
PrediXcan prediction models trained on the TCGA data (breast and ovarian tumors) are available here: https://zenodo.org/record/3818295.
Code, including a tutorial, for running S-PrediXcan is available here: https://github.com/hakyimlab/MetaXcan.
The data used for the analyses described in this manuscript can be obtained from dbGaP via accession number phs000424.
References (57 in total)

Review 1.  Susceptibility pathways in Fanconi's anemia and breast cancer.

Authors:  Alan D D'Andrea
Journal:  N Engl J Med       Date:  2010-05-20       Impact factor: 91.245

2.  Association Between Genetic Traits for Immune-Mediated Diseases and Alzheimer Disease.

Authors:  Jennifer S Yokoyama; Yunpeng Wang; Andrew J Schork; Wesley K Thompson; Celeste M Karch; Carlos Cruchaga; Linda K McEvoy; Aree Witoelar; Chi-Hua Chen; Dominic Holland; James B Brewer; Andre Franke; William P Dillon; David M Wilson; Pratik Mukherjee; Christopher P Hess; Zachary Miller; Luke W Bonham; Jeffrey Shen; Gil D Rabinovici; Howard J Rosen; Bruce L Miller; Bradley T Hyman; Gerard D Schellenberg; Tom H Karlsen; Ole A Andreassen; Anders M Dale; Rahul S Desikan
Journal:  JAMA Neurol       Date:  2016-06-01       Impact factor: 18.302

3.  Genome-wide Pleiotropy Between Parkinson Disease and Autoimmune Diseases.

Authors:  Aree Witoelar; Iris E Jansen; Yunpeng Wang; Rahul S Desikan; J Raphael Gibbs; Cornelis Blauwendraat; Wesley K Thompson; Dena G Hernandez; Srdjan Djurovic; Andrew J Schork; Francesco Bettella; David Ellinghaus; Andre Franke; Benedicte A Lie; Linda K McEvoy; Tom H Karlsen; Suzanne Lesage; Huw R Morris; Alexis Brice; Nicholas W Wood; Peter Heutink; John Hardy; Andrew B Singleton; Anders M Dale; Thomas Gasser; Ole A Andreassen; Manu Sharma
Journal:  JAMA Neurol       Date:  2017-07-01       Impact factor: 18.302

Review 4.  The genesis and evolution of high-grade serous ovarian cancer.

Authors:  David D L Bowtell
Journal:  Nat Rev Cancer       Date:  2010-10-14       Impact factor: 60.716

Review 5.  Common Genetic Variation and Breast Cancer Risk-Past, Present, and Future.

Authors:  Jenna Lilyquist; Kathryn J Ruddy; Celine M Vachon; Fergus J Couch
Journal:  Cancer Epidemiol Biomarkers Prev       Date:  2018-01-30       Impact factor: 4.254

6.  Improved detection of common variants associated with schizophrenia by leveraging pleiotropy with cardiovascular-disease risk factors.

Authors:  Ole A Andreassen; Srdjan Djurovic; Wesley K Thompson; Andrew J Schork; Kenneth S Kendler; Michael C O'Donovan; Dan Rujescu; Thomas Werge; Martijn van de Bunt; Andrew P Morris; Mark I McCarthy; J Cooper Roddey; Linda K McEvoy; Rahul S Desikan; Anders M Dale
Journal:  Am J Hum Genet       Date:  2013-01-31       Impact factor: 11.025

7.  A multi-stage genome-wide association study of bladder cancer identifies multiple susceptibility loci.

Authors:  Nathaniel Rothman; Montserrat Garcia-Closas; Nilanjan Chatterjee; Nuria Malats; Xifeng Wu; Jonine D Figueroa; Francisco X Real; David Van Den Berg; Giuseppe Matullo; Dalsu Baris; Michael Thun; Lambertus A Kiemeney; Paolo Vineis; Immaculata De Vivo; Demetrius Albanes; Mark P Purdue; Thorunn Rafnar; Michelle A T Hildebrandt; Anne E Kiltie; Olivier Cussenot; Klaus Golka; Rajiv Kumar; Jack A Taylor; Jose I Mayordomo; Kevin B Jacobs; Manolis Kogevinas; Amy Hutchinson; Zhaoming Wang; Yi-Ping Fu; Ludmila Prokunina-Olsson; Laurie Burdett; Meredith Yeager; William Wheeler; Adonina Tardón; Consol Serra; Alfredo Carrato; Reina García-Closas; Josep Lloreta; Alison Johnson; Molly Schwenn; Margaret R Karagas; Alan Schned; Gerald Andriole; Robert Grubb; Amanda Black; Eric J Jacobs; W Ryan Diver; Susan M Gapstur; Stephanie J Weinstein; Jarmo Virtamo; Victoria K Cortessis; Manuela Gago-Dominguez; Malcolm C Pike; Mariana C Stern; Jian-Min Yuan; David J Hunter; Monica McGrath; Colin P Dinney; Bogdan Czerniak; Meng Chen; Hushan Yang; Sita H Vermeulen; Katja K Aben; J Alfred Witjes; Remco R Makkinje; Patrick Sulem; Soren Besenbacher; Kari Stefansson; Elio Riboli; Paul Brennan; Salvatore Panico; Carmen Navarro; Naomi E Allen; H Bas Bueno-de-Mesquita; Dimitrios Trichopoulos; Neil Caporaso; Maria Teresa Landi; Federico Canzian; Borje Ljungberg; Anne Tjonneland; Francoise Clavel-Chapelon; David T Bishop; Mark T W Teo; Margaret A Knowles; Simonetta Guarrera; Silvia Polidoro; Fulvio Ricceri; Carlotta Sacerdote; Alessandra Allione; Geraldine Cancel-Tassin; Silvia Selinski; Jan G Hengstler; Holger Dietrich; Tony Fletcher; Peter Rudnai; Eugen Gurzau; Kvetoslava Koppova; Sophia C E Bolick; Ashley Godfrey; Zongli Xu; José I Sanz-Velez; María D García-Prats; Manuel Sanchez; Gabriel Valdivia; Stefano Porru; Simone Benhamou; Robert N Hoover; Joseph F Fraumeni; Debra T Silverman; Stephen J Chanock
Journal:  Nat Genet       Date:  2010-10-24       Impact factor: 38.330

8.  Dissection of genetic variation and evidence for pleiotropy in male pattern baldness.

Authors:  Chloe X Yap; Julia Sidorenko; Yang Wu; Kathryn E Kemper; Jian Yang; Naomi R Wray; Matthew R Robinson; Peter M Visscher
Journal:  Nat Commun       Date:  2018-12-20       Impact factor: 14.919

9.  Genome-wide association study identifies 32 novel breast cancer susceptibility loci from overall and subtype-specific analyses.

Authors:  Haoyu Zhang; Thomas U Ahearn; Julie Lecarpentier; Daniel Barnes; Jonathan Beesley; Guanghao Qi; Xia Jiang; Tracy A O'Mara; Ni Zhao; Manjeet K Bolla; Alison M Dunning; Joe Dennis; Qin Wang; Zumuruda Abu Ful; Kristiina Aittomäki; Irene L Andrulis; Hoda Anton-Culver; Volker Arndt; Kristan J Aronson; Banu K Arun; Paul L Auer; Jacopo Azzollini; Daniel Barrowdale; Heiko Becher; Matthias W Beckmann; Sabine Behrens; Javier Benitez; Marina Bermisheva; Katarzyna Bialkowska; Ana Blanco; Carl Blomqvist; Natalia V Bogdanova; Stig E Bojesen; Bernardo Bonanni; Davide Bondavalli; Ake Borg; Hiltrud Brauch; Hermann Brenner; Ignacio Briceno; Annegien Broeks; Sara Y Brucker; Thomas Brüning; Barbara Burwinkel; Saundra S Buys; Helen Byers; Trinidad Caldés; Maria A Caligo; Mariarosaria Calvello; Daniele Campa; Jose E Castelao; Jenny Chang-Claude; Stephen J Chanock; Melissa Christiaens; Hans Christiansen; Wendy K Chung; Kathleen B M Claes; Christine L Clarke; Sten Cornelissen; Fergus J Couch; Angela Cox; Simon S Cross; Kamila Czene; Mary B Daly; Peter Devilee; Orland Diez; Susan M Domchek; Thilo Dörk; Miriam Dwek; Diana M Eccles; Arif B Ekici; D Gareth Evans; Peter A Fasching; Jonine Figueroa; Lenka Foretova; Florentia Fostira; Eitan Friedman; Debra Frost; Manuela Gago-Dominguez; Susan M Gapstur; Judy Garber; José A García-Sáenz; Mia M Gaudet; Simon A Gayther; Graham G Giles; Andrew K Godwin; Mark S Goldberg; David E Goldgar; Anna González-Neira; Mark H Greene; Jacek Gronwald; Pascal Guénel; Lothar Häberle; Eric Hahnen; Christopher A Haiman; Christopher R Hake; Per Hall; Ute Hamann; Elaine F Harkness; Bernadette A M Heemskerk-Gerritsen; Peter Hillemanns; Frans B L Hogervorst; Bernd Holleczek; Antoinette Hollestelle; Maartje J Hooning; Robert N Hoover; John L Hopper; Anthony Howell; Hanna Huebner; Peter J Hulick; Evgeny N Imyanitov; Claudine Isaacs; Louise Izatt; Agnes Jager; Milena Jakimovska; Anna Jakubowska; Paul James; Ramunas Janavicius; Wolfgang Janni; Esther M John; Michael E Jones; Audrey Jung; Rudolf Kaaks; Pooja Middha Kapoor; Beth Y Karlan; Renske Keeman; Sofia Khan; Elza Khusnutdinova; Cari M Kitahara; Yon-Dschun Ko; Irene Konstantopoulou; Linetta B Koppert; Stella Koutros; Vessela N Kristensen; Anne-Vibeke Laenkholm; Diether Lambrechts; Susanna C Larsson; Pierre Laurent-Puig; Conxi Lazaro; Emilija Lazarova; Flavio Lejbkowicz; Goska Leslie; Fabienne Lesueur; Annika Lindblom; Jolanta Lissowska; Wing-Yee Lo; Jennifer T Loud; Jan Lubinski; Alicja Lukomska; Robert J MacInnis; Arto Mannermaa; Mehdi Manoochehri; Siranoush Manoukian; Sara Margolin; Maria Elena Martinez; Laura Matricardi; Lesley McGuffog; Catriona McLean; Noura Mebirouk; Alfons Meindl; Usha Menon; Austin Miller; Elvira Mingazheva; Marco Montagna; Anna Marie Mulligan; Claire Mulot; Taru A Muranen; Katherine L Nathanson; Susan L Neuhausen; Heli Nevanlinna; Patrick Neven; William G Newman; Finn C Nielsen; Liene Nikitina-Zake; Jesse Nodora; Kenneth Offit; Edith Olah; Olufunmilayo I Olopade; Håkan Olsson; Nick Orr; Laura Papi; Janos Papp; Tjoung-Won Park-Simon; Michael T Parsons; Bernard Peissel; Ana Peixoto; Beth Peshkin; Paolo Peterlongo; Julian Peto; Kelly-Anne Phillips; Marion Piedmonte; Dijana Plaseska-Karanfilska; Karolina Prajzendanc; Ross Prentice; Darya Prokofyeva; Brigitte Rack; Paolo Radice; Susan J Ramus; Johanna Rantala; Muhammad U Rashid; Gad Rennert; Hedy S Rennert; Harvey A Risch; Atocha Romero; Matti A Rookus; Matthias Rübner; Thomas Rüdiger; Emmanouil Saloustros; Sarah Sampson; Dale P Sandler; Elinor J Sawyer; Maren T Scheuner; Rita K Schmutzler; Andreas Schneeweiss; Minouk J Schoemaker; Ben Schöttker; Peter Schürmann; Leigha Senter; Priyanka Sharma; Mark E Sherman; Xiao-Ou Shu; Christian F Singer; Snezhana Smichkoska; Penny Soucy; Melissa C Southey; John J Spinelli; Jennifer Stone; Dominique Stoppa-Lyonnet; Anthony J Swerdlow; Csilla I Szabo; Rulla M Tamimi; William J Tapper; Jack A Taylor; Manuel R Teixeira; MaryBeth Terry; Mads Thomassen; Darcy L Thull; Marc Tischkowitz; Amanda E Toland; Rob A E M Tollenaar; Ian Tomlinson; Diana Torres; Melissa A Troester; Thérèse Truong; Nadine Tung; Michael Untch; Celine M Vachon; Ans M W van den Ouweland; Lizet E van der Kolk; Elke M van Veen; Elizabeth J vanRensburg; Ana Vega; Barbara Wappenschmidt; Clarice R Weinberg; Jeffrey N Weitzel; Hans Wildiers; Robert Winqvist; Alicja Wolk; Xiaohong R Yang; Drakoulis Yannoukakos; Wei Zheng; Kristin K Zorn; Roger L Milne; Peter Kraft; Jacques Simard; Paul D P Pharoah; Kyriaki Michailidou; Antonis C Antoniou; Marjanka K Schmidt; Georgia Chenevix-Trench; Douglas F Easton; Nilanjan Chatterjee; Montserrat García-Closas
Journal:  Nat Genet       Date:  2020-05-18       Impact factor: 38.330

10.  A gene-based association method for mapping traits using reference transcriptome data.

Authors:  Eric R Gamazon; Heather E Wheeler; Kaanan P Shah; Sahar V Mozaffari; Keston Aquino-Michaels; Robert J Carroll; Anne E Eyler; Joshua C Denny; Dan L Nicolae; Nancy J Cox; Hae Kyung Im
Journal:  Nat Genet       Date:  2015-08-10       Impact factor: 38.330

