Systematic pathway enrichment analysis of a genome-wide association study on breast cancer survival reveals an influence of genes involved in cell adhesion and calcium signaling on the patients' clinical outcome.

Andrea Woltmann1, Bowang Chen1, Jesús Lascorz1, Robert Johansson2, Jorunn E Eyfjörd3, Ute Hamann4, Jonas Manjer5, Kerstin Enquist-Olsson6, Roger Henriksson7, Stefan Herms8, Per Hoffmann8, Kari Hemminki9, Per Lenner2, Asta Försti9.   

Abstract

Genome-wide association studies (GWASs) may help to understand the effects of genetic polymorphisms on breast cancer (BC) progression and survival. However, they give only a focused view, which cannot capture the tremendous complexity of this disease. Therefore, we investigated data from a previously conducted GWAS on BC survival for enriched pathways by different enrichment analysis tools using the two main annotation databases Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG). The goal was to identify the functional categories (GO terms and KEGG pathways) that are consistently overrepresented in a statistically significant way in the list of genes generated from the single nucleotide polymorphism (SNP) data. The SNPs with allelic p-value cut-offs of 0.005 and 0.01 were annotated to the genes by excluding or including a 20 kb up- and downstream sequence of the genes and analyzed by six different tools. We identified eleven consistently enriched categories, the most significant ones relating to cell adhesion and calcium ion binding. Moreover, we investigated the similarity between our GWAS and the enrichment analyses of twelve published gene expression signatures for breast cancer prognosis. Five of them were commonly used and commercially available, five were based on different aspects of metastasis formation and two were developed from meta-analyses of published prognostic signatures. This comparison revealed similarities between our GWAS data and the general and the specific brain metastasis gene signatures as well as the Oncotype DX signature. As metastasis formation is a strong indicator of a patient's prognosis, this result reflects the survival aspect of the conducted GWAS and supports cell adhesion and calcium signaling as important pathways in cancer progression.

Year:  2014        PMID: 24886783      PMCID: PMC4041745          DOI: 10.1371/journal.pone.0098229

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Worldwide, breast cancer (BC) is the most common cancer among women, comprising 23% of all female cancers. Each year, about 1.4 million new cases are diagnosed and about 460,000 women die of this disease [1]. Survival of BC has been shown to be in part heritable, which may be explained by as yet unknown genetic factors [2]. Further knowledge about the effects of inherited genetic variation on BC survival can help to predict a patient’s individual risk of disease progression and survival probability and to develop new and better therapies and prevention strategies. A genome-wide association study (GWAS) is a powerful tool to search for a genetic influence on complex traits. Within the last six years, 34 GWASs on breast cancer have been performed, identifying 194 new susceptibility loci (http://www.genome.gov/gwastudies). Three GWASs on breast cancer survival have also been conducted, yielding only three prognostic loci [3]–[5]. Therefore, a more global view of GWAS data can reveal new insights into cancer formation and progression and give new clues for further investigations. A good tool to set high-throughput data into a global context is pathway enrichment analysis [6]. The gene-group-based approach increases the likelihood of identifying the biological processes that are overrepresented in the high-throughput data and have a high impact on the studied disease. The most commonly used gene annotation databases are Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG), in which the biological knowledge about genes and their associated processes and pathways is collected. This knowledge can be used by pathway enrichment tools, which map the genes of the investigated list to the associated biological annotation terms of the database. Then, the enrichment result for the submitted list is compared to the control background and an enrichment p-value is calculated and corrected for multiple testing. 
Currently, a wide variety of pathway enrichment tools is available. Some tools take lists of genes or proteins as input and output enriched pathways. Others take the locations of single nucleotide polymorphisms (SNPs) into consideration, so that gene lists can be derived from GWAS data. The aim of our study was to submit a GWAS on BC survival to a pathway enrichment analysis. In the GWAS, the genotype data of women of Western European origin with long and short survival times after the diagnosis of BC were compared. Pathway enrichment analysis was conducted using six different enrichment tools on four final gene lists. The gene lists were based on our SNP data with allelic p-value cut-offs of 0.005 and 0.01 and with gene annotation excluding or including a 20 kb up- and downstream sequence of the gene. Only those categories which were enriched in all four lists and by more than one tool were considered to be consistently enriched. We were also interested in whether our results are supported by gene signatures for breast cancer prognosis derived from gene expression profiling studies. Therefore, we performed pathway enrichment analyses with several commonly used prognostic gene signatures and compared the results with our GWAS data.
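
The enrichment step described above (compare the overlap of a gene list and a category against a background, then correct for multiple testing) can be sketched as follows. This is a minimal illustration only: it assumes a one-sided hypergeometric test and Benjamini-Hochberg adjustment, whereas the tools used in this study differ in their test statistics and adjustment methods. The function names and the background size of 20,000 genes are assumptions made for the example.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_category, n_list, n_overlap):
    """One-sided over-representation p-value, P(X >= n_overlap).

    n_list genes are drawn without replacement from n_background genes,
    of which n_category are annotated to the category.
    """
    upper = min(n_category, n_list)
    total = comb(n_background, n_list)
    tail = sum(
        comb(n_category, k) * comb(n_background - n_category, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Example call: 67 of 737 list genes fall into a category of 685 genes.
# The background size (20,000) is an assumed value, not taken from the study.
p_example = hypergeom_enrichment_p(20000, 685, 737, 67)
```

Exact integer arithmetic via `math.comb` avoids any dependency on a statistics library; for routine use a dedicated implementation would be faster.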

Materials and Methods

Ethics Statement

All participants in the GWAS gave written informed consent to the use of their samples for research purpose. The study was approved by the ethical committee of each participating institute.

GWAS

The GWAS on BC survival was a population based case-only study, in which the BC patients were divided in two groups based on their survival time. We considered as cases 369 women with short-time survival (less than 6 years after breast cancer diagnosis) which were compared with a group consisting of 369 women with long-time survival (≥11 years after breast cancer diagnosis) as controls. Details of the characteristics of the study population are found in the table S1. The cases and controls were selected from four cohorts and matched for age at diagnosis (<40, 40–49, 50–59 and ≥60 years), gender, diagnosis period (1985–1989, 1990–1994 and 1995–) and cohort (table S2). Blood samples were prospectively collected in each cohort. The cases and controls were identified from the cohorts by record linkage to the regional cancer registries. Follow-up was performed until December, 31st, 2007 and the data were available for every patient. The Västerbotten intervention project (VIP), the mammary screening project (MSP) and the Department of Oncology, Norrlands University Hospital, Umeå, Sweden, contributed with 96 cases and 96 controls [7]. Within VIP, blood samples have been collected since 1985, within MSP since 1995, with subsequent BC diagnosis during the years 1988–2005. Norrlands University Hospital collects blood samples consecutively since 1990 from newly diagnosed BC patients and 43 BC patients, not included in VIP or MSP, were included in the study. The Malmö Diet and Cancer Study, Malmö, Sweden contributed 44 cases and 44 controls [8], [9]. Blood samples were collected between 1991 and 1996, prior to BC diagnosis between 1991 and 2005. The third sample set comprised 82 cases and 14 controls from the Städtisches Klinikum Karlsruhe and Deutsches Krebsforschungszentrum Breast Cancer Study (SKKDKFZS) and 68 controls from the Umeå cohort. 
The SKKDKFZS consists of women between 21 and 93 years of age at diagnosis with pathologically confirmed breast cancer, recruited at the Städtisches Klinikum Karlsruhe, Karlsruhe, Germany, from 1993 to 2005, with a blood sample collected at the time of diagnosis. The Icelandic Cancer Society and University of Iceland Biobank contributed 147 cases and 147 controls with BC diagnosis during the years 1983–2004 [10]. A genome-wide scan of ∼300,000 tagging SNPs was conducted using the Illumina HumanCytoSNP-12 v1 according to the manufacturer’s protocols. Before analysis, markers meeting one or more of the following criteria were excluded: <90% genotype call rate, minor allele frequency <5% or Hardy–Weinberg equilibrium exact p-value <10^−5. Genotype calling was done using Illumina GenomeStudio 2010. The GWAS was conducted with PLINK v1.06, using the “model” option to perform a Cochran-Armitage and a full-model case-control association test.
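
The three marker exclusion criteria can be expressed as a simple filter. The sketch below is illustrative and is not the PLINK implementation: the function name and argument layout are assumptions, and the Hardy-Weinberg exact p-value is taken as a precomputed input rather than calculated here.

```python
def marker_passes_qc(n_aa, n_ab, n_bb, n_missing, hwe_p,
                     min_call_rate=0.90, min_maf=0.05, min_hwe_p=1e-5):
    """Return True if a marker survives the three exclusion criteria.

    n_aa, n_ab, n_bb: genotype counts among called samples.
    n_missing: number of samples without a genotype call.
    hwe_p: Hardy-Weinberg exact p-value (computed elsewhere).
    """
    n_called = n_aa + n_ab + n_bb
    n_total = n_called + n_missing
    if n_called == 0:
        return False
    call_rate = n_called / n_total
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1.0 - freq_a)  # minor allele frequency
    return (call_rate >= min_call_rate
            and maf >= min_maf
            and hwe_p >= min_hwe_p)
```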

Enrichment Analysis

The GWAS data were investigated for SNPs which were annotated to a gene and located within the 5′UTR, 3′UTR, introns and exons of the gene, or alternatively within a genomic region including up to 20 kb up- and downstream of a gene locus. Different allelic p-value cut-offs (0.05, 0.01, 0.005, 0.001 and 0.0001) were set to generate gene lists for both scenarios. If more than one SNP per gene met the selection criteria, the SNP with the lowest p-value was taken into account. Finally, four gene lists, two per scenario, with the allelic p-value cut-offs of 0.01 and 0.005 were selected as input for six enrichment analysis tools (ConsensusPathDB, DAVID, FatiGO, GATHER, GeneCodis and WebGestalt) using the two main annotation databases Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) as basis. These pathway enrichment tools were selected based on our previous experience with pathway enrichment analyses [11]. The selection criteria included free availability, user-friendly handling, the use of gene names as input and of the GO and KEGG databases as basis, and variation in stringency, test statistics and multiple comparison adjustment methods. The four gene lists were selected because they provided the enrichment tools with an applicable number of genes to run the enrichment analyses successfully. With a too stringent allelic SNP p-value cut-off, too few genes serve as input, resulting in no significantly enriched categories. A too tolerant allelic p-value cut-off increases the background noise and may result either in too many unspecific or in no enriched categories [12], [13]. For all tools the same conditions were applied: a significance threshold of 0.05 for the adjusted enrichment p-value, at least two genes from the input list in the enriched category and the whole genome as reference background. 
The goal was to identify the functional categories (GO terms and KEGG pathways) that are consistently overrepresented in a statistically significant way in the list of SNPs inferred from the GWAS on BC prognosis. The tools used and their characteristics are listed in table S3.
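
The construction of a gene list from SNP-level results can be sketched as follows. This is a minimal illustration under stated assumptions: the `Snp` and `Gene` records, the function name and the brute-force lookup are hypothetical and not the annotation pipeline used in the study. A flank of 0 corresponds to the "within a gene" scenario and a flank of 20,000 bp to the ±20 kb scenario.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    rsid: str
    chrom: str
    pos: int
    p_value: float  # allelic p-value from the association test


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int  # start of the 5'UTR-to-3'UTR span
    end: int


def build_gene_list(snps, genes, p_cutoff, flank_bp=0):
    """Map SNPs below the p-value cut-off to genes.

    Returns a dict gene symbol -> best (lowest p-value) SNP in the gene
    region, optionally extended by flank_bp on both sides.
    """
    best = {}
    for snp in snps:
        if snp.p_value >= p_cutoff:
            continue
        for gene in genes:
            in_region = (gene.chrom == snp.chrom
                         and gene.start - flank_bp <= snp.pos <= gene.end + flank_bp)
            if not in_region:
                continue
            current = best.get(gene.symbol)
            if current is None or snp.p_value < current.p_value:
                best[gene.symbol] = snp
    return best
```

Calling the function with the cut-offs 0.005 and 0.01 and with `flank_bp` set to 0 and 20000 yields the four gene lists; for genome-scale input an interval index would replace the nested loop.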

Consistently Enriched Categories

For DAVID, FatiGO and GATHER, the tool’s default p-value cut-off of 0.05 generated a list of 20–30 enriched categories for the comparison. However, for ConsensusPathDB, GeneCodis and WebGestalt, this p-value cut-off generated lists of up to 176, 278 and 77 enriched GO terms, respectively, and we therefore used the more stringent enrichment p-value cut-offs of 1×10^−6, 1×10^−4 and 0.01, respectively. The enriched categories of each allelic p-value cut-off gene list based on the SNPs within a gene region were compared to those of the gene list that also took the SNPs within the ±20 kb spanning region into account (0.01 list vs. 0.01±20 kb list; 0.005 list vs. 0.005±20 kb list). This was done for every tool separately. Then, the overlaps of the two different allelic p-value cut-offs were compared to each other. Finally, we compared the results of all tools to each other. Only categories enriched in all four gene lists and by more than one tool were considered consistently enriched (figure 1).
Figure 1

Flow chart of the pathway enrichment analysis of the GWAS on BC survival.
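
The selection rule for consistently enriched categories (present in all four gene lists for a given tool, and reported this way by more than one tool) can be sketched as a set operation. The data layout, the list labels and the function name below are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical labels for the four gene lists described in the text.
LIST_NAMES = ("0.005", "0.005±20kb", "0.01", "0.01±20kb")


def consistently_enriched(results, list_names=LIST_NAMES, min_tools=2):
    """Return {category: number of tools} for consistently enriched categories.

    results: dict tool name -> dict list name -> iterable of enriched categories.
    A category counts for a tool only if it is enriched in every gene list.
    """
    tool_counts = Counter()
    for per_list in results.values():
        shared = None
        for name in list_names:
            categories = set(per_list.get(name, ()))
            shared = categories if shared is None else shared & categories
        tool_counts.update(shared or ())
    return {cat: n for cat, n in tool_counts.items() if n >= min_tools}
```

A tool that lacks results for one of the four lists contributes nothing, which mirrors the requirement that a category be enriched in all four lists.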

Prognostic Gene Expression Signatures

Literature was searched for commonly used prognostic gene signatures derived from breast cancer expression data. Twelve gene expression signatures were selected for further pathway enrichment analysis conducted by MetaCore GeneGo pathway enrichment analysis (false discovery rate (FDR) cut-off 0.05) because this tool enables the pathway enrichment analysis of two gene lists simultaneously and compares the results to each other. Also the 0.01 gene list of our GWAS was analyzed again with this tool to make the results comparable to the ones of the gene expression signatures.
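
The comparison of two gene lists reduces to finding pathways that pass the FDR cut-off in both enrichment results. The sketch below is not the MetaCore algorithm, which is proprietary; it assumes that FDR-adjusted p-values per pathway are already available for each list, and the function name is hypothetical.

```python
def shared_significant_pathways(q_list_a, q_list_b, fdr_cutoff=0.05):
    """Return pathways significant in both lists.

    q_list_a, q_list_b: dict pathway name -> FDR-adjusted p-value.
    The result is a list of (pathway, q_a, q_b) tuples, ordered by the
    larger of the two adjusted p-values.
    """
    shared = [
        (pathway, q_list_a[pathway], q_list_b[pathway])
        for pathway in q_list_a
        if pathway in q_list_b
        and q_list_a[pathway] <= fdr_cutoff
        and q_list_b[pathway] <= fdr_cutoff
    ]
    return sorted(shared, key=lambda row: max(row[1], row[2]))
```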

Results

Systematic Enrichment Analyses of the GWAS Data

The consecutive steps of the pathway enrichment analysis are summarized in figure 1. The GWAS data were filtered for SNPs located within a gene (5′UTR, 3′UTR, intron and exon), as well as for SNPs located in a genomic region 20 kb up- and downstream from a gene locus to take also genetic effects in regulatory regions into account. Five different p-value cut-offs for both scenarios were set to generate gene lists (table 1). The gene lists based on SNPs created by the p-value cut-offs 0.01 and 0.005, consisting of 737 and 1143 genes and 402 and 638 genes, respectively, provided the enrichment tools with an applicable number of genes to run the enrichment analyses successfully.
Table 1

Number of SNPs and genes corresponding to allelic p-value cut-offs of the GWAS on BC survival.

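| p-value | Total No. of SNPs | No. of SNPs within a gene | No. of genes | No. of SNPs ±20 kb | No. of genes ±20 kb |
|---|---|---|---|---|---|
| <0.05 | 4572 | 1664 | 1015 | 2525 | 1576 |
| <0.01 | 3080 | 1137 | 737 | 1725 | 1143 |
| <0.005 | 1607 | 576 | 402 | 746 | 638 |
| <0.001 | 329 | 112 | 83 | 163 | 125 |
| <0.0001 | 40 | 9 | 9 | 12 | 10 |
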
These four gene lists served as input in six different pathway enrichment analysis tools under the same setting. The number of enriched GO terms/KEGG pathways differed enormously between the enrichment analysis tools due to the individual tool features although the same analysis conditions were assigned (table 2). The ConsensusPathDB and GeneCodis tool reported in general much more enriched GO terms than the other used tools. For example, they generated 176 and 278 overrepresented GO annotations, respectively, when the 0.01±20 kb gene list was analyzed. As comparison, DAVID and FatiGO reported only 4 enriched categories each.
Table 2

Number of GO annotations and KEGG pathways enriched by six pathway enrichment tools for gene lists with allelic SNP p-value cut-offs 0.005 and 0.01.

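| Tool name | 0.005, within a gene: GO annotations | 0.005, within a gene: KEGG pathways | 0.005, gene locus ±20 kb: GO annotations | 0.005, gene locus ±20 kb: KEGG pathways | 0.01, within a gene: GO annotations | 0.01, within a gene: KEGG pathways | 0.01, gene locus ±20 kb: GO annotations | 0.01, gene locus ±20 kb: KEGG pathways |
|---|---|---|---|---|---|---|---|---|
| ConsensusPathDB | 159 | 16 | 170 | 14 | 161 | 25 | 176 | 21 |
| DAVID | 32 | 0 | 15 | 0 | 22 | 5 | 4 | 1 |
| FatiGO | 7 | 0 | 5 | 0 | 5 | 0 | 4 | 0 |
| GATHER | 20 | 3 | 10 | 2 | 33 | 5 | 23 | 4 |
| GeneCodis | 133 | 11 | 172 | 15 | 223 | 29 | 278 | 26 |
| WebGestalt | 50 | 15 | 24 | 17 | 77 | 37 | 52 | 37 |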

Pathway enrichment p-value cut-off: 0.05.

To reduce the number of enriched categories, the results of the two gene lists with an allelic p-value cut-off of 0.005 were compared with each other. This was also done for the two gene lists with a p-value cut-off of 0.01, and the resulting overlaps were compared with each other. This was done separately for every tool. To be defined as consistently enriched, a category had to be overrepresented in at least two different tools. After this comparison eleven categories remained: two GO terms, “calcium ion binding” and “cell adhesion”, and nine KEGG pathways, “adherens junction”, “arrhythmogenic right ventricular cardiomyopathy”, “axon guidance”, “calcium signaling”, “dilated cardiomyopathy”, “ECM-receptor interaction”, “focal adhesion”, “O-glycan biosynthesis” and “small cell lung cancer” (table 3). Most categories were reported by three or four tools.
Table 3

Consistently enriched categories of the GWAS on BC survival.

| ID | Category | Number of tools | Number of genes in category | Number of GWAS genes* in category |
|---|---|---|---|---|
| GO Annotations (6 tools) | | | | |
| GO:0005509 | calcium ion binding | 3 | 685 | 67 |
| GO:0007155 | cell adhesion | 4 | 958 | 52 |
| KEGG Pathways (4 tools) | | | | |
| KEGG 04520 | Adherens junction | 3 | 59 | 9 |
| KEGG 05412 | Arrhythmogenic right ventricular cardiomyopathy | 2 | 65 | 10 |
| KEGG 04360 | Axon guidance | 3 | 80 | 15 |
| KEGG 04020 | Calcium signaling pathway | 3 | 139 | 17 |
| KEGG 05414 | Dilated cardiomyopathy | 3 | 82 | 11 |
| KEGG 04512 | ECM-receptor interaction | 4 | 57 | 8 |
| KEGG 04510 | Focal adhesion | 3 | 134 | 15 |
| KEGG 00512 | O-Glycan biosynthesis | 2 | 11 | 7 |
| KEGG 05222 | Small cell lung cancer | 2 | 65 | 6 |

Only categories enriched in all four gene lists and by more than one tool were considered consistently enriched. * Genes present in the 0.01 gene list (allelic SNP p-value cut-off 0.01).

We compared the genes of every category with each other to detect overlaps between the pathways and to characterize the consistently enriched categories (table 4). The cross-tabulation revealed a strong association of “cell adhesion” genes with all pathways except for the genes in “calcium signaling” and “O-glycan biosynthesis”. Moreover, we investigated the overlap of our GWAS genes in the pathways, resulting in a similar outcome (table 5). Based on this analysis most categories were summarized in two overarching categories:
Table 4

Gene overlap of the consistently enriched categories for all pathway genes.

abcdefghijk
IDCategoryNumber of genes68595859658013982571341165
a GO:0005509Calcium ion binding6851850631481900
b GO:0007155Cell adhesion9582129255254863023
c KEGG 04520Adherens junction59892101700
d KEGG 05412Arrhythmogenic right ventricular cardiomyopathy651551212306
e KEGG 04360Axon guidance802112002
f KEGG 04020Calcium signaling pathway139180601
g KEGG 05414Dilated cardiomyopathy82212306
h KEGG 04512ECM-receptor interaction5742018
i KEGG 04510Focal adhesion134032
j KEGG 00512O-Glycan biosynthesis110
k KEGG 05222Small cell lung cancer65
Table 5

Gene overlap of the consistently enriched categories based on the GWAS genes present in the 0.01 gene list.

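| ID | Category | Number of genes | a | b | c | d | e | f | g | h | i | j | k |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | 67 | 52 | 9 | 10 | 15 | 17 | 11 | 8 | 15 | 7 | 6 |
| a GO:0005509 | Calcium ion binding | 67 | | 13 | 1 | 8 | 1 | 11 | 7 | 2 | 4 | 6 | 0 |
| b GO:0007155 | Cell adhesion | 52 | | | 3 | 5 | 3 | 0 | 3 | 7 | 10 | 0 | 5 |
| c KEGG 04520 | Adherens junction | 9 | | | | 2 | 1 | 0 | 0 | 0 | 2 | 0 | 0 |
| d KEGG 05412 | Arrhythmogenic right ventricular cardiomyopathy | 10 | | | | | 1 | 2 | 8 | 3 | 4 | 0 | 1 |
| e KEGG 04360 | Axon guidance | 15 | | | | | | 1 | 1 | 1 | 2 | 0 | 1 |
| f KEGG 04020 | Calcium signaling pathway | 17 | | | | | | | 4 | 0 | 1 | 0 | 0 |
| g KEGG 05414 | Dilated cardiomyopathy | 11 | | | | | | | | 3 | 3 | 0 | 1 |
| h KEGG 04512 | ECM-receptor interaction | 8 | | | | | | | | | 7 | 0 | 5 |
| i KEGG 04510 | Focal adhesion | 15 | | | | | | | | | | 0 | 5 |
| j KEGG 00512 | O-Glycan biosynthesis | 7 | | | | | | | | | | | 0 |
| k KEGG 05222 | Small cell lung cancer | 6 | | | | | | | | | | | |
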
“Cell adhesion” with its 52 GWAS genes combined different kinds of cell adhesion processes, such as the KEGG pathways “adherens junction”, “ECM-receptor interaction” and “focal adhesion”, as well as “small cell lung cancer” and, to a lesser extent, “axon guidance”. “Calcium ion binding” characterizes the group of KEGG pathways “arrhythmogenic right ventricular cardiomyopathy”, “calcium signaling”, “dilated cardiomyopathy” and “O-glycan biosynthesis”. Additionally, the gene overlap of 20–30% between the two GO terms “calcium ion binding” and “cell adhesion” supports a connection between these two annotations.
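
The cross-tabulation behind tables 4 and 5 is a pairwise set intersection. The following sketch shows one way to compute overlap counts and the percentage relative to each of the two categories; the function name and the input layout (category name mapped to a set of gene symbols) are assumptions made for illustration.

```python
from itertools import combinations


def overlap_table(categories):
    """Pairwise gene overlap between categories.

    categories: dict category name -> set of gene symbols.
    Returns a list of tuples
    (name_a, name_b, shared, percent_of_a, percent_of_b).
    """
    rows = []
    for (name_a, genes_a), (name_b, genes_b) in combinations(categories.items(), 2):
        shared = len(genes_a & genes_b)
        pct_a = 100.0 * shared / len(genes_a) if genes_a else 0.0
        pct_b = 100.0 * shared / len(genes_b) if genes_b else 0.0
        rows.append((name_a, name_b, shared, pct_a, pct_b))
    return rows
```

Because the overlap is expressed relative to both categories, a small pathway that is almost fully contained in a large GO term shows a high percentage on one side and a low one on the other.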

Enrichment Analysis of the 0.01 Gene List with MetaCore

As we wanted to compare our GWAS pathways with prognostic expression signatures, the longer 0.01 gene list was further analyzed with GeneGo, the pathway enrichment analysis tool of MetaCore, which allows simultaneous analysis and comparison of two gene lists. Fifteen pathways passed the significance level defined by an FDR of 0.05, and the analysis confirmed the importance of cell adhesion, axon guidance and calcium signaling (figure 2) in the GWAS survival signature. Although the GeneGo pathway enrichment analysis uses its own pathway terms, they are similar to the GO terms or KEGG pathways. O-glycan biosynthesis was also found among the top 5 enriched pathways. The five most common terms, cell adhesion, cytoskeleton remodeling, development, muscle contraction and neurophysiological process, constituted 56% of the top 50 pathways (table 6, table S4).
Figure 2

Top 25 GeneGO pathways enriched by the 0.01 gene list derived from the GWAS data.

red numbers = significant at FDR of 0.05.

Table 6

Distribution of the seven generic terms among the 50 top pathways in the enrichment analyses of the 13 gene lists.

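| Group | Signature name | No. of genes | Cytoskeleton remodeling (total) | Cytoskeleton remodeling (%) | Cell adhesion (total) | Cell adhesion (%) | Cell cycle (total) | Cell cycle (%) | Neurophysiological process (total) | Neurophysiological process (%) | Muscle contraction (total) | Muscle contraction (%) | Immune response (total) | Immune response (%) | Development (total) | Development (%) | Sum of pathways (total) | Sum of pathways (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| based on GWAS data | 0.01 gene list | 737 | 6 | 12 | 6 | 12 | 0 | 0 | 6 | 12 | 3 | 6 | 1 | 2 | 7 | 14 | 29 | 58 |
| commercially used gene signatures | Mammaprint | 70 | 1 | 2 | 3 | 6 | 3 | 6 | 5 | 10 | 2 | 4 | 3 | 6 | 12 | 24 | 29 | 58 |
| | Oncotype DX | 21 | 5 | 10 | 4 | 8 | 9 | 18 | 1 | 2 | 0 | 0 | 1 | 2 | 10 | 20 | 30 | 60 |
| | MapQuant | 97 | 4 | 8 | 1 | 2 | 17 | 34 | 0 | 0 | 0 | 0 | 2 | 4 | 1 | 2 | 25 | 50 |
| | Gene Search | 76 | 2 | 4 | 3 | 6 | 9 | 18 | 2 | 4 | 0 | 0 | 6 | 12 | 6 | 12 | 28 | 56 |
| | Wound response signature | 512 | 5 | 10 | 3 | 6 | 3 | 6 | 0 | 0 | 0 | 0 | 4 | 8 | 4 | 8 | 19 | 38 |
| special metastasis gene signatures | Lung metastasis signature | 54 | 4 | 8 | 5 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 16 | 32 | 11 | 22 | 36 | 72 |
| | Brain metastasis signature | 243 | 2 | 4 | 6 | 12 | 0 | 0 | 0 | 0 | 1 | 2 | 18 | 36 | 11 | 22 | 38 | 76 |
| | Bone metastasis signature | 102 | 6 | 12 | 4 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 14 | 28 | 11 | 22 | 35 | 70 |
| general metastasis gene signatures | Invasiveness signature | 186 | 4 | 8 | 1 | 2 | 1 | 2 | 1 | 2 | 0 | 0 | 8 | 16 | 7 | 14 | 22 | 44 |
| | General metastasis signature | 128 | 4 | 8 | 4 | 8 | 8 | 16 | 2 | 4 | 3 | 6 | 4 | 8 | 5 | 10 | 30 | 60 |
| derived from meta-analyses | Meta gene signature | 376 | 0 | 0 | 0 | 0 | 17 | 34 | 0 | 0 | 0 | 0 | 5 | 10 | 3 | 6 | 25 | 50 |
| | 374 GeneSet/consensus genes | 374 | 5 | 10 | 3 | 6 | 17 | 34 | 0 | 0 | 0 | 0 | 3 | 6 | 3 | 6 | 31 | 62 |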

Enrichment Analyses of the Gene Expression Signatures

Literature was searched for commonly used prognostic gene signatures derived from breast cancer expression data. We selected twelve signatures for further pathway enrichment analysis (table 7). Mammaprint [14], Oncotype DX [15], MapQuant [16], Gene Search [17] and the fibroblast core serum response (CSR) signature, commonly known as wound response signature [18], are well established, often cited in literature and commercially available. We also included five gene signatures based on expression data of metastatic breast cancer or metastatic adenocarcinomas of diverse origin [19]–[23], because metastasis formation has a profound impact on patients’ survival. Last, we added two prognostic gene signatures based on meta-analyses of published gene expression signatures and microarray data sets of breast tumors [24], [25] to evaluate how a combination of several prognostic gene signatures influences the enrichment analysis and if this result is comparable to the one obtained by the GWAS.
Table 7

Prognostic gene expression signatures selected for pathway enrichment analysis.

| Signature name | Author | Year of publication | No. of genes | Study design | Outcome |
|---|---|---|---|---|---|
| Mammaprint | Van’t Veer et al. [14] | 2002 | 70 | 78 patients with sporadic primary breast tumors: <5 cm, N0, age <55 years; 34 patients developed distant metastasis <5 years vs. 44 patients: disease-free >5 years | Prognosis for distant metastasis |
| Oncotype DX | Paik et al. [15] | 2004 | 21 | 668 tumors from patients: N0 and ER+, treated with tamoxifen | Prognosis for distant recurrence/overall survival |
| MapQuant | Sotiriou et al. [16] | 2006 | 97 | 64 samples: ER+, grade 1 vs. grade 3 | Prognosis for recurrence/relapse-free survival |
| Gene Search | Wang et al. [17] | 2005 | 76 | 115 tumors: all N0; 80 samples ER+, 35 ER-, analyzed separately for distant tumor recurrence, then combined | Prognosis for distant tumor recurrence |
| Wound response signature | Chang et al. [18] | 2004 | 512 | 50 fibroblast cultures from 10 anatomic sites: response of fibroblasts to serum exposure | Prognosis for metastasis/survival |
| Lung metastasis signature | Minn et al. [19] | 2005 | 54 | Comparison of highly and weakly lung-metastatic cell populations derived from the breast cancer cell line MDA-MB-231 | Prognosis for lung metastasis |
| Brain metastasis signature | Bos et al. [20] | 2009 | 243 | Comparison of cell lines with different metastatic potentials derived from the breast cancer cell lines MDA-MB-231 and CN34 | Prognosis for brain metastasis |
| Bone metastasis signature | Kang et al. [21] | 2003 | 102 | MDA-MB-231 breast cancer cell line + 12 derivative subpopulations with different metastatic potentials | Prognosis for bone metastasis |
| Invasiveness signature | Liu et al. [22] | 2007 | 186 | CD44+CD24−/low breast cancer cells with high tumorigenic capacity vs. cells of normal breast epithelium | Prognosis for overall/metastasis-free survival |
| General metastasis signature | Ramaswamy et al. [23] | 2003 | 128 | 64 primary adenocarcinomas of diverse origin (lung, breast, prostate, colorectal, uterus, ovary) vs. 12 unmatched adenocarcinoma metastases | Metastatic potential, clinical outcome |
| Meta gene signature | Györffy et al. [24] | 2009 | 376 | Meta-analysis of 20 published gene signatures on 7 breast cancer microarray data sets (n = 1079) | Prognosis for relapse-free survival |
| 374 GeneSet/consensus genes | Lauss et al. [25] | 2008 | 374 | Meta-analysis of 44 published gene signatures on 8 breast cancer microarray data sets (n = 1067) | Prognosis for survival |

The signatures could be divided in four subgroups. The commercially available gene signatures Mammaprint, Oncotype DX, MapQuant and Gene Search were dominated by the terms cell cycle and development (table 6). Also the two meta-analyses showed enrichment of genes involved in cell cycle (34% of the 50 top pathways each), as did the general metastasis signature (16%). The specific lung, brain and bone metastasis signatures showed a strong connection to the generic terms immune response and development, which represented about 30% and 20% of the enriched pathways, respectively, and they were lacking pathways associated with cell cycle. The wound response and invasiveness signature did not show any specific pattern.

Comparison of the GWAS and the Gene Expression Signatures

In order to evaluate the similarities between the GWAS and the gene expression signatures, we analyzed the GWAS gene signature and every prognostic gene expression signature simultaneously with the MetaCore GeneGo pathway enrichment analysis tool to get a detailed view of their common pathways (figure 3, Figures S1–S11). In this analysis the two gene lists were investigated for overrepresented pathways and compared to each other. Only pathways enriched by both gene lists are displayed and ranked by their enrichment p-values. Pathways significantly enriched by both gene lists at the same time were rare. Across all simultaneous analyses, only three pathways passed the 0.05 FDR significance level in both analyzed gene lists. Two of them were enriched in the analysis of our GWAS gene list together with the general metastasis signature (figure 3). These were the “Airway smooth muscle contraction in asthma” pathway placed at rank 7 (P(0.01 gene list) = 5.3×10^−5; P(general metastasis signature) = 7.9×10^−4) and the “Cytoskeleton remodeling_Cytoskeleton remodeling” pathway (P(0.01 gene list) = 1.7×10^−4; P(general metastasis signature) = 9.7×10^−4) placed at rank 9.
Figure 3

Top 25 GeneGO pathways enriched simultaneously by the 0.01 gene list and general metastasis signature; red numbers = significant at FDR of 0.05; green box = pathway significantly enriched by both gene lists; Pathway “Airway smooth muscle contraction in asthma” was placed at rank 7, pathway “Cytoskeleton remodeling_Cytoskeleton remodeling” was placed at rank 9.

“Airway smooth muscle contraction in asthma” (figure 4) is almost identical to the top pathway in this analysis, “Muscle contraction_GPCRs in the regulation of smooth muscle tone” (figure S12), showing a clear connection to calcium ion binding, with the Ca2+-containing endoplasmic reticulum and its associated proteins as one central part of these pathways. Several proteins of these pathways can also be found in the GeneGo pathway “Cytoskeleton remodeling_Cytoskeleton remodeling” (figure 5). This pathway combines several sub-pathways, many of them involved in cell adhesion. These include the pathways “ECM-receptor interaction”, “focal adhesion” and “adherens junction”. Links to the well-known cancer pathways “TGF-β signaling” and “Wnt signaling” are also observed. “Cytoskeleton remodeling_Cytoskeleton remodeling” was also significantly enriched by the brain metastasis gene signature (P(brain metastasis signature) = 2.5×10^−3) and placed at rank 15 (figure S7).
Figure 4

GeneGo pathway “Airway smooth muscle contraction in asthma”.

Barometers: 1 = 0.01 gene list; 2 = general metastasis signature. red = Calcium signaling pathway, blue = Smooth muscle contraction/relaxation.

Figure 5

GeneGo pathway “Cytoskeleton remodeling”.

Barometer: 1 = 0.01 gene list; 2 = general metastasis signature; 3 = brain metastasis signature. orange = ECM-receptor interaction, purple = Adherens junction pathway, red = Calcium signaling pathway, pink = Focal adhesion pathway, yellow = TGF-β signaling pathway, green = Wnt signaling pathway, blue = Smooth muscle contraction/relaxation.

The third pathway significantly overrepresented by both the 0.01 gene list and a gene expression signature in the simultaneous analyses was “Neurophysiological process_Receptor-mediated axon growth repulsion” (figure 6), which was significantly enriched in the analysis together with the Oncotype DX signature and placed at rank 4 (P(0.01 gene list) = 9.1×10^−5; P(Oncotype DX) = 6.4×10^−3) (figure S2). This pathway is also connected to the calcium signaling and cell adhesion pathways.
Figure 6

GeneGo pathway “Neurophysiological process_Receptor-mediated axon growth repulsion”.

Barometers: 1 = 0.01 gene list; 2 = Oncotype DX. red = Calcium signaling pathway.

Discussion

The aim of our study was to set data derived from a GWAS on breast cancer survival into a global context by using a systematic pathway enrichment analysis with the two independent databases GO and KEGG as basis. In this process, the GO database was searched for overrepresented terms on a higher level of abstraction. A more detailed and focused view was achieved by using the KEGG data. In this way we obtained eleven consistently enriched categories, two more general GO terms and nine specific KEGG pathways, which may have an influence on BC survival. A gene overlap of up to 87% between six of these categories revealed a strong connection to cell adhesion and included the KEGG pathways “adherens junction”, “axon guidance”, “ECM-receptor interaction”, “focal adhesion” and “small cell lung cancer” and the GO term “cell adhesion”. The second category, with a high proportion of overlapping genes involved in calcium ion binding, included the GO term “calcium ion binding” and the KEGG pathways “arrhythmogenic right ventricular cardiomyopathy”, “calcium signaling”, “dilated cardiomyopathy” and “O-glycan biosynthesis”. There was also an overlap of 20–30% between the two overarching categories, which emphasizes the interplay of cellular adhesion processes and calcium signaling as an important process in breast cancer survival. In the second part of our study we compared the pathway enrichment results of our GWAS data to those of twelve prognostic gene expression signatures. Simultaneously conducted pathway enrichment analyses with each of the expression signatures revealed that the “Airway smooth muscle contraction in asthma”, the “Cytoskeleton remodeling” and the “Neurophysiological process_Receptor-mediated axon growth repulsion” pathways were the only ones which were significantly overrepresented by both the GWAS and a gene expression signature. 
The gene expression signatures involved were the general metastasis signature, with two simultaneously enriched pathways, and the specific brain metastasis signature and Oncotype DX, each with one simultaneously enriched pathway. The general metastasis signature was derived from a comparison of gene expression data from adenocarcinomas of diverse origin (lung, breast, prostate, colorectal, uterus, ovary) with the corresponding metastases, leading to 128 genes that best distinguished primary tumors from metastases [23]. The brain metastasis signature is based on a genome-wide expression analysis of two BC cell lines and their highly brain-metastatic derivatives [20]. The comparison of the gene expression profiles led to 243 differentially expressed genes, which were used as a brain metastasis signature in our study. The Oncotype DX signature was generated by a hypothesis-driven search of the literature and databases for candidate cancer genes, which were tested for their correlation with disease recurrence in three independent breast cancer studies [15]. The sixteen best-performing genes and five reference genes were used to calculate a recurrence score. These three gene expression signatures are based on genes that are already known for their involvement in cancer (Oncotype DX) or that are associated with the metastasis-forming process (general and brain metastasis signatures). Together, the three simultaneously enriched pathways illustrate a possible interaction of the cell adhesion pathways with the calcium signaling pathway in the metastatic process and in patients’ survival probabilities (Figures 4–6). Calcium signaling and cell adhesion interact with each other in various ways and play an important role in metastasis, which involves detachment from the solid primary tumor, migration and invasion into a foreign tissue [26].
For example, E-cadherin, a key cell-to-cell adhesion molecule, requires Ca2+ ions to form homophilic interactions between two neighboring cells in adherens junctions [27]. Its down-regulation or inactivation in carcinomas has been reported to result in reduced cell adhesion [28], [29], making it a major suppressor of metastasis. Focal adhesions, the main linkage points between cells and the extracellular matrix (ECM), are also influenced by calcium. Focal adhesion turnover, which determines the efficiency of cell migration, is regulated by calcium signaling. An important component in this process is focal adhesion kinase (FAK), which is a contact point for diverse extracellular stimuli, including the Ca2+ concentration. FAK coordinates signals between integrins, the attachment molecules to the ECM, and growth factor receptors, and promotes cell migration [30]. These examples point to regulation of metastasis formation either directly, through mutations in the adhesion molecules involved, or indirectly, through impaired calcium signaling. Metastases are the leading cause of death in cancer patients and are therefore strongly connected to patients’ survival. This was also reflected in our study population: short-time survivors tended to have tumors of higher stage than long-time survivors. As our data is based on a GWAS on BC survival comparing women with short-time survival to those with long-time survival, the results of our pathway enrichment analyses reflect the impact of the invasive tumor phenotype on the survival of a patient. Moreover, the comparison with the pathway enrichment results of commonly used prognostic gene expression signatures supports our conclusion.

Although pathway enrichment tools are able to put GWAS data into a global context, some points need to be considered [12], [31]. Large genes with more SNPs are more likely to contain associated SNPs by chance alone than small genes.
To avoid this bias, we annotated the SNPs to a gene both excluding and including a 20 kb up- and downstream sequence of the gene. Only the best SNP (i.e. the one with the lowest p-value) per gene was included in the analysis. The 20 kb limit was applied because the average length of haplotype blocks in the CEU population ranges between 5.9 kb (calculation method based on the four gamete test) and 16.3 kb (calculation method based on a composite of local D′ values) [32]. The pathway enrichment tools themselves also have limitations, which we encountered in our study. Even though the conditions were identical in all analyses, the different tools showed large variability in the number of overrepresented categories and their corresponding p-values [33]. The reasons for this variation include the source and version of the annotation files, the annotation level used by the tool, the statistical model applied for the enrichment analysis, the correction for multiple testing, and the background gene set used to calculate the p-values for the overrepresented pathways [34]. One way to address the problem of inconsistent results from different tools is to use several tools and compare their results. In our study, we analyzed four gene lists derived from the GWAS on BC survival with six tools and compared the results to detect true, consistently enriched categories.

In conclusion, our pathway enrichment analysis of the high-throughput data from a GWAS on BC survival revealed an influence of cell adhesion and calcium signaling on BC patients’ survival. This was also confirmed by our comparison with the enrichment analyses of twelve prognostic gene expression signatures. The known high impact of metastasis on a patient’s survival is supported by our genetic data, which also highlights the influence of changes in cell adhesion and calcium signaling in the metastatic process.
Therefore, further investigation of the identified pathways and the defined mechanisms of metastasis is a promising route to obtaining classifiers for patients’ survival.

Top 25 GeneGo pathways enriched simultaneously by the 0.01 gene list and Mammaprint. red numbers = significant at FDR of 0.05. (TIF)
Top 25 GeneGo pathways enriched simultaneously by the 0.01 gene list and Oncotype DX. red numbers = significant at FDR of 0.05; green box = “Neurophysiological process_Receptor-mediated axon growth repulsion” pathway significantly enriched by both gene lists at rank 4. (TIF)
Top 25 GeneGo pathways enriched simultaneously by the 0.01 gene list and MapQuant. red numbers = significant at FDR of 0.05. (TIF)
Top 25 GeneGo pathways enriched simultaneously by the 0.01 gene list and Gene Search. red numbers = significant at FDR of 0.05. (TIF)
Top 25 GeneGo pathways enriched simultaneously by the 0.01 gene list and wound response signature. red numbers = significant at FDR of 0.05; green box = pathway significantly enriched by both gene lists. (TIF)
Top 25 GeneGo pathways enriched simultaneously by the 0.01 gene list and the lung metastasis signature. red numbers = significant at FDR of 0.05. (TIF)
Top 25 GeneGo pathways enriched simultaneously by the 0.01 gene list and the brain metastasis signature. red numbers = significant at FDR of 0.05; green box = “Cytoskeleton remodeling_Cytoskeleton remodeling” pathway, significantly enriched by both gene lists at rank 15. (TIF)
Top 25 GeneGo pathways enriched simultaneously by the 0.01 gene list and the bone metastasis signature. red numbers = significant at FDR of 0.05. (TIF)
Top 25 GeneGo pathways enriched simultaneously by the 0.01 gene list and the invasiveness signature. red numbers = significant at FDR of 0.05. (TIF)
Top 25 GeneGo pathways enriched simultaneously by the 0.01 gene list and the meta gene signature. red numbers = significant at FDR of 0.05. (TIF)
Top 25 GeneGo pathways enriched simultaneously by the 0.01 gene list and the consensus gene signature. red numbers = significant at FDR of 0.05. (TIF)
GeneGo pathway “Muscle contraction_GPCRs in the regulation of smooth muscle tone”. Barometers: 1 = 0.01 gene list; 2 = general metastasis signature; red = Calcium signaling pathway. (TIF)
Detailed characteristics of the whole study population. (DOCX)
General characteristics of sub-populations used in the GWAS. (DOCX)
Pathway enrichment tools used and their features. (DOCX)
Top 50 GeneGo pathways enriched by the 0.01 gene list. (DOCX)
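The SNP-to-gene annotation and the overrepresentation testing discussed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any of the six tools used in the study: the data layout, the function names and the toy coordinates are hypothetical, a SNP is kept when its p-value is at or below the cut-off, and a one-sided hypergeometric test stands in for the various statistical models the tools apply. No correction for multiple testing is included.

```python
from math import comb

FLANK_BP = 20_000  # 20 kb up- and downstream, as described in the text


def best_snp_per_gene(snps, genes, flank=FLANK_BP, p_cutoff=0.01):
    """Annotate SNPs to genes and keep the lowest p-value per gene.

    snps:  iterable of (snp_id, chrom, pos, p_value)
    genes: iterable of (gene_id, chrom, start, end)
    Use flank=0 to exclude the flanking sequence.
    """
    best = {}
    for snp_id, chrom, pos, p in snps:
        if p > p_cutoff:  # assumed: SNPs at or below the cut-off are kept
            continue
        for gene_id, g_chrom, start, end in genes:
            if chrom == g_chrom and start - flank <= pos <= end + flank:
                if gene_id not in best or p < best[gene_id][1]:
                    best[gene_id] = (snp_id, p)
    return best


def hypergeom_enrichment_p(n_background, n_category, n_list, n_hits):
    """One-sided P(X >= n_hits) for n_list genes drawn from n_background,
    of which n_category belong to the category."""
    upper = min(n_category, n_list)
    tail = sum(
        comb(n_category, k) * comb(n_background - n_category, n_list - k)
        for k in range(n_hits, upper + 1)
    )
    return tail / comb(n_background, n_list)


# Toy input with hypothetical identifiers and coordinates.
toy_snps = [
    ("snp_a", "1", 1_000, 0.004),
    ("snp_b", "1", 30_000, 0.002),
    ("snp_c", "1", 120_000, 0.2),
    ("snp_d", "2", 5_000, 0.009),
]
toy_genes = [
    ("GENE_X", "1", 15_000, 25_000),
    ("GENE_Y", "2", 20_000, 40_000),
]

with_flank = best_snp_per_gene(toy_snps, toy_genes)
without_flank = best_snp_per_gene(toy_snps, toy_genes, flank=0)
p_value = hypergeom_enrichment_p(n_background=20, n_category=5, n_list=4, n_hits=2)
```

Because each gene contributes only its best SNP, a large gene with many SNPs enters the gene list once, just like a small gene. The `n_background` argument makes the dependence on the background gene set explicit: changing it changes the p-value, which is one of the reasons different tools can disagree on the same input.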
  34 in total

1.  Survival in breast cancer is familial.

Authors:  Kari Hemminki; Jianguang Ji; Asta Försti; Jan Sundquist; Per Lenner
Journal:  Breast Cancer Res Treat       Date:  2007-08-03       Impact factor: 4.872

2.  The prognostic role of a gene signature from tumorigenic breast-cancer cells.

Authors:  Rui Liu; Xinhao Wang; Grace Y Chen; Piero Dalerba; Austin Gurney; Timothy Hoey; Gavin Sherlock; John Lewicki; Kerby Shedden; Michael F Clarke
Journal:  N Engl J Med       Date:  2007-01-18       Impact factor: 91.245

3.  Consensus genes of the literature to predict breast cancer recurrence.

Authors:  Martin Lauss; Albert Kriegner; Klemens Vierlinger; Ilhami Visne; Ahmet Yildiz; Erkan Dilaveroglu; Christa Noehammer
Journal:  Breast Cancer Res Treat       Date:  2007-09-26       Impact factor: 4.872

4.  Gene ontology analysis of GWA study data sets provides insights into the biology of bipolar disorder.

Authors:  Peter Holmans; Elaine K Green; Jaspreet Singh Pahwa; Manuel A R Ferreira; Shaun M Purcell; Pamela Sklar; Michael J Owen; Michael C O'Donovan; Nick Craddock
Journal:  Am J Hum Genet       Date:  2009-06-18       Impact factor: 11.025

Review 5.  Use and misuse of the gene ontology annotations.

Authors:  Seung Yon Rhee; Valerie Wood; Kara Dolinski; Sorin Draghici
Journal:  Nat Rev Genet       Date:  2008-05-13       Impact factor: 53.242

6.  Using genome-wide pathway analysis to unravel the etiology of complex diseases.

Authors:  Clara C Elbers; Kristel R van Eijk; Lude Franke; Flip Mulder; Yvonne T van der Schouw; Cisca Wijmenga; N Charlotte Onland-Moret
Journal:  Genet Epidemiol       Date:  2009-07       Impact factor: 2.135

Review 7.  FAK expression regulation and therapeutic potential.

Authors:  Shufeng Li; Zi-Chun Hua
Journal:  Adv Cancer Res       Date:  2008       Impact factor: 6.242

8.  Genes that mediate breast cancer metastasis to the brain.

Authors:  Paula D Bos; Xiang H-F Zhang; Cristina Nadal; Weiping Shu; Roger R Gomis; Don X Nguyen; Andy J Minn; Marc J van de Vijver; William L Gerald; John A Foekens; Joan Massagué
Journal:  Nature       Date:  2009-05-06       Impact factor: 49.962

9.  Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis.

Authors:  Christos Sotiriou; Pratyaksha Wirapati; Sherene Loi; Adrian Harris; Steve Fox; Johanna Smeds; Hans Nordgren; Pierre Farmer; Viviane Praz; Benjamin Haibe-Kains; Christine Desmedt; Denis Larsimont; Fatima Cardoso; Hans Peterse; Dimitry Nuyten; Marc Buyse; Marc J Van de Vijver; Jonas Bergh; Martine Piccart; Mauro Delorenzi
Journal:  J Natl Cancer Inst       Date:  2006-02-15       Impact factor: 13.506

10.  Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists.

Authors:  Da Wei Huang; Brad T Sherman; Richard A Lempicki
Journal:  Nucleic Acids Res       Date:  2008-11-25       Impact factor: 16.971

