Gene-set meta-analysis of lung cancer identifies pathway related to systemic lupus erythematosus.

Albert Rosenberger1, Melanie Sohns1, Stefanie Friedrichs1, Rayjean J Hung2,3, Gord Fehringer2, John McLaughlin4, Christopher I Amos5, Paul Brennan6, Angela Risch7, Irene Brüske8, Neil E Caporaso9, Maria Teresa Landi9, David C Christiani10, Yongyue Wei10, Heike Bickeböller1.   

Abstract

INTRODUCTION: Gene-set analysis (GSA) is an approach using the results of single-marker genome-wide association studies when investigating pathways as a whole with respect to the genetic basis of a disease.
METHODS: We performed a meta-analysis of seven GSAs for lung cancer, applying the method META-GSA. Overall, the information taken from 11,365 cases and 22,505 controls from within the TRICL/ILCCO consortia was used to investigate a total of 234 pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database.
RESULTS: META-GSA reveals the systemic lupus erythematosus KEGG pathway hsa05322, driven by the gene region 6p21-22, as also implicated in lung cancer (p = 0.0306). This gene region is known to be associated with squamous cell lung carcinoma. The most important genes driving the significance of this pathway belong to the genomic areas HIST1-H4L, -1BN, -2BN, -H2AK, -H4K and C2/C4A/C4B. Within these areas, the markers most significantly associated with LC are rs13194781 (located within HIST1H2BN) and rs1270942 (located between C2 and C4A).
CONCLUSIONS: We have discovered a pathway currently marked as specific to systemic lupus erythematosus as being significantly implicated in lung cancer. The gene region 6p21-22 in this pathway appears to be more extensively associated with lung cancer than previously assumed. Given wide-stretched linkage disequilibrium to the area APOM/BAG6/MSH5, there is currently simply not enough information or evidence to conclude whether the potential pleiotropy of lung cancer and systemic lupus erythematosus is spurious, biological, or mediated. Further research into this pathway and gene region will be necessary.

Year:  2017        PMID: 28273134      PMCID: PMC5342225          DOI: 10.1371/journal.pone.0173339

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Since the beginning of the 20th century, lung cancer (LC) occurrence has been increasing rapidly, and LC has become the most common cancer in males. It is the main cause of cancer-related death worldwide [1], and tobacco smoke is its major risk factor. The risk of developing LC is 7.6 to 9.3 times higher in current smokers than in never smokers [2]. However, around every fourth LC case is not attributable to smoking [3]. A five-fold increased risk of developing early-onset LC in the presence of a family history of early-onset LC in any first-degree relative has also been observed [4, 5]. This and other evidence has led to the general acceptance that a genetic component in early-onset LC development exists. However, an increased risk of developing LC has also been observed in patients with other diseases, such as COPD, pneumonia, tuberculosis, or the autoimmune disorder systemic lupus erythematosus (SLE) [6, 7]. In patients with SLE, an increased relative risk (RR) of developing LC of 1.68 (95% CI: 1.33-2.13) was observed [6]. In spite of multiform clinical manifestations and outcomes, it is generally accepted that genetics plays a role in SLE [8]. In light of the results of this investigation, we will discuss a shared genetic susceptibility as a possible connection between SLE and LC.
Genome-wide association studies (GWASs) have revealed that genomic variations at, e.g., 5p15.33, 6p21-22 and 15q25 influence LC risk in European populations [9-16]. Further weakly associated single markers in at least 12 genes have been found given their known role within certain molecular mechanisms [17-21]. Since associated genes are elements of respective pathways, one may assume that nicotine dependency [14], inflammation [16, 22], or DNA repair [23], among others, play a role in an individual’s susceptibility to developing LC.
The usual approach to identifying such molecular mechanisms with GWAS is first to investigate single-marker associations, then to allocate these markers to genes, and finally the genes to pathways. In doing so, the marginal effect of a single marker and/or the sample size needs to be large, because a low genome-wide level of significance of 1 × 10−7 or smaller is needed owing to multiple testing. Gene-set analysis (GSA) strategies were proposed as complementary approaches in the investigation of the genetic basis of a disease using GWAS results [24-26], by seeking to identify sets of genes (GS) with sufficient enrichment of marker-specific significance for an association with a phenotype. GSA approaches provide no effect estimates of the association, but only p-values (pGS). To pool the p-values of several GSAs, it is important to take into account the concordance across studies of all single-marker-association point estimates related to every gene in a considered gene set [27]. Moreover, one only needs to correct for multiple testing using the lower number of GSs being investigated instead of the larger number of genotyped markers. Once a GS has been found to be significantly associated, a search may be conducted for the genes that drive its significance and for the hosted markers which are concordant across studies based on their observed associations.
Here we aimed to identify pathways taken from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database [28] as being associated with LC. KEGG provides a collection of manually drawn pathway maps representing up-to-date knowledge on the molecular interaction and reaction networks. This includes pathways for metabolism (e.g. nicotinate and nicotinamide metabolism), for genetic information processing (e.g. DNA repair), for environmental information processing (e.g. Wnt signaling), for cellular processes (e.g. cell cycle), for organismal systems (e.g. circadian rhythm) and, last but not least, for human diseases (e.g. LC or SLE) [29]. We refrained from restricting the KEGG collection, because pathways that are potentially involved in the etiology of LC (examples are given above in brackets) are contained in every above-mentioned category. Our subsequent goal was to determine the driving genes in the pathways identified in the first step. To this end, we combined the results of seven LC GWASs from the Transdisciplinary Research in Cancer of the Lung / International Lung Cancer Consortium (TRICL / ILCCO) in a meta-analysis.
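The multiple-testing argument above can be illustrated with a small calculation. The sketch below compares a plain Bonferroni threshold at the marker level with one at the gene-set level. The family-wise level of 0.05 and the count of 500,000 markers are assumptions chosen for illustration (the marker count is of the order of the SNP numbers in Table 1); only the number of 234 gene sets is taken from the text.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a plain Bonferroni correction."""
    return alpha / n_tests

ALPHA = 0.05            # assumed family-wise error level
N_MARKERS = 500_000     # assumed order of magnitude of genotyped markers
N_GENE_SETS = 234       # number of KEGG pathways investigated (from the text)

marker_level = bonferroni_threshold(ALPHA, N_MARKERS)
gene_set_level = bonferroni_threshold(ALPHA, N_GENE_SETS)

print(f"marker-level threshold:   {marker_level:.1e}")
print(f"gene-set-level threshold: {gene_set_level:.1e}")
print(f"ratio: {gene_set_level / marker_level:.0f}")
```

Under these assumptions the marker-level threshold is of the order of the 1 × 10−7 quoted above, whereas the gene-set-level threshold is more than three orders of magnitude less stringent.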

Materials and methods

Description of studies

The meta-analysis was based on summary data from seven previously reported LC GWASs from TRICL / ILCCO (Fig 1). We included 11,365 LC cases and 22,505 controls of European descent in the analysis. An overview as well as study name abbreviations are given in Table 1. Details and references are provided in Supplement S1 File.
Fig 1

Study selection flow chart.

Table 1

Characteristics of lung cancer GWASs of the International Lung Cancer Consortium (ILCCO).

| Study | Cases | Controls | Location | Study design | Illumina genotyping platform | Number of SNPs |
|---|---|---|---|---|---|---|
| Scanning phase | | | | | | |
| MDACC (a) | 1 150 | 1 134 | Texas, USA | Hospital-based case–control | 317K | 312 829 |
| TORONTO (b) | 331 | 499 | Toronto, CA | Hospital-based case–control | 317K | 314 285 |
| CE (IARC) (c) | 1 854 | 2 453 | Romania, Hungary, Slovakia, Poland, Russia, Czech Republic | Multicenter hospital-based case–control | 317K, 370Duo | |
| GLC (d) | 487 | 480 | Germany | Population-based case–control (<50 years) | HumanHap550K | 503 381 |
| Replication phase | | | | | | |
| DeCODE Genetics | 830 | 11 228 | Iceland | Population-based case–control | 317K, 370Duo | 290 386 |
| HARVARD | 984 | 970 | Massachusetts, USA | Hospital-based case–control | 610Quad | 543 697 |
| NCI GWAS | | | | | | 506 062 |
| NCI GWAS: EAGLE (e) | 1 920 | 1 979 | Italy | Population-based case–control | HumanHap550v3_B, 610Quad | |
| NCI GWAS: ATBC (f) | 1 732 | 1 271 | Finland | Cohort | HumanHap550K, HumanHap610 | |
| NCI GWAS: PLCO (g) | 1 380 | 1 817 | 10 US Centers | Cohort-Cancer Prevention Trial | 317K / 240S, HumanHap550v3_B, HumanHap610 | |
| NCI GWAS: CPS-II (h) | 697 | 674 | all US states | Cohort | HumanHap550K, 610Quad | |
| Overall | 11 365 | 22 505 | | | | |

a MD Anderson Cancer Center.

b Toronto study by Lunenfeld-Tanenbaum Research Institute.

c Central Europe Study of the International Agency for Research on Cancer.

d German Lung Cancer Study.

e Environment And Genetics in Lung cancer Etiology study.

f Alpha-Tocopherol, Beta-Carotene Cancer Prevention study.

g Prostate, Lung, Colon, Ovary screening trial.

h Cancer Prevention Study II nutrition cohort.

Strategy and methods

In the original GWASs, a log-additive mode of inheritance was fitted for each marker, adjusting for age, sex, smoking status, study center (if applicable), and the first three principal components to account for hidden genomic structure. The results of marker-by-marker association testing were used as input information for the GSAs. For this meta-analysis, we set up a two-phase seamless design consisting of a screening phase and a replication phase. In the screening phase, the results of MDACC, TORONTO, GLC, and CE were combined, because GSA of these studies had previously been performed for 234 KEGG pathways [30, 31]. In the replication phase, the results of the remaining studies NCI, deCODE, and HARVARD were combined to investigate only those pathways whose findings in the screening phase proved promising. If necessary, GSA was performed using the program ALIGATOR [32].
The method META-GSA [27] was used to pool GSA results (p-values pGS) at each stage. The aim of META-GSA is to increase statistical evidence by pooling the p-values pGS of GSAs, while also taking into account the concordance of the signs of single-marker-association point estimates and the related p-values of all markers assigned to genes contained in the GS [27]. The core element of this approach is a directed p-value (PDR), combining significance and direction of single markers and LD to other markers. Necessary estimates of LD were based on the genotype data of GLC, with imputation of missing markers based on the 1000-Genome Project [33], the 1000-Genome Pilot 1-Panel or the HapMap3-Panel as available using the SNAP online tool [34]. The SNP-to-gene annotation (StG) for humans of the ENSEMBL database [35] was used. Markers with LD of r2 ≥ 0.8 to any marker inside a gene were additionally assigned to that gene [36]. All genes were then annotated to 234 gene sets from the KEGG database (gene-to-pathway annotation (GtP)).
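The LD-based extension of the SNP-to-gene annotation described above can be sketched as follows. This is a minimal illustration, not the annotation pipeline actually used: the marker and gene names, the data layout and the function name `assign_markers_to_genes` are hypothetical, and only the threshold r2 ≥ 0.8 comes from the text.

```python
from collections import defaultdict

def assign_markers_to_genes(inside, ld_r2, threshold=0.8):
    """Assign each marker to the genes it lies in, plus the genes of any
    marker it is in LD with at r2 >= threshold.

    inside: dict marker -> set of genes physically containing the marker
    ld_r2:  dict (marker_a, marker_b) -> r2, each unordered pair stored once
    """
    assignment = defaultdict(set)
    # Step 1: physical position (marker located inside a gene).
    for marker, genes in inside.items():
        assignment[marker] |= genes
    # Step 2: LD extension, applied in both directions of each pair.
    for (a, b), r2 in ld_r2.items():
        if r2 >= threshold:
            assignment[a] |= inside.get(b, set())
            assignment[b] |= inside.get(a, set())
    # Drop markers that ended up without any gene.
    return {m: genes for m, genes in assignment.items() if genes}

# Hypothetical toy input: m2 lies in no gene but is in strong LD with m1.
inside = {"m1": {"GENE_A"}, "m2": set(), "m3": {"GENE_B"}}
ld_r2 = {("m1", "m2"): 0.92, ("m2", "m3"): 0.41}

stg = assign_markers_to_genes(inside, ld_r2)
print(stg)
```

In this toy input, m2 inherits GENE_A through its LD with m1, while the weak LD between m2 and m3 (below the threshold) adds nothing.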
Both phases can be considered as the first and the second stage of a seamless, adaptive study with interim selection of gene sets (“drop-loser design” [37]). The investigation of every KEGG pathway with a pooled pscr. > β1 = 1/234 ≈ 0.0043 in the screening phase was stopped early for futility. The significance, combining screening and replication phase, was assessed according to the “method based on the sum of p-values” (MSP) [37, 38]. The p-value was then calculated by the equation pGS = β1 ⋅ (pscr. + prep.) − β1²/2 (valid for pscr. + prep. > β1 and without early stopping for efficacy). This pGS needs to be corrected for multiple testing by taking into account the total number of 234 pathways. Due to pathway overlap we estimated the number of independent tests t according to the lowest slope method (LSM) [39] considering all p-values of the screening phase. Applying a Bonferroni-like correction then yields the final p-value pGS,corr. = min(1, t ⋅ pGS). Furthermore, META-GSA was also applied to all seven studies and all pathways surviving the screening phase to take into account the concordance of single-marker-association point estimates across all considered studies at the same time.
The next step was to identify the main genes driving the significance of gene sets (denoted as p-driving genes). Thus we contrasted the mean of PDRs across studies for each gene (PDR¯g as a measure of concordance) with pooled p-values regarding the gene-level statistics (pgene as a measure of significance, calculated according to Fisher’s χ2-method). To judge these findings adequately, we also calculated PDR¯g and pgene for the known LC-related genes CLPTM1L, TERT, CHRNB4, CHRNA3, CHRNA5, MSH5, BAG6, RAD52 and CDKN2B. Within these genes we looked for markers with a large mean of PDRs across studies (PDR¯m). Finally, we performed a sub-group meta-analysis for the one identified KEGG pathway according to histological subtype (AdenoLC, SqCLC, SCLC and LCLC), sex, age (older or younger than 50 years), and smoking behavior (current, former, ever and never smokers). During this investigation the region 6p21-22 became of interest.
The respective correlation of marker genotypes and gene expression (eQTL) was previously measured in non-neoplastic pulmonary parenchymal samples taken some distance from the primary tumor in LC patients [40]. We used the estimated correlation between every SNP located between 31.6 Mb and 32.2 Mb (all within 6p21-22) and the expression of the genes APOM, BAG6, MSH5 (reported as relevant in LC), C2, C4B, SKIV2L, STK19 (closely located to genes driving the significance in this META-GSA application) and TNXB (reported as relevant for SLE), in total 5,572 estimated correlations. Estimating t = 5309 independent tests (by LSM) yields a global threshold for significance of 1 × 10−7.
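The two-phase combination of p-values described in this section can be sketched in code. The function below is one reading of the MSP for a two-stage design with futility stopping at β1 and no early stopping for efficacy; it is an assumption, not the authors' implementation, and was chosen because it reproduces the corrected values reported in Table 2 to within rounding. The function names are hypothetical; β1 = 1/234, t = 171.5 and the input p-values are taken from the text and Table 2.

```python
def msp_gene_set_p(p_scr: float, p_rep: float, beta1: float) -> float:
    """P(p1 <= beta1 and p1 + p2 <= p_scr + p_rep) for independent
    uniform p1, p2, i.e. the stagewise p-value of the sum of p-values
    when the only interim rule is futility stopping at beta1."""
    if p_scr > beta1:
        raise ValueError("stopped for futility after the screening phase")
    t = p_scr + p_rep
    lower = max(0.0, t - 1.0)   # below this p1, any p2 <= 1 satisfies the sum
    upper = min(beta1, t)       # p1 cannot exceed beta1 or the sum itself
    return lower + t * (upper - lower) - 0.5 * (upper ** 2 - lower ** 2)

def bonferroni_like(p: float, t_eff: float) -> float:
    """Correction with the effective number of independent tests."""
    return min(1.0, t_eff * p)

BETA1 = 1 / 234   # futility threshold (from the text)
T_EFF = 171.5     # effective number of independent gene sets (Table 2)

# (p_scr, p_rep) as printed in Table 2, i.e. rounded to four decimals.
table2 = {
    "hsa05322": (0.0003, 0.0857),
    "hsa00790": (0.0003, 0.9122),
    "hsa04940": (0.0011, 0.4890),
}
corrected = {
    pathway: bonferroni_like(msp_gene_set_p(p_scr, p_rep, BETA1), T_EFF)
    for pathway, (p_scr, p_rep) in table2.items()
}
for pathway, p in corrected.items():
    print(pathway, round(p, 4))
```

Because the inputs are the rounded values printed in Table 2, the results agree with the reported corrected p-values only approximately (for hsa05322 the sketch gives about 0.0615).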

Results

Association of pathways: Screening and replication phase

Only three of the 234 pathways investigated revealed a p-value lower than the futility threshold and were selected for the replication phase: hsa05322: systemic lupus erythematosus (SLE), hsa00790: folate biosynthesis and hsa04940: type I diabetes mellitus (Table 2). Only for the SLE pathway did we achieve a low p-value when combining screening and replication phase and correcting for multiple testing (pGS,corr. = 0.0615). Combining all seven studies in a single META-GSA, in order to adequately take the concordance of single-marker-association point estimates of all studies into account, yielded a pGS-value of 0.0306 for this SLE pathway. This indicates sufficient enrichment and satisfactory concordance of marker-specific significance for an association with LC.
Table 2

Significant results of META-GSA.

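| KEGG pathway | n genes | screening (4 studies): pscr. | screening (4 studies): pscr.corr.$ | replication (3 studies): prep. | MSP combination: pGS | MSP combination: pGS,corr.$ | all (7 studies): pGS |
|---|---|---|---|---|---|---|---|
| hsa05322 (SLE) | 128 | 0.0003*** | 0.0457* | 0.0857 | 0.0004*** | 0.0615 | 0.0306* |
| hsa00790 (folate bio.) | 13 | 0.0003*** | 0.0543 | 0.9122 | 0.0046*** | 0.6672 | 0.3154 |
| hsa04940 (T1DM) | 42 | 0.0011*** | 0.1940 | 0.4890 | 0.0024*** | 0.3570 | 0.3952 |
| 231 other gene sets | | >0.0043 | | futility stopping | | | |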

SLE: systemic lupus erythematosus; folate bio.: folate biosynthesis; T1DM: type I diabetes mellitus; MSP: combined p-values according to the method based on the sum of p-values (adaptive design approach with early futility stopping); pscr.: p-value of the screening phase; pscr.corr.: p-value of the screening phase corrected for multiple testing; prep.: p-value of the replication phase; pGS: p-value of the gene set (combining pscr. and prep.); pGS,corr.: p-value of the gene set corrected for multiple testing.

$: corrected with the effective number of independent gene sets according to the lowest slope method (LSM), t = 171.5.

* P ≤ 0.05.

** P ≤ 0.01.

*** P ≤ 0.001.

Genes driving significance

Four genes of the SLE pathway (HIST1-H4L, -1BN, -H2AK, -H4K) and their close neighbor HIST1H2BN stand out by concordance of marker-specific association (PDR¯g) across studies and a gene-level p-value lower than 0.01 (Table 3). All five genes belong to the histone cluster 1 and are located within 41 kb of each other on 6p22.1. Weaker concordance was observed for two further, less significant genes (p-value < 0.05): C4A (PDR¯g = -0.41) and C2 (PDR¯g = 0.33).
Table 3

Significance and concordance of selected genes of interest.

| gene | location | number of studies with pgene,study < 5% | concordance PDR¯g | significance pgene |
|---|---|---|---|---|
| significant genes belonging to the significant gene set hsa05322 (SLE) | | | | |
| HIST1H4K | 6p22.1 | 2 | -0.84 | 0.0056 |
| HIST1H2BN | 6p22.1 | 2 | -0.80 | 0.0091 |
| HIST1H2AK | 6p22.1 | 2 | -0.80 | 0.0091 |
| HIST1H1B | 6p22.1 | 2 | +0.75 | 0.0093 |
| HIST1H2AL | 6p22.1 | 2 | +0.75 | 0.0093 |
| C2 | 6p21.3 | 2 | +0.33 | 0.0109 |
| C4A | 6p21.3 | 1 | -0.41 | 0.0319 |
| genes known to be associated with LC (for comparison only) | | | | |
| CLPTM1L | 5p15.33 | 4 | -0.53 | < .0001 |
| TERT | 5p15.33 | 4 | +0.49 | 0.0013 |
| CHRNB4 | 15q24 | 3 | -0.63 | < .0001 |
| CHRNA3 | 15q24 | 4 | -0.58 | < .0001 |
| CHRNA5 | 15q24 | 3 | -0.45 | 0.0009 |
| MSH5 | 6p21.3 | 3 | +0.67 | < .0001 |
| BAG6 | 6p21.3 | -- | +0.39 | 0.1425 |
| RAD52 | 12p13.33 | 1 | +0.23 | 0.3143 |
| CDKN2B | 9p21.3 | -- | -0.13 | 0.6729 |

pgene,study is the study-specific p-value for a gene; PDR¯g is the mean of study-specific PDRs for a gene (95% random interval derived from all 16,000 assigned genes: [±0.306]); pgene: pooled p-values combined by Fisher’s inverse χ2-method.
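Fisher’s inverse χ2-method, named above for pooling study-specific gene-level p-values, can be written in a few lines. The sketch below assumes independent p-values and uses the closed-form survival function of a χ2 distribution with an even number of degrees of freedom, so it needs only the standard library. The function name and the input values are hypothetical and are not taken from the study.

```python
import math

def fisher_combined_p(p_values):
    """Pool independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis. For even degrees
    of freedom the upper tail is exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)   # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Hypothetical study-specific p-values for one gene.
pooled = fisher_combined_p([0.04, 0.20, 0.03, 0.50])
print(f"pooled p-value: {pooled:.4f}")
```

A single p-value is returned unchanged, which is a convenient sanity check for the implementation.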

Markers driving significance

The markers rs13194781, rs1270942 and rs389884 are those with the largest PDR¯m values (all >0.7) and the strongest associations with LC (in terms of OR). For rs13194781, which is located within HIST1H2BN (ENSEMBL definition), an OR of 1.23 (p = 0.0032) was estimated. The markers rs1270942 and rs389884 are perfect proxies for each other according to the 1000-Genome Pilot 1-panel [33]. They are closely located upstream of C2 and downstream of C4A, respectively. There is no LD with the first marker rs13194781 (Table 4).
Table 4

Markers with <0.5 in genes of interest on 6p21-22.

SNPallocated toPositionMAFr2 toD‘ toPDR¯mLCSqCLC
(A)(B)(A)(B)ORp-valueORp-value
rs200991HIST1++278477160.12a0.646b1a0.5981.140.00211.163.1×10−5
rs13194781 (A)HIST1++278478610,08110.7191.230.00321.229.7×10−6
rs9262143MDC1306850040.16§0.7691.250.00271.251.3×10−7
rs3094127MDC1307296700.180.6640.840.00291.104.0×10−2
rs3128982HCP5314494140.300.5781.070.00321.121.1×10−3
rs3117582BAG6316527430.09b0.881b1b0.4851.270.00491.304.5×10−10
rs3131379MSH5317532560.09b0.881b1b0.4611.200.00741.283.8×10−7
rs652888C2318834570.170.336b1b0.5381.140.00131.181.3×10−4
rs535586C2318925600.350.131b1b0.6061.090.00011.111.2×10−3
rs659445C2318965270.350.131b1b0.7111.093.7×10−61.103.1×10−3
rs1270942 (B)C2319510830.09b11b0.7281.270.00901.295.8×10−6
rs438999C2319605290.060.005b1b-.5170.910.00270.851.0×10−2
rs454212C4A319665950.08-.5560.950.00340.841.7×10−2
rs389884C4A319731200.09b1$1b0.7241.270.00801.287.2×10−6

Odds ratios (OR), corresponding p-values from a random effects meta-analysis model; single study ORs were adjusted for age, sex, smoking and genetic background; r2 and D’ were calculated according to the HapMap3-panel.

(a) or the 1000 Genome Pilot 1-panel.

(b) using SNAP Version 2.2; HIST1++ denotes the gene cluster HIST1-H4L/H2BN/H2AK/H2BN/H4K; LC: lung cancer (all histological subtypes); SqCLC: squamous-cell lung cancer; markers with the largest PDR¯m within genes driving the significance of the SLE gene set (HIST1++, C2 and C4A) are printed in bold. Positions of SNPs are given according to NCBI Build 37. MAF: minor allele frequency in controls.
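The table note states that the pooled ORs come from a random effects meta-analysis model but does not name the estimator. As an illustration only, the sketch below uses the DerSimonian-Laird estimator of the between-study variance, which is an assumption; the function name and the study-level inputs are hypothetical and are not taken from the study.

```python
import math

def dersimonian_laird(log_ors, ses):
    """Random-effects pooling of study-level log odds ratios.

    Returns (pooled OR, two-sided p-value, between-study variance tau^2).
    """
    k = len(log_ors)
    w = [1.0 / se ** 2 for se in ses]                 # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]   # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_star, log_ors)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    z = pooled / se_pooled
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # normal approximation
    return math.exp(pooled), p_two_sided, tau2

# Hypothetical per-study estimates for one marker.
log_ors = [math.log(1.30), math.log(1.15), math.log(1.25), math.log(1.05)]
ses = [0.10, 0.08, 0.12, 0.09]

pooled_or, p_value, tau2 = dersimonian_laird(log_ors, ses)
print(f"OR = {pooled_or:.2f}, p = {p_value:.4f}, tau^2 = {tau2:.4f}")
```

When the study estimates are homogeneous, τ² is estimated as zero and the result coincides with the fixed-effect (inverse-variance) estimate.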

Subgroup meta-analysis

We found more evidence for an association of the SLE pathway with AdenoLC (p = 0.0030) than for any other histotype. We also found the association to be significant in women (p = 0.0112) but not in men (p = 0.1453), and in older cases (p = 0.0002) but not in younger cases (p = 0.0588). No significant association was observed when stratifying according to smoking behavior (Table 5). Significance within the considered subgroups is driven by the same p-driving genes of the region 6p22.1–22.2 as in the total sample (C2 and the genes of the histone 1 cluster). Also, most of the more moderately concordant genes that drive significance of hsa05322 in at least one of the considered subgroups are histone-coding genes.
Table 5

Subgroup analysis for hsa05322: histological subtypes, sex, age, smoking.

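| hsa05322: SLE | META-GSA pGS | Gene | Location | concordance PDR¯g | significance pgene |
|---|---|---|---|---|---|
| AdenoLC | 0.0030 | HIST2-1q21.2 | 1q21.2 | -0.6 | 0.1666 |
| SqCLC | 0.0376 | H2AFV | 7p13 | 0.5 | 0.7209 |
| SCLC | 0.0626 | C1QA | 1p36.12 | -0.5 | 0.7101 |
| | | HIST2-1q21.2 | 1q21.2 | -0.5 | 0.0577 |
| | | ELANE | 19p13.3 | 0.5 | 0.4864 |
| | | HIST1-6p22.2 | 6p22.2 | 0.5 | 0.2177 |
| | | HIST1-6p22.2 | 6p22.2 | 0.5 | 0.2177 |
| LCLC | 0.2056 | - | - | | |
| male | 0.1453 | HIST1H3C | 6p22.2 | -0.5 | 0.3726 |
| female | 0.0112 | HIST1H2AL | 6p22.1 | 0.5 | 0.1229 |
| old (>50) | 0.0002 | HIST1-6p22.1a | 6p22.1 | -0.7 | 0.0054 |
| | | HIST1-6p22.1b | 6p22.1 | 0.5 | 0.1578 |
| | | C2 | 6p21.3 | 0.5 | 0.0013 |
| | | H2AFV | 7p1 | 0.6 | 0.4005 |
| young (≤50) | 0.0588 | - | - | | |
| current smokers | 0.3563 | HIST1-6p22.1a | 6p22.1 | 0.4 | 0.1720 |
| | | H3F3C | 12p11.21 | 0.4 | 0.5821 |
| | | HIST3H3 | 1q42 | 0.4 | 0.6468 |
| | | HIST1-6p22.1b | 6p22.1 | 0.4 | 0.2028 |
| | | HIST1-6p22.1c | 6p22.1 | 0.4 | 0.3375 |
| former smokers | 0.4691 | - | - | | |
| ever smokers | 0.5132 | HIST1-6p22.1a | 6p22.1 | 0.5 | 0.0462 |
| | | HIST1-6p22.1c | 6p22.1 | 0.5 | 0.1587 |
| never smokers | 0.5429 | FCGR3A | 1q23 | -0.5 | 0.2300 |
| | | CTSG | 14q11.2 | -0.5 | 0.3403 |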

Listed are genes, respectively regions containing genes with .

HIST2-1q21.2: HIST2H2AA3 / HIST2H2AA4 / HIST2H3C / HIST2H4B.

HIST3-1q42: HIST3H2A / HIST3H2BB / HIST3H3.

HIST1-6p22.1a: HIST1H4K / HIST1H2AK / HIST1H2AL / HIST1H2BM / HIST1H2BN / HIST1H3I / HIST1H4L / HIST1H3J / HIST1H4J (27.800K).

HIST1-6p22.1b: HIST1H2AG / HIST1H2BK (27.150 K).

HIST1-6p22.1c: HIST1H2BI / HIST1H3G / HIST1H4H (26.280 K).

HIST1-6p22.2: HIST1H3E / HIST1H2AE / HIST1H2BG / HIST1H4E (26.200 K).

The numbers in brackets are the approximate locations according to dbGENE.

SNP ⨯ eQTL correlation

Both aforementioned SNPs belonging to C2/C4A, rs1270942 and rs389884, are significantly correlated with the expression of the gene APOM (p<10−13), which is located about 500 kb away (Fig 2). However, the expression pattern in this region is puzzling, since other markers within C2 (rs537160, rs622871, rs630379) are also correlated with the gene expression in non-neoplastic samples of LC patients of the neighboring gene C4B (not part of the investigated KEGG pathway, although related to SLE). It is also remarkable that the correlation of SNPs belonging to C2/C4A with the expression of C2 is less significant (p ~10−3) than with the expression of SKIV2L (p ~10−5), which is not related to SLE.
Fig 2

Association and correlation with gene expression in the chromosome 6p21-22 region.

LC—lung cancer, SLE—systemic lupus erythematosus; correlation to gene expression: pooled p-values as reported by Nguyen et al., 2014 [40]; association with LC: pooled p-values as reported by Timofeeva et al. 2012 [13].

Discussion

We could demonstrate an accumulation of genomic association with LC in the KEGG pathway hsa05322, which comprises genes related to SLE. This suggests some cross-phenotype (CP) association between LC and SLE. The significance was higher in the subgroup of AdenoLC patients than within other histological subtypes and in women compared to men. This fits our expectations, given that women, who predominantly develop AdenoLC, are more often affected by SLE than men [41], who predominantly develop smoking-related SqCLC [1, 42]. All p-driving genes identified in this meta-analysis are located within or next to the major histocompatibility complex (MHC) on chromosome 6p21-22 (Fig 2), albeit in two separate areas, about 3000 kb apart. The first area comprises the genes of histone cluster 1: HIST1-H4L, -1BN, -2BN, -H2AK, -H4K (the most strongly associated marker is rs13194781; OR = 1.23, p = 0.0032). It is well known that a variety of histone-related modifications are related to cancer, to SLE, or to both [8, 43]. They play a role, e.g., in DNA repair, cell cycle or gene expression [8, 44], which by themselves are associated with LC or SLE, respectively [23, 45]. Interestingly, we detected associations with LC of the DNA signature of histone-coding genes, rather than with respect to some kind of epigenetic outcome. The second area comprises the genes C2, C4A, and C4B (the most strongly associated markers are rs1270942 and rs389884; OR = 1.27, p = 0.009). It is well established that reduced gene expression of C2 and C4A can predispose to SLE [46]. These two genes, and perhaps also C4B, are involved in the clearance of apoptotic bodies [8]. This is in turn crucially important for controlling inflammation, which plays a role in the development of LC [3]. However, the identification of disease-relevant genes in the MHC region (6p21–6p22) and far beyond is complicated owing to the strong and extensive LD across both common and rare haplotypes [47].
Hence any observed CP association will probably tag plenty of genes. An association of the gene area APOM/BAG6/MSH5 in the MHC region with LC has previously been reported, which is strongest for SqCLC and AdenoLC [9, 13]. The strongest association with SqCLC in this area was previously reported for the marker rs3117582 (located within BAG6 and APOM; OR = 1.3, p = 4.5×10−10), which was also found to be associated with SLE (OR = 2.2, p = 4.2×10−21) [48]. This marker is about 220 kb away from, but in strong LD with, the newly identified markers rs1270942 and rs389884 (located close to C2; Table 4 and Fig 2). More importantly, a highly significant correlation between markers of the area C2/C4A/C4B and the expression of the gene APOM in non-neoplastic samples taken from LC patients was also recently reported [40] (Fig 2). APOM is involved in lipid transport and is linked with high-density lipoprotein cholesterol in the pathogenesis of emphysema, which in turn is considered to be associated with LC [49, 50]. But other explanations of the observed associations have been given, too; for instance, a connection to embryonic lethality with defects in the development of the lung (related to the function of BAG6) or deficits in mismatch excision repair (related to the function of MSH5) [13]. Moreover, the association of MSH5 with SLE was reported as not shared with other autoimmune/inflammatory diseases [51].
Apart from all this, some remarks about the applied method need to be made. The whole approach is an intensive investigation of p-values, which, in the context of this project, are indicators of evidence for or against the rejection of a null hypothesis of no genetic association. We used the program ALIGATOR to perform GSA, which circumvents bias due to uneven counts of markers per gene as well as genes per gene set [32]. Choosing another algorithm would probably lead to different results [31].
In addition, a p-value can be used to justify the existence of an association; however, it is not solely determined by the strength of the observed effect, but also by factors such as sample size, the statistical model used and the test procedure applied. Hence we can present the significance of our findings but are unable to estimate the part of LC risk that can be attributed to the identified genes or gene sets.
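The dependence of a p-value on sample size, at a fixed strength of effect, can be made concrete with a simple Wald test on a 2×2 table. This is a deliberately simplified sketch (an unadjusted test on made-up counts), not the adjusted log-additive models used in the original GWASs; the function name and all counts are hypothetical.

```python
import math

def odds_ratio_test(a, b, c, d):
    """Wald test for the odds ratio of a 2x2 table.

    a: exposed cases, b: unexposed cases,
    c: exposed controls, d: unexposed controls.
    Returns (odds ratio, two-sided p-value under a normal approximation).
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    return math.exp(log_or), math.erfc(abs(z) / math.sqrt(2.0))

# Same odds ratio, ten-fold difference in sample size (hypothetical counts).
small = odds_ratio_test(120, 880, 100, 900)
large = odds_ratio_test(1200, 8800, 1000, 9000)

print(f"small study: OR = {small[0]:.3f}, p = {small[1]:.3g}")
print(f"large study: OR = {large[0]:.3f}, p = {large[1]:.3g}")
```

Both tables give the same OR, yet only the larger one yields a small p-value, which is why significance alone does not quantify the attributable share of risk.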

Conclusion

We were able to identify CP risk factors by first pooling results of gene set analyses and looking afterwards for those genes driving the significance of discovered gene sets. In doing so, we have discovered a pathway that is currently marked as specific to SLE as being significantly implicated in LC. The gene region 6p21-22 in this pathway appears to be more extensively associated with lung cancer than previously assumed. Given wide-stretched linkage disequilibrium to the area APOM/BAG6/MSH5, there is currently simply not enough information or evidence to conclude whether the potential pleiotropy of LC and SLE is spurious, biological, or mediated. Further research into this pathway and gene region will be necessary.

Detailed study description. (DOCX)

PRISMA Checklist. (DOCX)

Meta-analysis on Genetic Association Studies Checklist. (DOCX)
  48 in total

1.  Association of the 15q25 and 5p15 lung cancer susceptibility regions with gene expression in lung tumor tissue.

Authors:  Gord Fehringer; Geoffrey Liu; Melania Pintilie; Jenna Sykes; Dangxiao Cheng; Ni Liu; Zhuo Chen; Lesley Seymour; Sandy D Der; Frances A Shepherd; Ming-Sound Tsao; Rayjean J Hung
Journal:  Cancer Epidemiol Biomarkers Prev       Date:  2012-04-26       Impact factor: 4.254

2.  SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap.

Authors:  Andrew D Johnson; Robert E Handsaker; Sara L Pulit; Marcia M Nizzari; Christopher J O'Donnell; Paul I W de Bakker
Journal:  Bioinformatics       Date:  2008-10-30       Impact factor: 6.937

3.  Gene ontology analysis of GWA study data sets provides insights into the biology of bipolar disorder.

Authors:  Peter Holmans; Elaine K Green; Jaspreet Singh Pahwa; Manuel A R Ferreira; Shaun M Purcell; Pamela Sklar; Michael J Owen; Michael C O'Donovan; Nick Craddock
Journal:  Am J Hum Genet       Date:  2009-06-18       Impact factor: 11.025

Review 4.  XRCC3 and XPD/ERCC2 single nucleotide polymorphisms and the risk of cancer: a HuGE review.

Authors:  Maurizio Manuguerra; Federica Saletta; Margaret R Karagas; Marianne Berwick; Fabrizio Veglia; Paolo Vineis; Giuseppe Matullo
Journal:  Am J Epidemiol       Date:  2006-05-17       Impact factor: 4.897

5.  Hierarchical modeling identifies novel lung cancer susceptibility variants in inflammation pathways among 10,140 cases and 11,012 controls.

Authors:  Darren R Brenner; Paul Brennan; Paolo Boffetta; Christopher I Amos; Margaret R Spitz; Chu Chen; Gary Goodman; Joachim Heinrich; Heike Bickeböller; Albert Rosenberger; Angela Risch; Thomas Muley; John R McLaughlin; Simone Benhamou; Christine Bouchardy; Juan Pablo Lewinger; John S Witte; Gary Chen; Shelley Bull; Rayjean J Hung
Journal:  Hum Genet       Date:  2013-02-01       Impact factor: 4.132

6.  Previous lung diseases and lung cancer risk: a pooled analysis from the International Lung Cancer Consortium.

Authors:  Darren R Brenner; Paolo Boffetta; Eric J Duell; Heike Bickeböller; Albert Rosenberger; Valerie McCormack; Joshua E Muscat; Ping Yang; H-Erich Wichmann; Irene Brueske-Hohlfeld; Ann G Schwartz; Michele L Cote; Anne Tjønneland; Søren Friis; Loic Le Marchand; Zuo-Feng Zhang; Hal Morgenstern; Neonila Szeszenia-Dabrowska; Jolanta Lissowska; David Zaridze; Peter Rudnai; Eleonora Fabianova; Lenka Foretova; Vladimir Janout; Vladimir Bencko; Miriam Schejbalova; Paul Brennan; Ioan N Mates; Philip Lazarus; John K Field; Olaide Raji; John R McLaughlin; Geoffrey Liu; John Wiencke; Monica Neri; Donatella Ugolini; Angeline S Andrew; Qing Lan; Wei Hu; Irene Orlow; Bernard J Park; Rayjean J Hung
Journal:  Am J Epidemiol       Date:  2012-09-17       Impact factor: 4.897

7.  Lung cancer risk in never-smokers: a population-based case-control study of epidemiologic risk factors.

Authors:  Darren R Brenner; Rayjean J Hung; Ming-Sound Tsao; Frances A Shepherd; Michael R Johnston; Steven Narod; Warren Rubenstein; John R McLaughlin
Journal:  BMC Cancer       Date:  2010-06-14       Impact factor: 4.430

8.  Transancestral mapping of the MHC region in systemic lupus erythematosus identifies new independent and interacting loci at MSH5, HLA-DPB1 and HLA-G.

Authors:  Michelle M A Fernando; Jan Freudenberg; Annette Lee; David Lester Morris; Lora Boteva; Benjamin Rhodes; María Francisca Gonzalez-Escribano; Miguel Angel Lopez-Nevot; Sandra V Navarra; Peter K Gregersen; Javier Martin; Timothy J Vyse
Journal:  Ann Rheum Dis       Date:  2012-01-10       Impact factor: 19.103

9.  An integrated map of genetic variation from 1,092 human genomes.

Authors:  Goncalo R Abecasis; Adam Auton; Lisa D Brooks; Mark A DePristo; Richard M Durbin; Robert E Handsaker; Hyun Min Kang; Gabor T Marth; Gil A McVean
Journal:  Nature       Date:  2012-11-01       Impact factor: 49.962

10.  META-GSA: Combining Findings from Gene-Set Analyses across Several Genome-Wide Association Studies.

Authors:  Albert Rosenberger; Stefanie Friedrichs; Christopher I Amos; Paul Brennan; Gordon Fehringer; Joachim Heinrich; Rayjean J Hung; Thomas Muley; Martina Müller-Nurasyid; Angela Risch; Heike Bickeböller
Journal:  PLoS One       Date:  2015-10-26       Impact factor: 3.240

  7 in total

Review 1.  Lung cancer mimicking systemic lupus erythematosus: case-based review.

Authors:  Jia Liu; Song Hu; Min Niu; Hua Wang; Yan Wang; Ning Tang; Bin Liu
Journal:  Rheumatol Int       Date:  2019-10-14       Impact factor: 2.631

Review 2.  A review on SLE and malignancy.

Authors:  May Y Choi; Kelsey Flood; Sasha Bernatsky; Rosalind Ramsey-Goldman; Ann E Clarke
Journal:  Best Pract Res Clin Rheumatol       Date:  2017-11-10       Impact factor: 4.098

Review 3.  The roles of cytosolic quality control proteins, SGTA and the BAG6 complex, in disease.

Authors:  Rashi Benarroch; Jennifer M Austin; Fahmeda Ahmed; Rivka L Isaacson
Journal:  Adv Protein Chem Struct Biol       Date:  2018-12-18       Impact factor: 3.507

4.  mRNA Network: Solution for Tracking Chemotherapy Insensitivity in Small-Cell Lung Cancer.

Authors:  Peixin Chen; Shengyu Wu; Jia Yu; Xuzhen Tang; Chunlei Dai; Hui Qi; Junjie Zhu; Wei Li; Bin Chen; Jun Zhu; Hao Wang; Sha Zhao; Hongcheng Liu; Peng Kuang; Yayi He
Journal:  J Healthc Eng       Date:  2021-09-28       Impact factor: 2.682

Review 5.  What advances may the future bring to the diagnosis, treatment, and care of male sexual and reproductive health?

Authors:  Christopher L R Barratt; Christina Wang; Elisabetta Baldi; Igor Toskin; James Kiarie; Dolores J Lamb
Journal:  Fertil Steril       Date:  2022-02       Impact factor: 7.490

6.  Association between SNAP25 and human glioblastoma multiform: a comprehensive bioinformatic analysis.

Authors:  Cheng Yu; Jianxing Yin; Xiefeng Wang; Lijiu Chen; Yutian Wei; Chenfei Lu; Yongping You
Journal:  Biosci Rep       Date:  2020-06-26       Impact factor: 3.840

7.  Gene-gene interaction of AhR with and within the Wnt cascade affects susceptibility to lung cancer.

Authors:  Albert Rosenberger; Nils Muttray; Rayjean J Hung; David C Christiani; Neil E Caporaso; Geoffrey Liu; Stig E Bojesen; Loic Le Marchand; Demetrios Albanes; Melinda C Aldrich; Adonina Tardon; Guillermo Fernández-Tardón; Gad Rennert; John K Field; Michael P A Davies; Triantafillos Liloglou; Lambertus A Kiemeney; Philip Lazarus; Bernadette Wendel; Aage Haugen; Shanbeh Zienolddiny; Stephen Lam; Matthew B Schabath; Angeline S Andrew; Eric J Duell; Susanne M Arnold; Gary E Goodman; Chu Chen; Jennifer A Doherty; Fiona Taylor; Angela Cox; Penella J Woll; Angela Risch; Thomas R Muley; Mikael Johansson; Paul Brennan; Maria Teresa Landi; Sanjay S Shete; Christopher I Amos; Heike Bickeböller
Journal:  Eur J Med Res       Date:  2022-01-31       Impact factor: 2.175
