Wenjun Yang1,2,3, Hongliang Liu2,4, Ruoxin Zhang2,4,5,6, Jennifer A Freedman2,7, Younghun Han8, Rayjean J Hung9, Yonathan Brhane9, John McLaughlin10, Paul Brennan11, Heike Bickeboeller12, Albert Rosenberger12, Richard S Houlston13, Neil E Caporaso14, Maria Teresa Landi14, Irene Brueske15, Angela Risch16, David C Christiani17,18, Christopher I Amos19, Xiaoxin Chen20, Steven R Patierno21,22, Qingyi Wei23,24,25,26.
Abstract
Limited efforts have been made to assess, at a genome-wide scale, how RNA splicing-related genetic variation affects lung cancer risk. In the present study, we first identified RNA splicing-related genetic variants linked to lung cancer in a genome-wide profiling analysis and then conducted a two-stage (discovery and replication) association study in populations of European ancestry. Discovery and validation were conducted sequentially in a total of 29,266 cases and 56,450 controls from the Transdisciplinary Research in Cancer of the Lung (TRICL) and International Lung Cancer Consortium (ILCCO) studies and the OncoArray database. For variants that were significant in both datasets, we further performed analyses stratified by smoking status and histological type and investigated their effects on gene expression and potential regulatory mechanisms. We identified three genetic variants significantly associated with lung cancer risk: rs329118 in JADE2 (P = 8.80E−09), rs2285521 in GGA2 (P = 4.43E−08), and rs198459 in MYRF (P = 1.60E−06). The associations of all three SNPs were more pronounced in lung squamous cell carcinoma (P = 1.81E−08, 6.21E−08, and 7.93E−04, respectively) than in lung adenocarcinoma, and in ever smokers (P = 9.80E−05, 2.70E−04, and 2.90E−05, respectively) than in never smokers. Expression quantitative trait loci (eQTL) analysis suggested that the SNPs may play a role in regulating transcriptional expression of the corresponding target genes. In conclusion, we report that three RNA splicing-related genetic variants contribute to lung cancer susceptibility in European populations. However, additional validation is needed, and the specific splicing mechanisms of the target genes underlying the observed associations warrant further exploration.
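The two-stage selection can be sketched as a filter over the per-SNP summary statistics reported in the discovery/validation table. This is a minimal illustration, not the authors' pipeline: the discovery cutoff of FDR < 0.20 follows the Fig. 2 caption, whereas the validation cutoff of P < 0.05 and the requirement that the OR lies on the same side of 1 in both stages are assumptions; `SnpResult` and `replicated` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SnpResult:
    rsid: str
    discovery_or: float
    discovery_fdr: float
    validation_or: float
    validation_p: float


def replicated(results, fdr_cutoff=0.20, p_cutoff=0.05):
    """rsIDs passing the discovery FDR and validation P cutoffs with a consistent effect direction.

    Assumed criteria: discovery FDR < fdr_cutoff, validation P < p_cutoff,
    and both ORs on the same side of 1.
    """
    kept = []
    for s in results:
        same_direction = (s.discovery_or - 1) * (s.validation_or - 1) > 0
        if s.discovery_fdr < fdr_cutoff and s.validation_p < p_cutoff and same_direction:
            kept.append(s.rsid)
    return kept


# Values transcribed from the discovery/validation table
table = [
    SnpResult("rs329118", 0.93, 0.032, 0.94, 5.18e-4),
    SnpResult("rs2285521", 1.09, 0.165, 1.07, 4.23e-3),
    SnpResult("rs198459", 1.11, 0.003, 1.05, 0.018),
    SnpResult("rs58309239", 0.85, 0.079, 0.97, 0.466),
    SnpResult("rs3184504", 0.93, 0.079, 0.99, 0.617),
    SnpResult("rs2276631", 0.93, 0.159, 1.00, 0.938),
]
print(replicated(table))  # ['rs329118', 'rs2285521', 'rs198459']
```

Under these assumed criteria, the filter keeps exactly the three SNPs named in the abstract; the other three pass the discovery FDR cutoff but not the validation cutoff.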
Year: 2022 PMID: 35773316 PMCID: PMC9247007 DOI: 10.1038/s41698-022-00281-9
Source DB: PubMed Journal: NPJ Precis Oncol ISSN: 2397-768X
Fig. 1 Study flowchart.
Abbreviations: CEU, Caucasian; MAF, minor allele frequency; FDR, false discovery rate; eQTL, expression quantitative trait loci.
Fig. 2 Association results and functional prediction of lung cancer risk-associated potential splicing SNPs.
a Manhattan plot of the overall results. Of the RNA splicing-related SNPs, 295 had a nominal P < 0.05, and 14 of these remained significant at FDR < 0.20. The x-axis indicates the chromosome and the y-axis shows the P values for association with lung cancer risk (as −log10 P). The horizontal blue line marks P = 0.05 and the red line marks the FDR threshold of 0.20. b–d Regional association plots showing the LD between each top SNP, rs329118 in JADE2 (b), rs2285521 in GGA2 (c), and rs198459 in MYRF (d), and other SNPs within 500 kb up- or downstream of it. e–g Locations, functional predictions, and position weight matrix (PWM)-based sequence logos of the three SNPs. JADE2 rs329118 (e) and GGA2 rs2285521 (f) are each located within a CpG island and show strong signals of active enhancer and promoter function (indicated by H3K4 methylation, H3K27 acetylation, and DNase hypersensitivity). MYRF rs198459 (g) is located within a CpG island and shows strong signals of active enhancer and promoter function (indicated by H3K4 methylation and DNase hypersensitivity, respectively). The panels were adapted from the UCSC Genome Browser. The three SNPs are located in the AP2B motif (e), the MYOD1 motif (f), and the ELK3 motif (g), respectively.
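The FDR < 0.20 threshold in panel a can be illustrated with the Benjamini–Hochberg procedure. The caption does not name the FDR method, so Benjamini–Hochberg is an assumption here, `bh_fdr` and `neg_log10` are hypothetical helpers, and the P values are made up; the snippet also shows the −log10 transform used on the Manhattan plot y-axis.

```python
import math


def bh_fdr(pvalues):
    """Benjamini-Hochberg adjusted P values (q values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so q values never decrease with rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def neg_log10(p):
    """y coordinate of a SNP on a Manhattan plot."""
    return -math.log10(p)


# Hypothetical P values, not data from the study
pvals = [0.001, 0.01, 0.03, 0.04, 0.2]
qvals = bh_fdr(pvals)
passing = [p for p, q in zip(pvals, qvals) if q < 0.20]
print(qvals)
print(passing, [round(neg_log10(p), 2) for p in pvals])
```

In a real scan, `pvals` would hold every tested splicing-related SNP, since the adjustment depends on the total number of tests `m`.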
SNPs associated with lung cancer risk discovered in the TRICL-ILCCO consortia and validated in the OncoArray dataset.
| SNP | Chr | Position | Alleles^a | Gene | Discovery MAF | Discovery OR (95% CI)^b | Discovery P | Discovery FDR | Validation MAF | Validation OR (95% CI)^b | Validation P | Combined OR (95% CI)^b | Combined P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs329118 | 5 | 133861663 | T/C | JADE2 | 0.42 | 0.93 (0.90–0.96) | 5.03E−05 | 0.032 | 0.43 | 0.94 (0.91–0.97) | 5.18E−04 | 0.94 (0.92–0.96) | 8.80E−09 |
| rs2285521 | 16 | 23521780 | C/T | GGA2 | 0.16 | 1.09 (1.04–1.14) | 5.90E−04 | 0.165 | 0.15 | 1.07 (1.02–1.13) | 4.23E−03 | 1.08 (1.05–1.11) | 4.43E−08 |
| rs198459 | 11 | 61525020 | A/G | MYRF | 0.22 | 1.11 (1.06–1.16) | 2.71E−06 | 0.003 | 0.22 | 1.05 (1.01–1.10) | 0.018 | 1.07 (1.04–1.11) | 1.60E−06 |
| rs58309239 | 4 | 25443366 | G/T | | 0.05 | 0.85 (0.78–0.93) | 1.77E−04 | 0.079 | 0.05 | 0.97 (0.90–1.05) | 0.466 | 0.91 (0.84–0.99) | 4.72E−04 |
| rs3184504 | 13 | 111884608 | T/C | | 0.48 | 0.93 (0.90–0.97) | 1.75E−04 | 0.079 | 0.49 | 0.99 (0.96–1.03) | 0.617 | 0.96 (0.94–0.98) | 6.78E−04 |
| rs2276631 | 2 | 219249013 | T/C | | 0.26 | 0.93 (0.90–0.97) | 4.72E−04 | 0.159 | 0.25 | 1.00 (0.96–1.04) | 0.938 | 0.96 (0.93–0.98) | 8.45E−04 |
Abbreviations: SNP, single nucleotide polymorphism; Chr, chromosome; MAF, minor allele frequency; OR, odds ratio; CI, confidence interval; FDR, false discovery rate.
^a Effect allele/reference allele.
^b Adjusted for the top principal components.
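The combined columns pool the discovery and validation estimates. A common way to do this from published summary statistics is an inverse-variance-weighted fixed-effect meta-analysis on the log-OR scale, recovering each standard error from the 95% CI. The table does not state its pooling method and the published CIs are rounded, so this is an approximate sketch under that assumption; `log_or_se` and `fixed_effect_meta` are hypothetical helpers.

```python
import math

Z95 = 1.959963984540054  # standard-normal quantile for a two-sided 95% CI


def log_or_se(or_, lower, upper):
    """log(OR) and its standard error, recovered from a 95% CI symmetric on the log scale."""
    return math.log(or_), (math.log(upper) - math.log(lower)) / (2 * Z95)


def fixed_effect_meta(studies):
    """Inverse-variance fixed-effect pooling of (OR, lower, upper) tuples.

    Returns (pooled OR, lower 95% bound, upper 95% bound, two-sided P).
    """
    num = den = 0.0
    for or_, lower, upper in studies:
        beta, se = log_or_se(or_, lower, upper)
        weight = 1.0 / se ** 2
        num += weight * beta
        den += weight
    beta = num / den
    se = math.sqrt(1.0 / den)
    p = math.erfc(abs(beta / se) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return math.exp(beta), math.exp(beta - Z95 * se), math.exp(beta + Z95 * se), p


# rs329118: discovery and validation estimates from the table
pooled = fixed_effect_meta([(0.93, 0.90, 0.96), (0.94, 0.91, 0.97)])
print("OR = {:.3f} ({:.3f}-{:.3f}), P = {:.2e}".format(*pooled))
```

For rs329118 this gives a pooled OR of about 0.935 with P on the order of 10⁻⁹, close to the reported 0.94 (0.92–0.96) and P = 8.80E−09; small differences are expected from the rounded inputs.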
Associations between the three SNPs and lung cancer risk, stratified by histologic type and smoking status, in all eight lung cancer GWASs and the OncoArray dataset.
| Study | Case | Control | rs329118 OR (95% CI) | rs329118 P | rs2285521 OR (95% CI) | rs2285521 P | rs198459 OR (95% CI) | rs198459 P |
|---|---|---|---|---|---|---|---|---|
| Overall | ||||||||
| ICR | 1952 | 5200 | 0.92 (0.86–1.00) | 0.038 | 1.04 (0.94–1.15) | 0.480 | 1.03 (0.94–1.13) | 0.532 |
| MDACC | 1150 | 1134 | 0.95 (0.84–1.07) | 0.407 | 1.17 (0.99–1.39) | 0.064 | 1.18 (1.00–1.40) | 0.047 |
| IARC | 2533 | 3791 | 0.92 (0.85–0.99) | 0.027 | 1.11 (1.01–1.23) | 0.039 | 1.10 (1.00–1.21) | 0.054 |
| NCI | 5713 | 5736 | 0.94 (0.89–0.99) | 0.023 | 1.10 (1.02–1.18) | 0.012 | 1.15 (1.08–1.23) | 4.00E−05 |
| Toronto | 331 | 499 | 0.93 (0.74–1.17) | 0.528 | 0.97 (0.70–1.33) | 0.839 | 1.09 (0.83–1.42) | 0.548 |
| GLC | 481 | 478 | 0.88 (0.73–1.07) | 0.193 | 1.01 (0.78–1.29) | 0.969 | 1.14 (0.89–1.44) | 0.295 |
| Harvard | 984 | 970 | 0.93 (0.82–1.06) | 0.298 | 1.10 (0.91–1.32) | 0.350 | 0.98 (0.84–1.14) | 0.799 |
| deCODE | 1319 | 26380 | 0.94 (0.87–1.02) | 0.135 | 1.01 (0.90–1.14) | 0.867 | 1.02 (0.91–1.14) | 0.728 |
| OncoArray | 14360 | 11555 | 0.94 (0.91–0.97) | 5.18E−04 | 1.07 (1.02–1.13) | 4.23E−03 | 1.05 (1.01–1.10) | 0.018 |
| Overall | 28823 | 55743 | 0.94 (0.92–0.96) | 8.80E−09 | 1.08 (1.05–1.11) | 4.43E−08 | 1.07 (1.04–1.11) | 1.60E−06 |
| Adenocarcinoma | ||||||||
| ICR | 465 | 5200 | 1.01 (0.88–1.15) | 0.938 | 1.07 (0.88–1.29) | 0.504 | 1.03 (0.87–1.22) | 0.714 |
| MDACC | 619 | 1134 | 0.93 (0.80–1.08) | 0.328 | 1.10 (0.90–1.35) | 0.341 | 1.17 (0.96–1.42) | 0.130 |
| IARC | 517 | 2824 | 0.91 (0.79–1.04) | 0.163 | 1.15 (0.96–1.37) | 0.125 | 1.08 (0.90–1.29) | 0.425 |
| NCI | 1841 | 5736 | 0.94 (0.87–1.01) | 0.103 | 1.02 (0.92–1.14) | 0.718 | 1.16 (1.06–1.28) | 0.002 |
| Toronto | 90 | 499 | 0.85 (0.61–1.21) | 0.370 | 1.10 (0.67–1.79) | 0.713 | 0.89 (0.58–1.36) | 0.596 |
| GLC | 186 | 478 | 0.77 (0.59–1.00) | 0.047 | 0.97 (0.69–1.35) | 0.842 | 0.90 (0.65–1.25) | 0.528 |
| Harvard | 597 | 970 | 0.94 (0.81–1.09) | 0.391 | 1.11 (0.89–1.37) | 0.370 | 0.89 (0.75–1.07) | 0.217 |
| deCODE | 547 | 26380 | 0.91 (0.80–1.03) | 0.119 | 0.98 (0.82–1.17) | 0.808 | 1.08 (0.92–1.28) | 0.351 |
| OncoArray | 5161 | 11323 | 0.96 (0.91–1.00) | 0.067 | 1.02 (0.95–1.09) | 0.589 | 1.03 (0.98–1.10) | 0.259 |
| Overall | 10023 | 54544 | 0.95 (0.91–0.98) | 0.011 | 1.04 (0.99–1.09) | 0.076 | 1.05 (1.00–1.12) | 0.029 |
| Squamous cell carcinoma | ||||||||
| ICR | 611 | 5200 | 0.94 (0.83–1.06) | 0.339 | 1.13 (0.96–1.33) | 0.146 | 1.08 (0.93–1.25) | 0.300 |
| MDACC | 306 | 1134 | 1.05 (0.87–1.27) | 0.630 | 1.17 (0.90–1.51) | 0.246 | 1.14 (0.88–1.46) | 0.317 |
| IARC | 911 | 2968 | 0.87 (0.78–0.97) | 0.010 | 1.06 (0.92–1.22) | 0.421 | 1.02 (0.89–1.18) | 0.750 |
| NCI | 1447 | 5736 | 0.90 (0.83–0.98) | 0.019 | 1.22 (1.09–1.36) | 5.34E−04 | 1.12 (1.00–1.25) | 0.040 |
| Toronto | 50 | 499 | 0.92 (0.58–1.47) | 0.733 | 0.93 (0.50–1.76) | 0.835 | 1.14 (0.65–2.03) | 0.643 |
| GLC | 97 | 478 | 1.00 (0.72–1.38) | 0.977 | 1.19 (0.77–1.83) | 0.432 | 1.16 (0.77–1.76) | 0.480 |
| Harvard | 216 | 970 | 0.84 (0.67–1.06) | 0.142 | 0.86 (0.62–1.20) | 0.383 | 1.35 (1.04–1.74) | 0.023 |
| deCODE | 259 | 26380 | 0.92 (0.77–1.09) | 0.335 | 1.07 (0.82–1.39) | 0.618 | 0.91 (0.71–1.15) | 0.426 |
| OncoArray | 3529 | 11323 | 0.91 (0.86–0.96) | 3.00E−04 | 1.14 (1.06–1.22) | 5.00E−04 | 1.06 (0.99–1.14) | 0.073 |
| Overall | 7426 | 54688 | 0.91 (0.88–0.95) | 1.81E−08 | 1.13 (1.08–1.19) | 6.21E−08 | 1.08 (1.03–1.13) | 7.93E−04 |
| Ever smoking | ||||||||
| IARC | 2367 | 2508 | 0.95 (0.88–1.04) | 0.274 | 1.11 (0.99–1.24) | 0.068 | 1.12 (1.01–1.25) | 0.037 |
| Toronto | 236 | 272 | 0.91 (0.68–1.21) | 0.508 | 1.01 (0.69–1.49) | 0.948 | 1.12 (0.79–1.58) | 0.535 |
| GLC | 433 | 258 | 0.88 (0.69–1.14) | 0.337 | 0.86 (0.62–1.18) | 0.356 | 1.09 (0.80–1.49) | 0.600 |
| Harvard | 892 | 809 | 0.95 (0.83–1.10) | 0.504 | 1.11 (0.90–1.36) | 0.333 | 0.99 (0.83–1.17) | 0.870 |
| MDACC | 1150 | 1134 | 0.95 (0.84–1.07) | 0.407 | 1.17 (0.99–1.39) | 0.064 | 1.18 (1.00–1.40) | 0.047 |
| ATBC | 1732 | 1270 | 0.95 (0.85–1.06) | 0.339 | 1.14 (1.00–1.30) | 0.055 | 1.03 (0.88–1.20) | 0.693 |
| CPSII | 600 | 383 | 1.10 (0.90–1.34) | 0.355 | 1.21 (0.92–1.59) | 0.175 | 0.93 (0.74–1.18) | 0.578 |
| EAGLE | 1767 | 1339 | 0.94 (0.84–1.04) | 0.225 | 1.06 (0.91–1.22) | 0.473 | 1.27 (1.13–1.43) | 9.00E−05 |
| PLCO | 1243 | 1344 | 0.88 (0.78–0.99) | 0.039 | 0.97 (0.83–1.15) | 0.740 | 1.23 (1.06–1.43) | 0.006 |
| OncoArray | 12803 | 7613 | 0.94 (0.90–0.98) | 0.003 | 1.06 (1.01–1.12) | 0.031 | 1.09 (1.04–1.15) | 6.00E−04 |
| Overall | 23223 | 16930 | 0.94 (0.91–0.97) | 9.80E−05 | 1.07 (1.03–1.12) | 2.70E−04 | 1.12 (1.06–1.18) | 2.90E−05 |
| Never smoking | ||||||||
| IARC | 159 | 1253 | 0.87 (0.68–1.11) | 0.253 | 1.08 (0.78–1.49) | 0.647 | 1.09 (0.79–1.49) | 0.602 |
| Toronto | 95 | 217 | 0.96 (0.65–1.42) | 0.843 | 0.90 (0.50–1.61) | 0.712 | 1.04 (0.66–1.64) | 0.871 |
| GLC | 35 | 220 | 0.80 (0.47–1.36) | 0.409 | 0.49 (0.19–1.26) | 0.140 | 1.18 (0.58–2.39) | 0.652 |
| Harvard | 92 | 161 | 0.86 (0.59–1.27) | 0.461 | 1.07 (0.63–1.83) | 0.803 | 0.86 (0.55–1.35) | 0.520 |
| CPSII | 86 | 275 | 1.35 (0.92–1.97) | 0.124 | 0.96 (0.53–1.73) | 0.893 | 1.23 (0.77–1.97) | 0.384 |
| EAGLE | 138 | 634 | 1.01 (0.77–1.34) | 0.920 | 1.30 (0.87–1.93) | 0.199 | 0.95 (0.68–1.33) | 0.780 |
| PLCO | 126 | 470 | 1.01 (0.70–1.44) | 0.975 | 1.18 (0.72–1.92) | 0.513 | 1.08 (0.69–1.68) | 0.735 |
| OncoArray | 1343 | 3463 | 0.96 (0.88–1.05) | 0.397 | 1.07 (0.94–1.22) | 0.282 | 0.96 (0.86–1.07) | 0.409 |
| Overall | 2074 | 6693 | 0.96 (0.89–1.03) | 0.215 | 1.07 (0.96–1.19) | 0.155 | 0.98 (0.90–1.08) | 0.892 |
Abbreviations: GWAS, genome-wide association study; AD, adenocarcinoma; SC, squamous cell carcinoma; OR, odds ratio; CI, confidence interval; I², heterogeneity statistic.
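The footnote refers to I², the between-study heterogeneity statistic. A minimal sketch of the usual computation, Cochran's Q with fixed-effect inverse-variance weights and I² = (Q − df)/Q truncated at 0, is shown here using the per-study "Overall" rows for rs198459. The weighting scheme is an assumption (the table does not describe how heterogeneity was assessed), `cochran_q_i2` is a hypothetical helper, and standard errors are back-calculated from the rounded CIs.

```python
import math

Z95 = 1.959963984540054  # standard-normal quantile for a two-sided 95% CI


def cochran_q_i2(studies):
    """Cochran's Q and I^2 (in %) from per-study (OR, lower, upper) tuples."""
    betas, weights = [], []
    for or_, lower, upper in studies:
        se = (math.log(upper) - math.log(lower)) / (2 * Z95)
        betas.append(math.log(or_))
        weights.append(1.0 / se ** 2)
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(studies) - 1
    # I^2 is truncated at 0 when Q falls below its degrees of freedom
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


# rs198459, "Overall" rows of the stratified table (ICR through OncoArray)
rs198459 = [
    (1.03, 0.94, 1.13), (1.18, 1.00, 1.40), (1.10, 1.00, 1.21),
    (1.15, 1.08, 1.23), (1.09, 0.83, 1.42), (1.14, 0.89, 1.44),
    (0.98, 0.84, 1.14), (1.02, 0.91, 1.14), (1.05, 1.01, 1.10),
]
q, i2 = cochran_q_i2(rs198459)
print(f"Q = {q:.2f} on {len(rs198459) - 1} df, I^2 = {i2:.1f}%")
```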
Fig. 3 Functional analyses of rs329118 in JADE2, rs2285521 in GGA2, and rs198459 in MYRF.
Correlation between JADE2 rs329118 and JADE2 mRNA expression levels in additive (a), dominant (b), and recessive (c) models in blood cells from 373 European individuals in the 1000 Genomes Project (P = 0.094, 0.487, and 0.027, respectively). Correlation between GGA2 rs2285521 and GGA2 mRNA expression levels in additive (d), dominant (e), and recessive (f) models in blood cells from the same 373 European individuals (P = 5.30 × 10⁻⁴, 0.0013, and 0.034, respectively). Correlation of GGA2 rs2285521 (g) and MYRF rs198459 (h) with mRNA expression levels in normal lung tissue or whole blood from the GTEx project (P = 0.014 and 6.20 × 10⁻¹⁰, respectively). i, j Correlation between EARS2 rs6497670 and mRNA expression levels in additive (i) and dominant (j) models in lung cancer tissues from the TCGA project (P = 4.85 × 10⁻³ and 8.29 × 10⁻³, respectively). k Pairwise LD plot between GGA2 rs2285521 (T>C) and EARS2 rs6497670 (C>T). In a–j, P values were calculated by linear regression. In the box plots, the center line indicates the median expression level across all participants in that genotype group, the hinges represent the lower (Q1) and upper (Q3) quartiles, the lower whisker marks the smallest value within 1.5 × IQR (interquartile range) below Q1, and the upper whisker marks the largest value within 1.5 × IQR above Q3.
Fig. 4 Diagram of the alternative splicing pattern of GGA2 transcripts and the amino acid (aa) sequences, protein structures, and domains of the GGA2 isoforms.
a Sequencing results for SNP rs2285521 within a 53-bp sequence of the 5′ UTR in exon 1 of GGA2, the genomic structures of GGA2-X1 and GGA2-X2, and the splicing pattern of the GGA2 variant carrying rs2285521 T>C. The A of the start codon (ATG) is defined as position +1. b Part of the RNA secondary structures of GGA2 carrying the rs2285521 U and C alleles. c The aa sequences of GGA2-X1 and GGA2-X2. GGA2-X1 encodes a 576-aa protein and GGA2-X2 a 526-aa protein; the red aa sequence marks where GGA2-X2 starts. d Part of the secondary structures and domains of the putative GGA2-X1 and GGA2-X2 isoforms; the region circled in red (aa1–aa50) is where the two isoforms differ.
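Panels c and d imply simple arithmetic: if GGA2-X1 (576 aa) and GGA2-X2 (526 aa) differ only in aa1–aa50, X2 corresponds to X1 without a 50-aa N-terminal segment, i.e. 150 nt of coding sequence. One way such an N-terminal truncation can arise is translation from a downstream in-frame ATG; that mechanism is an assumption, not something the caption states. The sketch uses a hypothetical toy coding sequence, not the GGA2 sequence, and `orf_length_aa` is a hypothetical helper.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_length_aa(seq, start):
    """Amino acids encoded from the ATG at 0-based index `start` up to the first in-frame stop codon."""
    if seq[start:start + 3] != "ATG":
        raise ValueError("start must point at an ATG codon")
    length = 0
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return length
        length += 1
    raise ValueError("no in-frame stop codon found")


# Hypothetical toy coding sequence: upstream ATG at index 0, in-frame ATG at index 9
toy = "ATG" + "GCC" * 2 + "ATG" + "GCC" * 3 + "TAA"
long_form = orf_length_aa(toy, 0)   # 7 aa
short_form = orf_length_aa(toy, 9)  # 4 aa
assert long_form - short_form == 9 // 3  # truncation length = nucleotide offset / 3

# The same arithmetic for the reported GGA2 isoforms
aa_difference = 576 - 526  # 50 aa (aa1-aa50)
print(long_form, short_form, aa_difference, 3 * aa_difference)  # last value: 150 nt of CDS
```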