Dhanya Ramachandran1, Joe Dennis2, Laura Fachal2, Peter Schürmann1, Kristine Bousset1, Fabienne Hülse1, Qianqian Mao1, Yingying Wang1, Matthias Jentschke1, Gerd Böhmer3, Hans-Georg Strauß4, Christine Hirchenhain5, Monika Schmidmayr6, Florian Müller7, Ingo Runnebaum8, Alexander Hein9, Frederik Stübs9, Martin Koch9, Matthias Ruebner9, Matthias W Beckmann9, Peter A Fasching9, Alexander Luyten10,11, Matthias Dürst9, Peter Hillemanns1, Douglas F Easton2,12, Thilo Dörk1.
Abstract
Cervical cancer is among the leading causes of cancer-related death in women worldwide. Infection with human papillomavirus (HPV) is an established risk factor for cancer development; however, the genetic factors contributing to disease risk remain largely unknown. We report a genome-wide association study (GWAS) of 375 German cervical cancer patients and 866 healthy controls, followed by a replication study comprising 658 patients with invasive cervical cancer, 1361 with cervical dysplasia and 841 healthy controls. Functional validation was performed for the top GWAS variant on chromosome 14q12 (rs225902, close to PRKD1). After bioinformatic annotation and in silico predictions, we performed transcript analysis in a series of 317 cervical tissue samples and show that rs225902 is an expression quantitative trait locus (eQTL) for FOXG1 and for two tightly co-regulated long non-coding RNAs at this genomic region, CTD-2251F13 (lnc-PRKD1-1) and CTD-2503I6 (lnc-FOXG1-6). We also show allele-specific effects of the 14q12 variants in luciferase assays. We propose a combined effect of genotype, HPV status and gene expression at this locus on cervical cancer progression. Taken together, this work uncovers a potential candidate locus with regulatory functions and contributes to the understanding of genetic susceptibility to cervical cancer.
Year: 2022 PMID: 35157032 PMCID: PMC9396939 DOI: 10.1093/hmg/ddac031
Source DB: PubMed Journal: Hum Mol Genet ISSN: 0964-6906 Impact factor: 5.121
Figure 1. GWAS workflow and outcomes. (A) Schematic diagram of the study design. (B) Quantile–quantile plot of expected (X-axis) versus observed (Y-axis) −log10 P values. The red line indicates the expectation for a population with no underlying substructure. (C) Manhattan plot of −log10 P values from the regression analysis (Y-axis) against chromosome and base-pair position (X-axis). The red line marks genome-wide significance (GWS) at P = 5 × 10⁻⁸ and the dark blue line marks P = 1 × 10⁻⁵. Chromosomes are coloured alternately in light and dark blue. (D) SNiPA regional association plot of the ±2500 bp region around the variant rs225902 on 14q12.
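Panel (B) compares observed −log10 P values with their expectation under the null. A minimal sketch of how such QQ coordinates can be computed from a list of GWAS P values, together with the genomic inflation factor λ that is often reported alongside a QQ plot (λ is not given in this summary). The function names are illustrative and are not taken from the study's pipeline.

```python
import math
from statistics import NormalDist, median


def qq_points(p_values):
    """Return (expected, observed) -log10 P pairs for a QQ plot.

    Under the null, P values are uniform, so the i-th smallest of n is
    expected near (i - 0.5) / n.
    """
    n = len(p_values)
    observed = sorted((-math.log10(p) for p in p_values), reverse=True)
    expected = [-math.log10((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, observed))


def genomic_inflation(p_values):
    """lambda_GC: median 1-df chi-square statistic divided by its null median."""
    nd = NormalDist()
    # z from the two-sided P value; using p / 2 avoids 1 - p/2 rounding to 1.0
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in p_values]
    null_median = nd.inv_cdf(0.75) ** 2  # about 0.455
    return median(chi2) / null_median


# Usage with hypothetical, perfectly uniform P values: lambda is close to 1
uniform = [(i - 0.5) / 1000 for i in range(1, 1001)]
print(round(genomic_inflation(uniform), 3))
```

Values of λ well above 1 would suggest inflation from population stratification or cryptic relatedness, which is what the red reference line in panel (B) is meant to reveal.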
Stratified analysis of the top variant at chromosome 14, rs225902, after genotyping in the Cervigen cohort, in Stage I (Oncoarray) and Stage II (non-Oncoarray), and overall combined analysis of the two cohorts
| Stratum | Stage I nCases | Stage I nControls | Stage I OR (95% CI) | Stage I P | Stage II nCases | Stage II nControls | Stage II OR (95% CI) | Stage II P | Combined nCases | Combined nControls | Combined OR (95% CI) | Combined P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Low-grade dysplasia | 0 | 854 | NA | NA | 234 | 813 | 1.08 (0.82; 1.44) | 0.579 | 234 | 1667 | 1.15 (0.87; 1.51) | 0.317 |
| High-grade dysplasia | 0 | 854 | NA | NA | 1114 | 813 | 1.19 (0.99; 1.43) | 0.059 | 1114 | 1667 | 1.26 (1.08; 1.47) | 0.003 |
| Invasive | 362 | 854 | 1.75 (1.38; 2.22) | 4.7 × 10⁻⁶ | 654 | 813 | 1.21 (0.99; 1.47) | 0.061 | 1016 | 1667 | 1.40 (1.20; 1.63) | 1.4 × 10⁻⁵ |
| Invasive and high-grade dysplasia | 362 | 854 | 1.75 (1.38; 2.22) | 4.7 × 10⁻⁶ | 1768 | 813 | 1.20 (1.02; 1.42) | 0.033 | 2130 | 1667 | 1.33 (1.17; 1.51) | 1.6 × 10⁻⁵ |
| All cases | 362 | 854 | 1.75 (1.38; 2.22) | 4.7 × 10⁻⁶ | 2002 | 813 | 1.19 (1.01; 1.40) | 0.042 | 2364 | 1667 | 1.31 (1.15; 1.49) | 2.9 × 10⁻⁵ |
Cervical intraepithelial neoplasia (CIN) was divided into low-grade (CIN1, and CIN2 in women younger than 30 years) and high-grade (CIN2 in women aged 30 years or older, and CIN3) groups. Invasive cervical cancer (ICC) was additionally combined with high-grade dysplasia, followed by a joint analysis of all cases. Shown are the number of cases (nCases), the number of controls (nControls), the OR with 95% CI for the minor allele, and P values (P) from stratified logistic regression analyses restricted to the respective disease subtype. NA, not applicable (no Stage I cases in these strata).
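Because each OR in the table comes with a 95% CI, its standard error can be recovered and turned back into an approximate P value, and the two stages can be pooled. The sketch below assumes Wald-type intervals that are symmetric on the log-odds scale, and it uses an inverse-variance fixed-effect meta-analysis purely for illustration: this summary does not state whether the "Combined" column comes from a meta-analysis or from a pooled regression. The inputs are the "Invasive" row of the table.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def log_or_and_se(odds_ratio, lower, upper):
    """log(OR) and its SE from an OR and 95% CI, assuming a Wald interval
    that is symmetric on the log-odds scale."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    return beta, se


def wald_p(beta, se):
    """Two-sided P value for the Wald statistic z = beta / se."""
    return 2 * NormalDist().cdf(-abs(beta / se))


def fixed_effect(estimates):
    """Inverse-variance weighted (fixed-effect) pooling of (beta, se) pairs."""
    weights = [1 / s ** 2 for _, s in estimates]
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    return beta, math.sqrt(1 / sum(weights))


# 'Invasive' row of the table: Stage I and Stage II estimates
stage1 = log_or_and_se(1.75, 1.38, 2.22)
stage2 = log_or_and_se(1.21, 0.99, 1.47)
beta, se = fixed_effect([stage1, stage2])

print(f"Stage I Wald P ~ {wald_p(*stage1):.1e}")
print(f"Pooled OR {math.exp(beta):.2f} "
      f"({math.exp(beta - Z95 * se):.2f}; {math.exp(beta + Z95 * se):.2f})")
```

With these inputs the Stage I Wald P comes out at roughly 4 × 10⁻⁶, and the fixed-effect estimate lands within about 0.01 of the published combined OR of 1.40 (1.20; 1.63). Small discrepancies are expected, because the published ORs and CIs are rounded to two decimals.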
Haplotype analysis of the variants rs225902 (C>T) and rs225957 (C>T) after genotyping in Stage I (Oncoarray) and Stage II (non-Oncoarray), as well as overall combined analysis
| Haplotype | Stage I frequency | Stage I OR (95% CI) | Stage I P | Stage II frequency | Stage II OR (95% CI) | Stage II P | Overall frequency | Overall OR (95% CI) | Overall P |
|---|---|---|---|---|---|---|---|---|---|
| CC | 0.622 | 0.64 (0.54; 0.77) | 9.09 × 10⁻⁷ | 0.651 | 0.94 (0.84; 1.06) | 0.2861 | 0.642 | 0.89 (0.81; 0.98) | 0.0114 |
| CT | 0.236 | 1.19 (0.97; 1.46) | 0.086 | 0.201 | 0.96 (0.84; 1.10) | 0.5724 | 0.212 | 0.96 (0.87; 1.07) | 0.5188 |
| TT | 0.111 | 1.67 (1.29; 2.20) | 8.22 × 10⁻⁵ | 0.134 | 1.26 (1.06; 1.50) | 0.0061 | 0.127 | 1.39 (1.21; 1.59) | 1.14 × 10⁻⁶ |
| TC | 0.031 | 1.72 (1.05; 2.79) | 0.0172 | 0.013 | 0.67 (0.42; 1.09) | 0.0807 | 0.019 | 0.85 (0.61; 1.17) | 0.2893 |
Shown are the frequency of each haplotype in the cohorts, the OR with 95% CI, and P values (P) from Fisher's chi-square test.
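Each haplotype row can be read as a 2 × 2 comparison of that haplotype against all other haplotypes in cases versus controls. The caption's "Fisher's chi-square test" could denote either Fisher's exact test or Pearson's chi-square test, so the hedged sketch below implements both. The counts are hypothetical, and real haplotype counts would first have to be estimated from unphased genotypes (for example with an EM algorithm), which is not shown here.

```python
import math


def haplotype_2x2(case_hap, case_total, ctrl_hap, ctrl_total):
    """2x2 table (a, b, c, d) for one haplotype versus all other haplotypes.

    Counts are haplotype counts (two per person): a/b = cases with/without
    the haplotype, c/d = controls with/without it.
    """
    return case_hap, case_total - case_hap, ctrl_hap, ctrl_total - ctrl_hap


def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


def chi_square_p(a, b, c, d):
    """Pearson chi-square test, 1 df, no continuity correction."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square(1)


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact test: total probability of all tables with
    the observed margins that are no more likely than the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    p_obs = prob(a)
    support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
    return min(1.0, sum(p for p in (prob(x) for x in support) if p <= p_obs * (1 + 1e-7)))


# Hypothetical counts: 200 case and 200 control haplotypes
a, b, c, d = haplotype_2x2(30, 200, 15, 200)
print(round(odds_ratio(a, b, c, d), 2), chi_square_p(a, b, c, d), fisher_exact_p(a, b, c, d))
```

The small tolerance in `fisher_exact_p` keeps tables whose probability ties with the observed one from being dropped by floating-point rounding.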
Figure 2. Annotation of the top GWAS variant rs225902. (A) Ensembl browser (GRCh37) view of the chromosome 14 region surrounding the variant, with transcript annotation and regulatory marks. (B) Eukaryotic Promoter Database (EPD) track in the UCSC Genome Browser (hg19) showing putative PRKD1 promoters (PRKD1_1 and PRKD1_2), together with regulatory marks in the region (H3K4me1, H3K4me3, Pol2, H3K27ac, DNase I hypersensitivity sites, transcription factor ChIP-seq, Common SNPs (dbSNP v151) and repeat sequences). (C) Circos plot generated in FUMA showing several layers of information at the rs225902 locus on chromosome 14. The outer layer shows the GWAS P value for rs225902; the next layer shows the genomic position, with darker blue indicating identified risk loci; next are genes with known chromatin interactions with the variant (orange), genes with known eQTLs (green), and genes with evidence from both chromatin interactions and eQTLs (red). (D) CONSITE prediction of transcription factor (TF) binding motifs within ±20 bp of rs225902 for the common allele 'G' and the rare allele 'A'.
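CONSITE-style predictions, as in panel (D), score position weight matrices (PWMs) against the sequence around a variant and compare the best score for each allele. The sketch below illustrates that idea with a made-up 7-bp motif and made-up 20-bp flanks. Neither is the real rs225902 sequence context or a real transcription factor profile, and CONSITE's own scoring details may differ.

```python
import math

BASES = "ACGT"


def counts_from_consensus(consensus, n=10):
    """Hypothetical count matrix: n observations of the consensus base per position."""
    return [{b: (n if b == base else 0) for b in BASES} for base in consensus]


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert position counts to log2 odds against a uniform background."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + 4 * pseudocount
        pwm.append({b: math.log2((col[b] + pseudocount) / total / background) for b in BASES})
    return pwm


def revcomp(seq):
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def best_score(seq, pwm):
    """Best PWM score over all windows on both strands."""
    w = len(pwm)
    return max(
        sum(pwm[i][s[start + i]] for i in range(w))
        for s in (seq, revcomp(seq))
        for start in range(len(s) - w + 1)
    )


def allele_windows(left, right, ref, alt):
    """Sequence context (left flank + allele + right flank) for each allele."""
    return left + ref + right, left + alt + right


# Made-up motif and flanks (NOT the real rs225902 context)
pwm = log_odds_pwm(counts_from_consensus("TGAGTCA"))
LEFT, RIGHT = "CCTAGGCATTCAGCTTATGA", "TCAGGTTACCAGTCCGATTC"  # 20 bp each
ref_seq, alt_seq = allele_windows(LEFT, RIGHT, "G", "A")
print(round(best_score(ref_seq, pwm), 2), round(best_score(alt_seq, pwm), 2))
```

Here the 'G' allele completes the toy motif, so its best score is higher than that of the 'A' allele; an allele-dependent gain or loss of a predicted site is the kind of signal panel (D) displays.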
Figure 3. Promoter and enhancer analysis via luciferase assays. (A) Activity of the putative PRKD1 promoter and the pGL3 SV40 promoter compared with the pGL3-Basic construct in HeLa cells. P values are from two-sided t-tests with pGL3-Basic as the control (denoted 'C'); error bars indicate ±standard error of the mean (SEM). Biological replicates, n = 2; technical replicates, n = 2. (B) Constructs containing enhancer sequences with the ancestral alleles (WT: rs225902 'G/C', rs225957 'G/C') or the minor alleles (SNP: rs225902 'A/T', rs225957 'A/T') of rs225902 and rs225957, and their combinations, compared with the pGL3 SV40 promoter construct in HeLa cells. P values are from two-sided t-tests against pGL3-Promoter as the control ('C') and between the selected bars connected by a line; error bars indicate ±SEM. Biological replicates, n = 4; technical replicates, n = 3.
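The luciferase read-out in panels (A) and (B) reduces to a normalized activity per well, a fold change relative to a control construct, mean ± SEM, and a two-sided t-test. The sketch below assumes a dual-luciferase design (firefly signal divided by a Renilla co-reporter), which the caption does not state, and it uses Welch's unequal-variance t statistic because the caption does not say which t-test variant was used. All readings are hypothetical.

```python
import math
from statistics import mean, stdev


def normalized_activity(firefly, renilla):
    """Per-well firefly/Renilla ratio (assumes a dual-luciferase design)."""
    return [f / r for f, r in zip(firefly, renilla)]


def fold_change(sample, control):
    """Express each normalized value relative to the mean of the control construct."""
    base = mean(control)
    return [x / base for x in sample]


def sem(values):
    """Standard error of the mean (sample SD / sqrt(n))."""
    return stdev(values) / math.sqrt(len(values))


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = stdev(x) ** 2 / len(x), stdev(y) ** 2 / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Hypothetical normalized ratios for a control vector and a test construct
control = [1.0, 1.2, 0.8]
construct = [2.0, 2.4, 1.6]
fc = fold_change(construct, control)
t, df = welch_t(fc, fold_change(control, control))
print(f"fold change {mean(fc):.2f} +/- {sem(fc):.2f} (SEM), t = {t:.2f}, df = {df:.1f}")
```

The P value then follows from the t distribution with `df` degrees of freedom, for example via `scipy.stats.ttest_ind(x, y, equal_var=False)`. With several technical replicates per biological replicate, one common choice is to average the technical replicates first and test on the biological means.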
Figure 4. Gene transcript correlations, effect of HPV and eQTL analysis in the cervical epithelial tissue cohort. (A) Correlation of transcript levels of CTD-2251F13, CTD-2503I6, FOXG1 and PRKD1 (Pearson R values, P values and number of samples (n) are indicated). (B) Log10 relative mean levels (±SEM) of several transcripts at the chromosome 14 locus were associated with the HPV status of the samples (P values from t-tests between HPV-positive and HPV-negative groups). (C) Log10-transformed relative mean transcript levels (±SEM) in all samples, tested for association with rs225902 genotype under the allelic model of inheritance, showing cis-eQTLs. (D) As in (C), but restricted to samples without HPV infection. P values are from two-sided t-tests between groups unless indicated as ANOVA P values (for three groups).
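Panels (A), (C) and (D) rest on two calculations: a Pearson correlation between transcripts, and a comparison of log10 relative expression across rs225902 genotypes. The sketch below computes Pearson's r and, as one common way of expressing a per-allele eQTL effect, the least-squares slope of expression on the number of minor T alleles. The ΔCt-based quantification and the regression formulation are assumptions; the figure's own P values come from t-tests or ANOVA between genotype groups. Genotypes and ΔCt values are hypothetical.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log10_relative_expression(delta_ct):
    """log10 of 2**(-dCt); assumes relative quantification by the dCt method."""
    return [-d * math.log10(2) for d in delta_ct]


def dosage_slope(genotypes, expression, minor="T"):
    """Least-squares slope of expression on minor-allele count (0, 1 or 2)."""
    dosage = [g.count(minor) for g in genotypes]
    md, me = mean(dosage), mean(expression)
    sxy = sum((d - md) * (e - me) for d, e in zip(dosage, expression))
    sxx = sum((d - md) ** 2 for d in dosage)
    return sxy / sxx


# Hypothetical rs225902 genotypes (C>T) and dCt values for one transcript
genotypes = ["CC", "CT", "TT", "CC", "CT", "TT"]
expr = log10_relative_expression([6.0, 5.5, 5.1, 6.2, 5.4, 4.9])
print(round(dosage_slope(genotypes, expr), 3))
```

A positive slope means higher expression per copy of the T allele; restricting the input to HPV-negative samples mirrors the subset analysed in panel (D).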
Annotation and features of the lncRNAs CTD-2251F13 and CTD-2503I6 from NONCODE and LNCipedia
| Feature | CTD-2251F13 | CTD-2503I6 |
|---|---|---|
| LNCipedia transcript ID | lnc-PRKD1-1:4 | lnc-FOXG1-6:17 |
| LNCipedia gene ID | lnc-PRKD1-1 | lnc-FOXG1-6 |
| Ensembl Gene ID | ENSG00000248975 | ENSG00000257120 |
| Ensembl Transcript ID | ENST00000549360 | ENST00000550941 |
| Gene Location (GRCh37) | chr14: 30,421,603-30,766,249 | chr14: 30,122,015-30,127,122 |
| Strand | Minus | Plus |
| Class | Intronic | Antisense |
| Sequence Ontology term | Sense intronic ncRNA | Antisense lncRNA |
| Transcript size | 768 bp | 551 bp |
| Exons | 3 | 2 |
| Number of transcripts | 3 | 1 |
| All transcripts | AL133372.2-201 (ENST00000548124.1, 922 bp); AL133372.2-202 (ENST00000549360.1, 768 bp); AL133372.2-203 (ENST00000508469.2, 233 bp) | AL356756.1 (ENST00000550941.1) |
| Alternative gene names | ENSG00000248975; CTD-2251F13.1; OTTHUMG00000170491.1; AL133372.2 | ENSG00000257120.1; CTD-2503I6.1; OTTHUMG00000170489.1; AL356756.1 |
| Alternative transcript names | ENST00000549360.1; CTD-2251F13.1-002; OTTHUMT00000409381.1; NONHSAT036216 | ENST00000550941.1; CTD-2503I6.1-001; OTTHUMT00000409378.1; NONHSAT036209 |
Sequence-based (non-) coding potential of the lncRNA transcripts calculated by CPC2
| Sequence ID | Strand | Label | Coding probability | Peptide length (aa) | Fickett score | Isoelectric point | ORF integrity |
|---|---|---|---|---|---|---|---|
| CTD-2251F13.1-201 | Forward | Noncoding | 0.00590617 | 21 | 0.41452 | 0.41452 | Complete |
| CTD-2251F13.1-201 | Reverse | Noncoding | 0.00800117 | 12 | 0.4106 | 0.4106 | Complete |
| CTD-2251F13.1-202 | Forward | Noncoding | 0.133846 | 101 | 0.34133 | 0.34133 | Incomplete |
| CTD-2251F13.1-202 | Reverse | Noncoding | 0.0627548 | 75 | 0.34564 | 0.34564 | Complete |
| CTD-2251F13.1-203 | Forward | Noncoding | 0.0802353 | 76 | 0.36306 | 0.36306 | Complete |
| CTD-2251F13.1-203 | Reverse | Noncoding | 0.00337841 | 18 | 0.37955 | 0.37955 | Complete |
| CTD-2503I06.1 | Forward | Noncoding | 0.0659486 | 60 | 0.42321 | 0.42321 | Complete |
| CTD-2503I06.1 | Reverse | Noncoding | 0.0237596 | 43 | 0.41716 | 0.41716 | Complete |
Shown are the label, coding probability, putative peptide length, Fickett Testcode score, putative isoelectric point and ORF integrity.
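CPC2 derives its peptide-length and ORF-integrity columns from the longest open reading frame found on each strand. The sketch below is a simplified stand-in for that step only; it does not compute CPC2's Fickett score, isoelectric point or coding probability. It finds the longest ATG-initiated ORF in the three frames of each strand and flags whether the ORF ends in a stop codon. The function names are hypothetical, and CPC2's exact ORF rules may differ.

```python
STOPS = {"TAA", "TAG", "TGA"}


def revcomp(seq):
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def longest_orf(seq):
    """Longest ATG-initiated ORF over the three frames of one strand.

    Returns (peptide_length_aa, complete): the length excludes the stop codon,
    and complete is False when the ORF runs off the 3' end without a stop.
    """
    seq = seq.upper()
    best = (0, False)
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        i = 0
        while i < len(codons):
            if codons[i] == "ATG":
                j = i
                while j < len(codons) and codons[j] not in STOPS:
                    j += 1
                if j - i > best[0]:
                    best = (j - i, j < len(codons))
                i = j  # ATGs nested inside this ORF give shorter ORFs
            i += 1
    return best


def orfs_both_strands(seq):
    """Hypothetical helper mirroring the Forward/Reverse rows of the CPC2 table."""
    seq = seq.upper()
    return {"Forward": longest_orf(seq), "Reverse": longest_orf(revcomp(seq))}


print(orfs_both_strands("CCATGAAATAGGCTATTTCATGG"))
```

Short ORFs such as those listed above (12 to 101 aa) are typical of non-coding transcripts, which is consistent with the "Noncoding" labels in the table.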
Figure 5. LncRNA localization and expression studies. (A) Localization of the lncRNAs CTD-2251F13 (left) and CTD-2503I6 (right) in cellular compartments of HeLa cells, quantified by qRT-PCR after subcellular fractionation and RNA isolation. P values are from t-tests between nuclear and cytoplasmic abundance. (B) PRKD1 transcript levels in samples in the highest versus lowest expression quartiles of the lncRNAs CTD-2251F13 and CTD-2503I6. P values are from t-tests between groups; error bars, ±SEM. (C) Transcript levels of CTD-2251F13 and CTD-2503I6 in samples in the highest versus lowest expression quartiles of PRKD1. P values are from t-tests between groups; error bars, ±SEM. (D) Genes at the chromosome 14 locus are highly correlated with one another or are influenced by HPV status or rs225902 genotype (eQTL). A combined effect of genotype, HPV status and gene expression at this locus is proposed to contribute to cervical cancer (CC) progression.
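Panels (B) and (C) compare one transcript's levels between samples in the highest and lowest expression quartiles of another transcript. A minimal sketch of that grouping follows, using Python's `statistics.quantiles` cut points. Quartile boundaries computed this way may differ slightly from those used in the study, and the expression values are hypothetical. The two group means would then be compared with a t-test, as in the caption.

```python
from statistics import mean, quantiles


def quartile_contrast(driver, target):
    """Mean target level in samples whose driver level is in the top versus
    the bottom quartile (boundaries from statistics.quantiles, n=4)."""
    q1, _, q3 = quantiles(driver, n=4)
    high = [t for d, t in zip(driver, target) if d >= q3]
    low = [t for d, t in zip(driver, target) if d <= q1]
    return mean(high), mean(low)


# Hypothetical relative levels: lncRNA (driver) and PRKD1 (target) per sample
lnc = [0.2, 0.5, 0.9, 1.1, 1.4, 1.8, 2.3, 2.9]
prkd1 = [1.0, 1.1, 0.9, 1.3, 1.2, 1.6, 1.8, 1.7]
print(quartile_contrast(lnc, prkd1))
```

Swapping the `driver` and `target` arguments gives the reverse comparison shown in panel (C).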