Xue Fang1,2, Zhihua Yin1,2, Xuelian Li1,2, Lingzi Xia1,2, Xiaowei Quan1,2, Yuxia Zhao3, Baosen Zhou1,2.
Abstract
DNA genotype can affect gene expression, and gene expression can influence the onset and progression of disease. Here we conducted an integrated analysis of gene expression profiles and single nucleotide polymorphism (SNP) microarray data to identify critical genetic changes that participate in the onset and development of non-small cell lung cancer (NSCLC). Gene expression profile datasets were downloaded from the GEO database. First, differentially expressed genes (DEGs) between NSCLC samples and adjacent normal samples were identified. Next, a protein-protein interaction (PPI) network was constructed using the STRING database, and hub genes in the PPI network were identified. Functional SNPs in hub genes that may affect gene expression were then annotated. Finally, we carried out a case-control study to explore the relationship between these functional SNPs and NSCLC risk and overall survival in Chinese female non-smokers. A total of 488 DEGs were identified. Twenty-nine proteins had a high degree of connectivity in the PPI network, including FOS, IL6 and MMP9. Database annotation yielded 8 candidate functional SNPs that may affect the expression level of hub proteins. In the case-control study, the rs4754-T, rs959173-C and rs2239144-G alleles were protective against NSCLC. Under the dominant model, the rs4754 CT+TT genotype was associated with a shorter survival time. Overall, our study offers a novel research direction for multi-omic data integration and helps identify critical genetic changes in disease.
Keywords: differentially expressed genes; functional single nucleotide polymorphism; non-small cell lung cancer; risk; survival
Year: 2017 PMID: 28148898 PMCID: PMC5386658 DOI: 10.18632/oncotarget.14836
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Figure 1. Volcano plot of differentially expressed genes
(A) DEGs of lung adenocarcinoma; (B) DEGs of lung squamous cell carcinoma.
Figure 2. PPI network of differentially expressed genes (DEGs)
Each node represents one DEG; edges indicate the interaction relationship.
Figure 3. The hub genes in the PPI network and their corresponding degree
(A) The number of direct interactions of genes in the PPI network.
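
The degree shown in Figure 3 is the number of direct interaction partners a gene has in the network. A minimal sketch of that count is given below, assuming the PPI network is available as a list of undirected gene pairs. The function names, the `min_degree` cutoff and the toy edge list (placeholder nodes `A` to `D`) are hypothetical and are not taken from the study, which does not state its edge list or cutoff here.

```python
def node_degrees(edges):
    """Count distinct direct partners of each node in an undirected network."""
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        # Sets make duplicate or reversed pairs count only once.
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {node: len(partners) for node, partners in neighbours.items()}


def hub_nodes(edges, min_degree):
    """Return (node, degree) pairs with degree >= min_degree, highest first."""
    degrees = node_degrees(edges)
    hubs = [(node, deg) for node, deg in degrees.items() if deg >= min_degree]
    return sorted(hubs, key=lambda item: (-item[1], item[0]))


# Hypothetical toy network, for illustration only (not STRING output).
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("B", "D"), ("B", "A")]

if __name__ == "__main__":
    for node, degree in hub_nodes(toy_edges, min_degree=2):
        print(node, degree)
```

Storing partners in sets means a pair listed twice, or in both directions, does not inflate the degree.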
Characteristics of NSCLC cases and cancer-free controls
| Variables | Cases (%) | Controls (%) | P value |
|---|---|---|---|
| Females | 402 | 395 | |
| Mean age (years) | 56.45 ± 11.45 | 56.13 ± 11.64 | 0.692 |
| Histological type | | | |
| Adenocarcinoma | 322 (80.1%) | | |
| Squamous cell carcinoma | 66 (16.4%) | | |
| Others<sup>a</sup> | 14 (3.5%) | | |

<sup>a</sup> Including adenosquamous carcinoma and large cell lung cancer.
Single nucleotide polymorphism in hub genes
| SNP | Position | Gene | Location/type | Major/minor allele | Function prediction |
|---|---|---|---|---|---|
| rs4754 | chr4:88902691 | SPP1 | synonymous | C/T | Splicing (ESE or ESS)<sup>a</sup> |
| rs959173 | chr7:116182053 | CAV1 | intron | T/C | eQTL<sup>b</sup> + TFBS<sup>b</sup> |
| rs2069837 | chr7:22768026 | IL6 | intron | A/G | TFBS<sup>a,b,c</sup> |
| rs2066992 | chr7:22768248 | IL6 | intron | T/G | TFBS<sup>a,b,c</sup> |
| rs2239144 | chr12:6196182 | VWF | intron | G/T | TFBS<sup>b,c</sup> |
| rs7306706 | chr12:6215633 | VWF | intron | G/A | eQTL<sup>b</sup> |
| rs3181385 | chr14:24787587 | ADCY4 | 3′UTR | T/C | miRNA binding site<sup>a</sup> |
| rs423490 | chr19:6697405 | C3 | synonymous | G/A | Splicing (ESE or ESS)<sup>a</sup> |

Abbreviations: ESE, exonic splicing enhancer; ESS, exonic splicing silencer; eQTL, expression quantitative trait locus; TFBS, transcription factor binding site.
<sup>a</sup> Predicted by the SNPinfo web server; <sup>b</sup> predicted by the RegulomeDB database; <sup>c</sup> predicted by the HaploReg database.
Distribution of genotypes and ORs for NSCLC cases and cancer-free controls
| SNP | Genotype | NSCLC cases (%) | Controls (%) | P (HWE) | Adjusted OR<sup>a</sup> | 95% CI | P value |
|---|---|---|---|---|---|---|---|
| rs4754 | CC | 214 (53.2) | 183 (46.3) | 0.464 | Ref | | |
| | CT | 160 (39.8) | 167 (42.3) | | 0.820 | 0.612, 1.100 | 0.185 |
| | TT | 28 (7.0) | 45 (11.4) | | 0.530 | 0.317, 0.884 | 0.015* |
| | Dominant model | | | | 0.759 | 0.574, 1.002 | 0.052 |
| | Recessive model | | | | 0.583 | 0.356, 0.955 | 0.032* |
| | Additive model (T allele) | | | | 0.762 | 0.614, 0.946 | 0.014* |
| rs959173 | TT | 373 (92.8) | 348 (88.1) | 0.686 | Ref | | |
| | TC | 28 (7.0) | 46 (11.6) | | 0.567 | 0.347, 0.928 | 0.024* |
| | CC | 1 (0.2) | 1 (0.3) | | 0.949 | 0.059, 15.327 | 0.971 |
| | Dominant model | | | | 0.576 | 0.354, 0.936 | 0.026* |
| | Recessive model | | | | 1.019 | 0.063, 16.444 | 0.990 |
| | Additive model (C allele) | | | | 0.600 | 0.376, 0.957 | 0.032* |
| rs2069837 | AA | 260 (64.7) | 264 (66.8) | 0.548 | Ref | | |
| | AG | 123 (30.6) | 120 (30.4) | | 1.039 | 0.766, 1.408 | 0.806 |
| | GG | 19 (4.7) | 11 (2.8) | | 1.754 | 0.819, 3.759 | 0.148 |
| | Dominant model | | | | 1.099 | 0.820, 1.473 | 0.527 |
| | Recessive model | | | | 1.731 | 0.813, 3.688 | 0.155 |
| | Additive model (G allele) | | | | 1.141 | 0.888, 1.467 | 0.301 |
| rs2066992 | TT | 185 (46.0) | 201 (50.9) | 0.658 | Ref | | |
| | TG | 174 (43.3) | 159 (40.3) | | 1.185 | 0.883, 1.590 | 0.257 |
| | GG | 43 (10.7) | 35 (8.9) | | 1.342 | 0.823, 2.190 | 0.239 |
| | Dominant model | | | | 1.213 | 0.918, 1.602 | 0.174 |
| | Recessive model | | | | 1.229 | 0.768, 1.965 | 0.390 |
| | Additive model (G allele) | | | | 1.169 | 0.944, 1.447 | 0.152 |
| rs2239144 | GG | 124 (30.8) | 169 (42.8) | 0.270 | Ref | | |
| | GT | 190 (47.3) | 171 (43.3) | | 1.508 | 1.105, 2.058 | 0.010* |
| | TT | 88 (21.9) | 55 (13.9) | | 2.183 | 1.450, 3.287 | < 0.001* |
| | Dominant model | | | | 1.675 | 1.252, 2.240 | 0.001* |
| | Recessive model | | | | 1.733 | 1.197, 2.509 | 0.004* |
| | Additive model (T allele) | | | | 1.513 | 1.237, 1.850 | < 0.001* |
| rs7306706 | GG | 168 (41.8) | 154 (39.0) | 0.064 | Ref | | |
| | GA | 181 (45.0) | 171 (43.3) | | 0.970 | 0.718, 1.313 | 0.845 |
| | AA | 53 (13.2) | 70 (17.7) | | 0.695 | 0.457, 1.056 | 0.086 |
| | Dominant model | | | | 0.890 | 0.670, 1.181 | 0.419 |
| | Recessive model | | | | 0.705 | 0.479, 1.039 | 0.077 |
| | Additive model (A allele) | | | | 0.855 | 0.698, 1.047 | 0.130 |
| rs3181385 | TT | 343 (85.3) | 355 (89.9) | 0.074 | Ref | | |
| | TC+CC | 59 (14.7) | 40 (10.1) | | 1.523 | 0.992, 2.337 | 0.054 |
| | Additive model (A allele) | | | | 1.373 | 0.915, 2.061 | 0.126 |
| rs423490 | GG | 347 (86.3) | 323 (81.8) | 0.155 | Ref | | |
| | GA | 54 (13.4) | 71 (18.0) | | 0.708 | 0.482, 1.041 | 0.079 |
| | AA | 1 (0.2) | 1 (0.3) | | 0.941 | 0.059, 15.126 | 0.966 |
| | Dominant model | | | | 0.711 | 0.485, 1.043 | 0.081 |
| | Recessive model | | | | 0.993 | 0.062, 15.939 | 0.996 |
| | Additive model (A allele) | | | | 0.736 | 0.512, 1.058 | 0.098 |
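
The genotype counts in this table are enough to re-derive two of its quantities by hand. The sketch below is not the authors' analysis code. It assumes that the single value on each reference-genotype row (0.464 for rs4754) is a Hardy-Weinberg equilibrium (HWE) P value in controls from a 1-df chi-square goodness-of-fit test, and it computes a crude, unadjusted odds ratio with a Woolf (log-scale Wald) 95% confidence interval. The function names are hypothetical. The table's own ORs are adjusted for covariates not listed here, so a crude estimate is only expected to land close to them.

```python
import math


def hwe_p_value(n_hom_major, n_het, n_hom_minor):
    """Chi-square (1 df) goodness-of-fit P value for Hardy-Weinberg equilibrium."""
    n = n_hom_major + n_het + n_hom_minor
    p = (2 * n_hom_major + n_het) / (2 * n)  # major-allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_major, n_het, n_hom_minor)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def crude_odds_ratio(case_exposed, case_ref, control_exposed, control_ref, z=1.96):
    """Unadjusted odds ratio with a Woolf confidence interval."""
    odds_ratio = (case_exposed * control_ref) / (case_ref * control_exposed)
    se = math.sqrt(
        1 / case_exposed + 1 / case_ref + 1 / control_exposed + 1 / control_ref
    )
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    # rs4754 controls: CC = 183, CT = 167, TT = 45 (table lists 0.464).
    print(round(hwe_p_value(183, 167, 45), 3))

    # rs2239144, dominant model: GT+TT versus GG.
    # Cases 190 + 88 vs 124; controls 171 + 55 vs 169.
    # The table's adjusted estimate is 1.675 (1.252, 2.240).
    estimate, lower, upper = crude_odds_ratio(190 + 88, 124, 171 + 55, 169)
    print(round(estimate, 3), round(lower, 3), round(upper, 3))
```

The chi-square tail is computed with `math.erfc`, which is exact for one degree of freedom, so no statistics package is needed.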
Distribution of genotypes and survival time of patients
| SNP | Genotype | NSCLC cases (%) | MST (months) | Log-rank P | Adjusted HR<sup>a</sup> | 95% CI |
|---|---|---|---|---|---|---|
| rs4754 | CC | 168 (53.8) | 25.124 | | Ref | |
| | CT | 121 (38.8) | 20.583 | 0.054 | 1.354 | 1.051, 1.743* |
| | TT | 23 (7.4) | 24.172 | | 1.037 | 0.638, 1.685 |
| | Dominant model | | 21.181 | 0.039* | 1.289 | 1.013, 1.642* |
| | Recessive model | | 23.218 | 0.625 | 0.908 | 0.567, 1.454 |
| rs959173 | TT | 289 (92.6) | 22.875 | | Ref | |
| | TC+CC | 23 (7.4) | 28.555 | 0.195 | 0.720 | 0.445, 1.163 |
| rs2069837 | AA | 203 (65.1) | 23.116 | | Ref | |
| | AG | 94 (30.1) | 22.876 | 0.552 | 1.013 | 0.777, 1.319 |
| | GG | 15 (4.8) | 28.470 | | 0.717 | 0.379, 1.357 |
| | Dominant model | | 23.627 | 0.811 | 0.968 | 0.751, 1.248 |
| | Recessive model | | 23.039 | 0.278 | 0.711 | 0.378, 1.338 |
| rs2066992 | TT | 142 (45.5) | 23.086 | | Ref | |
| | TG | 135 (43.3) | 23.150 | 0.929 | 0.995 | 0.770, 1.285 |
| | GG | 35 (11.2) | 24.772 | | 0.919 | 0.616, 1.372 |
| | Dominant model | | 23.468 | 0.886 | 0.977 | 0.767, 1.244 |
| | Recessive model | | 23.110 | 0.701 | 0.930 | 0.636, 1.360 |
| rs2239144 | GG | 97 (31.1) | 21.946 | | Ref | |
| | GT | 138 (44.2) | 23.096 | 0.262 | 0.923 | 0.698, 1.220 |
| | TT | 77 (24.7) | 25.583 | | 0.770 | 0.556, 1.068 |
| | Dominant model | | 23.972 | 0.255 | 0.860 | 0.666, 1.110 |
| | Recessive model | | 22.517 | 0.125 | 0.808 | 0.606, 1.075 |
| rs7306706 | GG | 134 (42.9) | 23.807 | | Ref | |
| | GA | 137 (43.9) | 22.759 | 0.855 | 1.090 | 0.841, 1.413 |
| | AA | 41 (13.1) | 23.553 | | 1.052 | 0.719, 1.539 |
| | Dominant model | | 22.926 | 0.592 | 1.074 | 0.842, 1.371 |
| | Recessive model | | 23.248 | 0.976 | 0.998 | 0.699, 1.426 |
| rs3181385 | TT | 267 (85.6) | 23.298 | | Ref | |
| | TC+CC | 45 (14.4) | 23.372 | 0.903 | 0.982 | 0.691, 1.396 |
| rs423490 | GG | 268 (85.9) | 23.821 | | Ref | |
| | GA+AA | 44 (14.1) | 19.818 | 0.197 | 1.250 | 0.889, 1.758 |
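
The hazard ratios above are reported with confidence intervals but without their own P values. As a rough consistency check, a standard error, z statistic and two-sided P value can be back-calculated from a ratio and its 95% CI. This is a sketch under the assumption that the interval is a Wald interval, symmetric on the log scale; the function name is hypothetical, and the result is an approximation limited by rounding in the published figures. It is a different test from the log-rank P in the table, so the two need not agree.

```python
import math


def wald_from_ci(estimate, lower, upper, z_crit=1.96):
    """Approximate SE, z and two-sided P from a ratio and its 95% CI.

    Assumes the interval was built as exp(log(estimate) +/- z_crit * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(estimate) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return se, z, p_value


if __name__ == "__main__":
    # rs4754, dominant model: adjusted HR 1.289 with 95% CI 1.013 to 1.642.
    se, z, p_value = wald_from_ci(1.289, 1.013, 1.642)
    print(f"SE = {se:.3f}, z = {z:.2f}, P = {p_value:.3f}")
```

An interval that excludes 1 should give |z| above 1.96 and P below 0.05, which is what the asterisk on that row indicates.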
Figure 4. Genotypes of the rs4754 SNP site in SPP1 and their association with NSCLC survival time