Honggang Ke, Yunyu Wu, Runjie Wang, Xiaohong Wu.
Abstract
BACKGROUND This study aimed to identify important marker genes in lung adenocarcinoma (LACC) and establish a prognostic risk model to predict the risk of LACC in patients.

MATERIAL AND METHODS Gene expression and methylation profiles for LACC and clinical information about cases were downloaded from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases, respectively. Differentially expressed genes (DEGs) and differentially methylated genes (DMGs) between cancer and control groups were selected through meta-analysis. Pearson correlation analysis was performed on the genes shared by the DEGs and DMGs, and a functional analysis was performed on the genes whose expression and methylation levels were correlated. Marker genes and clinical factors significantly related to prognosis were identified using univariate and multivariate Cox regression analyses. Risk prediction models were then created based on the marker genes and clinical factors.

RESULTS In total, 1975 DEGs and 2095 DMGs were identified. After comparison, 16 prognosis-related genes (EFNB2, TSPAN7, INPP5A, VAMP2, CALML5, SNAI2, RHOBTB1, CKB, ATF7IP2, RIMS2, RCBTB2, YBX1, RAB27B, NFATC1, TCEAL4, and SLC16A3) were selected from 265 overlapping genes. Four clinical factors (pathologic N [node], pathologic T [tumor], pathologic stage, and new tumor) were associated with prognosis. The prognostic risk prediction models were constructed and validated with other independent datasets.

CONCLUSIONS An integrated model that combines clinical factors and gene markers is useful for predicting risk of LACC in patients. The 16 genes that were identified, including EFNB2, TSPAN7, INPP5A, VAMP2, and CALML5, may serve as novel biomarkers for diagnosis of LACC and prediction of disease prognosis.
Year: 2020 PMID: 33021972 PMCID: PMC7549534 DOI: 10.12659/MSM.925833
Source DB: PubMed Journal: Med Sci Monit ISSN: 1234-1010
The gene expression profiling and methylation profiling datasets in this study.
| Data type | GEO accession | Platform | Total probe number | Total sample | Normal sample | Cancer sample |
|---|---|---|---|---|---|---|
| Gene expression | GSE75037 | GPL6884 Illumina | 48803 | 166 | 83 | 83 |
| | GSE33532 | GPL570 Affymetrix | 54675 | 60 | 40 | 20 |
| | GSE43458 | GPL6244 Affymetrix | 33297 | 110 | 80 | 30 |
| | GSE30219 | GPL570 Affymetrix | 54675 | 98 | 84 | 14 |
| | GSE32863 | GPL6884 Illumina | 48803 | 116 | 58 | 58 |
| | GSE10072 | GPL96 Affymetrix | 22283 | 107 | 58 | 49 |
| Gene methylation | GSE32861 | GPL8490 Illumina | 27578 | 118 | 59 | 59 |
| | GSE49996 | GPL8490 Illumina | 27578 | 88 | 44 | 44 |
| | GSE63384 | GPL8490 Illumina | 27578 | 70 | 35 | 35 |
| | GSE62948 | GPL8490 Illumina | 27578 | 56 | 28 | 28 |
GEO – Gene Expression Omnibus.
Clinical information from The Cancer Genome Atlas (TCGA) and GSE37745 datasets.
| Clinical characteristics | TCGA (N=335) | GSE37745 (N=106) |
|---|---|---|
| Age (years, mean±SD) | 65.19±10.25 | 62.94±9.22 |
| Sex (Male/Female) | 155/180 | 46/60 |
| Pathologic M (M0/M1/–) | 226/13/96 | – |
| Pathologic N (N0/N1/N2/–) | 214/60/55/6 | – |
| Pathologic T (T1/T2/T3/T4/–) | 111/180/29/14/1 | – |
| Pathologic stage (I/II/III/IV) | 180/81/61/13 | 70/19/13/4 |
| Radiation therapy (yes/no/–) | 41/254/40 | – |
| Targeted molecular therapy (yes/no/–) | 99/194/42 | – |
| Tobacco smoking history (current/reformed/never/–) | 70/206/45/14 | – |
| Recurrence (yes/no/–) | 104/176/55 | 26/27 |
| Death (dead/alive) | 120/215 | 77/29 |
| Recurrence-free survival time (months, mean±SD) | 22.27±27.77 | 54.11±53.48 |
| Overall survival time (months, mean±SD) | 27.54±29.74 | 61.74±49.96 |
‘–’ indicates that the information was unavailable.
MetaQC quality control of 6 expression profiling datasets and 4 methylation profiling datasets.
| Dataset | IQC | EQC | CQCg | CQCp | AQCg | AQCp | SMR |
|---|---|---|---|---|---|---|---|
| Gene expression profiling | | | | | | | |
| GSE75037 | 5.27 | 3.23 | 106.65 | 158.86 | 32.71 | 90.88 | 1.62 |
| GSE32863 | 4.38 | 3.16 | 64.14 | 146.51 | 26.46 | 96.74 | 2.42 |
| GSE33532 | 4.81 | 3.23 | 59.25 | 171.49 | 25.50 | 84.37 | 2.86 |
| GSE43458 | 6.09 | 1.10 | 101.10 | 114.30 | 19.53 | 29.46 | 3.92 |
| GSE30219 | 6.64 | 3.71 | 83.97 | 107.69 | 47.87 | 63.89 | 4.33 |
| GSE10072 | 8.06 | 9.19 | 12.24 | 8.92 | 9.78 | 14.52 | 7.76 |
| Methylation profiling | | | | | | | |
| GSE32861 | 9.80 | 5.00 | 19.24 | 41.01 | 6.17 | 24.77 | 3.28 |
| GSE49996 | 6.22 | 4.96 | 46.70 | 42.02 | 8.67 | 33.56 | 3.14 |
| GSE63384 | 7.56 | 3.05 | 24.57 | 33.79 | 3.45 | 17.84 | 5.67 |
| GSE62948 | 5.11 | 3.63 | 59.25 | 60.27 | 29.49 | 84.37 | 3.25 |
IQC – internal quality control; EQC – external quality control; CQC – consistency quality control; AQC – accuracy quality control; SMR – standardized mean rank score.
Figure 1. MetaQC quality control charts of (A) 5 gene expression profiles and (B) 2 gene methylation profiles. The horizontal and vertical axes represent the first and second principal components in principal component analysis. The numbers represent the corresponding datasets.
Figure 2. Heatmaps of (A) significant differentially expressed genes and (B) differentially methylated genes obtained based on MetaDE screening.
Figure 3. Correlation analysis of expression levels and methylation levels of 265 genes in (A) TCGA and (B) the GSE62950 dataset. The horizontal axis represents the gene expression level, the vertical axis represents the gene methylation level, the oblique line represents the trend line fitted to the points, and the red text gives the correlation coefficient (CC) and its P value.
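The correlation coefficient (CC) reported in Figure 3 is a Pearson coefficient between per-sample expression and methylation levels. The following is a minimal sketch of that calculation, not the authors' pipeline: the function name `pearson_r` and the sample values are hypothetical and chosen only to illustrate a gene whose expression falls as methylation rises.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-deviations and the two sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-sample values for one gene (not taken from the study).
expression = [2.1, 3.4, 4.0, 5.2, 6.8]
methylation = [0.82, 0.71, 0.64, 0.50, 0.33]

r = pearson_r(expression, methylation)
print(f"CC = {r:.3f}")
```

A strongly negative CC, as in this toy input, corresponds to the downward-sloping trend line described in the caption.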
Functional enrichment analysis results for 265 candidate genes.
| Category | Term | Count | P value | Genes |
|---|---|---|---|---|
| Biologic process | GO: 0032409 ~ regulation of transporter activity | 6 | 0.0002 | PLCG2, NDFIP1, PKD2, FKBP1B, NKX2-5, SYNGR3 |
| | GO: 0009611 ~ response to wounding | 21 | 0.0005 | PPARA, A2M, ACHE, BMP2, UCN, FOXA2, EFEMP2, ATRN, CHST2, HOXB13, SERPING1, CD40, TNFRSF1B, THBD, PLSCR4, CTGF, PLA2G7, LTA4H, CFD, PLAU, ACVR1 |
| | GO: 0050777 ~ negative regulation of immune response | 5 | 0.0007 | A2M, IL27RA, NDFIP1, CTLA4, SERPING1 |
| | GO: 0048585 ~ negative regulation of response to stimulus | 8 | 0.0013 | PPARA, A2M, TNFRSF1B, IL27RA, NDFIP1, CTLA4, SERPING1, NT5E |
| | GO: 0015718 ~ monocarboxylic acid transport | 6 | 0.0013 | SLC16A3, SLC25A20, PPARA, SLC16A1, PLA2G1B, SLCO2A1 |
| | GO: 0055082 ~ cellular chemical homeostasis | 16 | 0.0016 | FXYD1, TRPM8, IL6ST, NDFIP1, TP53, FZD2, FKBP1B, CKB, GCKR, PLCG2, CLDN1, PKD2, RGN, SV2A, KCNH2, KCNQ1 |
| | GO: 0050878 ~ regulation of body fluid levels | 9 | 0.0023 | SCT, UCN, THBD, PLSCR4, FOXA2, EFEMP2, SERPING1, CD40, PLAU |
| | GO: 0006869 ~ lipid transport | 9 | 0.0028 | SLC25A20, PPARA, OSBPL3, SORL1, LIPG, PLA2G1B, VPS4B, VLDLR, SLCO2A1 |
| | GO: 0031348 ~ negative regulation of defense response | 5 | 0.0028 | A2M, TNFRSF1B, NDFIP1, SERPING1, NT5E |
| | GO: 0050801 ~ ion homeostasis | 16 | 0.0033 | FXYD1, TRPM8, IL6ST, NDFIP1, TP53, FZD2, CPS1, FKBP1B, CKB, PLCG2, CLDN1, PKD2, RGN, SV2A, KCNH2, KCNQ1 |
| | GO: 0035295 ~ tube development | 11 | 0.0036 | BMP2, FOXA2, CTGF, CRISPLD2, TGFBR1, HOXB13, PCSK5, NKX2-5, HECA, MYCN, ACVR1 |
| | GO: 0006873 ~ cellular ion homeostasis | 15 | 0.0037 | FXYD1, TRPM8, IL6ST, NDFIP1, TP53, FZD2, FKBP1B, CKB, PLCG2, CLDN1, PKD2, RGN, SV2A, KCNH2, KCNQ1 |
| | GO: 0010876 ~ lipid localization | 9 | 0.0045 | SLC25A20, PPARA, OSBPL3, SORL1, LIPG, PLA2G1B, VPS4B, VLDLR, SLCO2A1 |
| | GO: 0019725 ~ cellular homeostasis | 17 | 0.0046 | FXYD1, TRPM8, PDIA2, IL6ST, NDFIP1, TP53, FZD2, FKBP1B, CKB, GCKR, PLCG2, CLDN1, PKD2, RGN, SV2A, KCNH2, KCNQ1 |
| | GO: 0048878 ~ chemical homeostasis | 18 | 0.0049 | FXYD1, TRPM8, IL6ST, NDFIP1, TP53, FZD2, CPS1, FKBP1B, CKB, GCKR, PLCG2, LIPG, CLDN1, PKD2, RGN, SV2A, KCNH2, |
| KEGG pathway | hsa00562: Inositol phosphate metabolism | 5 | 0.0017 | ISYNA1, PLCG2, SYNJ2, ITPKB, INPP5A |
| | hsa04610: Complement and coagulation cascades | 5 | 0.0037 | A2M, THBD, SERPING1, CFD, PLAU |
| | hsa04070: Phosphatidylinositol signaling system | 5 | 0.0046 | PLCG2, SYNJ2, ITPKB, CALML5, INPP5A |
| | hsa00532: Chondroitin sulfate biosynthesis | 3 | 0.0060 | B3GAT1, XYLT1, CHSY1 |
| | hsa05217: Basal cell carcinoma | 4 | 0.0393 | BMP2, TP53, WNT11, FZD2 |
| | hsa00534: Heparan sulfate biosynthesis | 3 | 0.0081 | B3GAT1, XYLT1, HS3ST1 |
| | hsa00590: Arachidonic acid metabolism | 4 | 0.0082 | AKR1C3, CYP2C18, PLA2G1B, LTA4H |
| | hsa04514: Cell adhesion molecules (CAMs) | 6 | 0.0093 | NRCAM, CDH15, CLDN1, CTLA4, CD40, SDC2 |
| | hsa00340: Histidine metabolism | 3 | 0.0098 | HDC, LCMT2, MAOB |
KEGG – Kyoto Encyclopedia of Genes and Genomes.
Tumor marker genes significantly associated with prognosis.
| Gene | Coefficient | Hazard ratio | Lower 95% CI | Upper 95% CI | P value |
|---|---|---|---|---|---|
| EFNB2 | 0.7121 | 2.0384 | 1.5210 | 2.7317 | <0.0001 |
| TSPAN7 | −0.5824 | 0.5586 | 0.4380 | 0.7123 | <0.0001 |
| INPP5A | −1.4730 | 0.2292 | 0.1103 | 0.4762 | <0.0001 |
| VAMP2 | 1.4277 | 4.1690 | 2.0004 | 8.6885 | 0.0001 |
| CALML5 | 0.2006 | 1.2221 | 1.0996 | 1.3582 | 0.0002 |
| SNAI2 | 0.5449 | 1.7245 | 1.2434 | 2.3916 | 0.0011 |
| RHOBTB1 | 0.6348 | 1.8867 | 1.2467 | 2.8552 | 0.0027 |
| CKB | −0.3511 | 0.7039 | 0.5578 | 0.8884 | 0.0031 |
| ATF7IP2 | −0.4666 | 0.6272 | 0.4299 | 0.9149 | 0.0155 |
| RIMS2 | 0.1523 | 1.1645 | 1.0227 | 1.3259 | 0.0215 |
| RCBTB2 | −0.6106 | 0.5430 | 0.3189 | 0.9247 | 0.0246 |
| YBX1 | 0.7766 | 2.1740 | 1.0909 | 4.3325 | 0.0273 |
| RAB27B | 0.2554 | 1.2909 | 1.0276 | 1.6218 | 0.0283 |
| NFATC1 | −0.5289 | 0.5892 | 0.3660 | 0.9487 | 0.0295 |
| TCEAL4 | −0.6401 | 0.5272 | 0.2933 | 0.9476 | 0.0324 |
| SLC16A3 | −0.4125 | 0.6620 | 0.4520 | 0.9696 | 0.0341 |
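In a Cox model the hazard ratio is the exponential of the coefficient, so the two columns in the table above can be checked against each other. The sketch below does that for the first four genes and then forms a linear risk score (the sum of coefficient times expression). The risk-score construction is a common convention assumed here for illustration; the source does not state the exact formula used. The function names and the expression values are hypothetical.

```python
import math

# Cox coefficients copied from the table above (first four genes only).
coefficients = {
    "EFNB2": 0.7121,
    "TSPAN7": -0.5824,
    "INPP5A": -1.4730,
    "VAMP2": 1.4277,
}

def hazard_ratio(coefficient):
    """Hazard ratio implied by a Cox regression coefficient: exp(beta)."""
    return math.exp(coefficient)

def risk_score(expression, coefs):
    """Assumed linear predictor: sum of coefficient * expression per gene."""
    return sum(coefs[gene] * expression[gene] for gene in coefs)

for gene, beta in coefficients.items():
    print(f"{gene}: HR = {hazard_ratio(beta):.4f}")

# Hypothetical expression values for two samples (illustrative only).
samples = {
    "sample_a": {"EFNB2": 1.2, "TSPAN7": 0.4, "INPP5A": 0.3, "VAMP2": 0.9},
    "sample_b": {"EFNB2": 0.3, "TSPAN7": 1.5, "INPP5A": 1.1, "VAMP2": 0.2},
}
scores = {name: risk_score(expr, coefficients) for name, expr in samples.items()}
for name, score in scores.items():
    print(f"{name}: risk score = {score:.3f}")
```

Genes with a positive coefficient (hazard ratio above 1) raise the score as their expression increases, and genes with a negative coefficient lower it. Under this assumed construction, samples would then be split into low-risk and high-risk groups at a cutoff such as the median score.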
Univariate and multivariate Cox regression analyses of clinical factors.
| Clinical characteristics | Univariate P value | Univariate HR (95% CI) | Multivariate P value | Multivariate HR (95% CI) |
|---|---|---|---|---|
| Age (above/below median, 65 years) | 0.4370 | 1.155 (0.804~1.659) | – | – |
| Sex (Male/Female) | 0.7450 | 1.062 (0.741~1.52) | – | – |
| Pathologic M (M0/M1) | 0.1310 | 1.692 (0.848~3.378) | – | – |
| Targeted molecular therapy (yes/no) | 0.1601 | 1.366 (0.883~2.114) | – | – |
| Tobacco smoking history (current/reformed/never) | 0.9900 | 1.002 (0.737~1.362) | – | – |
| Radiation therapy (yes/no) | 0.0035 | 2.033 (1.25~3.307) | 0.5924 | 1.163 (0.669~2.019) |
| Pathologic N (N0/N1/N2) | <0.0001 | 1.85 (1.494~2.29) | 0.0471 | 1.439 (1.005~2.060) |
| Pathologic T (T1/T2/T3/T4) | 0.0002 | 1.537 (1.223~1.932) | 0.0169 | 1.236 (0.914~1.672) |
| Pathologic stage (I/II/III/IV) | <0.0001 | 1.671 (1.413~1.976) | 0.0103 | 1.279 (0.952~1.718) |
| New tumor (yes/no) | <0.0001 | 2.362 (1.535~3.634) | 0.0001 | 2.395 (1.533~3.742) |
HR – hazard ratio.
Figure 4. Bidirectional hierarchical cluster heatmaps based on 16 gene expression and methylation levels. The first line under the cluster tree represents pathologic N information, and the change from light orange to deep orange represents N0 to N2. The second line represents the pathologic T information, and the change from light blue to dark blue represents T1 to T4. The third line represents pathologic stage information, and the change from light green to dark green represents stages I to IV. The fourth line represents new tumor information, and the blue and gold represent the samples without and with new tumor, respectively.
Prognostic time for different risk classification models in the TCGA and GSE37745 datasets.
| Dataset | Model | Overall survival time, low-risk (months, mean±SD) | Overall survival time, high-risk (months, mean±SD) | Recurrence-free survival time, low-risk (months, mean±SD) | Recurrence-free survival time, high-risk (months, mean±SD) |
|---|---|---|---|---|---|
| TCGA | Gene expression model | 33.64±37.17 | 21.41±17.71 | 28.18±35.04 | 15.79±13.97 |
| | Clinical factor model | 29.03±35.46 | 27.01±27.76 | 25.74±32.68 | 19.32±22.79 |
| | Combined model | 33.55±40.09 | 22.33±17.78 | 28.33±35.87 | 16.31±14.57 |
| GSE37745 | Gene expression model | 68.66±47.08 | 53.69±52.46 | 70.53±57.47 | 37.06±43.82 |
| | Clinical factor model | 84.51±62.38 | 54.51±50.67 | 62.53±55.77 | 30.66±39.29 |
| | Combined model | 84.50±62.38 | 54.51±50.69 | 62.53±55.77 | 30.66±39.29 |
TCGA – The Cancer Genome Atlas.
Figure 5. (A) The Kaplan-Meier curves for the risk prediction model based on tumor marker genes and OS prognosis (left) and recurrence prognosis (right) in TCGA training set. (B) The Kaplan-Meier curves for the risk prediction model based on tumor marker genes and OS prognosis (left) and recurrence prognosis (right) in the GSE37745 validation set. (C) AUROC curves for the prognosis prediction model and OS prognosis and recurrence prognosis in TCGA training set and the GSE37745 verification set. (D) The Kaplan-Meier curves for the risk prediction model based on clinical factors and OS prognosis (left) and recurrence prognosis (right) in TCGA training set. (E) The Kaplan-Meier curves for the risk prediction model based on clinical factors and OS prognosis (left) and recurrence prognosis (right) in the GSE37745 validation set. (F) The AUROC curves for the prognosis prediction model and OS prognosis and recurrence prognosis in TCGA training set and the GSE37745 verification set. (G) The Kaplan-Meier curves for the risk prediction model based on tumor marker genes combined with clinical factors and OS prognosis (left) and recurrence prognosis (right) in TCGA training set. (H) The Kaplan-Meier curves for the risk prediction model based on tumor marker genes combined with clinical factors and OS prognosis (left) and recurrence prognosis (right) in the GSE37745 validation set. (I) The AUROC curves for the prognosis prediction model and OS prognosis and recurrence prognosis in TCGA training set and the GSE37745 verification set. The green and blue curves in (C, F, I) represent the AUROC curves for OS prognosis and recurrence prognosis in TCGA and the black and red curves represent the AUROC curves of OS prognosis and recurrence prognosis in the GSE37745 verification set.
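The Kaplan-Meier curves in Figure 5 compare survival in the low-risk and high-risk groups. As a rough illustration of how such a curve is estimated, the sketch below implements the product-limit estimator in plain Python. The function name `kaplan_meier` and all follow-up data are hypothetical; none of the values come from TCGA or GSE37745.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  -- follow-up time per patient (for example, months)
    events -- 1 if the event (death or recurrence) was observed, 0 if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    survival = 1.0
    curve = []
    at_risk = len(times)
    for t in sorted(set(times)):
        observed = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)
        if observed:
            # Product-limit step: multiply by the fraction surviving this time.
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after time t.
        at_risk -= leaving
    return curve

# Hypothetical follow-up data: times in months and event flags.
low_risk = kaplan_meier([12, 20, 33, 40, 58, 60], [0, 1, 0, 0, 1, 0])
high_risk = kaplan_meier([5, 9, 14, 14, 21, 30], [1, 1, 1, 0, 1, 0])

for label, curve in (("low-risk", low_risk), ("high-risk", high_risk)):
    for t, s in curve:
        print(f"{label}: S({t}) = {s:.3f}")
```

Censored patients reduce the number at risk without producing a step in the curve, which is why a group with many censored observations can show only a few drops.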
Clinical and chi-square test information for samples in clusters 1 and 2.
| Clinical characteristics | Cluster 1 | Cluster 2 | X-squared | P value |
|---|---|---|---|---|
| Pathologic N (N0/N1/N2) | 93/36/28 | 121/24/27 | 5.4091 | 0.0467 |
| Pathologic T (T1/T2/T3/T4) | 48/93/14/5 | 63/87/15/9 | 2.8225 | 0.4198 |
| Pathologic stage (I/II/III/IV) | 82/42/31/5 | 98/39/30/8 | 1.5735 | 0.6654 |
| New tumor (yes/no) | 47/84 | 57/92 | 0.0823 | 0.7742 |
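The X-squared column is a Pearson chi-square statistic computed on the cluster-by-category counts. The sketch below recomputes it for the pathologic N row from the counts in the table. The function name `chi_square_statistic` is hypothetical, and the code covers only the statistic and its degrees of freedom, not the P value.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of cluster and category.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Pathologic N counts (N0, N1, N2) for cluster 1 and cluster 2, from the table.
pathologic_n = [[93, 36, 28], [121, 24, 27]]

stat = chi_square_statistic(pathologic_n)
dof = (len(pathologic_n) - 1) * (len(pathologic_n[0]) - 1)
print(f"X-squared = {stat:.4f}, degrees of freedom = {dof}")
```

For these counts the statistic should land close to the tabulated value of 5.4091, with (2 − 1) × (3 − 1) = 2 degrees of freedom.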