Yu Zhang1, Lan Lyu2, Wei Wang3, Liwei Zhang4. 1. Xinjiang Medical University, Department of thoracic surgery, Feicheng Hospital Affiliated to Shandong First Medical University, China. 2. Department of Plastic Surgery, Feicheng Hospital Affiliated to Shandong First Medical University, China. 3. Department of Expert's Outpatient, Feicheng Hospital Affiliated to Shandong First Medical University, China. 4. Xinjiang Medical University, China.
Abstract
Adenocarcinoma is the most common pathological type of lung cancer. The E2F7 transcription factor has been confirmed to be related to the occurrence and development of a variety of solid tumors, but the relationship with the prognosis of lung cancer is still unclear. Therefore, we conducted this study to explore the prognostic value of E2F7 for lung adenocarcinoma (LUAD) patients. In this study, we analyzed samples from the Cancer Genome Atlas (TCGA) to study the correlation between the expression of E2F7 and clinical features, the difference in expression between tumors and normal tissues, the prognostic and diagnostic value, and Enrichment analysis of related genes. All statistical analysis uses R statistical software (version 3.6.3). The result shows that the expression level of E2F7 in LUAD was significantly higher than that of normal lung tissue (P = 1e-34). High expression of E2F7 was significantly correlated with gender (P = .034), pathologic stage (P = .046) and M stage (P = .025). Multivariate Cox analysis confirmed that E2F7 is an independent risk factor for OS in LUAD patients (P = .027). Genes related to cell cycle checkpoints, DNA damage telomere stress-induced senescence, DNA methylation, chromosome maintenance and mitotic prophase showed differential enrichment in the E2F7 high expression group. In short, high expression of E2F7 is an independent risk factor for OS in LUAD patients and has a high diagnostic value.
Adenocarcinoma is the most common pathological type of lung cancer. The E2F7 transcription factor has been confirmed to be related to the occurrence and development of a variety of solid tumors, but the relationship with the prognosis of lung cancer is still unclear. Therefore, we conducted this study to explore the prognostic value of E2F7 for lung adenocarcinoma (LUAD) patients. In this study, we analyzed samples from the Cancer Genome Atlas (TCGA) to study the correlation between the expression of E2F7 and clinical features, the difference in expression between tumors and normal tissues, the prognostic and diagnostic value, and Enrichment analysis of related genes. All statistical analysis uses R statistical software (version 3.6.3). The result shows that the expression level of E2F7 in LUAD was significantly higher than that of normal lung tissue (P = 1e-34). High expression of E2F7 was significantly correlated with gender (P = .034), pathologic stage (P = .046) and M stage (P = .025). Multivariate Cox analysis confirmed that E2F7 is an independent risk factor for OS in LUAD patients (P = .027). Genes related to cell cycle checkpoints, DNA damage telomere stress-induced senescence, DNA methylation, chromosome maintenance and mitotic prophase showed differential enrichment in the E2F7 high expression group. In short, high expression of E2F7 is an independent risk factor for OS in LUAD patients and has a high diagnostic value.
Lung cancer is the second most common malignant tumor in the world. In 2020, there will be approximately 2.2 million new cases of lung cancer worldwide. Lung cancer has become the leading cause of cancer deaths, accounting for about 18% of the total cancer deaths (about 1.8 million cases).[ Among them, lung adenocarcinoma is the most common pathological type.[ Early lung cancer has no obvious symptoms, so most patients are already in the advanced stage when they are diagnosed, which leads to a generally low survival rate of lung cancer patients.The E2F transcription factor family play a key role in the occurrence and development of tumors due to its important cell functions related to cell cycle regulation and apoptosis.[ As a newly discovered member of the E2F family in recent years, unlike other family members, E2F7 has two special DNA-binding domains (DBD) in structure, lacks the binding domain to the RB protein, and does not need to bind to dimerizing proteins to enter the nucleus.[E2F7 is a priming factor involved in cell cycle regulation, apoptosis and differentiation, involved in the late stage of mitosis, embryonic development, DNA stress response, and is likely to participate in the occurrence of tumors.[ As an epithelial transcription inhibitor, amplification, overexpression or deletion of E2F7 can be observed in many malignant tumors, and it can affect tumor differentiation, proliferation and metastasis by interacting with different downstream targets. E2F7 is abnormally expressed in glioma,[ colon cancer[ and breast cancer,[ and has an important influence on the occurrence and development of a variety of tumors.In view of this, we conducted this study to explore the expression of E2F7 in lung adenocarcinoma (LUAD) and analyze its correlation with clinical parameters, diagnostic and prognostic value of LUAD patients.
2. Materials and Method
2.1. Patient data set
E2F7 mRNA expression data (including 594 samples, data format: FPKM) and clinical characteristics data are downloaded from the TCGA database. The data for pan-cancer analysis is from UCSC XENA (https://xenabrowser.net/datapages/). It is the RNAseq data in TPM format of TCGA and GTEx that has been uniformly converted by the Toil process. Inclusion criteria: 1. Sufficient survival information; 2. Definite gene expression value.All our data come from public databases such as GEO and TCGA. The patients involved in the database have obtained ethical approval. Our research is based on open-source data and therefore does not require ethics committee approval for the study.
2.2. Statistical analysis
The median of E2F7 expression was selected as the critical value, and the Wilcoxon signed rank test was used to test the differential expression of E2F7 in LUAD and normal tissues, and the results were displayed by box plots. Wilcoxon rank sum test and Dunn's test were used to testing whether the expression of E2F7 is related to clinical features in LUAD. Kaplan-Meier analysis was performed to compare the differences in OS and DSS between E2F7 high and low expression groups, and to draw survival curves.[ The pROC package and the ggplot2 package are used to study the role of E2F7 prognosis and draw the ROC curve, where AUC represents the diagnostic value. Univariate Cox regression analysis was used to screen potential prognostic factors, and multivariate Cox regression was used to verify the independent predictive value of multiple indicators including E2F7 for prognosis. The rms package and survival package are used to draw nomograms to show the relationship between various variables and survival rates. The clusterProfiler package and the org.Hs.eg.db package are used for the enrichment analysis of GO and KEGG.[ The clusterProfiler package and the ggplot2 package are used to perform GSEA enrichment analysis and plotting. In addition, we used an independent GEO data set (GSE50081) for external verification. The difference in the expression of E2F7 in pan-tumor and normal tissues is verified in UCSC XENA (https://xenabrowser.net/datapages/)[ and Timmer database (https://cistrome.shinyapps.io/timmer/). All statistical analysis uses R statistical software (version 3.6.3).
3. Result
3.1. Baseline characteristics of included patients
A total of 535 patients diagnosed with lung adenocarcinoma were included in this study, and the data of these patients were all obtained through the TCGA data portal. The detailed clinical characteristics are shown in Table 1. Among the included patients, 249 were males (46.5%) and 286 were females (53.5%). Regarding the TNM staging of lung cancer: 294 patients were stage I, 123 patients were stage II, 84 patients were stage III, and 26 patients were stage IV. The median age of patients in the E2F7 high and low expression group was 65 and 67 years, respectively, and the results were not statistically different (P = .346). Regarding surgical treatment: the number of patients undergoing R0, R1, and R2 resection was 355, 13, and 4, respectively, and there was no significant difference between the groups (P = .186). In gender (P = .041), number pack year smoked (P = .018), M stage (P = .034) and OS event (P = .041), there are significant differences between the 2 groups.
We compared the expression levels of E2F7 in LUAD and normal lung tissues. Taking the median of the gene expression level of CCNA2 as the cutoff value, the patients were divided into high expression group and low expression group. The results of the study on unpaired samples showed that the expression of E2F7 in LUAD was higher than that of normal lung tissue (P = 1e-34) (Fig. 1A). In the paired samples of LUAD and normal lung tissue, this conclusion was verified. (P = 2.7e-10) (Fig. 1B).
Figure 1.
E2F7 expression in LUAD tissues. A. E2F7expression in normal and tumor tissue. B. E2F7 expression in paired tissue.
E2F7 expression in LUAD tissues. A. E2F7expression in normal and tumor tissue. B. E2F7 expression in paired tissue.
3.3. E2F7 expression and clinical characteristics
The logistic regression analysis results of the correlation between E2F7 expression level and clinical characteristics are summarized in Table 2. The high expression of E2F7 was significantly correlated with gender (P = .034), pathologic stage (P = .046) and M stage (P = .025). As shown in Figure 2, the Mann-Whitney U test results verify the correlation between E2F7 expression and gender (P = .029) and the number pack-years smoked (P = .002). The results of multiple hypothesis test (Dunn's test) using Bonferroni method to correct the significance level show that the difference between SD and PD (P.adj = .037), CR and PD (P.adj = .001) was statistically significant. The same result appeared in the comparison of tumor and normal tissue (P < .001).
Table 2
Logistic analysis of the association between E2F7 expression and clinical characteristics.
Characteristics
Total (N)
OR (95%CI)
P value
Age (>65 vs <=65)
516
0.817 (0.578–1.154)
.253
Gender (Male vs Female)
535
1.447 (1.030–2.038)
.034
Race (Asian&White vs Black or African American)
468
0.806 (0.455–1.415)
.453
Smoker (Yes vs No)
521
1.457 (0.892–2.402)
.135
Pathologic stage (Stage IV vs Stage I)
320
2.408 (1.047–6.029)
.046
T stage (T2&T3&T4 vs T1)
532
1.383 (0.963–1.993)
.080
N stage (N1 vs N0)
443
0.983 (0.623–1.548)
.941
M stage (M1 vs M0)
386
2.794 (1.186–7.337)
.025
Anatomic neoplasm subdivision (Left vs Right)
520
1.105 (0.777–1.572)
.577
Anatomic neoplasm subdivision2 (Central Lung vs Peripheral Lung)
Box plot of E2F7 expression of LUAD patients according to different clinical characteristics. A. Gender. B. Primary therapy outcome. C. Number pack years smoked. D. Cancer status. ns, P ≥ .05; *P < .05; **P < .01; ***P < .001.
Logistic analysis of the association between E2F7 expression and clinical characteristics.CR = complete response, PD = progressive disease, PR = partial response, SD = stable disease.Box plot of E2F7 expression of LUAD patients according to different clinical characteristics. A. Gender. B. Primary therapy outcome. C. Number pack years smoked. D. Cancer status. ns, P ≥ .05; *P < .05; **P < .01; ***P < .001.
3.4. E2F7 high expression is an independent prognostic risk factor
Kaplan-Miere survival analysis of all adenocarcinoma patients showed that high expression of E2F7 was associated with shorter OS (P = .002) (Fig. 3A). The results of subgroup analysis showed that in patients with T2/T3/T4 (P = .001) (Fig. 3B), N0 (P = .001) (Fig. 3C), M0 (P < 0.001) (Fig. 3D), E2F7 was highly expressed Significantly related to shorter OS. In terms of DSS (P = .005), E2F7 showed similar results (Fig. 3E), at T2/T3 (P = .021) (Fig. 3F) The prognostic value of E2F7 in the subgroups of, N0 (P < .001) (Fig. 3G), M0 (P = .023) (Fig. 3H) and smoker (P = .019) (Fig. 3I) was also verified.
Figure 3.
Kaplan-Meier curve for survival in LUAD. A. OS of all patients. B. OS of T2/T3/T4. C. OS of N0. D. OS of M0. E. DSS of all patients. F. DSS of T2/T3. G. DSS of N0. H. DSS of stage I. DSS of smoker.
Kaplan-Meier curve for survival in LUAD. A. OS of all patients. B. OS of T2/T3/T4. C. OS of N0. D. OS of M0. E. DSS of all patients. F. DSS of T2/T3. G. DSS of N0. H. DSS of stage I. DSS of smoker.Univariate Cox regression results show T stage (P = .002), N stage (P < .001), M stage (P = .006), pathologic stage (P < .001), primary therapy outcome (P < .001), residual tumor (P < .001) and E2F7 (P = .002) were significantly related to poor prognosis. Further multivariate Cox analysis confirmed that pathologic stage (P = .040), primary therapy outcome (P < .001), residual tumor (P = .001) and E2F7 (P = .027) are independent factors affecting the prognosis of LUAD patients (Fig. 4, Table 3).
Figure 4.
Forest plot of the multivariate Cox regression analysis of OS in LUAD.
Table 3
Univariate and multivariate Cox regression analysis of the relationship between clinical characteristics and overall survival.
Characteristics
Total (N)
Univariate analysis
Multivariate analysis
Hazard ratio (95% CI)
P value
Hazard ratio (95% CI)
P value
T stage (T2&T3&T4 vs T1)
523
1.728 (1.229–2.431)
.002
1.202 (0.696–2.074)
.509
N stage (N1&N2&N3 vs N0)
510
2.601 (1.944–3.480)
<.001
1.494 (0.916–2.437)
.107
M stage (M1 vs M0)
377
2.136 (1.248–3.653)
.006
1.225 (0.468–3.208)
.680
Age (>65 vs <=65)
516
1.223 (0.916–1.635)
.172
Gender (Male vs Female)
526
1.070 (0.803–1.426)
.642
Pathologic stage (Stage III&Stage IV vs Stage I&Stage II)
Univariate and multivariate Cox regression analysis of the relationship between clinical characteristics and overall survival.CR = complete response, PD = progressive disease, PR = partial response, SD = stable disease.Forest plot of the multivariate Cox regression analysis of OS in LUAD.
3.5. The diagnostic value of E2F7
We performed ROC curve analysis on the expression data of E2F7, and the results showed that this index has a high diagnostic value for patients with LUAD (AUC = 0.913, 95%CI: 0.888–0.939) (Fig. 5A). Further subgroup analysis verified its diagnostic value in stage I/II (AUC = 0.912) (Fig. 5B), stage III/IV (AUC = 0.929) (Fig. 5C), T1/T2 (AUC = 0.910) (Fig. 5D), T3/T4 (AUC = 0.932) (Fig. 5E), N0 (AUC = 0.906) (Fig. 5F), N1–3 (AUC = 0.930) (Fig. 5G) and M0 (AUC = 0.910) (Fig. 5H).
Figure 5.
ROC curve of E2F7 expression in LUAD. A. Normal vs LUAD. B. Stage I/II. C. Stage III/IV. D. T1/T2. E. T3/T4. F. N0. G. N1-3. H. M0.
ROC curve of E2F7 expression in LUAD. A. Normal vs LUAD. B. Stage I/II. C. Stage III/IV. D. T1/T2. E. T3/T4. F. N0. G. N1-3. H. M0.Due to its high diagnostic value, we combined E2F7 with clinical variables widely considered to be related to prognosis to construct a nomogram to predict the 1-, 3-, and 5-year survival probability (Fig. 6).
Figure 6.
Nomogram for predicting the probability of LUAD patients with 1-, 3-, 5-year OS.
Nomogram for predicting the probability of LUAD patients with 1-, 3-, 5-year OS.
3.6. E2F7 related signal pathways
We performed GO/KEGG enrichment analysis on E2F7. Under the conditions of P.adj < 0.1 and q value<0.2, there are 6 BPs, 12 CCs, 1 MF, and KEGG 2 signal pathways (Table 4).
Table 4
Gene sets enriched in the high E2F7 expression phenotype.
BP = biological process, CC = cellular component, KEGG = Kyoto Encyclopedia of Genes and Genomes, MF = molecular function.
P.adj<.05 and FDR q-value<.2 were considered as significantly enriched.
Gene sets enriched in the high E2F7 expression phenotype.BP = biological process, CC = cellular component, KEGG = Kyoto Encyclopedia of Genes and Genomes, MF = molecular function.P.adj<.05 and FDR q-value<.2 were considered as significantly enriched.We performed GSEA on the data set of high and low expression of E2F7 to determine the differentially activated signaling pathways in LUAD. A total of 39 data sets satisfy FDR (q value) <0.25 and P.adjust < 0.05. Cell cycle checkpoints, DNA damage telomere stress-induced senescence, DNA methylation, chromosome maintenance and mitotic prophase and other pathway-related genes showed enrichment in the high E2F7 expression group (Fig. 7).
Figure 7.
Enrich plots from GSEA. The possible biological processes or signaling pathways of E2F7.
Enrich plots from GSEA. The possible biological processes or signaling pathways of E2F7.
3.7. Verification through other independent external databases
We used an independent GEO dataset (GSE50081) containing 127 LUAD patients to further verify the above results. The results of the Kaplan-Meier survival analysis confirmed the prognostic value of E2F7 for LUAD patients (Fig. 8A–C).
Figure 8.
Kaplan-Meier curve for survival in LUAD patients in the validation datasets GSE50081. A. OS of all patients. B. OS of ex-smoker. C. Disease-free survival (DFS) of all patients.
Kaplan-Meier curve for survival in LUAD patients in the validation datasets GSE50081. A. OS of all patients. B. OS of ex-smoker. C. Disease-free survival (DFS) of all patients.We used the Timmer database to perform pan-tumor E2F7 expression analysis and showed that E2F7 is highly expressed in a variety of solid tumors including LUAD (Fig. 9A). We also integrated the pan-tumor analysis of the two databases of TCGA and GTEx and reached similar conclusions (Fig. 9B).
Figure 9.
Expression analysis of E2F7 in different types of human tumors. A. Timmer databases. B. TCGA+ GTEx.
Expression analysis of E2F7 in different types of human tumors. A. Timmer databases. B. TCGA+ GTEx.
4. Discussion
In our study, the expression of E2F7 in many tumors including LUAD was higher than normal, and its expression level was higher in men and patients greater than 40 number pack-year, and it was related to the primary therapy outcome of disease, that is It is said that patients with the progressive disease have higher expression of E2F7. Subsequent survival analysis also showed that high expression of E2F7 is an independent risk factor for OS, and it has a high diagnostic value. This provides a basis for E2F7 to judge the prognosis of LUAD patients in future clinical work. Genes related to cell cycle checkpoints, DNA damage, telomere stress-induced senescence, DNA methylation, chromosome maintenance, and mitogenic pathways showed significant enrichment in the E2F7 high expression group, suggesting that E2F7 affects lung adenocarcinoma The potential mechanism of occurrence and development provides an important reference for further exploration of its mechanism through experiments in the future.The occurrence and development of malignant tumors is a complex process involving multiple genes and their expressed proteins. Transcription is the beginning of gene expression and is strictly regulated by transcription factors (TFs) and its cofactors, RNA polymerase, and chromatin-modifying proteins.[E2Fs are an important family of transcription factors, which have been confirmed to be involved in the process of cell proliferation,[ differentiation,[ apoptosis,[ cycle regulation[ and DNA damage response.[ So far, a total of 8 family members have been discovered (E2F1-E2F8). According to their different functions, E2Fs are divided into transcription activators (E2F1–3) and transcription repressors (E2F4–8), and according to their structure, they are divided into typical E2Fs (E2F1–6) and atypical E2Fs (E2F7–8). The clinical value of many E2Fs members in the diagnosis and treatment of many solid tumors has been affirmed.[E2F7 is different from the typical E2Fs members in that it binds to DNA in a non-DP protein way to play a transcriptional inhibitory effect.[ Studies have shown that E2F7 can inhibit cell proliferation by inhibiting the transcription of proliferation-related miRNAs.[ However, in recent years, more and more studies have shown that E2F7 plays a role in promoting tumor occurrence and development in tumors. Chu et al. reported that the overexpression of E2F7 in breast cancer can inhibit miR-15a/16 transcription, cause Cyclin E1 and Bcl-2 to participate in tumor invasion and metastasis, and increase the resistance of breast cancer cells to tamoxifen.[In our study, the expression of E2F7 in a variety of solid tumors was analyzed through the Timmer database and UCSC XENA. The results showed that E2F7 is highly expressed in LUAD, lung squamous cell carcinoma (LUSC), esophageal squamous cell carcinoma (ESCA) and other solid tumors.In previous existing studies, there is no content about the prognostic value of E2F7 expression in LUAD patients. In this study, the diagnostic value of E2F7 was analyzed on the TCGA database by means of bioinformatics analysis. Kaplan-Meier survival analysis showed that high expression of E2F7 was associated with shorter OS and DSS, and this conclusion was verified in the GEO dataset. Multivariate Cox analysis further confirmed that the expression of E2F7 is independently related to OS of patients with LUAD. Other clinical features, such as local advanced stage, lymph node metastasis, distant metastasis, later TNM staging, and the degree of surgical resection are closely related, and are also related to poor prognosis. We further constructed a nomogram of the prognosis of LUAD patients based on clinical variables and the expression of E2F7, which provided a basis for clinicians to predict the survival rate of individual patients.The mechanism by which E2F7 mediates the development of tumors is not completely clear. It may promote tumor proliferation, differentiation, infiltration and metastasis through the following methods: (1) E2F7 up-regulates Beclin-1 and mediates autophagy induced by miR-129 Trigger autophagy flux[; (2) E2F7 increases the expression level of vimentin, reduces the expression of E-cadherin protein, and promotes the EMT process[; (3) As the transcriptional activators of VEGFA, E2F7 cooperates with HIF-1α to induce the transcription of VEGFA and promote blood vessel Generation[; (4) Induce the transcription of collagen and calcium-binding domains and Flt to promote the generation of lymphatic vessels.[ Our study found that the expression of E2f7 is related to pathways such as cell cycle checkpoints, DNA damage telomere stress-induced senescence, DNA methylation, chromosome maintenance and mitotic prophase. Our research results are related to the above-mentioned mechanisms, but these mechanisms need further research to confirm.Although our study provides a new method to explore the relationship between E2F7 and the prognosis of lung adenocarcinoma, it still has many limitations. First of all, although we have adopted the GEO database to verify the results of the TCGA database analysis, the study object is still only patients in the public database, which will lead to bias. Secondly, due to the limited sample size and clinical indicator, our research conclusions need to be further confirmed by a large sample of research. Finally, we need further experiments to explore the role of E2F7 in tumor progression and its mechanism of affecting tumors.In short, high expression of E2F7 is an independent risk factor for OS in LUAD patients, and has a high diagnostic value. cell cycle checkpoints, DNA damage telomere stress-induced senescence, DNA methylation, chromosome maintenance and mitotic prophase may be the key pathways through which LUAD is regulated by E2F7.
Authors: Anne F Nichols; Toshiki Itoh; Francesca Zolezzi; Stephanie Hutsell; Stuart Linn Journal: Nucleic Acids Res Date: 2003-01-15 Impact factor: 16.971
Authors: T Itoshima; T Fujiwara; T Waku; J Shao; M Kataoka; W G Yarbrough; T J Liu; J A Roth; N Tanaka; M Kodama Journal: Clin Cancer Res Date: 2000-07 Impact factor: 12.531
Authors: I Thurlings; L M Martínez-López; B Westendorp; M Zijp; R Kuiper; P Tooten; L N Kent; G Leone; H J Vos; B Burgering; A de Bruin Journal: Oncogene Date: 2016-07-25 Impact factor: 9.867