Liyuan Yin1, Yonggang Wang2, Guangzhi Ma1,3, Yunfu Deng1, Qinghua Zhou1. 1. Lung cancer centre, West China Hospital, Sichuan University, Chengdu. 2. Department of Respiratory Medicine, Shandong Provincial Hospital Affiliated to Shandong University, Jinan. 3. Department of Thoracic Surgery, West China Hospital, Sichuan University, Chengdu, P. R. China.
Abstract
Similarities between embryonic development and tumorigenesis are reflected in biological behavior and gene expression. Although gene signatures during development and the clinical phenotypes of different cancers show certain correlation patterns, the relationship between early embryonic development and cancer remains largely unexplored. To compare gene expression profiles between development and cancer, we analyzed the gene expression of chorionic villi samples at different gestational ages (6, 7, 8, 9, 10, and 40 weeks) obtained from Gene Expression Omnibus (GEO) datasets using a correlation test. The villi development-related genes that showed a gradual positive correlation (upregulated, n = 394) or negative correlation (downregulated, n = 325) with time were then used to construct protein-protein interaction (PPI) networks. Three subnetworks among the gradually upregulated genes and 3 subnetworks among the downregulated genes were identified using the Molecular Complex Detection (MCODE) plugin in Cytoscape. The most significant GO terms for villi-correlated genes were immune response, inflammatory response, and cell division. These gene clusters were also dysregulated in lung squamous cell carcinoma (SCC). The prognostic value of the gene clusters was then analyzed with TCGA lung SCC data, which showed that 4 clusters were associated with prognosis. Our results demonstrate gene expression similarity between development and lung SCC and identify development-associated gene clusters that could contain prognostic information for lung SCC patients.
Introduction
Emerging studies have explored both similarities in cellular behavior and molecular resemblance between ontogenesis and carcinogenesis. In terms of cellular mobility and invasiveness, both processes involve epithelial-to-mesenchymal transition (EMT). To grow in the host or mother, both tumors and embryos must evade immune surveillance and form blood vessels to obtain nutrients. In addition, important molecules and crucial pathways have been documented to be shared between certain cancerous tissues and developing tissues. Baudino et al demonstrated that the oncogene c-myc was highly expressed in embryonic stem cells and essential for early embryo development. Many crucial developmental pathways, such as the Wnt, FGF, BMP, Notch, and Hedgehog pathways, are reactivated during carcinogenesis. This abundant evidence makes it reasonable to seek novel information about cancer by examining embryonic development.
Trophoblastic cells, which are characterized by high proliferation, lack of cell-contact inhibition, and escape from effectors of the immune system, especially during the first trimester of pregnancy, share several features with malignant cells. As a result, trophoblast cells have been described as a “pseudo-tumor”. Therefore, we examined embryonic development using an expression dataset of villi development to uncover similarities between development and cancer.
Lung cancer is the leading cause of cancer mortality in humans, with approximately 2.3 million new cases and 1.54 million deaths occurring every year in the USA. Non-small cell lung cancer (NSCLC) accounts for 80% of lung cancer cases and mainly includes 2 histological subsets, adenocarcinoma (46.6%) and lung squamous cell carcinoma (SCC) (23.3%). Because molecular targeted therapies for lung SCC are lacking, the overall survival (OS) of lung SCC patients remains low.
In this study, we conducted a correlation analysis between villus gene expression and gestational week and identified genes correlated with development. Through protein-protein interaction (PPI) network and subnetwork analysis, 6 gene clusters were identified. Most of the genes in these clusters were differentially regulated in lung SCC. The 6 gene clusters were then validated using Gene Expression Omnibus (GEO) datasets and survival analysis of TCGA lung SCC data.
Methods
All data analyses were performed using R (http://www.r-project.org/, version 3.4.3) and Bioconductor.
Data collection
The training dataset GSE93520 from Zhang, Feng et al was obtained from the GEO. It comprises 36 chorionic villus samples collected at 6 to 10 weeks of gestation and 8 chorion laeve samples from postpartum placental tissue. The following were excluded at sample collection: women with gestational complications, such as preeclampsia, fetal growth restriction, and gestational diabetes, and fetuses with known or suspected genetic disorders.
The validation datasets were downloaded from the GEO database: GSE74706 from Marwitz, Depner et al, GSE67061 from Tong, Feng et al, and GSE18842 from Sanchez-Palencia et al.
The gene expression datasets from human specimens are publicly available. Ethics approval and consent to participate were declared in the original publications.
Identification of development correlated genes
First, the raw data were normalized by the median scale method using the R package “limma”. When a gene was represented by several probes, only the probe showing the greatest mean intensity was retained. An expression matrix with 19596 gene features was used for subsequent analysis.
We used the R function cor.test with Pearson's method to perform a correlation analysis between gene expression and gestational age (weeks). P values, the false discovery rate (FDR), and the correlation index were obtained. Genes with a correlation index ≥0.8 or ≤−0.8 and an FDR <0.0001 were considered significantly gestational age-related differentially expressed genes. Positively correlated genes were those gradually upregulated with development time, and negatively correlated genes were those gradually downregulated during development.
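The selection rule above can be sketched in a few lines. The authors worked in R with cor.test; the following is only a dependency-free Python illustration under stated assumptions: the P value uses a Fisher z-transform normal approximation rather than the exact t distribution that cor.test uses, the FDR is assumed to be the Benjamini-Hochberg adjustment (the text does not name the method), and the function names and toy data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def approx_p_value(r, n):
    """Two-sided P value from the Fisher z-transform (normal approximation).
    Assumption: stands in for the exact t-based P value of R's cor.test."""
    r = max(min(r, 0.999999999), -0.999999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_genes(expr, weeks, r_cut=0.8, fdr_cut=1e-4):
    """expr maps gene -> expression values aligned with `weeks`.
    Returns (upregulated, downregulated) gene lists."""
    genes = list(expr)
    rs = [pearson_r(expr[g], weeks) for g in genes]
    ps = [approx_p_value(r, len(weeks)) for r in rs]
    fdrs = benjamini_hochberg(ps)
    up = [g for g, r, q in zip(genes, rs, fdrs) if r >= r_cut and q < fdr_cut]
    down = [g for g, r, q in zip(genes, rs, fdrs) if r <= -r_cut and q < fdr_cut]
    return up, down

# Toy data only: three made-up genes across 12 samples.
weeks = [6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 40, 40]
expr = {
    "gene_up": [2 * w + 1 for w in weeks],
    "gene_down": [-w for w in weeks],
    "gene_flat": [1, 2] * 6,
}
up_genes, down_genes = select_genes(expr, weeks)
print(up_genes, down_genes)
```

Requiring both a large correlation index and a small FDR keeps only genes whose trend is both strong and unlikely to arise by chance across the roughly 19596 tests.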
Integration of PPI network and subnetwork analysis
The Search Tool for the Retrieval of Interacting Genes (STRING) was used to construct PPI networks for the gestational age-related upregulated and downregulated genes. The networks were constructed using the default settings, which retain edges with STRING combined scores greater than 0.4 (the medium-confidence threshold in STRING). The whole network was then imported into Cytoscape software (version 3.6). The “Molecular Complex Detection (MCODE)” Cytoscape plugin was used with its default settings to identify discrete clusters from the primary PPI networks. The top clusters were screened under the conditions of minimum size = 6 and minimum score = 4. Cluster visualization and gene ontology (GO) functional analysis were conducted in STRING.
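As a rough illustration of the network-construction step, the sketch below keeps edges above a score threshold and extracts the largest connected subnetwork. It is not MCODE and does not call STRING or Cytoscape. The edge-list format (two protein labels plus a combined score between 0 and 1), the function name, and the toy edges are assumptions made for the example.

```python
from collections import defaultdict, deque

def largest_component(edges, min_score=0.4):
    """Return the node set of the largest connected component.

    edges: iterable of (protein_a, protein_b, combined_score), score in 0..1.
    Only edges with score > min_score are kept (assumed threshold).
    """
    graph = defaultdict(set)
    for a, b, score in edges:
        if score > min_score and a != b:
            graph[a].add(b)
            graph[b].add(a)

    seen, best = set(), set()
    for start in graph:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

# Toy edges with placeholder protein labels, not real STRING output.
toy_edges = [
    ("A", "B", 0.9),
    ("B", "C", 0.5),
    ("C", "D", 0.3),  # dropped: below the threshold
    ("D", "E", 0.8),
]
print(sorted(largest_component(toy_edges)))
```

Dense clusters would then be sought inside this component, which is the step the authors delegated to the MCODE plugin.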
Assessment of the expression of development core-related gene clusters between cancerous tissue and noncancerous tissue
The potential associations between gene clusters and clinicopathological variables (tumor versus paracancerous tissue) were evaluated in 3 SCC GEO datasets downloaded from the GEO database. The t test was used to compare gene cluster expression between tumor tissue and paracancerous tissue. Hierarchical clustering was used to display the expression profiles for comparison between normal and tumor tissues.
Validation of the development correlated gene clusters’ prognostic value in SCC datasets
The TCGA-LUSC RNA sequencing data used for survival analysis were obtained by using the R package “RTCGAToolbox”. We evaluated the association between 5-year OS and the expression signatures of the development-correlated gene clusters. Hierarchical clustering of the expression of each gene cluster divided the samples into 2 groups, and this division was used as a categorical variable in the survival analysis. Kaplan–Meier survival analysis and the log-rank test were used to evaluate the prognostic value of each gene cluster. The Cox proportional hazards regression model was used to evaluate the independence of the prognostic factors in a stepwise manner. P <.05 was considered significant for all of these analyses.
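The Kaplan–Meier and log-rank steps can be sketched as follows. This is a minimal standard-library Python illustration, not the authors' R code: the function names are hypothetical, follow-up is assumed to be censored at a fixed horizon (for example 60 months) to obtain 5-year OS, and the two groups are assumed to come from the hierarchical clustering described above.

```python
import math

def censor_at(times, events, horizon):
    """Administratively censor follow-up at `horizon` (e.g. 60 months)."""
    new_times = [min(t, horizon) for t in times]
    new_events = [e if t <= horizon else 0 for t, e in zip(times, events)]
    return new_times, new_events

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each observed death time.
    events: 1 = death observed, 0 = censored."""
    at_risk = len(times)
    surv, curve = 1.0, []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; inputs are lists. Returns (chi2, p_value)."""
    all_times = times_a + times_b
    all_events = events_a + events_b
    death_times = sorted({t for t, e in zip(all_times, all_events) if e == 1})
    obs_minus_exp, variance = 0.0, 0.0
    for t in death_times:
        n_a = sum(1 for ti in times_a if ti >= t)  # at risk in group A
        n_b = sum(1 for ti in times_b if ti >= t)  # at risk in group B
        d_a = sum(1 for ti, e in zip(times_a, events_a) if ti == t and e == 1)
        d_b = sum(1 for ti, e in zip(times_b, events_b) if ti == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square tail, 1 df
    return chi2, p_value

# Toy follow-up times in months (made up), all deaths observed.
group_a = ([1, 2, 3, 4, 5], [1, 1, 1, 1, 1])
group_b = ([6, 7, 8, 9, 10], [1, 1, 1, 1, 1])
print(kaplan_meier(*group_a))
print(logrank_test(*group_a, *group_b))
```

A multivariate Cox model, as used by the authors to assess independence, requires iterative fitting and is better left to an established statistics package.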
Results
Gene expression profile analysis of villus development
In our study, a total of 36 chorionic villi samples and 8 chorion laeve samples of mature placenta were analyzed, and the strategy is outlined in Supplementary Figure 1 (see Figure, Supplemental Digital Content, which illustrates the analysis strategy). We tested the correlation between gene expression and gestational age (weeks) to identify gestational age-related differentially expressed genes, using a correlation index ≥0.8 or ≤−0.8 and an FDR cutoff <0.0001. This resulted in 719 genes, of which 394 were gradually upregulated (see Table, Supplemental Digital Content tableS1, which lists the positively correlated genes) (Fig. 1A) and 325 were gradually downregulated (see Table, Supplemental Digital Content tableS2, which lists the negatively correlated genes) (Fig. 1A) during villi development. We considered these gestational age-related genes to represent the main villi development-related genes (VDGs) for further analysis.
Figure 1
Identification of VDGs and GO enrichment analysis. A. Visualization of the correlation index of positively correlated genes (red) and negatively correlated genes (blue). B. GO enrichment analysis of upregulated VDGs indicated that these genes were associated with inflammatory and immune responses. C. GO enrichment analysis of downregulated VDGs indicated that these genes were related to cell division. VDGs = villi development-related genes, GO = gene ontology.
We then examined the functions of these VDGs using GO enrichment analysis. Twenty of the most significant GO biological process terms are shown in Figure 1B (upregulated VDGs) and Figure 1C (downregulated VDGs). The gradually upregulated VDGs were related to inflammatory response, immune response, and neutrophil activation, while the gradually downregulated VDGs were related to organelle fission, nuclear division, and chromosome segregation.
PPI network analysis and cluster screening of development-related differentially expressed genes
All of the correlated genes, both upregulated and downregulated, were uploaded to STRING to construct PPI networks. The largest connected subnetworks of the upregulated and downregulated PPI networks contained 238 nodes (Fig. 2A) and 191 nodes (Fig. 2C), respectively. These subnetworks, composed of VDGs, may contain clusters that provide useful information. To identify these clusters, the MCODE plugin in Cytoscape software was used to extract discrete clusters.
Figure 2
PPI network analysis of VDGs. The VDGs were imported into STRING to construct PPI networks. The largest subnetworks of upregulated VDGs (A) and downregulated VDGs (C) are shown. The right columns of (B) show the discrete up-clusters (up-cluster1, up-cluster2, up-cluster3) identified in the network (A), and the corresponding dot plots show the most significantly enriched GO terms for each cluster. The right columns of (D) show the discrete down-clusters (down-cluster1, down-cluster2, down-cluster3) identified in the network (C), and the corresponding dot plots show the most significantly enriched GO terms for each cluster. The line graphs show the expression characteristics of each cluster with time. VDGs = villi development-related genes, GO = gene ontology.
The upregulated subnetwork and downregulated subnetwork each contained 3 clusters. These 6 clusters were considered the gene sets most relevant to villus development, and the expression profile of each cluster was plotted. The expression levels of upregulated gene clusters increased gradually from 6 to 40 weeks (Fig. 2B), while the expression levels of downregulated gene clusters decreased slowly from 6 to 40 weeks (Fig. 2D).
GO analysis was conducted separately for each of these 6 clusters. The significant biological processes of genes in upregulated clusters were related to chemotaxis and leukocyte migration (up-cluster1), posttranslational protein modification and protein polyubiquitination (up-cluster2), and response to tumor necrosis factor and leukocyte migration (up-cluster3) (Fig. 2B). Downregulated cluster genes were significantly enriched for cell division (down-cluster1), protein targeting to the ER and ribosome biogenesis (down-cluster2), and regulation of gene expression and epigenetics (down-cluster3) (Fig. 2D).
Highly development correlated genes’ expression scenario in lung SCC
We examined whether the expression profile of genes highly correlated with villi development differs between lung SCC and paracancerous samples. The genes of the 6 clusters were evaluated in the 3 GEO datasets to compare expression between lung squamous tumor and paracancerous tissue. Schematic diagrams (Fig. 3A and B) illustrate the different expression patterns among developing tissue, tumor tissue, and paracancerous tissue. First, we used t tests to determine whether the genes were differentially expressed between cancer and normal samples; a gene with a t test P value less than .05 was considered differentially expressed. Then, we used hierarchical clustering to display the expression profiles. The results revealed that genes upregulated during villi development were downregulated in SCC (Fig. 3C), while genes downregulated during villi development were upregulated in SCC (Fig. 3D). We then used receiver operating characteristic (ROC) curves (Fig. 3E) to evaluate each gene cluster's accuracy in differentiating tumor from paracancerous tissue. Among the 6 ROC curves, the area under the curve (AUC) was above 80% for all but 1. The opposite expression trajectories of these genes in lung SCC and villi development suggest that embryos and cancer share many important mechanisms, which may shed light on tumorigenesis and prognosis.
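The AUC reported for each ROC curve can be computed directly from per-sample scores, as in the sketch below. The text does not say how a cluster was reduced to one score per sample, so the example assumes a single summary value (for instance, mean expression of the cluster genes); that choice, the function name, and the toy numbers are all hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen tumor sample scores higher
    than a randomly chosen paracancerous sample (ties count one half).
    labels: 1 = tumor, 0 = paracancerous.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy summary scores for 4 samples (made up).
scores = [0.9, 0.4, 0.8, 0.3]
labels = [1, 1, 0, 0]
print(roc_auc(scores, labels))
```

For a cluster that is lower in tumor than in paracancerous tissue, the raw AUC falls below 0.5; negating the score (or reporting 1 minus the AUC) gives the discriminative accuracy in the expected direction.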
Figure 3
Highly correlated gene expression scenario in lung SCC. Schematic diagrams (A and B) show the opposite gene expression scenarios during development and carcinogenesis. Heatmaps were used to visually compare the expression patterns of gene clusters between lung SCC and normal samples. Rows represent genes, and columns represent samples. The yellow bar represents tumor tissue, while the blue bar represents normal tissue. Green, black, and red represent low, medium, and high expression, respectively. C shows the expression patterns of the upregulated gene clusters in lung SCC tumors and normal samples. D shows the expression patterns of the downregulated gene clusters in lung SCC tumors and normal samples. Six ROC curves (3 up-clusters and 3 down-clusters) (E) were used to evaluate the accuracy of differentiating tumor from paracancerous tissue. SCC = squamous cell carcinoma.
Prognostic significance of development core-related cluster genes for lung SCC patients
To further investigate the relationship between the gene clusters highly correlated with villi development and the clinical phenotypes of lung SCC patients, the 6 gene clusters identified by network analysis were examined for their prognostic significance using TCGA lung SCC RNA-Seq data. For each cluster, we used hierarchical clustering to divide the patients into 2 groups. Among the 6 gene clusters, 4 were significantly associated with 5-year OS in the TCGA lung SCC data: up-cluster1 (Fig. 4A), up-cluster2 (Fig. 4C), down-cluster1 (Fig. 4B), and down-cluster3 (Fig. 4F). The exceptions were down-cluster2 and up-cluster3 (Fig. 4D and E). Multivariate Cox proportional hazards regression analysis (Table 1) validated this cluster as an independent prognostic factor.
Figure 4
Five-year survival analysis of the 6 gene clusters in TCGA lung SCC. (A, C, E) Kaplan–Meier survival analysis and log-rank test results for up-cluster genes in TCGA lung SCC datasets, in which patients are divided into 2 groups according to up-cluster expression. (B, D, F) Kaplan–Meier survival analysis and log-rank test results for down-cluster genes in TCGA lung SCC datasets, in which patients are divided into 2 groups according to down-cluster expression. SCC = squamous cell carcinoma.
Table 1
Univariate and multivariate analyses of overall survival (Cox proportional hazards regression model) in TCGA lung SCC datasets.
Discussion
Recently, many studies have demonstrated the similarity and relationship between embryo development and tumorigenesis with regard to invasive cellular behaviors, gene expression, and other important biological behaviors. Trophoblasts, which are placental cells that come into intimate contact with the maternal immune system, are similar to tumor cells with respect to their active division and metastatic abilities. In this study, we analyzed transcriptomic data of placental villi at multiple time points during development. Using a correlation test, we identified 719 genes with significant time-dependent expression trends during development. GO analysis indicated that these villi development-related genes (VDGs) were involved in immune response, inflammatory response, cell division, and nuclear division processes (Fig. 1). With PPI network analysis, we identified 3 gene clusters that were gradually upregulated with time and 3 that were gradually downregulated.
Many studies have examined embryonic development for information regarding the malignant transformation of tumor cells. During tumorigenesis, tumor cells, which in many ways behave like the fetus, have properties associated with defective cell cycle processes as well as genomic instability and immune escape, providing them with territorial expansion advantages over normal cells. On this basis, we examined the cluster genes highly correlated with villi development in lung SCC samples. Notably, in most cases, the genes upregulated along the development time axis were downregulated in lung SCC, and the genes downregulated along the development time axis were upregulated in lung SCC. This is concordant with many published studies of different tumor types, implying that cancer could hijack programs essential for embryonic development to gain the capability for tumor initiation and progression. Although the exact reason is still unclear, we found that the expression of certain pivotal genes shows a synchronized pattern in embryonic development and carcinogenesis. For example, FOXM1 in down-cluster1 is thought to play an important role in the EMT process and could promote metastatic activity in lung cancer, and its overexpression is associated with a poor prognosis in lung cancer patients. SMC4, another down-cluster1 gene, has been shown to be involved in lung cancer progression, and its overexpression is also associated with poor prognosis in lung cancer patients.
Predicting the prognosis of cancer is a major challenge in current clinical research. In our study, 4 of the 6 identified development-related gene clusters were associated with the prognosis of lung SCC patients based on TCGA lung SCC data (Fig. 4). The individual prognostic ability of the clusters was further examined using the Cox proportional hazards regression method.
In conclusion, using a correlation test to identify core genes correlated with development time, we obtained key information about the similarity between development and cancer. Moreover, these development-correlated genes were shown to have prognostic value in lung SCC. Our findings suggest that investigating development could provide valuable information about oncogenesis and cancer progression.
Conclusions
In summary, using villi development transcriptomic data and lung SCC data, we again observed similarity between development and tumors. Moreover, we identified 6 gene clusters, 4 of which were associated with the prognosis of lung SCC. Our findings suggest that investigating development could provide valuable information about oncogenesis and cancer progression.
Acknowledgments
The authors wish to thank Dr. Xuexin Yu and Dr. Bangrong Cao for discussions.
Authors: Abel Sanchez-Palencia; Mercedes Gomez-Morales; Jose Antonio Gomez-Capilla; Vicente Pedraza; Laura Boyero; Rafael Rosell; M Esther Fárez-Vidal Journal: Int J Cancer Date: 2010-11-28 Impact factor: 7.396
Authors: Matthew E Ritchie; Belinda Phipson; Di Wu; Yifang Hu; Charity W Law; Wei Shi; Gordon K Smyth Journal: Nucleic Acids Res Date: 2015-01-20 Impact factor: 16.971
Authors: Sebastian Marwitz; Sofia Depner; Dmytro Dvornikov; Ruth Merkle; Magdalena Szczygieł; Karin Müller-Decker; Philippe Lucarelli; Marvin Wäsch; Heimo Mairbäurl; Klaus F Rabe; Christian Kugler; Ekkehard Vollmer; Martin Reck; Swetlana Scheufele; Maren Kröger; Ole Ammerpohl; Reiner Siebert; Torsten Goldmann; Ursula Klingmüller Journal: Cancer Res Date: 2016-05-17 Impact factor: 12.701
Authors: Nicolas Coudray; Paolo Santiago Ocampo; Theodore Sakellaropoulos; Navneet Narula; Matija Snuderl; David Fenyö; Andre L Moreira; Narges Razavian; Aristotelis Tsirigos Journal: Nat Med Date: 2018-09-17 Impact factor: 53.440