Integrated Analysis of Transcriptomic and Genomic Data Reveals Blood Biomarkers With Diagnostic and Prognostic Potential in Non-small Cell Lung Cancer.

Ibrahim H Kaya1,2, Olfat Al-Harazi2, Mustafa T Kaya2,3, Dilek Colak2.   

Abstract

Background: Lung cancer is the second most common cancer and the leading cause of cancer-associated death worldwide. Non-small cell lung cancer (NSCLC) accounts for about 85% of lung cancer diagnoses, and more than 50% of all lung cancer cases are diagnosed at an advanced stage and therefore have a poor prognosis. It is thus important to diagnose NSCLC reliably and as early as possible in order to reduce the risk of mortality.
Methods: We identified blood-based gene markers for early NSCLC using a multi-omics approach that integrates global gene expression and copy number alteration data from NSCLC patients obtained with array-based techniques. We also validated the diagnostic and prognostic potential of the gene signature using independent datasets with detailed clinical information.
Results: We identified 12 genes that are significantly differentially expressed in NSCLC patients' blood at the earliest stages of the disease and are associated with a poor disease outcome. We then validated the 12-gene signature's diagnostic and prognostic value using independent gene expression profiling datasets of over 1000 NSCLC patients. Indeed, the 12-gene signature predicted disease outcome independently of other clinical factors in multivariate regression analysis (HR = 2.64, 95% CI = 1.72-4.07; p = 1.3 × 10^-8). Analysis of significantly altered functions, pathways, and gene networks revealed alterations in several key genes and cancer-related pathways that may be important for NSCLC transformation, including FAM83A, ZNF696, UBE2C, RECK, TIMM50, GEMIN7, and XPO5.
Conclusion: Our findings suggest that integrated genomic and network analyses may provide a reliable approach for identifying genes associated with NSCLC. Such genes may improve diagnosis by detecting the disease at an early stage in patients' blood rather than through invasive techniques, and may also have prognostic potential for discriminating high-risk from low-risk patients.
Copyright © 2022 Kaya, Al-Harazi, Kaya and Colak.

Keywords:  NSCLC; biomarker; blood; early diagnosis; gene signature; lung cancer; omics; prognosis

Year:  2022        PMID: 35309509      PMCID: PMC8930812          DOI: 10.3389/fmolb.2022.774738

Source DB:  PubMed          Journal:  Front Mol Biosci        ISSN: 2296-889X


Introduction

Despite advances in cancer therapies and rising awareness, lung cancer continues to be one of the most malignant tumors. It is the second most common cancer and the leading cause of cancer-related death worldwide (Bray et al., 2018). Non-small-cell lung carcinoma (NSCLC) is responsible for about 85% of lung cancers (Santarpia et al., 2015). The poor outcome of many NSCLC patients stems from the fact that they are diagnosed only after their cancer has reached an advanced stage (Xie and Xie, 2019; Chen et al., 2020), which underscores the need to identify NSCLC at an early stage in order to maximize patient survival. Recent genomic studies have shown that changes in gene expression and copy number variants (CNVs) are associated with human diseases, including cancer (Colak et al., 2010; Colak et al., 2013), and have identified potential disease biomarkers using RNA- or DNA-based approaches (Jabs et al., 2017; Chakraborty et al., 2018). Previous studies also indicated that integrated genomic and network-based analysis may lead to reliable biomarkers for human diseases (Sheng et al., 2011; Colak et al., 2013; Al-Harazi et al., 2016; Chakraborty et al., 2018). However, most of the identified biomarkers require invasive procedures or cannot detect early NSCLC. The aim of this study is to identify a blood-based gene signature that is potentially involved in the development of early-stage disease and has prognostic value. We performed an integrated analysis of transcriptomic and genomic data to identify blood markers with diagnostic and prognostic potential in early NSCLC and validated their significance using over 1000 NSCLC patients from multiple independent genomic datasets with clinical data. The identified gene markers may improve detection of the disease and help in developing therapeutic strategies.

Materials and Methods

Data Collection and the Integrated Analysis

Whole-genome gene expression and copy number alteration (CNA) datasets for 190 NSCLC patients were obtained from the publicly available NCBI GEO database (www.ncbi.nlm.nih.gov/geo; GSE37745 and GSE76730). These datasets were analyzed as previously described (Jabs et al., 2017). In addition, blood sample data for lung cancer patients (n = 3) and controls (n = 3) were obtained from a publicly available dataset (GSE69732). Furthermore, we downloaded an RNA-seq dataset for NSCLC patients from The Cancer Genome Atlas (TCGA) that contains 576 samples (517 tumor samples, 279 of which are Stage I, and 59 normal samples). We compared the transcriptome of early-stage NSCLC (n = 279) with that of normal samples (n = 59) to identify differentially expressed genes (DEGs). DEGs were identified using analysis of variance (ANOVA) with an adjusted p-value < 0.05 and an absolute fold change (FC) ≥ 1.5. The p-values were adjusted for multiple comparisons by false discovery rate (FDR) according to the Benjamini–Hochberg step-up procedure (Benjamini and Hochberg, 1995). The integrated analysis used a Venn diagram approach to identify the DEGs common to the mRNA, CNA, early-stage NSCLC, and blood gene expression datasets. We then identified genes significantly associated with patients’ survival by performing overall survival analysis for each gene separately on a dataset containing 1,144 lung cancer samples collected from 14 datasets (GSE4573 (Raponi et al., 2006), GSE14814 (Zhu et al., 2010), GSE8894 (Lee et al., 2008), GSE19188 (Hou et al., 2010), GSE3141 (Bild et al., 2006), GSE31210 (Yamauchi et al., 2012), GSE29013 (Xie et al., 2011), GSE37745 (Botling et al., 2013), caArray (Director’s Challenge Consortium for the Molecular Classification of Lung Adenocarcinoma et al., 2008), and TCGA (Cancer Genome Atlas Research Network, 2012)) (Győrffy et al., 2013). Figure 1 illustrates our methodology.
FIGURE 1

Schematic flowchart illustrating the methodology.
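
The filtering and intersection steps of this workflow can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' pipeline (the study ran ANOVA in dedicated software): it assumes per-gene p-values and fold changes are already available, and the function names, gene labels, and toy numbers are all hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, pvalues, fold_changes, alpha=0.05, min_abs_fc=1.5):
    """Keep genes with adjusted p < alpha and |FC| >= min_abs_fc."""
    adj = benjamini_hochberg(pvalues)
    return {g for g, q, fc in zip(genes, adj, fold_changes)
            if q < alpha and abs(fc) >= min_abs_fc}


# Hypothetical toy inputs (not data from the study).
genes = ["G1", "G2", "G3", "G4"]
pvals = [0.001, 0.02, 0.03, 0.8]
fcs = [2.0, -1.8, 1.2, 3.0]
tumor_degs = select_degs(genes, pvals, fcs)

# Venn-style intersection: genes present in every per-dataset DEG set.
other_sets = [{"G1", "G2", "G9"}, {"G1", "G2", "G3"}]
common = set.intersection(tumor_degs, *other_sets)
```

In this toy case G3 is dropped by the fold-change threshold and G4 by the adjusted p-value, so only genes passing both filters in all sets remain in `common`.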


Validation of the Diagnostic Value of the Gene Signature

To validate the diagnostic value of our gene signature, we used a TCGA dataset (n = 576) and an independent dataset from ArrayExpress (E-MTAB-5231). The independent dataset consists of 22 NSCLC samples and 17 normal adjacent controls. We performed unsupervised principal component analysis (PCA) and two-dimensional hierarchical clustering using PARTEK Genomics Suite (Partek Inc., St. Louis, MO, United States) for each dataset separately. Functional, pathway, and gene interaction network analyses of the gene signature were performed using QIAGEN’s Ingenuity Pathway Analysis (IPA®, QIAGEN Redwood City).

Gene Ontology Enrichment, Pathway, and Gene Network Analyses

Gene ontology (GO) enrichment, pathway, and gene interaction network analyses were performed using Ingenuity Pathway Analysis (IPA; QIAGEN Inc., https://www.qiagenbioinformatics.com/products/ingenuity-pathway-analysis) and the Database for Annotation, Visualization and Integrated Discovery (DAVID) (Dennis et al., 2003). We mapped each gene of the NSCLC-associated signature to its corresponding gene object in the Ingenuity pathway knowledge base and constructed the gene interaction networks. A right-tailed Fisher’s exact test was used to calculate a p-value giving the probability that the association between the data set and a biological function (or pathway) is due to chance alone (Colak et al., 2020).
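
A right-tailed Fisher's exact test for enrichment is equivalent to the upper tail of a hypergeometric distribution. The sketch below illustrates only that calculation; it is not IPA's implementation, and the function name and the small example counts are hypothetical.

```python
from math import comb


def fisher_right_tailed(k, n, K, N):
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    k: signature genes annotated to the function or pathway
    n: number of genes in the signature
    K: genes annotated to the function in the background
    N: number of genes in the background
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probabilities of observing k or more annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 2 of 3 selected genes annotated, 4 of 10 in background.
p_value = fisher_right_tailed(2, 3, 4, 10)
```

A small p-value indicates that the observed overlap between the gene list and the annotated set would rarely arise if genes were drawn at random from the background.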

Survival, Multivariate Analyses and NSCLC 12-Gene Classifier

Univariate and multivariate Cox regression analyses were used to assess the prognostic significance of our gene signature together with other clinical variables. We performed overall survival (OS) and progression-free survival (PFS) analyses on 1,144 and 596 tumor samples, respectively. For each patient, we calculated a 12-gene signature expression score, defined as the average expression of the up-regulated genes minus the average expression of the down-regulated genes. We then used the median score as the cutoff for classifying patients into high- and low-risk groups. Survival curves were plotted using the Kaplan-Meier method, and the significance of differences between survival curves was calculated by the log-rank test. In addition, multivariate analysis was performed using our 12-gene set with histology (adenocarcinoma and squamous cell carcinoma), gender, and smoking history as covariates. A p-value < 0.05 was considered statistically significant. Furthermore, we designed an NSCLC classifier based on our 12-gene signature using several machine learning algorithms, including K-Nearest Neighbor, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Nearest Centroid, and Support Vector Machine (SVM). We estimated the classification performance on TCGA with 10-fold cross-validation, using standardized gene expression levels of the 12-gene signature as feature values. Accuracy, specificity, sensitivity, and area under the curve (AUC) were used as performance measures, as described previously (Al-Harazi et al., 2021a; Al-Harazi et al., 2021b). The Nearest Centroid algorithm with proportional prior probability outperformed the other algorithms. The analyses were performed using PARTEK Genomics Suite (Partek Inc., St. Louis, MO, United States).
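
The signature score and median split can be illustrated with a short Python sketch. It assumes that the up- and down-regulated groups follow the sign of the fold change in Table 1 (RECK is the only gene with a negative value) and that patients whose score equals the median fall into the low-risk group; both choices, the helper names, and the toy cohort are assumptions made for illustration.

```python
from statistics import mean, median

# Gene groups taken from the sign of FC in Table 1 (assumed split).
UP = ["FAM83A", "GEMIN7", "ITPA", "NOP58", "NR2C2AP", "TIMM50",
      "TOMM40", "UBE2C", "XPO5", "ZNF696", "ZNF7"]
DOWN = ["RECK"]


def signature_score(expr):
    """expr: dict mapping gene -> standardized expression for one patient."""
    return mean(expr[g] for g in UP) - mean(expr[g] for g in DOWN)


def split_by_median(scores):
    """Label each patient 'high' or 'low' risk relative to the cohort median."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


def toy_patient(up_level, down_level):
    """Build a hypothetical expression profile with flat levels per group."""
    expr = {g: up_level for g in UP}
    expr.update({g: down_level for g in DOWN})
    return expr


# Hypothetical cohort (not data from the study).
cohort = {
    "P1": toy_patient(1.0, -1.0),
    "P2": toy_patient(0.0, 0.0),
    "P3": toy_patient(-0.5, 1.0),
    "P4": toy_patient(0.5, 0.5),
}
scores = {pid: signature_score(expr) for pid, expr in cohort.items()}
risk = split_by_median(scores)
```

The resulting risk labels are what a Kaplan-Meier or Cox analysis would then take as the grouping variable.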

Results

Identification of a Blood-Based Gene Signature for Early Stage Lung Cancer

We performed an integrated genomic analysis using four different transcriptomic and genomic datasets for human NSCLC. The analysis of transcriptomic and copy number alteration (CNA) datasets (GSE37745 and GSE76730; 190 NSCLC) revealed 2,280 significantly expressed genes with copy number alterations (Jabs et al., 2017) (Figure 1). Comparing whole-genome gene expression profiles of early-stage NSCLC (n = 279) with normal samples (n = 59) revealed 7,337 genes (adjusted p-value < 0.05 and absolute fold change (FC) ≥ 1.5). Moreover, comparison of the transcriptome from patients’ blood with that from normal controls yielded 728 genes. Using the Venn diagram approach to identify the DEGs common to the mRNA, CNA, early-stage NSCLC, and blood gene expression datasets, we found 21 genes shared by all datasets (Figure 1). We then identified 12 genes (Table 1), defined as the “12-gene signature,” that are significantly associated with patients’ survival by performing survival analysis on over 1,000 lung cancer samples (Figure 1).
TABLE 1

List of the 12-gene signature identified in this study.

Gene      Gene name                                                         p-value    FC
FAM83A    family with sequence similarity 83, member A                      1.82E-60   58.9
GEMIN7    gem (nuclear organelle) associated protein 7                      4.56E-21   1.75
ITPA      inosine triphosphatase (nucleoside triphosphate pyrophosphatase)  3.50E-12   1.52
NOP58     NOP58 ribonucleoprotein                                           3.21E-28   1.67
NR2C2AP   nuclear receptor 2C2-associated protein                           8.59E-25   1.84
RECK      reversion-inducing-cysteine-rich protein with kazal motifs        1.52E-41   -3.34
TIMM50    Translocase of inner mitochondrial membrane 50 homolog            3.19E-14   1.64
TOMM40    Translocase of outer mitochondrial membrane 40 homolog (yeast)    1.60E-13   1.60
UBE2C     ubiquitin-conjugating enzyme E2C                                  1.50E-42   12.5
XPO5      exportin 5                                                        9.52E-31   1.97
ZNF696    zinc finger protein 696                                           3.76E-12   1.59
ZNF7      zinc finger protein 7                                             2.34E-19   1.62

Abbreviations: FC, fold change. FC is calculated from the mean expression values in tumor relative to normal samples using data from The Cancer Genome Atlas (TCGA; Stage I only). A negative (−) value indicates downregulation.

Diagnostic and Prognostic Significance of the 12-Gene Signature

To test the diagnostic value of the 12-gene list, we performed unsupervised two-dimensional hierarchical clustering and principal component analyses (PCA) on two datasets (TCGA, n = 576 and E-MTAB-5231, n = 39 samples). The unsupervised PCA and the two-dimensional hierarchical clustering clearly distinguished patients from normal control samples in both datasets (Figure 2).
FIGURE 2

Two-dimensional hierarchical clustering using our gene signature clearly separated patients from normal controls in (A) TCGA (n = 576) and (C) E-MTAB-5231 (n = 39), respectively. The hierarchical clustering revealed two main clusters, one mainly composed of tumors and another composed of normal controls. Samples are denoted in columns and genes are denoted in rows. Unsupervised PCA for (B) TCGA (n = 576) and (D) E-MTAB-5231 (n = 39). Red indicates tumor and blue denotes normal samples.

We confirmed the prognostic significance of our blood-based gene signature for overall as well as recurrence-free survival using a dataset with detailed clinical data from over 1000 NSCLC patients. The analysis demonstrated that a high expression score based on the 12 genes is significantly associated with poor disease outcome (Figures 3A,B). The 12-gene signature separated the patients into high-risk and low-risk groups. Patients in the high-risk group had a significantly worse prognosis than those in the low-risk group (p-value < 1 × 10^-16; Figure 3). Patients in the high-risk group were more than twice as likely to die from the disease as those in the low-risk group (Figure 3A). Similarly, patients in the high-risk group had poorer progression-free survival than patients in the low-risk group (Figure 3B).
FIGURE 3

Prognostic significance of the 12-gene signature. (A) Overall and (B) progression free survival (PFS) analysis using NSCLC tumor samples (n = 1,144 samples). (C) Multivariate analysis using histology (adenocarcinoma and squamous cell carcinoma), gender, and smoking history as covariates. (D) Classification results of our gene signature using nearest centroid with proportional prior probability algorithm.
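
The survival curves in panels (A) and (B) rest on the Kaplan-Meier (product-limit) estimator. The following is a minimal sketch of that estimator for a single group, using hypothetical follow-up times and event flags rather than the study's data; the function name is likewise an assumption.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed event.

    times: follow-up time for each patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Patients with events or censoring at t leave the risk set.
        at_risk -= leaving
    return curve


# Hypothetical follow-up data for one risk group.
toy_times = [1, 2, 3, 4, 5]
toy_events = [1, 0, 1, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

Computing one such curve per risk group and comparing them with a log-rank test corresponds to the analysis summarized in the figure.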

Moreover, the multivariate analyses indicated that our 12-gene signature predicts disease outcome independently of other clinicopathological variables, such as histology, smoking history, and gender (HR = 2.64, 95% CI = 1.72–4.07; p-value = 1.3 × 10^-8) (Figure 3C). Furthermore, the 12-gene classifier that we designed using the nearest centroid algorithm with proportional prior probability achieved over 99% accuracy in classifying samples as tumors or normal controls (Figure 3D).
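
To make the idea of a nearest centroid classifier with proportional priors concrete, the sketch below assigns a sample to the class whose centroid minimizes squared Euclidean distance minus twice the log of the class prior, with priors set to the class proportions in the training data. The exact formulation used by PARTEK Genomics Suite is not stated in the text, so this scoring rule, the function names, and the two-feature toy data are assumptions.

```python
from math import log


def fit_centroids(X, y):
    """Return {label: (centroid, prior)}; priors are class proportions."""
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        model[label] = (centroid, len(rows) / len(y))
    return model


def predict(model, x):
    """Pick the class minimizing squared distance minus 2*log(prior)."""
    def cost(item):
        centroid, prior = item[1]
        d2 = sum((a - b) ** 2 for a, b in zip(x, centroid))
        return d2 - 2.0 * log(prior)
    return min(model.items(), key=cost)[0]


# Hypothetical standardized expression values for two features.
X_train = [[2.0, 1.5], [1.8, 2.2], [2.2, 1.9], [-1.0, -0.8], [-1.2, -1.1]]
y_train = ["tumor", "tumor", "tumor", "normal", "normal"]
model = fit_centroids(X_train, y_train)
label_a = predict(model, [1.5, 1.5])
label_b = predict(model, [-0.9, -1.0])
```

In practice the feature vector would hold the 12 standardized gene expression values, and accuracy would be estimated by cross-validation rather than on the training samples.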

Validation in Blood and Functional and Network Analyses

The 12-gene signature score was compared between blood samples from patients and healthy controls (GSE69732) and was significantly higher in tumor than in normal samples (p-value = 0.002, Figure 4A). Functional and gene network analyses of the gene signature performed using IPA indicated that the 12 genes were significantly associated with cancer, cell cycle, cellular movement, molecular transport, RNA trafficking, cell morphology, organ development, and tumor morphology (Figure 4B). Moreover, gene interaction networks revealed several key genes and cancer-related pathways that may play a role in early NSCLC transformation and disease progression, including FAM83A, ZNF696, UBE2C, RECK, TIMM50, GEMIN7, and XPO5 (Figure 4C).
FIGURE 4

(A) 12-gene signature score based on mRNA expression in blood from tumor vs. normal. (B) Gene ontology and functional analysis of the 12-gene signature. The x-axis represents the significance (–log10 (p-value)) of the functional term. A p-value of 0.05 is indicated as the threshold line in the figure. (C) Gene interaction network analyses of the 12-gene signature. Red/green indicates higher/lower expression in NSCLC compared to controls. Solid lines denote direct interactions and dashed lines indirect ones.


Discussion

In this study, we sought to identify blood-based biomarkers with diagnostic and prognostic potential for early lung cancer through integrated analysis of multiple independent, high-dimensional transcriptomic and genomic datasets, with the goal of detecting the disease at an early stage in patients’ biological fluids rather than through invasive techniques. We identified a 12-gene signature using an integrated omics approach and validated its diagnostic and prognostic significance for overall and recurrence-free survival using samples from over 1000 lung cancer patients with detailed clinical data. The analysis demonstrated that a high 12-gene signature score was significantly associated with poor disease outcome. Previous studies reported that integrated analysis of transcriptomic and genomic data may yield reliable biomarkers that are more robust in disease classification and may play a role in tumorigenesis (Colak et al., 2010; Al-Harazi et al., 2016; Chakraborty et al., 2018; Al-Harazi et al., 2021b). Indeed, several potential cancer driver genes involved in tumor initiation and progression have been identified using this approach (Colak et al., 2010; Colak et al., 2013; Ohshima et al., 2017). Functional, pathway, and gene network analyses revealed significant biological functions, including cancer, cell cycle, cellular movement, molecular transport, and RNA trafficking, as well as several key genes and cancer-related pathways that may be important for NSCLC transformation, including FAM83A, ZNF696, UBE2C, RECK, TIMM50, GEMIN7, and XPO5. Some of the identified genes have been reported to be associated with cancers, including lung cancer. For example, FAM83A was found to be highly expressed in lung tumors (Li et al., 2015; Snijders et al., 2017). RECK is downregulated in esophageal squamous cell carcinoma (ESCC) and associated with poor survival in ESCC (Zhu et al., 2017).
The UBE2C gene is overexpressed in different types of cancer and is considered a new target for cancer therapies (Dastsooz et al., 2019). Moreover, we used a machine learning algorithm to develop a classification model based on our 12-gene signature and tested it on data from over 500 lung cancer patients, which yielded 99% prediction accuracy. In conclusion, the 12-gene signature identified in this study reveals several genes and pathways that may be essential for early NSCLC transformation and progression and has the potential to detect the disease in patients’ blood without invasive techniques. Integrated omics and network analyses may lead to robust biomarkers for the detection of early lung cancer and to improved diagnosis, prognosis, and therapeutic options.

References (30 in total)

1.  Oncogenic pathway signatures in human cancers as a guide to targeted therapies.

Authors:  Andrea H Bild; Guang Yao; Jeffrey T Chang; Quanli Wang; Anil Potti; Dawn Chasse; Mary-Beth Joshi; David Harpole; Johnathan M Lancaster; Andrew Berchuck; John A Olson; Jeffrey R Marks; Holly K Dressman; Mike West; Joseph R Nevins
Journal:  Nature       Date:  2005-11-06       Impact factor: 49.962

2.  Biomarker discovery in non-small cell lung cancer: integrating gene expression profiling, meta-analysis, and tissue microarray validation.

Authors:  Johan Botling; Karolina Edlund; Miriam Lohr; Birte Hellwig; Lars Holmberg; Mats Lambe; Anders Berglund; Simon Ekman; Michael Bergqvist; Fredrik Pontén; André König; Oswaldo Fernandes; Mats Karlsson; Gisela Helenius; Christina Karlsson; Jörg Rahnenführer; Jan G Hengstler; Patrick Micke
Journal:  Clin Cancer Res       Date:  2012-10-02       Impact factor: 12.531

3.  Prediction of recurrence-free survival in postoperative non-small cell lung cancer patients by using an integrated model of clinical information and gene expression.

Authors:  Eung-Sirk Lee; Dae-Soon Son; Sung-Hyun Kim; Jinseon Lee; Jisuk Jo; Joungho Han; Heesue Kim; Hyun Joo Lee; Hye Young Choi; Youngja Jung; Miyeon Park; Yu Sung Lim; Kwhanmien Kim; YoungMog Shim; Byung Chul Kim; Kyusang Lee; Nam Huh; Christopher Ko; Kyunghee Park; Jae Won Lee; Yong Soo Choi; Jhingook Kim
Journal:  Clin Cancer Res       Date:  2008-11-15       Impact factor: 12.531

4.  Gene expression-based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study.

Authors:  Kerby Shedden; Jeremy M G Taylor; Steven A Enkemann; Ming-Sound Tsao; Timothy J Yeatman; William L Gerald; Steven Eschrich; Igor Jurisica; Thomas J Giordano; David E Misek; Andrew C Chang; Chang Qi Zhu; Daniel Strumpf; Samir Hanash; Frances A Shepherd; Keyue Ding; Lesley Seymour; Katsuhiko Naoki; Nathan Pennell; Barbara Weir; Roel Verhaak; Christine Ladd-Acosta; Todd Golub; Michael Gruidl; Anupama Sharma; Janos Szoke; Maureen Zakowski; Valerie Rusch; Mark Kris; Agnes Viale; Noriko Motoi; William Travis; Barbara Conley; Venkatraman E Seshan; Matthew Meyerson; Rork Kuick; Kevin K Dobbin; Tracy Lively; James W Jacobson; David G Beer
Journal:  Nat Med       Date:  2008-07-20       Impact factor: 53.440

5.  Integrated analysis of gene expression and copy number data on gene shaving using independent component analysis.

Authors:  Jinhua Sheng; Hong-Wen Deng; Vince D Calhoun; Yu-Ping Wang
Journal:  IEEE/ACM Trans Comput Biol Bioinform       Date:  2011 Nov-Dec       Impact factor: 3.710

6.  Epidermal growth factor receptor tyrosine kinase defines critical prognostic genes of stage I lung adenocarcinoma.

Authors:  Mai Yamauchi; Rui Yamaguchi; Asuka Nakata; Takashi Kohno; Masao Nagasaki; Teppei Shimamura; Seiya Imoto; Ayumu Saito; Kazuko Ueno; Yousuke Hatanaka; Ryo Yoshida; Tomoyuki Higuchi; Masaharu Nomura; David G Beer; Jun Yokota; Satoru Miyano; Noriko Gotoh
Journal:  PLoS One       Date:  2012-09-19       Impact factor: 3.240

7.  Integrated analysis of gene expression and copy number identified potential cancer driver genes with amplification-dependent overexpression in 1,454 solid tumors.

Authors:  Keiichi Ohshima; Keiichi Hatakeyama; Takeshi Nagashima; Yuko Watanabe; Kaori Kanto; Yuki Doi; Tomomi Ide; Yuji Shimoda; Tomoe Tanabe; Sumiko Ohnami; Shumpei Ohnami; Masakuni Serizawa; Koji Maruyama; Yasuto Akiyama; Kenichi Urakami; Masatoshi Kusuhara; Tohru Mochizuki; Ken Yamaguchi
Journal:  Sci Rep       Date:  2017-04-04       Impact factor: 4.379

8.  A Comprehensive Bioinformatics Analysis of UBE2C in Cancers.

Authors:  Hassan Dastsooz; Matteo Cereda; Daniela Donna; Salvatore Oliviero
Journal:  Int J Mol Sci       Date:  2019-05-07       Impact factor: 5.923

9.  RNA-Seq transcriptome profiling in three liver regeneration models in rats: comparative analysis of partial hepatectomy, ALLPS, and PVL.

Authors:  Dilek Colak; Olfat Al-Harazi; Osama M Mustafa; Fanwei Meng; Abdullah M Assiri; Dipok K Dhar; Dieter C Broering
Journal:  Sci Rep       Date:  2020-03-23       Impact factor: 4.379

10.  Integrative analysis of genome-wide gene copy number changes and gene expression in non-small cell lung cancer.

Authors:  Verena Jabs; Karolina Edlund; Helena König; Marianna Grinberg; Katrin Madjar; Jörg Rahnenführer; Simon Ekman; Michael Bergkvist; Lars Holmberg; Katja Ickstadt; Johan Botling; Jan G Hengstler; Patrick Micke
Journal:  PLoS One       Date:  2017-11-07       Impact factor: 3.240
