Literature DB >> 27081700

Gene identification for risk of relapse in stage I lung adenocarcinoma patients: a combined methodology of gene expression profiling and computational gene network analysis.

Vienna Ludovini¹, Fortunato Bianconi², Annamaria Siggillino¹, Danilo Piobbico², Jacopo Vannucci³, Giulio Metro¹, Rita Chiari¹, Guido Bellezza⁴, Francesco Puma³, Maria Agnese Della Fazia², Giuseppe Servillo², Lucio Crinò¹.

Abstract

Risk assessment and treatment choice remains a challenge in early non-small-cell lung cancer (NSCLC). The aim of this study was to identify novel genes involved in the risk of early relapse (ER) compared to no relapse (NR) in resected lung adenocarcinoma (AD) patients using a combination of high throughput technology and computational analysis. We identified 18 patients (n.13 NR and n.5 ER) with stage I AD. Frozen samples of patients in ER, NR and corresponding normal lung (NL) were subjected to Microarray technology and quantitative-PCR (Q-PCR). A gene network computational analysis was performed to select predictive genes. An independent set of 79 ADs stage I samples was used to validate selected genes by Q-PCR.From microarray analysis we selected 50 genes, using the fold change ratio of ER versus NR. They were validated both in pool and individually in patient samples (ER and NR) by Q-PCR. Fourteen increased and 25 decreased genes showed a concordance between two methods. They were used to perform a computational gene network analysis that identified 4 increased (HOXA10, CLCA2, AKR1B10, FABP3) and 6 decreased (SCGB1A1, PGC, TFF1, PSCA, SPRR1B and PRSS1) genes. Moreover, in an independent dataset of ADs samples, we showed that both high FABP3 expression and low SCGB1A1 expression was associated with a worse disease-free survival (DFS).Our results indicate that it is possible to define, through gene expression and computational analysis, a characteristic gene profiling of patients with an increased risk of relapse that may become a tool for patient selection for adjuvant therapy.

Entities: Chemical Disease Gene Species

Keywords: cancer systems biology; computational biology; gene expression profiling; gene networks; lung adenocarcinoma

Mesh：

Year: 2016 PMID： 27081700 PMCID： PMC5058701 DOI： 10.18632/oncotarget.8723

Source DB: PubMed Journal: Oncotarget ISSN： 1949-2553

INTRODUCTION

Lung cancer is the most common cause of death from cancer worldwide, and non-small cell lung cancer (NSCLC: 20% squamous and 80% adenocarcinoma (AD) histology) accounts for almost 80% of such deaths [1]. Approximately 25-30% of patients with NSCLC present with localized disease at the time of diagnosis and undergo surgery with curative intent. Despite complete tumor resection, however, the 5-year survival of these patients is poor and 40 to 70% of them will develop systemic disease with or without local relapses and will eventually die from it. No reliable clinical or molecular predictors are currently available to identify those at high risk for developing recurrent disease and the tumor-node-metastasis (TNM) staging system [2] remains the most powerful tool to predict prognosis in these patients. After radical surgery, adjuvant cisplatin-based chemotherapy (four cycles) has been established as the standard care in stage II and III patients with good performance status, rapid postoperative recovery and adequate organ function based on the results of three phase III trials and of the LACE meta-analysis showing an absolute gain of 5.3% in 5-year survival [3]. The role of adjuvant chemotherapy in stage I radically resected diseaseremains mostly undetermined due to both underrepresentation in clinicaltrials [4] and the survival benefit expected to be small in this population. Most guidelines state that it can be considered for selected patients (younger age, good performance status, large tumors, visceral pleural invasion and inadequate staging) with stage IB (sixth TNM classification) disease. Advances in genome-wide sequencing and microarray analysis have stimulated research in molecular prognostics and have allowed the identification of molecular signatures that can promote a more precise classification and prognostication of human cancers. Recentstudies in patients with early-stage NSCLC have shown that genomic profiles constructed from patient series with long-term follow-up are able to outperform standard pathologic TNM staging in estimating risk ofdisease recurrence [5-9]. However, the signatures often contain large numbers of genes with limited information about their functional importance. This problem limits the clinical application of those signatures. The aim of this study was to identify novel genes increased or decreased in early relapse (ER) compared to no relapse (NR) lung cancer AD stage I patientsand determine the role of a specific signature of the tumor to predict patient prognosis. In this study we used a computational biology approach to construct a survival-related gene network in ADs and to identify genes which were consistently co-expressed with many survival-related genes important in multiple biological processes. The goal is to provide clinical oncologists with more information on tumor biology which could guide therapeutic interventions.

RESULTS

Patient characteristics

We identified 18 patients with stage I disease (TNM version 7.0) from a series of 110 consecutive early stage radically resected AD patients who were referred to the Thoracic Surgery Unit of the Perugia University Hospital, Italy. Thirteen of these 18 patients were defined as NR (patients without evidence of relapse: median follow-up 133.7 months, range 51.6-145.8) while 5 of 18 were classified as ER (patients relapsed within a year: median follow-up 10.4 months, range 0.5-54.2). Normal lung (NL) tissue taken from distant or contralateral lung from patients with AD was used as calibrator for microarray and quantitative PCR (Q-PCR) analysis. Clinical and pathologic characteristics of the two groups (n.13 NR and n.5 ER) are shown in Table 1. Median age was 64 (range 44-84) in NR patients and 67.4 (range 56-77) in ER patients; most of them were male (76.9% NR, 100% ER).

Table 1

Patient characteristics

Characteristics	Patients NR (n. 13)	Patients ER (n. 5)
Median age, years	64 (44-84)	67.4 (56-77)
Median follow-up, months (range)	133.7 (51.6-145.8)	10.4 (0.5-54.2)
Gender: Male/Female (%)	10/3 (76.9/23.1)	5/0 (100/0)
Ever smokers: Yes/No (%)	12/1 (92.3/7.7)	5/0 (100/0)
Stage: IA/1B (%)	8/5 (61.5/38.5)	1/4 (20/80)

Data analysis and real time Q-PCR validation

Microarray analysis showed a panel of 436 differentially expressed genes for NR vs NL (p<0.01) (232 increased and 204 decreased) and a panel of 342 differentially expressed genes forER vs NL (p<0.01) (179 increased and 163 decreased). The heat map plot for gene expression ratio (the (log10) of ER vs NL, NR vs NL and ERvs NR) is reported in Figure 1A. Based on the lower (0.10) and upper (0.90) quartiles of the distribution of the logarithm of the fold of ER versus NR, we selected 50 genes (19 increased and 31 decreased) shown in Figure 1B (additional data are available in Supplementary Table S1 and Table S2). We validated by Q-PCRthe expression levels of 19 increased and 31 decreased genes in three pools and in each individual patient sample. The logarithm of the ratio ER versus NR for the increased and decreased genes is shown in Figures 2A and 2B, respectively.

Figure 1

Microarray results

A. Heat map plot for gene expression ratio- the (log10) of ER vs Normal, NR vs Normal and ER vs NR. B. Heat map plot for the selected genes: lower (0.10) and upper (0.90) quartiles of the distribution of the logarithm of the fold of ER versus NR.

Figure 2

Validation experiments for the increased and decreased selected genes (orange diamond), Q-PCR pool (cyan square) and Q-PCR patients (violet triangle)

A. Logarithm of the ratio ER vs NR for the increased genes. B. Logarithm of the ratio ER vs NR for the decreased genes.

Microarray results

Validation experiments for the increased and decreased selected genes (orange diamond), Q-PCR pool (cyan square) and Q-PCR patients (violet triangle)

A. Logarithm of the ratio ER vs NR for the increased genes. B. Logarithm of the ratio ER vs NR for the decreased genes. Eighteen of the 19 increased genes showed a concordance of 94.7% between expression levels of microarray and those of the Q-PCR pool (r=0.32, p=0.004): only the IL1R1 gene showed a different value in the Q-PCR pool than the value of the microarray. Similar results were observed in the decreased genes (30 of the 31, (r=0.36 p<0.001): only the REG4 gene showed a different value in the Q-PCR pool than the value of the microarray. Fourteen of the 19 increased genes showed a concordance of 73.6% between expression levels of microarray and those of the patient Q-PCR (r=0.29, p=0.36): 4 genes (PPBP, ITGB8, IL1RL1, INSL4) showed a different value in the patient sample Q-PCR than that ofthe microarray. Twenty-five of the 31 decreased genes showed a concordance of 80.6% between expression levels of microarray and those of the patient Q-PCR (r=0.26, p=0.24): six genes (IGHD, GKN2, WIF1, KLB,PAPPA2, MSMB) showed a different value in the patient sample Q-PCR thanthe value of the microarray.

Computational analysis

The selected 14 increased and 25 decreased genes were used to perform a computational gene network analysis. The goal of computational analysis is to select a few key genes from the starting group that are relevant in the activated and inhibited gene networks. The activated gene network was generated with GeneMania tools using as query input the 14 increased genes. Figure 3Ashows the GeneMania network output- obtained setting as inputs: the human organism data, the query gene-based as weighting method, suggestedwhen the input gene list contains more than 5 genes, and co-expression,pathways and predicted as network interactions [10]. The black nodes are the input genes (H19 gene symbol was replaced with the alias CDKN1C gene symbol), the grey nodes are the recommended nodes (the dimensions are proportional to the GeneMania network score) and theviolet link thickness is related to the weight of co-expression interaction inferred from the public data and used in the network generation algorithms [11].

Figure 3

Computational analysis for increased and decreased genes

Computational analysis for increased and decreased genes

A. GeneMANIA networks of validated genes B. Community Landscape Analysis obtained with ModuLand plug-in. The four modules were plotted in the graph using a different color for each groupof nodes in the same module. C. Key nodes that predict the function of all four modules. D. GeneMANIA networks of validated genes E. Community Landscape Analysis obtained with ModuLand plug-in. The five modules were plotted in the graph using a different color for each groupof nodes in the same module. F. Key nodes that predict the function of all five modules. The GeneMania network was elaborated with the ModuLand plug-in that provides an algorithm for determining extensively overlapping network modules [12]. Figure 3B shows the four modules generated for the increased GeneMania network. The nodes in the same module have the same color. ModuLand also identifies several hierarchical layers of modules, where meta-nodes of the higher hierarchical layer represent modules of the lower layer. The tool assigned module cores, which predict the function of the whole module, and determine key nodes that connect two or multiple modules. For the activated gene network the key nodes were HOXA10, CLCA2, AKR1B10, FABP3 (Figure 3C). As for the increased genes, we applied the same steps as those used for the decreased genes (XIST gene symbol was replaced with the alias ECEL1 gene symbol). Figure 3D shows the GeneMania output. In this case there are yellow links that arethe predicted interactions not proposed by the recommender systems in increased genes. We applied the ModuLand algorithm to the decreased GeneMania network and five modules were obtained (Figure 3E). The selected genes as the central nodes that best predict the function of the five modules obtained with ModuLand were: SCGB1A1, PGC, CHIA and S100P (Figure 3F). CHIA and S100P were not included inthe selected decreased genes but are proposed by the GeneMania recommender systems that integrate the query input information with the public [13]. The same methodology appliedto the increased genes was reproduced to select the validated decreasedgenes using community centrality score [14]. CHIA and S100P were replaced with TFF1, PSCA, SPRR1B and PRSS1 that were the validated genes with the highest value of the score for their module. The 4 increased and 6 decreased genes with their functions are shown in Table 2.

Table 2

Function of selected genes

Gene Name	Expression	Functions	References
HOXA10	Increased	Homebox A10 (HOXA10) is part of the A cluster, on chromosome 7, of the class of transcription factorscalled homebox genes. It encodes a DNA-binding transcription factor that may regulate gene expression, morphogenesis, and differentiation. More specifically, it may function in fertility, embryo viability, and regulation of hematopoietic lineage commitment	[26]
CLCA2	Increased	The protein encoded by Chloride channel accessory 2 (CLCA2) gene, on chromosome 1, belongs to the calcium sensitive chloride conductance protein family. Since this protein is expressed predominantly in trachea and lung, it is suggested to play a role in the complex pathogenesis of cystic fibrosis. It may also serve as adhesion molecule for lung metastatic cancer cells, mediating vascular arrest and colonization, and furthermore, it has beenimplicated to act as a tumor suppressor gene for breast cancer.	[47] [48]
AKR1B10	Increased	Aaldo-keto reductase family 1,member B10 (AKR1B10) gene encodes a member of the aldo/keto reductase superfamily, which consists of more than 40 known enzymes and proteins. It is highly expressed in lung, adrenal gland, small intestine and colonand may play an important role in liver carcinogenesis.	[49]
FABP3	Increased	The intracellular fatty acid-binding proteins (FABPs) belong to a multigene family. FABPs are divided into at least three distinct types, hepatic-, intestinal- and cardiac-type. They participate in the uptake, intracellular metabolism and/or transport of long-chain fatty acids, in the modulation of cell growth and proliferation. FABP3 gene contains four exons and its function is to arrest growth of mammary epithelial cells. This gene is acandidate tumor suppressor gene for human breast cancer.	[50]
SCGB1A1	Decreased	Secretoglobin, family 1A, member 1(SCGB1A1) gene encodes a member of the secretoglobin family of small secreted proteins. The encoded protein has been implicated in numerous functions including anti-inflammation, inhibition of phospholipase A2 and the sequestering of hydrophobic ligands. Defects inthis gene are associated with a susceptibility to asthma	[51]
PSCA	Decreased	Prostate stem cell antigen (PSCA) gene encodes a glycosylphosphatidylinositol-anchored cell membrane glycoprotein. In addition to being highly expressed in the prostate it is also expressed in the bladder, placenta, colon, kidney and stomach. This gene is up-regulated in a large proportion of prostatecancers and is also detected in cancers of the bladder and pancreas. The function of PSCA in tumor biology and the regulatory mechanism of PSCA expression still remains unknown.	[30]
PGC	Decreased	Progastricsin (pepsinogen C) (PGC) gene encodes an aspartic proteinase that belongs to the peptidase family A1. The encoded protein is a digestive enzyme that is produced inthe stomach and constitutes a major component of the gastric mucosa. This protein is also secreted into the serum. This protein is synthesized as an inactive zymogen converted into its active mature format low Ph.	[52]
PRSS1	Decreased	Protease, serine, 1 (trypsin 1) (PRSS1) gene encodes a trypsinogen, which is a member of the trypsin family of serine proteases. This enzyme is secreted by the pancreas and cleaved to its active form in the small intestine. This gene and severalother trypsinogen genes are localized to the T cell receptor beta locuson chromosome 7.	[53]
TFF1	Decreased	Trefoil factor 1 (TFF1) gene is a members of the trefoil family. They are stable secretory proteins expressed in gastrointestinal mucosa. Their functions are not defined, but they may protect the mucosa from insults, stabilize the mucus layer,and affect healing of the epithelium. This gene, which is expressed in the gastric mucosa, has also been studied because of its expression in human tumors. This gene and two other related trefoil family member genes are found in a cluster on chromosome 21.	[54]
SPRR1B	Decreased	Small proline-rich proteins (SPRRs) multi-gene family maps on chromosome 1. Their expression is highin epithelia of oral tissues such as tongue, esophagus and stomach, in contrast to external dry epithelia, such as skin. It has a role during squamous differentiation of skin and respiratory epithelial cells. Moreover, SPRR1 is expressed in squamous tumors of the lung. However, its role in non-squamous cells is largely unknown; it seems that it alsooccurs in non-squamous tissues and cell lines. This protein family is an important component of the cornified cell envelope, a structure formed beneath the plasma membrane of squamous differentiated cells by extensive cross-linking of several proteins.	[55], [56], [57]

Independent validation set by Q-PCR

To further explore the impact of increased or decreased genes on disease free survival (DFS) we selected FABP3 (increased gene) and SCGB1A1 (decreased gene) as having the two highest GeneMania discriminat network scores (Supplementary Table S3) [13]. They were evaluated by Q-PCR analysis on an independent study set of 79ADs stage I samples. Of these patients, 63 were in NR with a median DFSof 31.7 months (range 12.1-71.4), while the remaining 16 were in ER with a median DFS of 7.4 months (range 0.9-11.8). Median age was 67 years (range 38-81) for NR patients and 68 years (55.0-75.5) for ER patients; most of them were male (66.7% NR, 62.5% ER) and smokers (69.8%NR, 93.8% ER) (Table 3 panel A). The box plots of mRNA expressions of FABP3 and SCGB1A1 in NR and ER groups are shown in Figure 4A and 4B,respectively. High FABP3 expressions were associated with ER patients (p=0.017) with respect to those in NR, while low SCGB1A1 expressions were showen in ER patients as compared to those in NR with border line significance (p=0.111). At univariate analysis, high FABP3 expression and low SCGB1A1 expression were independent predictors of shorter DFS (P=0.034, HR=3.39, CI 95% 1.09-10.52, p=0.037, HR=0.29 CI 95% 0.09-0.92,respectively), as reported in Table 3 panel B and Figure 4C and 4D, respectively.

Table 3

Independent patient sample: patient characteristics (Panel A) and disease free survival with Cox model for the selected genes (Panel B)

Panel A
Characteristics	Patients NR (n. 63)	Patients ER (n.16)
Median age, years	67.0 (38-81)	68.0 (55.0-75.5)
Median DFS, months (range)	31.7 (12.1-71.4)	7.4 (0.9-11.8)
Median follow-up, months (range)	37.2 (12.1-71.4)	9.4 (0.9-44.6)
Gender: Male/Female (%)	42/21 (66.7/33.3)	10/6 (62.5/37.5)
Ever smokers: Yes/No (%)	44/19 (69.8/30.2)	15/1 (93.8/6.2)
Stage: IA/1B (%)	42/21 (66.7/33.3)	8/8 (50.0/50.0)

Figure 4

Box plots and Kaplan-Meier estimates for disease-free survival (DFS) for an independent patient population

A. Box plot for the logarithm of FABP3 gene expression for NR and ER patients. B. Box plot for the logarithm of SCGB1A1 gene expression for NR and ER patients. C. Kaplan-Meier estimates for disease-free survival (DFS) according to lowand high FABP3 expression with respect to the mean of this gene expression in the study population. D. Kaplan-Meier estimates fordisease-free survival (DFS) according to low and high SCGB1A1 expression with respect to the mean of this gene expression in the studypopulation.

Box plots and Kaplan-Meier estimates for disease-free survival (DFS) for an independent patient population

DISCUSSION

The identification of new genes involved in the risk of early relapse of radically resected lung AD patients could assist intheir clinical management. Current evidence supports adjuvant cisplatin-based chemotherapy (four cycles) as standard of care in stagesII and III radically resected NSCLC, while most stage IA and IB patients receive only surgical resection. As a significant proportion ofstage I patients will relapse within 3 years, identification of early-stage patients with a poor prognosis could assist in selecting theappropriate candidates for adjuvant therapy. Several genomic and proteomic approaches have been made to identify signatures that can moreaccurately stratify NSCLC patients [15]. To our knowledge this is the first study that combines high throughput gene expression and computational analysis to identify stage I radically resected patients with an increased risk of relapse. This study proposes a new overall methodology to identify genesthat are relevant in the prediction of disease relapse starting from anRNA sample pool of patients with homogenous stage (only stage I) and histopathological type (ADs). Previous gene profiling studies [16-18]used non-homogenous patient stages and histotypes and a large number ofgenes were selected to predict outcomes that are not easily applicable in the clinical setting. The methodology of our study by computational and bioinformatic analysis allowed us to select a few genes that discriminated for risk of relapse and can be used in clinical practice. Microarray technology is commonly used in basic medical and biological research but has not yet been used for routine analysis in clinical practice. Today, various microarray platforms are used for generating RNA expression profiling and there is no gold standard, which probably are two possible explanations of conflicting results [16-19]. Real-time Q-PCR is considered the—gold standard' for gene expression analysis and is commonly used for validation of microarray results [20]. In our study, using Affymetrix microarray technology, we were able to identify a number of genes that were subsequently validated using real time Q-PCR, both in the pools and in individual RNA samples of selected patients in ER and NR disease. Validation of expression levels of increased and decreased genes by Q-PCR in the pools showed a concordanceof 94.7% with respect to expression levels of microarray data (p<0.001). Due to the variability of gene expression among the RNA samples, as observed in another study [21],the validation of expression levels of increased and decreased genes byQ-PCR in the individual patients showed a lower concordance (73.6% for increased and 80.6% for decreased genes) with respect to the values of microarray expression levels (p<0.001). Using computational analysis we were able to identify 4 increased and 6 decreased genes as the central nodes that best predict the function of the five modules obtained with ModuLand. The increased genes HOXA10, CLCA2, AKR1B10 and FABP3 were all present inthe validated genes, while for the decreased genes only SCGB1A1 and PGCwere in the validated gene list. CHIA and S100P were proposed by the GeneMania recommender systems and related to the centrality score to TFF1, PSCA, SPRR1B and PRSS1 that were in the selected list. Our ER patients presented a specific high expression of HOXA10, CLCA2, AKR1B10 and FABP3 (increased genes) and a low expression of SCGB1A1, PGC, TFF1, PSCA, SPRR1B and PRSS1 (decreased genes) with respect to NR patients. As FABP3 and SCGB1A1 genes had the highest discriminant network score in the computational analysis, we used an independent population of stage I Ads to assess their prognostic significance on DFS. We demonstrated that high FABP3 expression and low SCGB1A1 expression was associated with a worse DFS. In NSCLC the prognostic role of FABP3 is not known, while in gastric cancer, high expression of heart-type FABP protein is associated with increased tumoraggressiveness, metastasis and poor prognosis [22]. Interestingly, in NSCLC cell lines, increased expression of heart-FABP appears to have a role in predicting response to gefitinib [23]. Davidson's study, using a methodology similar to that of our study, identified sets of genes that were differentially expressed in leiomyosarcoma (LMS) than in endometrial stromal sarcoma. Among these genes, FABP3 was predominantly overexpressed in LMS [24]. SCGB1A1 can be used as a marker of precancerous progression in the bronchial epithelium since it is involved in the process of damage repair and xenobiotic metabolism; when decreased it isoften associated with neoplastic transformation. As in our study, lowerlevels of this protein were associated with a worse clinical outcome ina population at high risk for lung cancer. The expression of SCGB1A1 also appears to be inversely correlated to regression of bronchial dysplasia and improvement in sputum cytometry assessment in smokers withhigh lung cancer risk. [25] HOXA10 has been reported to be increased in human lung cancer cell lines and lung tumor tissues with respect to normal lung [26]. CLCA2 has a significant effect on cell invasion and the metastatic process and its expression is increased as a result of cell damage. It regulates the apoptotic pathway, senescence and carcinogenicity trough p53 [27]. Moreover, the CLCA2 gene has been shown to act as a tumor suppressor in breast and colon cancer, where it is often decreased. On the contrary, overexpression of CLCA2 has been reported to be specifically associated with NSCLC. AKR1B10 has been proposed as a promising new diagnostic marker in NSCLC of smokers since it is often overexpressed inmoderate or poorly differentiated NSCLC (in 84% of squamous cell carcinomas and in 29% of adenocarcinomas), mostly in males and smokers [28]. Among decreased genes, PSCA has a role in signal transduction and in cell growth regulation. In a gastric cancer cell line it has been shown to have cell growth inhibitory activity [29] while in gallbladder adenocarcinoma it seems to be overexpressed and to correlate with decreased survival [30]. The expression status of PSCA in cancer cells appears to depend on the epithelium of their origin [31]. In NSCLC its expression is negatively correlated with prognosis [32]. The PGC gene has a general role in suppressing tumordevelopment and several studies reported that low expression of PGC protein was closely related to poor differentiation and unfavorable survival in patients with breast, prostate, gastric and ovarian cancers [33-35]. PRSS1 gene expression seems to induce apoptosis and to promote spontaneous pancreatitis [36] and its role is not known in NSCLC. TFFs is overexpressed in prostate, breast and lung AD and seems to be an indicator of worse prognosis. In lung adenocarcinoma, TFF1 gene expression is significantly associated with larger tumor size and acinar subtype [37]. Foekens et al. showed that decreased TFF1 in early breast cancer is an important variable for the identification of patients at high risk for recurrence and death [38]. In gastric tumors a decrease of TFF1 significantly reduced the apoptosis of the cell lines and facilitated their proliferation [39]. The SPRR1B gene has a role in the transition of cells to G0 and may disrupt the normal progression to mitosis resulting in changes in ploidy. SPRR1B is a potential biomarker for bronchial malignant transformation since its expression is markedly reduced duringtumor progression and its loss induces an irreversible malignant transformation [40]. In conclusion, our results indicate that it is possible to define, through gene expression and computational analysis, acharacteristic gene profiling of ER patients with an increased risk of disease relapse. We also validated FABP3 and SCGB1A1 genes as independent prognostic factors of worse DFS. This may pave the way to more effective patient selection for adjuvant studies and possibly yieldnovel therapeutic targets with potential for drug development.

MATERIALS AND METHODS

Overall workflow method

We introduced a workflow diagram to synthetize the overall methodology applied in this paper (Figure 5). The goal of the study was to select a subgroup of genes relevant for risk of early relapse in NSCLC patients. From a database of consecutive resected NSCLC patients we identified stage I adenocarcinomas. The samples of patients in ER, NR and normal lung (NL) tissue were then identified. Three RNA pools were created, one for each group (ER, NR, NL) that were the input of the Microarray Affymetrix technology. The 47000 probes output for ER, NR, e NL were analysed using bioinformatic tools: detection validation analysis (ER vs NL and NR vs NL), ranking analysis (ER vs NR) and log distribution fold changes quartiles analysis(ER vs NR). These analyses were used to select 19 increased and 31 decreased genes. A real time quantitative-PCR (Q-PCR) validation step for the selected genes was performed using single patient samples and three RNA pools. We selected the 14 increased and 25 decreased genes that showed a concordance between microarray and Q-PCR expression levelsin patients and pools. These genes were used as input for computationalanalysis based on GeneMania and ModLand tools of Cytoscape. We obtained4 increased and 6 decreased genes that could be used as a genetic signature to predict early relapse. Furthermore, an independent set of 79 DCs stage I samples was used to validate selected genes by Q-PCR.

Figure 5

Study flowchart

Patient selection

As illustrated in Figure 5, using aprospective tissue banking of 110 consecutive resected NSCLC patients at S. Maria della Misericordia Hospital in Perugia-Italy, we selected frozen specimens of patients with stage I lung adenocarcinoma [2], [41]. Patients and their corresponding tissues were divided as follow: 1) no relapse (NR) based on the estimated risk of recurrence of these patients [42]; 2) early relapse (ER) disease relapse within a year from surgery; 3) normallung (NL) tissue taken from distant or contralateral lung from both groups (1 and 2) of patients. From these three groups of patients, threeRNA pools were created: pool ER, pool NR, pool NL. Tissues were stored in a freezer at −80°C in a solution containing RNAlater (Qiagen S.p.A., Milan, Italy). Samples had to contain at least 50% tumor cells to be eligible for microarray analysis as determined by one reference pathologist (B.G.) on adjacent separate sections. The study was approved by the local Ethics Committeeand was conducted in accordance with ethical principles of the latest version of the Declaration of Helsinki. Written informed consent for gene expression analyses was obtained from each patient entering the study.

RNA extraction, microarray and Q-PCR

Total RNA was extracted from frozen tissues after thawing and homogenizing by IKA Ultra-Turrax and QIAzol Lysis Reagent. RNA was extracted with the phenol chloroform method. From the aqueous phase, RNA was automatically purified by BioRobot EZ1using EZ1 RNA Universal Tissue kit instrument according to the manufacturer's instructions (Qiagen S.p.A., Milan, Italy). RNA was eluted in 50 μl of RNase-free water and stored at −80°C until use. The quality, integrity and quantity of the total RNA was evaluated on Experion™ Bioanalyzer (Biorad Technologies, Italy). To create the three pools, a concentration of 1 γ for each sample belonging to each pool was used. The concentration of the three pools was assessed and the volume of each pool was calculated to obtain the required concentration (5 γ) for the Microarray. We compared gene expression profiling from pool-NL and cancer specimens from pool-NR and pool-ER, using Affimetrix human microarray HG-U133Plus 2.0 that evaluates the expression levels of more than 47,000 human transcripts and variants, including 38,500 well-characterized genes in asingle experiment. We used reverse transcription PCR (RT-PCR) to createfrom RNA complementary DNA (cDNA) using random primer technologies. Expression of cDNA for increased and decreased genes was measured in thethree pools (NR, ER, NL) and in each of the lung tissues of corresponding patients by real-time Q-PCR using SybrGreen Gene Expression Assays on Stratagene Mx3000P QPCR System. Premier Biosoft Beacon Designer was used to design the primers. The housekeeping gene was the HPRT (Hypoxantin PhosphoRibosyl Transferase); the calibrator wasthe RNA pool of normal lung tissue. All runs included a calibrator sample and a one no-template control, and all samples were measured in triplicate. Relative quantification was carried out using the 2−ΔΔCt method using HPRT as reference gene [43]. Increased and decreased genes were presented using radar plot for the logarithm of the rate of ER versus NR. The microarray data, pool and patient Q-PCR mean of the gene expressions were compared with the 0 of the logarithm of the ER and NR ratio. The correlation and the concordance were tested with Kendall's correlations with a poisson regression.

Bioinformatics analysis

Filtering criteria were applied to select significant genes. A one-sided Wilcoxon's Signed Rank test was the statistical method used to calculate the detection p-value (<0.01) that reflects the significance of the differences between perfect match intensity and mismatch intensity for each probe pair of the microarray. Next, a ranking analysis of microarray data based on F-statistic [44] (P< 0.01) was used to identify statistically significant genes from ERversus NR experiments. Agilent SIGNET and Matlab software were used to perform bio-informatic analysis. The increased and decreased genes to be analyzed in the validate setting were chosen considering the logarithm of the ratio between ER and NR and applying the lower and the upper quantiles (0.10 and 0.90) of the distribution.

Real time Q-PCR in the independent validation set

An independent validation set of 79 ADs patients collected at the S. Maria della Misericordia Hospital (Perugia, Italy) from 2008 to 2014 was used to further investigate, using Q-PCR, the impact of increased or decreased genes on disease- free survival (DFS). We evaluated FABP3 (increased gene) and SCGB1A1 (decreased gene) representing the network central nodes. Quantification of mRNA expression levels of two genes, were performed by real-time one-step RT-PCR using QuantiFast assay (Qiagen, Milan, Italy). The results were compared considering β-actin (internal reference gene) and the pool of normal tissues (calibrator) using the 2−ΔΔCtmethod. DFS and the 95% confidence intervals (CIs) were evaluated by the Kaplan–Meier method comparing the different groups by log-rank test. The Cox proportional hazards model was used to evaluate the prognostic role of each single parameter studied on DFS, in univariate analyses.

Gene network computational analysis

Gene network analysis was performed to evaluate the network representation of the molecular relationships between increased and decreased selected genes in ER vs NR. GeneMANIA, a Cytoscape plug-in, was used to predict increased and decreased gene interactions and to expand the networks with functionally similar genes, using available genomics and proteomics data [10]. Networks were generated using information derived from the co-expression, pathways and predicted categories. Key network nodes were identified through the analysis of network properties including connectedness and centrality. We used the ModuLand algorithm, implemented as plug-in of Cytoscape, to analyze the properties of the activated or increased and inhibited or decreased networks that were obtained using GeneMANIA. The ModuLand algorithm produced clusters (or modules) represented by key network nodes [12]. The function of these centralnodes best predicts the function of the module it represents. The central nodes are also likely to represent the key functional elements of the overall network and therefore can be used to prioritise future work [45], [46].

57 in total

1. Prostate stem cell antigen, a presumable organ-dependent tumor suppressor gene, is down-regulated in gallbladder carcinogenesis.

Authors: Hiroe Ono; Nobuyoshi Hiraoka; Yeon-Su Lee; Sang Myung Woo; Woo Jin Lee; Il Ju Choi; Akira Saito; Kazuyoshi Yanagihara; Yae Kanai; Sumiko Ohnami; Fumiko Chiwaki; Hiroki Sasaki; Hiromi Sakamoto; Teruhiko Yoshida; Norihisa Saeki
Journal: Genes Chromosomes Cancer Date: 2011-09-20 Impact factor: 5.006

Review 2. Microarray data analysis: from disarray to consolidation and consensus.

Authors: David B Allison; Xiangqin Cui; Grier P Page; Mahyar Sabripour
Journal: Nat Rev Genet Date: 2006-01 Impact factor: 53.242

3. The breast cancer beta 4 integrin and endothelial human CLCA2 mediate lung metastasis.

Authors: M Abdel-Ghany; H C Cheng; R C Elble; B U Pauli
Journal: J Biol Chem Date: 2001-04-24 Impact factor: 5.157

4. Initiation of assembly of the cell envelope barrier structure of stratified squamous epithelia.

Authors: P M Steinert; L N Marekov
Journal: Mol Biol Cell Date: 1999-12 Impact factor: 4.138

5. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses.

Authors: A Bhattacharjee; W G Richards; J Staunton; C Li; S Monti; P Vasa; C Ladd; J Beheshti; R Bueno; M Gillette; M Loda; G Weber; E J Mark; E S Lander; W Wong; B E Johnson; T R Golub; D J Sugarbaker; M Meyerson
Journal: Proc Natl Acad Sci U S A Date: 2001-11-13 Impact factor: 11.205

6. GeneMANIA Cytoscape plugin: fast gene function predictions on the desktop.

Authors: J Montojo; K Zuberi; H Rodriguez; F Kazi; G Wright; S L Donaldson; Q Morris; G D Bader
Journal: Bioinformatics Date: 2010-10-05 Impact factor: 6.937

7. Three-gene prognostic classifier for early-stage non small-cell lung cancer.

Authors: Suzanne K Lau; Paul C Boutros; Melania Pintilie; Fiona H Blackhall; Chang-Qi Zhu; Dan Strumpf; Michael R Johnston; Gail Darling; Shaf Keshavjee; Thomas K Waddell; Ni Liu; Davina Lau; Linda Z Penn; Frances A Shepherd; Igor Jurisica; Sandy D Der; Ming-Sound Tsao
Journal: J Clin Oncol Date: 2007-12-10 Impact factor: 44.544

8. Lung adjuvant cisplatin evaluation: a pooled analysis by the LACE Collaborative Group.

Authors: Jean-Pierre Pignon; Hélène Tribodet; Giorgio V Scagliotti; Jean-Yves Douillard; Frances A Shepherd; Richard J Stephens; Ariane Dunant; Valter Torri; Rafael Rosell; Lesley Seymour; Stephen G Spiro; Estelle Rolland; Roldano Fossati; Delphine Aubert; Keyue Ding; David Waller; Thierry Le Chevalier
Journal: J Clin Oncol Date: 2008-05-27 Impact factor: 44.544

9. Focal adhesion kinase activated by beta(4) integrin ligation to mCLCA1 mediates early metastatic growth.

Authors: Mossaad Abdel-Ghany; Hung-Chi Cheng; Randolph C Elble; Bendicht U Pauli
Journal: J Biol Chem Date: 2002-07-10 Impact factor: 5.157

10. Expression of human cationic trypsinogen (PRSS1) in murine acinar cells promotes pancreatitis and apoptotic cell death.

Authors: T Athwal; W Huang; R Mukherjee; D Latawiec; M Chvanov; R Clarke; K Smith; F Campbell; C Merriman; D Criddle; R Sutton; J Neoptolemos; N Vlatković
Journal: Cell Death Dis Date: 2014-04-10 Impact factor: 8.469

20 in total

1. Loss of RUNX1 is associated with aggressive lung adenocarcinomas.

Authors: Jon Ramsey; Kelly Butnor; Zhihua Peng; Tim Leclair; Jos van der Velden; Gary Stein; Jane Lian; C Matthew Kinsey
Journal: J Cell Physiol Date: 2017-11-01 Impact factor: 6.384

2. Identification of oral squamous cell carcinoma markers MUC2 and SPRR1B downstream of TANGO.

Authors: Tomonori Sasahira; Miyako Kurihara-Shimomura; Hiroyuki Shimomura; Anja Katrin Bosserhoff; Tadaaki Kirita
Journal: J Cancer Res Clin Oncol Date: 2021-02-23 Impact factor: 4.553

3. Reverse phase protein array (RPPA) combined with computational analysis to unravel relevant prognostic factors in non- small cell lung cancer (NSCLC): a pilot study.

Authors: Vienna Ludovini; Rita Chiari; Lorenzo Tomassoni; Chiara Antonini; Elisa Baldelli; Sara Baglivo; Annamaria Siggillino; Francesca Romana Tofanetti; Guido Bellezza; K Alex Hodge; Emanuel Petricoin; Mariaelena Pierobon; Lucio Crinò; Fortunato Bianconi
Journal: Oncotarget Date: 2017-06-14

4. Digoxin-induced anemia among patients with atrial fibrillation and heart failure: clinical data analysis and drug-gene interaction network.

Authors: Yubi Lin; Siqi He; Ruiling Feng; Zhe Xu; Wanqun Chen; Zifeng Huang; Yang Liu; Qianhuan Zhang; Bin Zhang; Kejian Wang; Shulin Wu
Journal: Oncotarget Date: 2017-06-16

5. Differences in reproductive toxicology between alopecia drugs: an analysis on adverse events among female and male cases.

Authors: Min Wu; Qingxiong Yu; Qingfeng Li
Journal: Oncotarget Date: 2016-12-13

6. Detection of oncogenic mutations in resected bronchial margins by next-generation sequencing indicates early relapse in stage IA lung adenocarcinoma patients.

Authors: Tangfeng Lv; Jiawei Zou; Hongbing Liu; Qin Shen; Zhenfeng Lu; XiaoJun Zhou; Xiaonan Wang; Yong Song
Journal: Oncotarget Date: 2017-06-20

7. Targeted next-generation sequencing reveals recurrence-associated genomic alterations in early-stage non-small cell lung cancer.

Authors: William C S Cho; Kien Thiam Tan; Victor W S Ma; Jacky Y C Li; Roger K C Ngan; Wah Cheuk; Timothy T C Yip; Yi-Ting Yang; Shu-Jen Chen
Journal: Oncotarget Date: 2018-11-20

8. INSL4 as prognostic marker for proliferation and invasiveness in Non-Small-Cell Lung Cancer.

Authors: Damiano Scopetti; Danilo Piobbico; Cinzia Brunacci; Stefania Pieroni; Guido Bellezza; Marilena Castelli; Vienna Ludovini; Francesca Romana Tofanetti; Lucio Cagini; Angelo Sidoni; Efisio Puxeddu; Maria Agnese Della-Fazia; Giuseppe Servillo
Journal: J Cancer Date: 2021-05-05 Impact factor: 4.207

Review 9. The Role of AKR1B10 in Physiology and Pathophysiology.

Authors: Satoshi Endo; Toshiyuki Matsunaga; Toru Nishinaka
Journal: Metabolites Date: 2021-05-21

10. Upregulation of lncRNA ZFAS1 promotes lung adenocarcinoma progression by sponging miR-1271-5p and upregulating FRS2.

Authors: Gang Fan; Jin Jiao; Feng Shen; Fulu Chu
Journal: Thorac Cancer Date: 2020-06-08 Impact factor: 3.500