Literature DB >> 25392690

Integrated DNA Copy Number and Gene Expression Regulatory Network Analysis of Non-small Cell Lung Cancer Metastasis.

Seyed M Iranmanesh1, Nancy L Guo2.   

Abstract

Integrative analysis of multi-level molecular profiles can distinguish interactions that cannot be revealed based on one kind of data in the analysis of cancer susceptibility and metastasis. DNA copy number variations (CNVs) are common in cancer cells, and their role in cell behaviors and relationship to gene expression (GE) is poorly understood. An integrative analysis of CNV and genome-wide mRNA expression can discover copy number alterations and their possible regulatory effects on GE. This study presents a novel framework to identify important genes and construct potential regulatory networks based on these genes. Using this approach, DNA copy number aberrations and their effects on GE in lung cancer progression were revealed. Specifically, this approach contains the following steps: (1) select a pool of candidate driver genes, which have significant CNV in lung cancer patient tumors or have a significant association with the clinical outcome at the transcriptional level; (2) rank important driver genes in lung cancer patients with good prognosis and poor prognosis, respectively, and use top-ranked driver genes to construct regulatory networks with the COpy Number and EXpression In Cancer (CONEXIC) method; (3) identify experimentally confirmed molecular interactions in the constructed regulatory networks using Ingenuity Pathway Analysis (IPA); and (4) visualize the refined regulatory networks with the software package Genatomy. The constructed CNV/mRNA regulatory networks provide important insights into potential CNV-regulated transcriptional mechanisms in lung cancer metastasis.

Entities:  

Keywords:  DNA copy number variation; lung cancer; mRNA gene expression; regulatory networks

Year:  2014        PMID: 25392690      PMCID: PMC4218678          DOI: 10.4137/CIN.S14055

Source DB:  PubMed          Journal:  Cancer Inform        ISSN: 1176-9351


Introduction

Lung cancer has the highest mortality rate for both men and women in the world. Lung cancer includes non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC). NSCLC constitutes 85% of all lung cancer cases, and the other 15% lung cancers are SCLC. NSCLC has two major subtypes of histology: squamous cell lung carcinoma and lung adenocarcinoma (AC).1 Surgery is the main treatment option for stage I NSCLC.2 Unfortunately, 35–50% of stage I NSCLC patients will die from relapse or metastasis within 5 years after surgery and adjuvant chemotherapy for patient stages II and III have not shown good results.3 It is a challenging issue for physicians to identify patients for chemotherapy. In spite of a few studies describing transcriptional profiling for lung cancer prognosis,3,4 currently, there is no fully validated clinical model for predicting lung cancer prognosis or chemoresponse.5 Therefore, it remains an important research issue to identify prognostic and predictive genes for improving lung cancer treatment. In the search of clinically important biomarkers, many studies have ranked genes based on clinical outcome and used top-ranked genes in a classifier.6 However, rank-based feature selection methods cannot model the complex interactions among genes in disease.7 It has been shown that the genes that have a statistically significant association with outcome results are not necessarily good classifiers.8–10 Discovering networks of molecular interactions is much more important than finding a list of genes in personalized medicine.11 Molecular network analysis had been shown to be useful in disease classification12 and identification of novel therapeutic targets.13 DNA copy number variations (CNVs) refer to the copy number changes in a chromosomal segment, often observed in tumor tissues.14,15 Several studies have shown that CNV may be linked to the change of expression levels in some genes.14,16,17 Recent studies estimate that CNVs are responsible for >15% of heritable variation in gene expression (GE).18 The relation between GE and DNA copy number is complex. For cancer cells in general, the impact of CNV on GE is found to be important. Many studies have investigated the relation between CNV and GE. Many of them consider correlation between GE and CNV, in a manner of gene by gene across all samples.19,20 Nevertheless, the correlation between CNV and GE was not consistent across all genes.21 Moreover, each cancer is unique and has its own driver genes. Driver genes are involved in cancer pathogenesis, in contrast to “passenger genes,” which mutate during pathogenesis without having an effect on cancer.22 It is an important issue to identify driver genes and distinguish them from “passenger genes” that have no discerning advantage.23 Integrative analysis of CNV and GE could recognize potential cancer driver genes.24 A challenge in finding the candidate drivers is that the number of regulators (potential candidate drivers) is large and is related to the amount of available data. As CNV regions contain a large number of genes, it is difficult to identify a driver gene that is a correct regulator.25 Therefore, there is a need to use some information in addition to DNA copy number, such as GE, in the search for the functionally important driver genes. CONEXIC23 is a method that combines both CNV and GE to detect driver genes located in a deleted or amplified region and constructs regulatory networks based on the identified driver genes. Each potential driver gene changes in some tumors and is considered to have a major role in regulating expression of a group of genes.23 In this study, we present a novel framework based on CONEXIC to integrate CNV and GE and apply it to model lung cancer progression. First, candidate driver genes were selected from a pool of genes that were either amplified or deleted in NSCLC tumors, or were associated with metastasis at DNA copy number level or mRNA expression level. Second, CONEXIC was used to construct regulatory networks based on these candidate drivers and rank the importance of the driver genes. Third, because the predicted regulatory networks were large, Ingenuity Pathway Analysis (IPA) was used to reduce them to the networks of experimentally validated interactions. Finally, these refined regulatory networks were visualized with Genatomy.26

Materials and Methods

DNA copy number and mRNA profiles in patient cohorts

Two datasets were used in this study. The first dataset contains 271 NSCLC tumor samples with DNA copy number profiles. These 271 samples are histologically divided into lung AC (n = 179) and squamous cell carcinoma (SQCC; n = 92). This dataset also contains GE profiles for 49 samples (n = 29 for AC; n = 20 for SQCC). This dataset is available in NCBI Gene Expression Omnibus (GEO) with accession number GSE31800. DNA copy number profiles of the first dataset were quantified for each sample with whole-genome tiling path array comparative genomic hybridization (aCGH). Details of the genomic array, DNA extraction, labeling, and hybridization were described previously.27 aCGH is a technique for measuring the changes in chromosomal segments.28 The main difference between CGH and mRNA expression is that DNA is hybridized rather than mRNA transcript.29 CGH consists of log-ratio normalized intensities from disease versus normal samples. With resolution enhancements, aCGH is becoming more powerful. Consequently, this method has more advantages comparing to cytogenetic techniques such as fluorescence in situ hybridization (FISH).29 The GE for this dataset was generated by Affymetrix GeneChips30 and is available at GEO with accession number GSE31800. The second dataset includes SQCC tumors with DNA copy number profiles (n = 201) and mRNA expression profiles (n = 132). DNA copy number values were generated with Agilent 415 microarrays.31 Copy number estimations for each tumor were refined using tangent normalization. Tangent normalization divides tumor signals by signal intensities from the linear combination of all normal samples.31 GE of this dataset was quantified using Agilent 244k microarrays. Preparation, hybridization, and processing to produce GE was previously described,32 and the dataset is available in The Cancer Genome Atlas (TCGA) data portal1 Lung SQCC section.31 Clinical information of these two patient cohorts is shown in Table 1.
Table 1

Clinical information of patient cohorts analyzed in this study.

VARIABLESTCGA DATASET (31) (n=201)GEO DATASET (GSE31800; n=271)
Gender
Male28%NA
Female72%NA
Race
White65%NA
Black or African American5%NA
Asian2%NA
NA28%NA
Histological type
Lung adenocarcinoma0179
Lung squamous cell carcinoma20192
Age
Mean ± std67.5 ± 8.5NA
[Min, Max] (Median)[39, 85] (68)NA
Vital status
Alive58%NA
Dead42%NA
Pathological tumor stage
Stage I55%NA
Stage II24%NA
Stage III21%NA
Ethnicity
Not Hispanic or Latino59%NA
Hispanic or Latino2%NA
NA39%NA
Tobacco smoking history
Current Smoker16%NA
Reformed smoker for < or = 15 years56%NA
Reformed smoker for > 15 years21%NA
Non-Smoker4%NA
NA3%NA

Matching DNA probes to genes

DNA copy number values were assigned to the corresponding genes with a mapping scheme. DNA copy number datasets contain information about the chromosomal location of each probe and its copy number value. There is no chromosomal information of genes in GE datasets, requiring the use of some public repositories such as UCSC Genome Browser2 and MatchMiner3. To find the chromosomal location for each gene, gene names should be entered to the software in a batch mode, and the output includes chromosomal location. The DNA copy number probes must then be matched to the corresponding genes. Among different methods to match genes to copy number probes, distance matching33 was used. This method matches each gene to the closest probe that is on the same chromosome.

Detecting DNA copy number aberrations

There is a need to convert raw log-ratio values in the original aCGH data to absolute measures of DNA copy numbers. These absolute measures usually have three states: loss (less than two copies), normal (two copies), and gain (three to four copies). This process is named “calling.”34 CGHcall was used to detect these different DNA copy number states.34 CGHcall classifies different regions of CNV based on mixture models. First, it uses circular binary segmentation (CBS) to segment DNA copy number values of aCGH. The CBS method has been shown as one of the strongest methods for segmentation.35 Second, CGHcall considers the fact that loss, normal, and gain levels are not uniform across all the samples. It allows fluctuation by using random effects. Finally, it uses a combination of segmentation results and mixture model to figure out the most likely state per segment.34 This package was used in its default setting with human as the tissue type.

CONEXIC

CONEXIC23 is a computational algorithm that performs integrative analysis of CNV and GE in cancer. This algorithm is based on module network36 and combines matched GE and CNV of the samples to identify cancer driver genes. CONEXIC uses a Bayesian scoring function to detect the combination of modulators among the amplified or deleted regions, which explains the behaviors of GE modules across patient samples. The score measures how well a modulator can predict the behaviors of a GE module. CONEXIC searches for high-score modulators to recognize most probable driver genes in a stepwise manner: (1) selection of candidate drivers, (2) a single modulator step that builds an initial network between candidate drivers and gene modules, and (3) an iterative network learning step that improves the initial model. This method not only searches for important driver genes but also constructs regulatory networks. The concept of regulation networks or regulation program originates from Segal et al.36 This software package was used in its bootstrapping mode. A score function was used to rank cancer driver genes in the CONEXIC method. The score function is a Bayesian function that maximizes the joint probability of data and model structure. Let D be the data and S represent the structure of the network. The scoring function is: The first term is the likelihood of the data for a given model, and it has a normal gamma distribution. The normal gamma distribution function gives higher score to the data with lower variance. Therefore, it splits the data into two completely different distributions. The second part is a priori on the structure of data, which is a penalty score on network complexity. The penalty function has two parts. The first part penalizes the number of leaves in each regulation program using exponential distribution over the total number of leaves. The second part is a network width penalty function that penalizes (1) the number of genes in the module for each modulator and (2) the number of distinct split values that a modulator has. Hence, the scoring function would be: The parameters of the score are β, x, y, which can be determined by optimizing the log-likelihood of data through cross-validation test.23,37

IPA

IPA4 is a curated database that enables the search and visualization of experimentally confirmed molecular interactions and networks. Published molecular interactions can be explored from this web-based software application. This database gathers information from multiple resources, including experimental repositories or text collections of published literatures. In this study, only direct interactions among genes in human tissues were considered experimentally and confirmed molecular interactions in IPA analysis.

Genatomy

After constructing regulatory networks with CONEXIC and refining them using IPA, Genatomy was used for visualizing the networks. Genatomy is a software toolset for visualizing biological data such as GE, genotypes, and DNA copy number information.26 The software package was used based on its default settings. The CNV and GE values were normalized with the min and standard deviation to be in the range of [−1, 1].

Results

The scheme to perform integrative analysis of CNV and GE contains the following steps (Fig. 1). First, candidate driver genes were selected using DNA copy number information and mRNA prognostic genes. CGHcall34 was used to detect the aberrant regions of DNA copy numbers. Genes that showed consistent copy number aberrations in NSCLC compared with normal tissues in both datasets were selected. In addition, genes that had a significant association with patient survival time based on their copy number status were selected. The remaining candidate driver genes were mRNA prognostic genes identified in our previous studies. Second, after identifying candidate driver genes, CONEXIC was used to construct regulatory networks for good prognosis and poor prognosis NSCLC patients, respectively. Third, as the constructed regulatory networks were large, IPA was used to reduce the gene interactions to only experimentally validated ones. Finally, the Genatomy package was used to visualize the refined regulatory networks.
Figure 1

Overview of integrative analysis of DNA copy number and GE.

Selecting candidate driver genes

The selection of candidate driver genes consists of two parts: first, DNA copy number information was used to find the aberrant regions and second, mRNA prognostic genes previously identified in our studies7,38,39 were included. To identify candidate driver genes based on DNA copy numbers, the CGHcall package was used. This method distinguishes three different states of CNV: loss (deletion), normal, and gain (amplification). Genes in the normal state have two copy numbers, in a loss state have less than two copies, and in a gain mode have three or more copy numbers. The CGHcall package was run on three data-sets (the dataset GSE31800 containing both AC and SQCC was analyzed separately according to histology). Based on results from CGHcall, genes that showed to be aberrant in more than 50 percent of the NSCLC tumor samples compared with normal tissues were selected for further analysis. Supplementary Tables S1–S3 list the genes with CNV in three corresponding datasets: SQCC and AC in the dataset (GSE31800), respectively, and SQCC in the TCGA dataset.31 To further select relevant genes with CNV in NSCLC initiation, genes that had consistent CNV among different datasets were pinpointed. There were 30 genes in common with CNV of type loss between AC and SQCC in the dataset GSE31800 (Table 2). Among these genes, NEUROG1 was found to be an important marker in early detection of colorectal cancer.40 LYN was a mediator of epithelial—mesenchymal transition and a therapeutic target of dasatinib in breast cancer.41 CPNE3, ABCB9, and AZIN1 were shown to be involved in promoting NSCLC metastasis or mediating chemoresponse.42–44 Table 3 shows the common genes among SQCC tumors in the TCGA dataset31 and the GSE31800 dataset. Only three genes (C3orf31, SELT, and C3orf52) were found to have consistent CNV with gain copy number aberration in both patient cohorts. Among them, C3orf52 was a prognosis gene of cancer in a US patent.45 Genes listed in Tables 2 and 3 showed a consistent CNV in multiple patient cohorts and histology of NSCLC, indicating their important role in NCLSC initiation.
Table 2

Genes with consistent DNS CNVs between lung AC and SQCC in the dataset GSE31800. The percentage in this table stands for the percent of CNV in the corresponding patient cohort.

GENE NAMECHROMOSOME #AC N=179 [ACCN GSE31800]SQCC N=92 [ACCN GSE31800]CNV TYPE
UBE1DC13q22.151%65%Loss
CMTM63p22.351%65%Loss
ULK43p22.151%65%Loss
NKX2–55q3478%78%Loss
NEUROG15q23–q3178%78%Loss
LOC4411506p21.178%76%Loss
C6orf1536p21.178%76%Loss
C6orf1346p21.3378%76%Loss
C6orf1736q22.3278%76%Loss
C6orf1946p22.178%76%Loss
HIBADH7p15.268%67%Loss
IFRD17q31.168%67%Loss
PMPCB7q22.168%67%Loss
TRIB18q24.1351%68%Loss
AZIN18q22.351%68%Loss
CTSB8p2270%79%Loss
IKBKB8p11.270%79%Loss
IMPAD18q12.170%79%Loss
CPNE38q21.370%79%Loss
TUSC38p2270%79%Loss
RIPK28q2170%79%Loss
LYN8q1370%79%Loss
ENTPD48p21.370%79%Loss
ABCB912q2467%66%Loss
SPRYD412q13.367%66%Loss
DDIT312q13.1 s67%66%Loss
RAB22 A20q13.3271%64%Loss
NCOA620q1171%64%Loss
PRPF620q13.3371%64%Loss
STX1620q13.3271%64%Loss
Table 3

Genes with consistent CNV in the TCGA dataset (31) and SQCC samples in dataset GSE31800. The percentage in this table stands for the percent of CNV in the corresponding patient cohort.

GENE NAMECHROMOSOME #SQCC (n=201) (31)SQCC (n=92) (GSE31800)CNV TYPE
C3orf313p25.280%62%Gain
SELT3q25.182%62%Gain
C3orf523q13.280%64%Gain
Next, we sought to identify prognostic genes using three categories of CNV status (amplification, normal, and deletion). A Cox proportional hazards model was fit for the SQCC patients in the TCGA dataset.31 Genes with a significant (P < 0.05; Cox model) association between their CNV status and survival time in SQCC patients are listed in Table 4. Hazard ratios and their confidence intervals for the gain and loss categories are shown as well. A total of 30 genes had a significant association with patient survival time based on the gain status of DNA copy number, and 11 genes showed a strong association with survival time with their loss status. TADA3L is the only gene that had a strong association with survival time for both gain and loss DNA copy number value. TADA3L was associated with tumor suppressor gene p53 during the differentiation of hematopoietic stem and progenitor cells.46 As metastasis/relapse is the major treatment failure of lung cancer, aberrations in DNA copy numbers of genes in Table 4 are related to NSCLC metastasis and relapse.
Table 4

Genes with a significant association between CNV and survival time in SQCC tumors in the TCGA dataset (31). Top genes are identified based on P-value. Hazard ratios and their confidence intervals are shown. Significant hazard ratios are marked with an asterisk (*).

GENE NAMEP-VALUEGAIN-HAZARD (95% CI)LOSS-HAZARD (95% CI)
C14orf1730.00072.7851 [1.61, 4.81]*1.3415 [0.77, 2.32]
BRMS1L0.00072.7851 [1.61, 4.81]*1.3415 [0.77, 2.32]
CHURC10.00072.7851 [1.61, 4.81]*1.3415 [0.77, 2.32]
NEK90.00072.7851 [1.61, 4.81]*1.3415 [0.77, 2.32]
CIDEB0.00072.7851 [1.61, 4.81]*1.3415 [0.77, 2.32]
SERPINA30.00072.7851 [1.61, 4.81]*1.3415 [0.77, 2.32]
C14orf1720.00082.7625 [1.59, 4.77]*1.3195 [0.76, 2.28]
PTPRU0.00130.7796 [0.49, 1.23]0.2738 [0.12, 0.61]*
SEMA3F0.00140.4337 [0.24, 0.76]*1.1128 [0.54, 2.28]
TADA3L0.00150.3342 [0.19, 0.57]*0.4007 [0.18, 0.86]*
CCNB1IP10.00162.6560 [1.52, 4.62]*1.3451 [0.77, 2.34]
C14orf790.00162.6560 [1.52, 4.62]*1.3451 [0.77, 2.34]
UPB10.00160.3726 [0.20, 0.66]*0.7158 [0.42, 1.20]
SH3BP10.00160.3726 [0.20, 0.66]*0.7158 [0.42, 1.20]
TCF200.00160.3726 [0.20, 0.66]*0.7158 [0.42, 1.20]
MUSTN10.00200.4559 [0.25, 0.80]*1.1888 [0.57, 2.47]
ZC3H140.00212.5397 [1.46, 4.39]*1.2468 [0.71, 2.16]
C14orf500.00212.5397 [1.46, 4.39]*1.2468 [0.71, 2.16]
TDP10.00212.5397 [1.46, 4.39]*1.2468 [0.71, 2.16]
C14orf240.00212.5397 [1.46, 4.39]*1.2468 [0.71, 2.16]
KIAA08310.00212.5397 [1.46, 4.39]*1.2468 [0.71, 2.16]
PTGER20.00212.5397 [1.46, 4.39]*1.2468 [0.71, 2.16]
FOXN30.00212.5397 [1.46, 4.39]*1.2468 [0.71, 2.16]
PRMT50.00212.5397 [1.46, 4.39]*1.2468 [0.71, 2.16]
RIPK30.00212.5397 [1.46, 4.39]*1.2468 [0.71, 2.16]
ZFP36L10.00212.5397 [1.46, 4.39]*1.2468 [0.71, 2.16]
LGR60.00220.9346 [0.59, 1.47]0.3007 [0.13, 0.66]*
RFX50.00220.9346 [0.59, 1.47]0.3007 [0.13, 0.66]*
CD1D0.00220.9346 [0.59, 1.47]0.3007 [0.13, 0.66]*
FLVCR10.00220.9346 [0.59, 1.47]0.3007 [0.13, 0.66]*
MRPS140.00220.9346 [0.59, 1.47]0.3007 [0.13, 0.66]*
DAB10.00220.9346 [0.59, 1.47]0.3007 [0.13, 0.66]*
PUSL10.00220.9346 [0.59, 1.47]0.3007 [0.13, 0.66]*
CSDC20.00220.3736 [0.20, 0.67]*0.7728 [0.46, 1.29]
P2RXL10.00220.3736 [0.20, 0.67]*0.7728 [0.46, 1.29]
PNPLA30.00220.3736 [0.20, 0.67]*0.7728 [0.46, 1.29]
CSNK1E0.00220.3736 [0.20, 0.67]*0.7728 [0.46, 1.29]
CCDC1170.00220.3736 [0.20, 0.67]*0.7728 [0.46, 1.29]
ATP8B20.00231.0134 [0.64, 1.59]0.3111 [0.13, 0.69]*
NPPB0.00231.0111 [0.64, 1.58]0.3108 [0.13, 0.69]*
Additional candidate driver genes were from mRNA prognostic genes identified in our previous studies, showing a significant association with NSCLC patient survival and/or chemoresponse.7,38,39,47 These genes were further validated with Reverse transcription polymerase chain reaction (RT-PCR) assays of additional NSCLC patient samples (unpublished results). Supplementary Table S4 shows the complete list of candidate driver genes considered for constructing regulatory networks. Among these candidate driver genes, 30 genes had loss (Table 2) and 3 had gain CNV (Table 3) in the studied NSCLC cohorts. Forty genes had a strong association with SQCC survival time (Table 4), and the rest of the genes are mRNA prognostic genes identified in our previous studies.7,38,39,47 These candidate driver genes were used to construct regulatory networks.

Constructing CNV/mRNA regulatory networks

Having selected candidate driver genes as the first step, the second and third steps of the CONEXIC method were used to construct regulatory networks and rank the importance of driver genes based on DNA copy number and GE information. In the second step of the CONEXIC method, the single modulator step, network initialization between candidate drivers and gene modules, was completed. Genome-wide CNV and GE datasets were filtered before this step. Specifically, genes with a variance above 0.2 in each patient cohort were considered for further analysis, because genes with a constant level of copy number or expression across all samples are unlikely to have a role in the regulatory networks.23 After the data filtering, the TCGA dataset31 was separated into good prognosis and poor prognosis groups. Patients who survived more than 3 years after surgery were grouped as good prognosis, whereas those who had survival time less than 2.5 years were grouped as poor prognosis. This single modulator step was performed with non-parametric bootstrapping for 10 times23 on good and poor prognosis groups, respectively. The bootstrapping method guarantees the robustness of the results. The modulators that were selected in at least 60% of the runs were picked. These modulators were used for a final run of the single modulator step. After performing the second step, the relation among genes was initialized and the basic regulatory networks were created. These networks were considered as the starting point for the third step, which is the network learning step. The third step of the CONEXIC method was performed with non-parametric bootstrapping to ensure robustness of the results. This step was also completed for good prognosis and poor prognosis groups, separately. This step was also run 10 times, and the candidate driver genes that were selected in 60% of the runs were selected for the final run. Supplementary Table S5 shows the top-ranked genes (modulators) after running the final network learning step. As a result, each of these modulators is related to a large number of genes in the constructed module network. As mentioned earlier, part of candidate driver genes were mRNA prognostic biomarkers identified from our previous studies, with potential clinical utilities for personalized lung cancer treatment.7,38,39,47 After constructing regulatory networks, we sought to investigate the functional roles of these mRNA prognostic genes. As the regulatory networks were constructed separately for good prognosis and poor prognosis patients, respectively, investigating important driver genes and regulatory modules could potentially reveal important mechanisms in NSCLC progression and metastasis and their impact on clinical outcome. Table 5 shows the regulatory role of top-ranked prognostic mRNA biomarkers in poor prognosis patients. Specifically, SAMD4B, APOA2, ATAD4, and VASH1 were top-ranked modulators (driver genes) in poor prognosis, with a potential functional involvement in rendering adverse clinical outcome in NSCLC patients. The other mRNA prognostic genes in Table 5 were not modulators, but they are in the modules of other driver genes such as CTSB. Table 6 shows the regulatory role of the mRNA prognostic genes in the good prognosis dataset. In particular, ADH1B, CCL19, FHL1, VASH1, and RB1 were top-ranked modulators (driver genes) in good prognosis group. Other mRNA prognostic genes were not modulators, but they were in the modules that were regulated by driver genes such as RB1 and VASH1. In the comparison of Tables 5 and 6, modulators in good prognosis and poor prognosis groups were vastly different, suggesting that NSCLC tumor progression and the ultimate clinical outcome are driven by very different molecular mechanisms.
Table 5

mRNA prognostic biomarkers identified in our previous studies7,38,39 ranked as top driver genes in poor prognosis SQCC patients.

GENE NAMEMODULATOR? (# MODULES)*MODULATED BYREFERENCE
SCLYNoIFRD139
TNFSF9NoCTSB38
CD27NoSTAT439
DAG1NoSELTN/A#
SAMD4BYes (3)No modulatorN/A#
THBS1NoCTSB39
XPO1NoPTGER239
C8orf70NoPTGER239
STK24NoCTSB39
AKAP13NoCTSB38
APOA2Yes (3)No modulator7
CCL19NoSTAT47
CLIC2NoSTAT47
COL14A1NoVASH17
HMBOX1NoVASH17
IRF3NoSAMD4B7
ATAD4Yes (5)No modulator7
SLC39A8NoVASH17
APIN1NoIFRD17
TAF4NoSELT7
TOMM34NoCTSB7
VASH1Yes (2)SAMD4B7
VIPR2NoSERPINA37
HFENoIFRD139
HNF4ANoCTSBN/A#
STAT6NoSERPINA347

Notes:

If a gene is a modulator (driver gene), the number of modules regulated by it is listed in parentheses.

Unpublished mRNA prognostic biomarkers associated with NSCLC outcome.

Table 6

mRNA prognostic biomarkers identified in our previous studies7,38,39 ranked as top driver genes in good prognosis SQCC patients.

GENE NAMEMODULATOR? (# MODULES)*MODULATED BYREFERENCE
OGTNoC6orf13438
CCDC99NoCCL1939
CD27NoCCL1939
DAG1NoC6orf134N/A#
XPO1NoPRMT539
C8orf70NoRB139
AKAP13NoRB138
MSX2NoRB138
ADH1BYes (4)STAT47
ANXA6NoVASH17
CCL19Yes (4)No Modulator7
CLIC2NoADH1B7
COL14A1NoVASH17
FHL1Yes (1)VASH17
ICA1NoRB17
IRF3NoC6orf1347
IVDNoVASH17
SLC39A8NoADH1B7
SPIN1NoPRMT57
TAF4NoC6orf1347
VASH1Yes (4)ADH1B7
HFENoC14orf5039
RB1Yes (7)No Modulator7
STAT6NoCCL1947
ZNF638NoC6orf13438
UBE1L2noC6orf13439

Notes:

If a gene is a modulator (driver gene), the number of modules regulated by it is listed in parentheses.

Unpublished mRNA prognostic biomarkers associated with NSCLC outcome.

Refinement and visualization of CNV/mRNA regulatory networks

From the results of CONEXIC analysis, top-ranked driver genes were examined for further analysis. Specifically, CTSB was one of the top selected driver genes in poor prognosis patient group, and it was shown to modulate most mRNA prognostic genes, including TNSF9, THBS1, STK24, AKAP13, TOMM34, and HFE. Therefore, CTSB was selected for further analysis. For good prognosis group, RB1 was selected for further analysis because it is a prognostic biomarker gene by itself7 and modulated multiple mRNA prognostic biomarkers, including C8orf70, AKAP13, MSX2, and ICA1. In the regulatory network analysis, CTSB was related to 1,043 other genes in 10 different modules. RB1 modulated 681 genes in the 7 constructed regulatory modules. To refine the constructed regulatory networks, IPA was used to select experimentally validated molecular interactions in the module networks. Experimentally confirmed molecular interaction network for CTSB in poor prognosis SQCC patients is shown in Figure 2a, and the confirm network for RB1 in good prognosis SQCC patients is shown in Figure 3a, respectively. Next, Genatomy package was used to visualize the refined regulatory networks based on the text output from CONEXIC. Figure 2b shows the modules regulated by CTSB in poor prognosis SQCC patients, focusing on the modules involving previously identified mRNA prognostic genes. Similarly, Figure 3a shows RB1-modulated mRNA expression in good prognosis SQCC patients, focusing on the modules involving our identified mRNA biomarkers. These results reveal potential regulatory mechanisms of how CNV in a driver gene regulates the expression of a group of genes, among which are prognostic and predictive mRNA biomarkers associated with NSCLC metastasis and chemoresponse. The interactions among these genes in the computationally constructed regulatory networks were experimentally confirmed in the published literature with IPA.
Figure 2

Regulatory networks of CTSB in poor prognosis SQCC patient group. (A) Experimentally validated network of CTSB in poor prognosis SQCC patients (31) with IPA analysis. (B) CNV/mRNA regulatory network of CTSB in poor prognosis SQCC patients (31).

Notes: The top color bar under the driver gene shows the mRNA expression and the below one shows CNV. Green color indicates overexpression/amplification and red color indicates under-expression/deletion.

Figure 3

Regulatory networks of RB1 in good prognosis SQCC patient group. (A) Experimentally validated networks for RB1 in good prognosis SQCC patients (31) with IPA analysis. (B) CNV/mRNA regulatory network of RB1 in good prognosis SQCC patients (31).

Notes: Green color indicates overexpression/amplification, and red color indicates under-expression/deletion.

Discussion and Conclusion

Lung cancer is a complex disease involving numerous somatic mutations, amplifications, and deletions. Tumor relapse and metastasis is the major cause of treatment failure (ie, death) in lung cancer. Chromosomal CNVs are shown to be functionally important in regulating GE changes and genotypes in tumorigenesis and metastasis. While using GE changes alone has been shown to be a precise and feasible tool for clinical diagnostics and prognostics,48 it does not necessarily reveal molecular functional involvement and cancer mechanisms. Integrative analysis of GE and copy number is expected to identify cancer driver genes and prioritize them, as well as reveal disease mechanisms and provide insight into novel therapeutic targets. Regulatory network approaches have been developed for detecting GE affected by DNA CNVs, genetic biomarkers, or motif data. Lirnet,25 Geronimo,37 and CONEXIC23 are three major approaches based on module networks, which were originally conceived by Segal et al.36 CONEXIC was used in this study to construct CNV/mRNA regulatory networks because it models CNV together with GE, whereas Lirnet and Geronimo have not been used in such applications before. There are some other regulatory networks that were developed to model other molecular data types,49,50 but not with GE and DNA copy numbers. The novelty of CONEXIC is that it does not require the correlation between CNV and GE in the search for driver genes. Many methods for identifying driver genes among aberrant regions focus on the genes with correlated CNV and GE.51,52 As a matter of fact, in many cases, the expression of a driver gene is poorly connected with the DNA copy number.23 The reason behind this is that CNV is only one of the many causes to induce up-regulation or down-regulation of a gene. In fact, the expression of many driver genes is less correlated with their copy numbers compared with passenger genes. In contrast, the expression of genes in a module is more related to the expression of the corresponding driver gene. The algorithm of CONEXIC was designed based on these biological contexts, which has the following advantages over other methods. First, by assessing both CNV and GE, CONEXIC provides a better way to find important areas in the aberrant chromosomal regions as potential candidate driver genes, comparing to the methods that consider only DNA copy number. Second, CONEXIC associates the expression of a driver gene to the expression of genes in the corresponding module in constructing regulatory networks.23 However, this could be considered as a limitation of CONEXIC, as it does not consider CNV in constructing regulatory networks in the second and third steps of the algorithm. One of the future improvements on this method would be considering two types of regulators (GE regulators and CNV regulators) and combining these two types of regulators in constructing regulatory networks. In this study, we designed a framework based on CONEXIC to integrate CNV with GE to identify important driver genes in NSCLC and reveal experimentally confirmed regulatory networks in metastasis. In our approach, the candidate driver genes were selected and validated using multiple patient cohorts. Instead of using Genomic Identification of Significant Targets in Cancer (GISTIC) method53 as proposed in the original CONEXIC method, CGHcall34 was used to classify different regions of CNV based on mixture models. The selected candidate driver genes had either consistent CNV in different NSCLC patient cohorts and histology subtypes or a strong association with patient survival information based on their CNV status or mRNA expression levels. Next, regulatory networks were constructed for good prognosis and poor prognosis patient groups separately. The candidate driver genes were the same for both groups. After running CONEXIC with bootstrapping, the top-ranked driver genes and the constructed regulatory networks were very different in both groups, indicating that different molecular mechanisms render different clinical outcomes in NSCLC patients. As these computationally derived networks contained a large number of modules, IPA was used to select the genes and molecular interactions in the modules that were experimentally validated and reported in the literature. In this way, the gene modules were refined to the experimentally validated ones. Finally, Genatomy was used to visualize the refined regulatory networks. These experimentally confirmed regulatory networks reveal important CNV-regulated transcriptional activities in NSCLC metastasis. As many genes in the candidate drivers were NSCLC prognostic or chemoresponse predictive biomarkers identified in our previous studies, these results shed light on potential innovative therapeutic targets for NSCLC treatment. Supplementary Tables 1–5. Tables S1–S3 list the genes with CNV in three corresponding datasets: SQCC and AC in the dataset (GSE31800), and SQCC in the TCGA dataset. Table S4 shows the complete list of candidate driver genes considered for constructing regulatory networks. Table S5 shows the top-ranked genes (modulators) after running the final network learning step.
  52 in total

Review 1.  Array comparative genomic hybridization and its applications in cancer.

Authors:  Daniel Pinkel; Donna G Albertson
Journal:  Nat Genet       Date:  2005-06       Impact factor: 38.330

2.  Gene Ontology-driven transcriptional analysis of CD34+ cell-initiated megakaryocytic cultures identifies new transcriptional regulators of megakaryopoiesis.

Authors:  Peter G Fuhrken; Chi Chen; Pani A Apostolidis; Min Wang; William M Miller; Eleftherios T Papoutsakis
Journal:  Physiol Genomics       Date:  2008-02-05       Impact factor: 3.107

Review 3.  Lung cancer.

Authors:  Roy S Herbst; John V Heymach; Scott M Lippman
Journal:  N Engl J Med       Date:  2008-09-25       Impact factor: 91.245

Review 4.  Protein networks in disease.

Authors:  Trey Ideker; Roded Sharan
Journal:  Genome Res       Date:  2008-04       Impact factor: 9.043

5.  Integrated analysis of copy number alterations and gene expression: a bivariate assessment of equally directed abnormalities.

Authors:  Martin Schäfer; Holger Schwender; Sylvia Merk; Claudia Haferlach; Katja Ickstadt; Martin Dugas
Journal:  Bioinformatics       Date:  2009-10-14       Impact factor: 6.937

6.  Matrix-based comparative genomic hybridization: biochips to screen for genomic imbalances.

Authors:  S Solinas-Toldo; S Lampel; S Stilgenbauer; J Nickolenko; A Benner; H Döhner; T Cremer; P Lichter
Journal:  Genes Chromosomes Cancer       Date:  1997-12       Impact factor: 5.006

7.  Predicting cancer drug response by proteomic profiling.

Authors:  Yan Ma; Zhenyu Ding; Yong Qian; Xianglin Shi; Vince Castranova; E James Harner; Lan Guo
Journal:  Clin Cancer Res       Date:  2006-08-01       Impact factor: 12.531

8.  MicroRNA-31 inhibits cisplatin-induced apoptosis in non-small cell lung cancer cells by regulating the drug transporter ABCB9.

Authors:  Zhuo Dong; Zhiwei Zhong; Lihua Yang; Shaomin Wang; Zhaohui Gong
Journal:  Cancer Lett       Date:  2013-10-04       Impact factor: 8.679

9.  Confirmation of gene expression-based prediction of survival in non-small cell lung cancer.

Authors:  Nancy L Guo; Ying-Wooi Wan; Kursad Tosun; Hong Lin; Zola Msiska; Daniel C Flynn; Scot C Remick; Val Vallyathan; Afshin Dowlati; Xianglin Shi; Vincent Castranova; David G Beer; Yong Qian
Journal:  Clin Cancer Res       Date:  2008-12-15       Impact factor: 12.531

10.  Learning a prior on regulatory potential from eQTL data.

Authors:  Su-In Lee; Aimée M Dudley; David Drubin; Pamela A Silver; Nevan J Krogan; Dana Pe'er; Daphne Koller
Journal:  PLoS Genet       Date:  2009-01-30       Impact factor: 5.917

View more
  4 in total

1.  Network and pathway analysis of cancer susceptibility (a).

Authors:  Nancy Guo; Russell S Schwartz; Jiang Qian; Peilin Jia; Youping Deng
Journal:  Cancer Inform       Date:  2014-03-12

2.  Identifying subtype-specific associations between gene expression and DNA methylation profiles in breast cancer.

Authors:  Garam Lee; Lisa Bang; So Yeon Kim; Dokyoon Kim; Kyung-Ah Sohn
Journal:  BMC Med Genomics       Date:  2017-05-24       Impact factor: 3.063

3.  A machine learning framework for genotyping the structural variations with copy number variant.

Authors:  Tian Zheng; Xiaoyan Zhu; Xuanping Zhang; Zhongmeng Zhao; Xin Yi; Jiayin Wang; Hongle Li
Journal:  BMC Med Genomics       Date:  2020-08-27       Impact factor: 3.063

4.  Disease progression and pseudoprogression following partial response with rare gingiva metastasis upon IBI308 (sintilimab) immunotherapy reflects the complexity of metastatic lung cancer: a case report.

Authors:  Haiping Jiang; Lele Song; Zhenguang Chen; Yuan Gao; Yongfeng Ding; Qihan You; Lei Ye; Lifeng Li; Guifeng Liu; Nong Xu
Journal:  Transl Cancer Res       Date:  2021-01       Impact factor: 1.241

  4 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.