Literature DB >> 25335079

Integrated exon level expression analysis of driver genes explain their role in colorectal cancer.

Mohammad Azhar Aziz1, Sathish Periyasamy2, Zeyad Al Yousef3, Ibrahim AlAbdulkarim1, Majed Al Otaibi1, Abdulaziz Alfahed1, Glowi Alasiri1.   

Abstract

Integrated analysis of genomic and transcriptomic level changes holds promise for a better understanding of colorectal cancer (CRC) biology. There is a pertinent need to explain the functional effect of genome level changes by integrating the information at the transcript level. Using high resolution cytogenetics array, we had earlier identified driver genes by 'Genomic Identification of Significant Targets In Cancer (GISTIC)' analysis of paired tumour-normal samples from colorectal cancer patients. In this study, we analyze these driver genes at three levels using exon array data--gene, exon and network. Gene level analysis revealed a small subset to experience differential expression. These results were reinforced by carrying out separate differential expression analyses (SAM and LIMMA). ATP8B1 was found to be the novel gene associated with CRC that shows changes at cytogenetic, gene and exon levels. Splice index of 29 exons corresponding to 13 genes was found to be significantly altered in tumour samples. Driver genes were used to construct regulatory networks for tumour and normal groups. There were rearrangements in transcription factor genes suggesting the presence of regulatory switching. The regulatory pattern of AHR gene was found to have the most significant alteration. Our results integrate data with focus on driver genes resulting in highly enriched novel molecules that need further studies to establish their role in CRC.

Entities:  

Mesh:

Substances:

Year:  2014        PMID: 25335079      PMCID: PMC4204855          DOI: 10.1371/journal.pone.0110134

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

There is a wealth of information at omics level that associates cytogenetics and gene expression changes leading to colorectal cancer (CRC). The integration of gene expression and copy number (CN) data to identify DNA CN alterations that induce changes in the expression levels of the associated genes is a common task in cancer studies [1]–[3]. The central dogma of molecular biology has thus been addressed at two important levels. There have been many reports providing evidence of changes at the genome level in the form of copy number aberration [4], single nucleotide polymorphisms, loss of heterozygosity that attempt to understand the molecular events associated with colorectal cancer. These somatic or hereditary changes have different mechanism of contributing to initiation and progression of CRC. Loss and gain of crucial chromosomal regions leading to deletion or amplification of cancer related genes has been very well established. The functional significance of these molecular events has been measured using different tools and algorithms. Genes targeted by somatic copy-number alterations (SCNAs), in particular, play central roles in oncogenesis and cancer therapy [5]. Several tools have been made available to assess the potential of genes that get affected by SCNAs in causing colorectal cancer. 'Genomic Identification of Significant Targets in Cancer' (GISTIC) tool has successfully been used in identifying 'driver SCNAs' based on the frequency and amplitude of observed events [6], [7]. The second aspect of changes happening in tumour cells is at the transcription level. Differential expression analysis has been carried out to find out important genes playing a role in causing colorectal cancer. There could be several mechanisms by which the SCNA affected genes exert their effect at functional level. Amplifications and deletions in the genomic region are reflected in the transcript levels and could be detected by carrying out expression microarray based studies. By employing exon arrays, we gain extra dimension of the events happening at the exon level, which may lead to alternative splicing resulting in different gene isoforms. Alternative splicing is a crucial step in the generation of protein diversity and its misregulation is observed in many human cancer types [8]. The quest to explore the relationship between copy number changes and the expression level of affected genes/exons has received limited success owing to a number of reasons [9]. Technological improvements in the array design for cytogenetics as well as transcriptomics have improved the accuracy and precision of the data generated. Combining this with better analytical techniques and algorithms, possibilities of finding target genes responsible for causing colorectal cancer has further increased. Past few decades have seen a quest for finding novel genes that can serve as therapeutic targets or biomarkers. However, genes or proteins do not function alone but interact with each other to form networks or pathways so as to carry out biological functions [10]. Network-based approaches to finding biomarkers more closely represent in vivo molecular biology where a perturbation in one gene may affect many downstream genes. Cancer has thus been rightly addressed as a systems biology disease [11] as opposed to diseases caused by changes in few genes or mutations. Reconstructing gene regulatory networks in healthy and diseased tissues is therefore critical to understanding cancer phenotypes and devising effective therapeutics [12]. With the availability of tools and techniques to capture the molecular changes happening at different stages of the central dogma with increased precision and accuracy, we are yet to develop an integrated comprehensive picture that could help us in finding better targets for colorectal cancer. In the present study, we aim to integrate the information from the cytogenetic analysis with the exon level expression data using paired normal-tumour samples from colorectal cancer patients. A set of driver genes suggested by GISTIC analysis (termed as ‘driver genes’ from now onwards) were queried at gene, exon and network level. Both DNA and RNA for cytogenetics and transcriptomic studies, respectively, were extracted from the same tissue in a single workflow to minimize variation. This study provides evidence for explaining different possible mechanisms by which SCNA affected driver genes can exert their functional effect. A subset of driver genes were found to show gene level changes in expression. Most of these genes also indicated the exon level changes resulting in the formation of different isoforms. Network of GISTIC genes showed a clear shift in the transcription factors (TF) regulation. ATP8B1 gene was found to have novel association with colorectal cancer at cytogenetic, gene and exon level.

Materials and Methods

The study is approved by the ethics committee and Institutional Review Board (IRB) of King Abdullah International medical Research Center after due review process of the ethical aspects of the proposal. The necessary procedural and ethical consent forms were signed by each patient prior to sample collection.

Sample collection and RNA extraction

Sample collection was done as described previously [13]. The type and stage of all patient samples are provided in Table S1. The study was approved by institutional review board after the due process. Patients were consented and the records maintained in an approved manner. RNA was extracted from the same piece of tissue that was used to extract DNA for cytogenetic studies in a single workflow. Maceherey Nagel trio prep kit (Germany) was used to extract DNA and RNA in the same protocol. Quality and quantity was checked using Nanodrop (Thermo Fisher Scientific, USA).

Exon microarray

GeneChip Human Exon 1.0 ST Arrays along with WT Terminal Labelling and Controls Kit and Hybridization, Wash, and Stain Kit were obtained from Affymetrix USA. Ambion WT Expression Kit was obtained by Ambion, USA. 31 Tumour and 29 normal samples from 32 patients were processed. The data was extracted using Expression Console software from Affymetrix, USA. Quality control was carried out using Principal Component Analysis (PCA) and Integromics biomarker suite (TIBCO spotfire). All the data is deposited in GEO database with an accession number GSE50421.

Data Analysis

Before carrying out any gene/exon level analysis, principal component analysis (PCA) was done to identify outliers. Data from 4 normal samples and 7 tumour samples was subsequently removed.

Data analysis specifically with driver genes

Gene level analysis

For checking expression levels of 144 driver gene list generated by GISTIC analysis (13), we employed two different softwares – Expression console and AltAnalyze [14]. Two different softwares were used to confirm our results using independent methods. Signal estimates were derived from the CEL files of 60 samples (29 normal and 31 tumour) using Robust Multi-Array Average (RMA) for normalizing the data. The core exon-level probe sets were used to summarize the gene expression levels. The same list of 144 driver genes with expression values calculated using ‘Altanalyze’ was used for inference based (GENIE3) pathway/network analysis.

Exon level expression analysis

Altanalyze program was used to evaluate alternative splicing in driver genes. The raw data was filtered to remove probe sets that were considered to be non-expressed. A splicing score for filtered exons was calculated using splicing index method and exon/intron/splicing annotation were assigned to these results. A splice index p-value cut-off of <0.05 was used to filter alternative exon results. AltExonViewer – a component of Altanalyze and DomainGraph - a Cytoscape plugin were used to visualize splice index values and alternatively spliced exons. The splicing index (SI) value was calculated as described in [15]. Briefly, SI is log2 ratio of normalized intensities of tumour and normal samples. In our analysis ‘sample 1’ in the numerator was normal and ‘sample 2’ in the denominator was tumour.

Causal network analysis

For network analysis, knowledge and inference based approaches were used. For inference based approaches, GENIE3 [16], [17] was used to generate gene regulatory networks for tumour and normal samples. To generate the network, driver genes were classified as TF genes and target genes. Although 1000 interactions were inferred for each of the group, an interaction score>0.1 was chosen as a cut-off value. Using this information the regulatory networks were independently inferred for tumour and normal samples using Cytoscape.

Data analysis with entire probeset

Integromics Biomarker Discovery Suite

In order to find differentially expressed genes in our dataset, without any prior bias, data from CEL files was analyzed using the Integromics software (TIBCO Spotfire, USA) pipeline for affymetrix exon 1.0 ST arrays. Quantile normalization was done after removing the outliers using PCA. Both Significance Analysis of Microarrays (SAM) and Linear Models for MicroArray data (LIMMA) analyses was carried out with a cut-off of 0.01 for adjusted p-value and fold change of>1 or <−1. Two different yet complimentary methodologies were used to make our results more confident. Gene ontology enrichment was carried out on a list of 760 differentially expressed genes obtained from LIMMA analysis.

Ingenuity pathway Analysis

List of 760 genes from SAM/LIMMA analysis done using Integromics was used to perform ‘core’ and ‘biomarker’ analyses. Core analysis was done as described before [13]. For biomarker analysis following filters were used: Consider only molecules where (species  =  Human) AND (tissues/cell lines  =  KM-12 OR HCT-116 OR RKO OR Colon Cancer Cell Lines not otherwise specified OR COLO205 OR HT29 OR HCC-2998 OR HCT-15 OR SW-480 OR Other Colon Cancer Cell Lines OR Tissues and Primary Cells not otherwise specified OR SW-620) AND (diseases  =  Cancer) AND ((biomarker applications  =  All Biomarker Applications) AND (biomarker diseases  =  colon cancer OR colon carcinoma OR colon neoplasm OR colorectal adenoma OR colorectal cancer OR colorectal carcinoma)).

Results

The analysis strategy leading to following results is illustrated in Figure 1a&b.
Figure 1

Flow diagram for Analysis Strategy.

A)The entire analyses is categorized into fours stages from ‘Data Generation’ to ‘Network Analyses'. B) Analysis strategy using different programs is displayed in this diagram. There are three components of the analysis – Gene, Exon and Network handled by different programs. Gene level analyses are conducted using ‘Affymetrix, Expression/Transcriptome analysis console’ and ‘Tibco Spotfire’. Exon level analysis is carried out by ‘AltAnalyze’ and ‘Affymetrix power tools’. Network analyses employed ‘GENIE3’, ‘IPA’ and ‘Cytoscape’. ‘Nexus Copy Number’ is a program used in earlier studies to eventually generate a list of 144 driver genes.

Flow diagram for Analysis Strategy.

A)The entire analyses is categorized into fours stages from ‘Data Generation’ to ‘Network Analyses'. B) Analysis strategy using different programs is displayed in this diagram. There are three components of the analysis – Gene, Exon and Network handled by different programs. Gene level analyses are conducted using ‘Affymetrix, Expression/Transcriptome analysis console’ and ‘Tibco Spotfire’. Exon level analysis is carried out by ‘AltAnalyze’ and ‘Affymetrix power tools’. Network analyses employed ‘GENIE3’, ‘IPA’ and ‘Cytoscape’. ‘Nexus Copy Number’ is a program used in earlier studies to eventually generate a list of 144 driver genes.

A small subset of genes identified by GISTIC show a significant change in expression level

We studied expression patterns of driver genes at gene level using AltAnalyze and Expression Console softwares. These analyses produced complimentary results and yielded a list of 20 genes that were found to have a significant fold change of greater than 2 and a p-value <0.01 [Table 1]. 9 genes experienced a down regulation. BCAS1 with highest GISTIC score of 5.323 was among the most significantly downregulated genes. 11 genes showed an upregulation with IL6 and INHBA showing the highest fold change [Figure 2a (AltAnalyze), 2b (Expression Console)]. Fold change values of all 144 driver genes are given in Table S2.
Table 1

Driver Genes from GISTIC analysis showing more than two fold change in expression value as calculated by two different programs – AltAnalyze and Expression Console.

Gene SymbolGISTIC scoreAltAnalyzeExpression Console
BCAS1a , b 5.323−4.50771−5.02
AURKAa , b 4.7723.0440882.31
ATP8B1a 2.326−2.49709−2.55
IL6a , b 26.3533182.45
INHBAa , c 1.9443.0719114.18
PTP4A3a , b , c 1.8922.3669512.41
ABP1a , b 1.859−2.54991−2.78
AGR31.772−5.12325N/A
MACC11.7723.2729253.57
UCA11.1552.058481N/A
FGFR2a , b 0.892−2.72455−3.28
ADAM12a , b 0.8714.6467412.33
MUC130.704−2.15979N/A
MUC4a , b , c 0.66−4.18466−5.06
SLC7A11a 0.6373.0513632.72
ICAM10.5322.655218N/A
EXO10.4052.2683132.05
FAM46Ca 0.271−1.7528−2.03
HTRA10.2312.301062N/A
SLC35D1a 0.217−2.0238−2.06

N/A  =  No annotation for these genes were found in the analysis program.

Genes found to be differentially expressed in SAM/LIMMA analyses. CLDN7 and LOX genes were additional driver genes that were differentially expressed.

Genes found to have high splice index values.

Genes found to be eligible as biomarkers for colorectal cancer.

Figure 2

Significant change in expression value at gene level was observed in 20/144 genes.

Two different algorithms were used to measure expression values from Exon array data to support the results. AltAnalyze (1a) and Expression Console (1b) show complimentary results with maximum changes observed in BCAS1, INHBA, IL6 and MUC4 genes.

Significant change in expression value at gene level was observed in 20/144 genes.

Two different algorithms were used to measure expression values from Exon array data to support the results. AltAnalyze (1a) and Expression Console (1b) show complimentary results with maximum changes observed in BCAS1, INHBA, IL6 and MUC4 genes. N/A  =  No annotation for these genes were found in the analysis program. Genes found to be differentially expressed in SAM/LIMMA analyses. CLDN7 and LOX genes were additional driver genes that were differentially expressed. Genes found to have high splice index values. Genes found to be eligible as biomarkers for colorectal cancer. Three significantly down regulated genes (BCAS1, ABP1 and AGR3) were found in amplified regions of the genome whereas upregulated HTRA1 was found mostly in regions of loss. These genes experienced a change in their transcription factor in tumour and normal samples [Table 2].
Table 2

Differentially regulated genes found to have incoherent expression levels and genomic changes.

GeneGenomic levelGene level (AA, EC)Network level
ABP1Gain 53.33, Loss 13.33−2.55, −2.78TF in Normal  =  CTCFL TF Tumour  =  Unknown
AGR3Gain 60, Loss 0−5.12TF in Normal  = SMAD2 TF Tumour  =  Unknown
BCAS1Gain 80, Loss 0−4.5, −5TF in Normal  = SMAD4 TF Tumour  =  Unknown
HTRA1Gain 6.7, Loss 26.72.3TF in Normal  = CTCFLTF Tumour  = AHR

AA  =  Fold change value as calculated by AltAnalyze program.

EC  =  Fold change value as calculated by Expression Console program.

TF  =  Transcription Factor. Unknown is the TF that is not found in the driver genes.

AA  =  Fold change value as calculated by AltAnalyze program. EC  =  Fold change value as calculated by Expression Console program. TF  =  Transcription Factor. Unknown is the TF that is not found in the driver genes. Differential expression analysis using tumour-normal paired sample data of exon arrays from 32 patients was carried out. After removing outliers using PCA [Figure 3], 25 normal and 24 tumour samples were found suitable for further analyses. We employed non-parametric (SAM) and parametric (LIMMA) methods and found complimentary results. 6242 genes were differentially expressed with adjusted p-value of <0.01. 760 genes were found to be differentially regulated (fold change>1or <−1) of which 15 genes were common with driver genes from GISTIC analysis [Figure 4a-c and Figure S1]. BCAS1, AURKA, ATP8B1, IL6 and INHBA were the differentially regulated genes found to be among the top scorers in list of driver genes.
Figure 3

Principal Component Analysis of Exon array data from 32 patients.

60 samples from 32 patients were subjected to PCA and the outliers were removed. 4 normal and 7 tumour samples were removed from the final analysis.

Figure 4

Differential Expression analysis of genes yielded genes common with GISTIC analysis.

(a) Venn diagram of common genes among GISTIC, SAM and LIMMA analyses. All genes were annotated and compared using IPA ‘compare’ function. 43 genes from Integromics analyses were not mapped by IPA.15 genes are common among all three analyses. (b) Binned fold change bar chart of LIMMA analysis. Total 6242 genes were having adjusted p-value <0.01 of which 759 showed significant fold change (<−1 or>1). (c) Volcano plot showing highly significant genes (pink  = downregulated, orange = upregulated) in terms of p- value and fold change.

Principal Component Analysis of Exon array data from 32 patients.

60 samples from 32 patients were subjected to PCA and the outliers were removed. 4 normal and 7 tumour samples were removed from the final analysis.

Differential Expression analysis of genes yielded genes common with GISTIC analysis.

(a) Venn diagram of common genes among GISTIC, SAM and LIMMA analyses. All genes were annotated and compared using IPA ‘compare’ function. 43 genes from Integromics analyses were not mapped by IPA.15 genes are common among all three analyses. (b) Binned fold change bar chart of LIMMA analysis. Total 6242 genes were having adjusted p-value <0.01 of which 759 showed significant fold change (<−1 or>1). (c) Volcano plot showing highly significant genes (pink  = downregulated, orange = upregulated) in terms of p- value and fold change.

Significant changes in isoform expression is exhibited by genes identified by GISTIC analysis

Exon level analysis of 144 driver genes was carried out. 29 exons belonging to 13 genes were shown to have significant changes in isoform expression as reflected in their splice index scores [Table 3]. While exons E25-1 of MUC4 and E3-2 of PTP4A3 showed high negative SI values, exons E2-2 of IL6 and E21-2 of ADAM12 showed high positive SI values. Negative SI values indicate exons are enriched in tumour samples and are skipped or repressed in normal samples and vice versa for positive SI values. For MUC4 gene, exons E25-1, E2-2, E2-1, I4-6 and I3-6 recorded negative SI values and exon E8-1 recorded a positive SI value. Exons E2-2 and E4-3 of IL6 recorded positive SI values. [Figure 5 ai-ii & bi-ii and figure S2]. Exons in MYLK and ANK3 showed significant change in splicing pattern but did not show a significant fold change in gene expression value. Microarray Detection of Alternative Splicing (MiDAS) p-value range of 0.01–0.04 was observed in the filtered results.
Table 3

Significant changes in exon level expression of driver genes.

ProbesetGeneIDExon IDRegulation CallSISI p-valueMiDAS p-value
2712254MUC4E25-1Upregulated−1.808732.79E-080.021609
3118824PTP4A3E3-2Upregulated−1.617119.95E-120.012046
2712381MUC4E2-2Upregulated−1.34777.19E-070.026458
3310161FGFR2E11-1Upregulated−1.296994.55E-080.023093
3310118FGFR2E29-1Upregulated−1.277231.86E-090.01652
3910391BCAS1E4-3Upregulated−1.251873.35E-070.021612
3031671ABP1E11-1Upregulated−1.249181.22E-100.014292
3310159FGFR2E12-1Upregulated−1.175742.50E-070.026414
2712382MUC4E2-1Upregulated−1.151521.33E-050.037457
3290897ANK3E60-1Upregulated−1.148251.19E-040.041986
2712354MUC4I4-6Upregulated−1.127433.39E-070.028327
2712361MUC4I3-6Upregulated−1.053748.25E-070.03115
3290925ANK3E53-5Upregulated−1.051772.60E-050.040567
3268341HTRA1E1-8Downregulated1.0073255.53E-100.016611
3310147FGFR2E17-1Downregulated1.0861542.78E-060.031111
3268338HTRA1E1-5Downregulated1.0862033.38E-110.013645
2388253EXO1E18-1Downregulated1.088063.11E-050.044937
3311901ADAM12E10-1Downregulated1.0897061.70E-090.018986
3910807AURKAE1-2Downregulated1.1053872.78E-070.026314
3820445ICAM1E3-3Downregulated1.2228863.84E-070.027735
2712282MUC4E8-1Downregulated1.2675741.63E-050.039593
2692532MYLKE13-1Downregulated1.2737644.27E-070.030277
3268334HTRA1E1-1Downregulated1.2880451.63E-120.011119
3310142FGFR2E20-1Downregulated1.4037021.07E-090.015743
3268335HTRA1E1-2Downregulated1.4047291.43E-120.011016
2992599IL6E4-3Downregulated1.4066594.60E-050.044146
3311894ADAM12E11-1Downregulated1.4205161.99E-100.016023
3311853ADAM12E21-2Downregulated2.0167341.96E-090.018149
2992593IL6E2-2Downregulated2.4488611.85E-070.027456

List of 29 exons corresponding to 13 genes are listed along with their respective splice index and MiDAS p values.

Probeset IDs are according to Affymetrix database

SI  =  Splice Index

MiDAS  =  Microarray Detection of Alternative Splicing

Figure 5

Differential Expression at exon level was observed in thirteen GISTIC genes.

Exon expression (i) and splice index (ii) values were mapped for both tumour and normal samples for twenty nine exons affecting thirteen genes. Exon 25-1 of MUC4 gene (a) shows highest negative splice index value (a ii) whereas exon 2-2 of IL6 (b) showed highest value of 2.44 (b ii). Exon expression and splice index values for rest 27 exons are provided as supplementary figure 1.

Differential Expression at exon level was observed in thirteen GISTIC genes.

Exon expression (i) and splice index (ii) values were mapped for both tumour and normal samples for twenty nine exons affecting thirteen genes. Exon 25-1 of MUC4 gene (a) shows highest negative splice index value (a ii) whereas exon 2-2 of IL6 (b) showed highest value of 2.44 (b ii). Exon expression and splice index values for rest 27 exons are provided as supplementary figure 1. List of 29 exons corresponding to 13 genes are listed along with their respective splice index and MiDAS p values. Probeset IDs are according to Affymetrix database SI  =  Splice Index MiDAS  =  Microarray Detection of Alternative Splicing

Causal Network analysis displays switch in transcription factors in tumor samples

Causal Network analysis displays switch in transcription factors in tumour samples. The most significant influence of TF genes in tumour and normal samples was reflected in the alteration in number of directed edges [Table 4]. CHAF1A, AHR, PRPF4B, ZNF200, SMAD2, RUVBL1, SMAD4, TSHZ1, CTCFL and CEBPE were among the top influencers. The regulatory hierarchy between these groups changed with respect to TFs. While RUVBL1 transformed into a master regulator in tumours, TSHZ1 has lost its capacity to master regulate other genes in tumours [Figure 6a]. Significant rearrangement in modularity is also observed between these two groups. More target genes function as modules in tumours as observed in modules regulated by AHR, CHAF1A and PRPF4B. Further, there is significant regulatory crosstalk among the genes in the normal group [Figure 6b]. While the tumours have lost the scale-free properties in its regulatory interaction, normal group exhibit approximate scale-free out degree distributions, signifying the potential of TF to regulate host of target genes. The tumours do not show this type of regulation which indicate unidirectional feed forward regulatory mode.
Table 4

Transcription factor genes showing significant change in their effect as represented by the change in number of outbound and inbound edges in tumour and normal samples.

Transcription FactorIn bound edges (Normal, Tumour)Out bound edges (Normal, Tumour)
CHAF1A 1,1 16,12
AHR 0,2 5,14
PRPF4B 2,1 11,14
ZNF200 1,2 3,9
SMAD2 3,1 13,8
RUVBL1 1,0 8,7
CEBPE 2,1 7,5
SMAD71,22,5
HIST1H1B0,12,4
MLL31,13,4
SMAD4 2,0 14,4
ZNF2171,16,4
ZNF4421,03,4
SPO110,03,3
NHLH20,11,2
TP531,14,2
CTCFL 1,1 6,1
SMAD30,04,1
TCF7L21,06,1
TSHZ1 0,1 12,1
Figure 6

Gene Regulatory Network Inference diagram for Tumor and Normal Groups.

A hierarchical network topology is used to visualize the degrees of interaction between transcription factor genes and target genes. (a) The inferred network for tumour group showing RUVBL1 as master regulator. (b) The inferred network for normal group showing TSHZ1as master regulator.

Gene Regulatory Network Inference diagram for Tumor and Normal Groups.

A hierarchical network topology is used to visualize the degrees of interaction between transcription factor genes and target genes. (a) The inferred network for tumour group showing RUVBL1 as master regulator. (b) The inferred network for normal group showing TSHZ1as master regulator.

Functional analysis of differentially expressed genes confirms their role in colorectal cancer and reveal important pathways and biomarkers

Gene Ontology enrichment of 760 differentially expressed genes obtained from SAM/LIMMA analyses showed cell division, mitosis and cell adhesion to be the most significant biological processes affected [figure S3]. Ingenuity pathway analysis of 760 genes showed cancer and gastrointestinal disease among the top functions followed by cellular movement growth and proliferation [Figure 7a]. NF-kB signalling, cell cycle G2/M DNA damage checkpoint regulation, colorectal cancer metastasis were among the top scoring pathways followed by agranulocyte adhesion [figure 7b]. TGFB1 was among the top upstream regulators. Network analysis suggests MYC, MMP and IL6 genes as important nodes [Fig 7c-e]. Biomarker analysis reveals 28 molecules relevant for colorectal cancer [Table 5]. INHBA, CLDN7 and MUC4 were eligible as biomarkers and are common with GISTIC, SAM and LIMMA analyses results.
Figure 7

Core Analysis of Differentially Expressed Genes using IPA.

Core analysis using IPA was carried out using set of 760 genes that were differentially expressed in tumour samples. Important biological functions (a) pathways (b) and networks (c-e) were revealed by this analysis.

Table 5

Biomarker molecules among the differentially expressed genes.

SymbolFold Changep-valueLocationType(s)
ANGPT22.0556.89E-14Extracellular Spacegrowth factor
CD341.1044.27E-10Plasma Membraneother
CD441.2569.28E-14Plasma Membraneenzyme
CDH131.3384.13E-11Plasma Membraneother
CLDN7−1.0301.81E-09Plasma Membraneother
CXCL101.0738.50E-03Extracellular Spacecytokine
EREG1.3522.39E-05Extracellular Spacegrowth factor
GDF151.1916.33E-06Extracellular Spacegrowth factor
IGF21.3983.85E-04Extracellular Spacegrowth factor
IL83.6412.55E-17Extracellular Spacecytokine
INHBA2.2251.71E-12Extracellular Spacegrowth factor
KRT20−2.0324.90E-12Cytoplasmother
MET1.0491.87E-07Plasma Membranekinase
MKI671.0151.58E-10Nucleusother
MMP21.7642.31E-09Extracellular Spacepeptidase
MUC1−1.0701.65E-05Plasma Membranetranscription regulator
MUC4−2.2643.71E-11Extracellular Spacegrowth factor
OSMR1.1664.95E-07Plasma Membranetransmembrane receptor
PDCD4−1.5729.64E-13Nucleusother
PDGFRB1.0901.47E-10Plasma Membranekinase
PDPN1.1365.95E-11Plasma Membranetransporter
PTGS22.3035.99E-09Cytoplasmenzyme
PTP4A31.1322.47E-12Plasma Membranephosphatase
SERPINE11.4184.29E-09Extracellular Spaceother
SPP12.6486.37E-10Extracellular Spacecytokine
TIMP11.9389.22E-13Extracellular Spaceother
TNS41.4863.15E-07Cytoplasmother
TSPAN8−1.0312.83E-09Plasma Membraneother

Fold change and p-values were calculated using Integromics biomarker suite.

Core Analysis of Differentially Expressed Genes using IPA.

Core analysis using IPA was carried out using set of 760 genes that were differentially expressed in tumour samples. Important biological functions (a) pathways (b) and networks (c-e) were revealed by this analysis. Fold change and p-values were calculated using Integromics biomarker suite.

Discussion

In this study we attempt to understand the transcription level changes in driver genes affected by SCNAs in colorectal cancer. We queried these genes from three perspectives using gene/exon/network level analysis tools. Our integrated analysis at genomic/transcriptomic levels resulted in finding genes of high priority that can be experimentally studied to establish their role in colorectal cancer. Functional significance of differentially expressed genes confirmed the outcome of our analyses. Due to the unavailability of high resolution cytogenetic arrays large size of chromosomal regions were implicated in causing colorectal cancer through copy number changes. The commercial SNP genotyping arrays focus on variants that are present in 5% or more of the population and feature a limited number of CNV probes. Therefore, sub microscopic structural variants are poorly captured by available SNP genotyping arrays that were designed to evaluate SNPs. The recent introduction of the Affymetrix CytoScan HD Array (CNV-targeted array), which is based on the validated Genome-Wide Human SNP Array 6.0 and contains more than 2.6 million markers for copy number variants and approximately 750,000 SNPs, has enabled the detection of copy number aberrations with high resolution across the genome [18]. Data from this platform was used to obtain the list of driver genes which could thus be considered to be most precise and accurate. Earlier, results from different groups often lead to a high level of discordance with an overlap of <5% in some cases. GISTIC analysis was able to address this issue and differentiate between passenger and driver mutations with a high level of precision and accuracy [6]. The correlation between expression and CN data is very complex [19] and is very much affected by the type of platform used for generating the data as well as the analysis strategies. Our analysis strategy aimed at reducing the confounding factors. We extracted DNA/RNA from the same piece of tissue in a single protocol [20]. We used matched paired tumour-normal control which is arguably the best way to do a comparative study [21]. Several approaches have been employed for cancer gene prioritization by integrative analysis [22], [23]. We employed a modified two-step approach by filtering the cancer genes using GISTIC and then carried out exon expression analysis.

Gene Level

The effect of chromosomal changes is not always direct. Global amplification at genomic level would result into higher level expression of selected genes [9]. Our gene level analysis showed only 20 of 144 driver genes to experience significant change at transcription level. 5 of these genes were listed among the top scoring driver genes. Our results correlate focal amplifications with increased expression levels and have been reported earlier using array CGH (aCGH) for FGFR2, GNAS and AURKA genes [24]. With an average amplicon size of 4.56 Mb, it could be misleading to report all affected genes/non-coding regions to be associated with CRC. FAM46C, EGFR2 and IL6 genes from our analysis have also been listed in the updated cancer gene census [5], [25]. BCAS1 gene that scored the highest in the GISTIC analysis was earlier reported to undergo alternative splicing and downregulation [26] which is consistent with our results. Upregulation of human SLC7A11 mRNA in stromal fibroblast cells from liver metastases is associated with metastatic colorectal cancer in human [27]. PTP4A3 reported to be significantly upregulated in this study has recently been implicated in colon tumorigenesis [28]. ATP8B1 is the gene that has not yet been reported to have any association with colorectal cancer and would be an important molecule for further studies. Our differential expression analysis of tumour and normal exon array datasets using two independent tools (SAM and LIMMA) showed an overlap of 15 genes with the driver genes. Analysis of>44,000 probes resulting in differentially expressed genes provide an unbiased approach and lends further confidence in the list of overlapping genes. Both LIMMA and SAM approaches generated same results at the functional level as evidenced by Ingenuity Pathway Analysis (IPA). ABP1, AGR3 and BCAS1 showed downregulation despite amplification at the genome level. This is supported by earlier studies for BCAS1in MCF7 cell line [29]. We observed that the influence of SMAD4 transcription factor was lost in tumour cells and may be responsible for this observation. Similar changes in transcription factors was observed for two other genes indicating the switching behaviour as discussed below.

Exon Level

Exon level analysis for measuring gene expression changes is challenging but rewarding. Even the improved algorithms have limitations in providing absolute quantification of the transcript levels. Earlier methods have found exon level data to be more informative about the nature and level of transcripts [30]. Now with the knowledge that more than 90% of all genes undergo alternative splicing to produce more than one transcript for a gene, the potential of exon level data is being realized more than ever [31]–[33]. Gene centric approach for carrying out integrated analysis using aCGH and exon array data has yielded more of confirmatory results and lack the use of full potential of whole genome arrays [34]. FGFR2 gene was shown to be amplified and upregulated which is misleading due to the truncated form of FGFR2. Wt FGFR2 gene was not measured and hence could not be compared with our study [34]. Exon level studies in colorectal cancer have been few and carry several limitations in terms of data analysis. Many of these studies compared tumour and normal from different sources [35]. Our results show a subset of differentially expressed genes to experience change in splicing pattern. 29 exons belonging to 13 genes showed significant splice index (SI) and MiDAS p-values. Both the values are strong indicators to measure alternative splicing. MUC4 is very well known to undergo alternative splicing and cause cancer [36] but the alternative splicing of IL6 is novel in association with CRC. ADAM12 that scored a high SI value has been implicated earlier in lung cancer [37]. 5 genes (ACTN1, CALD1, SLC3A2, CTTN and FN1) reported earlier as differentially spliced [38] have been found in the our study as well which used a different analysis strategy. ANK3 is known to use alternative transcription start sites in colorectal cancer [8] and could also be the mechanism for other gene MYLK for which we did not see a significant change in gene level expression.

Network Level

Pathways analysis has been used to measure the relevance of the genes affected by CNAs by creating networks among them [1]. However, these networks are limited in their useful interpretation owing to the absence of directionality. Our causal network analysis provides more useful information on the genes involved in these networks. We observed a significant difference in the number of target genes between tumour and normal. In case of genes found in the amplified region but were downregulated we observed a loss of the transcription factor regulation, whereas in the upregulated gene found in the deleted region there was a switch in the transcription factor. From the list of driver genes we chose the transcription factors and studied their change in behaviour in tumour samples. AHR has been established as a tumour suppressor gene in colon and other cancers [39]. This study further explains the enhanced role of AHR by the increased number of outbound (target) genes in tumour. Our study provides evidence that TSHZ1 loses its role as master regulator in normal cells while RUVBL1 assumes that role. These provide interesting opportunities for mechanistic studies of network/pathways affected in CRC. It has been envisaged through integrated studies that many different genomic alterations potentially dys-regulate the same pathways in complex diseases [40]. Further studies of the regulatory level changes in this study will be able to establish this concept in CRC. Functional role of the differentially expressed genes and the identification of MYC, MMP and IL6 as important nodes in the affected networks provide leads that need to be validated to establish their association with CRC. Biomarkers, especially MUC4 will be an important molecule to study mechanistically and establish their use in clinical studies. This study provides wealth of analyzed data and an enriched list of genes that can serve as potential clues to understand the biology of colorectal cancer. Significance of Microarray analysis. SAM analysis was performed using Integromics biomarker discovery suites on all samples. The results were complimentary to LIMMA analysis as reflected in the number of differentially expressed genes. (TIF) Click here for additional data file. Splice index and exon expression plots for remaining 11 genes. Splice index and exon expression values of all 13 genes that were found significant among the driver genes were plotted. Comparison of ‘Normal’ and ‘Tumor’ samples is depicted to observe the change in splice index as well the expression pattern at exon level. (DOCX) Click here for additional data file. Biological processes enrichment chart for differentially expressed genes. This bar plot shows the differentially expressed genes (as obtained from Integromics) are enriched in three functions viz., cell division, mitosis and cell adhesion. (PNG) Click here for additional data file. Types and stages of all the patient samples used in the study. (DOCX) Click here for additional data file. Fold change values of all 144 GISTIC genes as calculated by AltAnalyze Program. (TXT) Click here for additional data file.
  40 in total

1.  Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing.

Authors:  Qun Pan; Ofer Shai; Leo J Lee; Brendan J Frey; Benjamin J Blencowe
Journal:  Nat Genet       Date:  2008-11-02       Impact factor: 38.330

2.  Identification of chromosomal copy number variations and novel candidate loci in hereditary nonpolyposis colorectal cancer with mismatch repair proficiency.

Authors:  Weixiang Chen; Lin Yuan; Ying Cai; Xiaocheng Chen; Yayun Chi; Ping Wei; Xiaoyan Zhou; Daren Shi
Journal:  Genomics       Date:  2013-02-20       Impact factor: 5.736

3.  Integrated analysis of whole genome exon array and array-comparative genomic hybridization in gastric and colorectal cancer cells.

Authors:  Kazuyuki Furuta; Tokuzo Arao; Kazuko Sakai; Hideharu Kimura; Tomoyuki Nagai; Daisuke Tamura; Keiichi Aomatsu; Kanae Kudo; Hiroyasu Kaneda; Yoshihiko Fujita; Kazuko Matsumoto; Yasuhide Yamada; Kazuyoshi Yanagihara; Masaru Sekijima; Kazuto Nishio
Journal:  Cancer Sci       Date:  2011-11-28       Impact factor: 6.716

4.  Alternative splicing of SLC39A14 in colorectal cancer is regulated by the Wnt pathway.

Authors:  Kasper Thorsen; Francisco Mansilla; Troels Schepeler; Bodil Øster; Mads H Rasmussen; Lars Dyrskjøt; Rotem Karni; Martin Akerman; Adrian R Krainer; Søren Laurberg; Claus L Andersen; Torben F Ørntoft
Journal:  Mol Cell Proteomics       Date:  2010-10-11       Impact factor: 5.911

Review 5.  Emerging roles of MUC4 in cancer: a novel target for diagnosis and therapy.

Authors:  Ajay P Singh; Pallavi Chaturvedi; Surinder K Batra
Journal:  Cancer Res       Date:  2007-01-15       Impact factor: 12.701

6.  Comparative analysis of algorithms for integration of copy number and expression data.

Authors:  Riku Louhimo; Tatiana Lepikhova; Outi Monni; Sampsa Hautaniemi
Journal:  Nat Methods       Date:  2012-02-12       Impact factor: 28.547

7.  Detection and measurement of alternative splicing using splicing-sensitive microarrays.

Authors:  Karpagam Srinivasan; Lily Shiue; Justin D Hayes; Ross Centers; Sean Fitzwater; Rebecca Loewen; Lillian R Edmondson; Jessica Bryant; Michael Smith; Claire Rommelfanger; Valerie Welch; Tyson A Clark; Charles W Sugnet; Kenneth J Howe; Yael Mandel-Gutfreund; Manuel Ares
Journal:  Methods       Date:  2005-12       Impact factor: 3.608

8.  GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers.

Authors:  Craig H Mermel; Steven E Schumacher; Barbara Hill; Matthew L Meyerson; Rameen Beroukhim; Gad Getz
Journal:  Genome Biol       Date:  2011-04-28       Impact factor: 13.583

9.  Integration of DNA copy number alterations and transcriptional expression analysis in human gastric cancer.

Authors:  Biao Fan; Somkid Dachrut; Ho Coral; Siu Tsan Yuen; Kent Man Chu; Simon Law; Lianhai Zhang; Jiafu Ji; Suet Yi Leung; Xin Chen
Journal:  PLoS One       Date:  2012-04-23       Impact factor: 3.240

10.  Identifying causal genes and dysregulated pathways in complex diseases.

Authors:  Yoo-Ah Kim; Stefan Wuchty; Teresa M Przytycka
Journal:  PLoS Comput Biol       Date:  2011-03-03       Impact factor: 4.475

View more
  12 in total

1.  Molecular classification of colorectal cancer using the gene expression profile of tumor samples.

Authors:  Mamoon Rashid; Ramesh K Vishwakarma; Ahmad M Deeb; Mohamed A Hussein; Mohammad A Aziz
Journal:  Exp Biol Med (Maywood)       Date:  2019-05-15

2.  Multi-omics Integration Analysis Robustly Predicts High-Grade Patient Survival and Identifies CPT1B Effect on Fatty Acid Metabolism in Bladder Cancer.

Authors:  Venkatrao Vantaku; Jianrong Dong; Chandrashekar R Ambati; Dimuthu Perera; Sri Ramya Donepudi; Chandra Sekhar Amara; Vasanta Putluri; Shiva Shankar Ravi; Matthew J Robertson; Danthasinghe Waduge Badrajee Piyarathna; Mariana Villanueva; Friedrich-Carl von Rundstedt; Balasubramanyam Karanam; Leomar Y Ballester; Martha K Terris; Roni J Bollag; Seth P Lerner; Andrea B Apolo; Hugo Villanueva; MinJae Lee; Andrew G Sikora; Yair Lotan; Arun Sreekumar; Cristian Coarfa; Nagireddy Putluri
Journal:  Clin Cancer Res       Date:  2019-03-07       Impact factor: 12.531

3.  Long non-coding RNA LINC01296 is a potential prognostic biomarker in patients with colorectal cancer.

Authors:  Jia-Jun Qiu; Jing-Bin Yan
Journal:  Tumour Biol       Date:  2015-04-17

Review 4.  Colorectal cancer in Saudi Arabia as the proof-of-principle model for implementing strategies of predictive, preventive, and personalized medicine in healthcare.

Authors:  Mesnad Alyabsi; Abdulrahman Alhumaid; Haafiz Allah-Bakhsh; Mohammed Alkelya; Mohammad Azhar Aziz
Journal:  EPMA J       Date:  2019-08-31       Impact factor: 6.543

5.  Correction: Integrated exon level expression analysis of driver genes explain their role in colorectal cancer.

Authors: 
Journal:  PLoS One       Date:  2015-03-26       Impact factor: 3.240

6.  Elucidating the Reprograming of Colorectal Cancer Metabolism Using Genome-Scale Metabolic Modeling.

Authors:  Cheng Zhang; Mohammed Aldrees; Muhammad Arif; Xiangyu Li; Adil Mardinoglu; Mohammad Azhar Aziz
Journal:  Front Oncol       Date:  2019-07-30       Impact factor: 6.244

7.  Gene expression profile analysis of colorectal cancer to investigate potential mechanisms using bioinformatics.

Authors:  Yubin Kou; Suya Zhang; Xiaoping Chen; Sanyuan Hu
Journal:  Onco Targets Ther       Date:  2015-04-08       Impact factor: 4.147

8.  Filtered selection coupled with support vector machines generate a functionally relevant prediction model for colorectal cancer.

Authors:  Musa Nur Gabere; Mohamed Aly Hussein; Mohammad Azhar Aziz
Journal:  Onco Targets Ther       Date:  2016-06-01       Impact factor: 4.147

9.  Effect of sampling procedure on the quality control metrics of cytoscan HD array for studying cytogenetic aspects of colorectal cancer.

Authors:  Ahmad M Deeb; Zeyad Yousef; Mishal Al-Johani; Mohammad Azhar Aziz
Journal:  Int J Health Sci (Qassim)       Date:  2018 Jul-Aug

10.  iGEMS: an integrated model for identification of alternative exon usage events.

Authors:  Sanjana Sood; Krzysztof J Szkop; Asif Nakhuda; Iain J Gallagher; Carl Murie; Robert J Brogan; Jaakko Kaprio; Heikki Kainulainen; Philip J Atherton; Urho M Kujala; Thomas Gustafsson; Ola Larsson; James A Timmons
Journal:  Nucleic Acids Res       Date:  2016-04-19       Impact factor: 19.160

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.