
Colorectal cancer stages transcriptome analysis.

Tianyao Huo1, Ronald Canepa2, Andrei Sura1, François Modave1, Yan Gong3,4.   

Abstract

Colorectal cancer (CRC) is the third most common cancer and the second leading cause of cancer-related deaths in the United States. The purpose of this study was to evaluate gene expression differences across the stages of CRC. Gene expression data on 433 CRC patient samples were obtained from The Cancer Genome Atlas (TCGA). Gene expression differences were evaluated across CRC stages using linear regression. Genes with p≤0.001 for expression differences were evaluated further in principal component analysis, and genes with p≤0.0001 were evaluated further in gene set enrichment analysis. A total of 377 patients with gene expression data for 20,532 genes were included in the final analysis. The numbers of patients in stage I through IV were 59, 147, 116 and 55, respectively. The NEK4 gene, which encodes NIMA related kinase 4, was differentially expressed across the four stages of CRC. Stage I patients had the highest expression of NEK4, while stage IV patients had the lowest (p = 9 × 10⁻⁶). Ten other genes (RNF34, HIST3H2BB, NUDT6, LRCH4, GLB1L, HIST2H4A, TMEM79, AMIGO2, C20orf135 and SPSB3) had a p value of 0.0001 in the differential expression analysis. Principal component analysis indicated that patients from the four clinical stages do not appear to have distinct gene expression patterns. Network-based and pathway-based gene set enrichment analyses showed that these 11 genes map to multiple pathways, such as meiotic synapsis and packaging of telomere ends. Ten of these 11 genes were linked to Gene Ontology terms such as nucleosome, DNA packaging complex and protein-DNA interactions. The protein complex-based gene set analysis showed that four genes were involved in H2AX complex II. This study identified a small number of genes that might be associated with clinical stages of CRC. Our analysis was not able to find a molecular basis for the current clinical staging of CRC based on gene expression patterns.

Year:  2017        PMID: 29182684      PMCID: PMC5705125          DOI: 10.1371/journal.pone.0188697

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Colorectal cancer (CRC) is the third most common cancer and the second leading cause of cancer-related deaths in the United States [1]. Among the five subtypes of CRC (adenocarcinomas, carcinoid tumors, gastrointestinal stromal tumors, lymphomas and sarcomas), adenocarcinomas are the most common (95% of all CRCs). Currently, the staging of CRC, referred to as clinical staging, is based on the results of physical exams, biopsies, and imaging tests (CT or MRI scan, X-rays, PET scan, etc.). The staging criteria are: 1) how far the cancer has grown into the wall of the intestine; 2) whether it has reached nearby structures; and 3) whether it has spread to nearby lymph nodes or to distant organs. The results of surgery can be combined with clinical staging to determine the pathologic stage. The most often used CRC staging system is that of the AJCC cancer staging manual, developed by the American Joint Committee on Cancer (AJCC), which is based on the condition of the primary tumor (T), regional lymph nodes (N) and distant metastasis (M) [2]. The earliest-stage cancers are called stage 0; stages then range from I through IV, with additional sub-stages identified by the letters A, B and C [3]. Several genes and pathways, such as WNT, MAPK/PI3K, TGF-β and TP53, have been associated with CRC. For instance, mutations in the adenomatous polyposis coli (APC) gene, a tumor suppressor gene, were found to be responsible for familial adenomatous polyposis, which then progresses to CRC [4]. Mismatch repair system genes such as MLH1 and MSH2 were found to be associated with Lynch syndrome, the most frequent form of hereditary CRC [5, 6]. Further, a 12-gene recurrence score assay has been developed as a prognostic factor in stage II-III colon or rectal carcinoma [7-9]. Even though many genes have been associated with an increased risk of CRC, the genetic differences across the stages of CRC have not been clearly identified.
So far, only one study has assessed the expression levels of three candidate genes (MMP9, MMP28 and TIMP1) across CRC stages, and it found no statistically significant differences by stage [10]. No studies in the literature have compared gene expression levels across the entire transcriptome between CRC stages. The purpose of this study is to explore transcriptome-wide gene expression differences across the stages of CRC, followed by gene ontology and gene set network analyses, based on the publicly available RNAseq dataset in The Cancer Genome Atlas (TCGA) [11].

Materials and methods

Data acquisition

The Cancer Genome Atlas (TCGA) (http://cancergenome.nih.gov/) is a joint effort between the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) to facilitate the sharing of data and speed up cancer research [11, 12]. The Eli and Edythe L. Broad (Broad) Institute of MIT and Harvard is a joint venture between both institutions and several area hospitals (https://www.broadinstitute.org/about-us). Its “FireHose” project ingests, aggregates, standardizes, and processes TCGA data via automated pipelines in an attempt to accelerate analysis and discoveries (https://confluence.broadinstitute.org/display/GDAC/Rationale). The Broad Institute has established pipelines for processing each TCGA dataset, and the outputs from each stage of the pipeline are made available as a versioned set. Illumina HiSeq expression data were processed by the Broad Institute to output both reads per kilobase per million mapped reads (RPKM) expression values [13] and RNA-seq by Expectation-Maximization (RSEM) values [14] normalized to “upper quartile count at 1000”. TCGA clinical data and expression data were manually downloaded from the Broad Institute (TCGA data version 2016_01_28) via the firebrowse.org website (http://firebrowse.org/?cohort=COADREAD&download_dialog=true). The code used to download the data can be accessed here: https://github.com/indera/crc_transcriptome_analysis.

Data merging

Using Python 2.7.10 and version 0.19.0 of the Pandas module, the expression data from the Broad Institute were read into a Pandas dataframe, transposed, and re-saved. The clinical data were transposed in the same manner. Additionally, to reduce the size of the data and the number of components of interest, only a subset of the columns from the clinical data was kept for the analysis. These included common demographic data such as patient gender, race, ethnicity, and age; clinical data such as cancer stage, associated International Classification of Diseases (ICD) 10 codes, presence of polyps, and whether analysis had been done for common mutations such as KRAS and BRAF; and finally, approximately 85 different aliquot identifiers from the TCGA dataset itself. Clinical data were matched with expression data by taking TCGA's "hybridization REF" identifier from the expression data and searching for it among the aliquot identifiers present in the clinical data. Ultimately, 377 patients with gene expression data for 20,532 genes were included in the final analysis.
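The transposition and identifier matching described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: the toy values, the barcodes, and the column names (`patient_id`, `stage`, `aliquot_1`, `aliquot_2`, `aliquot_id`) are hypothetical, and two aliquot columns stand in for the roughly 85 mentioned in the text.

```python
import pandas as pd

# Hypothetical expression matrix in the downloaded orientation:
# genes as rows, one column per sample identifier.
expression = pd.DataFrame(
    {"SAMPLE-A": [5.1, 0.0], "SAMPLE-B": [4.2, 0.0], "SAMPLE-C": [3.9, 0.0]},
    index=["GENE1", "GENE2"],
)

# Transpose so that each row is a sample and each column is a gene.
expr_by_sample = expression.T
expr_by_sample.index.name = "aliquot_id"

# Hypothetical clinical table with several aliquot identifier columns.
clinical = pd.DataFrame({
    "patient_id": ["P1", "P2", "P3"],
    "stage": ["stage i", "stage iv", "stage ii"],
    "aliquot_1": ["SAMPLE-X", "SAMPLE-B", "SAMPLE-D"],
    "aliquot_2": ["SAMPLE-A", None, None],
})
aliquot_cols = ["aliquot_1", "aliquot_2"]

# Reshape to one row per (patient, aliquot) so every identifier is searchable.
long_ids = clinical.melt(
    id_vars=["patient_id", "stage"],
    value_vars=aliquot_cols,
    value_name="aliquot_id",
).dropna(subset=["aliquot_id"])

# Keep only patients whose aliquot identifier appears in the expression data.
merged = long_ids.merge(expr_by_sample.reset_index(), on="aliquot_id", how="inner")
merged = (
    merged.drop(columns="variable")
    .sort_values("patient_id")
    .reset_index(drop=True)
)
print(merged)
```

In this toy input, patient P3 has no matching expression column and is dropped by the inner join, which mirrors how patients without both data types would fall out of the analysis set.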

Differential expression analysis

Gene expression differences were evaluated across the disease stages using linear regression. First, the standard deviation of the expression level of each gene was computed, and genes with a standard deviation of zero, which indicates no variation in expression across samples, were removed from further analysis. To select the top genes that are differentially expressed across cancer stages, a linear regression model was fitted for each gene to test for a trend in gene expression with increasing cancer stage. The analyses were adjusted for the age, gender and race/ethnicity of the patients. Genes with p ≤ 0.0001 were considered suggestive, and the expression levels by cancer stage are presented for these genes. Analyses were performed using R version 3.3.1 and SAS 9.4 (Cary, NC).
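A per-gene trend test of this kind can be sketched in Python with NumPy. The study itself used R and SAS, so the code below is only an illustration under several assumptions: stage is coded numerically as 1 to 4, only age and sex are used as covariates (race/ethnicity is omitted for brevity), the data are simulated, and the p-value comes from a large-sample normal approximation to the t statistic rather than the exact t distribution. The function name `stage_trend_test` is hypothetical.

```python
import math
import numpy as np

def stage_trend_test(expr, stage, covariates):
    """Ordinary least squares of expression on stage plus covariates.

    Returns the stage slope and a two-sided p-value based on a
    large-sample normal approximation (an assumption of this sketch).
    """
    n = len(expr)
    X = np.column_stack([np.ones(n), stage, covariates])
    beta, _, _, _ = np.linalg.lstsq(X, expr, rcond=None)
    resid = expr - X @ beta
    dof = n - X.shape[1]
    sigma2 = resid @ resid / dof                 # residual variance
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)   # coefficient covariance
    t_stat = beta[1] / math.sqrt(cov_beta[1, 1])
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return beta[1], p_value

# Simulated data: 200 hypothetical patients.
rng = np.random.default_rng(0)
n = 200
stage = rng.integers(1, 5, size=n).astype(float)
age = rng.normal(64, 12, size=n)
sex = rng.integers(0, 2, size=n).astype(float)
covariates = np.column_stack([age, sex])

genes = {
    "trend_gene": 10.0 - 0.8 * stage + rng.normal(0, 1, size=n),
    "flat_gene": 10.0 + rng.normal(0, 1, size=n),
    "constant_gene": np.full(n, 3.0),
}

results = {}
for name, expr in genes.items():
    if np.std(expr) == 0:
        continue  # drop genes with zero standard deviation, as in the text
    results[name] = stage_trend_test(expr, stage, covariates)

# Genes at or below the suggestive threshold used in the text.
suggestive = [g for g, (_, p) in results.items() if p <= 1e-4]
print(suggestive)
```

Treating stage as a single numeric covariate tests for a monotonic trend, which matches the description above of expression changing "with increasing cancer stage"; treating stage as a categorical factor would test a different hypothesis.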

Principal component analysis

To identify gene expression patterns of the selected CRC samples across stages, all genes with p ≤ 0.001 in the linear model analysis were included in a principal component analysis using SAS. Ten principal components (PCs) were identified, and the first two PCs were plotted according to the stage of the CRC patients.
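For readers who want to reproduce this step outside SAS, a principal component analysis can be sketched with a singular value decomposition in NumPy. The sketch assumes a samples-by-genes matrix that is centered but not scaled (whether the original SAS run standardized the genes is not stated), uses random data in place of real expression values, and the function name `pca` is hypothetical.

```python
import numpy as np

def pca(data, n_components):
    """PCA via SVD of the column-centered matrix (rows are samples)."""
    centered = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)            # fraction of variance per PC
    scores = u[:, :n_components] * s[:n_components]
    return scores, explained[:n_components]

# Random stand-in for the expression values of the selected genes.
rng = np.random.default_rng(1)
n_samples, n_genes = 60, 25
data = rng.normal(size=(n_samples, n_genes))

scores, explained = pca(data, n_components=10)
print(np.round(explained[:3], 3))
```

The first two columns of `scores` are what would be plotted against each other, with each point colored by the patient's stage, to look for stage-specific clusters.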

Gene annotation and gene set enrichment analysis

Genes with expression differences at p ≤ 0.0001 were annotated using DAVID [15], and the resulting gene IDs and official gene names were used in further analysis. The ConsensusPathDB tool [16, 17] was then used to perform network-based and pathway-based analyses on these top genes. ConsensusPathDB consists of a comprehensive collection of human, mouse and yeast molecular interaction data integrated from 32 different public repositories, together with a web interface offering a set of computational methods and visualization tools to explore these data (http://consensuspathdb.org). The tool applies computational methods for statistical over-representation and enrichment analysis and reports network modules, pathways and functional information that are significantly enriched in a given gene list. ConsensusPathDB provides 4 types of predefined annotation gene sets: neighborhood-based entity sets (NESTs), which include protein-protein interactions, biochemical interactions, and gene regulatory and genetic interactions; protein complexes; pathways (including metabolic, signaling and gene regulatory pathways); and GO terms [16]. To compute the significance of the enrichment of the annotation sets with respect to the user-supplied gene list, the tool applies Wilcoxon’s matched-pairs signed-rank test.
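As a generic illustration of over-representation analysis for a short gene list, the sketch below scores each annotation set with a hypergeometric tail probability and then applies a Benjamini-Hochberg adjustment to obtain q values. Both choices are assumptions of this example and are not claimed to reproduce the ConsensusPathDB computation or the test named above; the set names, overlaps and set sizes are invented, and only the list size (11) and background size (20,532) echo numbers from the text.

```python
from math import comb

def hypergeom_tail(k, set_size, list_size, background):
    """P(X >= k): chance of at least k set members among list_size genes
    drawn without replacement from a background of `background` genes."""
    total = comb(background, list_size)
    upper = min(set_size, list_size)
    hits = sum(
        comb(set_size, i) * comb(background - set_size, list_size - i)
        for i in range(k, upper + 1)
    )
    return hits / total

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):      # walk from largest to smallest p
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        q[i] = running
    return q

# Hypothetical annotation sets: (overlap with the gene list, set size).
annotation_sets = {"set_A": (4, 40), "set_B": (1, 300), "set_C": (2, 2000)}
list_size, background = 11, 20532

names = list(annotation_sets)
pvals = [
    hypergeom_tail(annotation_sets[s][0], annotation_sets[s][1],
                   list_size, background)
    for s in names
]
qvals = benjamini_hochberg(pvals)
for name, p, q in zip(names, pvals, qvals):
    print(f"{name}: p = {p:.3g}, q = {q:.3g}")
```

With only 11 input genes, even a small overlap with a small annotation set can yield a very low p-value, which is worth keeping in mind when reading enrichment results for short lists.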

Results

Demographics

The TCGA database contains clinical information for 629 patients, but only 396 unique patients have both gene expression data and clinical data. The numbers of patients with CRC in stage I through IV were 59, 147, 116 and 55, respectively; 19 patients did not have stage information, and there were no patients in stage 0. The mean age of these patients was 64 ± 12 years. Further, 46.4% were women, 69.2% were white, 16.2% were Black/African American, 14.6% were Asian, American Indian/Alaska Native or of unspecified race, and 1.1% were Hispanic. From a clinical standpoint, 76.7% had colon cancer and 23.3% had rectal cancer. The demographic and relevant clinical information of these patients, stratified by CRC stage, is summarized in Table 1. The final analysis included 377 patients with clinical data, including staging information, and gene expression data for 20,532 genes.

Linear model for gene expression

Eleven genes had p ≤ 0.0001 in the differential gene expression analysis according to clinical staging (Table 2). The NEK4 gene, which encodes NIMA related kinase 4, was differentially expressed across the four stages of CRC. Samples from stage I patients had the highest expression of NEK4, while those from stage IV patients had the lowest (p = 4.50 × 10⁻⁶). Ten other genes had a p value of 0.0001 in the unadjusted differential expression analysis, including two with decreasing expression levels in more advanced CRC stages (RNF34 and NUDT6) and eight with increasing expression levels in more advanced CRC stages (LRCH4, HIST3H2BB, SPSB3, HIST2H4A, TMEM79, AMIGO2, GLB1L and C20orf135).
Principal component analysis indicated that the first principal component (PC1) explained 16% of the variability, PC2 explained 9.7%, and PC3 explained 4.8% of the variability in the gene expression data across all CRC samples. In a plot of PC1 versus PC2 for all CRC samples, the samples from the four stages do not appear to have distinct gene expression patterns.

Gene annotation and network-based analysis

Network analysis showed that the top eleven genes map to multiple pathways, such as meiotic synapsis and packaging of telomere ends. Ten of these 11 genes were linked to Gene Ontology (GO) terms such as nucleosome, DNA packaging complex and protein-DNA interactions. The protein complex-based gene set analysis showed that four genes were involved in H2AX complex II, with a q value of 5.72 × 10⁻⁵. The enriched neighborhood-based sets analysis of these 11 genes identified CDC like kinase 2 as connected with the most genes (386 genes) in the neighborhood. RNF4 and RNF8, which are in the same family as one of the top genes (RNF34), were also well connected with multiple genes in pathways. Finally, the induced network module analysis identified several genes with gene-protein interactions: HIST2H4A, HIST3H2BB, LRCH4 and NUDT6.

Discussion

Using publicly available data from TCGA, this study explored gene expression differences across the four stages of CRC. We found that eleven genes showed a suggestive level of evidence for differential expression in a linear fashion. These genes map to multiple pathways and were linked to GO terms. Further, several genes were enriched in protein complexes. However, a principal component analysis was not able to identify a molecular basis for the current CRC staging process. This might be due to the following: 1) because of the limitations of the publicly available data, our study was not able to compare the gene expression data from different CRC stages with a normal control; 2) the CRC staging system currently uses the size of the lesion for staging, not a molecular basis; and 3) the principal component analysis was able to cover only ~30% of the variance in the gene expression data. Such an analysis has not been done previously in the literature.
Among the genes with a suggestive level of significance, only a few had a possible link with cancer in the literature. The gene with the strongest p value for differential expression by stage is NEK4, which encodes NIMA related kinase 4, a serine/threonine protein kinase required for normal entry into replicative senescence. In cell culture, suppression of NEK4 doubled the number of replications needed to reach senescence, reduced cellular reactions to double-stranded DNA damage in both the recruitment of repair proteins and the arrest of further cell divisions, and also reduced the activity of the p53 tumor suppressor protein [18]. Our study suggested that CRC patients in the higher stages have lower NEK4 expression than those in lower stages. This is consistent with the direction shown in tissue culture [18], where lower expression was associated with a worse diagnosis.
RNF34, which encodes ring finger protein 34, was first characterized in 2005 as hRFI (human ring finger homologous to inhibitor of apoptosis protein type) and was shown to have anti-apoptotic properties [19]; it was later shown to also play a role in the regulation of p53 via ubiquitination and subsequent proteasomal degradation [20]. Overexpression of this gene was shown to confer resistance to 5-fluorouracil-induced apoptosis in colorectal cancer cells via activation of NF-kappaB and upregulation of BCL-2 and BCL-XL [21]. In our study, RNF34 had lower expression in patients in the more advanced clinical stages of CRC. This seems to indicate that patients with more advanced CRC may be more sensitive to 5-fluorouracil treatment than patients in earlier stages, but this is outside the scope of our study. However, it is worth noting that 5-fluorouracil is currently recommended as one of the adjuvant chemotherapy agents for stage III and high-risk stage II colon cancer patients [22].
HIST3H2BB and HIST2H4A, both encoding histone proteins, were also among the top differentially expressed genes, increasing in expression with increasing cancer stage. Eukaryotic DNA that is not currently being replicated is stored wrapped and coiled around four pairs of histone proteins that provide support for the coiled DNA. Histones are also subject to post-translational modifications, such as acetylation and deacetylation, which cells use to help regulate transcription [23]. A direct link to the role of increased histone protein expression is not clear; perhaps further examination of the co-expression levels of histone acetyltransferases and deacetylases would suggest a link.
Members of the Nudix hydrolase gene family, to which NUDT6 belongs, exhibit behaviors that include controlling the level of cellular metabolites and signaling compounds as well as degrading "potentially mutagenic" oxidized nucleotides [24]. The trend of downregulation of NUDT6 across cancer stages would contribute to the ability of cancer cells to continue to grow, divide, and evade normal cellular precautions.
LRCH4 encodes leucine rich repeats and calponin homology domain containing 4, a protein that contains leucine-rich repeats at its amino terminus and is known to be involved in ligand binding. AMIGO2, which encodes adhesion molecule with Ig like domain 2, is also a leucine-rich repeat family member. AMIGO2 mRNA was found to be differentially expressed in nearly half of cancer versus normal tissue samples from gastric adenocarcinoma patients [25]. In an antisense study, inhibition of AMIGO2 expression negatively impacted tumor growth and altered chromosomal stability [25].
Our study has some limitations: 1) The TCGA CRC data only included samples from cancer patients; therefore, the only analysis we could perform was within cancer samples, and using controls from a different source would introduce too much confounding. 2) The data from TCGA had many fields with missing information, such as medication information, and medication may alter the expression of some of the genes or loci of interest. Therefore, no meaningful analysis could be performed with the medication data.
In conclusion, our study identified several genes that might be associated with clinical stages of CRC. Our analysis also suggests that the current clinical staging might not have a molecular basis according to gene expression patterns.

Result of the enriched neighborhood-based sets (NESTs). (PDF)

Induced network module analysis. (PDF)

Enriched pathway-based sets. (PDF)

Enriched gene ontology-based sets. (PDF)

Enriched protein complex-based sets. (PDF)

Table 1

Demographics of patients by CRC cancer stages.

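Characteristic | Stage I | Stage II | Stage III | Stage IV | Total
--- | --- | --- | --- | --- | ---
Patients, n (%) | 59 (14.89%) | 147 (37.12%) | 116 (29.29%) | 55 (13.88%) | 377 (100%)
Age, mean ± SD | 65 ± 12 | 67 ± 12 | 63 ± 13 | 60 ± 13 | 64 ± 13
Height, mean ± SD (cm) | 172 ± 10.8 | 166.9 ± 12.8 | 169.0 ± 10.8 | 171.8 ± 10.9 | 169.1 ± 11.8
Weight (kg) | 83.1 ± 19.7 | 77.8 ± 23.3 | 81.4 ± 20.1 | 80.6 ± 17.7 | 80.3 ± 21.2
BMI | 28.1 | 28.0 | 28.5 | 27.3 | 28.1
Sex | | | | |
    Female | 25 | 72 | 54 | 24 | 175 (46.4%)
    Male | 34 | 75 | 62 | 31 | 202 (53.6%)
Vital Status | | | | |
    Alive | 57 | 133 | 103 | 39 | 332 (88.1%)
    Dead | 2 | 14 | 13 | 16 | 45 (11.9%)
Race | | | | |
    White | 43 | 93 | 86 | 39 | 261 (69.2%)
    Black/African American | 8 | 20 | 22 | 11 | 61 (16.2%)
    Other | 8 | 34 | 8 | 5 | 55 (14.6%)
Ethnicity | | | | |
    Hispanic or Latino | 0 | 1 | 1 | 2 | 4 (1.1%)
    Not Hispanic or Latino | 49 | 121 | 105 | 47 | 322 (85.4%)
    Other | 10 | 25 | 10 | 6 | 51 (13.5%)
Cancer Type | | | | |
    Colon | 46 | 119 | 83 | 41 | 289 (76.7%)
    Rectal | 13 | 28 | 33 | 14 | 88 (23.3%)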

SD: standard deviation. Continuous variables were summarized as mean and SD and categorical variables were summarized as number (%).

Table 2

The top genes in the linear regression analysis.

Gene | Gene ID | Gene Full Name | P (unadjusted) | P (adjusted)
--- | --- | --- | --- | ---
NEK4 | 6787 | NIMA related kinase 4 | 9.00E-06 | 4.50E-06
LRCH4 | 4034 | leucine rich repeats and calponin homology domain containing 4 | 1.00E-04 | 2.40E-05
HIST3H2BB | 128312 | histone cluster 3 H2B family member b | 1.00E-04 | 8.90E-05
SPSB3 | 90864 | splA/ryanodine receptor domain and SOCS box containing 3 | 1.00E-04 | 1.33E-04
HIST2H4A | 8370 | histone cluster 2 H4 family member a | 1.00E-04 | 1.50E-04
TMEM79 | 84283 | transmembrane protein 79 | 1.00E-04 | 1.71E-04
AMIGO2 | 347902 | adhesion molecule with Ig like domain 2 | 1.00E-04 | 1.71E-04
GLB1L | 79411 | galactosidase beta 1 like | 1.00E-04 | 2.00E-04
RNF34 | 80196 | ring finger protein 34 | 1.00E-04 | 2.23E-04
C20orf135 | 140701 | | 1.00E-04 | 2.40E-04
NUDT6 | 11162 | nudix hydrolase 6 | 1.00E-04 | 3.10E-04

References (10 of 25 listed)

Review 1.  The Nudix hydrolase superfamily.

Authors:  A G McLennan
Journal:  Cell Mol Life Sci       Date:  2006-01       Impact factor: 9.261

Review 2.  Blinded by the Light: The Growing Complexity of p53.

Authors:  Karen H Vousden; Carol Prives
Journal:  Cell       Date:  2009-05-01       Impact factor: 41.582

3.  Mapping and quantifying mammalian transcriptomes by RNA-Seq.

Authors:  Ali Mortazavi; Brian A Williams; Kenneth McCue; Lorian Schaeffer; Barbara Wold
Journal:  Nat Methods       Date:  2008-05-30       Impact factor: 28.547

Review 4.  Histone acetylation in chromatin structure and transcription.

Authors:  M Grunstein
Journal:  Nature       Date:  1997-09-25       Impact factor: 49.962

5.  Overexpression of hRFI (human ring finger homologous to inhibitor of apoptosis protein type) inhibits death receptor-mediated apoptosis in colorectal cancer cells.

Authors:  Tsuyoshi Konishi; Shin Sasaki; Toshiaki Watanabe; Joji Kitayama; Hirokazu Nagawa
Journal:  Mol Cancer Ther       Date:  2005-05       Impact factor: 6.261

6.  Cancer Statistics, 2017.

Authors:  Rebecca L Siegel; Kimberly D Miller; Ahmedin Jemal
Journal:  CA Cancer J Clin       Date:  2017-01-05       Impact factor: 508.702

Review 7.  The APC gene in colorectal cancer.

Authors:  R Fodde
Journal:  Eur J Cancer       Date:  2002-05       Impact factor: 9.162

8.  RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome.

Authors:  Bo Li; Colin N Dewey
Journal:  BMC Bioinformatics       Date:  2011-08-04       Impact factor: 3.307

Review 9.  The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge.

Authors:  Katarzyna Tomczak; Patrycja Czerwińska; Maciej Wiznerowicz
Journal:  Contemp Oncol (Pozn)       Date:  2015

10.  Germline deletions in the EPCAM gene as a cause of Lynch syndrome - literature review.

Authors:  Katarzyna Tutlewska; Jan Lubinski; Grzegorz Kurzawski
Journal:  Hered Cancer Clin Pract       Date:  2013-08-12       Impact factor: 2.857

