
Systematic enrichment analysis of gene expression profiling studies identifies consensus pathways implicated in colorectal cancer development.

Jesús Lascorz, Kari Hemminki, Asta Försti.

Abstract

BACKGROUND: A large number of gene expression profiling (GEP) studies on colorectal carcinogenesis have been performed but no reliable gene signature has been identified so far due to the lack of reproducibility in the reported genes. There is growing evidence that functionally related genes, rather than individual genes, contribute to the etiology of complex traits. We used, as a novel approach, pathway enrichment tools to define functionally related genes that are consistently up- or down-regulated in colorectal carcinogenesis.
MATERIALS AND METHODS: We started the analysis with 242 unique annotated genes that had been reported by any of three recent meta-analyses covering GEP studies on genes differentially expressed in carcinoma vs normal mucosa. Most of these genes (218, 90.1%) had been reported in at least three GEP studies. These 242 genes were submitted to bioinformatic analysis using a total of nine tools to detect enrichment of Gene Ontology (GO) categories or Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. As a final consistency criterion, a pathway category had to be reported as enriched by several tools to be taken into consideration.
RESULTS: Our pathway-based enrichment analysis identified the categories of ribosomal protein constituents, extracellular matrix receptor interaction, carbonic anhydrase isozymes, and a general category related to inflammation and cellular response as significantly and consistently overrepresented entities.
CONCLUSIONS: We triaged the genes covered by the published GEP literature on colorectal carcinogenesis and subjected them to multiple enrichment tools in order to identify the consistently enriched gene categories. These turned out to have known functional relationships to cancer development and thus deserve further investigation.


Keywords:  Carcinogenesis; colorectal cancer; enrichment analysis; gene expression profiling

Year:  2011        PMID: 21483658      PMCID: PMC3072670          DOI: 10.4103/1477-3163.78268

Source DB:  PubMed          Journal:  J Carcinog        ISSN: 1477-3163


BACKGROUND

Colorectal cancer (CRC) is the third most common cancer, comprising 9.7% of all cancer cases, and is the fourth leading cause of cancer death worldwide, accounting for 8% of all cancer deaths.[1] Many gene expression profiling (GEP) studies on colorectal carcinogenesis have been performed in the last decade using microarray technology. However, comparative analysis of the differentially expressed genes reported by independent studies shows a relatively limited degree of overlap, and no reliable biomarker profile discriminating cancerous from normal tissue has been identified. The majority of the published GEP studies on colorectal carcinogenesis have already been subjected to meta-analyses that aimed to establish consistent signature profiles for tumor development.[2-4] These meta-analyses collected published lists of differentially expressed genes from the original GEP studies comparing CRC with normal tissue and then selected the genes reported in multiple studies. The genes reported only sporadically are thought to have resulted from inherent noise or biases in the different platforms and analysis methods employed.[5] The consistently reported genes are considered to be biologically relevant to CRC. There is increasing interest in searching for networks of genes, instead of single genes, contributing to the etiology of complex diseases, since changes in biological characteristics require coordinated variation in the expression of gene sets.[6] Enrichment analysis tools, which estimate the overrepresentation of particular gene categories or pathways in a gene list, are a useful approach in this direction. Our goal was to define functional categories [Gene Ontology (GO) terms or Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways] that are consistently overrepresented among the differentially expressed genes inferred from the published GEP studies on colorectal carcinogenesis.
We collected the list of genes from three published meta-analyses and used them as an input list for an overrepresentation analysis with several independent enrichment tools, which are based on diverse statistical and bioinformatic algorithms.[7] The strategy of applying multiple tools is recommended for the most satisfactory results.[8] The stringent selection criteria for the genes to be analyzed and the requirement for concordance between enrichment analysis results helped us to identify consistently enriched gene categories of likely relevance in colorectal carcinogenesis.

MATERIALS AND METHODS

Gene expression profiling studies

We collected data from three meta-analyses covering 34 GEP studies on the colorectal carcinogenesis process, published between 2001 and 2007.[2-4] Two of the meta-analyses reported a list of genes that showed a consistent direction of expression change between carcinoma and normal mucosa in at least three individual GEP studies,[2,3] whereas the threshold was two GEP studies in the oldest meta-analysis[4] [Table 1].
Table 1

Three meta-analyses of gene expression profiling studies on CRC carcinogenesis process


Gene list collection

For the meta-analysis by Sagynaliev et al.[4] we used Entrez Gene from NCBI (www.ncbi.nlm.nih.gov/gene/), and the Gene ID conversion tool from the DAVID bioinformatics resources[9] to convert the reported gene identifiers into the official HUGO gene symbol, which was used as the identifier for the reported genes. Next, the three gene lists from the three meta-analyses were combined, resulting in a list of 242 unique annotated genes [Table 2].
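The normalization and merging step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the identifier map is a hypothetical stand-in for the Entrez Gene and DAVID Gene ID conversion, and the three toy lists are invented. Counting how many lists report each symbol also yields overlap figures of the kind reported in the Results.

```python
from collections import Counter

# Hypothetical identifier-to-symbol map standing in for the Entrez Gene /
# DAVID Gene ID conversion step; the identifiers here are made up.
ID_TO_SYMBOL = {
    "probe_001": "COL1A1",
    "probe_002": "COL1A2",
    "probe_003": "CA2",
}


def to_symbols(identifiers, id_map):
    """Map reported identifiers to gene symbols, dropping unmapped ones."""
    return {id_map[i] for i in identifiers if i in id_map}


def combine_gene_lists(gene_lists):
    """Return the union of symbols and the number of lists reporting each one."""
    counts = Counter()
    for genes in gene_lists:
        counts.update(set(genes))  # a gene counts at most once per list
    return set(counts), counts


# Three toy lists standing in for the three meta-analyses.
list_a = {"COL1A1", "CA2", "MYC"}
list_b = {"COL1A1", "MYC", "BMP4"}
list_c = to_symbols(["probe_001", "probe_003", "probe_999"], ID_TO_SYMBOL)

union, counts = combine_gene_lists([list_a, list_b, list_c])
print(len(union), counts["COL1A1"], counts["CA2"])  # 4 3 2
```

Using a set per list before counting keeps a gene that appears twice in one source from being counted as reported by two sources.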
Table 2

The list of 242 unique, annotated genes reported in the three meta-analyses of GEP studies on CRC carcinogenesis used for enrichment analyses


Enrichment analysis

We performed enrichment analyses using the databases GO (Biological Process and Molecular Function),[10] and KEGG pathways.[11] For all enrichment tools, the input gene set consisted of the same 242-gene list. The nine selected enrichment software tools differed in the statistical model applied for the enrichment analysis and in the method of correction for multiple testing [Table 3]. The tools were used with the default options: significance threshold of 0.05 for adjusted P value, at least two genes from the input list in the enriched category, and the whole genome as the reference background. For GATHER, the recommended ln(Bayes factor) >6 was used as the significance threshold.
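Table 3 notes that the tools differ in their statistical model and multiple-testing correction. As one common example of each, the sketch below combines a one-sided hypergeometric test with Benjamini-Hochberg adjustment and applies the default filters described above (adjusted P < 0.05, at least two input genes per category, whole-genome background). The choice of test and correction, the function names, and the toy data are assumptions for illustration only; individual tools may use other models (GATHER, for example, uses a Bayes factor).

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(population, draws)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(input_genes, categories, genome_size, min_genes=2, alpha=0.05):
    """Return {category: (hits, adjusted P)} for significantly enriched categories.

    Simplification: categories with fewer than `min_genes` hits are dropped
    before correction, so they do not count toward the number of tests.
    """
    n = len(input_genes)
    names, hits, pvals = [], [], []
    for name, members in categories.items():
        k = len(input_genes & members)
        if k < min_genes:
            continue
        names.append(name)
        hits.append(k)
        pvals.append(hypergeom_sf(k, genome_size, len(members), n))
    adjusted = benjamini_hochberg(pvals)
    return {name: (k, p) for name, k, p in zip(names, hits, adjusted) if p < alpha}


# Toy genome of 100 genes and an input list of 10 genes.
genes = [f"g{i}" for i in range(100)]
input_list = set(genes[:10])
toy_categories = {
    "cat_A": set(genes[:8]) | {"g50", "g51"},    # 8 of 10 members in the list
    "cat_E": {"g8", "g9"} | set(genes[40:88]),   # 2 of 50 members in the list
    "cat_B": set(genes[20:40]),                  # no members in the list
}
result = enrich(input_list, toy_categories, genome_size=len(genes))
print(result)  # only cat_A survives
```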
Table 3

Enrichment tools used and their characteristics


Consistently enriched categories

We considered only those GO or KEGG categories reported as significantly enriched by several enrichment tools to be consistently overrepresented in the 242-gene list. This strategy, based on testing multiple tools, is recommended in order to obtain the most satisfactory results.[8] As the threshold, we selected the number of tools that jointly reported at least four common enriched categories, so that only the top-ranked categories were finally considered. This threshold was five enrichment tools for GO Biological Process, six for GO Molecular Function, and three for KEGG pathways [Table 4].
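One reading of this threshold rule is: for each category, count the tools that report it; then take the largest tool count at which at least four categories are still shared. The sketch below implements that reading. It is an interpretation of the rule rather than code from the study, and the tool names and category names are placeholders.

```python
from collections import Counter


def consensus_threshold(tool_results, min_categories=4):
    """Return (threshold, categories) for the largest number of tools that
    still jointly report at least `min_categories` enriched categories.

    tool_results maps a tool name to the set of categories it called enriched.
    """
    support = Counter()
    for categories in tool_results.values():
        support.update(set(categories))
    for threshold in range(len(tool_results), 0, -1):
        selected = {c for c, n_tools in support.items() if n_tools >= threshold}
        if len(selected) >= min_categories:
            return threshold, selected
    return 0, set()


# Hypothetical outputs of four tools; category names are placeholders.
toy_results = {
    "tool_1": {"cat_A", "cat_B", "cat_C", "cat_D", "cat_X"},
    "tool_2": {"cat_A", "cat_B", "cat_C", "cat_D"},
    "tool_3": {"cat_A", "cat_B", "cat_C", "cat_D", "cat_Y"},
    "tool_4": {"cat_A", "cat_B"},
}
threshold, consensus = consensus_threshold(toy_results)
print(threshold, sorted(consensus))  # 3 ['cat_A', 'cat_B', 'cat_C', 'cat_D']
```

Run separately for each database (GO Biological Process, GO Molecular Function, KEGG), such a rule naturally yields a different tool-count threshold per database, as in Table 4.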
Table 4

Number of overrepresented GO and KEGG categories in the 242-gene list for each of the enrichment tools used


RESULTS

Data collection and gene selection

A total of 242 unique mapped genes [Table 2] were reported in at least one of the three meta-analyses; 65 of them were reported in two and 26 in all three meta-analyses. Of the 242 genes, 145 (59.9%) were up-regulated and 97 (40.1%) down-regulated in cancer vs normal tissue. Twenty-four genes (9.9%) had been reported by two individual GEP studies and 218 genes (90.1%) by at least three.
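The percentages above follow directly from the counts. A quick arithmetic check, with the counts copied from this paragraph:

```python
TOTAL_GENES = 242


def percent(part, whole=TOTAL_GENES):
    """Percentage of `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts taken from the text above.
up, down = 145, 97
two_studies, three_plus = 24, 218
assert up + down == TOTAL_GENES
assert two_studies + three_plus == TOTAL_GENES

shares = {
    "up-regulated": percent(up),
    "down-regulated": percent(down),
    "two GEP studies": percent(two_studies),
    "at least three GEP studies": percent(three_plus),
}
print(shares)
```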

Enrichment analyses

Nine enrichment tools were used to obtain significantly overrepresented categories (GO Biological Process, GO Molecular Function, and KEGG pathways) [Table 5].
Table 5A

Results of all enrichment tools used with the 242 gene list: Gene ontology biological process categories


Identification of consistently enriched categories

The number of reported enriched categories varied considerably between the tools used [Table 4], even though the same significance threshold (P<.05 after correction for multiple testing) and analysis conditions (whole genome as the reference background and at least two genes from the input list in the enriched category) were applied. Differences were also observed in the number of genes in a particular category and in the enrichment P values reported by each tool [Table 5]. To avoid false positives among the varying results, only the categories reported as enriched by several tools (five enrichment tools for GO Biological Process, six for GO Molecular Function, and three for KEGG pathways) were considered consistently enriched. Using these selection criteria, ten general GO Biological Process categories (cell proliferation, inflammatory response, multicellular organismal metabolic process, regulation of cell proliferation, response to chemical stimulus, response to external stimulus, response to nutrient, response to stress, response to wounding, and translational elongation); five GO Molecular Function categories (carbonate dehydratase activity, cytokine activity, extracellular matrix binding, receptor binding, and structural constituent of ribosome); and four KEGG pathways (extracellular matrix receptor interaction, focal adhesion, nitrogen metabolism, and ribosome) were consistently overrepresented in the 242-gene list [Table 6]. The ratio of enrichment was higher for the more specific and well-defined KEGG pathways than for the broad GO categories [Figure 1]. A very high overlap of individual genes among these categories was also observed [Table 7]. Based on this overlap, four biologically meaningful category groups were finally obtained:
Table 6

Consistently enriched GO and KEGG categories

Figure 1

Bar chart of enrichment ratios for GO and KEGG categories in the 242-gene list. Ratio of enrichment = the number of observed genes divided by the number of expected genes from each GO or KEGG category in the 242-gene list (according to WebGestalt or, alternatively, DAVID or GOTM tools). GO BP: Gene Ontology Biological Process; GO MF: Gene Ontology Molecular Function; KEGG: Kyoto Encyclopedia of Genes and Genomes.
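The ratio of enrichment defined in the Figure 1 caption is the observed count divided by the count expected by chance. A minimal sketch, assuming the usual expectation (the category's share of the reference genome applied to the input list); the category size and genome size below are hypothetical:

```python
def enrichment_ratio(observed, category_size, list_size, genome_size):
    """Observed genes divided by the number expected by chance.

    Expected = list_size * category_size / genome_size, i.e. the category's
    share of the reference genome applied to the input list.
    """
    expected = list_size * category_size / genome_size
    return observed / expected


# Hypothetical numbers: a 100-gene category, a 20,000-gene reference genome,
# and 10 of the 242 input genes falling in the category.
ratio = enrichment_ratio(observed=10, category_size=100, list_size=242, genome_size=20000)
print(round(ratio, 2))  # 8.26
```

For a fixed category size, a larger reference background lowers the expected count and so raises the ratio, which is why the choice of background matters for these values.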

Table 7

Overlap of the genes from the consistently enriched GO and KEGG categories

1. Seventeen common genes included in the GO Biological Process translational elongation, the GO Molecular Function structural constituent of ribosome, and the KEGG pathway ribosome.
2. Genes in the two KEGG pathways extracellular matrix receptor interaction and focal adhesion that were also included in the broad categories GO Molecular Function receptor binding and GO Biological Process response to external stimulus.
3. The five genes included in both the GO Molecular Function category carbonate dehydratase activity and the KEGG pathway nitrogen metabolism.
4. A large group of seven general GO Biological Process categories (inflammatory response, response to chemical stimulus, response to external stimulus, response to nutrient, response to stress, and response to wounding), together with two general GO Molecular Function categories (cytokine activity and receptor binding).
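The text does not state exactly how the gene overlap in Table 7 was turned into four groups. As a hypothetical illustration only, categories can be linked whenever their gene sets overlap strongly (here, an overlap coefficient of at least 0.5, an arbitrary choice) and then grouped as connected components. The gene sets below are toy data, not the study's categories.

```python
from itertools import combinations


def overlap_coefficient(a, b):
    """|A & B| / min(|A|, |B|); 0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def group_by_overlap(categories, threshold=0.5):
    """Group categories into connected components of the overlap graph."""
    names = list(categories)
    neighbours = {name: set() for name in names}
    for x, y in combinations(names, 2):
        if overlap_coefficient(categories[x], categories[y]) >= threshold:
            neighbours[x].add(y)
            neighbours[y].add(x)
    groups, seen = [], set()
    for start in names:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        groups.append(component)
    return groups


toy_sets = {
    "cat_A": {"g1", "g2", "g3", "g4", "g5"},
    "cat_B": {"g1", "g2", "g3", "g4", "g9"},
    "cat_C": {"g20", "g21", "g22"},
    "cat_D": {"g21", "g22", "g30"},
}
groups = group_by_overlap(toy_sets)
print(groups)
```

The overlap coefficient (rather than, say, Jaccard similarity) is used here so that a small, specific category fully contained in a broad one, such as a KEGG pathway inside a general GO term, is still linked to it.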

DISCUSSION

The large number of microarray studies on colorectal carcinogenesis has shown a low degree of overlap in the identified genes. We extracted the 242 unique genes reported in three meta-analyses of GEP studies on colorectal carcinogenesis.[2-4] Only the meta-analysis by Cardoso et al.[2] includes a descriptive exploration of the main GO categories present among the differentially expressed genes. In an attempt to overcome the known lack of reproducibility at the individual gene level among GEP studies, we used up to nine bioinformatic enrichment tools to determine statistically which GO categories or KEGG pathways were significantly overrepresented in the 242-gene list. A total of 34 independent GEP studies were included in the three meta-analyses. Most of them used whole-genome expression arrays, which include probes for thousands of genes, so we used all genes in the genome as the background for the enrichment analysis. Although this may overestimate the reference background, the heterogeneity in the number of genes interrogated in each of the 34 GEP experiments did not allow a more appropriate restricted background to be applied. We believe that our rigorous strategy for selecting enriched categories compensates for this probable overestimation of the reference background. After the selection criteria were applied, a total of 19 categories (15 GO terms and 4 KEGG pathways) were considered consistently overrepresented. The individual genes in these 19 categories overlapped heavily, which reduced the number of biologically meaningful categories to four clearly different groups.
First, the same 17 ribosomal proteins (RPs) were present in the GO Biological Process translational elongation, the GO Molecular Function structural constituent of ribosome, and the KEGG pathway ribosome (RPL3, RPL6, RPL7, RPL8, RPL18A, RPL23, RPL29, RPL30, RPL31, RPLP2, RPSA, RPS2, RPS5, RPS7, RPS18, RPS19, and RPS23) [Figure 2]. All of them showed increased expression in tumor vs normal tissue. Different expression patterns of RPs are known to exist in CRC. Moreover, ribosome biogenesis has clearly been linked to cancer,[12] and several studies have pointed out two possible roles of RPs in colorectal carcinogenesis: perturbation of their function in protein biosynthesis, and a direct influence on tumorigenesis through extraribosomal functions (summarized in Lai et al.[13]). Second, the KEGG terms extracellular matrix receptor interaction and focal adhesion shared nine genes (COL1A1, COL1A2, COL3A1, COL4A1, COL11A1, FN1, ITGA2, SPP1, and THBS2) [Figure 3]. Specific interactions of extracellular matrix molecules control cellular activities such as adhesion, differentiation, apoptosis, and proliferation.[14] Third, the GO category carbonate dehydratase activity and the KEGG pathway nitrogen metabolism included the same five carbonic anhydrase (CA) isozymes (CA1, CA2, CA4, CA7, and CA12) [Figure 4]. All five mRNAs were down-regulated in CRC compared with normal tissue, as also shown in another study for CA2 and CA12.[15] Recent data have confirmed the functional contribution of CAs, especially CA9 and CA12, to hypoxic tumor growth and progression.[16] Inhibition of CA9, which is overexpressed in many tumor types in response to the hypoxia inducible factor (HIF) pathway, is being tested as an anticancer therapeutic strategy.[17] Finally, a very general group of GO categories related to inflammation and cellular response included a large number of genes (between 14 and 59).
Interestingly, this group included two genes at loci that genome-wide association studies have identified as harboring low-risk inherited variants contributing to CRC risk.[18] These genes, the proto-oncogene MYC (8q24) and the bone morphogenetic protein gene BMP4 (14q22.2), were up-regulated in carcinoma tissue. Thus, judging by the functional classes of the genes in the identified enriched categories, these categories appear to be promising candidates for studies investigating their possible influence on CRC development.
Figure 2

Representation of the KEGG ribosome category (map03010), with the 17 genes from the 242 gene list indicated in red

Figure 3

Representation of the KEGG extracellular matrix receptor interaction category (map04512), with location of the ten genes from the 242 gene list indicated in red.

Figure 4

Representation of the KEGG nitrogen metabolism category (map00910), with location of the reaction catalyzed by the five carbonic anhydrase isozymes from the 242 gene list indicated in red

In general, we observed considerable variation in the number of enriched categories reported by each tool, even though the analysis conditions were uniform. Despite this apparent variation, however, most of the enriched categories reported by the more stringent tools (those reporting a small number of enriched categories) were ranked among the top categories by the more generous tools (those reporting a larger number of enriched categories). We considered this result of special interest because of the previously reported lack of reproducibility between different enrichment tools.[7,8,19] This variability has been attributed to the statistical models applied by the enrichment tools, to the method of correction for multiple testing, and to differences in the versions of the GO and KEGG data sources used. Thus, our strategy of using several bioinformatic tools to extract biologically related genes consistently involved in colorectal carcinogenesis proved to be successful.
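The observation that a stringent tool's categories rank near the top of a generous tool's list can be quantified in several ways. One simple option, sketched here with placeholder data, is the fraction of the stringent tool's categories that fall within the generous tool's top N. The metric and the value of N are assumptions for illustration, not part of the original analysis.

```python
def fraction_in_top(stringent_hits, generous_ranking, top_n):
    """Share of a stringent tool's categories found in the top `top_n`
    categories of a more generous tool's ranking."""
    if not stringent_hits:
        return 0.0
    top = set(generous_ranking[:top_n])
    return sum(1 for c in stringent_hits if c in top) / len(stringent_hits)


# Hypothetical outputs: a stringent tool with two hits and a generous tool
# ranking five categories by P value (most significant first).
stringent = {"cat_A", "cat_B"}
generous = ["cat_A", "cat_C", "cat_B", "cat_D", "cat_E"]
print(fraction_in_top(stringent, generous, top_n=3))  # 1.0
```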

CONCLUSIONS

We used the list of 242 unique mapped genes from three meta-analyses of GEP studies on colorectal carcinogenesis for a systematic enrichment analysis of GO categories and KEGG pathways, applying up to nine different enrichment tools. After applying stringent selection criteria to avoid false-positive results, the ribosomal protein group, the extracellular matrix receptor interaction category, the carbonic anhydrase isozymes, and a general category related to inflammation emerged as significantly and consistently overrepresented categories. These categories have known functional relationships to CRC development, and their value as diagnostic markers and therapeutic targets deserves further investigation.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

JL and AF conceived and designed the study. JL conducted the analyses and wrote the initial manuscript. KH provided oversight and conceptual guidance to the project. KH and AF contributed to the final manuscript. All authors read and approved the final manuscript.

AUTHOR'S PROFILE

Prof. Kari Hemminki: 1973 PhD in Medicine, University of Helsinki, Finland; 1973 MD in Medicine, University of Helsinki, Finland; 1975 Docent in Biochemistry, University of Helsinki, Finland; 1976-1978 Postdoc in Molecular Biology, Johns Hopkins, Baltimore, USA.

Jesus Lascorz: 2001 BSc in Biochemistry, University of Zaragoza, Spain; 2002-2003 Molecular Genetics Laboratory, Central Institute of Mental Health, University of Heidelberg, Mannheim, Germany; 2008 PhD in Human Biology, Institute of Human Genetics, University Erlangen-Nürnberg, Germany.

Asta Foersti: 1984 MSc in Biochemistry, University of Kuopio, Finland; 1992 PhD in Biochemistry and Biotechnology, University of Kuopio, Finland.

Dr. Jason A. Zell is Assistant Professor of Medicine (Hematology/Oncology) and Epidemiology at the School of Medicine, University of California Irvine.
Table 5B

Results of all enrichment tools used with the 242 gene list: Gene ontology molecular function categories

Table 5C

Results of all enrichment tools used with the 242 gene list: KEGG pathway categories

References (27 in total; 10 shown)

Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. The KEGG resource for deciphering the genome. Nucleic Acids Res. 2004-01-01.

Al-Shahrour F, Díaz-Uriarte R, Dopazo J. FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics. 2004-01-22.

Lempiäinen H, Shore D. Growth control and ribosome biogenesis. Curr Opin Cell Biol. 2009-09-30.

Chan SK, Griffith OL, Tai IT, Jones SJM. Meta-analysis of colorectal cancer gene expression profiling studies identifies consistently reported candidate biomarkers. Cancer Epidemiol Biomarkers Prev. 2008-03.

Niemelä AM, Hynninen P, Mecklin JP, Kuopio T, Kokko A, Aaltonen L, Parkkila AK, Pastorekova S, Pastorek J, Waheed A, Sly WS, Orntoft TF, Kruhøffer M, Haapasalo H, Parkkila S, Kivelä AJ. Carbonic anhydrase IX is highly expressed in hereditary nonpolyposis colorectal cancer. Cancer Epidemiol Biomarkers Prev. 2007-09.

Tenesa A, Dunlop MG. New insights into the aetiology of colorectal cancer from genome-wide association studies. Nat Rev Genet. 2009-06.

Siddiqui AS, Delaney AD, Schnerch A, Griffith OL, Jones SJM, Marra MA. Sequence biases in large scale gene expression profiling data. Nucleic Acids Res. 2006-07-13.

Chen J, Bardes EE, Aronow BJ, Jegga AG. ToppGene Suite for gene list enrichment analysis and candidate gene prioritization. Nucleic Acids Res. 2009-05-22.

Hardy J, Singleton A. Genomewide association studies and human disease. N Engl J Med. 2009-04-15.

Huang DW, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2008-11-25.