Assessment of colon cancer molecular mechanism: a system biology approach.

Babak Arjmand, Mahmood Khodadoost, Somayeh Jahani Sherafat, Mostafa Rezaei Tavirani, Nayebali Ahmadi, Maryam Hamzeloo Moghadam, Sina Rezaei Tavirani, Binazir Khanabadi, Majid Iranshahi.

Abstract

AIM: The current study aimed to assess and compare colon cancer dysregulated genes from the GEO and STRING databases.
BACKGROUND: Colorectal cancer is the third most common cancer and the second leading cause of cancer-related mortality worldwide. The molecular mechanism of colon cancer has been studied extensively.
METHODS: From the STRING database, 100 differentially expressed proteins related to colon cancer were retrieved and analyzed by network analysis. The central nodes of the network were assessed by gene ontology. The findings were compared with a GEO series (GSE) dataset.
RESULTS: Based on data from the STRING database, TP53, EGFR, HRAS, MYC, AKT1, GAPDH, KRAS, ERBB2, PTEN, and VEGFA were identified as central genes. The central nodes were not included in the significant DEGs of the analyzed GSE.
CONCLUSION: A combination of different database sources in system biology investigations provides useful information about the studied diseases.
©2021 RIGLD, Research Institute for Gastroenterology and Liver Diseases.

Keywords:  Bioinformatics; Colon cancer; Human; Network analysis; Protein expression

Year:  2021        PMID: 35154602      PMCID: PMC8817753     

Source DB:  PubMed          Journal:  Gastroenterol Hepatol Bed Bench        ISSN: 2008-2258


Introduction

Colorectal cancer is the third most common cancer and the second leading cause of cancer-related mortality worldwide (1). It is a lethal cancer whose diagnosis and therapy both remain problematic (2). The molecular mechanism of colon cancer has been investigated with many different methods (3). Because high-throughput methods are now common across the medical sciences, many reports on colon cancer rely on them (4). Proteomics, genomics, and bioinformatics are important high-throughput approaches that are used together to address problems in medical research, and the results of several colon cancer investigations conducted with proteomics, genomics, metabolomics, or bioinformatics have been published. Bioinformatics is a critical field that derives new concepts from the results of genomic and proteomic studies (5-7). Dysregulated metabolites, genes, and proteins in colon cancer patients have been studied using bioinformatics. In such studies, data are gathered from databanks or published articles and analyzed using bioinformatic tools (8-10). Two features of these studies stand out: the diversity of data sources and the multiplicity of analysis methods. Results can differ depending on the selected source and method, so the investigation protocol must be described clearly if the most accurate findings are to be identified (8, 11). GEO is a useful source of data, including gene expression profiles of assessed samples, and many researchers select it to analyze differentially expressed genes (DEGs) in a defined condition. GEO is not only a suitable source of data but also provides useful software such as GEO2R, which supports the primary analysis of data.
Fold change and statistical validation of data are two important outputs of a GEO analysis. The direction of regulation, i.e. up- or downregulation, is available from the GEO2R analysis of the studied DEGs (12, 13). STRING is another useful source of data that provides the dysregulated proteins related to a studied condition. Many published articles have used the “disease query” of STRING. The combination of STRING and Cytoscape software is a powerful tool for the bioinformatic analysis of data (14, 15). In the present investigation, dysregulated genes in human colon cancer were assessed using one experiment recorded in GEO together with data from STRING, and the findings from the two sources were compared.

Methods

In this study, 100 proteins associated with colon cancer were extracted from the STRING database using the “disease query” option. The proteins were connected by undirected edges in Cytoscape software v 3.7.2 (16), giving a network of 100 nodes and 2811 links. The network consisted of a main connected component of 95 nodes and 5 isolated proteins; the main connected component was analyzed with the “NetworkAnalyzer” plug-in of Cytoscape. The network was visualized by mapping degree value to the color and size of the nodes. Based on degree value, the top 10 nodes of the main connected component were selected as the hub nodes of the network. The hubs were submitted to the ClueGO v2.5.7 (17) application of Cytoscape for gene ontology analysis. The related pathways were extracted from KEGG 08.05.2020. A p-value ≤ 0.01 and medium network specificity were applied to determine the pathways. The GEO series GSE127069, comprising 6 patients and entitled “RNA sequencing for cancer tissues and adjacent tissues of third-stage rectal cancer patients with and without blood vascular thrombus” (18), was selected for analysis. A volcano plot of the gene expression profiles of colon cancer tissue versus adjacent tissue was generated to check statistically whether the samples were comparable. The top genes based on fold change (1.5
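The network construction and degree-based hub selection described in this section can be sketched in a few lines. This is a minimal illustration in plain Python, not the Cytoscape/NetworkAnalyzer workflow the authors used; the function names and the toy edge list (nodes "A" to "F") are hypothetical and carry no biological meaning. Nodes without any edge never enter the adjacency map, which mirrors the exclusion of isolated proteins from the main connected component.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (node_a, node_b) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def main_component(adjacency):
    """Return the node set of the largest connected component (BFS)."""
    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def top_hubs(edges, k=10):
    """Rank main-component nodes by degree (ties broken by name), keep k."""
    adjacency = build_adjacency(edges)
    component = main_component(adjacency)
    ranked = sorted(component, key=lambda n: (-len(adjacency[n]), n))
    return ranked[:k]


# Hypothetical toy edges, for illustration only (not STRING data).
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"), ("E", "F")]
print(top_hubs(toy_edges, k=2))  # prints ['A', 'B']
```

With the real edge list exported from STRING, `top_hubs(edges, k=10)` would correspond to the hub selection step, assuming ties in degree are resolved alphabetically, which the source does not specify.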

Results

The network constructed from the data extracted from the STRING database consisted of a main connected component (shown in Figure 1) and 5 isolated proteins. Four centrality parameters, i.e. degree (K), betweenness centrality (BC), closeness centrality (CC), and stress, were determined for the nodes of the main connected component (Table 1). TP53, EGFR, HRAS, MYC, AKT1, GAPDH, KRAS, ERBB2, PTEN, and VEGFA were identified as hub nodes. Thirty-one terms in 2 groups of pathways related to the hub nodes of the colon cancer network were identified. The pathways in the two groups and the related proteins are presented in Table 2.
Figure 1

Main connected component of the colon cancer network. Of the 100 proteins extracted from the STRING database, 95 are included in this subnetwork. The nodes are laid out based on degree value; larger size and a color shift from red to green indicate increasing degree value

Table 1

The nodes of the main connected component and four centrality parameters are presented

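R | Display name | Degree | Betweenness centrality | Closeness centrality | Stress
1 | TP53 | 92 | 0.016 | 0.979 | 3026
2 | EGFR | 91 | 0.018 | 0.969 | 3038
3 | HRAS | 91 | 0.015 | 0.969 | 2940
4 | MYC | 90 | 0.014 | 0.959 | 2758
5 | AKT1 | 87 | 0.011 | 0.931 | 2410
6 | GAPDH | 86 | 0.010 | 0.922 | 2248
7 | KRAS | 86 | 0.012 | 0.922 | 2444
8 | ERBB2 | 85 | 0.011 | 0.913 | 2268
9 | PTEN | 84 | 0.011 | 0.904 | 2178
10 | VEGFA | 84 | 0.009 | 0.904 | 2020
11 | CCND1 | 83 | 0.009 | 0.895 | 2026
12 | CDH1 | 83 | 0.008 | 0.895 | 1982
13 | CDKN2A | 83 | 0.010 | 0.895 | 2034
14 | CASP3 | 81 | 0.007 | 0.879 | 1748
15 | EGF | 81 | 0.009 | 0.879 | 1840
16 | ESR1 | 81 | 0.008 | 0.879 | 1806
17 | JUN | 81 | 0.011 | 0.879 | 1952
18 | STAT3 | 81 | 0.006 | 0.879 | 1638
19 | ALB | 80 | 0.006 | 0.870 | 1582
20 | NOTCH1 | 80 | 0.007 | 0.870 | 1772
21 | CTNNB1 | 79 | 0.009 | 0.862 | 1802
22 | IL6 | 79 | 0.008 | 0.862 | 1628
23 | INS | 77 | 0.004 | 0.847 | 1290
24 | CD44 | 76 | 0.009 | 0.839 | 1548
25 | SRC | 76 | 0.006 | 0.839 | 1368
26 | ANXA5 | 75 | 0.004 | 0.832 | 1228
27 | MAPK3 | 73 | 0.003 | 0.817 | 1030
28 | TNF | 72 | 0.007 | 0.810 | 1214
29 | MMP9 | 71 | 0.003 | 0.803 | 898
30 | IGF1 | 70 | 0.003 | 0.797 | 848
31 | BCL2L1 | 69 | 0.003 | 0.790 | 846
32 | ACTB | 68 | 0.003 | 0.783 | 830
33 | MTOR | 68 | 0.002 | 0.783 | 748
34 | FGF2 | 67 | 0.002 | 0.777 | 738
35 | FN1 | 67 | 0.002 | 0.777 | 702
36 | MMP2 | 67 | 0.002 | 0.777 | 718
37 | PTGS2 | 66 | 0.003 | 0.770 | 808
38 | SNAI1 | 66 | 0.003 | 0.770 | 792
39 | CXCL8 | 65 | 0.002 | 0.764 | 624
40 | CDKN1A | 63 | 0.003 | 0.752 | 668
41 | CXCR4 | 63 | 0.003 | 0.752 | 598
42 | SMAD4 | 63 | 0.005 | 0.752 | 912
43 | KDR | 62 | 0.002 | 0.746 | 556
44 | CDH2 | 61 | 0.002 | 0.740 | 646
45 | CASP8 | 60 | 0.002 | 0.734 | 508
46 | HIF1A | 60 | 0.001 | 0.734 | 406
47 | HIST2H3PS2 | 60 | 0.002 | 0.734 | 604
48 | CDK4 | 58 | 0.002 | 0.723 | 576
49 | EPCAM | 58 | 0.005 | 0.723 | 854
50 | IGF1R | 58 | 0.001 | 0.723 | 362
51 | MET | 58 | 0.002 | 0.723 | 458
52 | SNAI2 | 58 | 0.002 | 0.723 | 448
53 | BRCA1 | 57 | 0.004 | 0.718 | 694
54 | IL2 | 57 | 0.004 | 0.718 | 662
55 | PECAM1 | 57 | 0.001 | 0.718 | 390
56 | SOX2 | 57 | 0.002 | 0.718 | 542
57 | ATM | 56 | 0.003 | 0.712 | 556
58 | IL10 | 56 | 0.003 | 0.712 | 530
59 | MDM2 | 56 | 0.001 | 0.712 | 418
60 | MCL1 | 55 | 0.001 | 0.707 | 314
61 | CASP9 | 54 | 0.001 | 0.701 | 332
62 | CDK2 | 54 | 0.003 | 0.701 | 562
63 | CSF2 | 53 | 0.003 | 0.696 | 534
64 | CYCS | 53 | 0.002 | 0.696 | 386
65 | MUC1 | 53 | 0.002 | 0.696 | 516
66 | PIK3CA | 53 | 0.007 | 0.696 | 830
67 | ZEB1 | 52 | 0.001 | 0.691 | 286
68 | HNF4A | 51 | 0.002 | 0.686 | 370
69 | PROM1 | 51 | 0.001 | 0.686 | 216
70 | IL1B | 50 | 0.002 | 0.681 | 354
71 | MMP7 | 50 | 0.001 | 0.681 | 296
72 | CCNB1 | 49 | 0.001 | 0.676 | 300
73 | CTLA4 | 47 | 0.002 | 0.667 | 406
74 | CD274 | 46 | 0.003 | 0.662 | 484
75 | DNMT1 | 46 | 0.002 | 0.662 | 418
76 | FOXP3 | 46 | 0.003 | 0.662 | 420
77 | PPARG | 46 | 0.001 | 0.662 | 180
78 | CDK1 | 44 | 0.001 | 0.653 | 208
79 | CDX2 | 34 | 0.001 | 0.606 | 238
80 | MLH1 | 34 | 0.002 | 0.610 | 272
81 | ALDH1A1 | 32 | 0.000 | 0.603 | 56
82 | AXIN1 | 32 | 0.000 | 0.603 | 74
83 | MSH2 | 32 | 0.001 | 0.603 | 216
84 | TNFRSF10B | 31 | 0.000 | 0.599 | 30
85 | AXIN2 | 30 | 0.000 | 0.595 | 88
86 | TOP1 | 30 | 0.001 | 0.595 | 112
87 | APC | 28 | 0.001 | 0.588 | 124
88 | LGR5 | 26 | 0.000 | 0.580 | 76
89 | TYMS | 26 | 0.001 | 0.577 | 116
90 | CEACAM5 | 25 | 0.001 | 0.577 | 130
91 | DDX53 | 24 | 0.000 | 0.573 | 16
92 | MSH6 | 22 | 0.000 | 0.566 | 62
93 | PMS2 | 17 | 0.000 | 0.550 | 10
94 | CD4 | 16 | 0.000 | 0.547 | 38
95 | CD8A | 11 | 0.000 | 0.525 | 14
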
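The values in Table 1 can be cross-checked against the network size given in the Methods. The sketch below is illustrative only; it assumes that closeness centrality is defined as (n - 1) divided by the sum of shortest-path distances from a node, and that every non-neighbor of a hub lies exactly two steps away. The function names are hypothetical. The degree list is copied from the Degree column of Table 1.

```python
# Degree column of Table 1, in rank order (95 nodes of the main component).
DEGREES = [
    92, 91, 91, 90, 87, 86, 86, 85, 84, 84,
    83, 83, 83, 81, 81, 81, 81, 81, 80, 80,
    79, 79, 77, 76, 76, 75, 73, 72, 71, 70,
    69, 68, 68, 67, 67, 67, 66, 66, 65, 63,
    63, 63, 62, 61, 60, 60, 60, 58, 58, 58,
    58, 58, 57, 57, 57, 57, 56, 56, 56, 55,
    54, 54, 53, 53, 53, 53, 52, 51, 51, 50,
    50, 49, 47, 46, 46, 46, 46, 44, 34, 34,
    32, 32, 32, 31, 30, 30, 28, 26, 26, 25,
    24, 22, 17, 16, 11,
]
N_NODES = 95    # nodes in the main connected component (Methods)
N_EDGES = 2811  # links reported for the network (Methods)


def degree_sum_matches(degrees, n_edges):
    """Handshake lemma: each undirected edge adds 1 to two node degrees."""
    return sum(degrees) == 2 * n_edges


def closeness_if_within_two_steps(degree, n_nodes):
    """Closeness of a node whose non-neighbors are all exactly 2 steps away.

    Assumes closeness = (n - 1) / (sum of shortest-path distances).
    """
    two_step_nodes = n_nodes - 1 - degree
    return (n_nodes - 1) / (degree + 2 * two_step_nodes)


print(degree_sum_matches(DEGREES, N_EDGES))                   # prints True
print(round(closeness_if_within_two_steps(92, N_NODES), 3))   # TP53: prints 0.979
```

Under these assumptions the degrees sum to 5622, twice the 2811 reported links, and the formula reproduces the tabulated closeness of all ten hubs (for example 0.969 for degree 91 and 0.931 for degree 87). For CD8A (degree 11) it gives 0.531 rather than the tabulated 0.525, which suggests that a few nodes lie more than two steps from the least connected proteins.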
Table 2

The biochemical pathways which are related to the 10 hub nodes are presented. Term p-value, Term p-value Corrected with Bonferroni step down, group p-value, and group p-value Corrected with Bonferroni step down were 0.00. %AG, Nr. G, and AG refer to percentage of associated genes, number of associated genes, and associated genes respectively. Highlighted row refers to the first group and the other terms belong to the second group (Endometrial group)

GO term | % AG | Nr. G | AG
HIF-1 signaling pathway | 5 | 5 | [AKT1, EGFR, ERBB2, GAPDH, VEGFA]
ErbB signaling pathway | 7 | 6 | [AKT1, EGFR, ERBB2, HRAS, KRAS, MYC]
Sphingolipid signaling pathway | 4 | 5 | [AKT1, HRAS, KRAS, PTEN, TP53]
Mitophagy | 4 | 3 | [HRAS, KRAS, TP53]
Longevity regulating pathway | 4 | 4 | [AKT1, HRAS, KRAS, TP53]
Longevity regulating pathway | 5 | 3 | [AKT1, HRAS, KRAS]
VEGF signaling pathway | 7 | 4 | [AKT1, HRAS, KRAS, VEGFA]
Fc epsilon RI signaling pathway | 4 | 3 | [AKT1, HRAS, KRAS]
Prolactin signaling pathway | 4 | 3 | [AKT1, HRAS, KRAS]
Thyroid hormone signaling pathway | 4 | 5 | [AKT1, HRAS, KRAS, MYC, TP53]
GnRH secretion | 5 | 3 | [AKT1, HRAS, KRAS]
AGE-RAGE signaling pathway in diabetic complications | 4 | 4 | [AKT1, HRAS, KRAS, VEGFA]
Colorectal cancer | 7 | 6 | [AKT1, EGFR, HRAS, KRAS, MYC, TP53]
Renal cell carcinoma | 6 | 4 | [AKT1, HRAS, KRAS, VEGFA]
Pancreatic cancer | 8 | 6 | [AKT1, EGFR, ERBB2, KRAS, TP53, VEGFA]
Endometrial cancer | 14 | 8 | [AKT1, EGFR, ERBB2, HRAS, KRAS, MYC, PTEN, TP53]
Glioma | 8 | 6 | [AKT1, EGFR, HRAS, KRAS, PTEN, TP53]
Prostate cancer | 7 | 7 | [AKT1, EGFR, ERBB2, HRAS, KRAS, PTEN, TP53]
Thyroid cancer | 11 | 4 | [HRAS, KRAS, MYC, TP53]
Melanoma | 8 | 6 | [AKT1, EGFR, HRAS, KRAS, PTEN, TP53]
Bladder cancer | 17 | 7 | [EGFR, ERBB2, HRAS, KRAS, MYC, TP53, VEGFA]
Chronic myeloid leukemia | 7 | 5 | [AKT1, HRAS, KRAS, MYC, TP53]
Acute myeloid leukemia | 6 | 4 | [AKT1, HRAS, KRAS, MYC]
Small cell lung cancer | 4 | 4 | [AKT1, MYC, PTEN, TP53]
Non-small cell lung cancer | 9 | 6 | [AKT1, EGFR, ERBB2, HRAS, KRAS, TP53]
Breast cancer | 5 | 8 | [AKT1, EGFR, ERBB2, HRAS, KRAS, MYC, PTEN, TP53]
Hepatocellular carcinoma | 4 | 7 | [AKT1, EGFR, HRAS, KRAS, MYC, PTEN, TP53]
Gastric cancer | 5 | 7 | [AKT1, EGFR, ERBB2, HRAS, KRAS, MYC, TP53]
Central carbon metabolism in cancer | 12 | 8 | [AKT1, EGFR, ERBB2, HRAS, KRAS, MYC, PTEN, TP53]
Choline metabolism in cancer | 4 | 4 | [AKT1, EGFR, HRAS, KRAS]
PD-L1 expression and PD-1 checkpoint pathway in cancer | 6 | 5 | [AKT1, EGFR, HRAS, KRAS, PTEN]

The volcano plot of gene expression profiles of colon cancer tissue versus adjacent tissue for the analyzed GSE is presented in Figure 2. Based on the volcano plot, the samples are comparable. A list of the significant and known genes of the GEO analysis is given in Table 3. The top 21 rows of Table 3 refer to the downregulated genes, and the other 6 genes are upregulated.
Figure 2

Volcano plot of gene expression profiles of colon cancer tissue versus adjacent tissue

Table 3

Significant and known genes of the GEO analysis are presented. Among the 27 genes the top 21 DEGs are down-regulated and the other 6 genes are upregulated

R | Spot ID | Gene name
1 | P52761 | slr0709
2 | P57784 | Snrpa1
3 | P49155 | xylI
4 | P57417 | flgN
5 | P56746 | CLDN15
6 | P80162 | CXCL6
7 | P47148 | PXP2
8 | P96036 | spt5
9 | P20443 | Sag
10 | P24716 | copR
11 | P60079 | MW2494
12 | P44648 | trmB
13 | P62115 | psbN
14 | P94795 | nifH
15 | P26439 | HSD3B2
16 | P88119 | env
17 | P96403 | MT0231
18 | P63777 | citA
19 | P75978 | ymfN
20 | P62741 | HBG1
21 | P56001 | rpoA
22 | P41292 | MT-ATP8
23 | P67259 | NMB0796
24 | P59988 | uspB
25 | P82762 | LCR47
26 | P52119 | ratB
27 | P68097 | CYCS
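The comparison between the two sources reduces to a set intersection between the hub nodes and the gene names in Table 3. The sketch below is a minimal illustration; the helper name `shared_symbols` is hypothetical, and matching gene symbols case-insensitively is an assumption rather than a step stated by the authors.

```python
# Hub nodes reported in the Results.
HUB_NODES = {
    "TP53", "EGFR", "HRAS", "MYC", "AKT1",
    "GAPDH", "KRAS", "ERBB2", "PTEN", "VEGFA",
}

# Gene names from Table 3, in table order.
GEO_DEGS = [
    "slr0709", "Snrpa1", "xylI", "flgN", "CLDN15", "CXCL6", "PXP2",
    "spt5", "Sag", "copR", "MW2494", "trmB", "psbN", "nifH", "HSD3B2",
    "env", "MT0231", "citA", "ymfN", "HBG1", "rpoA", "MT-ATP8",
    "NMB0796", "uspB", "LCR47", "ratB", "CYCS",
]


def shared_symbols(first, second):
    """Return symbols present in both collections, compared in upper case."""
    return {s.upper() for s in first} & {s.upper() for s in second}


shared = shared_symbols(HUB_NODES, GEO_DEGS)
print(sorted(shared))  # prints []
```

The empty result matches the statement that none of the central nodes appear among the significant DEGs. CYCS is the only Table 3 gene that also appears in Table 1, where it ranks 64th by degree and is therefore not a hub.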

Discussion

The STRING database lists the dysregulated proteins related to many diseases. In this research, 100 proteins that are dysregulated in human colon cancers were retrieved. The data were organized as a protein-protein interaction network (Figure 1). Analysis of the constructed network indicated that it is scale-free, so a limited number of nodes, known as central nodes, can be selected as the critical nodes of the network (19). The centrality parameters of the nodes are shown in Table 1. TP53, EGFR, HRAS, MYC, AKT1, GAPDH, KRAS, ERBB2, PTEN, and VEGFA appeared as hub nodes of the assessed network. Hub genes are important central nodes that can be discriminated from the other nodes of the network as critical individuals (20). As shown in Table 1, the other centrality parameters of the hub nodes are also high; thus, it can be concluded that the hub nodes are potent hub-bottleneck nodes. A common and simple analysis was conducted to find the critical nodes of the studied network. As presented in Table 2, the pathways related to the central nodes were identified through gene ontology analysis. The analysis therefore appears complete, and a useful interpretation is accessible. Based on previous investigations, TP53 is the top central gene related to colon cancer and is known as a biomarker of many cancers (21). As specificity and sensitivity are the two main properties of biomarkers (22), and TP53 is not specific to colon cancer, it can be concluded that TP53 cannot be considered a biomarker of colon cancer. Like TP53, the other critical nodes introduced here are also related to different types of cancers. Thus, it can be concluded that the well-known data in the STRING database match various kinds of cancers.
As reported, EGFR is a key element in colorectal cancers (23), and many documents point to EGFR as a biomarker of cancers such as head and neck squamous cell carcinomas and primary non-small cell lung cancer (24, 25). In another part of the study, colon cancer tissue was compared with adjacent tissue. As depicted in Figure 2, the data indicated that analysis is possible. In total, 27 significant DEGs that discriminate cancerous tissue from the adjacent tissue were identified. At first inspection, the evidence for a correlation between these findings and the results of the STRING analysis is insufficient (compare the contents of Table 3 with the 10 central nodes). Because the number of DEGs in the GEO analysis is limited to 27, these data alone cannot be assembled into an interactome that forms a scale-free network. The best way to analyze this set of genes is to add their first neighbors. STRING is a rich source of neighbors, and it offers options that allow researchers to add an adequate number of first neighbors to the queried genes. This mode of analysis enables the investigator to construct a scale-free network and analyze the queried DEGs. The centrality values that the queried genes acquire from the added first neighbors, together with their fold change values, provide a clear basis for selecting the critical DEGs from among the studied genes.
It can be concluded that each type of analysis is unique in its properties and findings. Depending on the researcher's priorities, a study can be designed to obtain a different result that is useful from that point of view. Many studies have used this combined mode of analysis, with different numbers of added first neighbors, to discriminate the queried DEGs (26, 27). The analysis of data from GEO and STRING revealed that each kind of analysis has its benefits, and analysis using the sources separately also provided useful results. The combined mode of analysis appears to be a suitable and more complete method for reaching a clear concept and interpretation of the studied disease.
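The first-neighbor expansion proposed here can be sketched as follows. This is a hedged illustration, not STRING's own procedure: it assumes a locally available list of scored interactions, assumes that neighbors are ranked by their best interaction score with any queried gene, and uses a hypothetical function name and placeholder node labels ("Q1", "N1", and so on) with invented scores.

```python
def add_first_neighbors(interactions, query_genes, max_added):
    """Expand a query gene set with its highest-scoring first neighbors.

    interactions: iterable of (gene_a, gene_b, score) undirected records.
    Returns the added neighbors and the edges among the expanded gene set.
    """
    interactions = list(interactions)  # allow two passes over the records
    queries = set(query_genes)

    # Best score linking each outside gene to any queried gene.
    best_score = {}
    for a, b, score in interactions:
        for inside, outside in ((a, b), (b, a)):
            if inside in queries and outside not in queries:
                best_score[outside] = max(score, best_score.get(outside, 0.0))

    # Keep the top-scoring neighbors; ties are broken by name.
    ranked = sorted(best_score, key=lambda gene: (-best_score[gene], gene))
    added = ranked[:max_added]

    kept = queries | set(added)
    edges = [(a, b) for a, b, _ in interactions if a in kept and b in kept]
    return added, edges


# Hypothetical toy interactions, for illustration only.
toy = [
    ("Q1", "N1", 0.9),
    ("Q2", "N1", 0.8),
    ("Q2", "N2", 0.7),
    ("N1", "N3", 0.95),
    ("N2", "N3", 0.6),
]
added, edges = add_first_neighbors(toy, ["Q1", "Q2"], max_added=1)
print(added)  # prints ['N1']
print(edges)  # prints [('Q1', 'N1'), ('Q2', 'N1')]
```

The expanded edge list can then be passed to a centrality analysis, so that each queried DEG is characterized by both its fold change and its position in the enlarged network.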

Conflict of interests

The authors declare that they have no conflict of interest.
  25 in total

1.  The Gene Expression Omnibus Database.

Authors:  Emily Clough; Tanya Barrett
Journal:  Methods Mol Biol       Date:  2016

2.  Tuning Push-Pull Electronic Effects of AIEgens to Boost the Theranostic Efficacy for Colon Cancer.

Authors:  Hai-Tao Feng; Shaomin Zou; Ming Chen; Feng Xiong; Mong-Hong Lee; Lekun Fang; Ben Zhong Tang
Journal:  J Am Chem Soc       Date:  2020-06-16       Impact factor: 15.419

3.  Cytoscape StringApp: Network Analysis and Visualization of Proteomics Data.

Authors:  Nadezhda T Doncheva; John H Morris; Jan Gorodkin; Lars J Jensen
Journal:  J Proteome Res       Date:  2018-12-05       Impact factor: 4.466

4.  Comparison of KRAS and EGFR gene status between primary non-small cell lung cancer and local lymph node metastases: implications for clinical practice.

Authors:  Leina Sun; Qiang Zhang; Huanling Luan; Zhongli Zhan; Changli Wang; Baocun Sun
Journal:  J Exp Clin Cancer Res       Date:  2011-03-17

5.  Molecular identification of Giardia lamblia; is there any correlation between diarrhea and genotyping in Iranian population?

Authors:  Nader Pestechian; Hamidullah Rasekh; Mohammad Rostami-Nejad; Hossein Ali Yousofi; Ahmad Hosseini-Safa
Journal:  Gastroenterol Hepatol Bed Bench       Date:  2014

6.  Gene screening of colorectal cancers via network analysis.

Authors:  Vahid Mansouri; Mostafa Rezaei Tavirani; Sina Rezaei Tavirani
Journal:  Gastroenterol Hepatol Bed Bench       Date:  2019

7.  Immunological reactions by T cell and regulation of crucial genes in treated celiac disease patients.

Authors:  Mohammad Rostami-Nejad; Zahra Razzaghi; Somayeh Esmaeili; Sina Rezaei-Tavirani; Alireza Akbarzadeh Baghban; Reza Vafaee
Journal:  Gastroenterol Hepatol Bed Bench       Date:  2020

Review 8.  Molecular Mechanisms of Colon Cancer Progression and Metastasis: Recent Insights and Advancements.

Authors:  Ahmed Malki; Rasha Abu ElRuz; Ishita Gupta; Asma Allouch; Semir Vranic; Ala-Eddin Al Moustafa
Journal:  Int J Mol Sci       Date:  2020-12-24       Impact factor: 5.923

Review 9.  Natural Polyphenols as Targeted Modulators in Colon Cancer: Molecular Mechanisms and Applications.

Authors:  Jing Long; Peng Guan; Xian Hu; Lingyuan Yang; Liuqin He; Qinlu Lin; Feijun Luo; Jianzhong Li; Xingguo He; Zhiliang Du; Tiejun Li
Journal:  Front Immunol       Date:  2021-02-16       Impact factor: 7.561

10.  Fibrinogen Dysregulation is a Prominent Process in Fatal Conditions of COVID-19 Infection; a Proteomic Analysis.

Authors:  Mostafa Rezaei-Tavirani; Mohammad Rostami Nejad; Babak Arjmand; Sina Rezaei Tavirani; Mohammadreza Razzaghi; Vahid Mansouri
Journal:  Arch Acad Emerg Med       Date:  2021-03-15