Literature DB >> 28443230

Prediction of optimal gene functions for osteosarcoma using network-based- guilt by association method based on gene oncology and microarray profile.

Xinrang Chen1.   

Abstract

In the current study, we planned to predict the optimal gene functions for osteosarcoma (OS) by integrating network-based method with guilt by association (GBA) principle (called as network-based gene function inference approach) based on gene oncology (GO) data and gene expression profile. To begin with, differentially expressed genes (DEGs) were extracted using linear models for microarray data (LIMMA) package. Then, construction of differential co-expression network (DCN) relying on DEGs was implemented, and sub-DCN was identified using Spearman correlation coefficient (SCC). Subsequently, GO annotations for OS were collected according to known confirmed database and DEGs. Ultimately, gene functions were predicted by means of GBA principle based on the area under the curve (AUC) for GO terms, and we determined GO terms with AUC >0.7 as the optimal gene functions for OS. Totally, 123 DEGs and 137 GO terms were obtained for further analysis. A DCN was constructed, which included 123 DEGs and 7503 interactions. A total of 105 GO terms were identified when the threshold was set as AUC >0.5, which had a good classification performance. Among these 105 GO terms, 2 functions had the AUC >0.7 and were determined as the optimal gene functions including angiogenesis (AUC =0.767) and regulation of immune system process (AUC =0.710). These gene functions appear to have potential for early detection and clinical treatment of OS in the future.

Entities:  

Keywords:  Area under the curve; Differential co-expression network; Gene ontology; Guilt by association; Osteosarcoma; Spearman correlation coefficient

Year:  2017        PMID: 28443230      PMCID: PMC5396855          DOI: 10.1016/j.jbo.2017.04.003

Source DB:  PubMed          Journal:  J Bone Oncol        ISSN: 2212-1366            Impact factor:   4.072


Introduction

Osteosarcoma (OS) is the most commonly diagnosed histological form of primary bone tumor with high morbidity, and it is mainly prevalent in teenagers and young people [1]. Pain is the most common early symptom of OS and can cause fracture of the affected bone. Currently, multiple therapeutic strategies for OS, for example, surgical resection, chemotherapy and radiotherapy, have significantly improved the prognosis of patients with OS [2]. However, the overall survival rates rarely exceed 60–65% [3]. Moreover, a significant portion of OS patients develop metastasis even after curative resection of the primary tumor. There is still a long way to go for the management of OS [4]. Therefore, with an attempt to continue to make progress in the diagnosis and management of OS, identification of sensitive and specific minimally invasive signatures is one of the most important challenges. Genetic aberration has been demonstrated as an important factor that may play significant roles in OS pathogenesis. For instance, Zhu and colleagues [5] have reported that SOX9 is over-expressed in OS tissues compared with controls. Moreover, it is indicated that FOXM1 is up-regulated, which suggests FOXM1 might be an valuable bio-signature for OS [6]. Nevertheless, these genes do not treat the OS efficiently or selectively, because molecules frequently don’t work individually, yet co-operate with the other genes. Additionally, genetic factors can disturb the protein levels, thereby in turn perturb the molecule interactions. Network is characterized by the complicated interactions and the complex interwoven relationships that control cellular functions [7]. Hence, understanding the networks will be beneficial to provide new insights to explore the molecular pathogenesis of OS. The concept of differential co-expression network (DCN) has been employed to the studies of OS, due to statistical confidence of single connections, overlap with protein interaction, and mathematical convenience [8]. In addition, improving our knowledge of gene function in uncharacterized genes is a major task [9]. Remarkably, gene interactions can be applied to deduce functional relationships based on a principle known as “guilt by association” (GBA) [10]. GBA has been indicated to predict gene function in various types of biological networks, for example, gene co-expression network [11]. Thus, in our work, we integrated network-based method with GBA principle (called as network-based gene function inference approach) to further identify the optimal gene functions for OS using gene ontology (GO) data and gene expression profile. To begin with, differentially expressed genes (DEGs) were extracted using linear models for microarray data (LIMMA) package. Then, construction of DCN based on DEGs was implemented, and sub-DCN was identified using Spearman correlation coefficient (SCC). Subsequently, GO annotations for OS were collected according to known confirmed database and DEGs. Ultimately, gene functions were predicted by means of GBA principle based on the area under the curve (AUC) for GO terms, and we determined GO terms with AUC >0.7 as the optimal gene functions for OS. These gene functions appear to have potential for early detection and clinical treatment of OS in the future.

Materials and methods

Gene expression profile and data pre-treatment

OS-related microarray data (accession number E-GEOD-36001) [12] were downloaded from ArrayExpress database based on the platform of A-MEXP-930-Illumina Human-6 v2 Expression BeadChip. There were 19 OS samples and 6 normal samples in E-GEOD-36001. Prior to analysis, we firstly pre-processed the microarray profile of E-GEOD-36001. Specifically, robust multi-array average (RMA) was used to perform background correction [13]. Quartile algorithm was utilized to implement quartile normalization [14], following by perfect match (PM)/mismatch (MM) correction using microarray suite (MAS) 5.0 package [15]. Ultimately, the data on probe levels were converted into gene symbols relying on annotate package [16]. Overall, 19,027 genes were reserved for further exploitation.

Identification of DEGs

The LIMMA package [17] in R language and an empirical Bayes framework were applied in our analysis to achieve DEGs between OS and normal samples. A t-test was conducted, and the multiple test was applied to correct the raw P values using the Benjamini & Hochberg [18] method based on false discovery rate (FDR). DEGs were extracted on the basis of FDR <0.05 and |log fold change (FC)| ≥2.

Generation of DCN

Cytoscape (http://cytoscape.org/) is an open source software which integrates bio-molecular interactions with high-throughput expression data as well as other molecular states into a unified conceptual network [19]. Therefore, we inputted DEGs into the Cytoscape software to visualize the DCN. Furthermore, with the goal of assessing the co-expressed strength of each interaction in the DCN, SCC was used in our study. As we all know, SCC is used to measure the co-expression probability of two variables by assessing the strength of association of two co-expressed variables and it ranges from −1 to 1 inclusive [20], [21]. The SCC absolute value of one interaction was determined as the weight value of the corresponding edge, and the higher the weight value was, the more relevant the interaction was related to the disease. Thus, we selected the edges with weight values higher than 0.8 to construct the sub-DCN. In order to display the sub-DCN more vividly, Cytoscape software was applied.

GO annotation for DEGs

GO annotation has been broadly utilized as functional enrichment studies for large-scale genes [22]. In our study, human GO annotations comprised of 19,003 GO terms covering 18,402 genes were retrieved from the GO consortium (http://geneontology.org/). In an attempt to obtain stable performance, we filtered for the GO annotations on gene size such that each remaining GO term had associated genes >20, and a total of 1313 GO terms were left to be used in our analysis. Next, DEGs identified above were mapped to the 1313 GO annotations. Finally, the GO slim set was obtained, consisting of 123 DEGs and 137 GO terms.

Identifying gene functions based on “GBA” prediction

Gene networks have been widely used in gene function prediction algorithms, many based on complex extensions of the “GBA” principle. As demonstrated here, GBA prediction approach was utilized to predict significant gene functions in OS progression. For GBA method, we used three-fold cross-validation to extract a ranked list scoring genes in DCN as to how they belonged in the known GO terms. The sum of co-expression values between the training set (co-expression) and the candidate gene was divided by the sum of co-expression values between the genes outside the training set and the candidate gene to analyze degree of candidacy. In detail, for each gene i in the DCN, all other neighbored genes of gene i were mapped to each GO category K, and the multifunctionality (MF) score for each gene i within the K-GO term based on the following equation: In this formula, stood for the number of genes in GO term i, and denoted the count of genes outside GO category i. Then, based on support vector machine (SVM), AUC for each GO group K was calculated, and the mean AUC across all GO terms was determined. Thus, the AUC scores were ordered from the highest to the lowest, the ranks of GO terms were ordered oppositely. The AUC of 0.5 stands for classification at chance levels, while the AUC of 1.0 denotes a perfect classification. In the literature about the gene function prediction, the AUC greater than 0.7 are regarded good [23]. In our study, GO terms with AUC >0.7 were identified and regarded as the optimal gene functions.

Results

DCN construction

Before DCN generation, we firstly identified DEGs between OS and normal samples using t-test. Based on FDR <0.05 and |log FC=≥ 2, a total of 123 DEGs were identified. The top 20 DEGs were shown in Table 1. The most significant DEGs were HOXB7, RHPN2, SRGN, FOXF2, and PLVAP.
Table 1

List of the top 20 differentially expressed genes (DEGs).

Genes|log (fold change)|False discovery rate (FDR)
HOXB72.0858981.33E−07
RHPN22.1474066.32E−07
SRGN5.1743413.34E−06
FOXF22.3188694.51E−06
PLVAP3.4042511.98E−05
COX7A12.9887355.41E−05
APOE3.3998526.21E−05
SPINT22.3011936.65E−05
LXN2.5233818.93E−05
TNFRSF1B2.4271941.10E−04
VAMP83.4946501.61E−04
CBS2.8686251.76E−04
HCLS12.6867051.81E−04
PHGDH2.3620452.15E−04
GIMAP72.2581772.74E−04
C1QC2.8850762.76E−04
HBB5.1915612.86E−04
C1QA3.5215702.93E−04
TYROBP3.5118733.17E−04
C1QB3.1848353.17E−04
List of the top 20 differentially expressed genes (DEGs). In order to further explore the biological activities of DEGs, we constructed a DCN using the above-identified 123 DEGs. In this DCN, there were 123 nodes and 7503 interactions. Under the DCN, node degree was more than just an statistic about a gene, and significantly, degree could explain the structures of the network. Then, topological degree for each node was computed through summing up the nodes it linked directly, and the degree distribution for each node was shown in Fig. 1A. We found that the ranges of degrees for a large number of DEGs (about 79%) was from 59 to 74, and genes (TNFRSF1B and HLA-E) had the highest degrees of 78. In addition to degree, the interacted strength was an index to estimate the interactions in the DCN. Thus, we used SCC to assign a weight value to each edge, and the specific information were demonstrated in Fig. 1B where a square stood for an edge in the DCN. The deeper the color was, the larger the weight value was. Significantly, among the interactions, there was a good liner correlation which indicated that the DCN had a good network scale property. Then, we selected the interactions with weight values >0.8 to establish the sub-DCN which was revealed in Fig. 2. In the sub-DCN, there were 46 nodes and 438 interactions. The degree distribution was shown in Table 2. Interestingly, in this sub-DCN, TNFRSF1B also had the highest degree (degree=33).
Fig. 1

Differentially co-expressed network (DCN) construction for osteosarcoma (OS) based on differentially expressed genes (DEGs). A. Degree distribution of genes in the DCN. B. Weights distribution of edges in the DCN. Heatmap clarified weight distribution for each interaction.

Fig. 2

Sub-DCN using the cut-off threshold of weight value >0.8. In the sub-DCN, there were 46 nodes and 438 interactions.

Table 2

Degree distribution of 46 genes in the sub-differential co-expression network (DCN).

GenesDegreeGenesDegree
TNFRSF1B33HBD20
ZFP3632STOM19
HLA-E31DOCK218
ARHGDIB28HLA-DMA17
HLA-DPA128APOE17
VAMP826LAPTM517
FABP424ICAM217
GIMAP724FCGRT16
CD7424ACP515
PLEK23APOD15
COMP23COX7A115
HCLS123HLA-DMB15
PLAC923CD9314
JCHAIN23ATP8B414
SERPINA323FAM46C14
C2orf4022CD1414
SRGN22SPP113
CTSK21HLA-DRA12
CSF1R21FCER1G12
CAT21ALOX5AP12
ADIRF21CKM8
HCST21VWF4
C1QC21
Differentially co-expressed network (DCN) construction for osteosarcoma (OS) based on differentially expressed genes (DEGs). A. Degree distribution of genes in the DCN. B. Weights distribution of edges in the DCN. Heatmap clarified weight distribution for each interaction. Sub-DCN using the cut-off threshold of weight value >0.8. In the sub-DCN, there were 46 nodes and 438 interactions. Degree distribution of 46 genes in the sub-differential co-expression network (DCN).

Identifying gene functions using GBA prediction

In our study, the MF score was first calculated for each gene in a GO term, which influenced the counting membership in a given GO term by how much the gene contributed to the corresponding GO function. The higher the MF score of a gene was, the greater extent to which it should be a good candidate for owing any biological process. Consequently, a single ranked list of genes that best captured candidacy across all functions was equivalent to a list of genes sorted by MF score. Intuitively, if one was forced to choose a single ranking, the gene with the most GO annotations could be predicted as being in all GO categories. This was because if one gene was enriched in 100 GO terms (high MF score), and another was in only one (low MF score), by placing the former gene ahead of the latter gene in a fixed ranking, we made a correct prediction more often across all GO categories. Thus, with the goal of classifying OS patients and normal controls, 3-fold cross-validation was utilized to compute the AUC for GO functions based on MF scores. The AUC distribution for GO terms was shown in Fig. 3. The AUC for the majority of functional terms ranged from 0.4 to 0.7, especially from 0.5 to 0.65. AUC was used as a predictor to select significant GO terms, so, a total of 105 GO terms were identified based on AUC >0.5. Moreover, among these 105 GO terms, 2 GO terms had the AUC >0.7 and were determined as the optimal gene functions including angiogenesis (AUC =0.767) and regulation of immune system process (AUC =0.710).
Fig. 3

Gene function prediction performance using guilt by association (GBA). The histogram of AUCs across all GO terms which can be obtained using a single list constructed from number of coexpression partners.

Gene function prediction performance using guilt by association (GBA). The histogram of AUCs across all GO terms which can be obtained using a single list constructed from number of coexpression partners.

Discussion

Compared to gene focused researches, gene set and gene signature focused functional investigations appear much rewarding in understanding functional insights [24], and integrative functional genomic analysis of tumors promotes the understanding of the molecular mechanisms of cancers [25]. Network modeling on the basis of co-expression pattern analysis has been widely applied in a variety of cancers to understand the biology processes of cancers, and to obtain clinical insights [26]. Recently, a large amount of techniques have been created to extend GBA to indirect connections to predict gene functions [27], [28]. Gene interactions can be applied to deduce functional relationships based on a principle known as GBA [10]. The GBA principle forms the basis for most gene function prediction methods, which typically use relational information (e.g. interactions) to predict new gene membership in gene function categories [29]. However, combination of gene function prediction and network analysis are sparse. Generally speaking, network based-GBA analysis method can make exhaustively examining issues faster and easier than the simple GBA approach. Thus, in this work, we combined GBA principle with DCN-based analysis to further explore both direct and indirect optimal gene functions for OS on the basis of GO information as well as gene expression data. Finally, a total of 105 GO terms were identified based on AUC >0.5, which had a good classification ability. Moreover, 2 out of 105 GO terms had the AUC >0.7 and were determined as the optimal gene functions including angiogenesis and regulation of immune system process. The ability of tumors to progress to more malignant phenotypes depends on tumor microenvironment. Blood capillaries, made up of endothelial cells, are usually static under physiological conditions but are induced to proliferate by direct exposure to angiogenic factors, for instance, vascular endothelial growth factor (VEGF) [30]. The proliferation and migration of endothelial cells play an important role in tumor angiogenesis [31]. As we all know, angiogenesis, development of new blood vessels from preexisting vascular bed, plays important roles during tumor growth and metastasis [32]. Significantly, the concept that malignant neoplasm onset, growth, and invasion rely on angiogenesis is broadly recognized and accepted [33], [34]. Theoretically, cancer cells, like normal cells, also need the delivery of oxygen as well as nutrients through blood vessels to survive and grow. In situ carcinoma, there seems to be a prolonged dormant period during which the cancer is not angiogenic, and is restrained in growth [35]. When sufficient cancer cells have transformed to the angiogenic phenotype from the quiescent phenotype, neovascularization may initiate, and thus rapid tumor growth can proceed. Through the literatures, we found that gene function of ‘angiogenesis’ has already been established to be important for OS. For example, previous studies have implicated that high recurrence rate and metastatic potential of OS are connected with high levels of angiogenesis [36], [37]. Moreover, Wang et al. [38] have used differential co-expression network to find that angiogenesis is closely related to OS progression and metastasis. Most importantly, suppression of cancer angiogenesis has been regarded as an attractive strategy for tumor therapy [39]. Thus, anti-angiogenic therapy might be a consideration for OS patients, with the majority of OS showing high vascularization [40]. During the tumor occurrence and metastasis, cancer cells can modulate the immune system of the host to find strategies that enable them to survive from the immune surveillance [41]. Cancer immunoediting is one of the two major strategies to get away from immune surveillance [42]. Innate and adaptive immunity appear to contribute to cancer immunoediting [43]. Previously, lymphocytes and IFNγ have indicated to prevent tumor immunoediting, thereby preventing the selection of less immunogenic tumor cells [44]. Indeed, immune cells (for example, macrophages cells) release soluble agents like chemokines and cytokines promoting the migration and infiltration of leukocytes that exert important functions in tumor development [45]. OS is frequently infiltrated by immune cells including macrophages and T cells [46]. Moreover, macrophage migration inhibitory factor is an proinflammatory cytokine which exerts an crucial function in the immune system [47]. Macrophage migration inhibitory factor contributes to cell proliferation, survival, and tumor-related angiogenesis [48], [49]. Moreover, Han et al. [50] have suggested that macrophage migration inhibitory factor can serve as a prognostic marker and a potential therapeutic target for OS. Interestingly, a link between p53 and IFN system was recently discovered in regulating tumor suppression and immunity [51]. Abnormalities of p53 gene in OS occur with a high incidences approaching 50% of all cases [52]. Currently, growing evidence implicates that the immune system is a fundamental player in cancer and a key determinant of prognosis and response to therapy [53], [54]. While, understanding the crosstalk between OS cells and the immune system, and how they drive tumorigenesis is still in its infancy. Hence, it is urgently needed to develop novel drugs that would potentiate the immune system in OS patients to act against this disease by means of immunomodulatory methods. The current study had several limitations. First, there were limited clinical samples which might lead to biased estimates. Second, our analysis was conducted using bioinformatics methods but the conclusions have not been verified by means of any lab technique. Last but not least, we did not compare the obtained findings using other approaches. Despite these limitations, our findings had important implications for the molecular mechanisms underlying OS, yet further experimental study is still needed to validate our study. In a nutshell, using network-based method with GBA based on GO and microarray profile of OS, we extracted 2 optimal gene functions during OS progression including angiogenesis and regulation of immune system process which might contribute to the successful identification of therapeutic targets for OS. Further studies might shed novel insights into the role of the gene functions in the pathophysiology of OS.

Conflicts of Interest

We declare that there are no conflicts of interest.
  51 in total

1.  Gene coexpression networks in human brain identify epigenetic modifications in alcohol dependence.

Authors:  Igor Ponomarev; Shi Wang; Lingling Zhang; R Adron Harris; R Dayne Mayfield
Journal:  J Neurosci       Date:  2012-02-01       Impact factor: 6.167

2.  Genetic amplification of the vascular endothelial growth factor (VEGF) pathway genes, including VEGFA, in human osteosarcoma.

Authors:  Jilong Yang; Da Yang; Yan Sun; Baocun Sun; Guowen Wang; Jonathan C Trent; Dejka M Araujo; Kexin Chen; Wei Zhang
Journal:  Cancer       Date:  2011-04-14       Impact factor: 6.860

Review 3.  Interferons, immunity and cancer immunoediting.

Authors:  Gavin P Dunn; Catherine M Koebel; Robert D Schreiber
Journal:  Nat Rev Immunol       Date:  2006-11       Impact factor: 53.106

4.  Forkhead box protein M1 predicts outcome in human osteosarcoma.

Authors:  Chong-Lang Fan; Jian Jiang; Hu-Cheng Liu; Dong Yang
Journal:  Int J Clin Exp Med       Date:  2015-09-15

5.  miRNA signatures associate with pathogenesis and progression of osteosarcoma.

Authors:  Kevin B Jones; Zaidoun Salah; Sara Del Mare; Marco Galasso; Eugenio Gaudio; Gerard J Nuovo; Francesca Lovat; Kimberly LeBlanc; Jeff Palatini; R Lor Randall; Stefano Volinia; Gary S Stein; Carlo M Croce; Jane B Lian; Rami I Aqeilan
Journal:  Cancer Res       Date:  2012-02-20       Impact factor: 12.701

Review 6.  Angiogenesis in cancer, vascular, rheumatoid and other disease.

Authors:  J Folkman
Journal:  Nat Med       Date:  1995-01       Impact factor: 53.440

7.  Spatio-temporal analysis of type 2 diabetes mellitus based on differential expression networks.

Authors:  Shao-Yan Sun; Zhi-Ping Liu; Tao Zeng; Yong Wang; Luonan Chen
Journal:  Sci Rep       Date:  2013       Impact factor: 4.379

8.  A critical assessment of Mus musculus gene function prediction using integrated genomic evidence.

Authors:  Lourdes Peña-Castillo; Murat Tasan; Chad L Myers; Hyunju Lee; Trupti Joshi; Chao Zhang; Yuanfang Guan; Michele Leone; Andrea Pagnani; Wan Kyu Kim; Chase Krumpelman; Weidong Tian; Guillaume Obozinski; Yanjun Qi; Sara Mostafavi; Guan Ning Lin; Gabriel F Berriz; Francis D Gibbons; Gert Lanckriet; Jian Qiu; Charles Grant; Zafer Barutcuoglu; David P Hill; David Warde-Farley; Chris Grouios; Debajyoti Ray; Judith A Blake; Minghua Deng; Michael I Jordan; William S Noble; Quaid Morris; Judith Klein-Seetharaman; Ziv Bar-Joseph; Ting Chen; Fengzhu Sun; Olga G Troyanskaya; Edward M Marcotte; Dong Xu; Timothy R Hughes; Frederick P Roth
Journal:  Genome Biol       Date:  2008-06-27       Impact factor: 13.583

9.  Coexpression analysis of large cancer datasets provides insight into the cellular phenotypes of the tumour microenvironment.

Authors:  Tamasin N Doig; David A Hume; Thanasis Theocharidis; John R Goodlad; Christopher D Gregory; Tom C Freeman
Journal:  BMC Genomics       Date:  2013-07-11       Impact factor: 3.969

10.  Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways.

Authors:  Shisong Ma; Smit Shah; Hans J Bohnert; Michael Snyder; Savithramma P Dinesh-Kumar
Journal:  PLoS Genet       Date:  2013-10-03       Impact factor: 5.917

View more
  5 in total

1.  Identification of prognostic biomarkers in colorectal cancer using a long non-coding RNA-mediated competitive endogenous RNA network.

Authors:  Minjie He; Yan Lin; Yuzhen Xu
Journal:  Oncol Lett       Date:  2019-01-15       Impact factor: 2.967

2.  Key Experimental Factors of Machine Learning-Based Identification of Surgery Cancellations.

Authors:  Fengyi Zhang; Xinyuan Cui; Renrong Gong; Chuan Zhang; Zhigao Liao
Journal:  J Healthc Eng       Date:  2021-02-20       Impact factor: 2.682

3.  Homology-based reconstruction of regulatory networks for bacterial and archaeal genomes.

Authors:  Luis Romero; Sebastian Contreras-Riquelme; Manuel Lira; Alberto J M Martin; Ernesto Perez-Rueda
Journal:  Front Microbiol       Date:  2022-07-19       Impact factor: 6.064

4.  Genetic Variant of NFIB is Associated With the Metastasis of Osteosarcoma in Chinese Population.

Authors:  Leilei Xu; Jun Ni; Yongjie Wang; Yang Dong; Shoufeng Wang
Journal:  Technol Cancer Res Treat       Date:  2019-01-01

5.  Resistin enhances angiogenesis in osteosarcoma via the MAPK signaling pathway.

Authors:  Hsiao-Chi Tsai; Shih-Ping Cheng; Chien-Kuo Han; Yuan-Li Huang; Shih-Wei Wang; Jie-Jen Lee; Cheng-Ta Lai; Yi-Chin Fong; Chih-Hsin Tang
Journal:  Aging (Albany NY)       Date:  2019-11-13       Impact factor: 5.682

  5 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.