
A risk score model for the prediction of osteosarcoma metastasis.

Siqi Dong1, Hongjun Huo2, Yu Mao3, Xin Li3, Lixin Dong3.   

Abstract

Osteosarcoma is the most common primary solid malignancy of the bone, and its high mortality usually correlates with early metastasis. In this study, we developed a risk score model to help predict metastasis at the time of diagnosis. We downloaded and mined four expression profile datasets associated with osteosarcoma metastasis from the Gene Expression Omnibus. After data normalization, we performed LASSO logistic regression analysis together with 10-fold cross validation using the GSE21257 dataset. A combination of eight genes (RAB1, CLEC3B, FCGBP, RNASE3, MDL1, ALOX5AP, VMO1 and ALPK3) was identified as being associated with osteosarcoma metastasis. These genes were put into a gene risk score model, and the prediction efficiency of the model was then validated using three independent datasets (GSE33383, GSE66673 and GSE49003) by plotting receiver operating characteristic curves. The expression levels of the eight genes in all datasets were shown as heatmaps, and gene ontology annotation and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis were performed. These eight genes play a role in cancer-related biological processes, such as apoptosis and biosynthetic processes. Our results may aid in elucidating the possible mechanisms of osteosarcoma metastasis, and may help to facilitate the individual management of patients with osteosarcoma after treatment.


Keywords:  metastasis; bioinformatic analysis; osteosarcoma; risk score model


Year:  2019        PMID: 30868060      PMCID: PMC6396159          DOI: 10.1002/2211-5463.12592

Source DB:  PubMed          Journal:  FEBS Open Bio        ISSN: 2211-5463            Impact factor:   2.693


Abbreviations:  AUC, area under receiver operating characteristic curve; GEO, Gene Expression Omnibus; GO, gene ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; LASSO, least absolute shrinkage and selection operator; ROC, receiver operating characteristic

As the most common primary malignant bone tumor in childhood and adolescence, osteosarcoma is highly aggressive and metastasizes systemically at an early stage 1, 2. Systemic metastasis, especially pulmonary metastasis, is still the most prominent reason for osteosarcoma‐caused death, as over 90% of patients with osteosarcoma die from pulmonary metastases 3, 4. Despite great advances in the treatment of osteosarcoma, only 11–30% of patients suffering from osteosarcoma metastasis survive after the combination of surgical resection and chemotherapy 5, 6. Hence, it is of great importance to explore novel biomarkers and therapeutic targets for osteosarcoma metastasis prediction.

In recent years, developments in molecular biology have provided new insights into potential diagnostic and therapeutic biomarkers for osteosarcoma. Previous studies have demonstrated that prometastasis genes such as MYC 7, 8 and RAS 9 facilitate osteosarcoma metastasis, whereas metastasis‐resistant genes including nm23 10, p16 11 and KiSS‐1 metastasis‐suppressor 12 inhibit the metastasis process in osteosarcoma. Furthermore, microarray technology has been widely used for screening metastasis‐related genes in osteosarcoma 13, 14. In addition, the recent release of gene expression microarray profile data and clinical information in the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas has provided large amounts of microarray data that can be applied to identify highly specific and sensitive markers. Gene expression profiling based on these datasets has been utilized to identify critical genes associated with metastasis 15, 16. For example, differentially expressed pathways related to the metastasis of osteosarcoma were identified by performing bioinformatics analysis based on GEO data 14. A series of osteosarcoma metastasis‐associated genes was also identified by performing weighted gene coexpression network analysis 13. Moreover, gene expression signatures have attracted great attention and have been widely constructed to predict the metastasis and prognosis of different cancers.

In order to help predict metastasis at the time of diagnosis, we downloaded and mined four gene expression microarray datasets from GEO, which were used as a training set or validation sets. After normalization, we fitted a least absolute shrinkage and selection operator (LASSO) logistic regression model along with 10‐fold cross validation to construct a metastasis prediction score model. Receiver operating characteristic (ROC) curves were plotted to validate the prediction efficiency of the model. Finally, metastasis‐associated genes were put into gene ontology (GO) biological process enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) signaling pathway analysis.

Materials and methods

Gene expression profiles and data pre‐processing

Gene expression datasets were retrieved from GEO using the key words ‘osteosarcoma’ and ‘metastasis’. Datasets were eligible if they described both gene expression data and information about metastasis. Four gene expression datasets that met these criteria, namely GSE21257 (total number: 53, metastases: 14), GSE33383 (total number: 53, metastases: 34), GSE66673 (total number: 24, metastases: 12) and GSE49003 (total number: 12, metastases: 6), were retrieved with Affymetrix platforms. The metastasis information and the samples used for microarray analysis of these patients were collected at the time of diagnosis. Background correction and normalization were then performed using R software 17 and Bioconductor 18. In order to reduce non‐biological variability across arrays, the gene expression profiles in the different datasets were quantile normalized separately. Quantile normalization forces the distributions of the samples to be the same on the basis of the quantiles of the samples, by replacing each point of a sample with the mean of the corresponding quantile 19, 20. Briefly, the method first arranges the logarithmic transformed microarray data into a G × N matrix X, where G and N are the total numbers of genes and arrays, respectively; sorts each column of X to give X_sort; takes the mean across each row of X_sort and assigns this mean to each element in the row to get X'_sort; and finally obtains the normalized version X_norm of X by rearranging each column of X'_sort to have the same ordering as in the original X 21. Subsequently, probes were mapped to gene symbols. Empty probes were discarded according to the annotation platform of each expression profile. Average expression values were calculated for duplicated samples, and missing values were estimated using weighted K‐nearest neighbors 22.
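The quantile normalization steps described above can be sketched in a few lines. This is a minimal pure-Python illustration, not the R/Bioconductor code used in the study; the function name quantile_normalize and the toy matrix are assumptions made for the example, and tied values are ranked by row order rather than averaged.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a G x N matrix (list of G gene rows, N array columns)."""
    n_genes = len(matrix)
    n_arrays = len(matrix[0])

    # Extract the columns of X, one per array.
    columns = [[matrix[g][a] for g in range(n_genes)] for a in range(n_arrays)]

    # X_sort: every column sorted in ascending order.
    sorted_columns = [sorted(col) for col in columns]

    # Mean across each row of X_sort: the shared reference distribution.
    rank_means = [
        sum(col[r] for col in sorted_columns) / n_arrays for r in range(n_genes)
    ]

    # X_norm: place the rank means back in each column's original ordering.
    normalized = [[0.0] * n_arrays for _ in range(n_genes)]
    for a, col in enumerate(columns):
        order = sorted(range(n_genes), key=lambda g: col[g])
        for rank, g in enumerate(order):
            normalized[g][a] = rank_means[rank]
    return normalized


# Toy log-expression matrix: 4 genes (rows) x 3 arrays (columns), invented values.
toy = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 6.0, 6.0],
    [4.0, 2.0, 8.0],
]
for row in quantile_normalize(toy):
    print(row)
```

After normalization every column contains the same set of values, so the arrays share one distribution while each gene keeps its within-array rank.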

Construction of the metastasis risk score model

A logistic regression model with the LASSO method for variable selection and shrinkage was applied to build a metastasis risk prediction model by using the R package glmnet (https://CRAN.R-project.org/package=glmnet) 23. The penalty regularization parameter λ was determined via the cross‐validation routine cv.glmnet, with the number of folds set to 10, before running the main algorithm. The λ value was finalized by using lambda.min, which is the value of lambda giving the minimum mean cross‐validated error 23, 24, 25. A series of genes, together with their corresponding coefficients, was identified from the GSE21257 training set and used to construct a metastasis risk score model. Based on the model, the risk score for each individual was calculated.
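To make the choice of λ concrete, the sketch below shows how the two values reported by cv.glmnet, lambda.min and lambda.1se, follow from a cross-validation error curve. It is a pure-Python illustration rather than a call to glmnet; the function name select_lambdas and all numbers in the example are invented for demonstration and are not results from the study.

```python
def select_lambdas(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from per-lambda cross-validation summaries.

    lambdas: candidate penalty values
    cv_mean: mean cross-validated error for each candidate
    cv_se:   standard error of that mean for each candidate
    """
    # lambda.min: the candidate with the smallest mean cross-validated error.
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    lambda_min = lambdas[best]

    # lambda.1se: the largest candidate whose error is within one
    # standard error of the minimum error.
    threshold = cv_mean[best] + cv_se[best]
    lambda_1se = max(
        lam for lam, err in zip(lambdas, cv_mean) if err <= threshold
    )
    return lambda_min, lambda_1se


# Hypothetical error curve over four candidate penalties.
example_lambdas = [0.20, 0.10, 0.05, 0.01]
example_mean = [1.30, 1.12, 1.05, 1.10]
example_se = [0.08, 0.08, 0.08, 0.08]
print(select_lambdas(example_lambdas, example_mean, example_se))
```

A larger λ penalizes more strongly, so lambda.1se generally yields a sparser model than lambda.min at the cost of a slightly higher cross-validated error.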

Validation of the risk score model

In order to confirm the robustness and accuracy of the risk score model, the remaining three datasets (GSE33383, GSE66673 and GSE49003) were used as validation sets. The classification effect was comprehensively evaluated in terms of area under the ROC curve (AUC).
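As an illustration of the evaluation metric, the AUC can be computed directly as the probability that a randomly chosen metastatic sample receives a higher risk score than a randomly chosen non-metastatic one. The following pure-Python sketch assumes labels coded as 1 (metastasis) and 0 (no metastasis); the function name roc_auc and the example values are hypothetical and are not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for metastatic, 0 for non-metastatic samples
    scores: risk scores, where higher means more likely metastatic
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")

    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and scores for four samples.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups.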

Function enrichment analysis

Genes from the risk score model were subjected to GO biological function and KEGG enrichment analysis to elucidate the biological implications of the genes in the signature. Cytoscape software (National Institute of General Medical Sciences, Bethesda, MD, USA) combined with the ClueGO and CluePedia plugins was used to perform the enrichment analysis.

Results

Data preprocessing and risk score model construction

Based on the expression profile of GSE21257, we used LASSO logistic regression combined with 10‐fold cross validation to build a classifier to predict metastasis in patients with osteosarcoma (Fig. 1). A combination of eight genes was selected as the best predictor of metastasis in the training cohort: RAB1, CLEC3B, FCGBP, RNASE3, MDL1, ALOX5AP, VMO1 and ALPK3. A formula was derived to calculate a metastasis risk score for each patient from the expression levels of the eight genes: risk score = RAB1 × −0.286 + CLEC3B × −0.073 + FCGBP × −0.061 + RNASE3 × −0.548 + MDL1 × −0.139 + ALOX5AP × −0.017 + VMO1 × −0.002 + ALPK3 × 0.092.
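The formula above is a weighted sum of expression values and can be evaluated as follows. This is a minimal sketch: the coefficients are those reported in the text, while the function name risk_score and the example expression values are hypothetical. The input is assumed to be normalized expression on the same scale as the training data.

```python
# Coefficients of the eight-gene signature as reported in the text.
COEFFICIENTS = {
    "RAB1": -0.286,
    "CLEC3B": -0.073,
    "FCGBP": -0.061,
    "RNASE3": -0.548,
    "MDL1": -0.139,
    "ALOX5AP": -0.017,
    "VMO1": -0.002,
    "ALPK3": 0.092,
}


def risk_score(expression):
    """Weighted sum of the eight genes' expression levels for one patient.

    expression: dict mapping gene symbol to normalized expression value.
    """
    missing = [gene for gene in COEFFICIENTS if gene not in expression]
    if missing:
        raise KeyError("missing expression values for: " + ", ".join(missing))
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


# Hypothetical patient with invented expression values.
patient = {
    "RAB1": 7.2, "CLEC3B": 6.1, "FCGBP": 5.4, "RNASE3": 4.8,
    "MDL1": 5.0, "ALOX5AP": 8.3, "VMO1": 4.1, "ALPK3": 6.7,
}
print(risk_score(patient))
```

Because seven coefficients are negative and only that of ALPK3 is positive, lower expression of the first seven genes and higher expression of ALPK3 both raise the score.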
Figure 1

Risk score model construction using LASSO logistic regression analysis along with 10‐fold cross validation. (A) Partial likelihood deviance was plotted versus log(Lambda). The vertical dotted lines indicate the lambda value with the minimum error and the largest lambda value where the deviance is within one SE of the minimum. (B) LASSO coefficient profiles of the genes associated with the metastasis of osteosarcoma.


Expression profile of key genes in different datasets

The expression levels of the eight genes in the signature were plotted as heatmaps (Fig. 2). The expression levels of RAB1, CLEC3B, FCGBP, RNASE3, MDL1, ALOX5AP and VMO1 were relatively lower in patients with metastatic osteosarcoma than in those with non‐metastatic osteosarcoma. In contrast, patients with metastatic osteosarcoma tended to have a higher expression level of ALPK3 than those with non‐metastatic osteosarcoma. Similar results were observed not only in the training set (GSE21257) but also in the other three datasets (GSE33383, GSE66673 and GSE49003).
Figure 2

The expression levels of the eight genes in all four datasets. Heatmaps were plotted to reveal the expression levels of the eight genes in the GSE21257 (A), GSE33383 (B), GSE66673 (C) and GSE49003 (D) datasets.


Stability and validity verification

The GSE21257, GSE33383, GSE66673 and GSE49003 datasets were all used to verify the robustness and transferability of the risk score model generated by the LASSO logistic regression. ROC curves were plotted to assess the prediction accuracy. According to the results in Fig. 3, the risk score model can distinguish metastatic individuals from non‐metastatic individuals with high accuracy (AUC = 0.861, P < 0.01). Moreover, the three independent cohorts served as external validation cohorts, and high accuracy was also demonstrated in these cohorts, which suggests that the risk score model is stable.
Figure 3

Prediction efficiency of the gene risk score was evaluated using ROC curves. The ROC curves are shown for the risk score model in the GSE21257 (A), GSE33383 (B), GSE66673 (C) and GSE49003 (D) datasets.


Functional enrichment analysis of genes from the risk score model

In order to identify the biological pathways and processes correlated with the eight genes, GO biological process enrichment and KEGG signaling pathway analysis were performed. According to the results, the eight genes play important roles in cancer‐related biological processes such as cell apoptosis and the leukotriene biosynthetic process (Fig. 4).
Figure 4

Functional enrichment analysis depicting the biological pathways and processes associated with genes in the risk score. Results of GO biological process enrichment (A) and KEGG signaling pathway analysis (B) are shown.


Discussion

Metastasis is the main factor that affects the prognosis of osteosarcoma, and several factors such as differential gene expression are involved in this process. Early diagnosis or prediction of metastasis is critical, considering the great difference in survival rate between patients with metastatic osteosarcoma (10–20%) and those with non‐metastatic osteosarcoma (50–78%) 26, 27. Hence, construction of a prediction or early diagnosis model would benefit treatment and prognosis evaluation.

In the present study, we downloaded and mined four gene expression datasets from GEO and constructed a risk score model. As the gene expression profiles came from four datasets, the data were first normalized. Normalization aims to make the samples more comparable and the downstream analysis more reliable. After normalization, we fitted a logistic regression model and used LASSO for variable selection and shrinkage, which is a well‐established method for selecting the most predictive markers from high‐throughput data. The LASSO logistic regression model allows integration of multiple biomarkers into one tool, providing more accurate prediction of disease progression than single biomarkers alone. The regularization parameter was chosen as the largest value where the error was within 1 standard error of the minimum as determined by 10‐fold cross validation 23, 25, 28. Because the microarray expression profiles used in the present study are high‐throughput biological data, the common ‘curse‐of‐dimensionality’ problem (a small sample size combined with a very large number of genes) was taken into consideration. LASSO manages high‐dimensional regression variables with no prior feature selection step by shrinking all regression coefficients toward zero and thus forcing many of them to be exactly zero 29. Therefore, the LASSO model can be applied to solve the ‘curse‐of‐dimensionality’ problem. Consequently, a series of variables along with their regression coefficients was selected, and a formula was constructed to act as a risk score model for the prediction of osteosarcoma metastasis.

To further elucidate the underlying mechanism of metastasis in osteosarcoma, genes in the risk score model were subjected to annotation and functional enrichment analysis. These genes were found to be involved in several cancer‐related activities such as cell apoptosis and the leukotriene biosynthetic process. Previous studies have shown that RAB1 plays a role in cervical squamous carcinoma 30. CLEC3B is down‐regulated and inhibits proliferation in clear cell renal cell carcinoma 31. The participation of FCGBP in gastric tumorigenesis and progression has also been revealed 32, and FCGBP has been validated as a key regulator of the epithelial–mesenchymal transition process that contributes to the metastasis and prognosis of gallbladder cancer 33. Moreover, the expression levels of ALOX5AP are significantly correlated with the survival time of patients with esophageal squamous cell carcinoma 34. Whether these genes also play a role in osteosarcoma and contribute to its progression deserves further exploration. Our study identified some core genes in the metastasis of osteosarcoma and constructed a risk score model, which may facilitate further exploration of the mechanisms.

However, there are some limitations to our study. First, the numbers of patients in all four GEO datasets are relatively small. More patients and clinical information should be collected to further validate the stability of the model. Second, some genes might have been excluded because of our rigorous screening criteria. Third, the functional annotation of target genes was based on bioinformatics analysis alone. More experiments will be needed to validate, or even correct, the KEGG pathway analysis and GO enrichment results.

In conclusion, we constructed an eight‐gene risk score by performing logistic regression analysis along with 10‐fold cross validation based on datasets downloaded from GEO. The stability and accuracy were further assessed in three independent cohorts. Further analysis suggested that genes from the risk score participate in several cancer‐related biological processes. This risk score model provides new insight into the prediction of osteosarcoma metastasis and has potential prognostic and therapeutic implications for osteosarcoma.

Conflict of interest

The authors declare no conflict of interest.

Author contributions

SD contributed to the study design, data profiling and manuscript draft. YM downloaded and analyzed data. HH, XL and LD performed language editing. The final manuscript was reviewed and approved by all the authors.
  32 in total

1.  Treatment of osteosarcoma at first recurrence after contemporary therapy: the Memorial Sloan-Kettering Cancer Center experience.

Authors:  Alexander J Chou; Pamela R Merola; Leonard H Wexler; Richard G Gorlick; Yatin M Vyas; John H Healey; Michael P LaQuaglia; Andrew G Huvos; Paul A Meyers
Journal:  Cancer       Date:  2005-11-15       Impact factor: 6.860

2.  Osteogenic sarcoma of the extremity with detectable lung metastases at presentation. Results of treatment of 23 patients with chemotherapy followed by simultaneous resection of primary and metastatic lesions.

Authors:  G Bacci; M Mercuri; A Briccoli; S Ferrari; F Bertoni; D Donati; C Monti; A Zanoni; C Forni; M Manfrini
Journal:  Cancer       Date:  1997-01-15       Impact factor: 6.860

3.  Prognostic miRNA classifier in early-stage mycosis fungoides: development and validation in a Danish nationwide study.

Authors:  Lise M Lindahl; Søren Besenbacher; Anne H Rittig; Pamela Celis; Andreas Willerslev-Olsen; Lise M R Gjerdrum; Thorbjørn Krejsgaard; Claus Johansen; Thomas Litman; Anders Woetmann; Niels Odum; Lars Iversen
Journal:  Blood       Date:  2017-12-05       Impact factor: 22.113

4.  KAI1, a metastasis suppressor gene for prostate cancer on human chromosome 11p11.2.

Authors:  J T Dong; P W Lamb; C W Rinker-Schaeffer; J Vukanovic; T Ichikawa; J T Isaacs; J C Barrett
Journal:  Science       Date:  1995-05-12       Impact factor: 47.728

Review 5.  Osteosarcoma (osteogenic sarcoma).

Authors:  Piero Picci
Journal:  Orphanet J Rare Dis       Date:  2007-01-23       Impact factor: 4.123

6.  A 25-gene classifier predicts overall survival in resectable pancreatic cancer.

Authors:  David J Birnbaum; Pascal Finetti; Alexia Lopresti; Marine Gilabert; Flora Poizat; Jean-Luc Raoul; Jean-Robert Delpero; Vincent Moutardier; Daniel Birnbaum; Emilie Mamessier; François Bertucci
Journal:  BMC Med       Date:  2017-09-20       Impact factor: 8.775

7.  Identification of key miRNAs and genes associated with stomach adenocarcinoma from The Cancer Genome Atlas database.

Authors:  Jixi Liu; Fang Liu; Yanfen Shi; Huangying Tan; Lei Zhou
Journal:  FEBS Open Bio       Date:  2018-01-02       Impact factor: 2.693

8.  Genes regulated in metastatic osteosarcoma: evaluation by microarray analysis in four human and two mouse cell line systems.

Authors:  Roman Muff; Ram Mohan Ram Kumar; Sander M Botter; Walter Born; Bruno Fuchs
Journal:  Sarcoma       Date:  2012-11-13

9.  A systematic evaluation of normalization methods in quantitative label-free proteomics.

Authors:  Tommi Välikangas; Tomi Suomi; Laura L Elo
Journal:  Brief Bioinform       Date:  2018-01-01       Impact factor: 11.622

10.  Super enhancer inhibitors suppress MYC driven transcriptional amplification and tumor progression in osteosarcoma.

Authors:  Demeng Chen; Zhiqiang Zhao; Zixin Huang; Du-Chu Chen; Xin-Xing Zhu; Yi-Ze Wang; Ya-Wei Yan; Shaojun Tang; Subha Madhavan; Weiyi Ni; Zhan-Peng Huang; Wen Li; Weidong Ji; Huangxuan Shen; Shuibin Lin; Yi-Zhou Jiang
Journal:  Bone Res       Date:  2018-04-04       Impact factor: 13.567

