Chenggang Yang1, Jing Ren2, Bangling Li2, Chuandi Jin2, Cui Ma1, Cheng Cheng2, Yaolan Sun2, Xiaofeng Shi1. 1. Department of Research and Development, Gu'an Bojian Bio‑Technology Co., Ltd., Langfang, Hebei 065000, P.R. China. 2. Department of Big Data, Beijing Medintell Bioinformatic Technology Co., Ltd., Beijing 100081, P.R. China.
Abstract
Postmenopausal osteoporosis (PMOP) is a major public health concern worldwide. The present study aimed to provide evidence to assist in the development of specific novel biomarkers for PMOP. Differentially expressed genes (DEGs) were identified between PMOP and normal controls by integrated microarray analyses of the Gene Expression Omnibus (GEO) database, and the optimal diagnostic gene biomarkers for PMOP were identified with LASSO and Boruta algorithms. Classification models, including support vector machine (SVM), decision tree and random forests models, were established to test the diagnostic value of identified gene biomarkers for PMOP. Functional annotations and protein‑protein interaction (PPI) network constructions were also conducted. Integrated microarray analyses (GSE56815, GSE13850 and GSE7429) of the GEO database were employed, and 1,320 DEGs were identified between PMOP and normal controls. An 11‑gene combination was also identified as an optimal biomarker for PMOP by feature selection and classification methods using SVM, decision tree and random forest models. This combination was comprised of the following genes: Dehydrogenase E1 and transketolase domain containing 1 (DHTKD1), osteoclast stimulating factor 1 (OSTF1), G protein‑coupled receptor 116 (GPR116), BCL2 interacting killer, adrenoceptor β1 (ADRB1), neogenin 1 (NEO1), RB binding protein 4 (RBBP4), GPR87, cylicin 2, EF‑hand calcium binding domain 1 and DEAH‑box helicase 35. RBBP4 (degree=12) was revealed to be the hub gene of this PMOP‑specific PPI network. Among these 11 genes, three genes (OSTF1, ADRB1 and NEO1) were speculated to serve roles in PMOP by regulating the balance between bone formation and bone resorption, while two genes (GPR87 and GPR116) may be involved in PMOP by regulating the nuclear factor‑κB signaling pathway. Furthermore, DHTKD1 and RBBP4 may be involved in PMOP by regulating mitochondrial dysfunction and interacting with ESR1, respectively. In conclusion, the findings of the current study provided an insight for exploring the mechanism and developing novel biomarkers for PMOP. Further studies are required to test the diagnostic value for PMOP prior to use in a clinical setting.
Postmenopausal osteoporosis (PMOP) is a major public health concern worldwide. The present study aimed to provide evidence to assist in the development of specific novel biomarkers for PMOP. Differentially expressed genes (DEGs) were identified between PMOP and normal controls by integrated microarray analyses of the Gene Expression Omnibus (GEO) database, and the optimal diagnostic gene biomarkers for PMOP were identified with LASSO and Boruta algorithms. Classification models, including support vector machine (SVM), decision tree and random forests models, were established to test the diagnostic value of identified gene biomarkers for PMOP. Functional annotations and protein‑protein interaction (PPI) network constructions were also conducted. Integrated microarray analyses (GSE56815, GSE13850 and GSE7429) of the GEO database were employed, and 1,320 DEGs were identified between PMOP and normal controls. An 11‑gene combination was also identified as an optimal biomarker for PMOP by feature selection and classification methods using SVM, decision tree and random forest models. This combination was comprised of the following genes: Dehydrogenase E1 and transketolase domain containing 1 (DHTKD1), osteoclast stimulating factor 1 (OSTF1), G protein‑coupled receptor 116 (GPR116), BCL2 interacting killer, adrenoceptor β1 (ADRB1), neogenin 1 (NEO1), RB binding protein 4 (RBBP4), GPR87, cylicin 2, EF‑hand calcium binding domain 1 and DEAH‑box helicase 35. RBBP4 (degree=12) was revealed to be the hub gene of this PMOP‑specific PPI network. Among these 11 genes, three genes (OSTF1, ADRB1 and NEO1) were speculated to serve roles in PMOP by regulating the balance between bone formation and bone resorption, while two genes (GPR87 and GPR116) may be involved in PMOP by regulating the nuclear factor‑κB signaling pathway. Furthermore, DHTKD1 and RBBP4 may be involved in PMOP by regulating mitochondrial dysfunction and interacting with ESR1, respectively. In conclusion, the findings of the current study provided an insight for exploring the mechanism and developing novel biomarkers for PMOP. Further studies are required to test the diagnostic value for PMOP prior to use in a clinical setting.
Osteoporosis is a bone metabolic disorder, characterized by low bone mineral density (BMD) and microarchitectural deterioration with increased bone fragility and subsequent susceptibility to fractures (1). It has been reported that osteoporosis is induced by an imbalance between bone resorption by osteoclasts and bone deposition by osteoblasts (2). Postmenopausal osteoporosis (PMOP) is a major public health concern worldwide that frequently presents in postmenopausal women due to the estrogen deficiency and continuous calcium loss that occurs with aging (3). A proactive approach that identifies patients at high risk of developing PMOP is recommended to prevent bone loss (4).With the advancement of high-throughput technologies, gene microarray analysis has become an effective method for identifying differentially expressed genes (DEGs) and, therefore, potential biomarkers in various diseases. Multiple studies (5–7) have utilized gene microarray analysis to identify key genes in the pathogenesis of PMOP. Integrated multiple gene microarray analysis may contribute to the identification of more accurate gene biomarkers.The present study aimed to develop accurate biomarkers and provide clues for exploring the underlying mechanism of PMOP. By integrating multiple microarray analysis in this present study, DEGs between PMOP patients and normal controls were identified. Based on these DEGs, the optimal gene combination with the greatest diagnostic value for PMOP was determined. Functional annotation and protein-protein interaction (PPI) network constructions were also performed to explore the biological functions of DEGs. These findings will help elucidate the mechanism underlying PMOP development and uncover novel diagnostic biomarkers.
Materials and methods
Microarray expression profiling
Microarray datasets of PMOP and normal controls were downloaded from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo). All datasets that contained whole-genome mRNA expression profiles between PMOP patient and control blood samples were enrolled in the current study. The datasets were scale normalized.
Identification of DEGs between PMOP patients and normal controls
MetaMA is an R package-implementing meta-analysis approach for microarray data (8). Data from multiple microarray analyses were combined by metaMA (inverse normal method), and individual P-values were obtained. In the integrated analysis performed in the present study, DEGs between PMOP patients and normal controls were identified at a P-value of <0.05. Hierarchical clustering analyses of mRNAs were conducted with ‘pheatmap’ package in R (version 3.3.3; www.r-project.org).
Identification of optimal diagnostic gene biomarkers for PMOP
The LASSO algorithm was applied with the glmnet package (https://cran.r-project.org/web/packages/glmnet/) in order to reduce the dimensions of the data (9). The Boruta algorithm (https://cran.r-project.org/web/packages/Boruta/) employs a wrapper approach, built around a random forest classifier (10). This algorithm is used to compare the relevance of the features to those of the random probes (11). The scale-standardized datasets were merged, the DEGs between PMOP patients and normal controls were retained for feature selection, and gene biomarkers for PMOP were identified with the LASSO and Boruta algorithms. Furthermore, the optimal gene biomarkers for PMOP were identified by overlapping biomarkers derived from these two algorithms. Hierarchical clustering analysis of these shared gene biomarkers, obtained by LASSO and Boruta algorithms, was conducted with the R package ‘pheatmap’ (R version 3.3.3).Based on these optimal gene biomarkers, several classification models, including support vector machine (SVM), decision tree and random forest models, were established to further identify the diagnostic value of these biomarkers in PMOP. An SVM model was established with an ‘e1071’ package (https://cran.r-project.org/web/packages/e1071/index.html). A decision tree model was established with the ‘rpart’ package (https://cran.r-project.org/web/packages/rpart/). A random forest model was established with the ‘randomForests’ package (https://cran.r-project.org/web/packages/randomForest/). These three classification models were compared by the average misjudgment rates of their 10-fold cross validations. The diagnostic ability of the three classification models was assessed by calculating the receiver operating characteristic curve, and measuring the area under the curve (AUC), accuracy, sensitivity and specificity.
PMOP-specific PPI network
To further investigate the biological functions of these optimal gene biomarkers, a PPI network was constructed with the BioGRID (also known as Biological General Repository for Interaction Datasets; http://thebiogrid.org/) and Cytoscape (http://www.cytoscape.org). Nodes and edges in the PPI network represented the proteins and the interactions between two proteins, respectively.
Functional annotation
Based on the PMOP-specific PPI network, the proteins that integrated with proteins encoded by the optimal gene biomarkers were identified. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of all gene biomarkers and other DEGs that encode proteins within the PMOP-specific PPI network were conducted with the online software GeneCodis (http://genecodis.cnb.csic.es/analysis). Differences (using the Benjamini and Hochberg method) were defined as statistically significant when the false discovery rate (FDR) was <0.05.
Results
DEGs between PMOP patients and normal controls
Three datasets, including GSE56815, GSE13850 and GSE7429, were downloaded from the GEO database [(Table I) (12)]. Based on these three datasets, 1,320 DEGs (710 upregulated DEGs and 613 downregulated DEGs) with FDR<0.05 were identified between PMOP patients and normal controls. Hierarchical clustering analysis of the top 100 DEGs between PMOP patients and normal controls is presented in Fig. 1.
Table I.
Datasets used in the present study.
GEO ID
Sample
Country
Year
First author
PMOP to control ratio
GSE56815
Blood
USA
2016
Liu
20:20
GSE13850
Blood
USA
2009
Xiao
20:20
GSE7429
Blood
USA
2008
Xiao
10:10
The platform used for all datasets was GPL96 [HG-U133A] Affymetrix Human Genome U133A Array. GEO, Gene Expression Omnibus. PMOP, postmenopausal osteoporosis.
Figure 1.
Hierarchical clustering analysis of DEGs between PMOP patients and normal controls. (A) Analysis of top 100 DEGs between PMOP patients and normal controls. (B) Analysis of 11 shared gene biomarkers for PMOP obtained by both the LASSO and Boruta algorithms. Rows and columns represent DEGs and samples, respectively. The color scale represents the expression levels. DEGs, differentially expressed genes; PMOP, postmenopausal osteoporosis. BMD, bone mineral density; DEGs, ESR1, estrogen receptor 1; GEO, Gene Expression Omnibus; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; NE, norepinephrine; PPI, protein-protein interaction; RANKL, receptor activator of NF-κB ligand; SVM, support vector machine.
A total of 31 and 32 gene biomarkers were identified with the LASSO and Boruta algorithms, respectively. Furthermore, 11 shared mRNA biomarkers for PMOP were identified by overlapping the biomarkers derived from these two algorithms (Table II and Fig. 2). These 11 mRNA biomarkers included dehydrogenase E1 and transketolase domain containing 1 (DHTKD1), osteoclast stimulating factor 1 (OSTF1), G protein-coupled receptor 87 (GPR87), GPR116 (also known as adhesion G protein-coupled receptor F5), BCL2 interacting killer (BIK), adrenoceptor β1 (ADRB1), neogenin 1 (NEO1), RB binding protein 4 (RBBP4), cylicin 2 (CYLC2), EF-hand calcium binding domain 1 (EFCAB1) and DEAH-box helicase 35 (DHX35). Hierarchical clustering analysis of these 11 mRNA biomarkers was performed.
Table II.
A total of 11 shared gene biomarkers for postmenopausal osteoporosis, obtained using the LASSO and Boruta algorithms.
Expression levels of 11 gene biomarkers in the blood samples of postmenopausal osteoporosis patients and normal controls. Gene expression levels of (A) EFCAB1, (B) DHTKD1, (C) OSTF1, (D) GPR116, (E) BIK, (F) ADRB1, (G) NEO1, (H) RBBP4, (I) GPR87, (J) CYLC2 and (K) DHX35 are shown. ***P<0.001 vs. the normal control group. EFCAB1, EF-hand calcium binding domain 1; DHTKD1, dehydrogenase E1 and transketolase domain containing 1; OSTF1, osteoclast stimulating factor 1; GPR, G protein-coupled receptor; BIK, BCL2 interacting killer; ADRB1, adrenoceptor β1; NEO1, neogenin 1; RBBP4, RB binding protein 4; CYLC2, cylicin 2; DHX35, DEAH-box helicase 35.
The SVM, decision tree and random forest models were established with these 11 mRNA biomarkers, and the accuracy of these three models was 93, 78 and 94%, respectively. The AUC of SVM, decision tree and random forest models was 0.975, 0.799 and 0.975, respectively. In addition, the sensitivity and specificity of the SVM model were 92 and 100%, respectively (Fig. 3A). The sensitivity and specificity of the decision tree model were 70 and 88%, respectively (Fig. 3B). Finally, the sensitivity and specificity of the random forest model were 90 and 100%, respectively (Fig. 3C).
Figure 3.
Receiver operating characteristic curves of the combination of 11 gene biomarkers between patients with postmenopausal osteoporosis and normal controls, based on three classification models. The results of the (A) Support vector machine, (B) decision tree and (C) random forest models are displayed. The X-axis represents the 1-specificity and Y-axis represents the sensitivity. AUC, area under the curve.
The PMOP-specific PPI network consisted of 24 nodes and 20 edges (Fig. 4). Out of the 11 genes not all interacted with other differentially expressed genes. Only the gene that interacted with other differentially expressed genes (even though one gene) in the PPI network were presented. Nodes and edges represented the proteins and the interactions between two proteins, respectively. The red and blue ellipses represented the proteins encoded by up- and downregulated DEGs between PMOP patients and normal controls, respectively. RBBP4 (degree=12) was the hub gene of this PMOP-specific PPI network.
Figure 4.
PMOP-specific protein-protein interaction network, comprised of 24 nodes and 20 edges. Nodes and edges represented proteins and interactions between two proteins, respectively. Red and blue ellipses represented the proteins encoded by up- and downregulated DEGs, respectively. Ellipses with black borders were DEGs derived from the combination of 11 gene biomarkers. DEGs, differentially expressed genes; PMOP, postmenopausal osteoporosis.
Based on the functional annotations of these 11 biomarkers and DEGs that encode proteins of the PMOP-specific PPI network, prostate epithelial cord elongation (FDR<0.05), estrogen response element binding (FDR<0.05) and nucleosome remodeling deacetylase complex (FDR<0.05) were the most significant GO terms. Furthermore, endocrine and other factor-regulated calcium reabsorption (FDR=2.23×10−5), was a significantly enriched pathway in PMOP; Estrogen receptor 1 (ESR1) was revealed to be upregulated within this pathway (Table III).
Table III.
Top 10 significantly GO terms and KEGG pathways in postmenopausal osteoporosis.
A, GO terms
ID
Term
FDR
Genes
Biological process
GO:0060523
Prostate epithelial cord elongation
<0.05
ESR1
GO:0031649
Heat generation
<0.05
ADRB1
GO:0045986
Negative regulation of smooth muscle contraction
<0.05
ADRB1
GO:0002025
Vasodilation by norepinephrine-epinephrine involved in regulation of systemic arterial blood pressure
<0.05
ADRB1
GO:0031077
Post-embryonic camera-type eye development
<0.05
BCL11B
GO:0048386
Positive regulation of retinoic acid receptor signaling pathway
<0.05
ESR1
GO:0003382
Epithelial cell morphogenesis
<0.05
BCL11B
GO:0051124
Synaptic growth at neuromuscular junction
<0.05
APP
GO:0060750
Epithelial cell proliferation involved in mammary gland duct elongation
Endocrine and other factor-regulated calcium reabsorption
<0.05
ESR1
KEGG:00310
Lysine degradation
<0.05
SUV39H1
KEGG:05014
Amyotrophic lateral sclerosis (ALS)
<0.05
BCL2L1
KEGG:05210
Colorectal cancer
<0.05
APPL1
KEGG:03018
RNA degradation
<0.05
PAN2
KEGG:03320
PPAR signaling pathway
<0.05
FABP4
KEGG:05211
Renal cell carcinoma
<0.05
VHL
KEGG:05212
Pancreatic cancer
<0.05
BCL2L1
GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; FDR, false discovery rate.
Discussion
PMOP increases the risk of fragility fractures in postmenopausal women, and imposes a significant burden on patients' families and society. Previous studies have indicated that identification of patients at high risk of developing PMOP can contribute to the prevention of bone loss (4,13).In the present study, 1,320 DEGs between PMOP patients and normal controls were identified with integrated microarray analysis. The gene biomarkers for PMOP were further identified with the LASSO and Boruta algorithms. An 11-gene combination (EFCAB1, DHTKD1, OSTF1, GPR116, BIK, ADRB1, NEO1, RBBP4, GPR87, CYLC2 and DHX35) was revealed as an optimal biomarker for PMOP with feature selection and classification procedures using SVM, decision tree and random forest models. Based on the random forest model, the 11-gene combination achieved a 94% prediction accuracy in distinguishing patients with PMOP from normal controls, with 90% sensitivity and 100% specificity. The results obtained using the other two models (SVM and decision tree) further supported this finding.Among these 11 genes, CYLC2 has previously been shown to be upregulated in B cells from postmenopausal women with low BMD compared with that in postmenopausal women with high BMD (14). In addition, CYLC2 is involved in the structural component of the cytoskeleton. A previous study indicated that the structural component of the cytoskeleton is associated with PMOP, which may suggest a potential role of CYLC2 in PMOP (14).Three DEGs identified in the present study, namely OSTF1, ADRB1 and NEO1, have previously been reported to be associated with the balance between bone formation and bone resorption (15–19). OSTF1 is an intracellular protein that is highly expressed in osteoclasts, which indirectly enhances osteoclast formation and bone resorption (15). ADRB1 belongs to the family of guanine nucleotide-binding regulatory protein-coupled receptors, which regulate the physiological effects of the hormone epinephrine and the neurotransmitter norepinephrine (NE) (16). The β-adrenergic system is also involved in leptin-dependent central regulation of bone turnover (17,18). Intraosseous sympathetic nerve fibers can be activated and release NE via leptin stimulation (17). Adrenergic receptors expressed on osteoblasts bind to the released NE and result in suppression of bone formation. In addition, β-adrenergic-stimulated production of the receptor activator of nuclear factor (NF)-κB ligand by osteoblasts may contribute to a negative bone mineral balance (19). In the present study, OSTF1 and ADRB1 were upregulated in the blood of patients with PMOP compared with that of normal controls. Furthermore, NEO1 encodes a cell surface protein that belongs to the immunoglobulin superfamily, and has been speculated to serve roles in cell growth and differentiation and in cell-cell adhesion. A previous study reported abnormal chondrocyte maturation and endochondral bone growth in NEO1 knockout mice (20). Additionally, the association between NEO1 and bone mass was identified by high-throughput screening of mouse gene knockouts (21). To the best of our knowledge, the present study is the first to reveal a downregulation of NEO1 in the blood of patients with PMOP.Estrogen deficiency is a pivotal cause of postmenopausal bone loss (22). RBBP4 is an estrogen-associated gene, which was included in the 11-gene combination described in the present study. It is also a chromatin remodeling factor that encodes a ubiquitously expressed nuclear protein that belongs to a highly conserved subfamily of WD-repeat proteins (23). RBBP4 has been reported to be involved in the chromatin remodeling and transcriptional repression associated with histone deacetylation (24). An upregulation in the expression of RBBP4 was detected in the tibia callus of estrogen-deficient rats (25). To the best of our knowledge, the present study is the first to reveal an upregulation of RBBP4 in the blood of patients with PMOP. Furthermore, as the hub protein of the PMOP-specific PPI network, RBBP4 was integrated with ESR1, a well-known PMOP-associated gene, which was revealed to be enriched in the endocrine and other factor-regulated calcium reabsorption pathway (KEGG ID: 04961). ESR1 is expressed on cells that contribute to bone formation, such as osteoblasts, osteocytes and osteoclasts. It also increases the formation and function of osteoblasts and reduces bone resorption activities (26). Therefore, the RBBP4-ESR1 interaction may serve a key role in PMOP.To the best of our knowledge, no previous study has reported the association between PMOP and the six other genes described in the current study, including DHTKD1, GPR87, GPR116, BIK, EFCAB1 and DHX35. DHTKD1 is a nuclear gene that is involved in mitochondrial lysine metabolism and adenosine triphosphate production (27,28). DHTKD1 has also been demonstrated to link mitochondrial dysfunction and eosinophilic esophagitis (29). Kim and Lee (30) indicated that mitochondrial dysfunction may be a potential pathophysiological mechanism of PMOP, which suggested that DHTKD1 may regulate mitochondrial dysfunction in PMOP. In addition, GPR87 is a cell surface G protein-coupled receptor that has been reported to be overexpressed in various types of cancer (31,32), and it serves a critical oncogenic role in pancreatic cancer progression by activating the NF-κB signaling pathway (33). GPR116 is a member of the G protein-coupled receptor family predominantly expressed in the alveolar type II epithelial cells of the lung. Since NF-κB signaling pathway is an important mediator in osteoblast differentiation, it can be speculated that both GPR87 and GPR116 may serve a role in PMOP by regulating the NF-κB signaling pathway. Another identified gene, BIK, is a member of the BH3-only Bcl-2 family of pro-apoptotic proteins, which is suppressed in various types of cancer (34). Methylated BIK was identified in the bone marrow of patients with multiple myeloma, and dysregulated BIK expression was observed in hematopoietic cell fractions of patients with myelodysplastic syndrome, highlighting the importance of BIK in bone disease (34,35). Furthermore, the gene DHX35 is a putative RNA helicase, and its variants have been reported to be involved with facial morphology, thyroid cancer and colorectal cancer (36–38) Finally, DNA methylation of EFCAB1 was demonstrated to be involved in multi-organ carcinogenesis (39). However, further research is required to explore the roles of DHX35 and EFCAB1 in PMOP.In conclusion, the present study identified 11 genes that were significantly associated with PMOP and provided clues for exploring the molecular mechanism of PMOP. Three of the identified genes (OSTF1, ADRB1 and NEO1) were speculated to be involved in PMOP by regulating the balance between bone formation and bone resorption, while two genes (GPR87 and GPR116) may regulate the NF-κB signaling pathway. RBBP4 and DHTKD1 may also be potential regulators of PMOP via interacting with ESR1 and regulating mitochondrial dysfunction, respectively. Furthermore, the constituents of this 11-gene combination may serve as potential biomarkers for PMOP. However, biological investigations and validation with a larger sample size are lacking, and are considered to be limitations of the present study. Further investigations are required to validate the diagnostic abilities of this gene combination for PMOP prior to its clinical application.
Authors: Divya Sharma; Adriana I Larriera; Paolo E Palacio-Mancheno; Vittorio Gatti; J Christopher Fritton; Timothy G Bromage; Luis Cardoso; Stephen B Doty; Susannah P Fritton Journal: Bone Date: 2018-01-31 Impact factor: 4.398
Authors: John A Kanis; Eugene V McCloskey; Helena Johansson; Anders Oden; L Joseph Melton; Nikolai Khaltaev Journal: Bone Date: 2007-11-17 Impact factor: 4.398
Authors: Eloi Franco-Trepat; María Guillán-Fresco; Ana Alonso-Pérez; Alberto Jorge-Mora; Vera Francisco; Oreste Gualillo; Rodolfo Gómez Journal: J Clin Med Date: 2019-08-07 Impact factor: 4.241
Authors: Nkiruka C Atuegwu; Cheryl Oncken; Reinhard C Laubenbacher; Mario F Perez; Eric M Mortensen Journal: Int J Environ Res Public Health Date: 2020-10-05 Impact factor: 3.390