Guo-Feng Qian1, Lu-Shun Yuan2, Min Chen1, Dan Ye1, Guo-Ping Chen1, Zhe Zhang1, Cheng-Jiang Li1, Vijith Vijayan3, Yu Xiao2. 1. Department of Endocrinology, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, Zhejiang 310003, P.R. China. 2. Department of Urology, Zhongnan Hospital of Wuhan University, Wuhan, Hubei 430071, P.R. China. 3. Institute for Transfusion Medicine, Hannover Medical School, D‑30625 Hannover, Germany.
Abstract
Postmenopausal osteoporosis (PMO) is the most common type of primary osteoporosis (OP), a systemic skeletal disease. Although many factors have been revealed to contribute to the occurrence of PMO, specific biomarkers for the early diagnosis and therapy of PMO are not available. In the present study, a weighted gene co‑expression network analysis (WGCNA) was performed to screen gene modules associated with menopausal status. The turquoise module was verified as the clinically significant module, and 12 genes (NUP133, PSMD12, PPWD1, RBM8A, CRNKL1, PPP2R5C, RBM22, PIK3CB, SKIV2L2, PAPOLA, SRSF1 and COPS2) were identified as 'real' hub genes in both the protein‑protein interaction (PPI) network and co‑expression network. Furthermore, gene expression analysis by microarray in blood monocytes from pre‑ and post‑menopausal women revealed an increase in the expression of these hub genes in postmenopausal women. However, only the expression of peptidylprolyl isomerase domain and WD repeat containing 1 (PPWD1) was correlated with bone mineral density (BMD) in postmenopausal women. In the validation set, a similar expression pattern of PPWD1 was revealed. Functional enrichment analysis revealed that the fatty acid metabolism pathway was significantly abundant in the samples that exhibited a higher expression of PPWD1. Collectively, PPWD1 is indicated as a potential diagnostic biomarker for the occurrence of PMO.
Postmenopausal osteoporosis (PMO) is the most common type of primary osteoporosis (OP), a systemic skeletal disease. Although many factors have been revealed to contribute to the occurrence of PMO, specific biomarkers for the early diagnosis and therapy of PMO are not available. In the present study, a weighted gene co‑expression network analysis (WGCNA) was performed to screen gene modules associated with menopausal status. The turquoise module was verified as the clinically significant module, and 12 genes (NUP133, PSMD12, PPWD1, RBM8A, CRNKL1, PPP2R5C, RBM22, PIK3CB, SKIV2L2, PAPOLA, SRSF1 and COPS2) were identified as 'real' hub genes in both the protein‑protein interaction (PPI) network and co‑expression network. Furthermore, gene expression analysis by microarray in blood monocytes from pre‑ and post‑menopausal women revealed an increase in the expression of these hub genes in postmenopausal women. However, only the expression of peptidylprolyl isomerase domain and WD repeat containing 1 (PPWD1) was correlated with bone mineral density (BMD) in postmenopausal women. In the validation set, a similar expression pattern of PPWD1 was revealed. Functional enrichment analysis revealed that the fatty acid metabolism pathway was significantly abundant in the samples that exhibited a higher expression of PPWD1. Collectively, PPWD1 is indicated as a potential diagnostic biomarker for the occurrence of PMO.
Osteoporosis (OP), a systemic skeletal disease characterized by low bone mass and micro-architectural deterioration of bone tissue, results in an increased vulnerability to bone fracture (1). Based on data from the National Health and Nutrition Examination Survey III, the National Osteoporosis Foundation estimates that more than 9.9 million cases have OP in the United States and 1.5 million osteoporotic fractures occur annually. In China, 19.2% of people >50 years of age were assessed to have OP in 2018, and ~2.69 million osteoporotic fractures occurred in 2015 (2). Notably, OP-induced fractures cause high economic expenditures with an estimated cost of 20.4 billion USD in the United States in 2015 (3) and ~10.87 billion USD in China in 2015 (2). This indicates that OP is developing into a major concern affecting worldwide public health. Postmenopausal osteoporosis (PMO), caused by oestrogen deficiency, is the most common type of primary OP. Oestrogen deficiency causes an imbalance in bone formation and resorption, finally leading to bone loss and OP. Although many factors, such as OPG/RANKL/RANK system (4), oxidative stress (5,6), and altered oestrogen signaling (4), have been revealed to contribute to PMO, specific biomarkers for the early diagnosis and therapy of PMO are not available. Therefore, diagnostic biomarkers predicting the occurrence of PMO are warranted.The present literature suggests that screening for differentially expressed genes (DEGs) has been the focus of PMO research in identifying biomarkers. Of note, few studies have explored the relevance of genes that share a high degree of functional interconnection and are regulated in a similar fashion. Weighted gene co-expression network analysis (WGCNA), a systems biology method, is particularly useful in this context and helps to create free-scale gene co-expression networks to identify the association between different gene sets or between gene sets and clinical features (7). Notably, WGCNA has been broadly used to identify the hub genes linked with clinical features in different diseases, such as clear cell renal cell carcinoma (8–11), bladder cancer (12,13), breast cancer (14), and neuropathic pain (15).In the present study, WGCNA along with different approaches were used to analyze the microarray data of blood monocytes in pre- and postmenopausal women with a low or high bone mineral density (BMD) to characterize the key genes associated with PMO. The present results revealed that peptidylprolyl isomerase domain and WD repeat containing 1 (PPWD1) may be a potential key biomarker for predicting the occurrence of PMO.
Materials and methods
Data collection
mRNA expression profiles obtained from circulating monocytes of pre- and postmenopausal subjects with a low or high BMD were downloaded from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/). In the present study, the GSE56815 dataset performed on Affymetrix Human Genome U133A Array was used as a training set to build co-expression networks and find hub genes. This dataset consisted of 80 Caucasian females, of which 40 subjects had a high hip BMD (20 pre- and 20 postmenopausal) and 40 subjects had a low hip BMD (20 pre- and 20 postmenopausal). Separately, the GSE2208 dataset based on the same microarray platform was downloaded from the GEO and used as a test set to validate our results. This dataset included 19 female subjects, 10 with a high BMD and 9 with a low BMD.
Data preprocessing
Normalized data were obtained from the GEO database. The quality of the microarray was evaluated by sample clustering using the distance between different samples in Pearson's correlation matrices. To perform further analysis, genes with an average expression >7 were selected as the cut-off criteria.
Co-expression network construction
At first, DEGs were assessed for their expression data profile to distinguish between good and bad samples, or good and bad genes, after which the ‘WGCNA’ package in R was applied to build a scale-free co-expression network for the selected genes (16). For the pair-wise genes, Pearson's correlation matrices were initially performed, followed by a weighted adjacency matrix constructed using a power function amn=|cmn|β (where cmn=Pearson's correlation between gene m and gene n; and amn=adjacency between gene m and gene n). Upon assigning the proper β, the weighted network was transformed into a topological overlap matrix (TOM) to calculate the network connectivity of a gene as previously described (8,13). The TOM-based dissimilarity measure was then used to conduct average linkage hierarchical clustering, and genes with similar expression profiles were classified into gene modules. The hierarchical clustering was performed on a minimum size (50 genes) to generate a gene dendrogram. The module was further analyzed by calculating the dissimilarity of module eigengenes (MEs). A cut line was selected for the module dendrogram, and certain modules were merged.
Identification of phenotype-significant modules
Module- related phenotypes were identified by two approaches. Gene significance (GS), defined as the log10 transformation of the P-value (GS=lgP) in the linear regression between gene expression and the clinical traits (BMD and menopausal status) and module significance (MS), defined as the average GS for all the genes involved in the module, were used. The module related to the phenotype was selected based on the absolute MS ranking (first or second) among the selected modules. In the principal component analysis, MEs were considered as the major component to summarize the expression pattern of all genes in a given module into a single characteristic expression profile. Additionally, the correlation between the MEs and phenotypes was calculated to validate the relevance of the module. The module related to the clinical trait was selected based on their maximal absolute MS. Finally, the module with the highest correlation to a certain phenotype was used for subsequent analysis.
Identification of hub genes and validation
Hub genes are genes that display a considerable interconnection with other genes in a module and have been previously revealed to be functionally significant. In the co-expression network, hub genes were identified in a module that was correlated with certain phenotypes. Next, all genes in the hub module were uploaded to the Search Tool for the Retrieval of Interacting Genes database (https://string-db.org/), with a confidence >0.4 to create a protein-protein interaction (PPI) network. In the PPI network, genes displaying a connectivity degree of ≥5 (node/edge) were identified as hub genes. The hub genes that were common in both the co-expression network and PPI network were regarded as ‘real’ hub genes and selected for subsequent analysis. The validation was performed using a training set and test set. In the test set of the GSE2208 dataset, the comparison of real hub genes with a low BMD and a high BMD was performed.
Functional and pathway enrichment analysis
The DAVID database (http://david.abcc.ncifcrf.gov/), an online program that provides a comprehensive set of functional annotation tools to understand the biological relevance behind the obtained large dataset of genes, was used for analysis of the DEGs in the hub module [particularly Gene Ontology (GO) terms and their visualization on Kyoto Encyclopedia of Genomes and Genes (KEGG) pathway maps]. The top 20 terms involving the candidate hub genes were selected as the biological processes and pathways of interest. P<0.05 was set as the cut-off criterion.
Gene set enrichment analysis (GSEA)
To further analyze the potential function, the training set was divided into several groups according to three labels: BMD (high BMD vs. low BMD), menopausal status (postmenopausal vs. premenopausal) and expression level of PPWD1 according to the median expression of PPWD1 (a high expression of PPWD1 vs. a low expression of PPWD1). For use with GSEA software, the collection of annotated gene sets of h.all.v6.1.symbol.gmat (Hallmarks) in the Molecular Signatures Database (MSigDB, http://software.broadinstitute.org/gsea/msigdb/index.jsp) was selected as the reference gene set. The gene sets enriched in the aforementioned 3 groups were selected, and a P-value <0.05 was selected as the cut-off criteria.
Results
DEG screening
After preprocessing and quality assessment of the data, the expression matrices were obtained from which 5,570 genes with a log2 expression level >7 were selected for subsequent analysis. At first, a sample cluster with phenotypes using average linkage and Pearson's correlation was performed (Fig. 1). No samples were removed.
Figure 1.
Sample clustering to detect outliers (GSE56815). (A) Cluster dendrogram. (B) Sample dendrogram and trait indicator. The color intensity was proportional to menopausal status and BMD level. BMD, bone mineral density.
Weighted co-expression network construction and key module identification
The ‘WGCNA’ package in R was used for the average linkage hierarchical clustering of DEGs with similar expression patterns into modules. β, a soft-thresholding parameter, emphasizes strong gene correlations and penalizes weak correlations. Herein, the power of β=8 (scale free R2=0.88) was selected to ensure a scale-free network (Fig. 2). In total, 8 modules were identified (Fig. 3A). The relevance between the phenotype (menopausal status) and module was assessed by two methods. Initially, modules with higher MS were considered to be more related to the menopausal status, and accordingly, the turquoise module MS (P=5×10−4, R2=0.38) was revealed to be higher in comparison to the MS of other modules (Fig. 3B). Similarly, the ME of the turquoise module exhibited a higher correlation with menopausal status than the other modules (Fig. 3C). Based on these findings, the turquoise module with menopausal status was identified as the clinically significant module and extracted for subsequent analysis.
Figure 2.
Determination of soft-thresholding power in the WGCNA. (A) Analysis of the scale-free fit index for various soft-thresholding powers (β). (B) Analysis of the mean connectivity for various soft-thresholding powers. (C) Histogram of the connectivity distribution when β=8. (D) Verification of the scale-free topology when β=8. WGCNA, weighted gene co-expression network analysis.
Figure 3.
Identification of modules associated with phenotypes. (A) Dendrogram of all differentially expressed genes clustered, based on a dissimilarity measure (1-TOM). (B) Heatmap of the correlation between module eigengenes and different phenotypes (menopausal status and BMD). (C) Distribution of average gene significance and errors in the modules associated with menopausal status. BMD, bone mineral density.
Identification of hub genes for menopausal status in the turquoise module
Next, the PPI network of the turquoise module, under the cut-off of confidence >0.4 and connectivity degree of ≥39, identified 157 genes as hub genes. Further analysis with more stringent parameters, such as the module connectivity measured by the absolute value of the Pearson's correlation coefficient (cor.geneModuleMembership >0.8) and the clinical trait relationship measured by the absolute value of the Pearson's correlation coefficient (cor.geneTraitSignificance >0.2), revealed 101 genes in the turquoise module with a high connectivity (Fig. 4A). Among the 101 genes, only NUP133, PSMD12, PPWD1, RBM8A, CRNKL1, PPP2R5C, RBM22, PIK3CB, SKIV2L2, PAPOLA, SRSF1 and COPS2 were identified in both PPI and co-expression network (Fig. 4B and C). Hence, these 12 genes were determined as the ‘real’ hub genes for menopausal status and were selected for subsequent analysis.
Figure 4.
Hub gene detection. (A) Scatter plot of module eigengenes in the turquoise module. (B) The Venn diagram of co-expression hub genes and the PPI network hub genes. (C) Heatmap of the real hub genes. PPI, protein-protein interaction.
Hub gene validation
The validation set revealed that the expression levels of the identified hub genes were increased in postmenopausal status compared to premenopausal status (Fig. 5A-b to L-b). Furthermore, all genes except CRKNL1 changed significantly between pre- and postmenopausal status in women with a high BMD (Fig. 5A-a to L-a). Notably, relative to those with a premenopausal status, PPP2R5C and PPWD1 were significantly upregulated in women with a low BMD and postmenopausal status (Fig. 5F-a and G-a). However, only PPWD1 was altered in the comparison between low and high BMDwomen (Fig. 5A-c to L-c). In the premenopausal status, PPWD1 expression was significantly decreased in high BMDwomen compared to low BMDwomen (Fig. 5G-a). A similar trend was also noted in postmenopausal status, although statistical significance was not achieved. In the validation set, 4 genes (RBM22, RBM8A, SKIV2L2, and PPWD1) were significantly downregulated in high BMDwomen in comparison to low BMDwomen (Fig. 6).
Figure 5.
(A-F) Hub gene validation using the training set. (A) COPS2 (B) CRNKL1 (C) NUP133 (D) PAPOLA (E) PIK3CB (F) PPP2R5C. (G-L) Hub gene validation using the training set. (G) PPWD1 (H) PSMD12 (I) RBM8A (J) RBM22 (K) SKIV2L2 and (L) SRSF1. *P<0.05, **P<0.01, ***P<0.001. low, low BMD, high, high BMD, pre, premenopausal status, post, postmenopausal status; BMD, bone mineral density; PPWD1, peptidylprolyl isomerase domain and WD repeat containing 1.
Figure 6.
Hub gene validation using the validation set (GSE2208). (A) RBM22 (B) RBM8A (C) SKIV2L2 (D) PPWD1 (E) PSMD12 (F) PIK3CB (G) PAPOLA (H) CRKNL1 (I) COPS2 and (J) SRSF1. *P<0.05 and **P<0.01. PPWD1, peptidylprolyl isomerase domain and WD repeat containing 1; BMD, bone mineral density; ns, not significant.
The turquoise module was uploaded to the DAVID database to gain more insight into the function of these DEGs. GO analysis revealed a role for the hub genes in the top 20 biological processes (BP), including intracellular transport, ubiquitin-dependent protein catabolic process, protein transport, establishment of protein localization, cellular respiration, RNA processing, protein localization, oxidative phosphorylation, phosphate metabolic process, phosphorus metabolic process, RNA splicing, macromolecule catabolic process, intracellular protein transport, cellular macromolecule catabolic process, cellular protein localization, energy derivation by oxidation of organic compounds, translation, cellular macromolecule localization, phosphorylation, and generation of precursor metabolites and energy. Moreover, hub genes were also overrepresented in these top 20 KEGG pathways, including oxidative phosphorylation, Huntington's disease, Parkinson's disease, proteasome, Alzheimer's disease, Spliceosome, ubiquitin mediated proteolysis, RNA degradation, nucleotide excision repair, basal transcription factors, aminoacyl-tRNA biosynthesis, inositol phosphate metabolism, SNARE interactions in vesicular transport, lysosome, neurotrophin signaling pathway, N-glycan biosynthesis, endometrial cancer, mTOR signaling pathway, Toll-like receptor signaling pathway, and citrate cycle (TCA cycle) (Fig. 7).
Figure 7.
Bioinformatics analysis of genes in turquoise module. (A) GO analysis and (B) KEGG pathway analysis. GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes.
GSEA was conducted to map the gene sets into the Molecular Signatures Database (MSigDB) and to further elucidate the potential functions of the hub genes. The results revealed that two gene sets were associated with BMD, including ‘estrogen response early’ and the ‘p53 pathway’. One gene set-‘protein secretion’ was related to menopausal status. In contrast, the level of PPWD1 was influenced by ‘fatty acid metabolism’ (Fig. 8).
Figure 8.
GSEA. (A and B) Gene sets related to BMD. (C) Gene sets related to menopausal status. (D) Gene sets related to PPWD1 expression. GSEA, gene set enrichment analysis; BMD, bone mineral density; PPWD1, peptidylprolyl isomerase domain and WD repeat containing 1.
Discussion
The vast majority of patients suffering from OP are PMO. Although PMO is well understood as a disease, only a few biomarkers are available to predict its occurrence until now. Therefore, specific biomarkers for the occurrence of PMO are necessary. In the present study, a bioinformatics-based approach was conducted to screen for potential biomarkers, and the findings indicated that the PPWD1 gene may be a possible candidate.In the present study, WGCNA identified the turquoise module as clinically significant, and subsequent analysis revealed 12 genes (NUP133, PSMD12, PPWD1, RBM8A, CRNKL1, PPP2R5C, RBM22, PIK3CB, SKIV2L2, PAPOLA, SRSF1 and COPS2) that overlapped in PPI and co-expression analysis as ‘real’ hub genes. A previous WGCNA from Farber identified 6 genes (IFI35, EPSTI1, SP110, STAT1, TAP1, PSMA6) from 26 healthy young Chinese females through the integration of network analysis and genome-wide association data (17). Independently, Zhang et al identified 7 genes (LOC654188, PPIA, TAGLN2, YWHAB, LMNB1, ANXA2P2, ANXA2) from 42 unrelated postmenopausal Caucasian women in one study (18) and Chen et al 2 genes (HOMER1, SPTBN1) from 84 postmenopausal white women in another study (19). Notably, the genes identified in the present study were not reported in these earlier studies. We reason that the observed discrepancy may be connected to the differences in the subjects and samples, and should be investigated further.The present results revealed the high expression levels of these ‘real’ hub genes in postmenopausal women. However, only the expression of PPWD1 among the various hub genes exhibited a strong reduction in high BMDwomen. In the validation set, a similar expression pattern of PPWD1 was revealed. Collectively, the present results indicated that PPWD1 may be a key gene contributing to the occurrence of PMO. PPWD1 was first cloned in 1994 (20) and later purified as part of the catalytically competent form of the spliceosome C complex (21). The functional role of PPWD1 is currently elusive and is proposed to be involved in pre-RNA splicing (22). Of note, a functional link between PPWD1 and bone metabolism has not yet been established.To investigate the involvement of PPWD1 in BMD regulation, several gene functional enrichment analyses were used. Firstly, GO and KEGG pathway analyses were performed. In GO analysis, it was revealed that these hub genes were significantly enriched in the metabolic bioenergetics pathway of oxidative phosphorylation, which takes place in the mitochondria. Similarly, the KEGG pathway analysis revealed that oxidative phosphorylation was the most abundant pathway. Hence, it was speculated that the function of PPWD1 may be linked to energy metabolism in mitochondria. To confirm this hypothesis, GSEA analysis was carried out. Notably, our results revealed that the fatty acid metabolism pathway was significantly enriched in the samples with a high expression of PPWD1. Therefore, PPWD1 may play a role in the fatty acid oxidation pathway, which is functionally inter-linked to the oxidative phosphorylation pathway, as both derive ATP through the mitochondrial electron transport chain. Notably, many studies have highlighted the role of fatty acids in bone metabolism, including bone formation and resorption (23–27). Treatment with palmitate has been revealed to block the osteoblast differentiation of fetal rat calvarial cells (28). Independently, Kim et al demonstrated that a medium-chain fatty acid, capric acid, inhibits the osteoclast differentiation induced by RANKL (29). Recently, Lavadogarcía et al described a positive correlation between a dietary intake of long chain omega-3 polyunsaturated fatty acids and BMD in normal and osteopenia Spanish women (30). However, the exact function of PPWD1 requires investigation to identify whether it may have an effect on bone metabolism by regulating fatty acid metabolism.In summary, the present study used a weighted gene co-expression analysis to construct a gene co-expression network to identify and validate hub genes associated with menopausal status and BMD. Subsequently, PPWD1 was identified and validated in association with the occurrence of PMO, and the present findings highlighted the potential of PPWD1 as a possible diagnostic biomarker for PMO.
Authors: Russel Burge; Bess Dawson-Hughes; Daniel H Solomon; John B Wong; Alison King; Anna Tosteson Journal: J Bone Miner Res Date: 2007-03 Impact factor: 6.741
Authors: Claudia Goettsch; Andrea Babelova; Olivia Trummer; Reinhold G Erben; Martina Rauner; Stefan Rammelt; Norbert Weissmann; Valeska Weinberger; Sebastian Benkhoff; Marian Kampschulte; Barbara Obermayer-Pietsch; Lorenz C Hofbauer; Ralf P Brandes; Katrin Schröder Journal: J Clin Invest Date: 2013-11 Impact factor: 14.808
Authors: Tara L Davis; John R Walker; Hui Ouyang; Farrell MacKenzie; Christine Butler-Cole; Elena M Newman; Elan Z Eisenmesser; Sirano Dhe-Paganon Journal: FEBS J Date: 2008-04-03 Impact factor: 5.542
Authors: Maria Almeida; Marta Martin-Millan; Elena Ambrogini; Robert Bradsher; Li Han; Xiao-Dong Chen; Paula K Roberson; Robert S Weinstein; Charles A O'Brien; Robert L Jilka; Stavros C Manolagas Journal: J Bone Miner Res Date: 2010-04 Impact factor: 6.741