Shaoshuo Li1, Baixing Chen2, Hao Chen3, Zhen Hua3, Yang Shao3, Heng Yin3, Jianwei Wang3
1. Nanjing University of Chinese Medicine, Nanjing, P.R. China.
2. Department of Development and Regeneration, KU Leuven, University of Leuven, Leuven, Belgium.
3. Department of Traumatology & Orthopedics, Wuxi Affiliated Hospital of Nanjing University of Chinese Medicine, Wuxi, P.R. China.
Abstract
OBJECTIVES: Smoking is a significant independent risk factor for postmenopausal osteoporosis, leading to genome variations in postmenopausal smokers. This study investigates potential biomarkers and molecular mechanisms of smoking-related postmenopausal osteoporosis (SRPO). MATERIALS AND METHODS: The GSE13850 microarray dataset was downloaded from Gene Expression Omnibus (GEO). Gene modules associated with SRPO were identified using weighted gene co-expression network analysis (WGCNA), protein-protein interaction (PPI) analysis, and pathway and functional enrichment analyses. Feature genes were selected using two machine learning methods: support vector machine-recursive feature elimination (SVM-RFE) and random forest (RF). The diagnostic efficiency of the selected genes was assessed by gene expression analysis and receiver operating characteristic curve. RESULTS: Eight highly conserved modules were detected in the WGCNA network, and the genes in the module that was strongly correlated with SRPO were used for constructing the PPI network. A total of 113 hub genes were identified in the core network using topological network analysis. Enrichment analysis results showed that hub genes were closely associated with the regulation of RNA transcription and translation, ATPase activity, and immune-related signaling. Six genes (HNRNPC, PFDN2, PSMC5, RPS16, TCEB2, and UBE2V2) were selected as genetic biomarkers for SRPO by integrating the feature selection of SVM-RFE and RF. CONCLUSION: The present study identified potential genetic biomarkers and provided a novel insight into the underlying molecular mechanism of SRPO.
Osteoporosis is a systemic skeletal disorder that is highly prevalent worldwide. It is characterized by degeneration of bone microstructure and reduced bone mineral density (BMD), leading to increased bone fragility and decreased bone strength [1, 2]. It is reported that almost 50% of postmenopausal women develop osteoporosis [3]. Furthermore, a third of postmenopausal women have bone fractures due to osteoporosis [4]. The estimated cost of managing postmenopausal osteoporosis (PMOP) and related fractures in the United States in 2015 was over USD 15 billion [5], and PMOP has become a major public health problem worldwide [6].
Multiple factors are involved in PMOP by affecting the function of osteoblasts and osteoclasts and regulating bone mineral homeostasis [7]. Estrogen secretion decreases during menopause as ovarian function declines, increasing the risk of bone metabolic diseases [8, 9]. Estrogens modulate immune activity and the response of immune cells (T cells, B cells, and monocytes) to estrogen and its receptors [10]. Circulating B lymphocytes are strongly implicated in the pathogenesis of PMOP because they produce cytokines that regulate the activity of osteoblasts and osteoclasts. In addition, the downregulation of MAPK3 and ESR1 in B cells decreases osteogenesis and increases osteoclastogenesis, demonstrating the importance of B cells in the etiology of PMOP [11].
Poor lifestyle habits are significant contributors to rapid bone loss in postmenopausal women [12]. In this context, smoking is a significant independent risk factor for osteoporosis (P < 0.001, OR = 1.911) [13]. Female smokers are almost twice as likely to have osteoporosis as non-smoking women [14]. Smoking may alter the microarchitecture of trabecular bone and reduce the ability of skeletal muscle to resist mechanical load and stress [15]. Moreover, smoking may induce harmful changes in the immune system and cause disease through the dysregulation of B cells. Smoking-related postmenopausal osteoporosis (SRPO) is an emerging area of research that assesses changes in gene expression levels in postmenopausal smokers.
With the rapid development of high-throughput microarray technologies, the identification of genomic variations and biological mechanisms has improved our understanding of disease pathogenesis and treatment [16, 17]. Weighted gene co-expression network analysis (WGCNA) is widely used to analyze gene expression microarray data, identify functional gene modules, and discover relationships between gene modules and disease traits [18-20]. WGCNA screens genes and divides them into modules, which in turn are correlated with specific clinical phenotypes through Pearson correlation analysis. Machine learning algorithms have shown great promise in investigating the underlying relationships in high-dimensional data through supervised or unsupervised methods [21, 22]. Moreover, machine learning is useful for analyzing high-dimensional transcriptomic data and identifying feature genes with biological significance [23-25]. However, no studies have analyzed genome variations in SRPO.
In this study, we performed a comprehensive analysis of gene expression patterns of circulating B cells from 20 postmenopausal female smokers with low or high BMD using bioinformatics and machine learning algorithms, including WGCNA, support vector machine-recursive feature elimination (SVM-RFE), random forest (RF), protein-protein interaction (PPI) and functional analyses, and receiver operating characteristic (ROC) curve analysis. Six potential diagnostic biomarkers of SRPO were identified.
2. Materials and methods
2.1. Microarray data collection and preprocessing
The study flowchart is shown in Fig 1. The gene microarray dataset GSE13850, based on the Affymetrix Human Genome U133A (GPL96) platform, was downloaded together with its probe annotation files and CEL files from the Gene Expression Omnibus database (GEO, http://www.ncbi.nlm.nih.gov/geo/). Background correction, quantile normalization, and probe summarization of the raw data were performed using the robust multiarray average (RMA) algorithm [26]. If a gene matched more than one probe, the probe with the maximum expression value was used to represent that gene. The GSE13850 dataset provided data on gene expression in circulating B cells of 20 postmenopausal female smokers (10 with high BMD and 10 with low BMD).
Fig 1
Workflow of the present study.
2.2. Construction of the WGCNA network
Phenotype-correlated gene modules associated with SRPO were identified by WGCNA. The top 5,000 genes with the highest expression levels were used to construct the WGCNA network using the WGCNA package in R [20]. First, Pearson's correlation matrices for all pairs of genes were calculated. The similarity between gene m and gene n was defined as the absolute value of their pairwise correlation coefficient, $S_{mn} = |\mathrm{cor}(m, n)|$. The similarity matrix was then transformed into a weighted adjacency matrix using the power function $a_{mn} = \mathrm{power}(S_{mn}, \beta) = |S_{mn}|^{\beta}$ [26]. An appropriate soft-thresholding power β was selected according to the mean connectivity and the approximate scale-free topology criterion, and the adjacency matrix was transformed into a topological overlap matrix (TOM). TOM-based hierarchical clustering of gene modules was performed using the dynamic tree cut algorithm [27]. Gene modules with similar expression profiles were represented by different branches labeled with distinct colors, and the minimum module size was set to 40.
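The soft-thresholding and topological overlap steps can be sketched in a few lines of standard-library Python. This is only an illustration, not the WGCNA package code used in the study: the function names are hypothetical, the expression values are invented toy numbers, and the overlap formula is the commonly used unsigned definition, which is assumed here to match what the study computed.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length profiles (both must vary)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def soft_threshold_adjacency(expr, beta):
    """a_mn = |cor(m, n)| ** beta for every gene pair; the diagonal is left at 0."""
    g = len(expr)
    adj = [[0.0] * g for _ in range(g)]
    for m in range(g):
        for n in range(m + 1, g):
            a = abs(pearson(expr[m], expr[n])) ** beta
            adj[m][n] = adj[n][m] = a
    return adj


def topological_overlap(adj):
    """Unsigned topological overlap (assumed definition):
    TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    g = len(adj)
    k = [sum(row) for row in adj]  # connectivity; the diagonal is 0
    tom = [[1.0 if i == j else 0.0 for j in range(g)] for i in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(g) if u != i and u != j)
            t = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            tom[i][j] = tom[j][i] = t
    return tom


# Toy expression profiles: 4 hypothetical genes measured in 4 samples.
toy_expr = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 3.0, 2.0, 1.0],
    [1.0, 3.0, 2.0, 4.0],
]
adj_beta1 = soft_threshold_adjacency(toy_expr, beta=1)
adj_beta9 = soft_threshold_adjacency(toy_expr, beta=9)
tom_beta9 = topological_overlap(adj_beta9)
```

Raising β shrinks weak correlations toward zero while leaving strong ones almost unchanged, which is why a larger power gives a sparser, more scale-free network.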
2.3. Correlation between gene modules and SRPO
The WGCNA algorithm uses the module eigengene (ME) to evaluate relationships between gene modules and clinical traits. ME was defined as the first principal component of the expression matrix of a given module, which summarizes the expression of the module's genes in a single characteristic expression profile [28]. The Pearson correlation between ME and clinical traits was calculated to identify the module that was highly correlated with SRPO. The significance of the Pearson correlation was assessed using a t-test, and a module with a P-value of less than 0.05 was considered to be significantly correlated with SRPO. Furthermore, gene significance (GS) and module membership (MM) were calculated for intramodular analysis. MM was the correlation between ME and the gene expression profile. GS was defined as the log10 transformation of the P-value (lgP) for the association between gene expression and the clinical trait (GS = lgP). Module significance (MS) was defined as the average GS of all genes in a module. The module with the highest absolute MS was considered to be significantly correlated with SRPO. The module with the highest correlation with the clinical trait (osteoporosis) was selected for further study.
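The t-test on a Pearson correlation can be illustrated with a short sketch. It uses the standard statistic t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The function names, the toy eigengene values, and the 0/1 coding of the trait are assumptions made for illustration. The last call plugs in the correlation reported for the blue module (r = 0.88, n = 20 samples).

```python
import math


def pearson_r(x, y):
    """Pearson correlation between a module eigengene and a numeric trait."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical eigengene values for six samples; trait coded 1 = low BMD, 0 = high BMD.
toy_eigengene = [0.3, 0.2, 0.4, -0.3, -0.2, -0.4]
toy_trait = [1, 1, 1, 0, 0, 0]
toy_r = pearson_r(toy_eigengene, toy_trait)

# Values reported in the text for the blue module: r = 0.88 across 20 samples.
t_blue = correlation_t_statistic(0.88, 20)
```

For the blue module this gives a t statistic of roughly 7.9 on 18 degrees of freedom, which is in line with the very small P-value reported in Table 1.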
2.4. Construction of PPI networks
PPI networks were constructed to evaluate the relationships among genes in the selected module using the Search Tool for the Retrieval of Interacting Genes version 11 (STRING V11, https://string-preview.org/). Interactions with a confidence score > 0.4 were retained, and the network was visualized using Cytoscape version 3.8.2 [29]. Hub genes are highly interconnected nodes and may play important roles in the PPI network. To screen hub genes, a topological network analysis based on betweenness centrality (BC), closeness centrality (CC), and degree centrality (DC) was performed using the CytoNCA plugin for Cytoscape [30].
2.5. Function and pathway enrichment analyses
Gene Ontology (GO) enrichment analysis and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis were performed using the clusterProfiler [31] package in R to describe the possible biological functions of hub genes. Three categories of biological process (BP), cellular component (CC), and molecular function (MF) were included in the GO terms. A Benjamini–Hochberg adjusted P-value of less than 0.05 was considered to indicate significantly enriched GO terms and KEGG pathways.
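For readers unfamiliar with the Benjamini–Hochberg adjustment, the procedure can be sketched as follows. This is a minimal standard-library illustration, not the clusterProfiler implementation. The function name and the example P-values are invented for the sketch.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Four hypothetical enrichment P-values.
toy_pvalues = [0.01, 0.04, 0.03, 0.005]
toy_adjusted = benjamini_hochberg(toy_pvalues)
significant = [i for i, q in enumerate(toy_adjusted) if q < 0.05]
```

Each raw P-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw values increase.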
2.6. Machine learning for feature selection
Feature genes associated with SRPO were selected using SVM-RFE and RF. SVM-RFE is an efficient feature selection algorithm that has shown promising power in the analysis of genomics [32], metabolomics [33], and proteomics [34] data. SVM-RFE iteratively removes the features with the smallest weights from the ranking until all features have been excluded. In each iteration, the current SVM-RFE model was evaluated by k-fold cross-validation. The classifier model with the highest accuracy was then constructed, and the corresponding best variables were identified [35]. The RF algorithm uses the variables to construct numerous decision trees and combines the class predictions of the individual trees. RF has also been widely used for detecting disease biomarkers [36, 37]. The SVM-RFE model was built using the R package caret version 6.0–88. RF was applied using the randomForest package version 4.6–14. Ultimately, the genes selected by both SVM-RFE and RF were retained for further analysis.
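The elimination loop at the heart of SVM-RFE can be sketched as below. This is a simplified stand-in, not the caret workflow used in the study. The linear SVM is trained by plain batch subgradient descent without a bias term, one feature is dropped per round using the squared weight as the ranking criterion, and the cross-validation step that picks the best subset is omitted. All names, hyperparameters, and data are hypothetical.

```python
def train_linear_svm(X, y, features, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent on the L2-regularised hinge loss.
    Simplification: no bias term. Returns {feature index: weight}."""
    w = {f: 0.0 for f in features}
    n = len(X)
    for _ in range(epochs):
        grad = {f: lam * w[f] for f in features}
        for xi, yi in zip(X, y):
            margin = yi * sum(w[f] * xi[f] for f in features)
            if margin < 1.0:  # sample violates the margin
                for f in features:
                    grad[f] -= yi * xi[f] / n
        for f in features:
            w[f] -= lr * grad[f]
    return w


def svm_rfe_ranking(X, y):
    """Drop the feature with the smallest squared weight each round.
    Returns feature indices ordered from most to least important."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        w = train_linear_svm(X, y, remaining)
        worst = min(remaining, key=lambda f: w[f] ** 2)
        remaining.remove(worst)
        eliminated.append(worst)
    return eliminated[::-1]


# Toy data: feature 0 separates the classes, features 1 and 2 are noise.
toy_X = [
    [2.0, 0.1, -0.1], [2.2, -0.1, 0.1], [1.8, 0.1, 0.1],
    [-2.0, 0.1, -0.1], [-2.1, -0.1, 0.1], [-1.9, 0.1, 0.1],
]
toy_y = [1, 1, 1, -1, -1, -1]
toy_ranking = svm_rfe_ranking(toy_X, toy_y)
```

In a full workflow, each candidate subset along this ranking would be scored by cross-validated accuracy, and the subset with the best score would be kept.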
2.7. Evaluation of the diagnostic efficiency
The ability of feature genes to differentiate between SRPO patients and non-osteoporotic postmenopausal smokers was evaluated by gene expression and ROC curve analyses. The predictive efficiency was measured in the control group (ten samples from postmenopausal smokers with high BMD) and the SRPO group (ten samples from postmenopausal smokers with low BMD). A Benjamini–Hochberg adjusted P-value of less than 0.05 was considered to indicate a significant difference in gene expression. The ROC curve was created using the pROC package version 1.17.0.1 in R. Genes with an area under the ROC curve (AUC) > 0.7 were considered to have good diagnostic performance.
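The AUC used as the performance criterion has a simple pairwise interpretation, sketched here with the standard library only. It is the fraction of case-control pairs in which the case has the higher score, with ties counted as one half. The function names and scores are hypothetical. The second helper is an assumption added for markers that are lower in cases, such as downregulated genes.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random case (label 1) scores
    higher than a random control (label 0); ties count as 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def direction_free_auc(scores, labels):
    """AUC regardless of whether high or low values indicate a case."""
    auc = roc_auc(scores, labels)
    return max(auc, 1.0 - auc)


# Hypothetical expression scores for two controls (0) and two cases (1).
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
toy_auc = roc_auc(toy_scores, toy_labels)

# A marker that is lower in cases gives AUC < 0.5 until the direction is flipped.
down_auc = direction_free_auc([5.0, 6.0, 2.0, 1.0], [0, 0, 1, 1])
```

In the toy example, three of the four case-control pairs are ordered correctly, so the AUC is 0.75, just above the 0.7 cutoff described in the text.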
3. Results
3.1. Data collection and WGCNA analysis
Gene expression data and clinical data from the GSE13850 dataset were downloaded from the GEO database. Following data processing, the top 5,000 genes in circulating B cells were collected, and the WGCNA network was constructed. A soft-thresholding power of β = 9 was adopted because the signed R^2 of the scale-free topology fit was 0.85 (Fig 2).
Fig 2
Construction of the weighted gene co-expression network of gene modules.
(A) Analysis of the scale independence for the appropriate soft-thresholding power β. (B) Analysis of the mean connectivity for the appropriate soft-thresholding power β. (C) Histogram of connectivity distribution with an appropriate β = 9. (D) Checking the scale-free topology with an appropriate β = 9.
Eight gene modules were obtained using the dynamic tree cut algorithm (Fig 3A and 3B). The correlation between each module and osteoporosis was assessed by calculating the module–trait relationship and MS. First, the Pearson correlation between the ME of each module and osteoporosis was calculated and shown in the module–trait relationship heatmap (Fig 3C and Table 1). The blue module (module–trait correlation = 0.88, P-value = 7e-07) had the strongest association with osteoporosis. Next, the MS of each module was calculated. The blue module had the highest MS among all modules (Fig 3D). Hence, the 1078 genes in the blue module were significantly associated with SRPO, and these genes were selected for subsequent analysis in the PPI network. The clustering heatmap of the ME of the blue module and the scatterplot of GS vs. MM are presented in Fig 3E and 3F.
Fig 3
Identification of significant gene modules correlated with osteoporosis.
(A) Cluster dendrogram of representative gene modules. (B) Clustering heatmap of module eigengenes. (C) Relationships of module eigengenes and osteoporosis. The number in the square at the top of each row is the correlation coefficient, and P-values are shown below. (D) Gene significance across modules. (E) Heatmap and bar graph of the eigengenes in module blue. (F) Scatterplot of gene significance vs. module membership in the blue module.
Table 1
Correlation between modules and smoking-related postmenopausal osteoporosis.
Modules      Gene count   Correlation   P-value
Blue         1078         0.88          7e-07
Turquoise    1728         -0.75         2e-04
Brown        771          -0.64         0.003
Red          78           -0.62         0.005
Black        60           0.36          0.1
Green        164          0.29          0.2
Yellow       225          -0.26         0.3
Grey         896          0.2           0.4
3.2. Construction of the PPI network and enrichment analysis of hub genes
After removing the disconnected nodes, there were 998 nodes and 10940 edges in the constructed PPI network for genes in the blue module (Fig 4A). According to topological network analysis, PPI nodes are considered significant targets if their DC is greater than twice the median DC [38]. Thus, DC > 28 was set as the threshold, and the significant nodes were used to generate a subnetwork. Then, nodes whose BC and CC values were greater than the respective medians in the subnetwork (BC > 158.81, CC > 0.48) were retained to form a core network of hub genes. The core network containing 113 hub genes (nodes) and 1831 edges is shown in Fig 4B.
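The two-stage screening rule can be expressed compactly. The sketch below assumes the three centrality values per node have already been computed elsewhere, for example exported from CytoNCA. The function name and the toy values are hypothetical. It also assumes the centrality values are reused rather than recomputed on the subnetwork, which the text does not specify.

```python
from statistics import median


def screen_hub_genes(dc, bc, cc):
    """Stage 1: keep nodes with DC above twice the median DC.
    Stage 2: within that subnetwork, keep nodes whose BC and CC
    both exceed the subnetwork medians."""
    dc_cutoff = 2 * median(dc.values())
    subnetwork = [g for g in dc if dc[g] > dc_cutoff]
    if not subnetwork:
        return []
    bc_cutoff = median(bc[g] for g in subnetwork)
    cc_cutoff = median(cc[g] for g in subnetwork)
    return sorted(g for g in subnetwork
                  if bc[g] > bc_cutoff and cc[g] > cc_cutoff)


# Toy centralities for seven hypothetical nodes.
toy_dc = {"A": 10, "B": 9, "C": 8, "D": 2, "E": 2, "F": 2, "G": 2}
toy_bc = {"A": 50.0, "B": 30.0, "C": 10.0, "D": 1.0, "E": 1.0, "F": 0.0, "G": 0.0}
toy_cc = {"A": 0.9, "B": 0.5, "C": 0.7, "D": 0.3, "E": 0.3, "F": 0.2, "G": 0.2}
toy_hubs = screen_hub_genes(toy_dc, toy_bc, toy_cc)
```

With the thresholds reported in the text, a cutoff of DC > 28 corresponds to a median DC of 14 in the full network.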
Fig 4
Protein-Protein Interaction (PPI) network of genes from the blue module.
(A) Screening of hub genes. The screening criteria were degree centrality > 28, betweenness centrality > 158.81, and closeness centrality > 0.48. (B) Core PPI network with 113 hub genes and 1831 edges. The color of each node represents its degree: the darker (red) the color, the higher the degree.
Functional enrichment analysis was performed to improve biological understanding of the hub genes identified in the PPI network. Regarding biological processes, GO analysis showed that hub genes were mainly involved in the regulation of mRNA transcription, regulation of cell cycle, protein targeting, and cellular response to hypoxia (Fig 5A). In the cellular component analysis, hub genes were mainly associated with ribosomal subunits, methylosome, and proteasome complexes (Fig 5B). Significantly enriched molecular functions were translation regulation, ATPase activity, hormone receptor binding, and protein binding (Fig 5C). KEGG pathway enrichment analysis showed that ribosome, apoptosis, mitophagy, HIF-1 signaling pathway, NF-kappa B signaling pathway, Th17 cell differentiation, and B cell receptor signaling pathway were the most significantly enriched pathways in SRPO (Fig 5D).
3.3. Identification of feature genes using machine learning algorithms
Machine learning classification algorithms are increasingly used to distinguish disease-associated feature genes from background noise. SVM-RFE and RF were used to predict feature genes associated with SRPO. First, an SVM-RFE classifier (kernel: svmLinear; resampling: 10-fold cross-validation; soft margin; tuning parameter C = 1) was established based on the 113 hub genes. Data from the control and SRPO groups were randomly divided into ten equal portions (training set: 9; test set: 1). During each of the ten iterations, SVM-RFE was applied to the training set to train the classifier with the selected features, and the trained classifier was applied to the test set to assess prediction accuracy. Then, the predictions from the ten iterations were combined to evaluate the accuracy of the classifier. Eight feature genes were identified using SVM-RFE (Fig 6A). Similarly, feature genes were screened by 10-fold cross-validation using the RF algorithm. The RF classifier showed the lowest out-of-bag (OOB) error with the top 11 feature genes (Fig 6B). After integrating feature genes from SVM-RFE and RF, six feature genes closely associated with SRPO were obtained: HNRNPC, PFDN2, PSMC5, RPS16, TCEB2, and UBE2V2 (Fig 6C).
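The ten-way split described above can be illustrated with a short sketch. It is not the resampling code in caret. The function name and the random seed are arbitrary choices for illustration, and no stratification by group is applied.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k folds.
    Returns a list of (train_indices, test_indices) pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Every k-th index goes to the same fold, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    return [(sorted(set(indices) - set(fold)), sorted(fold)) for fold in folds]


# 20 samples (10 control, 10 SRPO) split into 10 folds of 2 test samples each.
splits = k_fold_indices(n_samples=20, k=10)
all_test_samples = sorted(i for _, test in splits for i in test)
```

With 20 samples and ten folds, every round trains on 18 samples and tests on 2, so each sample is used for testing exactly once.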
Fig 6
Feature gene selection.
(A) Support vector machine-recursive feature elimination (SVM-RFE). (B) Random forest (RF). (C) Venn diagram of feature genes selected by RF and SVM-RFE.
3.4. Diagnostic efficiency of feature genes
The difference in the expression pattern of the six feature genes between the SRPO and control groups was assessed. All feature genes except UBE2V2 were downregulated in the SRPO group (Fig 7A). To determine whether the feature genes could independently discriminate SRPO, ROC analysis was performed. The results showed that the ability of these genes to diagnose SRPO was high, with an AUC > 0.9 (Fig 7B).
Fig 7
Diagnostic efficiency evaluation of feature genes.
(A) Gene expression of six feature genes (HNRNPC, PFDN2, PSMC5, RPS16, TCEB2, and UBE2V2) in women with smoking-related postmenopausal osteoporosis and controls. (B) Receiver operating characteristic curve analysis.
As an RNA-binding protein, heterogeneous nuclear ribonucleoprotein C (HNRNPC) is well known for regulating mRNA metabolism and RNA expression, splicing, and translation [39, 40]. In addition, HNRNPC regulates N6-methyladenosine (m6A) RNA methylation, which is crucial to neurogenesis, embryonic development, stress responses, and tumorigenesis [41, 42]. TCEB2 (also known as ELOB) encodes the protein elongin B, a subunit of the transcription factor B complex and an adapter protein in the proteasomal degradation of target proteins through E3 ubiquitin ligases [43]. Proteasome 26S subunit ATPase 5 (PSMC5) interacts with several transcription factors, including nuclear hormone receptors, p53, c-Fos, and the basal transcription complex [44]. Moreover, PSMC5 plays a proteasome-independent role in DNA repair, chromatin remodeling, and transcription activation and elongation [45, 46]. PFDN2 is a component of the β subunits of the URI prefoldin-like complex, which plays a critical role in maintaining cellular homeostasis [47]. Ubiquitin-conjugating enzyme E2 variant 2 (UBE2V2) mediates the transcriptional activation of target genes and controls cell differentiation, cell cycle, and DNA damage response [48]. Ribosomal protein S16 (RPS16), a basic component of the 40S ribosome, was reported to be associated with defective mitochondrial translation [49]. These feature genes were closely associated with RNA transcription and translation, and with important cellular activity in SRPO.
4. Discussion
There is increased public awareness of the harmful effects of cigarette smoking. However, although substantial progress has been made in tobacco control, cigarette smoking remains one of the most challenging global health issues to date [50, 51]. Postmenopausal smokers are at a higher risk of developing osteoporosis and osteoporotic fractures than non-smoking women [52]. Moreover, smoking-induced genetic alterations influence hormone secretion and bone metabolism in women [53, 54]. The molecular mechanism underlying the occurrence and development of SRPO is incompletely understood, and identifying new biomarkers for SRPO diagnosis and treatment is crucial.
We determined the gene expression profiles in circulating B cells from 20 postmenopausal smokers with low or high BMD. First, WGCNA was performed to select the gene module with the strongest correlation with SRPO. Then, the 1078 genes in the selected module were used to construct a PPI network. Topological network analysis identified a core PPI network and 113 hub genes. Functional enrichment analysis showed that these hub genes were closely associated with the development of SRPO via the control of several biological processes, including the regulation of RNA transcription and translation, hormone receptor binding, and the NF-kappa B signaling pathway. Previous studies have shown that these biological processes and signaling pathways are implicated in bone metabolism and osteoporosis [55, 56]. The risk of missing important features was reduced by combining two machine learning algorithms. SVM-RFE and RF were performed to screen six characteristic variables from these hub genes. Diagnostic efficiency analysis showed that the genes HNRNPC, PFDN2, PSMC5, RPS16, TCEB2, and UBE2V2 were potential biomarkers for SRPO.
In a cigarette smoke-induced chronic obstructive pulmonary disease (COPD) animal model, HNRNPC was overexpressed in the lungs of cigarette smoke-exposed mice [57]. The dysregulation of HNRNPC is associated with telomere shortening in lung cells and circulating lymphocytes, impairing lung function and increasing COPD severity and mortality [58, 59]. In addition, the dysregulation of HNRNPC may increase the expression of the urokinase plasminogen activator receptor, resulting in inflammation and immune activation [60]. TCEB2 plays an essential role in the development of acquired resistance to anti-angiogenic therapy in ovarian cancer cells by suppressing VEGF-A expression and promoting HIF-1α degradation [61]. The vascularization of bone tissue is tightly linked with bone formation in a spatial and temporal relationship known as angiogenesis-osteogenesis coupling [62]. Many factors, including HIF-1 and VEGF, regulate bone vascularization and angiogenic-osteogenic coupling in the bone microenvironment [63]. In this respect, the dysregulation of TCEB2 may contribute to SRPO by impairing this coupling. PSMC5 regulates ERK1/2 signaling transmission by remodeling the Shoc2 scaffold complex [64]. The activation of the ERK1/2 signaling cascade regulates the function of osteoblasts and osteoclasts, promoting inflammation and osteogenesis [65, 66]. PFDN2 is closely associated with several diseases, such as Alzheimer's disease, colon cancer, and myelodysplastic syndromes, via different mechanisms [67-69]. The presence of antibodies against PFDN2 is associated with an increased risk of type 2 diabetes through autoimmune activation and/or pro-inflammatory signals, which are involved in the regulation of bone homeostasis [70]. UBE2V2 contributes to the development and progression of many cancers, including prostate, oropharyngeal, and breast cancers, by promoting cell proliferation, suppressing cell apoptosis, and regulating immune signaling [71-73]. Moreover, UBE2V2 is an independent prognostic indicator for lung adenocarcinoma, which is closely related to the mutational processes of cigarette smoking [74, 75]. RPS16 facilitates glioma progression via PI3K/AKT signaling [76]. Previous studies have indicated that the PI3K/AKT signaling pathway is an important factor in the occurrence of osteoporosis by regulating the activity of osteoblasts and osteoclasts [77, 78].
WGCNA can identify genes with clinical significance and cluster genes associated with pathological processes based on medical and biological background. Machine learning algorithms offer objective assessment and high accuracy in feature selection. The present study is the first to combine machine learning algorithms and WGCNA to identify potential biomarkers of SRPO. Although our results are consistent with the literature, the reliability of this study needs to be verified by further experiments. This study has limitations. First, the smoking history, frequency, and status of individuals in the study were not well known, which might introduce uncontrolled factors into the data analysis. Second, the identified biomarkers were not functionally and externally validated. Third, the small sample size may have limited the power of the study. Additional studies on the association of these biomarkers with SRPO are warranted.
5. Conclusion
The present study identified six genes (HNRNPC, PFDN2, PSMC5, RPS16, TCEB2, and UBE2V2) as potential biomarkers for SRPO using WGCNA and machine learning algorithms, providing a novel insight into the diagnosis and treatment of SRPO. However, these biomarkers need to be validated by clinical trials.
Gene expression of samples.
(XLSX)
Genes in each module.
(XLSX)
PPI network.
(XLSX)
Results of CytoNCA analysis.
(XLSX)
GO enrichment of hub genes.
(XLSX)
KEGG pathway enrichment.
Feature selection of machine learning.
(XLSX)
26 May 2021
PONE-D-21-08513
Investigation of Potential Genetic Biomarkers and Molecular Mechanism of Smoking-related Postmenopausal Osteoporosis by Using WGCNA and Machine Learning
PLOS ONE
Dear Dr. Wang,
Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
Both reviewers raised substantial technical concerns. All of these need to be suitably addressed in a revised manuscript.
Please submit your revised manuscript by Jul 08 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.
Please include the following items when submitting your revised manuscript:
A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results.
Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.
We look forward to receiving your revised manuscript.
Kind regards,
Jishnu Das, Ph.D.
Academic Editor
PLOS ONE
Journal Requirements:
When submitting your revision, we need you to address these additional requirements.
1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf
2. Please include your tables as part of your main manuscript and remove the individual files. Please note that supplementary tables should be uploaded as separate "supporting information" files.
3. PLOS requires an ORCID iD for the corresponding author in Editorial Manager on papers submitted after December 6th, 2016. Please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to 'Update my Information' (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. Please see the following video for instructions on linking an ORCID iD to your Editorial Manager account: https://www.youtube.com/watch?v=_xcclfuvtxQ
Reviewers' comments:
Reviewer's Responses to Questions
Comments to the Author
1. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: No
Reviewer #2: Partly
2. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: No
Reviewer #2: No
3. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.
Reviewer #1: Yes
Reviewer #2: No
4. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #1: No
Reviewer #2: No
5. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics.
(Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Review of "Investigation of Potential Genetic Biomarkers and Molecular Mechanism of Smoking-related Postmenopausal Osteoporosis by Using WGCNA and Machine Learning" by J. Wang.

The author has presented a bioinformatic pipeline to identify genetic biomarkers for smoking-related postmenopausal osteoporosis. There are a few questions that I would like the authors to address.

1. The author seems to have mixed analytical methods with subjective heuristics, which is concerning. For example, based on correlation analysis the author identifies the yellow module as showing a significant correlation with both smoking and osteoporosis. However, this identification seems subjective, as the black module also seems to be significant. Furthermore, since this is microarray data rather than RNA-seq data, and microarrays generally profile targeted transcripts, it is quite possible that the black module will also identify some relevant biomarkers. The author therefore needs to carefully, methodically, and quantitatively justify the selection of the relevant module.

2. Following the above point, the author should clarify how the correlation between the smoking/osteoporosis phenotype and the eigengene of each module is computed and how its significance is computed. Additionally, in the Results section the author mentions correlation between the eigenvalue (instead of the eigengene) and the phenotype. The authors should clarify this confusion.

3. The author uses a recursive feature elimination approach. In the context of statistical regression, such stepwise approaches have been shown to be suboptimal. Why does the author think that this approach is a good option, especially given that many of the microarray genes might already be pre-selected and correlated? Clarification would be quite helpful.

4. What type of SVM did the author use (linear/non-linear, soft/hard margin, nu/C parameterization), and why?

5. 
The author should clarify their validation approach: what was the training set, what was the testing set, and how did the control set relate to them.6. It remains unclear to me as to why the author has generated ROCs and related AUC only for individual gene expressions. The performance is quite poor, and if one looks at the shape of the ROC, the sensitivity and specificity are not good either. It might be better to generate ROC curves for gene expression combinations.7. In the abstract the author has mentioned the "yellow module" as the clinically significant module. This seems to be a highly unusual way to present ones results. Nobody reading the abstract will understand what a yellow module is. The author needs to provide a better description.8. The writing needs to be improved. For example, 'According to recently scientific discoveries, ...', 'With the rapid development of high-throughput microarray technologies, identification of meaningful genomic variations and investigating biological mechanisms have contributed great effort ....', etc.Reviewer #2: The overall analytical pipeline (WGCNA+ML) seems promising to help identify potential genetic biomarkers of SRPO, but the validity and robustness of the results presented in this paper need further examination.Major1. Identification of significant modules and analysis of module-trait relationship – It was not clear whether proper multiple-testing correction has been applied. This appeared to be critical as the significance of the two modules nominated in the Module-Osteoporosis analysis is marginal (Yellow p=0.03, Green p=0.02). And the significance of the Yellow in Module-Smoking relationship will also become marginal if Bonferroni is applied to correct 11 tests.2. Construction of PPI network – The network and sub-network built in this paper seemed to be very dense. This was likely resulted from the authors including all sorts of protein linkages. 
Restricting to only “experimentally determined interactions” may better clarify the biological meaning of the network and the subsequent results. In particular, the inclusion of “gene neighborhood interactions”, “text-mining interactions”, and “gene co-occurrence interactions” lacks biological context.3. Function and pathway enrichment analysis – 1) Statistical significance again was not properly defined here. While gene set enrichment tools usually provide adjusted p-values (or q-values), the authors stated that “The p-value < 0.05 was considered to indicate a significant difference for GO terms and KEGG pathways”. 2) Statistical significance (Adjusted P-value or Odds) would be preferred in the figure, instead of “gene number”. 3) It is of little use to list all the enriched terms in the main text, instead, the authors should investigate further and explain the biological meaning of these terms and more importantly, how they are relevant to the phenotype SRPO. This is essential and is part of be the meat of this paper, as the authors claimed that this work could provide novel insights into the molecular mechanism of SRPO.4. Identification of feature genes using machine learning method – It was not sufficiently justified why the specific two ML methods were chosen. Also, how the recursive feature elimination analysis was performed and how it can benefit the model was not clearly explained. Plus, while prioritizing the genes by their overlaps seems justifiable, more detailed comparison of the two sets of genes is desired to give a more comprehensive view of the results (e.g., Are those non-overlapping genes also functionally relevant? Does one gene set appear to be more relevant than the other? Why RF gave many more genes than SVM – is it more powerful or less accurate? Would it be reasonable to use the union instead of intersection to implicate more genes?).5. 
Data validation – 1) The selection of control group: “10 samples of postmenopausal non-smoking females with high BMD” was used, while alternatively, the other group “10 samples of postmenopausal non-smoking females with low BMD” can be used to better dissect the relationship between the implicated genes and “smoking-related” postmenopausal osteoporosis (rather than the general postmenopausal osteoporosis). 2) Multiple-testing correction is expected for the gene expression analysis; if Bonferroni, genes ATP5G1 and RPL26L1 will not surpass the threshold. 3) Description and discussion about the functional significance of the implicated genes are expected to follow the results here. The authors had lengthy paragraphs on this in the Discussion section, which should be moved up, expanded in depth, while rephrased more concisely.Minor1. Accurate references need to be cited when linking the implicated genes/pathways to SRPO if the relevant conclusions are not drawn by this paper (e.g., “It has been suggested that aging and increasing in reactive oxygen species (ROS) may be the proximal culprits for osteoporosis”, “ROS can influence the generation and survival of osteoclasts and osteoblasts” …).2. Grammar needs to be revised (small errors like "Osteoporosis is one of the most common systemic skeletal disorder" occurs occasionally).3. Language needs to be polished into a more scientific fashion (for example, in “… performed a heatmap and bar graph of ...” and “…scatterplots of GS vs. MM of module yellow of the two phenotypes were performed …”, specific analyses/statistical tests should be described rather than the types of graphs).4. Data availability – key results should be complied as supplementary files for others to use (e.g., module information, PPI network, pathways, and relevant statistical results, etc.).**********6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). 
If published, this will include your full peer review and any attached files.If you choose “no”, your identity will remain anonymous but your review may still be made public.Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.Reviewer #1: NoReviewer #2: No[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.2 Aug 2021Thank you for your letter and the reviewers’ comments on our manuscript entitled " Investigation of Potential Genetic Biomarkers and Molecular Mechanism of Smoking-related Postmenopausal Osteoporosis by Using WGCNA and Machine Learning" (ID:PONE-D-21-08513). Those comments are very helpful for revising and improving our paper, as well as the important guiding significance to other researches. We have studied the comments carefully and made corrections which we hope meet with approval. The main corrections are in the manuscript and the responses to the reviewers’ comments are as follows (the replies are highlighted in blue). 
In addition, we have added several important supporting information files.

Replies to the reviewers’ comments:

Reviewer #1:

1. Response: Following your suggestion, we rewrote the “Correlation between gene modules and SRPO” subsection of “Methods and materials” and expanded on why we chose the blue module as the research object. In this study, we aimed to investigate potential genetic biomarkers and the underlying molecular mechanism of smoking-related postmenopausal osteoporosis. We collected gene expression data from 20 postmenopausal female smokers (10 with high BMD and 10 with low BMD). Smoking was a common exposure in this cohort; the difference between the two groups was whether osteoporosis occurred, which may be related to smoking-induced genetic alterations. Genetic biomarkers of SRPO should therefore differentiate SRPO patients from non-osteoporotic postmenopausal female smokers. We used WGCNA to identify the gene modules that were highly correlated with the clinical trait (osteoporosis). After constructing the weighted gene co-expression network, the module eigengene (ME) was used to evaluate the relationship of each gene module with the clinical trait. The Pearson correlation between the ME and the clinical trait was calculated to identify the module most highly correlated with osteoporosis. A t-test was used to assess the significance of the Pearson correlation, and a module with a P-value less than 0.05 was considered significantly correlated with SRPO. In addition, the module significance (MS) of each module was calculated, and the module with the highest absolute MS value was considered significantly correlated with SRPO. As a result, the blue module (module–trait relationship = 0.88, P-value = 7e-07) was found to have the highest association with osteoporosis, and it also had the highest MS value among all of the selected modules.
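As a rough illustration of the significance test described above, the sketch below computes a Pearson correlation and converts it to a Student t statistic with n - 2 degrees of freedom, which is the usual way a correlation coefficient is tested. The function names, the toy eigengene values, and the 0/1 coding of the trait are assumptions made for illustration; the response does not say which implementation was used. The p-value lookup is left out because it needs a t-distribution CDF from a statistics library.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t_statistic(r, n):
    """t statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        # A perfect correlation has an unbounded t statistic.
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical module eigengene values for six samples and a 0/1 trait
# (1 = low BMD, 0 = high BMD); these numbers are not from the study.
eigengene = [0.31, 0.25, 0.28, -0.22, -0.35, -0.27]
trait = [1, 1, 1, 0, 0, 0]

r = pearson_r(eigengene, trait)
t = correlation_t_statistic(r, len(trait))
print(f"r = {r:.3f}, t = {t:.2f}, df = {len(trait) - 2}")
```

With the reported correlation of 0.88 and assuming n = 20 samples (the 20 smokers), the same formula gives a t statistic of roughly 7.86 on 18 degrees of freedom.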
Based on these results, we chose the blue module as the research object.

2. Response: We thank the reviewer for the suggestion and have described the calculation method in detail in “Methods and materials”. The WGCNA algorithm uses the module eigengene (ME) to evaluate the relationship of a gene module with a clinical trait. The ME is defined as the first principal component of a module’s expression matrix, computed by principal component analysis, which summarizes the expression of the genes in that module as a single characteristic expression profile. The Pearson correlation between the ME and the clinical trait (osteoporosis) was calculated to identify the module most highly correlated with SRPO. A t-test was used to assess the significance of the Pearson correlation, and a module with a P-value less than 0.05 was considered significantly correlated with SRPO. We apologize for confusing eigengene with eigenvalue and have corrected this according to the reviewer’s comments. In the module–trait relationships heatmap, the correlation shown is between the module eigengene and the clinical trait.

3. Response: Following the reviewer’s suggestion, we rewrote the “Machine learning for feature selection” subsection of “Methods and materials” and expanded on why we used the recursive feature elimination approach for feature gene selection. In this study, we used support vector machine-recursive feature elimination (SVM-RFE) and the random forest (RF) algorithm to select feature genes from the hub genes correlated with SRPO. Recursive feature elimination iteratively removes the feature with the smallest weight and adds it to a feature ranking until all features have been removed. In each iteration, the current SVM-RFE model was evaluated by k-fold cross-validation. Finally, the classifier with the highest accuracy was constructed and the best variables were identified.
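The elimination loop described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: a small hand-written linear soft-margin SVM trained by full-batch subgradient descent stands in for whatever SVM implementation was actually used, the cross-validation step that picks the best subset size is omitted, and the gene names and expression values are hypothetical.

```python
def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Minimise 0.5*||w||^2 + C * sum(hinge loss) by subgradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5*||w||^2 term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # sample lies inside the soft margin
                for j, xj in enumerate(xi):
                    grad_w[j] -= C * yi * xj
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, feature_names, C=1.0):
    """Rank features by repeatedly dropping the one with the smallest w^2."""
    remaining = list(range(len(feature_names)))
    eliminated = []
    while remaining:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y, C=C)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    # The feature eliminated last is ranked first (most important).
    return [feature_names[j] for j in reversed(eliminated)]


# Hypothetical toy data: gene_a separates the classes, gene_b is weakly
# informative, gene_c carries no signal. Labels are +1 / -1.
names = ["gene_a", "gene_b", "gene_c"]
X = [[2.0, 0.3, 0.0], [2.0, -0.1, 0.0], [-2.0, 0.1, 0.0], [-2.0, -0.3, 0.0]]
y = [1, 1, -1, -1]

ranking = svm_rfe(X, y, names)
print(ranking)
```

In a fuller version, each candidate subset along the ranking would be scored by k-fold cross-validation and the subset with the highest accuracy kept, as the response describes.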
The support vector machine combined with the recursive feature elimination algorithm has shown promising power in the analysis of genomics, metabolomics, and proteomics data. For instance, in a previous study, SVM-RFE helped identify biomarkers correlated with breast cancer prognosis (Li J, et al. Tumor Characterization in Breast Cancer Identifies Immune-Relevant Gene Signatures Associated with Prognosis. Front. Genet. 10:1119.). In addition, the main advantage of microarray chips is that they can examine the expression of thousands of genes simultaneously and comprehensively. For example, with gene microarrays, genes that may be affected by a disease can be found in a short time and serve as biomarkers for early diagnosis. Therefore, we believe that the SVM-RFE machine learning algorithm is a good option for feature gene selection and that it can be used to identify reliable biomarkers for SRPO.

4. Response: We apologize for not describing in detail how feature genes were identified using the machine learning algorithms. We have rewritten this part of the Results according to the reviewer’s suggestion. First, an SVM-RFE classifier (Core: svmliner; Cross: 10-fold cross-validation; soft margin; tuning parameter C = 1) was established based on the 113 hub genes. The data from the control group and the SRPO group were randomly divided into 10 equal portions (training set: 9; test set: 1). In each of the ten iterations, the SVM-RFE algorithm was run on the training set to train the classifier with the selected features, and the trained classifier was then applied to the test set for prediction. The predictions from all 10 iterations were combined to evaluate the accuracy of the classifier. With the SVM-RFE algorithm, we identified a set of 8 feature genes (Fig. 6A). In our SVM-RFE model, the tuning parameter C (also known as Cost) was left at its default (C = 1).
We used a soft margin in our SVM-RFE because a soft-margin SVM can choose a decision boundary with non-zero training error even when the dataset is linearly separable, and is therefore less likely to overfit. In addition, during feature selection we tried several types of SVM models, including linear, radial, and sigmoid. The SVM-RFE classifier (Core: svmliner; Cross: 10-fold cross-validation) with these 8 feature genes achieved the highest prediction accuracy and was the one selected by the SVM-RFE algorithm. The prediction results based on the selected SVM-RFE classifier have been provided as a supplementary file.

5. Response: The comments were very helpful for revising and improving our paper, and we have made corrections accordingly. We rewrote the “Data validation” section and renamed it “Evaluation of the diagnostic efficiency”. After integrating the feature selection results from SVM-RFE and RF, we obtained six feature genes highly correlated with SRPO. The ability of the feature genes to differentiate between SRPO patients and non-osteoporotic postmenopausal smokers was evaluated by gene expression and ROC curve analyses. The predictive efficiency was measured in the control group (ten samples from postmenopausal smokers with high BMD) and the SRPO group (ten samples from postmenopausal smokers with low BMD). As smoking was the common exposure in both groups, the occurrence of SRPO may be highly related to smoking-induced genetic alterations. Biomarkers of SRPO should distinguish SRPO patients from non-osteoporotic postmenopausal female smokers precisely. As a result, the expression of the selected feature genes differed significantly between the two groups. Moreover, each of these feature genes reached high diagnostic value, with an AUC > 0.9. Furthermore, we acknowledged certain limitations of this study in the Discussion.
There is currently little research on genomic variation in SRPO. To the best of our knowledge, this study is the first to investigate potential genetic biomarkers of SRPO. However, functional validation and external validation of these biomarkers are lacking in this study, and its reliability still needs to be verified by further experiments.

6. Response: We apologize for not explaining the purpose of the ROC analysis of the individual feature genes. We rewrote the “Data validation” section, renamed it “Evaluation of the diagnostic efficiency”, and explained in that section why we performed ROC analysis for the feature genes. To determine whether each feature gene influences SRPO diagnosis independently, we performed receiver operating characteristic (ROC) curve analysis on the expression of each selected feature gene. As a result, each of the six feature genes reached high diagnostic value, with an AUC > 0.9. Moreover, the shape of the ROC curves indicated high sensitivity and specificity of these feature genes in SRPO diagnosis. Generating an ROC analysis for the combined expression of all the feature genes might be a good idea, but the ROC results for the individual feature genes have already shown that each can influence SRPO diagnosis independently.

7. Response: We are sorry that the quality of our English writing troubled the reviewers. We have submitted our manuscript to TOPEDIT (https://www.topeditsci.com/) for polishing, as suggested by the reviewers. The correction reads as follows: Eight highly conserved modules were detected in the WGCNA network, and the genes in the module that was strongly correlated with SRPO were used for constructing the PPI network.

8. Response: We apologize for the language problems in the original manuscript. 
We have submitted our manuscript to TOPEDIT (https://www.topeditsci.com/) for polishing, as suggested by the reviewers.

Reviewer #2:

Major

1. Response: Following your suggestion, we rewrote the “Correlation between gene modules and SRPO” subsection of “Methods and materials”. In the revised paper, we aimed to investigate potential genetic biomarkers and the underlying molecular mechanism of smoking-related postmenopausal osteoporosis. The difference between the SRPO and control groups was whether osteoporosis occurred, which may be related to smoking-induced genetic alterations. Genetic biomarkers of SRPO should differentiate SRPO patients from non-osteoporotic postmenopausal female smokers. Thus, we used WGCNA to identify the gene modules that were highly correlated with the clinical trait (osteoporosis). In the WGCNA algorithm, the number of statistical tests is greatly reduced by first clustering the genes into modules and then performing the statistical analysis on the modules. The association between thousands of genes and a phenotype is thereby transformed into the association between several gene sets and that phenotype, so as to avoid the problem of multiple hypothesis testing and correction. This approach strengthens strong correlations and weakens weak or negative correlations, making the correlation values more consistent with the characteristics of a scale-free network and more biologically meaningful. Taken together, proper multiple-testing correction is applied within the WGCNA algorithm. As a result, the blue module (module–trait relationship = 0.88, P-value = 7e-07) was found to have the highest association with osteoporosis, and it also had the highest MS value among all of the selected modules. Based on these results, we chose the blue module as the research object.

2. Response: We are very grateful to the reviewer for this helpful suggestion. 
To investigate the biological interactions among the genes in the modules selected by WGCNA, we performed a protein-protein interaction (PPI) network analysis for these genes using the STRING database. After removing disconnected nodes, the constructed PPI network contained 998 nodes and 10940 edges. We then used the CytoNCA plugin for Cytoscape to perform a topological analysis of the PPI network. We set twice the median degree centrality (DC) value as the threshold to screen significant nodes in the PPI network and generate a subnetwork. Nodes in the subnetwork whose betweenness centrality (BC) and closeness centrality (CC) values were both greater than the median were then identified as a core network containing the hub genes. The resulting core network contained 113 hub genes (nodes) and 1831 edges. The identified hub genes were subjected to enrichment analysis to investigate their biological meaning.

3. Response: We have made corrections according to the reviewer’s comments. 1) In “Function and pathway enrichment analysis”, a Benjamini–Hochberg adjusted P-value < 0.05 was considered to indicate significantly enriched GO terms and KEGG pathways. 2) In Fig. 5, the “Gene Ratio” is used to display statistical significance. 3) The function and pathway enrichment analysis indicated that the hub genes were mainly concentrated in the regulation of RNA transcription and translation, regulation of the cell cycle, ATPase activity, the HIF-1 signaling pathway, and the NF-kappa B signaling pathway. These GO terms and KEGG pathways appear to be closely associated with the development of SRPO. We expanded on the biological meaning of these terms and their association with SRPO in the Discussion.

4. Response: Following the reviewer’s suggestion, we expanded, in “Methods and materials”, on why we chose the SVM-RFE and random forest algorithms for feature gene selection and how these two machine learning algorithms were run. 
In this study, feature genes associated with SRPO were selected using SVM-RFE and RF. SVM-RFE is an efficient feature selection algorithm that has shown promising power in the analysis of genomics, metabolomics, and proteomics data, among others. For instance, in a previous study, SVM-RFE helped identify biomarkers correlated with breast cancer prognosis (Li J, et al. Tumor Characterization in Breast Cancer Identifies Immune-Relevant Gene Signatures Associated with Prognosis. Front. Genet. 10:1119.). In operation, recursive feature elimination iteratively removes the feature with the smallest weight and adds it to a feature ranking until all features have been removed. In each iteration, the current SVM-RFE model was evaluated by k-fold cross-validation. Finally, the classifier with the highest accuracy was constructed and the best variables were identified. RF has also been widely used for detecting biomarkers in many diseases. The RF algorithm uses the variables to construct numerous decision trees and generates the most accurate classes of variables for the individual trees. Following other similar studies, we used the SVM-RFE and RF machine learning algorithms to select feature genes of SRPO. Moreover, we adopted a combination strategy, incorporating genes from the two machine learning algorithms, to minimize the possibility of losing important features. Overlapping the feature genes selected by different machine learning algorithms to identify the most important ones is a very common approach in previous studies. SVM-RFE and RF screen characteristic variables on the basis of different algorithmic theories, so the numbers of feature genes they selected differed. We believe that the feature genes selected by both machine learning algorithms are important genes closely related to SRPO. 
In the subsequent analysis, we found that the six overlapping feature genes showed high diagnostic efficiency for SRPO, which supports the reliability of our research methods. We hope the above explanation addresses the reviewer’s comment.

5. Response: 1) We are very grateful to the reviewer for the helpful comment and have made corrections according to the reviewer’s suggestion. In this study, we aimed to investigate potential genetic biomarkers and the underlying molecular mechanism of smoking-related postmenopausal osteoporosis. We collected gene expression data from 20 postmenopausal female smokers (10 with high BMD and 10 with low BMD). Smoking was the common exposure in this cohort; the difference between the two groups was whether osteoporosis occurred, which may be related to smoking-induced genetic alterations. Genetic biomarkers of SRPO should differentiate SRPO patients from non-osteoporotic postmenopausal female smokers. We therefore selected 10 samples from postmenopausal female smokers with high BMD (non-osteoporosis) as the control group and 10 samples from postmenopausal female smokers with low BMD (osteoporosis) as the SRPO group. 2) Following the reviewer’s suggestion, we performed multiple-testing correction for the gene expression analysis. In the gene expression analysis of the selected feature genes, a t-test with Benjamini–Hochberg multiple-testing correction was used to test for significant differences between the two groups. As a result, the expression of the six feature genes, HNRNPC (p=0.0031), PFDN2 (p=0.0013), PSMC5 (p=0.0371), RPS16 (p=0.0202), TCEB2 (p=0.0073), and UBE2V2 (p=4.07e-05), differed significantly between the two groups. 3) Following the reviewer’s suggestion, we rephrased the description and discussion of the functional significance of the implicated genes in the Discussion. 
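For readers comparing the two corrections mentioned in this exchange (Bonferroni in the reviewer's comments, Benjamini–Hochberg in the response), the following sketch shows how each adjusts a list of raw p-values. The function names and the example p-values are hypothetical and are not the study's data; the response does not state which software performed the adjustment.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four per-gene t-tests.
raw = [0.01, 0.04, 0.03, 0.005]
print("Bonferroni:", bonferroni(raw))
print("Benjamini-Hochberg:", benjamini_hochberg(raw))
```

Bonferroni controls the family-wise error rate and is the stricter of the two; Benjamini–Hochberg controls the false discovery rate, so its adjusted values are never larger than the Bonferroni ones for the same input.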
Through extensive reading of relevant literatures, we explored the biological association between these feature genes and smoking-related postmenopausal osteoporosis, hoping to discover the potential mechanism of these genes in the pathogenesis of SRPO.Minor1.Response: We are very grateful to the reviewer for the comment and we have made corrections to this section. The accurate references were cited in the discussion to link the implicated genes/pathways to SRPO.2.Response: We apologize for the language problems in the original manuscript. We have submitted our manuscript to TOPEDIT (https://www.topeditsci.com/) for polishing as suggested by the reviewers.3.Response: We are sorry that our English writing level has troubled the reviewers. We have submitted our manuscript to TOPEDIT (https://www.topeditsci.com/) for polishing as suggested by the reviewers.4.Response: We are very grateful to the reviewer for this helpful suggestion. We have provided the original gene expression matrix, module information, PPI network, results of GO and KEGG enrichment analysis, and results of machine learning as supplementary files, based on the reviewers' recommendations.Submitted filename: Response to Reviewers.docxClick here for additional data file.23 Aug 2021PONE-D-21-08513R1Analysis of potential genetic biomarkers and molecular mechanism of smoking-related postmenopausal osteoporosis using weighted gene co-expression network analysis and machine learningPLOS ONEDear Dr. Wang,Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.==============================ACADEMIC EDITOR:The revised manuscript is significantly improved. Reviewer 2's final comments regarding structure/organization should be addressed. 
The current version of the manuscript presents a highly fragmented narrative, and would benefit from stylistic edits (as suggested by Reviewer 2) to make it more readable.

==============================

Please submit your revised manuscript by Oct 07 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Jishnu Das, Ph.D.
Academic Editor
PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments (if provided):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed
Reviewer #2: (No Response)

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: No

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have adequately addressed my comments. In particular, I was glad to see that in their Discussion the authors have moderated the significance of their results with the limitations inherent in this initial study.

Reviewer #2: The manuscript has been considerably improved. The authors addressed all technical comments well; the results are now scientifically sound and worth to be published. One last improvement needed is the writing / organization of the paper.
The results section reads too thin to deliver the message fully. The major issue appears to be lacking of relevant discussion/interpretation following specific results. For example, result #2 "Construction of the PPI network" appears insufficient to be an independent section (it's more like a paragraph of Method). An easy fix could be combining it with result #3. Similarly, #5 "Diagnostic efficiency of feature genes" is also too thin - further interpretation is expected, possibly merge with result #4. On the other hand, the authors did a decent amount of work on explaining the specific nominated genes in the Discussion section. I'd suggest move these discussions to the Result section to alleviate the scarcity and better deliver the scientific implication of the analyses.

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

26 Aug 2021

Replies to the reviewers’ comments:

Reviewer #1: The authors have adequately addressed my comments. In particular, I was glad to see that in their Discussion the authors have moderated the significance of their results with the limitations inherent in this initial study.

Response: We are grateful to Reviewer #1 for his/her effort reviewing our manuscript and his/her positive feedback. The summary of our work as written by this reviewer is precise. Thanks again for your support of our manuscript.

Reviewer #2: The manuscript has been considerably improved. The authors addressed all technical comments well; the results are now scientifically sound and worth to be published. One last improvement needed is the writing / organization of the paper. The results section reads too thin to deliver the message fully. The major issue appears to be lacking of relevant discussion/interpretation following specific results. For example, result #2 "Construction of the PPI network" appears insufficient to be an independent section (it's more like a paragraph of Method). An easy fix could be combining it with result #3. Similarly, #5 "Diagnostic efficiency of feature genes" is also too thin - further interpretation is expected, possibly merge with result #4. On the other hand, the authors did a decent amount of work on explaining the specific nominated genes in the Discussion section. I'd suggest move these discussions to the Result section to alleviate the scarcity and better deliver the scientific implication of the analyses.

Response: We appreciate Reviewer #2 for his/her effort to review our manuscript, and his/her positive feedback. The reviewer gives an accurate summary of our work and brings forward constructive questions. We have addressed them below.
According to Reviewers’ suggestion, we combined result #2 with #3. Besides, we agreed with the comments that result #5 "Diagnostic efficiency of feature genes" is thin, and we have moved the explanation of the feature genes in the Discussion to result #5 according to the suggestion. We also extended the relationship of the identified feature genes in our study and smoking-related postmenopausal osteoporosis in the Discussion section. We sincerely hope our responses can address Reviewers’ comments.

Submitted filename: Response to Reviewers.docx

31 Aug 2021

Analysis of potential genetic biomarkers and molecular mechanism of smoking-related postmenopausal osteoporosis using weighted gene co-expression network analysis and machine learning

PONE-D-21-08513R2

Dear Dr. Wang,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance.
Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Jishnu Das, Ph.D.
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

15 Sep 2021

PONE-D-21-08513R2

Analysis of potential genetic biomarkers and molecular mechanism of smoking-related postmenopausal osteoporosis using weighted gene co-expression network analysis and machine learning

Dear Dr. Wang:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Jishnu Das
Academic Editor
PLOS ONE