Jiangang Cao1,2, Han Ding1,2, Jun Shang1,2, Lei Ma1,2, Qi Wang1,2, Shiqing Feng1,2. 1. Department of Orthopedics, Tianjin Medical University General Hospital, Tianjin, China. 2. International Science and Technology Cooperation Base of Spinal Cord Injury, Tianjin Key Laboratory of Spine and Spinal Cord Injury, Department of Orthopedics, Tianjin Medical University General Hospital, Tianjin, China.
Abstract
BACKGROUND: The incidence of osteoarthritis (OA), a chronic degenerative disease, is increasing every year. There is no effective clinical treatment for OA and the pathological mechanism remains unclear. Early diagnosis is an effective strategy to control the progress of OA. In this study, we aimed to identify potential early diagnostic biomarkers. METHODS: We downloaded the gene expression profile dataset, GSE51588 and GSE55235, from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) public database. The differentially expressed genes (DEGs) were screened out using the "limma" R package. Weighted gene co-expression network analysis (WGCNA) was utilized to build the co-expression network between the normal and OA samples. A Venn diagram was constructed to detect the hub genes. Potential molecular mechanisms and signaling pathways were enriched by gene set variation analysis (GSVA). Single sample gene set enrichment analysis (ssGSEA) was used to identify the immune infiltration of OA. RESULTS: We screened out three hub genes based on WGCNA and DEGs in this study. GSVA results showed that nuclear factor interleukin-3 (NFIL3) was related to tumor necrosis factor alpha (TNF-α) signaling via nuclear factor kappa-B (NF-κB), the reactive oxygen species pathway, and myelocytomatosis (MYC) targets v2. Highly-expressed ADM (adrenomedullin) pathways included TNF-α signaling via NF-κB, the reactive oxygen species pathway, and ultraviolet (UV) response up. OGN (osteoglycin)-enriched pathways included epithelial mesenchymal transition, coagulation, and peroxisome. CONCLUSIONS: We identified three hub genes (NFIL3, ADM, and OGN) that were correlated to the development and progression of OA, which may provide new biomarkers for early diagnosis. 2021 Annals of Translational Medicine. All rights reserved.
BACKGROUND: The incidence of osteoarthritis (OA), a chronic degenerative disease, is increasing every year. There is no effective clinical treatment for OA and the pathological mechanism remains unclear. Early diagnosis is an effective strategy to control the progress of OA. In this study, we aimed to identify potential early diagnostic biomarkers. METHODS: We downloaded the gene expression profile dataset, GSE51588 and GSE55235, from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) public database. The differentially expressed genes (DEGs) were screened out using the "limma" R package. Weighted gene co-expression network analysis (WGCNA) was utilized to build the co-expression network between the normal and OA samples. A Venn diagram was constructed to detect the hub genes. Potential molecular mechanisms and signaling pathways were enriched by gene set variation analysis (GSVA). Single sample gene set enrichment analysis (ssGSEA) was used to identify the immune infiltration of OA. RESULTS: We screened out three hub genes based on WGCNA and DEGs in this study. GSVA results showed that nuclear factor interleukin-3 (NFIL3) was related to tumor necrosis factor alpha (TNF-α) signaling via nuclear factor kappa-B (NF-κB), the reactive oxygen species pathway, and myelocytomatosis (MYC) targets v2. Highly-expressed ADM (adrenomedullin) pathways included TNF-α signaling via NF-κB, the reactive oxygen species pathway, and ultraviolet (UV) response up. OGN (osteoglycin)-enriched pathways included epithelial mesenchymal transition, coagulation, and peroxisome. CONCLUSIONS: We identified three hub genes (NFIL3, ADM, and OGN) that were correlated to the development and progression of OA, which may provide new biomarkers for early diagnosis. 2021 Annals of Translational Medicine. All rights reserved.
Osteoarthritis (OA) is a degenerative disease that is caused by numerous factors, including aging, obesity, strain, trauma, congenital abnormality, joint deformity, and so on (1,2). OA is common in middle-aged and elderly people, especially in weight-bearing joints (knee and hip). With the aging of the population, the incidence of OA is increasing every year, which becomes a leading cause of disability and confers a heavy burden on the both patient’s family and society (3). OA is recognized as having a complicated pathological process and a long course of disease. At present, diagnosis based on clinical manifestations is the standard to confirm OA, while early diagnosis and treatment intervention remain challenging (4,5). Some studies have shown that biomarkers provide useful diagnostic information by detecting cartilage degradation in OA, reflecting the biological activities related to the disease, and predicting the process of disease progression. Therefore, exploring diagnostic biomarkers of OA may have important clinical application value (6,7).In recent years, an increasing number of studies have shown that immune cell infiltration plays an important role in the development of OA (8,9). Pro-inflammatory cytokines secreted by infiltrated immune cells can aggravate chondrocyte apoptosis and cartilage matrix proteolysis (10,11). However, the pathological mechanism of immune cell infiltration in the process of OA is still unclear. Although exploring the pathological mechanism of osteoarthritis can provide potential therapeutic targets for treatment (12,13), it is more significant to explore the biomarkers for early diagnosis (14). Therefore, evaluating the infiltration of immune cells and determining the different composition of infiltrating immune cells might aid in the development of new diagnostic biomarkers and immunotherapy targets.With the evolution of high-throughput sequencing technology, microarray has been used in a large number of OA studies (15-17). Weighted gene co-expression network analysis (WGCNA) involves the construction of a network based on systematic gene expression levels and analyzes gene expression microarray profiling data, with the aim of determining the co-expression relationship between genes. Thus, genes with similar expression patterns may be co-regulated, functionally related, or in the same pathway (18,19). In this study, we used microarray data from OA patients and normal control subjects to analyze differentially expressed genes (DEGs) and immune cell infiltration by WGCNA, in order to identify effective biomarkers for the early diagnosis in order to control the progress of OA. We present the following article in accordance with the MDAR reporting checklist (available at https://dx.doi.org/10.21037/atm-21-4566).
Methods
Microarray data sources and processing
The series matrix file, GSE51588, was downloaded from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) public database (20). There were 40 groups of human OA data (including 20 LT groups and 20 MT groups) and 10 groups of human non-OA data (including five LT groups and five MT groups) for the WGCNA co-expression network construction. The series matrix file data file, GSE55235, was also downloaded from the NCBI GEO public database (21). There were data from 20 transcriptional groups, including human non-OA groups (n=10) and human OA groups (n=10), which were used for subsequent model validation. This study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).
DEG identification and WGCNA
We screened out the DEGs using the “limma” R package (https://bioconductor.org/packages/release/bioc/html/limma.html) with P<0.05 & |log2 fold change (FC)| >2 (22). By building a weighted gene co-expression network, we could find the gene modules of co-expression, and explore the relationship between the gene network and phenotype, as well as the core genes in the network. We used the R package ‘WGCNA’ (http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA) to build the co-expression network of all genes in the GSE51588 dataset (19). The first 5,000 genes were screened by this algorithm for further analysis; the soft threshold was set to 12, and the soft threshold of GSE55235 was set to 4. The weighted adjacency matrix was converted into the topological overlap matrix (TOM) in order to estimate the network connectivity, and a hierarchical clustering method was used to generate a cluster tree structure of the TOM matrix (23). Different branches of the cluster tree represented the different gene modules, and different colors represented the different modules. In view of the weighted correlation coefficient of genes, the genes were classified based on the expression patterns. Genes showing similar patterns were divided into one same module, and thousands of genes were divided into multiple modules via the gene expression patterns. Finally, the overlapping hub module genes from GSE51588 and GSE55235 were identified by the Venn method.
Immune gene correlation analysis
The effect of genes on immune infiltration was evaluated. The level of immune cell infiltration in each sample was quantified by single sample gene set enrichment analysis (ssGSEA) (24). Spearman correlation analysis was performed on gene expression and immune cell content.
Gene set variation analysis (GSVA) analysis
GSVA is a nonparametric and unsupervised method for assessing transcriptome gene set enrichment (25). GSVA changes the gene level into the pathway level by comprehensively scoring the gene set of interest, and then evaluates the biological function of the sample. In this study, to evaluate the potential biological function changes of different samples, we downloaded gene sets from the Molecular signatures database (v7.4) (http://www.gsea-msigdb.org/gsea/msigdb/), and used the GSVA algorithm to comprehensively score each gene set.
GeneMANIA analysis
GeneMANIA (http://www.genemania.org) is a flexible and user-friendly protein-protein interaction (PPI) network construction database, which is used to visualize the functional network between genes and analyze gene functions and interactions (26). The website can set the data sources of gene nodes, and has a variety of bioinformatics analysis methods, such as physical interaction, gene co-expression, gene co-location, gene enrichment analysis, and website prediction. In this study, geneMANIA generated the core gene network to explore its possible mechanism in patients with OA.
Statistical analysis
All statistical analysis was conducted with R language (version 3.6.3) (https://cran.r-project.org/bin/windows/base/old/3.6.3/). Statistical analysis was performed using the two-tailed Student’s t-test. P<0.05 was considered to indicate a statistically significant difference.
Results
Important DEGs in GSE51588 and WGCNA
The GSE51588 disease-related dataset was downloaded from the NCBI GEO public database to identify DEGs between the normal (n=10) and disease (n=40) groups. We further calculated the DEGs between the two groups using the “limma” R package. DEGs were screened out on the basis of the threshold of P<0.05 & |log2 fold change (FC)| >2. A total of 95 DEGs were screened out, including 24 up-regulated genes and 71 down-regulated genes (). According to the patients’ clinical characteristics, the WGCNA network was further constructed to detect the key modules in OA. The soft threshold β was determined by the “sft$powerEstimate" function and set to 12 (). Based on the TOM, 12 gene modules were detected in this analysis, which were blue [1,166], cyan [186], grey [621], light cyan [692], light green [76], light yellow [59], magenta [262], midnight blue [149], pink [508], purple [553], red [283], and turquoise [445] (). Through module-trait correlations analysis, we found that the pink module had the highest correlation with clinical features (cor =0.69, P=2e-08), and thus, the pink module was selected for subsequent verification ().
Figure 1
Identification of DEGs in the GSE51588 dataset and construction of the WGCNA. (A) Volcano plot of DEGs in GSE51588; (B) clustering dendrogram of 50 samples; (C,D) soft-threshold power for WGCNA; (E,F) dendrogram of DEGs clustered based on a dissimilarity measure (1-TOM); (G) Heatmap of correlation between the ME and gene modules; (H) A scatterplot of gene significance vs. MM in the pink module. DEGs, differentially expressed genes; WGCNA, weighted gene co-expression network analysis; ME, module eigengenes; MM, module membership.
Identification of DEGs in the GSE51588 dataset and construction of the WGCNA. (A) Volcano plot of DEGs in GSE51588; (B) clustering dendrogram of 50 samples; (C,D) soft-threshold power for WGCNA; (E,F) dendrogram of DEGs clustered based on a dissimilarity measure (1-TOM); (G) Heatmap of correlation between the ME and gene modules; (H) A scatterplot of gene significance vs. MM in the pink module. DEGs, differentially expressed genes; WGCNA, weighted gene co-expression network analysis; ME, module eigengenes; MM, module membership.
Important DEGs in GSE55235 and WGCNA
We downloaded the GSE55235 disease-related dataset from the NCBI GEO public database to identify DEGs between OA patients (n=10) and controls (n=10) using the “limma” R package. Based on the threshold of P<0.05 & |log2 FC|>2, 164 DEGs were screened out, including 84 up-regulated genes and 80 down-regulated genes (). Similarly, the WGCNA network was constructed to detect the key modules in OA based on the clinical characteristics of patients. The soft threshold β was determined by “sft$powerEstimate” function and set to 4 (). Based on the TOM, eight gene modules were detected, including black [1,173], brown [776], green [363], green yellow [70], magenta [107], purple [79], red [2039], and yellow [393] (). Furthermore, through correlation analysis of the modules and traits, we found that red module had the highest correlation with clinical features (cor =0.94, P=1e-09), and thus, the red module was selected for subsequent verification ().
Figure 2
Identification of DEGs in the GSE55235 dataset and construction of the WGCNA. (A) Volcano plot of DEGs in GSE55235; (B) clustering dendrogram of 20 samples; (C,D) soft-threshold power for WGCNA; (E,F) dendrogram of DEGs clustered based on a dissimilarity measure (1-TOM); (G) Heatmap of correlation between the ME and gene modules; (H) a scatterplot of gene significance vs. MM in the red module. DEGs, differentially expressed genes; WGCNA, weighted gene co-expression network analysis; ME, module eigengenes; MM, module membership.
Identification of DEGs in the GSE55235 dataset and construction of the WGCNA. (A) Volcano plot of DEGs in GSE55235; (B) clustering dendrogram of 20 samples; (C,D) soft-threshold power for WGCNA; (E,F) dendrogram of DEGs clustered based on a dissimilarity measure (1-TOM); (G) Heatmap of correlation between the ME and gene modules; (H) a scatterplot of gene significance vs. MM in the red module. DEGs, differentially expressed genes; WGCNA, weighted gene co-expression network analysis; ME, module eigengenes; MM, module membership.
Identification of hub genes and diagnostic efficacy verification
A Venn diagram was used to detect the core genes in both datasets. Three hub DEGs were identified, including nuclear factor interleukin-3 (NFIL3), adrenomedullin (ADM), and osteoglycin (OGN) (). Receiver operating characteristic (ROC) curve assessment and the area under curve (AUC) value were utilized to verify the diagnostic efficacy. The results showed that the AUC values of the three core genes were greater than 0.8 (NFIL3: AUC =0.957; ADM: AUC =0.900; OGN: AUC =0.815), indicating that these three hub genes could be effective indicators for clear diagnosis ().
Figure 3
Identification of hub genes and diagnostic efficacy verification. (A) The Venn diagram shows the intersecting genes of GSE51588 and GSE55235; (B-D) ROC curve analysis and AUC statistics were applied to verify diagnostic efficacy of NFIL3, ADM, and OGN separately. ROC, receiver operating characteristic; AUC, area under the curve.
Identification of hub genes and diagnostic efficacy verification. (A) The Venn diagram shows the intersecting genes of GSE51588 and GSE55235; (B-D) ROC curve analysis and AUC statistics were applied to verify diagnostic efficacy of NFIL3, ADM, and OGN separately. ROC, receiver operating characteristic; AUC, area under the curve.
GSVA of core genes and key pathways
Next, we explored the potential molecular mechanism and investigated the specific signaling pathways enriched in the three core genes by GSVA. The GSVA results showed that highly-expressed NFIL3 was related to tumor necrosis factor alpha (TNF-α) signaling via nuclear factor kappa-B (NF-κB), the reactive oxygen species (ROS) pathway, and myelocytomatosis (MYC) targets v2 (). Highly-expressed ADM pathways included TNF-α signaling via NF-κB, the ROS pathway, and ultraviolet (UV) response up (). OGN-enriched pathways included epithelial mesenchymal transition, coagulation, and peroxisome ().
Figure 4
GSVA of core genes and key pathways. (A) NFIL3-enriched pathways; (B) ADM-enriched pathways; (C) OGN-enriched pathways. The blue band represents a positive correlation and the green band represents a negative correlation. GSVA, gene set variation analysis; ADM, adrenomedullin; OGN, osteoglycin.
GSVA of core genes and key pathways. (A) NFIL3-enriched pathways; (B) ADM-enriched pathways; (C) OGN-enriched pathways. The blue band represents a positive correlation and the green band represents a negative correlation. GSVA, gene set variation analysis; ADM, adrenomedullin; OGN, osteoglycin.
Differences in immune infiltration and PPI analysis
The inflammatory microenvironment of OA is mainly composed of immune cells, extracellular matrix, various growth factors, inflammatory factors, and special physical and chemical characteristics, which significantly affect the sensitivity of disease diagnosis and treatment. By analyzing the relationship between core genes and immune infiltration, we investigated the potential mechanism of core genes affecting disease progression. The results showed that there were strong correlations between the three hub genes and immune cells (), which was in line with expectations. The PPI network involved in the three core genes is shown in .
Figure 5
Immune infiltration and PPI analysis. (A) Correlation between immune cells and hub genes. Red, positive; blue, negative; (B) the PPI network of hub genes. PPI, protein-protein interaction.
Immune infiltration and PPI analysis. (A) Correlation between immune cells and hub genes. Red, positive; blue, negative; (B) the PPI network of hub genes. PPI, protein-protein interaction.
Discussion
OA is a common chronic joint disease in the elderly, which is characterized by the degeneration of articular cartilage, thickening of synovial lining, subchondral sclerosis, and formation of osteophyte at the edges of joints (27). There is no effective clinical treatment for OA and the pathological mechanism remains unclear. Therefore, understanding the molecular mechanism and pathological process of OA is critical. A comprehensive investigation of OA, including the pathological process, early clinical diagnosis, clinical manifestations, clinical prevention, and drug treatment, requires systematic analysis. Since normal cartilage samples, as a control group, are limited, microarray data in the GEO database is sparse and the samples are often insufficient. In this study, we obtained microarray data from GSE51588 and GSE55235 to expand sample size and focused on identifying biological markers related to early diagnosis, for the first time.Compared with other bioinformatics analyses, the advantage of WGCNA lies in the comprehensive study of the relationship between co-expression modules and clinical features, which provides highly reliable and biologically significant results (28). We screened out three hub genes based on the WGCNA and DEGs in this study, including NFIL3, ADM, and OGN. Meanwhile, based on functional enrichment via GSVA in this study, it was suggested that TNF-α signaling via NF-κB and the ROS pathway play important roles in the progression of OA, which has been confirmed in other studies. Chen et al. found that spermine can reduce the progression of OA by inhibiting the TNF-α/NF-κB pathway (29). It has also been reported that TNF-α plays a major pathological role in rheumatoid arthritis (RA) via NF-κB and the Janus kinase (JAK)/signal transducer and activator of transcription (STAT) pathway (30). Numerous studies have shown that ROS or oxidative stress regulates the intracellular signal transduction process, chondrocyte aging and apoptosis, synthesis and degradation of extracellular matrix, synovitis, and subchondral bone dysfunction (31-33).Macrophages have been reported playing a crucial role in the progression of OA (34). Especially, M1 macrophages, known as pro-inflammatory phenotype, have an adverse effect on chondrocyte apoptosis which leading to cartilage degeneration (10). Besides, T cell, B cell and NK cell, which belong to innate immune system, are also defined as integral cells involved in the process of OA (35-37). Mesenchymal stem cells (MSCs) have pluripotent mesoderm differentiation potential and can differentiate into a variety of cell types, including bone cells, chondrocytes, muscle cells, and adipocytes (38). Studies have shown that MSCs can exert their ability to promote tissue repair by releasing paracrine factors, mainly a variety of growth factors, immunoregulatory cytokines and other nutrient mediators (39,40). It has been reported that MSCs could reduce chondrocyte apoptosis, inhibit the inflammatory response and provide a suitable microenvironment for cartilage repair (41,42). Besides, extracellular vesicles derived from MSCs also show the crucial effect on attenuating cartilage degeneration (43,44).As for the three DEGs, only one study reported the correlation between NFIL3 and OA, which may be helpful to reveal the similarities and differences in the pathogenesis of femoral head necrosis and cartilage injury in hip OA (45). A series of studies revealed the role of OGN in OA, which suggested that highly-expressed OGN could promote the ossification and reconstruction of articular cartilage (46-48). Other studies suggested that ADM was regulated by TNF-α, and ADM down-regulation could inhibit adipogenesis and synthesis of osteocytes, which may provide new treatment strategies for RA and OA (49-51).Taken together, these three identified hub genes provide important insights into the potential molecular mechanisms of OA, which may reveal new biomarkers for early diagnosis.The article’s supplementary files as
Authors: Muhammad Farooq Rai; Linda J Sandell; Toby N Barrack; Lei Cai; Eric D Tycksen; Simon Y Tang; Matthew J Silva; Robert L Barrack Journal: Cartilage Date: 2018-09-03 Impact factor: 4.634
Authors: B J E de Lange-Brokaar; A Ioan-Facsinay; G J V M van Osch; A-M Zuurmond; J Schoones; R E M Toes; T W J Huizinga; M Kloppenburg Journal: Osteoarthritis Cartilage Date: 2012-09-07 Impact factor: 6.576