Nguyen Phuoc Long, Seongoh Park, Nguyen Hoang Anh, Jung Eun Min, Sang Jun Yoon, Hyung Min Kim, Tran Diem Nghi, Dong Kyu Lim, Jeong Hill Park, Johan Lim, Sung Won Kwon.
Abstract
Introducing novel biomarkers for accurately detecting and differentiating rheumatoid arthritis (RA) and osteoarthritis (OA) using clinical samples is essential. In the current study, we searched for a novel data-driven gene signature of synovial tissues to differentiate RA from OA patients. Fifty-three RA, 41 OA, and 25 normal microarray-based transcriptome samples were utilized. The area-under-the-curve random forests (AUC-RF) variable importance measurement was applied to identify the most influential differential genes between RA and OA. Five algorithms, including random forests (RF), k-nearest neighbors (kNN), support vector machines (SVM), naïve Bayes, and a tree-based method, were employed for the classification. We found a 16-gene signature that could effectively differentiate RA from OA, comprising TMOD1, POP7, SGCA, KLRD1, ALOX5, RAB22A, ANK3, PTPN3, GZMK, CLU, GZMB, FBXL7, TNFRSF4, IL32, MXRA7, and CD8A. The externally validated accuracy of the RF model was 0.96 (sensitivity = 1.00, specificity = 0.90). Likewise, the accuracies of kNN, SVM, naïve Bayes, and the decision tree were 0.96, 0.96, 0.96, and 0.91, respectively. Functional meta-analysis revealed the differential pathological processes of RA and OA and suggested promising targets for further mechanistic and therapeutic studies. In conclusion, the proposed genetic signature combined with sophisticated classification methods may improve the diagnosis and management of RA patients.
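The reported accuracy, sensitivity, and specificity all derive from the four cells of a confusion matrix. The following is a minimal sketch of that arithmetic, treating RA as the positive class. The counts are hypothetical: they were chosen only because they are consistent with the reported RF figures, and the actual composition of the external test set is not stated in this abstract.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # share of RA samples classified as RA
    specificity = tn / (tn + fp)  # share of OA samples classified as OA
    return accuracy, sensitivity, specificity


# Hypothetical counts, NOT taken from the paper: 13 RA and 10 OA test samples,
# with a single OA sample misclassified as RA.
acc, sens, spec = classification_metrics(tp=13, fn=0, tn=9, fp=1)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
# prints: accuracy=0.96 sensitivity=1.00 specificity=0.90
```

With a perfect sensitivity, every error is a false positive, so the specificity alone determines how far the accuracy falls below 1.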
Keywords: diagnostic biomarker; machine learning; meta-analysis; osteoarthritis; pathway analysis; rheumatoid arthritis
Year: 2019 PMID: 30621359 PMCID: PMC6352223 DOI: 10.3390/jcm8010050
Source DB: PubMed Journal: J Clin Med ISSN: 2077-0383 Impact factor: 4.241
Figure 1. Workflow of variable selection and machine learning classification. (a) Variable selection workflow. (b) Classification analysis workflow. RF: random forests; SVM: support vector machines; kNN: k-nearest neighbors. GSE55457, GSE55584, and GSE55235 are from the same study and were used for modeling and validation of the proposed data-driven biomarkers.
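To illustrate one of the classifiers named in the workflow, here is a minimal k-nearest-neighbors sketch in plain Python. It is not the authors' implementation: the choice of `k = 3`, the Euclidean distance, the helper name `knn_predict`, and the two-feature toy vectors are all assumptions made for illustration, and the values are invented rather than drawn from the GEO data sets.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    `train` is a list of (feature_vector, label) pairs.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy two-gene expression vectors (made-up values, for illustration only).
toy_train = [
    ((2.9, -2.1), "RA"), ((2.5, -1.8), "RA"), ((3.1, -2.4), "RA"),
    ((-0.4, 0.3), "OA"), ((0.1, 0.5), "OA"), ((-0.2, -0.1), "OA"),
]

print(knn_predict(toy_train, (2.7, -2.0)))  # nearest neighbors are all RA
print(knn_predict(toy_train, (0.0, 0.2)))   # nearest neighbors are all OA
```

An odd `k` avoids tied votes in a two-class problem such as RA versus OA.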
The 16-gene signature derived from AUC-RF based variable selection.
| Entry ID | Approved Symbol | Approved Name | Chromosomal Location | Combined Effect Size ¹ |
|---|---|---|---|---|
| 7111 | TMOD1 | Tropomodulin 1 | 9q22.33 | −2.16 |
| 10248 | POP7 | POP7 homolog, ribonuclease P/MRP subunit | 7q22.1 | −0.87 |
| 6442 | SGCA | Sarcoglycan alpha | 17q21.33 | −2.74 |
| 3824 | KLRD1 | Killer cell lectin like receptor D1 | 12p13 | 1.35 |
| 240 | ALOX5 | Arachidonate 5-lipoxygenase | 10q11.21 | 1.36 |
| 57403 | RAB22A | RAB22A, member RAS oncogene family | 20q13.32 | −1.20 |
| 288 | ANK3 | Ankyrin 3 | 10q21.2 | −1.92 |
| 5774 | PTPN3 | Protein tyrosine phosphatase, non-receptor type 3 | 9q31 | −1.31 |
| 3003 | GZMK | Granzyme K | 5q11.2 | 2.92 |
| 1191 | CLU | Clusterin | 8p21.1 | −2.17 |
| 3002 | GZMB | Granzyme B | 14q12 | 2.61 |
| 23194 | FBXL7 | F-box and leucine rich repeat protein 7 | 5p15.1 | −0.93 |
| 7293 | TNFRSF4 | TNF receptor superfamily member 4 | 1p36.33 | 1.26 |
| 9235 | IL32 | Interleukin 32 | 16p13.3 | 2.73 |
| 439921 | MXRA7 | Matrix remodeling associated 7 | 17q25.1 | −2.19 |
| 925 | CD8A | CD8a molecule | 2p11.2 | 2.83 |
1: Adopted from the functional meta-analysis section, which was independent of the biomarker selection.
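As a quick consistency check, the signs of the combined effect sizes can be tallied to see how many signature genes are higher or lower in RA. The sketch below keys the effect sizes from the table by the gene symbols in the order the abstract lists them. It assumes that a positive effect size means higher expression in RA relative to OA, which the table does not state explicitly; under that assumption the tally agrees with the seven-up, nine-down split described in the Figure 2 caption.

```python
# Combined effect sizes from the signature table, keyed by gene symbol
# (symbols follow the order given in the abstract).
effect_sizes = {
    "TMOD1": -2.16, "POP7": -0.87, "SGCA": -2.74, "KLRD1": 1.35,
    "ALOX5": 1.36, "RAB22A": -1.20, "ANK3": -1.92, "PTPN3": -1.31,
    "GZMK": 2.92, "CLU": -2.17, "GZMB": 2.61, "FBXL7": -0.93,
    "TNFRSF4": 1.26, "IL32": 2.73, "MXRA7": -2.19, "CD8A": 2.83,
}

# Assumption: positive effect size = higher in RA than in OA.
up = sorted(gene for gene, es in effect_sizes.items() if es > 0)
down = sorted(gene for gene, es in effect_sizes.items() if es < 0)

print(len(up), "higher in RA:", ", ".join(up))
print(len(down), "lower in RA:", ", ".join(down))
```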
Figure 2. PCA and heatmap analysis of the data set with the 16-gene signature. (a) The first two principal components together account for 62.3% of the variance. (b) Seven genes are upregulated and nine genes are downregulated in RA compared to OA. PCA: principal component analysis; RA: rheumatoid arthritis; OA: osteoarthritis.
Figure 3. Properties of the random forests model. (a) The ROC curve, accuracy, sensitivity, and specificity of the optimal random forests model on the external test set. (b) The top 10 most important features of the optimal random forests model on the training set.
Figure 4. The use of LIME to explain the model's predictions. (a) Correct classification of an OA sample. (b) Correct classification of an RA sample. (c) Incorrect classification of an OA sample as RA. RA: rheumatoid arthritis; OA: osteoarthritis.
Figure 5. Summary of the classification analyses. (a) Confusion matrices of the five classifiers on the test set of RA versus OA. (b) Prediction performance on the test sets of RA versus normal and OA versus normal. RA: rheumatoid arthritis; OA: osteoarthritis.
Representative Gene Ontology (GO) biological processes and Kyoto Encyclopedia of Genes and Genomes (KEGG) enriched pathways of differentially expressed (DE) genes between RA and OA.
| ID | Annotation | False Discovery Rate | RA versus OA |
|---|---|---|---|
| GO.0006955 | Immune response | 8.86 × 10⁻⁵⁰ | Upregulation |
| GO.0050776 | Regulation of immune response | 1.44 × 10⁻⁴¹ | Upregulation |
| GO.0002376 | Immune system process | 2.64 × 10⁻⁴¹ | Upregulation |
| GO.0006952 | Defense response | 6.94 × 10⁻³⁹ | Upregulation |
| GO.0002684 | Positive regulation of immune system process | 4.52 × 10⁻³⁶ | Upregulation |
| GO.0048731 | System development | 8.90 × 10⁻¹² | Downregulation |
| GO.0007275 | Multicellular organismal development | 3.68 × 10⁻¹¹ | Downregulation |
| GO.0044767 | Single-organism developmental process | 2.74 × 10⁻⁹ | Downregulation |
| GO.0048856 | Anatomical structure development | 1.29 × 10⁻⁸ | Downregulation |
| GO.0051239 | Regulation of multicellular organismal process | 5.14 × 10⁻⁸ | Downregulation |
| KEGG.4650 | Natural killer cell mediated cytotoxicity | 1.27 × 10⁻¹⁶ | Upregulation |
| KEGG.5340 | Primary immunodeficiency | 1.27 × 10⁻¹⁶ | Upregulation |
| KEGG.4060 | Cytokine-cytokine receptor interaction | 3.14 × 10⁻¹⁶ | Upregulation |
| KEGG.4064 | NF-kappa B signaling pathway | 1.26 × 10⁻¹⁴ | Upregulation |
| KEGG.4062 | Chemokine signaling pathway | 1.33 × 10⁻¹¹ | Upregulation |
| KEGG.561 | Glycerolipid metabolism | 0.0142 | Downregulation |
| KEGG.5202 | Transcriptional misregulation in cancer | 0.0276 | Downregulation |