Dmitry Rychkov1,2,3, Jessica Neely3, Tomiko Oskotsky1, Steven Yu4,5, Noah Perlmutter4, Joanne Nititham4, Alexander Carvidi4, Melissa Krueger6, Andrew Gross4, Lindsey A Criswell4,7,8,9, Judith F Ashouri4, Marina Sirota1,3.
Abstract
There is an urgent need to identify biomarkers for diagnosis and disease activity monitoring in rheumatoid arthritis (RA). We leveraged publicly available microarray gene expression data in the NCBI GEO database for whole blood (N=1,885) and synovial (N=284) tissues from RA patients and healthy controls. We developed a robust machine learning feature selection pipeline with validation on five independent datasets culminating in 13 genes: TNFAIP6, S100A8, TNFSF10, DRAM1, LY96, QPCT, KYNU, ENTPD1, CLIC1, ATP6V0E1, HSP90AB1, NCL and CIRBP, which define the RA Score and demonstrate its clinical utility: the score tracks the disease activity DAS28 (p = 7e-9), distinguishes osteoarthritis (OA) from RA (OR 0.57, p = 8e-10) and polyJIA from healthy controls (OR 1.15, p = 2e-4) and monitors treatment effect in RA (p = 2e-4). Finally, the immunoblotting analysis of six proteins on an independent cohort confirmed two proteins, TNFAIP6/TSG6 and HSP90AB1/HSP90.
Keywords: biomarker; blood; gene expression; machine learning; rheumatoid arthritis; synovium
Year: 2021 PMID: 34177888 PMCID: PMC8223752 DOI: 10.3389/fimmu.2021.638066
Source DB: PubMed Journal: Front Immunol ISSN: 1664-3224 Impact factor: 8.786
Figure 1. Study overview. (A) Public data collection, processing and DGE analysis. (B) Feature selection pipeline. (C) Gene list validation on the independent datasets. Introducing the RA Score as a geometric mean of validated genes and its association with clinical outcomes.
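The caption describes the RA Score as a geometric mean of the validated genes. A minimal sketch of such a score follows. The up/down gene split is taken from the summary table, but the way the two groups are combined (geometric mean of up-regulated genes minus geometric mean of down-regulated genes) is an assumption made for illustration, as are the function names and the expectation that expression values are strictly positive.

```python
import math

# Regulation direction as listed in the summary table of the 13 genes.
UP_GENES = ["TNFAIP6", "S100A8", "TNFSF10", "DRAM1", "LY96",
            "QPCT", "KYNU", "ENTPD1", "CLIC1", "ATP6V0E1"]
DOWN_GENES = ["HSP90AB1", "NCL", "CIRBP"]


def geometric_mean(values):
    """Geometric mean of strictly positive values, computed in log space."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def ra_score(expression):
    """Hypothetical RA Score for one sample.

    `expression` maps gene symbol -> positive expression value.
    ASSUMPTION: up- and down-regulated genes are combined by subtraction;
    the caption itself only states that a geometric mean is used.
    """
    up = geometric_mean(expression[g] for g in UP_GENES)
    down = geometric_mean(expression[g] for g in DOWN_GENES)
    return up - down
```

With this construction a sample with higher expression of the up-regulated genes and lower expression of the down-regulated genes receives a higher score; a ratio of the two geometric means would be an equally plausible alternative.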
Figure 2. DE genes overlapped between synovium and whole blood tissues. Top Reactome common and different pathways for (A) up-regulated and (B) down-regulated genes. (C) Venn diagram of up- and down-regulated genes in synovium and blood: 29 common up-regulated genes (p = 3e-09) and 4 common down-regulated genes (p = 0.28). (D) Comparison scatter plot of fold changes between common genes in synovium and blood. Heatmap and PCA plots of common genes in (E, F) synovium and (G, H) blood. Vertical bars in the heatmap plots represent the color-coded coefficients of variation, Pearson correlations and log2 fold changes.
Figure 3. Cell type enrichment analysis for synovium and whole blood tissues. 30 cell types were significant (BH adj. p-values < 0.05) in synovium and 20 were significant in whole blood, with 11 common cell types between the tissues. Heatmap plots of significantly enriched cell types in (A) synovium and (B) blood. The scatter plots comparing log10 transformed fold changes of significant cell types between synovium and blood in (C) discovery and (D) validation cohorts. The scatter plots comparing log10 transformed fold changes of significant cell types between discovery and validation cohorts in (E) synovium and (F) blood tissues, with Pearson correlation coefficient and its p-value.
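Significance in this caption is defined by Benjamini-Hochberg (BH) adjusted p-values below 0.05. As an illustration only, the standard BH step-up adjustment can be sketched as below; the function name and the example p-values are hypothetical and are not taken from the study.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four cell types.
raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)
significant = [p < 0.05 for p in adj]
```

Each raw p-value is scaled by m / rank, and the running minimum keeps the adjusted values non-decreasing in the raw p-value.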
Figure 4. Feature selected genes. (A) Mean AUC performance with standard error for each feature selected gene on testing synovium and blood datasets (green) and on five independent validation datasets (black). 13 genes with AUC greater than 0.8 on validation datasets were chosen as the best performing genes. Mean AUC performance with standard errors of a RF model trained on discovery blood data with (B) common DE genes and (C) feature selected genes on five independent validation datasets. The discovery-based classifiers were held fixed and used once on each validation dataset.
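The selection rule in panel (A), keeping genes whose mean validation AUC exceeds 0.8, can be illustrated with a small sketch. How the per-gene AUC was obtained is not stated in this caption, so the rank-based AUC below (the probability that a case sample has a higher value than a control sample, ties counted as one half) is an assumption, and all names and numbers are hypothetical.

```python
def rank_auc(case_values, control_values):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Equivalent to the Mann-Whitney U statistic divided by n_cases * n_controls.
    """
    wins = 0.0
    for c in case_values:
        for h in control_values:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5  # ties count as half a win
    return wins / (len(case_values) * len(control_values))


def select_genes(auc_by_gene, threshold=0.8):
    """Keep genes whose mean AUC across validation datasets exceeds threshold."""
    return sorted(
        gene for gene, aucs in auc_by_gene.items()
        if sum(aucs) / len(aucs) > threshold
    )


# Hypothetical per-dataset AUCs for two placeholder genes.
validation_aucs = {"GENE_A": [0.90, 0.85], "GENE_B": [0.70, 0.75]}
kept = select_genes(validation_aucs)
```

For a down-regulated gene the raw AUC falls below 0.5; a common convention, also an assumption here, is to report 1 - AUC or to negate the expression values before ranking.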
Summary of 13 validated RA Score Panel genes.
| Gene | Gene name | Regulation | Discovery Synovium: FC (FDR adj. p-value) | Discovery Synovium: ρ (BH adj. p-value) | Discovery Synovium: AUC | Discovery Blood: FC (FDR adj. p-value) | Discovery Blood: ρ (BH adj. p-value) | Discovery Blood: AUC | Validation: AUC | Protein Secretion |
|---|---|---|---|---|---|---|---|---|---|---|
| TNFAIP6 | TNF Alpha Induced Protein 6 | up | 2.46 (4E-06) | 0.39 (7E-11) | 0.81 | 1.36 (8E-16) | 0.39 (3E-67) | 0.77 | 0.88 | Secreted in blood |
| S100A8 | S100 Calcium Binding Protein A8 | up | 2.28 (7E-05) | 0.34 (1E-08) | 0.81 | 1.46 (7E-32) | 0.48 (9E-108) | 0.81 | 0.94 | Secreted in blood |
| DRAM1 | DNA Damage Regulated Autophagy Modulator 1 | up | 1.55 (6E-07) | 0.46 (3E-15) | 0.93 | 1.18 (8E-15) | 0.41 (6E-76) | 0.79 | 0.81 | |
| TNFSF10 | TNF Superfamily Member 10 | up | 1.55 (1E-09) | 0.52 (3E-19) | 0.9 | 1.27 (1E-23) | 0.44 (4E-88) | 0.8 | 0.84 | Secreted in blood |
| LY96 | Lymphocyte Antigen 96 | up | 1.54 (1E-09) | 0.51 (2E-18) | 0.94 | 1.22 (7E-11) | 0.28 (2E-35) | 0.69 | 0.87 | Secreted in blood |
| QPCT | Glutaminyl-Peptide Cyclotransferase | up | 1.46 (4E-05) | 0.39 (7E-11) | 0.92 | 1.19 (4E-10) | 0.29 (1E-37) | 0.71 | 0.82 | Secreted in blood |
| KYNU | Kynureninase | up | 1.41 (5E-05) | 0.36 (1E-09) | 0.84 | 1.17 (2E-11) | 0.28 (3E-34) | 0.69 | 0.82 | Intracellular or membrane-bound |
| ENTPD1 | Ectonucleoside Triphosphate Diphosphohydrolase 1 | up | 1.33 (1E-08) | 0.52 (2E-19) | 0.94 | 1.21 (2E-16) | 0.4 (5E-71) | 0.78 | 0.86 | |
| CLIC1 | Chloride Intracellular Channel 1 | up | 1.32 (5E-08) | 0.47 (7E-16) | 0.91 | 1.2 (5E-27) | 0.47 (4E-103) | 0.84 | 0.8 | Intracellular or membrane-bound |
| ATP6V0E1 | ATPase H+ Transporting V0 Subunit E1 | up | 1.23 (3E-04) | 0.37 (8E-10) | 0.84 | 1.08 (4E-10) | 0.28 (3E-35) | 0.7 | 0.82 | |
| NCL | Nucleolin | down | 0.83 (2E-05) | -0.39 (4E-11) | 0.82 | 0.88 (4E-09) | -0.32 (2E-44) | 0.72 | 0.82 | |
| CIRBP | Cold Inducible RNA Binding Protein | down | 0.8 (3E-05) | -0.41 (4E-12) | 0.83 | 0.91 (2E-10) | -0.33 (2E-47) | 0.74 | 0.89 | |
| HSP90AB1 | Heat Shock Protein 90 Alpha Family Class B Member 1 | down | 0.79 (2E-04) | -0.37 (3E-10) | 0.82 | 0.84 (4E-12) | -0.36 (7E-56) | 0.73 | 0.8 | Intracellular or membrane-bound |
Figure 5. Clinical interpretation of the RA Score. (A) The RA Score distinguishes Healthy, OA and RA samples in synovium. (B) The RA Score distinguishes Healthy and polyJIA samples. The p-values were obtained using Student’s t-test.
Figure 6. Validation of the RA Score proteins. (A, C) Immunoblot analysis of 6 RA Score proteins in unstimulated PBMC lysates from subjects with RA (n=4) and healthy controls (n=5). Data representative of 2 immunoblots. (B, D) Box plots show quantification of RA Score protein levels normalized to GAPDH, pooled from 2 immunoblot experiments [as shown in (A, C) and ]; RA (n=8) and healthy control (n=7) samples. Significance was determined by the Mann-Whitney-Wilcoxon test (B, D). Dotted line represents lanes removed from non-RA subjects, otherwise immunoblots a and d are montages of the same western blot.