Kerry E Poppenberg, Kaiyu Jiang, Lu Li, Yijun Sun, Hui Meng, Carol A Wallace, Teresa Hennon, James N Jarvis.
Abstract
BACKGROUND: The response to treatment for juvenile idiopathic arthritis (JIA) can be staged using clinical features. However, objective laboratory biomarkers of remission are still lacking. In this study, we used machine learning to predict JIA activity from transcriptomes from peripheral blood mononuclear cells (PBMCs). We included samples from children with Native American ancestry to determine whether the model maintained validity in an ethnically heterogeneous population.
Keywords: Juvenile idiopathic arthritis; Machine learning; Peripheral blood mononuclear cells; Transcriptome
Year: 2019 PMID: 31706344 PMCID: PMC6842535 DOI: 10.1186/s13075-019-2010-z
Source DB: PubMed Journal: Arthritis Res Ther ISSN: 1478-6354 Impact factor: 5.606
Fig. 1 Performance of four JIA prediction models in training and testing cohorts using all samples. a Mean accuracy, sensitivity, and specificity for four different modeling methods (KNN, RF, cSVM, gSVM) in training as assessed by tenfold cross-validation. All had accuracies > 78%. b ROC analysis in the training cohort demonstrated that gSVM and RF provided the best classification, with an AUC of 0.84. c Testing accuracy, sensitivity, and specificity for four models. These are true values based on the predicted class of testing samples. RF, cSVM, and gSVM had similar performance, with accuracies of approximately 79%. d RF had the highest AUC (0.94) of the four models tested
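The accuracy, sensitivity, and specificity reported in panels a and c can each be derived from a confusion matrix of true versus predicted classes. The following is a minimal sketch of that arithmetic, not the authors' code: the function name `classification_metrics`, the label coding (1 for the positive class, 0 otherwise), and the example labels are all assumptions made for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, and specificity for binary labels.

    Hypothetical helper: `positive` marks the class treated as the
    positive one; every other label counts as negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    total = tp + tn + fp + fn
    return {
        # Fraction of all samples classified correctly
        "accuracy": (tp + tn) / total,
        # Fraction of true positives that were detected
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Fraction of true negatives that were detected
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Made-up labels for illustration only (not data from the study)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In a tenfold cross-validation setting such as the one described for panel a, these metrics would be computed once per held-out fold and then averaged.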
Fig. 2 Performance of four JIA prediction models in the training and testing cohorts using only European samples. a Mean accuracy, sensitivity, and specificity for four models in training as assessed by tenfold cross-validation. Accuracies ranged from 59 to 74%, with gSVM having the highest accuracy. b Similar performance is reflected in the ROC analysis of the training cohort. gSVM again had the best performance with an AUC of 0.72. c Improved accuracy, sensitivity, and specificity for four models are noted in the testing cohort as assessed by true predictions of testing samples. KNN, cSVM, and gSVM all achieved a testing accuracy of 91%. d All models had AUCs > 0.90 in the testing cohort
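The AUC values quoted for the ROC analyses summarize how well a model's scores rank positive samples above negative ones. As a hedged illustration, the sketch below computes AUC through its rank-based (Mann-Whitney) interpretation: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the scores are hypothetical, and the source does not state how its AUCs were computed.

```python
def roc_auc(y_true, scores, positive=1):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Hypothetical helper; ties between a positive and a negative score
    contribute 0.5 to the count.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up classifier scores for illustration only (not data from the study)
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
auc = roc_auc(y_true, scores)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for cohorts of this scale; a sort-based version would be preferable for large datasets.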
Transcripts selected by HSIC LASSO for model training using all samples and only European samples
| Whole dataset model | European model |
|---|---|
| ACAP3* | AC008267.1 |
| ARL2BP* | ACAP3* |
| CD97* | ARL2BP* |
| CEBPD | ARSA |
| FAM84B* | ATXN2L |
| GATAD1 | CCDC71 |
| HES5 | CCNA2 |
| HIST1H3E* | CD97* |
| IFNAR2 | CKAP4 |
| IL2RG | EPM2AIP1 |
| INPP5E* | FAM84B* |
| KAT8 | FANCF |
| KLF7 | GLE1 |
| LINS* | GSAP |
| MCFD2 | HIST1H3E* |
| MID1IP1 | INPP5E* |
| MRPL38* | KIF22 |
| MT-CO2 | L3MBTL2 |
| MT-CYB | LINS* |
| MT-ND4L | MAPK8IP1 |
| NSMAF | MRP63 |
| PAQR7 | MRPL38* |
| PNPLA2 | NME3 |
| PSME2 | OSMR |
| RPL23 | PPM1K |
| S100P | RANBP6 |
| SIAH2* | RLTPR |
| SPCS3* | SIAH2* |
| SRP14* | SPCS3* |
| SSNA1 | SRP14* |
| TCTA | TRIP13 |
| THAP1 | TXNL4B |
| UROD | USP51 |
| ZAP70 | |
| ZC3H12A | |
Transcripts with an asterisk are identified in both models
Fig. 3 Venn diagram depicting the overlap of model genes between the all-sample and European-sample models. Eleven genes, approximately 1/3 of the model genes, were identified by the feature selection process both when using all samples and when using only European samples
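The overlap shown in the Venn diagram can be checked directly from the two transcript lists in the table above. The sketch below transcribes both columns into Python sets and intersects them; the variable names are assumptions chosen for illustration, and the gene symbols are copied from the table.

```python
# Transcripts selected for the whole-dataset model (from the table)
whole_dataset = {
    "ACAP3", "ARL2BP", "CD97", "CEBPD", "FAM84B", "GATAD1", "HES5",
    "HIST1H3E", "IFNAR2", "IL2RG", "INPP5E", "KAT8", "KLF7", "LINS",
    "MCFD2", "MID1IP1", "MRPL38", "MT-CO2", "MT-CYB", "MT-ND4L",
    "NSMAF", "PAQR7", "PNPLA2", "PSME2", "RPL23", "S100P", "SIAH2",
    "SPCS3", "SRP14", "SSNA1", "TCTA", "THAP1", "UROD", "ZAP70",
    "ZC3H12A",
}

# Transcripts selected for the European-only model (from the table)
european = {
    "AC008267.1", "ACAP3", "ARL2BP", "ARSA", "ATXN2L", "CCDC71",
    "CCNA2", "CD97", "CKAP4", "EPM2AIP1", "FAM84B", "FANCF", "GLE1",
    "GSAP", "HIST1H3E", "INPP5E", "KIF22", "L3MBTL2", "LINS",
    "MAPK8IP1", "MRP63", "MRPL38", "NME3", "OSMR", "PPM1K", "RANBP6",
    "RLTPR", "SIAH2", "SPCS3", "SRP14", "TRIP13", "TXNL4B", "USP51",
}

# Genes chosen by feature selection in both settings
shared = whole_dataset & european

print(len(whole_dataset), len(european), len(shared))
print(sorted(shared))
```

The intersection contains the eleven asterisked transcripts, which is 11 of 35 for the whole-dataset model and 11 of 33 for the European model, consistent with the caption's "approximately 1/3".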
Fig. 4 IPA networks using model genes from whole dataset analysis. Three networks were significant. Transcripts with increased expression in the ADT group are red, while transcripts with decreased expression in the ADT group are green. The color intensity represents fold change. a The first network (score = 46) was associated with cancer, dermatological diseases and conditions, and hematological disease. b The second network (score = 22) was associated with cell-to-cell signaling and interaction, cell morphology, and organismal injury. c The last network (score = 17) was associated with protein synthesis, RNA damage and repair, and cancer