T D Maarseveen1, M P Maurits1, E Niemantsverdriet1, A H M van der Helm-van Mil1, T W J Huizinga1, R Knevel2.
Abstract
BACKGROUND: Electronic health records (EHRs) offer a wealth of observational data. Machine-learning (ML) methods are efficient at data extraction, capable of processing the information-rich free-text physician notes in EHRs. The clinical diagnosis contained therein represents physician expert opinion and is more consistently recorded than classification criteria components.
Keywords: Artificial intelligence; Big data; Chart review; Classification criteria; EHR; Electronic health records; Machine learning algorithms; Observational research; Rheumatoid arthritis
Year: 2021 PMID: 34158089 PMCID: PMC8218515 DOI: 10.1186/s13075-021-02553-4
Source DB: PubMed Journal: Arthritis Res Ther ISSN: 1478-6354 Impact factor: 5.156
Fig. 1 Study workflow depicting the model development and evaluation procedure (orange section) and the criteria comparison analysis (blue section), with the important analysis steps highlighted in green. We ran the SVM identification on all 17,662 patients of the Rheumatology outpatient clinic. Next, we selected only those patients who were also included in the EAC (n=1188) and who were annotated for the 1987 and 2010 criteria, resulting in a final selection of 1127 patients for the final analysis. The patient collections are indicated by a wavy-line box, with the initial two data sources colored red (rheumatology outpatient clinic = patients from the Leiden outpatient clinic with the first consult after 2011; early arthritis cohort = research cohort patients with the first consult after 2011).
Performance characteristics for different cutoffs of the SVM ML RA identification score in the independent test set
| Cutoff | 0.53 | 0.83 | 0.99 |
|---|---|---|---|
| Sens | 0.93 | 0.85 | 0.71 |
| Spec | 0.97 | 0.99 | 1.00 |
| PPV | 0.75 | 0.86 | 0.94 |
| NPV | 0.99 | 0.99 | 1.00 |
ML machine learning, SVM support vector machine, RA rheumatoid arthritis, PPV positive predictive value, NPV negative predictive value
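The four measures in the table above all follow from the confusion matrix obtained at a given score cutoff. The sketch below shows that calculation on hypothetical toy scores and labels (not study data). It assumes that a score greater than or equal to the cutoff counts as a predicted RA case; the source does not state whether the comparison is inclusive.

```python
from typing import Dict, Sequence


def performance_at_cutoff(
    scores: Sequence[float], labels: Sequence[int], cutoff: float
) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV when score >= cutoff is called a case.

    labels: 1 = true case, 0 = true non-case.
    The inclusive ">=" comparison is an assumption made for this sketch.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_case = score >= cutoff
        if predicted_case and label == 1:
            tp += 1
        elif predicted_case and label == 0:
            fp += 1
        elif not predicted_case and label == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator: int, denominator: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sens": ratio(tp, tp + fn),  # true cases that were detected
        "spec": ratio(tn, tn + fp),  # non-cases that were correctly rejected
        "ppv": ratio(tp, tp + fp),   # predicted cases that are true cases
        "npv": ratio(tn, tn + fn),   # predicted non-cases that are true non-cases
    }


# Hypothetical toy data, chosen only to illustrate the trade-off.
toy_scores = [0.99, 0.90, 0.60, 0.40, 0.85, 0.55, 0.20, 0.10]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

for cutoff in (0.53, 0.83, 0.99):
    print(cutoff, performance_at_cutoff(toy_scores, toy_labels, cutoff))
```

Raising the cutoff in this toy example lowers sensitivity and raises specificity and PPV, the same direction of change as in the reported table.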
Fig. 2 A Receiver operating characteristic curve plotting the sensitivity against the specificity and B precision-recall curve plotting the positive predictive value (precision) against the sensitivity (recall) for the support vector machine classifier in the independent test set. The precise features (top 20) that constitute the support vector machine model can be found in the original study by Maarseveen et al. (2020) [4]
Fig. 3 UpSet plot visualizing the intersections of the ML-defined cohort and the 2 criteria-based selections, with a bar chart depicting the total cohort size in the bottom-left, where C1987 = 1987 criteria-based cases, ML = machine learning-based cases, and C2010 = 2010 criteria-based cases. N = 539 unique cases out of 1127 records
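An UpSet plot of this kind shows, for each combination of selections, the cases that belong to exactly that combination and to no other selection. A minimal sketch of that bookkeeping follows. The patient identifiers are hypothetical toy values and the function name `exclusive_intersections` is introduced here for illustration; this is not the plotting code used in the study.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, Set, Tuple


def exclusive_intersections(
    named_sets: Dict[str, Iterable[Hashable]]
) -> Dict[Tuple[str, ...], Set[Hashable]]:
    """Map each combination of set names to the members found in exactly those sets."""
    names = list(named_sets)
    as_sets = {name: set(members) for name, members in named_sets.items()}
    result = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Members shared by every set in the combination ...
            inside = set.intersection(*(as_sets[name] for name in combo))
            # ... minus anything that also appears in a set outside it.
            outside = set().union(*(as_sets[n] for n in names if n not in combo))
            result[combo] = inside - outside
    return result


# Hypothetical patient identifiers, for illustration only.
toy_sets = {
    "ML": {1, 2, 3, 4},
    "C1987": {2, 3, 5},
    "C2010": {3, 4, 5, 6},
}
parts = exclusive_intersections(toy_sets)
for combo, members in parts.items():
    print(" & ".join(combo), sorted(members))

# The exclusive parts are disjoint, so their sizes add up to the number of unique cases.
unique_cases = set().union(*toy_sets.values())
print(sum(len(m) for m in parts.values()), len(unique_cases))
```

Because the parts are mutually exclusive, summing their sizes gives the count of unique cases, which is the quantity the caption reports as N.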
Comparison of baseline characteristics between the machine learning-defined case selection (cutoff=0.83) and the two criteria-based selections
| Patients from the cohort with EHR data and classification data | Predicted cases based on machine learning (cutoff=0.83) | 1987 criteria-based cases | 2010 criteria-based cases |
|---|---|---|---|
| N☨ | 373 | 357 | 426 |
| Proportion women | 0.65 | 0.63 | 0.66 |
| Proportion anti-CCP2-positive | 0.52 | 0.49 | 0.49 |
| Proportion RF-positive | 0.56 | 0.57 | 0.58 |
| Median DAS44 at baseline | 2.8 | 2.9 | 2.9 |
| Median BMI | 26.0 | 25.6 | 25.6 |
| Median ESR | 25 | 29 | 27 |
| Median CRP | 9.5 | 10.2 | 9.0 |
| Median age at inclusion | 57.2 | 58.6 | 57.2 |
| Median symptom duration at diagnosis (days) | 92.0 | 90.0 | 91.0 |
| Median number of swollen joints | 5 | 6 | 6 |
P values were calculated with the Pearson chi-squared test for proportions and the Mann-Whitney U test for medians: *p<0.05; **p<0.01, ***p<0.001; ☨Not statistically tested
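The two tests named in the footnote can be sketched with the Python standard library alone. The version below is an illustration under stated assumptions, not the authors' analysis code: the chi-squared test is for a 2x2 table without continuity correction, and the Mann-Whitney U p value uses the normal approximation with a tie correction. The source does not say which variants were used, and all input numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-squared test for the 2x2 table [[a, b], [c, d]] (1 degree of freedom).

    No continuity correction is applied (an assumption of this sketch).
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # Survival function of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """U statistic for x and a two-sided p value from the normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = sorted(list(x) + list(y))
    ranks, tie_term, i = {}, 0, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)  # tie correction contribution
        i = j
    u1 = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2
    n = n1 + n2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts: 30 of 50 positive in one group versus 20 of 50 in the other.
chi2_stat, chi2_p = pearson_chi2_2x2(30, 20, 20, 30)
# Hypothetical measurements for two small groups.
u_stat, u_p = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(chi2_stat, chi2_p, u_stat, u_p)
```

For samples as small as the toy groups here, the normal approximation is crude; an exact method from an established statistics package would normally be preferred.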
Baseline characteristics of the cases exclusively identified by the ML and those exclusively identified by either the 1987 or 2010 criteria
| | Only ML, not criteria | Only criteria, not ML |
|---|---|---|
| N☨ | 50 | 166 |
| Proportion women | 0.60 | 0.63 |
| Proportion anti-CCP2-positive | 0.07 | 0.16 |
| Proportion RF-positive | 0.17 | 0.34 |
| Median DAS44 at baseline | 2.52 | 2.74 |
| Median BMI | 26.6 | 25.6 |
| Median ESR | 19.0 | 25.0 |
| Median CRP | 9.35 | 8 |
| Median age at inclusion | 55.7 | 58.9 |
| Median symptom duration at diagnosis (days) | 66 | 78.0 |
| Median number of swollen joints | 4.0 | 4.0 |
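The reported group sizes can be cross-checked against one another with simple set arithmetic. The sketch below uses only counts stated in the tables and in the Fig. 3 caption. It assumes that all of them refer to the same 1127 records and the same ML cutoff of 0.83; the derived overlaps are arithmetic consequences of that assumption, not figures reported in the source.

```python
# Counts as reported in the source.
n_records = 1127        # records in the final analysis (Fig. 1, Fig. 3)
n_unique_cases = 539    # cases flagged by ML or either criteria set (Fig. 3)
n_ml = 373              # ML-predicted cases at cutoff 0.83
n_c1987 = 357           # 1987 criteria-based cases
n_c2010 = 426           # 2010 criteria-based cases
n_only_ml = 50          # identified by ML but by neither criteria set
n_only_criteria = 166   # identified by criteria but not by ML

# Consistency check: ML cases plus criteria-only cases should give all unique cases.
assert n_ml + n_only_criteria == n_unique_cases

# Derived quantities (valid only under the assumption stated above).
n_ml_and_criteria = n_ml - n_only_ml             # ML cases that also meet some criteria set
n_any_criteria = n_unique_cases - n_only_ml      # cases meeting the 1987 or 2010 criteria
n_both_criteria = n_c1987 + n_c2010 - n_any_criteria  # inclusion-exclusion
n_not_flagged = n_records - n_unique_cases       # records flagged by no method

print(n_ml_and_criteria, n_any_criteria, n_both_criteria, n_not_flagged)
```

The assertion passing shows that the ML case count, the criteria-only count, and the number of unique cases in the UpSet plot are mutually consistent.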