Yuri Ahuja1, Nicole Kim1, Tianxi Cai1,2, Zongqi Xia3, Liang Liang1, Tianrun Cai4, Kumar Dahal4, Thany Seyok4, Chen Lin5, Sean Finan5, Katherine Liao4, Guergana Savovoa5, Tanuja Chitnis6.
Abstract
OBJECTIVE: No relapse risk prediction tool is currently available to guide treatment selection for multiple sclerosis (MS). Leveraging electronic health record (EHR) data readily available at the point of care, we developed a clinical tool for predicting MS relapse risk.
Year: 2021 PMID: 33626237 PMCID: PMC8045951 DOI: 10.1002/acn3.51324
Source DB: PubMed Journal: Ann Clin Transl Neurol ISSN: 2328-9503 Impact factor: 4.511
Figure 1. Study schematics. (A) Data sources (electronic health records and research registry data) and the training and validation sets, (B) overall study workflow, and (C) two‐stage development of the phenotyping and prediction model of MS relapse risk.
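The two‐stage idea in panel (C) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the cohort is synthetic, the helper names (`fit_logistic`, `predict_proba`) are hypothetical, and an unpenalized logistic regression fitted by gradient descent stands in for the LASSO‐based phenotyping step mentioned in the Figure 3 caption.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Unpenalized logistic regression via batch gradient descent (stand-in, no LASSO)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, X):
    w, b = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

# Hypothetical synthetic cohort: two EHR-derived features and two baseline covariates each.
rng = random.Random(0)
n = 200
ehr = [[rng.random(), rng.random()] for _ in range(n)]
asrd = [[rng.random(), rng.random()] for _ in range(n)]
# Synthetic prior 1-year relapse history (RH), driven by the first EHR feature.
rh = [1 if rng.random() < sigmoid(4 * e[0] - 2) else 0 for e in ehr]
# Synthetic future 1-year relapse outcome, more likely after a prior relapse.
future = [1 if rng.random() < 0.1 + 0.3 * r else 0 for r in rh]

labeled = range(100)  # assumption: only these patients have annotated RH

# Stage 1 (phenotyping): learn RH from EHR features on the labeled subset, impute RH^ for all.
pheno = fit_logistic([ehr[i] for i in labeled], [rh[i] for i in labeled])
rh_hat = predict_proba(pheno, ehr)

# Stage 2 (prediction): baseline covariates plus imputed RH^ -> future relapse risk.
X2 = [a + [r] for a, r in zip(asrd, rh_hat)]
risk_model = fit_logistic(X2, future)
risk = predict_proba(risk_model, X2)
print(round(min(risk), 3), round(max(risk), 3))
```

The point of the design is that stage 2 never needs the annotated relapse history at prediction time; it only needs features that can be derived from the EHR.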
Figure 2. Performance of models in predicting the future 1‐year MS relapse risk as measured by AUCs and F scores. ASRD (red), a baseline model comprising only basic clinical factors (age, sex, race/ethnicity, disease duration); ASRD + PheCode (dark blue), baseline model plus PheCode for MS; ASRD + EHR (light blue), baseline model plus selected EHR features that passed the feature selection process; ASRD + RH (dark green) and ASRD + RH+EHR (light green), baseline model plus actual prior 1‐year relapse history without and with selected EHR features, respectively; ASRD + RH^ (dark purple) and ASRD + RH^+EHR (light purple), baseline model plus two‐stage phenotyping and prediction model without and with selected EHR features in the prediction stage, respectively. RH^ denotes prior 1‐year relapse history imputed from EHR data using the phenotyping algorithm rather than actual relapse history (RH). Models were developed using the training set and evaluated on the held‐out validation set. 95% confidence intervals were computed nonparametrically via bootstrap with 1000 replicates.
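A nonparametric bootstrap confidence interval for an AUC, as described in this caption, might be computed along the following lines. The sketch assumes a percentile interval and patient‐level resampling with replacement; the caption does not say which bootstrap variant was used, and the labels and scores below are synthetic.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample patients with replacement, recompute AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the two classes
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Synthetic example: 60 hypothetical patients with mildly informative risk scores.
rng = random.Random(1)
labels = [1 if rng.random() < 0.3 else 0 for _ in range(60)]
scores = [0.3 * y + rng.random() for y in labels]
point = auc(labels, scores)
lo, hi = bootstrap_auc_ci(labels, scores)
print(f"AUC = {point:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```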
Demographics of the training and validation sets.
| Characteristic | Training set | Validation set | P‐value |
|---|---|---|---|
| Total number of patients | 1435 | 186 | NA |
| Sex, % Women | 73.9% | 74.2% | 0.924 |
| Race, % non‐Hispanic European | 85.9% | 84.9% | 0.719 |
| Median (IQR) age at first code | 43.3 (15.6) | 43.7 (16.0) | 0.109 |
| Median (IQR) age at first ICD code for MS | 43.3 (15.5) | 43.5 (16.2) | 0.151 |
| Median (IQR) disease duration, years | 5.12 (2.03) | 4.37 (2.82) | <0.0001 |
| Annualized relapse rate 2006–2016, mean (SD) | 0.075 (0.002) | 0.118 (0.009) | <0.0001 |
Age at first code: the first of any ICD, CPT, or CUI code in the EHR data.
Relapse type includes clinical, radiological, or both.
The training set derives entirely from the CLIMB cohort, whereas the validation set is a random sample of MS patients from the Mass General Brigham (formerly known as the Partners) healthcare system (77 from CLIMB, none in the training set).
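The source does not name the test behind the P‐values in the demographics table. As a rough plausibility check, an uncorrected two‐proportion z‐test (an assumption, not a documented method) applied to counts reconstructed from the rounded percentages gives values close to those reported for sex (0.924) and race (0.719).

```python
import math

def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p-value of a pooled two-proportion z-test (no continuity correction)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

n_train, n_valid = 1435, 186

# Counts are reconstructed from rounded percentages, so they are approximate.
p_sex = two_proportion_p(round(0.739 * n_train), n_train, round(0.742 * n_valid), n_valid)
p_race = two_proportion_p(round(0.859 * n_train), n_train, round(0.849 * n_valid), n_valid)

print(f"sex:  p = {p_sex:.3f}")
print(f"race: p = {p_race:.3f}")
```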
Performance of models in predicting the future 1‐year MS relapse risk.
| Models | AUC | P‐value (AUC) | F score | P‐value (F score) | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|---|---|---|
| ASRD | 0.686 | | 0.288 | | 0.520 | 0.676 | 0.199 | 0.901 |
| ASRD + PheCode | 0.686 | 0.14 | 0.292 | 0.10 | 0.537 | 0.668 | 0.200 | 0.903 |
| ASRD + EHR | 0.695 | 0.56 | 0.319 | 0.25 | 0.509 | 0.738 | 0.232 | 0.906 |
| ASRD + RH | 0.712 | <0.01 | 0.339 | <0.01 | 0.478 | 0.791 | 0.262 | 0.907 |
| ASRD + RH+EHR | 0.700 | 0.15 | 0.319 | 0.07 | 0.459 | 0.780 | 0.245 | 0.903 |
| ASRD + RH^ | 0.707 | <0.01 | 0.307 | <0.01 | 0.499 | 0.719 | 0.223 | 0.900 |
| ASRD + RH^+EHR | 0.696 | 0.43 | 0.318 | 0.09 | 0.501 | 0.743 | 0.233 | 0.906 |
ASRD, a baseline model comprising only basic clinical factors (age, sex, race/ethnicity, disease duration); ASRD + PheCode, baseline model plus PheCode for MS; ASRD + EHR, baseline model plus selected EHR features that passed the feature selection process; ASRD + RH and ASRD + RH+EHR, baseline model plus actual prior 1‐year relapse history without and with selected EHR features, respectively; ASRD + RH^ and ASRD + RH^+EHR, baseline model plus the two‐stage phenotyping and prediction model without and with selected EHR features in the prediction stage, respectively. RH^ differs from RH in that the former denotes prior 1‐year relapse history imputed from EHR data using the phenotyping algorithm, whereas the latter denotes actual prior 1‐year relapse history. Models were developed using the training set and performance was evaluated on the held‐out validation set. AUC and F score of all models were compared to the baseline model (ASRD).
Comparison in AUC between each model and the baseline model (ASRD). P‐values were computed nonparametrically via bootstrap with 1000 replicates.
Comparison in F score between each model and the baseline model (ASRD). P‐values were computed nonparametrically via bootstrap with 1000 replicates.
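The source does not define the F score. Assuming it is the F1 score (the harmonic mean of PPV and sensitivity), the values in the performance table can be recomputed from the reported sensitivity and PPV; under that assumption they agree with the reported F scores to within roughly 0.001, which rounding of the inputs can explain. The helper `confusion_metrics` is a hypothetical name showing how all five metrics follow from a 2×2 confusion matrix.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts; F1 is assumed for 'F score'."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "f1": f1}

# (sensitivity, PPV, reported F score) for the seven table rows, in table order.
reported_rows = [
    (0.520, 0.199, 0.288),
    (0.537, 0.200, 0.292),
    (0.509, 0.232, 0.319),
    (0.478, 0.262, 0.339),
    (0.459, 0.245, 0.319),
    (0.499, 0.223, 0.307),
    (0.501, 0.233, 0.318),
]

recomputed = [2 * ppv * sens / (ppv + sens) for sens, ppv, _ in reported_rows]
for (sens, ppv, reported), f1 in zip(reported_rows, recomputed):
    print(f"sens={sens:.3f} PPV={ppv:.3f} reported F={reported:.3f} recomputed F1={f1:.3f}")
```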
Figure 3. Heat map of pairwise correlations between prior relapse history (RH)‐predictive features selected by LASSO in the phenotyping stage.
Figure 4. Receiver operating characteristic curves of models for predicting the future 1‐year MS relapse probability. See the Figure 2 caption for descriptions of ASRD, ASRD + RH, and ASRD + RH^.
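For readers unfamiliar with how such curves are built, the sketch below traces ROC points by sweeping a decision threshold over predicted probabilities and integrates them with the trapezoidal rule. The function names and the four‐patient example are illustrative assumptions, not taken from the study.

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold through each distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def trapezoid_auc(points):
    """Area under the piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy example: two patients without relapse (0) and two with relapse (1).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
curve = roc_points(labels, scores)
area = trapezoid_auc(curve)
print(curve)
print(area)
```

Each point on the curve corresponds to one choice of threshold, so the sensitivity and specificity reported for a model are a single operating point on its curve.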
Figure 5. Relapse trend. Proportion of patients experiencing actual MS relapse (red) and mean predicted future 1‐year relapse probability based on the two‐stage model (blue) as a function of MS disease duration (left) and patient age (right). 95% confidence intervals for the predictive model were computed nonparametrically via bootstrap with 1000 replicates.