
Leveraging electronic health records data to predict multiple sclerosis disease activity.

Yuri Ahuja¹, Nicole Kim¹, Tianxi Cai¹,², Zongqi Xia³, Liang Liang¹, Tianrun Cai⁴, Kumar Dahal⁴, Thany Seyok⁴, Chen Lin⁵, Sean Finan⁵, Katherine Liao⁴, Guergana Savova⁵, Tanuja Chitnis⁶.

Abstract

OBJECTIVE: No relapse risk prediction tool is currently available to guide treatment selection for multiple sclerosis (MS). Leveraging electronic health record (EHR) data readily available at the point of care, we developed a clinical tool for predicting MS relapse risk.
METHODS: Using data from a clinic-based research registry and linked EHR system between 2006 and 2016, we developed models predicting relapse events from the registry in a training set (n = 1435) and tested the model performance in an independent validation set of MS patients (n = 186). This iterative process identified prior 1-year relapse history as a key predictor of future relapse, but ascertaining relapse history through labor-intensive chart review is impractical. We therefore pursued two-stage algorithm development: (1) L1-regularized logistic regression (LASSO) to phenotype past 1-year relapse status from contemporaneous EHR data, and (2) LASSO to predict future 1-year relapse risk using imputed prior 1-year relapse status and other algorithm-selected features.
RESULTS: The final model, comprising age, disease duration, and imputed prior 1-year relapse history, achieved a predictive AUC and F score of 0.707 and 0.307, respectively. The performance was significantly better than the baseline model (age, sex, race/ethnicity, and disease duration) and noninferior to a model containing actual prior 1-year relapse history. The predicted risk probability declined with disease duration and age.
CONCLUSION: Our novel machine-learning algorithm predicts 1-year MS relapse with accuracy comparable to other clinical prediction tools and has applicability at the point of care. This EHR-based two-stage approach of outcome prediction may have application to neurological disease beyond MS.

Year: 2021 PMID: 33626237 PMCID: PMC8045951 DOI: 10.1002/acn3.51324

Source DB: PubMed Journal: Ann Clin Transl Neurol ISSN: 2328-9503 Impact factor: 4.511

Introduction

Multiple sclerosis (MS) is a chronic inflammatory disease of the central nervous system (CNS) that causes progressive neurological disability. Currently available disease‐modifying treatments (DMTs) for MS target neuroinflammation and delay neurodegeneration primarily by reducing inflammatory disease activity or relapse. There is growing awareness of the long‐term benefit of early initiation of DMTs, particularly higher‐efficacy DMTs in patients with a high likelihood of relapse and accelerated disability accrual consistent with aggressive MS. The ability to predict a patient’s future relapse risk is crucial to guide the clinical decision on initiating higher‐efficacy DMTs, given the trade‐off of potential DMT‐associated adverse events and costs. Well‐established clinical predictors of future aggressive MS disease activity include older age at first neurological symptom onset, male sex, non‐European descent, and, importantly, the frequency and severity of prior relapse. Additional neuroimaging and laboratory predictors of relapse include gadolinium enhancement on magnetic resonance imaging (MRI) and low serum 25‐OH vitamin D. These factors each have modest power for predicting future relapse. While predictive models of neurological disability accrual are available, to our knowledge there has been no clinically deployable predictive model of future relapse that incorporates multiple predictors. Studies that predict MS outcomes predominantly rely on research registry data. Increasing analytical capability has enabled the use of electronic health records (EHR) data to facilitate clinical discovery by providing complementary features otherwise unavailable from traditional research registries. We previously integrated research registry data from a well‐characterized, long‐term, clinic‐based cohort with EHR data for developing EHR‐based models of MS classification and neurological disability. Here, we leveraged clinical and associated EHR data to develop and test a clinically deployable model for predicting 1‐year relapse risk in MS patients.

Methods

Data source

We included data from January 2006 to December 2016 for 2375 participants ≥18 years of age with neurologist‐confirmed MS diagnosis in the Comprehensive Longitudinal Investigation of Multiple Sclerosis at Brigham and Women’s Hospital (CLIMB) cohort in the Brigham Multiple Sclerosis Center (Boston). CLIMB participants have had at least one annual clinic visit. We additionally obtained all EHR data for 5482 MS patients from the Mass General Brigham (MGB, formerly known as the Partners) HealthCare system using our published MS classification algorithm, with 4565 receiving neurological care at the Brigham MS Center. The MGB IRB approved the use of research registry data and EHR data. For the training set, we included the 1435 CLIMB participants with linked EHR data, as previously described. For evaluation of model performance in a held‐out validation set, we used annotated relapse events for 186 randomly selected MS patients from the EHR cohort from the same time period who received neurological care at MGB (77 in CLIMB) but were not part of the training set. We assessed for potential selection bias arising from training the model exclusively on CLIMB patients by comparing its predictive performance on the 77 CLIMB patients to the 109 non‐CLIMB patients in the validation set and found no significant disparity between the subgroups. A research assistant performed the chart review according to CLIMB guidelines after extensive training and under the close supervision of an MS neurologist. Figure 1 describes the overall workflow.

Figure 1

Study schematics. (A) data source of electronic health records and research registry data, training and validation set, (B) overall study workflow, and (C) two‐stage development of phenotyping and prediction model of MS relapse risk.

Relapse data

We used relapse events, dates, and types from the CLIMB registry (training set) and from chart annotation (validation set). For this study, we defined a relapse event as a clinical and/or radiological relapse. Clinical relapse was defined as new or recurrent neurological symptoms lasting persistently for ≥24 h without fever or infection. Radiological relapse was defined as a new T1‐enhancing lesion and/or a new or enlarging T2‐FLAIR hyperintense lesion on brain, orbit, or spinal cord MRI, as documented in the clinical radiology report.

EHR data

For each patient, we extracted relevant demographic and clinical information (i.e., age, sex, race/ethnicity, disease duration [years elapsed between the first MS diagnostic code and index encounter]) from the EHR data. We extracted all occurrences over time of the following codified variables: (1) diagnostic (International Classification of Disease 9th/10th edition, ICD‐9/10) codes; and (2) procedural (Current Procedural Terminology, CPT) codes. Using a published classification system that consolidates multiple related ICD codes of each unique medical condition, we mapped each ICD code to a single clinically informative condition represented by a “phenotype” code (PheCode). To mitigate sparsity, we consolidated CPT codes according to groupings defined by the American Medical Association, with the exception of certain MRI procedures (orbit, brain, and spine) because of relevance to MS. From free‐text clinical narratives (e.g., outpatient encounters, radiology reports, discharge summaries), we extracted patient‐level counts of all clinical terms mapped to concept unique identifiers (CUIs) using the Natural Language Processing (NLP)‐based clinical Text Analytics and Knowledge Extraction System (cTAKES). Only positive mentions of CUIs were included.

Feature selection and data preprocessing

We first derived an EHR algorithm for predicting 1‐year relapse history using all available EHR features. From a list of 2726 features consisting of PheCode, CPT, and CUI occurrences within a 1‐week period of a given index patient encounter, we screened for potentially informative features by fitting marginal logistic regression models to identify features significantly associated with relapse. We removed features with nonsignificant P‐values after adjustment using the Benjamini–Hochberg procedure with a false discovery rate of 0.1. This screening procedure identified a few hundred features to be included in further algorithm development. For each positively screened feature, we aggregated total counts over the prior 1‐year period and log‐transformed these counts. From the gold‐standard CLIMB registry, we separately extracted the number of relapses each patient had in the prior 1‐year period, described as 1‐year relapse history (RH). We also experimented with extracting EHR data and relapse information in the prior 6‐month and 2‐year periods. While past 1‐year relapse history yielded the most accurate prediction of future 1‐year relapse, predictive performance reassuringly appeared mostly insensitive to the choice of the training period length. The main objective of the study was to predict a patient’s future probability of relapse within one year using EHR feature counts and demographic information rather than RH, as RH is often not readily available at the point of care. To prepare for algorithm development, we classified a patient encounter as a case if the patient had a relapse within 1 year after the index date, and as a control otherwise. To avoid overcounting closely occurring encounters, we randomly sampled one encounter per nonoverlapping 3‐month time window for inclusion in the final preprocessed dataset. To mitigate sparsity, we eliminated features with prevalence <5% in both case and control groups. In this preprocessed dataset, each patient had multiple index timepoints over the intersection of the study inclusion period and the patient’s records.
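
The Benjamini–Hochberg screening step described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `benjamini_hochberg` and the example P‐values are hypothetical, and the marginal logistic regressions that would produce the P‐values are assumed to have been run already.

```python
def benjamini_hochberg(p_values, fdr=0.1):
    """Return a list of booleans, True where a feature passes BH screening.

    Step-up rule: find the largest rank k such that p_(k) <= (k / m) * fdr,
    then keep every feature whose p-value has rank <= k.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda j: p_values[j])
    cutoff_rank = 0
    for rank, j in enumerate(order, start=1):
        if p_values[j] <= rank / m * fdr:
            cutoff_rank = rank
    keep = [False] * m
    for rank, j in enumerate(order, start=1):
        if rank <= cutoff_rank:
            keep[j] = True
    return keep


# Hypothetical marginal P-values for five unnamed features (illustration only).
example_p = [0.5, 0.05, 0.9, 0.03, 0.035]
passed = benjamini_hochberg(example_p, fdr=0.1)
print(passed)
```

Note that the rule is step‐up: in the example, the feature with P = 0.03 misses its own rank‐1 cutoff of 0.02 but is still retained because a larger P‐value satisfies the cutoff at a higher rank.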

Prediction of future 1‐year relapse probability

We use $N$, $T(i)$, and $p$ to represent the number of patients, number of timepoints for patient $i$, and number of features in the preprocessed dataset, respectively. We denote the complete feature vector for patient $i$ in time period $(t - 1\text{y}, t)$ as $X_{i,t} = (X_{i,t,1}, \dots, X_{i,t,p})'$ and let $X_i = (X_{i,1}, \dots, X_{i,T(i)})'$. We let $X_{i,t} = (\mathrm{ASRD}_{i,t}, \mathrm{EHR}_{i,t})$, where $\mathrm{ASRD}_{i,t}$ denotes age, sex, race/ethnicity, and disease duration, whereas $\mathrm{EHR}_{i,t}$ denotes aggregated EHR features in the prior 1‐year period. Furthermore, we use $Y_{i,t}$ to indicate whether patient $i$ has a relapse in time period $(t, t + 1\text{y})$, and let $Y_i = (Y_{i,1}, \dots, Y_{i,T(i)})'$. Finally, we represent the prior 1‐year relapse history of patient $i$ as $RH_{i,t}$ and let $RH_i = (RH_{i,1}, \dots, RH_{i,T(i)})'$. We use $(X^{\text{train}}, Y^{\text{train}}, RH^{\text{train}})$ and $(X^{\text{test}}, Y^{\text{test}}, RH^{\text{test}})$ to designate nonoverlapping training and validation sets, respectively, for model development and independent evaluation (Fig. 1B). We predicted $Y$ using a two‐stage procedure (Fig. 1C). In the first stage (phenotyping for RH), we predicted contemporaneous relapse within the same 1‐year period as the EHR features by fitting an L1‐regularized (LASSO) linear regression to $\{X^{\text{train}}, \log(RH^{\text{train}} + 1)\}$. We used $\log(RH + 1)$ instead of $\log(RH)$ such that $RH = 0$ would yield a log relapse count of 0 rather than negative infinity. We optimized the LASSO regularization hyperparameter $\lambda$ using 10‐fold cross‐validation to maximize Spearman correlation with the true count RH. We use $\widehat{RH}_{i,t}$ to denote the LASSO‐predicted past 1‐year log relapse count for patient $i$ at timepoint $t$, and let $\widehat{RH}_i = (\widehat{RH}_{i,1}, \dots, \widehat{RH}_{i,T(i)})'$. We further experimented with two alternative models for imputing RH: (1) LASSO logistic regression predicting $I(RH > 0)$ (i.e., at least 1 relapse) and (2) LASSO Poisson regression predicting RH. Poisson regression assumes that the outcome follows a Poisson rather than a normal distribution (as in standard linear regression). We selected the model with the best performance to impute $\widehat{RH}$.
In the second stage (prediction), we predicted future 1‐year relapse by fitting a LASSO logistic regression to (A) $\{(\mathrm{ASRD}^{\text{train}}, \widehat{RH}^{\text{train}}), Y^{\text{train}}\}$ and (B) $\{(X^{\text{train}}, \widehat{RH}^{\text{train}}), Y^{\text{train}}\}$. Model (A) used age, sex, race, and disease duration plus $\widehat{RH}$ to predict $Y$, whereas Model (B) includes the features in Model (A) and all EHR features that passed the feature selection process. Importantly, neither model used the actual prior relapse history to predict future relapse, because $\widehat{RH}$ is a function of EHR but not RH.
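
For readers who prefer to see the two stages written as optimization problems, the following is one conventional way to state them. It is a sketch under assumptions, not the authors' exact formulation: the coefficient symbols $\beta_0$, $\beta$, $\alpha_0$, $\alpha$, $\gamma$, the total number of observations $n$, and the $1/(2n)$ scaling of the squared‐error loss are all introduced here for illustration. $X_{i,t}$ is the feature vector for patient $i$ at timepoint $t$, $RH_{i,t}$ the actual prior 1‐year relapse count, $\widehat{RH}_{i,t}$ its imputed log count, and $Y_{i,t}$ the future 1‐year relapse indicator.

```latex
% Stage 1 (phenotyping): LASSO linear regression on the log relapse count
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0, \beta}
  \frac{1}{2n} \sum_{i=1}^{N} \sum_{t=1}^{T(i)}
  \left( \log(RH_{i,t} + 1) - \beta_0 - X_{i,t}'\beta \right)^2
  + \lambda \lVert \beta \rVert_1,
\qquad n = \sum_{i=1}^{N} T(i)

% Imputed prior 1-year log relapse count
\widehat{RH}_{i,t} = \hat{\beta}_0 + X_{i,t}'\hat{\beta}

% Stage 2, Model (A): logistic regression on ASRD plus the imputed history,
% fitted with its own L1 penalty on (\alpha, \gamma)
\operatorname{logit} P\left( Y_{i,t} = 1 \mid \mathrm{ASRD}_{i,t}, \widehat{RH}_{i,t} \right)
  = \alpha_0 + \alpha' \mathrm{ASRD}_{i,t} + \gamma \, \widehat{RH}_{i,t}
```

Because $\widehat{RH}_{i,t}$ enters the second stage as a single covariate, the high‐dimensional EHR features influence the final prediction only through that one imputed value.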

Model evaluation

To report model performance in the validation set, we computed AUC as well as sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F score, using a time‐dependent threshold set at the observed prevalence among observations within ±1 year of a patient’s time since the first MS relapse. AUC, sensitivity, and specificity are agnostic to outcome prevalence (which is relatively low in this study), whereas PPV, NPV, and F score (i.e., the harmonic mean of sensitivity and PPV) depend on outcome prevalence. We compared the two‐stage phenotyping‐prediction model to three LASSO logistic regression models trained without relapse history (models 1–3) and two models trained with relapse history (models 4–5): (1) ASRD alone, (2) ASRD + MS PheCode (335), (3) ASRD + EHR, (4) ASRD + RH, and (5) ASRD + RH + EHR (Fig. 2). We obtained the standard error estimates, 95% confidence intervals, and P values for comparisons of all models to the baseline ASRD model nonparametrically by bootstrapping with 1000 replicates.
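
The threshold‐based metrics named above can be computed from a confusion matrix as in the following minimal Python sketch. It is illustrative only: the function names and toy labels are hypothetical, and it uses a single fixed threshold rather than the time‐dependent, prevalence‐based threshold described in the text.

```python
def f_score(sensitivity, ppv):
    """Harmonic mean of sensitivity and PPV, as the F score is defined here."""
    if sensitivity + ppv == 0:
        return 0.0
    return 2 * sensitivity * ppv / (sensitivity + ppv)


def threshold_metrics(y_true, y_score, threshold):
    """Sensitivity, specificity, PPV, NPV, and F score at a fixed threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted_case = s >= threshold
        if predicted_case and y == 1:
            tp += 1
        elif predicted_case and y == 0:
            fp += 1
        elif not predicted_case and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    sens = ratio(tp, tp + fn)
    ppv = ratio(tp, tp + fp)
    return {
        "sensitivity": sens,
        "specificity": ratio(tn, tn + fp),
        "ppv": ppv,
        "npv": ratio(tn, tn + fn),
        "f_score": f_score(sens, ppv) if tp else 0.0,
    }


# Toy labels and scores, made up for illustration.
y = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
metrics = threshold_metrics(y, scores, threshold=0.5)
```

As a consistency check on the definition, `f_score(0.520, 0.199)` evaluates to about 0.288, which agrees with the F score reported for the baseline ASRD model in Table 2 given its sensitivity and PPV.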

Figure 2

Performance of models in predicting the future 1‐year MS relapse risk as measured by AUCs and F scores. ASRD (red), a baseline model comprising only basic clinical factors (age, sex, race/ethnicity, disease duration); ASRD + PheCode (dark blue), baseline model plus PheCode for MS; ASRD + EHR (light blue), baseline model plus selected EHR features that passed the feature selection process; ASRD + RH (dark green) and ASRD + RH + EHR (light green), baseline model plus actual prior 1‐year relapse history without and with selected EHR features, respectively; ASRD + RH^ (dark purple) and ASRD + RH^ + EHR (light purple), baseline model plus two‐stage phenotyping and prediction model without and with selected EHR features in the prediction stage, respectively. RH^ (equivalent to $\widehat{RH}$) denotes prior 1‐year relapse history imputed from EHR data using the phenotyping algorithm rather than actual relapse history (RH). Models were developed using the training set and evaluated on the held‐out validation set. 95% confidence intervals were computed nonparametrically via bootstrap with 1000 replicates.
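
A nonparametric bootstrap confidence interval for AUC, of the kind mentioned in the caption, could be sketched as follows. This is an assumption‐laden illustration rather than the authors' procedure: the function names are hypothetical, the interval is a simple percentile interval, and observations are resampled individually, whereas the study has repeated timepoints per patient and may have resampled at the patient level.

```python
import random


def auc(y_true, y_score):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    controls = [s for y, s in zip(y_true, y_score) if y == 0]
    if not cases or not controls:
        return float("nan")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC, resampling observations with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    replicates = []
    while len(replicates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[j] for j in idx]
        # Skip resamples that contain only cases or only controls.
        if 0 < sum(ys) < n:
            replicates.append(auc(ys, [y_score[j] for j in idx]))
    replicates.sort()
    lower = replicates[int(alpha / 2 * n_boot)]
    upper = replicates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy labels and scores, made up for illustration.
y = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
point_estimate = auc(y, scores)
ci_low, ci_high = bootstrap_auc_ci(y, scores, n_boot=1000, seed=0)
```

The same resampling loop can be reused for model comparisons by computing the difference in AUC between two models within each replicate.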

Data availability

Code for analysis and figure generation is available at https://tinyurl.com/MS‐Relapse‐Prediction. Anonymous data that support the findings of this study are available upon request to the corresponding author.

Results

Patient characteristics

MS patients in the training and validation sets were comparable, specifically with respect to the percentage of women, percentage of self‐reported non‐Hispanic Europeans, median age at the first MS diagnosis code, and median age at the first occurrence of (any) ICD, CPT, or CUI code in the EHR data, whereas the disease duration was slightly shorter in the validation set (Table 1). From 2006 to 2016, the annualized relapse rate (clinical and/or radiological) was low overall, though the rate in the training set (0.074 ± 0.003) was marginally lower than that in the validation set (0.116 ± 0.017).

Table 1

Demographics of the training and validation sets.

	Training set*	Validation set*	P‐value
Total number of patients	1435	186	NA
Sex, % Women	73.9%	74.2%	0.924
Race, % non‐Hispanic European	85.9%	84.9%	0.719
Median (IQR) age at first code ¹	43.3 (15.6)	43.7 (16.0)	0.109
Median (IQR) age at first ICD code for MS	43.3 (15.5)	43.5 (16.2)	0.151
Median (IQR) disease duration, years	5.12 (2.03)	4.37 (2.82)	<0.0001
Annualized relapse rate 2006–2016, mean (SD) ²	0.075 (0.002)	0.118 (0.009)	<0.0001

¹ The first of any ICD, CPT, or CUI code in the EHR data.

² Relapse type includes clinical, radiological, or both.

* The training set derives entirely from the CLIMB cohort, whereas the validation set is a random sample of MS patients from the Mass General Brigham (formerly known as the Partners) healthcare system (77 from CLIMB, none in the training set).


Prediction of 1‐year relapse probability

The primary objective was to develop models to predict the future risk of relapse within 1 year. As measured by both AUC and F score, a model comprising basic clinical features (age, sex, race/ethnicity, and disease duration) and prior 1‐year relapse history (ASRD + RH) performed the best in predicting future 1‐year relapse risk (Fig. 2, Table 2), significantly better than the baseline ASRD model, reflecting the important predictive value of prior relapse history. The addition of EHR features that passed the feature selection screening process to this model (ASRD + RH + EHR) diminished AUC and F score while markedly widening 95% confidence intervals (Fig. 2). Finally, the model comprising basic clinical features and selected EHR features (ASRD + EHR) without prior 1‐year relapse history did not significantly improve AUC or F score over the baseline model (ASRD) while also widening confidence intervals.

Table 2

Performance of models in predicting the future 1‐year MS relapse risk.

Models ¹	AUC	P ²	F score	P ³	Sensitivity	Specificity	PPV	NPV
ASRD	0.686		0.288		0.520	0.676	0.199	0.901
ASRD+PheCode	0.686	0.14	0.292	0.10	0.537	0.668	0.200	0.903
ASRD+EHR	0.695	0.56	0.319	0.25	0.509	0.738	0.232	0.906
ASRD+RH	0.712	<0.01	0.339	<0.01	0.478	0.791	0.262	0.907
ASRD+RH+EHR	0.700	0.15	0.319	0.07	0.459	0.780	0.245	0.903
ASRD+RH^	0.707	<0.01	0.307	<0.01	0.499	0.719	0.223	0.900
ASRD+RH^+EHR	0.696	0.43	0.318	0.09	0.501	0.743	0.233	0.906

¹ ASRD, a baseline model comprising only basic clinical factors (age, sex, race/ethnicity, disease duration); ASRD + PheCode, baseline model plus PheCode for MS; ASRD + EHR, baseline model plus selected EHR features that passed the feature selection process; ASRD + RH and ASRD + RH + EHR, baseline model plus actual prior 1‐year relapse history without and with selected EHR features, respectively; ASRD + RH^ and ASRD + RH^ + EHR, baseline model plus the two‐stage phenotyping and prediction model without and with selected EHR features in the prediction stage, respectively. RH^ differs from RH in that the former denotes prior 1‐year relapse history imputed from EHR data using the phenotyping algorithm, whereas the latter denotes actual prior 1‐year relapse history. Models were developed using the training set and performance was evaluated on the held‐out validation set. AUC and F score of all models were compared to the baseline model (ASRD).

² Comparison in AUC between each model and the baseline model (ASRD). P‐values were computed nonparametrically via bootstrap with 1000 replicates.

³ Comparison in F score between each model and the baseline model (ASRD). P‐values were computed nonparametrically via bootstrap with 1000 replicates.

Next, we built a phenotyping algorithm to impute past relapse history using EHR data for subsequent input into the future relapse risk prediction model, as we aimed to predict future relapse probability without using the actual relapse history RH. Although prior 1‐year relapse history is an important predictor of future relapse risk, it is often unavailable at the point of care, and chart review is time consuming. In the first part of a two‐stage model (phenotyping), we used selected EHR features (EHR) to impute contemporaneous RH for subsequent use in future relapse risk prediction. For this stage, LASSO linear regression achieved the highest AUC (0.790) and Spearman correlation (0.487) in the validation set (Table S1). As such, we used this model to impute $\widehat{RH}$. Among the 205 features that passed the feature selection screening process, the LASSO phenotyping algorithm selected 111 features (12 CPT codes, 60 CUIs, and 35 PheCodes) as informative of RH (Table S2). We found that age and disease duration were inversely associated with contemporaneous relapse. On the other hand, the CPT code for “MRI spine,” PheCodes for “optic neuritis” and “other demyelinating diseases of the CNS,” and CUIs for “intravenous steroid injection,” “Lhermitte’s sign,” and “flare” were positively associated with relapse, consistent with clinical experience. When examining the Spearman correlations among the 111 selected variables (Fig. 3), we found the vast majority of features to have pairwise correlations in the range of 0–0.2, suggesting that these variables conveyed sufficiently nonredundant information.

Figure 3

Heat map of pairwise correlations between prior relapse history (RH)‐predictive features selected by LASSO in the phenotyping stage.

Notably, the two‐stage model comprising basic clinical features and the imputed prior 1‐year relapse history based on the EHR‐based phenotyping algorithm (ASRD + $\widehat{RH}$) achieved an AUC and F score of 0.707 and 0.307, respectively, both significantly higher than the baseline model ASRD (AUC, P < 0.01; F score, P < 0.01) (Fig. 2, Table 2). Moreover, both the AUC and F score of the model were statistically noninferior to those of the model containing actual prior relapse history (ASRD + RH), as ascertained nonparametrically via bootstrap (AUC, P = 0.27; F score, P = 0.38). The ROC curve demonstrated that ASRD + $\widehat{RH}$ performed the best when setting the threshold for either high sensitivity (>~0.9) or high specificity (>~0.9), suggesting that it is best suited as either a high‐sensitivity screening tool or a high‐specificity prognostic algorithm (Fig. 4). The two‐stage model (ASRD + $\widehat{RH}$) also exhibited markedly narrower 95% confidence intervals than ASRD + EHR or ASRD + RH + EHR, suggesting that using EHR to impute RH in the phenotyping stage rather than in the final prediction model mitigated the variance‐increasing effect of the high‐dimensional EHR feature set (Fig. 2). The final prediction model (ASRD + $\widehat{RH}$) was driven by just three factors: age, disease duration, and prior 1‐year relapse history imputed from EHR data (see coefficients in Table S3). We demonstrated sample implementations of the model as applied to one low‐risk patient (who may benefit from standard‐efficacy DMT or perhaps no DMT and infrequent monitoring of MS disease activity) and one high‐risk patient (who may benefit from early initiation of higher‐efficacy DMT and frequent monitoring of disease activity) (Fig. S2).

Figure 4

Receiver operating characteristic curves of models for predicting the future 1‐year MS relapse probability. See Figure 2 description of ASRD, ASRD + RH, and ASRD + RH^.

Calibration of relapse risk probabilities

To evaluate the utility of the two‐stage model predictions as relapse probabilities rather than risk scores, we compared 1‐year predicted relapse probability to the proportions of patients experiencing an actual relapse in the same 1‐year period, stratified by disease duration and patient age. We found that the actual 1‐year relapse proportion declined significantly over disease duration (coefficient = −0.150, 95% CI [−0.191, −0.108], P < 2 × 10⁻¹⁶) and age (coefficient = −0.054, 95% CI [−0.065, −0.043], P < 2 × 10⁻¹⁶) (Fig. 5, Table S4), consistent with the notion that inflammatory disease activity in MS diminishes over time. In parallel, the predicted relapse probabilities also significantly declined with both disease duration (coefficient = −0.226, 95% CI [−0.242, −0.211], P < 2 × 10⁻¹⁶) and age (coefficient = −0.060, 95% CI [−0.064, −0.056], P < 2 × 10⁻¹⁶). The mean rates of decline in predicted relapse probability over age and disease duration were comparable to those of the actual relapse proportion. These results support the utility of the two‐stage model as an unbiased predictor of 1‐year relapse risk. By effectively leveraging EHR information to predict relapse, the two‐stage model allows for a more precise, personalized prediction of risk than a predictor using age and disease duration information alone.
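
The stratified comparison of observed relapse proportions with mean predicted probabilities can be sketched in a few lines of Python. This is a simplified illustration, not the authors' analysis: the function name, stratum labels, and toy values are hypothetical, and it reports per‐stratum summaries only, without the regression coefficients given in the text.

```python
from collections import defaultdict


def calibration_by_stratum(strata, y_true, y_prob):
    """Return {stratum: (n, observed_proportion, mean_predicted_probability)}."""
    groups = defaultdict(list)
    for stratum, y, p in zip(strata, y_true, y_prob):
        groups[stratum].append((y, p))
    summary = {}
    for stratum in sorted(groups):
        pairs = groups[stratum]
        n = len(pairs)
        observed = sum(y for y, _ in pairs) / n
        predicted = sum(p for _, p in pairs) / n
        summary[stratum] = (n, observed, predicted)
    return summary


# Toy data, made up for illustration: disease-duration bins, outcomes, predictions.
duration_bin = ["0-5y"] * 4 + ["5-10y"] * 4
relapsed = [1, 0, 1, 0, 0, 0, 1, 0]
predicted_prob = [0.4, 0.3, 0.5, 0.4, 0.2, 0.1, 0.3, 0.2]

table = calibration_by_stratum(duration_bin, relapsed, predicted_prob)
for stratum, (n, observed, predicted) in table.items():
    print(f"{stratum}: n={n}, observed={observed:.2f}, predicted={predicted:.2f}")
```

A well‐calibrated model shows observed and predicted columns that track each other across strata, which is the pattern Figure 5 is described as showing for disease duration and age.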

Figure 5

Relapse trend. Proportion of patients experiencing actual MS relapse (red) and mean predicted future 1‐year relapse probability based on the two‐stage model (blue) as a function of MS disease duration (left) and patient age (right). 95% confidence intervals for the predictive model were computed nonparametrically via bootstrap with 1000 replicates.

Supplementary analysis

We developed a two‐stage model for predicting 2‐year relapse risk (Tables S5 and S6, Fig. S1, Supplementary Material). While this model outperformed baseline predictors, only the AUC improvement was statistically significant. We performed two exploratory analyses to (1) demonstrate the stability of the model trained on data from 2006 to 2016 and (2) quantify the improvement of the two‐stage model over baseline model in PPV and NPV when setting the threshold to achieve >95% specificity and >95% sensitivity as well as the number of additional high‐risk and low‐risk patients correctly identified per 100 tested for future 1‐year relapse probability (Supplementary Material).

Discussion

The ability to accurately predict future relapse at the point of care will improve clinical decision making, particularly in selecting MS treatment. We report a novel two‐stage model for predicting a patient’s 1‐year MS relapse risk that incorporates imputed prior relapse history based on EHR data. This model does not require knowledge of an MS patient’s prior relapse frequency, a key predictor often unavailable at the point of care. Achieving clinically actionable accuracy (AUC = 0.707), this final model performed significantly better than baseline models and was noninferior to a predictive model containing actual relapse history. Furthermore, the model‐predicted relapse probability declined with disease duration and patient age similar to trends seen with actual relapse proportion, suggesting that it produces a clinically meaningful estimate of a patient’s relapse risk over the course of this chronic disease.

This study builds on our prior work on integrating EHR and research registry data to develop high‐dimensional models for not just classifying MS diagnosis but also estimating a key measure of neurological disability in MS that is not part of routine medical records (the multiple sclerosis severity score). Following the latest developments in EHR data analytics for phenotyping disease outcomes, our approach leverages the rich complexity of the available EHR data by incorporating a variety of codified and narrative variables in our algorithm. The novelty of the two‐stage model lies in using high‐dimensional EHR data to impute the key predictor (past relapse history, which captures important aspects of individual disease profile) in the phenotyping stage rather than in the future relapse prediction stage. This method mitigates variance increase due to the high feature dimensionality of the EHR data while preserving accuracy, bypassing the practical bottleneck of the labor‐intensive chart review process for ascertaining prior relapse history and improving the explicability of the final model.

While the improvement over the baseline model is modest, the final prediction model achieved performance comparable to other clinical prediction algorithms. For comparison, the classic Framingham Risk Score for predicting coronary heart disease has an AUC in the 0.6–0.75 range. The final relapse prediction model comprised only three familiar factors (age, disease duration, and imputed number of relapses in the prior year). These predictors are consistent with prior literature. In planning the model development, we consciously avoided including DMT history among the potential features because we plan to use the predicted relapse risk as outcomes in future analyses evaluating efficacy in reducing relapse across DMTs and because the inclusion of specific DMTs might limit its future application given the ever‐growing number of DMT options. We also did not include MRI features as we originally focused on building a parsimonious model comprising clinical predictors readily available from the EHR data. We plan to incorporate MRI features in future iterations of the model.

This study is limited by potential selection bias and uncertain generalizability. The relapse prediction algorithm was developed using participants from a research cohort (CLIMB) and tested on patients within the same tertiary academic hospital system (MGB). Given that routine EHR data rarely capture recorded relapse events systematically, using research registry data to train models of relapse prediction is a necessity. Additional validation in other healthcare settings is warranted. If externally validated, the relapse risk prediction model can be integrated at the point of care to systematically identify MS patients at high risk of relapse and alert clinicians in selecting the appropriate DMTs.

In summary, our novel model predicts 1‐year MS relapse risk with accuracy comparable to other clinical prediction algorithms and with potential applicability at the point of care. Our EHR‐based two‐stage approach for MS relapse imputation and temporal relapse prediction may have application to other complex neurological outcomes apart from MS.

Conflict of Interest

The authors have declared that no conflict of interest relevant to this study exists.

File S1. Supplementary material on the development and performance of a two‐stage model for predicting future 2‐year relapse probability, exploratory analysis of the stability of the two‐stage future 1‐year relapse probability prediction model from 2006 to 2016, and exploratory analysis of the improvement of the two‐stage future 1‐year relapse probability prediction model over the baseline model.
Table S1. Performance of models for the phenotyping or prediction of contemporaneous 1‐year MS relapse history.
Table S2. LASSO linear regression coefficients of the phenotyping algorithm for imputing contemporaneous 1‐year relapse history in the two‐stage model.
Table S3. LASSO logistic regression coefficients of the prediction algorithm for future 1‐year relapse probability in the two‐stage model.
Table S4. Logistic regression coefficients and Spearman correlations of actual relapse proportions and relapse probabilities predicted by the two‐stage model over disease duration and age.
Table S5. Performance of models in predicting the future 2‐year MS relapse risk.
Table S6. Performance of models for the phenotyping stage of imputing contemporaneous 2‐year MS relapse history.
Figure S1. Performance of models in predicting the future 2‐year MS relapse risk as measured by AUCs and F scores.
Figure S2. Sample implementation of the two‐stage model of relapse prediction in two representative patients.
