Zongqi Xia, Elizabeth Secor, Lori B Chibnik, Riley M Bove, Suchun Cheng, Tanuja Chitnis, Andrew Cagan, Vivian S Gainer, Pei J Chen, Katherine P Liao, Stanley Y Shaw, Ashwin N Ananthakrishnan, Peter Szolovits, Howard L Weiner, Elizabeth W Karlson, Shawn N Murphy, Guergana K Savova, Tianxi Cai, Susanne E Churchill, Robert M Plenge, Isaac S Kohane, Philip L De Jager.
Abstract
OBJECTIVE: To optimally leverage the scalability and unique features of electronic health records (EHR) for research that would ultimately improve patient care, we need to accurately identify patients and extract clinically meaningful measures. Using multiple sclerosis (MS) as a proof of principle, we showcased how to leverage routinely collected EHR data to identify patients with a complex neurological disorder and derive an important surrogate measure of disease severity heretofore only available in research settings.
Year: 2013 PMID: 24244385 PMCID: PMC3823928 DOI: 10.1371/journal.pone.0078927
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Overall approach for developing the EHR algorithm to classify multiple sclerosis (A), and to derive surrogate measures of brain parenchymal fraction and multiple sclerosis severity score in MS patients (B).
Performance of the four models of the EHR algorithm for identifying multiple sclerosis patients (at 95% specificity).
| Model | Sensitivity (SE) | PPV (SE) | NPV (SE) | AUC (SE) |
| ICD | 0.600 (0.058) | 0.894 (0.013) | 0.769 (0.029) | 0.890 (0.013) |
| COD | 0.764 (0.038) | 0.916 (0.007) | 0.849 (0.023) | 0.937 (0.010) |
| NLP | 0.758 (0.034) | 0.914 (0.006) | 0.849 (0.021) | 0.941 (0.008) |
| ALL | 0.827 (0.024) | 0.921 (0.006) | 0.888 (0.017) | 0.958 (0.006) |
The ICD model uses the number of ICD-9 codes for MS as the only variable. The codified (COD) model includes other codified variables in addition to the number of ICD-9 codes for MS. The NLP model includes narrative variables extracted from clinical texts. The combined (ALL) model uses both codified and narrative variables. Performance parameters are calculated using 0.632 bootstrap cross-validation in the training set. The standard errors are estimated from 1,000 bootstrap replications.
Abbreviations: AUC, area under the curve; NPV, negative predictive value; PPV, positive predictive value; SE, standard error of the estimates.
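The sensitivity, PPV, and NPV above are reported at a fixed 95% specificity. A minimal sketch of how such metrics can be read off a set of classifier scores follows. The function names, the toy scores, and the rule of taking the lowest threshold that meets the target specificity are illustrative assumptions, not the authors' code, and the 0.632 bootstrap step is not shown.

```python
def confusion(scores, labels, threshold):
    """Count TP, FP, TN, FN when 'score >= threshold' means predicted MS."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted and y == 0:
            fp += 1
        elif not predicted and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics_at_specificity(scores, labels, target_spec=0.95):
    """Return metrics at the lowest observed score threshold whose
    specificity is at least target_spec (assumes both classes are present)."""
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion(scores, labels, t)
        specificity = tn / (tn + fp)
        if specificity >= target_spec:
            return {
                "threshold": t,
                "specificity": specificity,
                "sensitivity": tp / (tp + fn),
                "ppv": tp / (tp + fp),
                "npv": tn / (tn + fn),
            }
    raise ValueError("no observed threshold reaches the target specificity")


# Toy data (hypothetical): 20 non-MS patients and 5 MS patients.
neg_scores = [i / 100 for i in range(1, 20)] + [0.90]
pos_scores = [0.15, 0.50, 0.60, 0.70, 0.95]
scores = neg_scores + pos_scores
labels = [0] * len(neg_scores) + [1] * len(pos_scores)

result = metrics_at_specificity(scores, labels)
print(result)
```

Fixing specificity first and then reporting the remaining metrics makes models with different score scales comparable, which is how the four rows of the table can be read side by side.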
Characteristics of the EHR-derived cohort of multiple sclerosis (MS) patients and the subset of patients who receive care at a subspecialty MS Center.
| Parameter | EHR-derived Cohort (n = 5,495) | MS Center Subset (n = 4,241) |
| Sex (% female) | 73% | 73% |
| Race/Ethnicity (% non-Hispanic white) | 72% | 75% |
| Age at first ICD-9 code for MS, years (median [Q1–Q3]) | 41 [33–49] | 40 [32–49] |
| Duration of follow-up, years (median [Q1–Q3]) | 8.4 [3.5–13.7] | 9.1 [4.3–14.4] |
| Number of ICD-9 codes for MS per patient (median [Q1–Q3]) | 22 [8–49] | 26 [9–55] |
| Number of brain MRIs per patient (median [Q1–Q3]) | 6 | 8 |
| Number of cervical spine MRIs per patient (median [Q1–Q3]) | 4 | 4 |
| Number of entries by an MS neurologist per patient (median [Q1–Q3]) | 47 [13–124] | 56 [12–139] |
| Number of prescriptions for MS disease-modifying treatment per patient (median [Q1–Q3]) | 5 [2–13] | 6 [3–15] |
| Receiving MS disease-modifying treatment, % | 49% | 55% |
A subset of the patients in the EHR-derived MS cohort receives neurological care at the Partners MS Center, where neuroimaging and clinical outcomes are available. For comparison, our cohort has basic demographic characteristics similar to those of an independent MS patient registry from the North American Research Committee on Multiple Sclerosis (NARCOMS): 73% of the NARCOMS patients are female, 90% are self-described White, mean age at diagnosis is 37 years, and 52% of the patients are receiving immune modulatory therapy [32].
Note: ICD-9 code 340 is the diagnostic code for MS.
Figure 2. Density distribution of the performance (adjusted R²) of the EHR algorithm for deriving brain parenchymal fraction (A), and multiple sclerosis severity score (B).
Performance is measured as the proportion of variance in the true outcome that is explained by the derived outcome, after adjusting for the number of variables in the model.
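For readers unfamiliar with the adjustment, the sketch below uses the standard textbook definition of adjusted R², which penalizes R² by the number of predictors. The source does not state the exact formula used, so treat this as an assumption; the function name and the example numbers are hypothetical.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Standard adjusted R^2: 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    r2           : ordinary (unadjusted) coefficient of determination
    n_samples    : number of patients used to fit the model (n)
    n_predictors : number of variables in the model, excluding the intercept (p)
    """
    if n_samples - n_predictors - 1 <= 0:
        raise ValueError("need more samples than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Hypothetical example: R^2 of 0.50 from 101 patients and 10 EHR variables.
print(round(adjusted_r2(0.50, n_samples=101, n_predictors=10), 4))
```

With the sample size held fixed, adding variables lowers the adjusted value unless they raise R² enough to compensate, which is why it suits comparing models built with different numbers of EHR variables.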
EHR-derived MS severity score (MSSS) captures the difference between progressive MS and relapsing-remitting MS patients.
| Outcome | Discovery Set: PPMS/SPMS (n = 34), Mean (SE) | Discovery Set: RRMS (n = 295), Mean (SE) | Discovery Set: P value | Validation Set: PPMS/SPMS (n = 25), Mean (SE) | Validation Set: RRMS (n = 188), Mean (SE) | Validation Set: P value |
| Observed MSSS | 3.86 (0.27) | 0.86 (0.10) | 1.55E-23 | 4.36 (0.29) | 0.73 (0.11) | 3.32E-26 |
| Derived MSSS | 2.90 (0.18) | 0.98 (0.06) | 9.42E-23 | 3.22 (0.18) | 1.10 (0.07) | 1.56E-12 |
Observed MSSS is based on actual data from MS patients who receive care at the Partners MS Center. Derived MSSS is based on the algorithm with a 40% frequency cut-off for EHR variables.
Patients with known MS disease category were divided into a discovery set (n = 329, including 34 PPMS/SPMS patients and 295 RRMS patients) and a validation set (n = 213, including 25 PPMS/SPMS patients and 188 RRMS patients). For the observed measure of MSSS, ANOVA was performed and the comparison was adjusted for sex, age of symptom onset, and disease duration as covariates. For the derived surrogate measure of MSSS, a t-test was performed. The effects of sex, age of symptom onset, and disease duration are accounted for in the derivation of the surrogate measure of MSSS.
Abbreviations: BPF, brain parenchymal fraction; MSSS, multiple sclerosis severity score; PPMS, primary progressive multiple sclerosis; RRMS, relapsing-remitting multiple sclerosis; SPMS, secondary progressive multiple sclerosis.
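As a rough illustration of the size of the group difference in derived MSSS, a two-sample t statistic can be computed directly from the reported means and standard errors. This sketch assumes an unequal-variance (Welch-style) statistic, which the source does not specify, so it is not expected to reproduce the reported P values exactly; the function name is hypothetical.

```python
import math


def t_from_means_and_se(mean_a, se_a, mean_b, se_b):
    """Two-sample t statistic from group means and their standard errors.

    Assumes independent groups and unequal variances, so the standard
    error of the difference is sqrt(se_a**2 + se_b**2).
    """
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Derived MSSS, PPMS/SPMS versus RRMS, values taken from the table above.
t_discovery = t_from_means_and_se(2.90, 0.18, 0.98, 0.06)
t_validation = t_from_means_and_se(3.22, 0.18, 1.10, 0.07)
print(f"discovery t = {t_discovery:.2f}, validation t = {t_validation:.2f}")
```

Both statistics come out at roughly ten standard errors, consistent with the very small P values reported for the derived measure.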