Emily Kogan¹, Kathryn Twyman², Jesse Heap², Dejan Milentijevic³, Jennifer H Lin³, Mark Alberts⁴
Abstract
BACKGROUND: Stroke severity is an important predictor of patient outcomes and is commonly measured with National Institutes of Health Stroke Scale (NIHSS) scores. Because these scores are often recorded as free text in physician reports, structured real-world evidence databases seldom include stroke severity. The aim of this study was to use machine learning models to impute NIHSS scores for all patients with newly diagnosed stroke from multi-institution electronic health record (EHR) data.
Keywords: Database; Outcomes research; Real-world evidence
Year: 2020 PMID: 31914991 PMCID: PMC6950922 DOI: 10.1186/s12911-019-1010-x
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Schematic diagram of study design, including timeline and patient inclusion requirements. EHR: electronic health record; NIHSS: National Institutes of Health Stroke Scale
Fig. 2 NIHSS distribution. Distribution of (a) actual NLP-extracted NIHSS scores and (b) model-imputed NIHSS scores for the hold-out test set (n = 1033). Model-imputed scores were rounded down to the nearest integer. NLP: natural language processing
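The caption notes that continuous model outputs were rounded down before being compared with the extracted scores. The following is a minimal sketch of that post-processing step. Clipping to the 0 to 42 NIHSS range is an added assumption, not something the caption states, and the helper names are hypothetical.

```python
import math
from collections import Counter

# NIHSS total scores range from 0 to 42; clipping to this range is an assumption.
NIHSS_MIN, NIHSS_MAX = 0, 42

def to_integer_scores(imputed):
    """Round continuous model outputs down to the nearest integer, then clip to the NIHSS range."""
    return [min(NIHSS_MAX, max(NIHSS_MIN, math.floor(x))) for x in imputed]

def score_distribution(scores):
    """Count patients at each integer NIHSS score, ordered by score (as in a histogram)."""
    counts = Counter(scores)
    return {s: counts[s] for s in sorted(counts)}

imputed = [0.4, 1.9, 2.0, 2.7, 6.3, -0.2, 43.1]
scores = to_integer_scores(imputed)
print(scores)                      # [0, 1, 2, 2, 6, 0, 42]
print(score_distribution(scores))  # {0: 2, 1: 1, 2: 2, 6: 1, 42: 1}
```

Flooring instead of rounding to the nearest integer shifts every imputed score down by up to one point. That is worth keeping in mind when comparing panels (a) and (b).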
Fig. 3 Feature selection based on cross-validated model performance on the training data. Mean model performance across 3-fold cross-validation subsets was used for feature selection. Maximum cross-validated performance was achieved with the top 226 features; however, gains in performance were minimal beyond 100 features
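The caption describes choosing how many top-ranked features to keep by comparing mean 3-fold cross-validated performance. Below is a minimal pure-Python sketch of that loop. It assumes the feature columns are already sorted by importance. A k-nearest-neighbour regressor and negative mean absolute error stand in for the study's model and metric, which are not stated here. `select_feature_count` also returns the smallest feature count whose score falls within a hypothetical tolerance of the best. This mirrors the observation that gains flattened well before the maximum.

```python
import random

def k_fold_indices(n, n_folds=3, seed=0):
    """Shuffle row indices and split them into n_folds roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]

def knn_predict(train_X, train_y, x, k_feat, n_neighbors=5):
    """Stand-in model: mean target of the nearest training rows, using only the first k_feat features."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row[:k_feat], x[:k_feat])), target)
        for row, target in zip(train_X, train_y)
    )
    nearest = dists[:n_neighbors]
    return sum(t for _, t in nearest) / len(nearest)

def cv_score(X, y, k_feat, folds):
    """Mean negative MAE across folds for a model restricted to the top k_feat features."""
    fold_scores = []
    for test_idx in folds:
        test_set = set(test_idx)
        train = [i for i in range(len(X)) if i not in test_set]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        errors = [abs(knn_predict(train_X, train_y, X[i], k_feat) - y[i]) for i in test_idx]
        fold_scores.append(-sum(errors) / len(errors))
    return sum(fold_scores) / len(fold_scores)

def select_feature_count(mean_scores, tolerance):
    """mean_scores[k-1] is the CV score with the top k features (higher is better).
    Returns (best k, smallest k within `tolerance` of the best score)."""
    ks = range(1, len(mean_scores) + 1)
    best_k = max(ks, key=lambda k: mean_scores[k - 1])
    best = mean_scores[best_k - 1]
    compact_k = min(k for k in ks if best - mean_scores[k - 1] <= tolerance)
    return best_k, compact_k

# Synthetic data: only the first two (highest-ranked) of six features carry signal.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(90)]
y = [3 * row[0] + 2 * row[1] + rng.gauss(0, 0.5) for row in X]
folds = k_fold_indices(len(X), n_folds=3)
scores = [cv_score(X, y, k, folds) for k in range(1, 7)]
best_k, compact_k = select_feature_count(scores, tolerance=0.05)
print(best_k, compact_k)
```

Reusing the same folds for every feature count keeps the comparison between counts fair, because each count is scored on identical splits.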
Patient Demographics and Characteristics
| Characteristic | Training set (n = 6116) | Hold-out test set (n = 1033) | Overall population (n = 7149) |
|---|---|---|---|
| Demographics | | | |
| Age, mean (SD) | 66 (14) | 67 (14) | 66 (14) |
| Female, n (%) | 3196 (52) | 568 (55) | 3764 (53) |
| Region | | | |
| Northeast, n (%) | 464 (8) | 84 (8) | 548 (8) |
| Midwest, n (%) | 2388 (39) | 389 (38) | 2777 (39) |
| South, n (%) | 2957 (48) | 496 (48) | 3453 (48) |
| West, n (%) | 186 (3) | 40 (4) | 226 (3) |
| Other/Unknown, n (%) | 121 (2) | 24 (2) | 145 (2) |
| EHR data | | | |
| NIHSS, median (IQR) | 2 (6) | 2 (6) | 2 (6) |
| LOS, median (IQR) | 3 (5) | 2 (4) | 3 (5) |
| Type of strokeᵃ | | | |
| Ischemic, n (%) | 4328 (70.8) | 710 (68.7) | 5038 (70.5) |
| Hemorrhagic, n (%) | 605 (10.0) | 113 (10.9) | 718 (10.0) |
| TIA, n (%) | 2235 (36.5) | 384 (37.2) | 2619 (36.6) |
| Charlson Comorbidity Indexᵇ, median (IQR) | 1 (3) | 1 (3) | 1 (3) |
SD standard deviation, EHR Electronic Health Record, NIHSS National Institutes of Health Stroke Scale, IQR interquartile range, LOS length of stay, TIA transient ischemic attack
ᵃ Based on ICD diagnosis codes recorded during the stroke event; more than one type may be coded per stroke event, so percentages can sum to more than 100%
ᵇ Calculated from patients' diagnosis codes recorded before the stroke [15]
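Footnote b says the Charlson Comorbidity Index was derived from pre-stroke diagnosis codes. The sketch below shows only the scoring step, using the original Charlson category weights. It also applies the common convention of counting only the more severe form when both a mild and a severe form of the same condition are present. The variant actually used in [15] may differ. The category names are hypothetical, and mapping ICD codes to these categories is assumed to happen upstream.

```python
# Original Charlson weights per comorbidity category; category keys are hypothetical names.
CHARLSON_WEIGHTS = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "peripheral_vascular_disease": 1,
    "cerebrovascular_disease": 1,
    "dementia": 1,
    "chronic_pulmonary_disease": 1,
    "connective_tissue_disease": 1,
    "peptic_ulcer_disease": 1,
    "mild_liver_disease": 1,
    "diabetes_uncomplicated": 1,
    "hemiplegia": 2,
    "moderate_severe_renal_disease": 2,
    "diabetes_end_organ_damage": 2,
    "any_malignancy": 2,
    "leukemia": 2,
    "lymphoma": 2,
    "moderate_severe_liver_disease": 3,
    "metastatic_solid_tumor": 6,
    "aids": 6,
}

# When both forms are coded, only the more severe one is counted (assumed convention).
SUPERSEDED_BY = {
    "mild_liver_disease": "moderate_severe_liver_disease",
    "diabetes_uncomplicated": "diabetes_end_organ_damage",
    "any_malignancy": "metastatic_solid_tumor",
}

def charlson_index(categories):
    """Sum Charlson weights over the distinct comorbidity categories a patient has; unknown names are ignored."""
    present = {c for c in categories if c in CHARLSON_WEIGHTS}
    for milder, severe in SUPERSEDED_BY.items():
        if milder in present and severe in present:
            present.discard(milder)
    return sum(CHARLSON_WEIGHTS[c] for c in present)

print(charlson_index(["congestive_heart_failure", "diabetes_uncomplicated", "diabetes_end_organ_damage"]))  # 3
```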
Fig. 4 Imputed versus actual NIHSS scores in the hold-out test cohort. Lighter points represent single patients; darker points represent multiple overlapping patients
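A scatter of imputed against actual scores is often summarized with a few agreement statistics. The sketch below computes mean absolute error, root-mean-square error, and Pearson correlation, and counts how many patients share each (actual, imputed) point, which is what the darker overlapping markers encode. These metrics are illustrative assumptions and are not necessarily the ones reported in the study. The example values are made up.

```python
import math
from collections import Counter

def agreement_metrics(actual, imputed):
    """MAE, RMSE and Pearson r between actual and imputed scores for the same patients."""
    if len(actual) != len(imputed) or len(actual) < 2:
        raise ValueError("need two equal-length lists with at least two patients")
    n = len(actual)
    mae = sum(abs(a - p) for a, p in zip(actual, imputed)) / n
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, imputed)) / n)
    mean_a, mean_p = sum(actual) / n, sum(imputed) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, imputed))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in imputed)
    # Pearson r is undefined (division by zero) if either list is constant.
    pearson_r = cov / math.sqrt(var_a * var_p)
    return {"mae": mae, "rmse": rmse, "pearson_r": pearson_r}

def overlap_counts(actual, imputed):
    """Patients per (actual, imputed) point; counts above 1 correspond to darker, overlapping markers."""
    return Counter(zip(actual, imputed))

actual = [0, 2, 2, 5, 10]
imputed = [1, 2, 2, 4, 8]
metrics = agreement_metrics(actual, imputed)
print(metrics["mae"])                          # 0.8
print(overlap_counts(actual, imputed)[(2, 2)])  # 2
```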