Juan M Banda, Ashish Sarraju, Fahim Abbasi, Justin Parizo, Mitchel Pariani, Hannah Ison, Elinor Briskin, Hannah Wand, Sebastien Dubois, Kenneth Jung, Seth A Myers, Daniel J Rader, Joseph B Leader, Michael F Murray, Kelly D Myers, Katherine Wilemon, Nigam H Shah, Joshua W Knowles.
Abstract
Familial hypercholesterolemia (FH) is an underdiagnosed dominant genetic condition that affects approximately 0.4% of the population and, if untreated, confers up to a 20-fold increased risk of coronary artery disease. Simple screening strategies have false positive rates greater than 95%. As part of the FH Foundation's FIND FH initiative, we developed a classifier to identify potential FH patients using electronic health record (EHR) data at Stanford Health Care. We trained a random forest classifier using data from known patients (n = 197) and matched non-cases (n = 6590). On a held-out test set, our classifier obtained a positive predictive value (PPV) of 0.88 and a sensitivity of 0.75. We evaluated the accuracy of the classifier's predictions by chart review of 100 patients at risk of FH who were not included in the original dataset. The classifier correctly flagged 84% of patients at the highest probability threshold, with performance decreasing as the threshold was lowered. In external validation on 466 FH patients (236 with genetically proven FH) and 5000 matched non-cases from the Geisinger Healthcare System, our FH classifier achieved a PPV of 0.85. Our EHR-derived FH classifier is effective in finding candidate patients for further FH screening. Such machine-learning-guided strategies can lead to effective identification of the highest-risk patients for enhanced management strategies.
Keywords: Health care; Translational research
Year: 2019 PMID: 31304370 PMCID: PMC6550268 DOI: 10.1038/s41746-019-0101-5
Source DB: PubMed Journal: NPJ Digit Med ISSN: 2398-6352
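The abstract's point that simple screens produce more than 95% false positives for a condition with roughly 0.4% prevalence follows from Bayes' rule: when prevalence is low, even a modest false positive rate among the many unaffected patients swamps the true positives. The minimal sketch below computes PPV from sensitivity, specificity, and prevalence; the 90%/90% screen in the usage lines is a hypothetical example, not a strategy evaluated in the paper.

```python
def ppv_at_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value implied by Bayes' rule for a given disease prevalence."""
    p_flagged_with_fh = sensitivity * prevalence                       # P(flagged and has FH)
    p_flagged_without_fh = (1.0 - specificity) * (1.0 - prevalence)    # P(flagged and no FH)
    return p_flagged_with_fh / (p_flagged_with_fh + p_flagged_without_fh)


if __name__ == "__main__":
    # Hypothetical screen: 90% sensitivity and 90% specificity at ~0.4% prevalence.
    ppv = ppv_at_prevalence(0.9, 0.9, 0.004)
    print(f"PPV: {ppv:.3f}")                                     # PPV: 0.035
    print(f"Share of positives that are false: {1 - ppv:.3f}")  # ...: 0.965
```

The same function shows why the evaluation prevalence matters: raising the prevalence (for example, to the 1:70 used for the Geisinger evaluation) raises the PPV of an unchanged classifier.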
Fig. 1 Classifier building followed by internal and external evaluation, as well as evaluation via chart review, for EHR-based FH case identification
Classifier performance at internal and external sites
| Metric | Internal evaluation (Stanford) | External evaluation (Geisinger), mean (SD) |
|---|---|---|
| AUROC | 0.94 | 0.94 (0.003) |
| AUPRC | 0.71 | 0.68 (0.054) |
| PPV | 0.88 | 0.85 (0.002) |
| Sensitivity | 0.75 | 0.68 (0.002) |
| Specificity | 0.99 | 0.99 (0.001) |
| F1 Score | 0.81 | 0.75 (0.004) |
For the internal evaluation, the table reports performance metrics on a held-out test set. For the external evaluation, the table reports the average performance over 10 iterations, each classifying 71 randomly sampled cases and 4970 randomly sampled non-cases (a 1:70 prevalence, which mirrors the expected prevalence in a lipid clinic). The numbers in parentheses are the standard deviations of each metric across iterations.
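The external-evaluation protocol in the note above (repeatedly subsampling cases and non-cases at a fixed 1:70 ratio, scoring each draw, and reporting mean and standard deviation) can be sketched as follows. This is a minimal illustration, assuming the classifier's binary predictions are already available for every Geisinger case and non-case and that each draw samples without replacement; the function names and the seed are hypothetical.

```python
import random
import statistics


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """PPV, sensitivity, specificity and F1 from confusion-matrix counts."""
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity) if ppv + sensitivity else 0.0
    return {"ppv": ppv, "sensitivity": sensitivity, "specificity": specificity, "f1": f1}


def subsampled_evaluation(case_preds, noncase_preds, n_cases=71, n_noncases=4970,
                          iterations=10, seed=0):
    """Average metrics over repeated fixed-prevalence subsamples.

    case_preds / noncase_preds: lists of booleans, True meaning the classifier flagged
    the patient (assumed input format). Returns {metric: (mean, standard deviation)}.
    """
    rng = random.Random(seed)
    runs = []
    for _ in range(iterations):
        cases = rng.sample(case_preds, n_cases)           # sampled without replacement
        noncases = rng.sample(noncase_preds, n_noncases)
        tp = sum(cases)                                   # flagged true cases
        fn = n_cases - tp
        fp = sum(noncases)                                # flagged non-cases
        tn = n_noncases - fp
        runs.append(confusion_metrics(tp, fp, tn, fn))
    summary = {}
    for name in runs[0]:
        values = [run[name] for run in runs]
        summary[name] = (statistics.mean(values), statistics.stdev(values))
    return summary
```

Because F1 is computed per iteration and then averaged, the averaged F1 need not equal the F1 obtained from the averaged PPV and sensitivity.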
Fig. 2 Distribution of FH cases according to the probability assigned by the random forest classifier, as arbitrated by independent chart review at Stanford
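Fig. 2 and the abstract relate chart-review agreement to the classifier's predicted probability: agreement is highest at the top threshold and declines as the threshold is lowered. A minimal sketch of that analysis, assuming each reviewed patient is represented as a hypothetical `(probability, confirmed)` pair, reports for each threshold how many patients are flagged at or above it and what fraction of them chart review confirmed. The thresholds and records in any call are illustrative, not the study's data.

```python
def confirmed_fraction_at_thresholds(records, thresholds):
    """For each threshold, count patients at or above it and the fraction confirmed as FH.

    records: iterable of (predicted_probability, confirmed_by_chart_review) pairs.
    Returns {threshold: (n_flagged, fraction_confirmed)}; the fraction is None
    when no patient reaches the threshold.
    """
    records = list(records)
    results = {}
    for threshold in thresholds:
        flagged = [confirmed for prob, confirmed in records if prob >= threshold]
        fraction = sum(flagged) / len(flagged) if flagged else None
        results[threshold] = (len(flagged), fraction)
    return results
```

The fraction confirmed at a given threshold is the chart-review PPV among patients flagged at that threshold, so lowering the threshold trades a larger flagged group for a lower confirmation rate.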
Top 20 features in the classifier that flag patients with FH
| # | Feature_ID | Source | Feature explanation |
|---|---|---|---|
| 1 | text:40094263 | Unstructured | Mention of LDL cholesterol in doctors' notes |
| 2 | lab:3027114:BIN5 | Structured | |
| 3 | text:457658075 | Unstructured | Mention of a visit to a Cardiology clinic |
| 4 | cond:448359416 | Structured | A diagnosis code of Paroxysmal supraventricular tachycardia |
| 5 | drugEx:15459583 | Structured | A prescription of atorvastatin |
| 6 | lab:3028288:BIN4 | Structured | |
| 7 | drugEx:15264753 | Structured | A prescription of ezetimibe |
| 8 | text:40372345 | Unstructured | Mention of ‘Red meat’ (indicative of diet conversations) |
| 9 | lab:3028288:BIN5 | Structured | |
| 10 | lab:3009966:BIN4 | Structured | |
| 11 | text:42897633 | Unstructured | Mention of ‘Lipid’ in doctors' notes |
| 12 | lab:3025839:BIN5 | Structured | |
| 13 | text:45957223 | Unstructured | A mention of ‘Triglycerides’ |
| 14 | drugEx:15108133 | Structured | A prescription of rosuvastatin |
| 15 | cond:448369299 | Structured | A diagnosis code of Mixed hyperlipidemia |
| 16 | cond:448276299 | Structured | A diagnosis code of Other and unspecified hyperlipidemia |
| 17 | drugEx:13070462 | Structured | A prescription of metoprolol |
| 18 | text:457636305 | Unstructured | A mention of rosuvastatin |
| 19 | lab:3027114:BIN4 | Structured | |
| 20 | text:4230588 | Unstructured | A mention of ‘Cytologic’ |
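The Feature_ID values in the table follow a `prefix:identifier` pattern, with lab features carrying an extra `:BINk` suffix, and the Source column maps `text` to Unstructured and `lab`, `cond`, and `drugEx` to Structured. The sketch below parses that pattern. Reading `BINk` as the index of a discretized lab-value bin, and treating the identifier as an integer, are assumptions drawn only from the ID format; the names `FeatureId`, `parse_feature_id`, and `source_of` are hypothetical.

```python
from typing import NamedTuple, Optional

# Prefix-to-source mapping as read off the table's Source column.
STRUCTURED_PREFIXES = frozenset({"lab", "cond", "drugEx"})
UNSTRUCTURED_PREFIXES = frozenset({"text"})


class FeatureId(NamedTuple):
    kind: str                 # "lab", "cond", "drugEx" or "text"
    concept_id: int           # numeric identifier after the prefix
    bin_index: Optional[int]  # k from a ":BINk" suffix (assumed lab-value bin), else None


def parse_feature_id(raw: str) -> FeatureId:
    """Split an ID such as 'lab:3027114:BIN5' into its parts."""
    parts = raw.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected feature id: {raw!r}")
    bin_index = None
    if len(parts) == 3:
        if not parts[2].startswith("BIN"):
            raise ValueError(f"unexpected suffix in feature id: {raw!r}")
        bin_index = int(parts[2][len("BIN"):])
    return FeatureId(parts[0], int(parts[1]), bin_index)


def source_of(feature: FeatureId) -> str:
    """Return 'Structured' or 'Unstructured', matching the table's Source column."""
    if feature.kind in UNSTRUCTURED_PREFIXES:
        return "Unstructured"
    if feature.kind in STRUCTURED_PREFIXES:
        return "Structured"
    raise ValueError(f"unknown feature prefix: {feature.kind!r}")
```

For instance, `parse_feature_id("lab:3027114:BIN5")` yields kind `"lab"`, identifier `3027114`, and bin `5`, and `source_of` classifies it as Structured.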
Fig. 3 Learning and testing setup for the Stanford FH classifier. *Patients had one comorbidity from the following: hypertension, coronary atherosclerosis (CAD), dyslipidemia, or myocardial infarction, and had no history of nephrotic syndrome or obstructive (cholestatic) liver disease.
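The footnote to Fig. 3 describes a cohort filter: patients need a qualifying comorbidity and no history of the excluding conditions. A minimal sketch, assuming each patient's history has been reduced to a set of normalized condition labels (the label strings are hypothetical; a real EHR pipeline would match diagnosis codes) and reading "one comorbidity" as "at least one":

```python
from typing import AbstractSet

# Hypothetical normalized labels; a real pipeline would match diagnosis codes instead.
QUALIFYING_COMORBIDITIES = frozenset({
    "hypertension",
    "coronary atherosclerosis",
    "dyslipidemia",
    "myocardial infarction",
})
EXCLUDING_CONDITIONS = frozenset({
    "nephrotic syndrome",
    "obstructive liver disease",
})


def is_eligible(conditions: AbstractSet[str]) -> bool:
    """True if the patient has at least one qualifying comorbidity and no excluding condition."""
    has_comorbidity = not QUALIFYING_COMORBIDITIES.isdisjoint(conditions)
    has_exclusion = not EXCLUDING_CONDITIONS.isdisjoint(conditions)
    return has_comorbidity and not has_exclusion
```

Nephrotic syndrome and cholestatic liver disease can themselves raise LDL cholesterol, which is one plausible reason such patients are excluded from a cohort meant to surface FH.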