Mehr Kashyap, Martin Seneviratne, Juan M Banda, Thomas Falconer, Borim Ryu, Sooyoung Yoo, George Hripcsak, Nigam H Shah.
Abstract
OBJECTIVE: Accurate electronic phenotyping is essential to support collaborative observational research. Supervised machine learning methods can be used to train phenotype classifiers in a high-throughput manner using imperfectly labeled data. We developed 10 phenotype classifiers using this approach and evaluated performance across multiple sites within the Observational Health Data Sciences and Informatics (OHDSI) network.
Keywords: cohort identification; electronic health records; electronic phenotyping; machine learning; phenotype
Year: 2020 PMID: 32374408 PMCID: PMC7309227 DOI: 10.1093/jamia/ocaa032
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Figure 1. Development and validation of phenotype classifiers. Training sets were constructed by applying multiple mentions-based imperfect labeling functions to our patient data extract. Patients with multiple mentions of any SNOMED codes relevant to the phenotype of interest were considered training cases. Patients who did not meet this criterion were labeled as training controls. Random forest classifiers were built for each phenotype using 5-fold cross-validation. The test set was constructed using OMOP implementations of rule-based phenotype definitions. Test cases were randomly sampled from the cohort of patients selected by the rule-based definitions. Test controls were sampled from the remaining patients. For each phenotype, the imperfect labeling function used to generate the training set and the corresponding classifier were evaluated using the rule-based phenotype-derived test sets.
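The multiple-mentions labeling step described in the caption can be sketched as follows. This is a minimal illustration, not the APHRODITE implementation: the function name `label_by_mentions`, the input shape (an iterable of `(patient_id, code)` pairs), and the placeholder code strings are all hypothetical. It also assumes that "multiple mentions" means at least `min_mentions` occurrences, summed over all phenotype-relevant codes, which the caption does not state explicitly.

```python
from collections import Counter


def label_by_mentions(mentions, relevant_codes, all_patients, min_mentions=4):
    """Assign imperfect training labels from code mention counts.

    mentions: iterable of (patient_id, code) pairs, one per recorded mention.
    relevant_codes: set of codes relevant to the phenotype (placeholders here).
    all_patients: iterable of every patient_id in the data extract.
    Returns {patient_id: 1 for training case, 0 for training control}.
    """
    # Count only mentions of codes relevant to the phenotype of interest.
    counts = Counter(pid for pid, code in mentions if code in relevant_codes)
    # Patients below the threshold (including zero mentions) become controls.
    return {pid: int(counts[pid] >= min_mentions) for pid in all_patients}


# Toy data; the code strings are placeholders, not real SNOMED codes.
mentions = [("p1", "A"), ("p1", "A"), ("p1", "B"), ("p1", "A"),
            ("p2", "A"), ("p2", "X"), ("p3", "X")]
labels = label_by_mentions(mentions, {"A", "B"}, ["p1", "p2", "p3"], min_mentions=4)
print(labels)
```

In this toy extract only `p1` reaches four relevant mentions, so it is the only training case. The resulting labels would then serve as the noisy targets for the per-phenotype classifier.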
Test set performance of the labeling heuristic requiring multiple disease-specific code mentions, compared with phenotype classifiers trained on data labeled using this multiple-mentions approach
| Phenotype | Prevalence of cases in test set | Multiple mentions of SNOMED code: no. of mentions | Multiple mentions of SNOMED code: recall | Multiple mentions of SNOMED code: precision | APHRODITE classifier: recall | APHRODITE classifier: precision | Recall boost using classifier | Precision loss using classifier |
|---|---|---|---|---|---|---|---|---|
| Appendicitis | 0.05 | 2 | 0.31 | 1.00 | 0.97 | 0.99 | 0.66 | 0.01 |
| T2DM | 0.14 | 4 | 0.24 | 0.99 | 0.60 | 0.91 | 0.36 | 0.08 |
| Cataracts | 0.17 | 4 | 0.07 | 0.97 | 0.63 | 0.93 | 0.56 | 0.04 |
| HF | 0.02 | 4 | 0.33 | 0.94 | 0.99 | 0.56 | 0.66 | 0.38 |
| AAA | 0.04 | 4 | 0.22 | 0.99 | 0.53 | 0.97 | 0.31 | 0.02 |
| Epileptic seizure | 0.02 | 4 | 0.06 | 1.00 | 0.22 | 0.94 | 0.17 | 0.06 |
| PAD | 0.05 | 4 | 0.18 | 0.98 | 0.91 | 0.91 | 0.72 | 0.07 |
| Adult onset obesity | 0.36 | 4 | 0.20 | 1.00 | 0.29 | 0.91 | 0.09 | 0.09 |
| Glaucoma | 0.01 | 4 | 0.08 | 1.00 | 0.22 | 0.88 | 0.14 | 0.12 |
| VTE | 0.01 | 4 | 0.03 | 1.00 | 0.69 | 0.22 | 0.66 | 0.78 |
Abbreviations: AAA, abdominal aortic aneurysm; HF, heart failure; T2DM, type 2 diabetes mellitus; PAD, peripheral arterial disease; VTE, venous thromboembolism.
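The last two columns of the table appear to be simple differences between the classifier and the labeling heuristic. The sketch below recomputes them for three rows under that assumption; the helper name `boost_and_loss` is hypothetical. For two rows not shown here (epileptic seizure and PAD) the difference of the displayed values is off by 0.01 from the reported recall boost, presumably because the published figures were derived from unrounded values.

```python
# Values copied from the table:
# (heuristic recall, heuristic precision, classifier recall, classifier precision)
rows = {
    "Appendicitis": (0.31, 1.00, 0.97, 0.99),
    "T2DM": (0.24, 0.99, 0.60, 0.91),
    "HF": (0.33, 0.94, 0.99, 0.56),
}


def boost_and_loss(h_recall, h_precision, c_recall, c_precision):
    """Return (recall boost, precision loss) of classifier vs. heuristic."""
    recall_boost = round(c_recall - h_recall, 2)
    precision_loss = round(h_precision - c_precision, 2)
    return recall_boost, precision_loss


for name, values in rows.items():
    print(name, boost_and_loss(*values))
```

Read this way, heart failure gains 0.66 in recall at the cost of 0.38 in precision, while appendicitis gains the same recall for a precision loss of only 0.01.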
Classifier performance at 3 sites within OHDSI network. Phenotype classifiers constructed at Stanford were shared with Columbia and SNUBH, and evaluated using test sets derived locally at each site using rule-based definitions. Furthermore, classifiers built at Columbia and SNUBH were shared with Stanford and evaluated using similarly constructed test sets. Blue denotes values equal to 1, white denotes values equal to 0.
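| Phenotype | Recall (dev: Stanford, val: Stanford) | Recall (dev: Stanford, val: Columbia) | Recall (dev: Stanford, val: SNUBH) | Recall (dev: Columbia, val: Stanford) | Recall (dev: Columbia, val: Columbia) | Recall (dev: SNUBH, val: Stanford) | Recall (dev: SNUBH, val: SNUBH) | Precision (dev: Stanford, val: Stanford) | Precision (dev: Stanford, val: Columbia) | Precision (dev: Stanford, val: SNUBH) | Precision (dev: Columbia, val: Stanford) | Precision (dev: Columbia, val: Columbia) | Precision (dev: SNUBH, val: Stanford) | Precision (dev: SNUBH, val: SNUBH) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|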
| Appendicitis | 0.97 | 0.9 | 0.09 | 0.82 | 0.75 | 0.52 | 0.1 | 0.99 | 0.9 | 0.56 | 0.83 | 0.48 | 0.13 | 0.98 |
| T2DM | 0.6 | 0.63 | 0.77 | 0.58 | 0.69 | 0.67 | 0.75 | 0.91 | 0.86 | 0.75 | 0.75 | 0.83 | 0.51 | 0.89 |
| Cataracts | 0.63 | 0.45 | 0.84 | 0.8 | 0.6 | 0.35 | 0.84 | 0.93 | 0.79 | 0.85 | 0.74 | 0.74 | 0.42 | 0.74 |
| HF | 0.99 | 0.97 | 0.8 | 0.99 | 0.97 | 0.71 | 0.82 | 0.56 | 0.67 | 0.75 | 0.47 | 0.4 | 0.11 | 0.66 |
| AAA | 0.53 | 0.24 | 0.54 | 0.59 | 0.78 | 0.33 | 0.57 | 0.97 | 0.75 | 0.87 | 0.96 | 0.97 | 0.13 | 0.47 |
| Epileptic seizure | 0.22 | 0.3 | 0.28 | 0.41 | 0.79 | 0.46 | 0.11 | 0.94 | 0.87 | 0.55 | 0.79 | 0.57 | 0.08 | 0.68 |
| PAD | 0.91 | 0.89 | 0.57 | 0.48 | 0.85 | 0.46 | 0.55 | 0.91 | 0.87 | 0.68 | 0.69 | 0.57 | 0.24 | 0.59 |
| Adult onset obesity | 0.29 | 0.33 | 0.07 | 0.14 | 0.59 | 0.39 | 0.07 | 0.91 | 0.93 | 0.73 | 0.85 | 0.89 | 0.68 | 0.8 |
| Glaucoma | 0.22 | 0.18 | 0.11 | 0.34 | 0.4 | 0.22 | 0.12 | 0.88 | 0.78 | 0.65 | 0.69 | 0.8 | 0.06 | 0.75 |
| VTE | 0.69 | 0.34 | 0.2 | 0.21 | 0.71 | 0.46 | 0.19 | 0.2 | 0.71 | 0.83 | 0.51 | 0.21 | 0.05 | 0.78 |
Abbreviations: AAA, abdominal aortic aneurysm; HF, heart failure; T2DM, type 2 diabetes mellitus; PAD, peripheral arterial disease; VTE, venous thromboembolism.
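Each cell in the table above is a recall or precision value obtained by comparing a classifier's predictions with the rule-based labels of a locally derived test set. A minimal sketch of that comparison is given below; the function name `recall_precision` and the toy label vectors are illustrative assumptions, not data from the study.

```python
def recall_precision(reference, predicted):
    """Compute (recall, precision) for binary labels.

    reference: rule-based test labels (1 = case, 0 = control).
    predicted: classifier predictions for the same patients, in the same order.
    """
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)  # true positives
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)  # false positives
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)  # false negatives
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return recall, precision


# Toy test set: 4 cases and 6 controls (illustrative only).
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
recall, precision = recall_precision(reference, predicted)
print(recall, precision)
```

Because precision depends on how many controls are flagged, it is sensitive to the prevalence of cases in each site's test set, which is worth keeping in mind when comparing columns across validation sites.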