Wei-Qi Wei (1), Pedro L Teixeira (1), Huan Mo (1), Robert M Cronin (2), Jeremy L Warner (2), Joshua C Denny (3). 1. Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA. 2. Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA; Department of Medicine, Vanderbilt University, Nashville, TN, USA. 3. Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA; Department of Medicine, Vanderbilt University, Nashville, TN, USA; josh.denny@vanderbilt.edu.
Abstract
OBJECTIVE: To evaluate the phenotyping performance of three major electronic health record (EHR) components: International Classification of Diseases (ICD) diagnosis codes, primary notes, and specific medications.
MATERIALS AND METHODS: We conducted the evaluation using de-identified Vanderbilt EHR data. We preselected ten diseases: atrial fibrillation, Alzheimer's disease, breast cancer, gout, human immunodeficiency virus infection, multiple sclerosis, Parkinson's disease, rheumatoid arthritis, and types 1 and 2 diabetes mellitus. For each disease, patients were classified into seven categories based on the presence of evidence in diagnosis codes, primary notes, and specific medications. Twenty-five patients per disease category (175 patients for each disease, 1750 patients for all ten diseases) were randomly selected for manual chart review. Review results were used to estimate the positive predictive value (PPV), sensitivity, and F-score for each EHR component alone and in combination.
RESULTS: The PPVs of single components were inconsistent and inadequate for accurate phenotyping (0.06-0.71). Using two or more ICD codes improved the average PPV to 0.84. We observed a more stable and higher accuracy when using at least two components (mean ± standard deviation: 0.91 ± 0.08). Primary notes offered the best sensitivity (0.77). The sensitivity of ICD codes was 0.67. Again, two or more components provided a reasonably high and stable sensitivity (0.59 ± 0.16). Overall, the best performance (F-score: 0.70 ± 0.12) was achieved by using two or more components. Although the F-score of ICD codes alone (0.67 ± 0.14) was only slightly lower than that of two or more components, their PPV (0.71 ± 0.13) was substantially worse than the PPV of two or more components (0.91 ± 0.08).
CONCLUSION: Multiple EHR components provide more consistent and higher performance than a single component for the selected phenotypes. We suggest considering multiple EHR components in future phenotyping designs to obtain the best results.
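The following is a minimal Python sketch of the kind of calculation the methods describe, not the authors' actual procedure. It enumerates the seven evidence categories (every non-empty combination of the three components) and estimates PPV, sensitivity, and F-score for a phenotyping rule from per-category chart-review counts. The abstract does not say how sampled results were scaled up, so the weighting by category population size is an assumption, as are all names (`evidence_categories`, `evaluate_rule`, `stats`) and the numbers in the usage example. The sketch also ignores true cases with no evidence in any component.

```python
from itertools import combinations

# The three EHR components named in the abstract (identifiers are hypothetical).
COMPONENTS = ("icd", "notes", "meds")


def evidence_categories(components=COMPONENTS):
    """Return every non-empty subset of components (2**3 - 1 = 7 for three)."""
    return [
        frozenset(combo)
        for size in range(1, len(components) + 1)
        for combo in combinations(components, size)
    ]


def f_score(ppv, sensitivity):
    """Harmonic mean of PPV and sensitivity; 0.0 when both are zero."""
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)


def evaluate_rule(rule, stats):
    """Estimate (PPV, sensitivity, F-score) for a phenotyping rule.

    rule:  function taking a category (frozenset) and returning True if the
           rule flags patients in that category as cases.
    stats: dict mapping category -> (population_size, n_reviewed, n_confirmed).

    ASSUMPTION: confirmed fractions from the reviewed sample are scaled by
    each category's population size; the source does not state this.
    """
    est_cases = {c: pop * conf / rev for c, (pop, rev, conf) in stats.items()}
    selected = [c for c in stats if rule(c)]
    flagged = sum(stats[c][0] for c in selected)
    true_flagged = sum(est_cases[c] for c in selected)
    total_cases = sum(est_cases.values())
    ppv = true_flagged / flagged if flagged else 0.0
    sensitivity = true_flagged / total_cases if total_cases else 0.0
    return ppv, sensitivity, f_score(ppv, sensitivity)


if __name__ == "__main__":
    # Hypothetical counts for one disease: 25 charts reviewed per category.
    stats = {
        c: (1000 if len(c) == 1 else 400, 25, 6 if len(c) == 1 else 22)
        for c in evidence_categories()
    }
    two_or_more = lambda category: len(category) >= 2
    icd_any = lambda category: "icd" in category
    print("two or more components:", evaluate_rule(two_or_more, stats))
    print("any ICD code:", evaluate_rule(icd_any, stats))
```

Because the abstract reports means across diseases, the F-score computed from the mean PPV and mean sensitivity need not equal the reported mean F-score; under this sketch each disease would be evaluated separately and the results averaged afterwards.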