PURPOSE: Most electronic health record databases contain unstructured free-text narratives, which cannot be easily analyzed. Case-detection algorithms are usually created manually and often rely only on coded information such as International Classification of Diseases, Ninth Revision (ICD-9) codes. We applied a machine-learning approach to generate and evaluate an automated case-detection algorithm that uses both free-text and coded information to identify asthma cases.

METHODS: The Integrated Primary Care Information (IPCI) database was searched for potential asthma patients aged 5-18 years using a broad query on asthma-related codes, drugs, and free text. A training set of 5032 patients was created by manually annotating the potential patients as definite, probable, or doubtful asthma cases or non-asthma cases. The rule-learning program RIPPER was then used to generate algorithms to distinguish cases from non-cases. An over-sampling method was used to balance the performance of the automated algorithm to meet our study requirements. Performance of the automated algorithm was evaluated against the manually annotated set.

RESULTS: The selected algorithm yielded a positive predictive value (PPV) of 0.66, sensitivity of 0.98, and specificity of 0.95 when identifying only definite asthma cases; a PPV of 0.82, sensitivity of 0.96, and specificity of 0.90 when identifying both definite and probable asthma cases; and a PPV of 0.57, sensitivity of 0.95, and specificity of 0.67 when identifying definite, probable, and doubtful asthma cases.

CONCLUSIONS: The automated algorithm shows good performance in detecting asthma cases using both free-text and coded data. This algorithm will facilitate large-scale studies of asthma in the IPCI database.
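The evaluation described above compares algorithm output with manual annotations under three case definitions. The following is a minimal Python sketch of how PPV, sensitivity, and specificity could be computed for each definition. It is not the study's code: the label strings, the `SCENARIOS` mapping, the `evaluate` function, and the toy data are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Set

# Hypothetical label names for the four manual annotation categories.
# Each scenario lists which annotations count as a true asthma case.
SCENARIOS: Dict[str, Set[str]] = {
    "definite": {"definite"},
    "definite+probable": {"definite", "probable"},
    "definite+probable+doubtful": {"definite", "probable", "doubtful"},
}


def evaluate(
    annotations: Iterable[str],
    predictions: Iterable[bool],
    positive_labels: Set[str],
) -> Dict[str, float]:
    """Compare predicted case status with manual annotations.

    Returns PPV, sensitivity, and specificity; a metric is NaN when its
    denominator is zero.
    """
    tp = fp = fn = tn = 0
    for label, predicted in zip(annotations, predictions):
        actual = label in positive_labels
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1

    def ratio(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else float("nan")

    return {
        "ppv": ratio(tp, tp + fp),          # TP / (TP + FP)
        "sensitivity": ratio(tp, tp + fn),  # TP / (TP + FN)
        "specificity": ratio(tn, tn + fp),  # TN / (TN + FP)
    }


# Toy data (invented, not from the study): six patients.
annotations = ["definite", "probable", "doubtful",
               "non-asthma", "definite", "non-asthma"]
predictions = [True, True, False, False, True, True]

results = {
    name: evaluate(annotations, predictions, labels)
    for name, labels in SCENARIOS.items()
}
for name, metrics in results.items():
    print(name, metrics)
```

Broadening the case definition changes which patients count as true cases, so the same predictions can score differently across scenarios. This is one reason the three sets of figures reported in the results are not directly comparable.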