Wouter B van Dijk 1, Aernoud T L Fiolet 2, Ewoud Schuit 3, Arjan Sammani 4, T Katrien J Groenhof 3, Rieke van der Graaf 5, Martine C de Vries 6, Marco Alings 7, Jeroen Schaap 7, Folkert W Asselbergs 8, Diederick E Grobbee 3, Rolf H H Groenwold 9, Arend Mosterd 10.

1. Department of Epidemiology, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands. Electronic address: W.B.vanDijk-7@umcutrecht.nl.
2. Department of Cardiology, Meander Medical Center, Amersfoort, the Netherlands; Department of Cardiology, Division Heart & Lungs, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands.
3. Department of Epidemiology, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands.
4. Department of Cardiology, Division Heart & Lungs, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands.
5. Department of Medical Humanities, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands.
6. Department of Medical Ethics and Health Law, Leiden University Medical Center, Leiden University, Leiden, the Netherlands.
7. Department of Cardiology, Amphia Hospital, Breda, the Netherlands; Dutch Network for Cardiovascular Research (WCN), Utrecht, the Netherlands.
8. Department of Cardiology, Division Heart & Lungs, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands; Institute of Cardiovascular Science, Faculty of Population Health Sciences, University College London, London, United Kingdom; Health Data Research UK and Institute of Health Informatics, University College London, London, United Kingdom.
9. Department of Clinical Epidemiology, Leiden University Medical Center, Leiden University, Leiden, the Netherlands.
10. Department of Epidemiology, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands; Department of Cardiology, Meander Medical Center, Amersfoort, the Netherlands; Dutch Network for Cardiovascular Research (WCN), Utrecht, the Netherlands.
Abstract
OBJECTIVE: This study aimed to validate trial patient eligibility screening and baseline data collection using text-mining in electronic healthcare records (EHRs), comparing the results to those of an international trial.

STUDY DESIGN AND SETTING: In three medical centers with different EHR vendors, EHR-based text-mining was used to automatically screen patients for trial eligibility and to extract baseline data on nineteen characteristics. First, the yield of automated EHR text-mining screening was compared with that of manual screening by research personnel. Second, the accuracy of baseline data extracted by EHR text-mining was compared with manual data entry by research personnel.

RESULTS: Of the 92,466 patients visiting the outpatient cardiology departments, 568 (0.6%) were enrolled in the trial during its recruitment period using manual screening methods. Automated EHR data screening of all patients showed that the number of patients needed to screen could be reduced by 73,863 (79.9%). The remaining 18,603 (20.1%) contained 458 of the actual participants (82.4% of participants). In trial participants, automated EHR text-mining missed a median of 2.8% (interquartile range [IQR] across all variables 0.4-8.5%) of all data points compared with manually collected data. The overall accuracy of automatically extracted data was 88.0% (IQR 84.7-92.8%).

CONCLUSION: Automatically extracting data from EHRs using text-mining can be used to identify trial participants and to collect baseline information.
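The screening percentages in the RESULTS can be checked directly from the reported counts. The Python sketch below is illustrative only and is not part of the study: the variable names and the `pct` helper are assumptions introduced here, and the final quantity (the share of retained patients who were actual participants) is derived from the reported counts rather than stated in the abstract. The abstract does not give the denominator behind the 82.4% figure, so that value is not recomputed.

```python
# Counts as reported in the abstract (names are illustrative assumptions).
total_patients = 92_466            # outpatient cardiology visitors
enrolled_manually = 568            # enrolled via manual screening
excluded_by_automation = 73_863    # patients the automated screen ruled out
remaining_to_screen = 18_603       # patients left for manual review
participants_in_remaining = 458    # actual participants among those left


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The excluded and remaining groups should partition the full population.
assert excluded_by_automation + remaining_to_screen == total_patients

enrolled_pct = pct(enrolled_manually, total_patients)
reduction_pct = pct(excluded_by_automation, total_patients)
remaining_pct = pct(remaining_to_screen, total_patients)

# Derived here, not reported in the abstract: how concentrated the
# participants are in the retained set after automated screening.
participants_per_100_remaining = pct(participants_in_remaining, remaining_to_screen)

print(f"Enrolled overall:        {enrolled_pct}%")
print(f"Screening reduction:     {reduction_pct}%")
print(f"Remaining to screen:     {remaining_pct}%")
print(f"Participants in retained set: {participants_per_100_remaining}%")
```

Under these counts, the retained set holds participants at roughly four times the rate of the unscreened population, which is the practical sense in which automated screening reduces the manual workload.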