Amol A Verma (1,2,3), Sachin V Pasricha (1,4), Hae Young Jung (1), Vladyslav Kushnir (1), Denise Y F Mak (1), Radha Koppula (1), Yishan Guo (1), Janice L Kwan (2,5), Lauren Lapointe-Shaw (2,6,7), Shail Rawal (2,6), Terence Tang (2,8), Adina Weinerman (2,9), Fahad Razak (1,2,3).
1. Li Ka Shing Knowledge Institute, St. Michael's Hospital, Toronto, Ontario, Canada.
2. Department of Medicine, University of Toronto, Toronto, Ontario, Canada.
3. Institute for Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada.
4. School of Medicine, Faculty of Health Sciences, Queen's University, Kingston, Ontario, Canada.
5. Department of Medicine, Mount Sinai Hospital, Toronto, Ontario, Canada.
6. Division of General Internal Medicine, University Health Network, Toronto, Ontario, Canada.
7. Institute for Clinical and Evaluative Sciences, Toronto, Ontario, Canada.
8. Institute for Better Health, Trillium Health Partners, Toronto, Ontario, Canada.
9. Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada.
Abstract
OBJECTIVE: Large clinical databases are increasingly used for research and quality improvement. We describe an approach to data quality assessment from the General Medicine Inpatient Initiative (GEMINI), which collects and standardizes administrative and clinical data from hospitals.
METHODS: The GEMINI database contained 245 559 patient admissions at 7 hospitals in Ontario, Canada from 2010 to 2017. We performed 7 computational data quality checks and iteratively re-extracted data from hospitals to correct problems. Thereafter, GEMINI data were compared with data manually abstracted from each hospital's electronic medical record for 23 419 selected data points on a sample of 7488 patients.
RESULTS: Computational checks flagged 103 potential data quality issues, which were either corrected or documented to inform future analysis. For example, we identified the inclusion of canceled radiology tests, a time shift of transfusion data, and the mistaken processing of the chemical symbol for sodium ("Na") as a missing value. Manual validation identified 1 important data quality issue that was not detected by computational checks: transfusion dates and times at 1 site were unreliable. Apart from that single issue, across all data tables, GEMINI data had high overall accuracy (ranging from 98% to 100%), sensitivity (95%-100%), specificity (99%-100%), positive predictive value (93%-100%), and negative predictive value (99%-100%) compared with the gold standard.
DISCUSSION AND CONCLUSION: Computational data quality checks with iterative re-extraction facilitated reliable data collection from hospitals but missed 1 critical quality issue. Combining computational and manual approaches may be optimal for assessing the quality of large multisite clinical databases.
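The five agreement measures reported in the results (accuracy, sensitivity, specificity, positive predictive value, negative predictive value) follow the standard 2×2 definitions against a gold standard. The Python sketch below shows one way to compute them from paired observations. It assumes each validated data point can be reduced to a present/absent judgement in both the database and the manual abstraction; the abstract does not state exactly how the 23 419 data points were tabulated, and the names `validate` and `ValidationMetrics` and the counts in the usage example are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class ValidationMetrics:
    """Agreement of database values with a manually abstracted gold standard."""
    accuracy: float
    sensitivity: float
    specificity: float
    ppv: float  # positive predictive value
    npv: float  # negative predictive value


def _ratio(num: int, den: int) -> float:
    # Return NaN instead of raising when a denominator is zero (empty cells).
    return num / den if den else float("nan")


def validate(pairs: Iterable[Tuple[bool, bool]]) -> ValidationMetrics:
    """Each pair is (present_in_database, present_in_gold_standard)."""
    tp = fp = fn = tn = 0
    for in_db, in_gold in pairs:
        if in_db and in_gold:
            tp += 1  # true positive: both sources agree the item is present
        elif in_db:
            fp += 1  # false positive: database only
        elif in_gold:
            fn += 1  # false negative: gold standard only
        else:
            tn += 1  # true negative: both agree the item is absent
    total = tp + fp + fn + tn
    return ValidationMetrics(
        accuracy=_ratio(tp + tn, total),
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
    )


if __name__ == "__main__":
    # Hypothetical counts: 95 TP, 5 FN, 2 FP, 898 TN.
    pairs = ([(True, True)] * 95 + [(False, True)] * 5
             + [(True, False)] * 2 + [(False, False)] * 898)
    m = validate(pairs)
    print(f"{m.accuracy:.3f} {m.sensitivity:.3f}")  # 0.993 0.950
```

Keeping the counting separate from the ratios makes it easy to report metrics per data table or per site, which matters when, as with the transfusion timestamps described above, a problem is confined to one hospital.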