Siaw-Teng Liaw 1, Jason Guan Nan Guo 1, Sameera Ansari 1, Jitendra Jonnagaddala 1, Myron Anthony Godinho 1, Alder Jose Borelli 1, Simon de Lusignan 2, Daniel Capurro 3, Harshana Liyanage 2, Navreet Bhattal 4, Vicki Bennett 4, Jaclyn Chan 4, Michael G Kahn 5.
1. WHO Collaborating Centre on eHealth, School of Population Health, Faculty of Medicine, UNSW Sydney, Sydney, New South Wales, Australia.
2. Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, United Kingdom.
3. Faculty of Engineering and Information Technology, University of Melbourne, Melbourne, Victoria, Australia.
4. Australian Institute of Health and Welfare, Canberra, Australian Capital Territory, Australia.
5. Department of Pediatrics (Section of Informatics and Data Sciences), University of Colorado Anschutz Medical Campus, Denver, Colorado, USA.
Abstract
OBJECTIVE: Data quality (DQ) must be consistently defined in context. The attributes, metadata, and context of longitudinal real-world data (RWD) have not been formalized for quality improvement across the data production and curation life cycle. We sought to complete a literature review on DQ assessment frameworks, indicators, and tools for research, public health, service, and quality improvement across the data life cycle.
MATERIALS AND METHODS: The review followed PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Databases from the health, physical, and social sciences were searched: CINAHL, Embase, Scopus, ProQuest, Emcare, PsycINFO, Compendex, and Inspec. Embase was used instead of PubMed (an interface to search MEDLINE) because it includes all MeSH (Medical Subject Headings) terms and journals in MEDLINE as well as additional unique journals and conference abstracts. A combined data life cycle and quality framework guided the search of published and gray literature for DQ frameworks, indicators, and tools. At least 2 authors independently identified articles for inclusion and extracted and categorized DQ concepts and constructs. All authors discussed findings iteratively until consensus was reached.
RESULTS: The 120 included articles yielded concepts related to contextual (data source, custodian, and user) and technical (interoperability) factors across the data life cycle. Contextual DQ subcategories included relevance, usability, accessibility, timeliness, and trust. Well-tested computable DQ indicators and assessment tools were also found.
CONCLUSIONS: A DQ assessment framework that covers intrinsic, technical, and contextual categories across the data life cycle enables assessment and management of RWD repositories to ensure fitness for purpose. Balancing security, privacy, and FAIR (Findable, Accessible, Interoperable, Reusable) principles requires trust and reciprocity, transparent governance, and organizational cultures that value good documentation.
Authors: George Hripcsak; Jon D Duke; Nigam H Shah; Christian G Reich; Vojtech Huser; Martijn J Schuemie; Marc A Suchard; Rae Woong Park; Ian Chi Kei Wong; Peter R Rijnbeek; Johan van der Lei; Nicole Pratt; G Niklas Norén; Yu-Chuan Li; Paul E Stang; David Madigan; Patrick B Ryan Journal: Stud Health Technol Inform Date: 2015
Authors: Ritu Khare; Levon Utidjian; Byron J Ruth; Michael G Kahn; Evanette Burrows; Keith Marsolo; Nandan Patibandla; Hanieh Razzaghi; Ryan Colvin; Daksha Ranade; Melody Kitzmiller; Daniel Eckrich; L Charles Bailey Journal: J Am Med Inform Assoc Date: 2017-11-01 Impact factor: 4.497
Authors: Vojtech Huser; Frank J DeFalco; Martijn Schuemie; Patrick B Ryan; Ning Shang; Mark Velez; Rae Woong Park; Richard D Boyce; Jon Duke; Ritu Khare; Levon Utidjian; Charles Bailey Journal: EGEMS (Wash DC) Date: 2016-11-30
Authors: Jonathan M Mang; Susanne A Seuchter; Christian Gulden; Stefanie Schild; Detlef Kraska; Hans-Ulrich Prokosch; Lorenz A Kapsner Journal: BMC Med Inform Decis Mak Date: 2022-08-11 Impact factor: 3.298
Authors: Sylvia E K Sudat; Sarah C Robinson; Satish Mudiganti; Aravind Mani; Alice R Pressman Journal: J Biomed Inform Date: 2021-02-19 Impact factor: 6.317