Applying probabilistic temporal and multisite data quality control methods to a public health mortality registry in Spain: a systematic approach to quality control of repositories.

Carlos Sáez, Oscar Zurriaga, Jordi Pérez-Panadés, Inma Melchor, Montserrat Robles, Juan M García-Gómez.

Abstract

OBJECTIVE: To assess the variability in data distributions among data sources and over time through a case study of a large multisite repository as a systematic approach to data quality (DQ).
MATERIALS AND METHODS: Novel probabilistic DQ control methods based on information theory and geometry are applied to the Public Health Mortality Registry of the Region of Valencia, Spain, with 512 143 entries from 2000 to 2012, disaggregated into 24 health departments. The methods provide DQ metrics and exploratory visualizations for (1) assessing the variability among multiple sources and (2) monitoring and exploring changes with time. The methods are suited to big data and multitype, multivariate, and multimodal data.
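The multisite assessment compares the probability distributions of the same variable across sources. The following is a minimal sketch, not the authors' method: it assumes Jensen-Shannon distance as the information-theoretic measure and scores each source by its distance to the pooled distribution of all sources. The function names (`distribution`, `kl`, `js_distance`, `source_outlyingness`) and the department data are hypothetical.

```python
import math
from collections import Counter

def distribution(values, categories):
    """Relative frequency of each category; absent categories get 0."""
    counts = Counter(values)
    total = sum(counts[c] for c in categories)
    return [counts[c] / total for c in categories]

def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_distance(p, q):
    """Jensen-Shannon distance (square root of base-2 JS divergence), in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    # max() guards against a tiny negative value from floating-point rounding.
    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))

def source_outlyingness(records_by_source):
    """Hypothetical per-source score: JS distance to the pooled distribution."""
    categories = sorted({v for vals in records_by_source.values() for v in vals})
    pooled_values = [v for vals in records_by_source.values() for v in vals]
    pooled = distribution(pooled_values, categories)
    return {src: js_distance(distribution(vals, categories), pooled)
            for src, vals in records_by_source.items()}

# Hypothetical cause-of-death codes per health department (illustrative only).
sites = {
    "dept_A": ["I21", "C34", "I21", "J18"],
    "dept_B": ["I21", "C34", "J18", "C34"],
    "dept_C": ["X99", "X99", "X99", "I21"],  # unusual coding pattern
}

scores = source_outlyingness(sites)
for src, d in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{src}: {d:.3f}")
```

Ranking sources by this distance only flags candidates such as `dept_C` for review; telling a genuine population difference apart from a recording-practice problem still requires inspecting the site.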
RESULTS: The repository was partitioned into 2 probabilistically separated temporal subgroups following a change in the Spanish National Death Certificate in 2009. Isolated temporal anomalies were detected, caused by a one-off increase in missing data, along with outlying and clustered health departments attributable to differences in populations or in practices.
DISCUSSION: Changes in protocols, differences in populations, biased practices, or other systematic DQ problems affected data variability. Even if semantic and integration aspects are addressed in data-sharing infrastructures, probabilistic variability may still be present. Solutions include fixing or excluding data and analyzing different sites or time periods separately. A systematic approach to assessing temporal and multisite variability is proposed.
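To illustrate how a repository can be split into two temporal subgroups, here is a minimal sketch, assuming yearly relative-frequency distributions of one categorical variable, Jensen-Shannon distance between periods, and an exhaustive search for the single split point that best separates the periods. It is a simplified stand-in, not the published procedure; `best_two_way_split` and the synthetic data (a coding change introduced in 2009) are hypothetical.

```python
import math
from itertools import combinations

def js_distance(p, q):
    """Jensen-Shannon distance (base 2), in [0, 1]."""
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))

def best_two_way_split(period_dists):
    """Hypothetical change detector: pick k so that periods [0, k) and [k, n)
    maximise mean between-group distance minus mean within-group distance."""
    n = len(period_dists)
    d = [[js_distance(p, q) for q in period_dists] for p in period_dists]

    def mean_within(idx):
        pairs = list(combinations(idx, 2))
        return sum(d[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0

    best_k, best_score = None, -math.inf
    for k in range(1, n):
        left, right = range(k), range(k, n)
        between = sum(d[i][j] for i in left for j in right) / (k * (n - k))
        within = (mean_within(left) + mean_within(right)) / 2
        if between - within > best_score:
            best_k, best_score = k, between - within
    return best_k, best_score

# Synthetic, illustrative only: a 3-category field whose coding shifts from 2009.
years = list(range(2000, 2013))
dists = [[0.6, 0.3, 0.1] if y < 2009 else [0.3, 0.3, 0.4] for y in years]
k, score = best_two_way_split(dists)
print("change detected at", years[k])  # → change detected at 2009
```

The exhaustive single split costs O(n³) distance lookups in the number of periods, which is trivial for 13 yearly periods; many monthly periods or more than one change would call for a dedicated change-point or clustering method.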
CONCLUSION: Multisite and temporal variability in data distributions affects DQ, hindering data reuse, and an assessment of such variability should be a part of systematic DQ procedures.
© The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

Keywords:  data mining; data monitoring; data quality; data reuse; multisite repositories; statistical data analysis

Year:  2016        PMID: 27107447     DOI: 10.1093/jamia/ocw010

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


  9 in total

1.  Understanding and detecting defects in healthcare administration data: Toward higher data quality to better support healthcare operations and decisions.

Authors:  Yili Zhang; Güneş Koru
Journal:  J Am Med Inform Assoc       Date:  2020-03-01       Impact factor: 4.497

2.  Secondary Use of Patient Data: Review of the Literature Published in 2016. [Review]

Authors:  D R Schlegel; G Ficheur
Journal:  Yearb Med Inform       Date:  2017-09-11

3.  Bias Analysis on Public X-Ray Image Datasets of Pneumonia and COVID-19 Patients.

Authors:  Omar Del Tejo Catala; Ismael Salvador Igual; Francisco Javier Perez-Benito; David Millan Escriva; Vicent Ortiz Castello; Rafael Llobet; Juan-Carlos Perez-Cortes
Journal:  IEEE Access       Date:  2021-03-10       Impact factor: 3.476

4.  The impact of data quality assurance and control solutions on the completeness, accuracy, and consistency of data in a national spinal cord injury registry of Iran (NSCIR-IR).

Authors:  Pegah Derakhshan; Zahra Azadmanjir; Khatereh Naghdi; Roya Habibi Arejan; Mahdi Safdarian; Mohammad Reza Zarei; Seyed Behzad Jazayeri; Mahdi Sharif-Alhoseini; Jalil Arab Kheradmand; Abbas Amirjamshidi; Zahra Ghodsi; Morteza Faghih Jooybari; Mahdi Mohammadzadeh; Zahra Khazaeipour; Shayan Abdollah Zadegan; Aidin Abedi; Gerard Oreilly; Vanessa Noonan; Edward C Benzel; Alexander R Vaccaro; Farideh Sadeghian; Vafa Rahimi-Movaghar
Journal:  Spinal Cord Ser Cases       Date:  2021-06-10

5.  Subphenotyping of Mexican Patients With COVID-19 at Preadmission To Anticipate Severity Stratification: Age-Sex Unbiased Meta-Clustering Technique.

Authors:  Lexin Zhou; Nekane Romero-García; Juan Martínez-Miranda; J Alberto Conejero; Juan M García-Gómez; Carlos Sáez
Journal:  JMIR Public Health Surveill       Date:  2022-03-30

6.  EHRtemporalVariability: delineating temporal data-set shifts in electronic health records.

Authors:  Carlos Sáez; Alba Gutiérrez-Sacristán; Isaac Kohane; Juan M García-Gómez; Paul Avillach
Journal:  Gigascience       Date:  2020-08-01       Impact factor: 6.524

7.  Temporal variability analysis reveals biases in electronic health records due to hospital process reengineering interventions over seven years.

Authors:  Francisco Javier Pérez-Benito; Carlos Sáez; J Alberto Conejero; Salvador Tortajada; Bernardo Valdivieso; Juan M García-Gómez
Journal:  PLoS One       Date:  2019-08-07       Impact factor: 3.240

8.  Registry Data Coordinator (RDC): a Proper Accessible Strategy for Improving Road Traffic Injury (RTI) Hospital Based Trauma Registry Systems in Developing Countries and Low Income Countries.

Authors:  Zahra Meidani; Mehrdad Mahdian; Atefe Ayan; Mahdi Mohammadzade; Alimohammad Nickfarjam; Gholam Abbas Moosavi
Journal:  Acta Inform Med       Date:  2018

9.  Data-driven discovery of changes in clinical code usage over time: a case-study on changes in cardiovascular disease recording in two English electronic health records databases (2001-2015).

Authors:  Patrick Rockenschaub; Vincent Nguyen; Robert W Aldridge; Dionisio Acosta; Juan Miguel García-Gómez; Carlos Sáez
Journal:  BMJ Open       Date:  2020-02-13       Impact factor: 2.692
