Preprocessing structured clinical data for predictive modeling and decision support. A roadmap to tackle the challenges.

José Carlos Ferrão, Mónica Duarte Oliveira, Filipe Janela, Henrique M G Martins.

Abstract

BACKGROUND: Electronic health record (EHR) systems have high potential to improve healthcare delivery and management. Although structured EHR data are machine-readable, using them for decision support still poses technical challenges for researchers, because the data must first be preprocessed and converted into a matrix format. During our research, we observed that the clinical informatics literature does not provide guidance for researchers on how to build this matrix while avoiding potential pitfalls.
OBJECTIVES: This article aims to provide researchers with a roadmap of the main technical challenges of preprocessing structured EHR data and possible strategies to overcome them.
METHODS: Following the standard data processing stages of the EDPAI framework (extracting database entries, defining features, processing data, assessing feature values and integrating data elements), we identified the main challenges faced by researchers and reflected on how to address them, drawing on lessons learned from our research experience and on best practices from related literature. We highlight the main potential sources of error, present strategies to approach those challenges and discuss the implications of these strategies.
RESULTS: Following the EDPAI framework, researchers face five key challenges: (1) gathering and integrating data, (2) identifying and handling different feature types, (3) combining features to handle redundancy and granularity, (4) addressing data missingness, and (5) handling multiple feature values. Strategies to address these challenges include: cross-checking identifiers for robust data retrieval and integration; applying clinical knowledge in identifying feature types, in addressing redundancy and granularity, and in accommodating multiple feature values; and adequately investigating patterns of missing data.
CONCLUSIONS: This article contributes to the literature by providing a roadmap to inform structured EHR data preprocessing. It may alert researchers to potential pitfalls and to the implications of methodological decisions in handling structured data, so as to avoid biases and help realize the benefits of the secondary use of EHR data.
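
ILLUSTRATION: The abstract does not include code. The following is only a minimal sketch of three of the strategies it names (cross-checking identifiers, accommodating multiple feature values, and inspecting missingness) when building a patient-by-feature matrix. All record layouts, names and values are hypothetical, and averaging repeated values is an assumption made for brevity; the abstract recommends that clinical knowledge guide that choice.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format entries: (patient_id, encounter_id, feature, value).
entries = [
    ("p1", "e1", "heart_rate", 82.0),
    ("p1", "e1", "heart_rate", 90.0),  # repeated measurement of one feature
    ("p1", "e1", "glucose", 5.4),
    ("p2", "e2", "heart_rate", 71.0),
    ("p3", "e9", "glucose", 6.1),      # encounter e9 is not in the registry
]

# Hypothetical encounter registry (encounter_id -> patient_id) used to
# cross-check identifiers before integrating entries.
encounters = {"e1": "p1", "e2": "p2", "e3": "p3"}


def build_matrix(entries, encounters, aggregate=mean):
    """Build a patient-by-feature matrix from long-format entries.

    Entries whose (patient, encounter) pair does not match the registry are
    set aside for review instead of being silently merged. Multiple values
    of one feature are collapsed with `aggregate` (mean is an assumption).
    Cells with no observation are left as None.
    """
    grouped = defaultdict(list)
    rejected = []
    for patient_id, encounter_id, feature, value in entries:
        if encounters.get(encounter_id) != patient_id:
            rejected.append((patient_id, encounter_id, feature, value))
            continue
        grouped[(patient_id, feature)].append(value)

    patients = sorted(set(encounters.values()))
    features = sorted({feature for _, feature in grouped})
    matrix = [
        [aggregate(grouped[(p, f)]) if (p, f) in grouped else None
         for f in features]
        for p in patients
    ]
    return patients, features, matrix, rejected


def missing_fraction(features, matrix):
    """Fraction of patients with no value, per feature."""
    return {
        f: sum(row[j] is None for row in matrix) / len(matrix)
        for j, f in enumerate(features)
    }


patients, features, matrix, rejected = build_matrix(entries, encounters)
missing = missing_fraction(features, matrix)
```

Keeping rejected entries and empty cells explicit, rather than dropping or imputing them at this stage, leaves room to investigate why data are missing before any modeling decision is made.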

Keywords:  Data mining; clinical decision support; data access; electronic health records and systems; integration and analysis; structured data

Year:  2016        PMID: 27924347      PMCID: PMC5228148          DOI: 10.4338/ACI-2016-03-SOA-0035

Source DB:  PubMed          Journal:  Appl Clin Inform        ISSN: 1869-0327            Impact factor:   2.342


  5 in total

1.  Presenting predictors and temporal trends of treatment-related outcomes in diabetic ketoacidosis.

Authors:  Christopher M Horvat; Heba M Ismail; Alicia K Au; Luigi Garibaldi; Nalyn Siripong; Sajel Kantawala; Rajesh K Aneja; Diane S Hupp; Patrick M Kochanek; Robert Sb Clark
Journal:  Pediatr Diabetes       Date:  2018-04-26       Impact factor: 4.866

2.  Leveraging the electronic health record to improve dermatologic care delivery: The importance of finding structure in data.

Authors:  Andrew J Park; Gil S Weintraub; Maryam M Asgari
Journal:  J Am Acad Dermatol       Date:  2019-11-02       Impact factor: 11.527

3.  Use of machine learning to transform complex standardized nursing care plan data into meaningful research variables: a palliative care exemplar.

Authors:  Tamara G R Macieira; Yingwei Yao; Gail M Keenan
Journal:  J Am Med Inform Assoc       Date:  2021-11-25       Impact factor: 7.942

4.  Accuracy of identifying hospital acquired venous thromboembolism by administrative coding: implications for big data and machine learning research.

Authors:  Tiffany Pellathy; Melissa Saul; Gilles Clermont; Artur W Dubrawski; Michael R Pinsky; Marilyn Hravnak
Journal:  J Clin Monit Comput       Date:  2021-02-08       Impact factor: 1.977

Review 5.  Developing, implementing and governing artificial intelligence in medicine: a step-by-step approach to prevent an artificial intelligence winter.

Authors:  Davy van de Sande; Michel E Van Genderen; Jim M Smit; Joost Huiskens; Jacob J Visser; Robert E R Veen; Edwin van Unen; Oliver Hilgers Ba; Diederik Gommers; Jasper van Bommel
Journal:  BMJ Health Care Inform       Date:  2022-02
