
A design of experiments approach to validation sampling for logistic regression modeling with error-prone medical records.

Liwen Ouyang, Daniel W Apley, Sanjay Mehrotra.

Abstract

BACKGROUND AND OBJECTIVE: Electronic medical record (EMR) databases offer significant potential for developing clinical hypotheses and identifying disease risk associations by fitting statistical models that capture the relationship between a binary response variable and a set of predictor variables that represent clinical, phenotypical, and demographic data for the patient. However, EMR response data may be error prone for a variety of reasons. Performing a manual chart review to validate data accuracy is time consuming, which limits the number of chart reviews in a large database. The authors' objective is to develop a new design-of-experiments-based systematic chart validation and review (DSCVR) approach that is more powerful than the random validation sampling used in existing approaches.
METHODS: The DSCVR approach judiciously and efficiently selects the cases to validate (i.e., validate whether the response values are correct for those cases) for maximum information content, based only on their predictor variable values. The final predictive model will be fit using only the validation sample, ignoring the remainder of the unvalidated and unreliable error-prone data. A Fisher information based D-optimality criterion is used, and an algorithm for optimizing it is developed.
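The selection idea in METHODS can be illustrated with a minimal sketch. For logistic regression, the Fisher information of a set of records is the sum of p_i(1 - p_i) x_i x_i^T over the set, where x_i is the predictor vector and p_i the modeled event probability, and a D-optimal design maximizes its determinant. The abstract does not describe the authors' optimization algorithm, so the code below swaps in a plain greedy forward selection as a stand-in. The function names, the `ridge` term, and the pilot coefficient vector `beta` are all assumptions made for illustration; in particular, the abstract does not say how the dependence of the information matrix on the unknown coefficients is handled.

```python
import numpy as np

def greedy_d_optimal(X, beta, n_select, ridge=1e-6):
    """Greedily pick rows of X that increase det of the logistic Fisher information.

    X        : (n, p) predictor matrix, including an intercept column
    beta     : (p,) pilot coefficient guess (an assumption, not from the source)
    n_select : number of records to send for chart review
    ridge    : small regulariser so the starting matrix ridge * I is invertible
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    prob = 1.0 / (1.0 + np.exp(-X @ beta))
    w = prob * (1.0 - prob)              # per-record Fisher weight p(1 - p)
    M_inv = np.eye(p) / ridge            # inverse of the starting matrix
    available = np.ones(n, dtype=bool)
    chosen = []
    for _ in range(n_select):
        # det(M + w x x^T) = det(M) * (1 + w x^T M^{-1} x),
        # so the best next record maximises w * x^T M^{-1} x.
        gain = w * np.einsum("ij,ij->i", X @ M_inv, X)
        gain[~available] = -np.inf
        i = int(np.argmax(gain))
        chosen.append(i)
        available[i] = False
        # Sherman-Morrison rank-one update of the inverse
        u = M_inv @ X[i]
        M_inv -= w[i] * np.outer(u, u) / (1.0 + w[i] * (X[i] @ u))
    return chosen

def log_det_information(X, beta, rows, ridge=1e-6):
    """Log-determinant of the (ridge-regularised) information for the given rows."""
    Xs = np.asarray(X, dtype=float)[rows]
    prob = 1.0 / (1.0 + np.exp(-Xs @ beta))
    w = prob * (1.0 - prob)
    M = (Xs * w[:, None]).T @ Xs + ridge * np.eye(Xs.shape[1])
    return np.linalg.slogdet(M)[1]
```

A call such as `greedy_d_optimal(X, beta_pilot, n_select=200)` returns the row indices to validate. Only predictor values enter the selection, which matches the abstract's description, and `log_det_information` lets the chosen set be compared against a random sample of the same size.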
RESULTS: The authors' method is tested in a simulation comparison that is based on a sudden cardiac arrest case study with 23,041 patient records. This DSCVR approach, using the Fisher information based D-optimality criterion, results in a fitted model with much better predictive performance, as measured by the receiver operating characteristic curve and the accuracy in predicting whether a patient will experience the event, than a model fitted using a random validation sample.
CONCLUSIONS: The simulation comparisons demonstrate that this DSCVR approach can produce predictive models that are significantly better than those produced from random validation sampling, especially when the event rate is low.
© The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

Keywords:  design of experiments; electronic medical records; logistic regression; sudden cardiac arrest; validation sampling

Year:  2015        PMID: 26374705      PMCID: PMC4954627          DOI: 10.1093/jamia/ocv132

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


