Clinical risk prediction models and informative cluster size: Assessing the performance of a suicide risk prediction algorithm.

Rebecca Yates Coley, Rod L Walker, Maricela Cruz, Gregory E Simon, Susan M Shortreed.

Abstract

Clinical visit data are clustered within people, which complicates prediction modeling. Cluster size is often informative because people receiving more care are less healthy and at higher risk of poor outcomes. We used data from seven health systems on 1,518,968 outpatient mental health visits from January 1, 2012 to June 30, 2015 to predict suicide attempt within 90 days. We evaluated the true performance of prediction models using a prospective validation set of 4,286,495 visits from October 1, 2015 to September 30, 2017. We examined dividing clustered data at the person level or the visit level for model training and cross-validation, and considered a within-cluster resampling approach for model estimation. We evaluated optimism by comparing estimated performance in a left-out testing dataset to performance in the prospective dataset. We used two prediction methods: logistic regression with least absolute shrinkage and selection operator (LASSO) and random forest. The random forest model using a visit-level split for model training and testing was optimistic; it overestimated discrimination (area under the curve, AUC = 0.95 in testing versus 0.84 in prospective validation) and classification accuracy (sensitivity = 0.48 in testing versus 0.19 in prospective validation, 95th percentile cut-off). Logistic regression and random forest models using a person-level split performed well, accurately estimating prospective discrimination and classification: estimated AUCs ranged from 0.85 to 0.87 in testing versus 0.85 in prospective validation, and sensitivity ranged from 0.15 to 0.20 in testing versus 0.17 to 0.19 in prospective validation. Within-cluster resampling did not improve performance. We recommend dividing clustered data at the person level, rather than the visit level, to ensure strong performance in prospective use and accurate estimation of future performance at the time of model development.
© 2021 Wiley-VCH GmbH.
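
The difference between the splitting strategies compared in the abstract can be illustrated with a small sketch. This is not the authors' code; it is a minimal standard-library illustration on made-up toy data, and every name in it (person_level_split, visit_level_split, within_cluster_resample, the person_id and visit fields) is a hypothetical choice for this example. It shows that a person-level split keeps all of a person's visits on one side, that a visit-level split can place visits from the same person in both training and testing data, and that within-cluster resampling, sketched here as drawing one visit per person, gives every person equal weight regardless of visit count.

```python
import random
from collections import defaultdict


def person_level_split(visits, test_fraction=0.3, seed=0):
    """Assign whole people to train or test, so no person appears in both."""
    rng = random.Random(seed)
    person_ids = sorted({v["person_id"] for v in visits})
    rng.shuffle(person_ids)
    n_test = max(1, round(len(person_ids) * test_fraction))
    test_ids = set(person_ids[:n_test])
    train = [v for v in visits if v["person_id"] not in test_ids]
    test = [v for v in visits if v["person_id"] in test_ids]
    return train, test


def visit_level_split(visits, test_fraction=0.3, seed=0):
    """Assign individual visits to train or test, ignoring who they belong to."""
    rng = random.Random(seed)
    shuffled = list(visits)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def within_cluster_resample(visits, seed=0):
    """Draw one visit at random per person (one resampled dataset)."""
    rng = random.Random(seed)
    by_person = defaultdict(list)
    for v in visits:
        by_person[v["person_id"]].append(v)
    return [rng.choice(by_person[pid]) for pid in sorted(by_person)]


def shared_people(train, test):
    """People who contribute visits to both the training and testing data."""
    return {v["person_id"] for v in train} & {v["person_id"] for v in test}


# Toy data (hypothetical): person p has 2 * p visits, so cluster sizes vary.
visits = [
    {"person_id": p, "visit": k}
    for p in range(1, 11)
    for k in range(2 * p)
]

train_p, test_p = person_level_split(visits)
train_v, test_v = visit_level_split(visits)
sample = within_cluster_resample(visits)

print("people shared under person-level split:", len(shared_people(train_p, test_p)))
print("people shared under visit-level split:", len(shared_people(train_v, test_v)))
print("visits in one within-cluster resample:", len(sample))
```

In this toy setup, the visit-level split leaves some people on both sides of the split, which is the kind of leakage that can make testing performance look better than prospective performance. In practice, within-cluster resampling would repeat the draw many times and combine the fitted models or estimates; the sketch shows a single draw only.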

Keywords:  correlated data; electronic health records; machine learning; nonignorable cluster size; predictive analytics

Year: 2021; PMID: 34031916; PMCID: PMC9134927; DOI: 10.1002/bimj.202000199

Source DB: PubMed; Journal: Biom J; ISSN: 0323-3847; Impact factor: 1.715

