
Minimum sample size for developing a multivariable prediction model: Part I - Continuous outcomes.

Richard D Riley, Kym I E Snell, Joie Ensor, Danielle L Burke, Frank E Harrell, Karel G M Moons, Gary S Collins.

Abstract

In the medical literature, hundreds of prediction models are being developed to predict health outcomes in individuals. For continuous outcomes, typically a linear regression model is developed to predict an individual's outcome value conditional on values of multiple predictors (covariates). To improve model development and reduce the potential for overfitting, a suitable sample size is required in terms of the number of subjects (n) relative to the number of predictor parameters (p) for potential inclusion. We propose that the minimum value of n should meet the following four key criteria: (i) small optimism in predictor effect estimates as defined by a global shrinkage factor of ≥ 0.9; (ii) small absolute difference of ≤ 0.05 in the apparent and adjusted R²; (iii) precise estimation (a margin of error ≤ 10% of the true value) of the model's residual standard deviation; and similarly, (iv) precise estimation of the mean predicted outcome value (model intercept). The criteria require prespecification of the user's chosen p and the model's anticipated R² as informed by previous studies. The value of n that meets all four criteria provides the minimum sample size required for model development. In an applied example, a new model to predict lung function in African-American women using 25 predictor parameters requires at least 918 subjects to meet all criteria, corresponding to at least 36.7 subjects per predictor parameter. Even larger sample sizes may be needed to additionally ensure precise estimates of key predictor effects, especially when important categorical predictors have low prevalence in certain categories.
© 2018 John Wiley & Sons, Ltd.
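As a rough illustration of criterion (ii) only, the sketch below uses the standard definition of adjusted R², which implies that the gap between apparent and adjusted R² equals p(1 − R²adj)/(n − 1). The function names are hypothetical, and the anticipated adjusted R² of 0.2 is an assumed illustrative value that the abstract does not state. This is not a reproduction of the authors' full procedure, which also involves criteria (i), (iii), and (iv).

```python
import math


def apparent_r2(n, p, r2_adj):
    """Apparent R^2 implied by an adjusted R^2, from the standard identity
    R2_adj = 1 - (1 - R2_app) * (n - 1) / (n - p - 1)."""
    return (r2_adj * (n - p - 1) + p) / (n - 1)


def min_n_for_r2_difference(p, r2_adj, max_diff=0.05):
    """Smallest n for which (apparent R^2 - adjusted R^2) <= max_diff.

    Rearranging the identity above gives
        R2_app - R2_adj = p * (1 - R2_adj) / (n - 1),
    so the requirement becomes n >= 1 + p * (1 - R2_adj) / max_diff.
    """
    if not 0 <= r2_adj < 1:
        raise ValueError("r2_adj must be in [0, 1)")
    if p < 1 or max_diff <= 0:
        raise ValueError("p must be >= 1 and max_diff must be > 0")
    bound = 1 + p * (1 - r2_adj) / max_diff
    # Round away floating-point noise before taking the ceiling.
    return math.ceil(round(bound, 9))


def subjects_per_parameter(n, p):
    """Number of subjects per predictor parameter."""
    return n / p


if __name__ == "__main__":
    p = 25                # predictor parameters, from the abstract's example
    assumed_r2_adj = 0.2  # ASSUMPTION: illustrative value, not from the abstract
    n_ii = min_n_for_r2_difference(p, assumed_r2_adj)
    print("criterion (ii) minimum n:", n_ii)
    print("apparent R^2 at that n:", apparent_r2(n_ii, p, assumed_r2_adj))
    # Arithmetic check of the figure reported in the abstract: 918 / 25
    print("subjects per parameter:", subjects_per_parameter(918, p))
```

The last line checks the abstract's own arithmetic: 918 subjects for 25 parameters is 36.72 subjects per parameter, consistent with the reported "at least 36.7". The overall minimum n is the largest of the values required by the four criteria, so criterion (ii) on its own gives only a lower bound.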

Keywords:  R-squared; continuous outcome; linear regression; minimum sample size; multivariable prediction model

Year:  2018        PMID: 30347470     DOI: 10.1002/sim.7993

Source DB:  PubMed          Journal:  Stat Med        ISSN: 0277-6715            Impact factor:   2.373


