Sample size requirements for training high-dimensional risk predictors.

Kevin K Dobbin, Xiao Song.

Abstract

A common objective of biomarker studies is to develop a predictor of patient survival outcome. Determining the number of samples required to train a predictor from survival data is important for designing such studies. Existing sample size methods for training studies use parametric models for the high-dimensional data and cannot handle a right-censored dependent variable. We present a new training sample size method that is non-parametric with respect to the high-dimensional vectors, and is developed for a right-censored response. The method can be applied to any prediction algorithm that satisfies a set of conditions. The sample size is chosen so that the expected performance of the predictor is within a user-defined tolerance of optimal. The central method is based on a pilot dataset. To quantify uncertainty, a method to construct a confidence interval for the tolerance is developed. Adequacy of the size of the pilot dataset is discussed. An alternative model-based version of our method for estimating the tolerance when no adequate pilot dataset is available is presented. The model-based method requires a covariance matrix be specified, but we show that the identity covariance matrix provides adequate sample size when the user specifies three key quantities. Application of the sample size method to two microarray datasets is discussed.
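The tolerance criterion described in the abstract can be illustrated with a small sketch. This is not the authors' estimator: it assumes, purely for illustration, that the expected shortfall from optimal performance at training size n follows a hypothetical inverse power law a * n^(-b), and it simply returns the smallest n whose shortfall is within the user-defined tolerance. The names `power_law_shortfall` and `smallest_adequate_n` and the parameters `a` and `b` are assumptions introduced here; in practice the shortfall curve would have to be estimated, for example from a pilot dataset.

```python
from typing import Callable, Optional


def power_law_shortfall(a: float, b: float) -> Callable[[int], float]:
    """Hypothetical learning curve: expected gap to optimal performance
    at training size n, modelled as a * n**(-b). An assumption for
    illustration only, not the model used in the paper."""
    def shortfall(n: int) -> float:
        return a * n ** (-b)
    return shortfall


def smallest_adequate_n(
    shortfall: Callable[[int], float],
    tol: float,
    n_min: int = 1,
    n_max: int = 100_000,
) -> Optional[int]:
    """Return the smallest n in [n_min, n_max] whose expected shortfall
    from optimal is at most tol, or None if no such n exists."""
    for n in range(n_min, n_max + 1):
        if shortfall(n) <= tol:
            return n
    return None


if __name__ == "__main__":
    curve = power_law_shortfall(a=2.0, b=0.5)
    print(smallest_adequate_n(curve, tol=0.15))
```

With these illustrative values the search returns 178, because 2 / sqrt(177) is slightly above 0.15 and 2 / sqrt(178) is slightly below it. A tighter tolerance or a slower learning curve (smaller `b`) pushes the required training size up quickly.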

Keywords:  Conditional score; Cox regression; High-dimensional data; Risk prediction; Sample size; Training set

Year:  2013        PMID: 23873895      PMCID: PMC3770001          DOI: 10.1093/biostatistics/kxt022

Source DB:  PubMed          Journal:  Biostatistics        ISSN: 1465-4644            Impact factor:   5.899


