
Data reduction for prediction: a case study on robust coding of age and family history for the risk of having a genetic mutation.

Ewout W Steyerberg, Judith Balmaña, David H Stockwell, Sapna Syngal.

Abstract

Data reduction is often desired in the development of a prediction model, for example for effects of age and family history in the identification of subjects having a genetic mutation. We aimed to evaluate a strategy for model simplification by robust coding of related predictors. We considered 898 patients suspected of having Lynch syndrome, which is caused primarily by mutations in the mismatch repair genes, MLH1 or MSH2. The presence of colorectal cancer (CRC) and endometrial cancer in patients and their relatives was related to mutation prevalence with logistic regression analysis. The performances of simplified and more complex models were quantified with a concordance statistic (c), which was corrected for optimism by cross-validation and bootstrapping. External validation was performed in 1016 patients. The first challenge was the coding of age at diagnosis of CRC, where we forced effects to be identical in patients, in 1st degree and in 2nd degree relatives, by taking the sum of the ages at diagnosis. As a further simplification, CRC diagnosis in 2nd degree relatives was weighted half that of 1st degree relatives. These data reduction approaches were also followed for endometrial cancer. The simplified model used 7 instead of 17 degrees of freedom (df) for a more complex model incorporating individual predictor effects. The optimism-corrected c was higher (0.79 instead of 0.77), but the external c was similar (0.78 for the simplified and more complex models). A stepwise selected model performed slightly worse (external c=0.77). In conclusion, a prediction model could be developed with relatively few df that captured effects of age at diagnosis across patients and relatives per type of cancer in the family. Such robust coding may especially be relevant for modeling in relatively small data sets. Copyright (c) 2007 John Wiley & Sons, Ltd.
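The coding strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `FamilyHistory` container, the function names, and the example values are hypothetical. Counting each diagnosis once and leaving the age sum unweighted are also assumptions, because the abstract does not give the exact form of the predictors. The concordance function shows the plain pairwise c statistic only, without the cross-validation or bootstrap optimism correction used in the study.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class FamilyHistory:
    """Hypothetical container: ages at CRC diagnosis per group (empty = no CRC)."""
    patient_crc_ages: List[float] = field(default_factory=list)
    first_degree_crc_ages: List[float] = field(default_factory=list)
    second_degree_crc_ages: List[float] = field(default_factory=list)


# From the abstract: a CRC diagnosis in a 2nd degree relative is
# weighted half that of a 1st degree relative.
SECOND_DEGREE_WEIGHT = 0.5


def crc_count_score(h: FamilyHistory) -> float:
    """One combined CRC predictor instead of separate per-group predictors."""
    return (
        len(h.patient_crc_ages)
        + len(h.first_degree_crc_ages)
        + SECOND_DEGREE_WEIGHT * len(h.second_degree_crc_ages)
    )


def crc_age_sum(h: FamilyHistory) -> float:
    """Sum of ages at diagnosis across patient and relatives, so that a single
    age coefficient applies to all groups (unweighted sum is an assumption)."""
    return (
        sum(h.patient_crc_ages)
        + sum(h.first_degree_crc_ages)
        + sum(h.second_degree_crc_ages)
    )


def concordance(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Pairwise c statistic: the fraction of (carrier, non-carrier) pairs in
    which the carrier has the higher score; ties count as one half."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one carrier and one non-carrier")
    total = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                total += 1.0
            elif c == d:
                total += 0.5
    return total / (len(cases) * len(controls))


# Hypothetical patient: CRC at 45, two 1st degree relatives (50, 60),
# one 2nd degree relative (70).
example = FamilyHistory([45], [50, 60], [70])
example_count = crc_count_score(example)
example_age_sum = crc_age_sum(example)
```

In a logistic regression, two derived variables of this kind would replace the separate indicator and age terms for patients, 1st degree relatives, and 2nd degree relatives, which is how a simplified model can use fewer degrees of freedom than one with individual predictor effects.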

Year: 2007; PMID: 17948867; DOI: 10.1002/sim.3119

Source DB: PubMed; Journal: Stat Med; ISSN: 0277-6715; Impact factor: 2.373


  3 in total

1.  The PREMM(1,2,6) model predicts risk of MLH1, MSH2, and MSH6 germline mutations based on cancer history.

Authors:  Fay Kastrinos; Ewout W Steyerberg; Rowena Mercado; Judith Balmaña; Spring Holter; Steven Gallinger; Kimberly D Siegmund; James M Church; Mark A Jenkins; Noralane M Lindor; Stephen N Thibodeau; Lynn Anne Burbidge; Richard J Wenstrup; Sapna Syngal
Journal:  Gastroenterology       Date:  2010-08-19       Impact factor: 22.682

2.  Comparison of the clinical prediction model PREMM(1,2,6) and molecular testing for the systematic identification of Lynch syndrome in colorectal cancer.

Authors:  Fay Kastrinos; Ewout W Steyerberg; Judith Balmaña; Rowena Mercado; Steven Gallinger; Robert Haile; Graham Casey; John L Hopper; Loic LeMarchand; Noralane M Lindor; Polly A Newcomb; Stephen N Thibodeau; Sapna Syngal
Journal:  Gut       Date:  2012-02-16       Impact factor: 23.059

3.  Development and Validation of the PREMM5 Model for Comprehensive Risk Assessment of Lynch Syndrome.

Authors:  Fay Kastrinos; Hajime Uno; Chinedu Ukaegbu; Carmelita Alvero; Ashley McFarland; Matthew B Yurgelun; Matthew H Kulke; Deborah Schrag; Jeffrey A Meyerhardt; Charles S Fuchs; Robert J Mayer; Kimmie Ng; Ewout W Steyerberg; Sapna Syngal
Journal:  J Clin Oncol       Date:  2017-05-10       Impact factor: 44.544

