Multivariate modeling of complications with data driven variable selection: guarding against overfitting and effects of data set size.

Arjen van der Schaaf, Cheng-Jian Xu, Peter van Luijk, Aart A Van't Veld, Johannes A Langendijk, Cornelis Schilstra.

Abstract

PURPOSE: Multivariate modeling of complications after radiotherapy is frequently used in conjunction with data-driven variable selection. This study quantifies the risk of overfitting in a data-driven modeling method that uses bootstrapping, for data with typical clinical characteristics, and estimates the minimum amount of data needed to obtain models with relatively high predictive power.
MATERIALS AND METHODS: To facilitate repeated modeling and cross-validation with independent data sets for the assessment of true predictive power, a method was developed to generate simulated data with statistical properties similar to real clinical data sets. Characteristics of three clinical data sets from radiotherapy treatment of head and neck cancer patients were used to simulate data sets of between 50 and 1000 patients. A logistic regression method using bootstrapping and forward variable selection was used for complication modeling, yielding for each simulated data set a selected number of variables and an estimated predictive power. The true optimal number of variables and true predictive power were calculated using cross-validation with very large independent data sets.
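As an illustration only, the combination of logistic regression, forward variable selection and bootstrapping described above might be sketched as follows. The abstract does not specify how the bootstrap drives the selection, so this sketch assumes one plausible scheme: the number of variables is chosen by the average out-of-bag log-likelihood over bootstrap resamples. All function names, the gradient-ascent fit and the toy data are hypothetical and are not taken from the study.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, row, cols):
    # Probability from intercept w[0] plus coefficients for the chosen columns.
    return sigmoid(w[0] + sum(wj * row[c] for wj, c in zip(w[1:], cols)))

def fit(X, y, cols, steps=80, lr=1.0):
    # Plain gradient ascent on the mean log-likelihood (assumed fitting method).
    w = [0.0] * (len(cols) + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for row, t in zip(X, y):
            err = t - predict(w, row, cols)
            grad[0] += err
            for j, c in enumerate(cols):
                grad[j + 1] += err * row[c]
        w = [wj + lr * g / len(y) for wj, g in zip(w, grad)]
    return w

def mean_loglik(w, X, y, cols):
    # Average log-likelihood per patient; higher is better.
    total = 0.0
    for row, t in zip(X, y):
        p = min(max(predict(w, row, cols), 1e-12), 1.0 - 1e-12)
        total += math.log(p if t == 1 else 1.0 - p)
    return total / len(y)

def forward_path(X, y, max_vars):
    # Greedy forward selection: variables in the order they were added.
    chosen = []
    for _ in range(max_vars):
        rest = [c for c in range(len(X[0])) if c not in chosen]
        best = max(rest, key=lambda c: mean_loglik(
            fit(X, y, chosen + [c]), X, y, chosen + [c]))
        chosen.append(best)
    return chosen

def bootstrap_model_size(X, y, max_vars, n_boot, rng):
    # Sum out-of-bag log-likelihood per model size 0..max_vars, pick the best.
    n = len(y)
    score = [0.0] * (max_vars + 1)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        inbag = set(idx)
        oob = [i for i in range(n) if i not in inbag]
        if not oob:
            continue
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        Xo, yo = [X[i] for i in oob], [y[i] for i in oob]
        path = forward_path(Xb, yb, max_vars)
        for k in range(max_vars + 1):
            w = fit(Xb, yb, path[:k])
            score[k] += mean_loglik(w, Xo, yo, path[:k])
    return max(range(max_vars + 1), key=lambda k: score[k])

# Toy data (made up): only variable 0 carries signal.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(100)]
y = [1 if rng.random() < sigmoid(2.0 * row[0]) else 0 for row in X]
k = bootstrap_model_size(X, y, max_vars=2, n_boot=8, rng=rng)
chosen = forward_path(X, y, k)
print(k, chosen)
```

Scoring each model size on out-of-bag patients is what lets the resampling penalize variables that only fit noise; other bootstrap schemes (for example, selection frequency across resamples) would be equally consistent with the abstract.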
RESULTS: For all simulated data set sizes the number of variables selected by the bootstrapping method was on average close to the true optimal number of variables, but showed considerable spread. Bootstrapping was more accurate in selecting the optimal number of variables than the AIC and BIC alternatives, but this did not translate into a significant difference in true predictive power. The true predictive power asymptotically converged toward a maximum predictive power for large data sets, and the estimated predictive power converged toward the true predictive power. More than half of the potential predictive power is gained after approximately 200 samples. Our simulations demonstrated severe overfitting (a predictive power lower than that of predicting 50% probability) in a number of small data sets, in particular in data sets with a low number of events (median: 7, 95th percentile: 32). Recognizing overfitting from an inverted sign of the estimated model coefficients has limited discriminative value.
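The severe-overfitting criterion above (performing worse than a constant 50% prediction) can be illustrated with a short sketch. The abstract does not name the measure of predictive power, so this example assumes the mean log-likelihood on independent data; the function names and the numbers are made up for illustration.

```python
import math

def mean_loglik(probs, outcomes):
    # Average log-likelihood of binary outcomes under predicted probabilities.
    eps = 1e-12
    total = 0.0
    for p, t in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total += math.log(p if t == 1 else 1.0 - p)
    return total / len(outcomes)

def is_severely_overfitted(probs, outcomes):
    # A constant 0.5 prediction scores log(0.5) per patient, whatever the outcome.
    return mean_loglik(probs, outcomes) < math.log(0.5)

# Hypothetical predictions on independent validation patients.
outcomes = [1, 0, 0, 1, 0]
confident_wrong = [0.05, 0.9, 0.2, 0.1, 0.8]  # overconfident and mostly wrong
modest = [0.7, 0.3, 0.4, 0.6, 0.3]            # cautious and mostly right

print(is_severely_overfitted(confident_wrong, outcomes))
print(is_severely_overfitted(modest, outcomes))
```

Under this assumed measure, a model is flagged when its confident errors on new patients cost more than it would lose by always predicting 50%.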
CONCLUSIONS: Despite considerable spread around the optimal number of selected variables, the bootstrapping method is efficient and accurate for sufficiently large data sets, and guards against overfitting for all simulated cases with the exception of some data sets with a particularly low number of events. An appropriate minimum data set size to obtain a model with high predictive power is approximately 200 patients and more than 32 events. With fewer data samples the true predictive power decreases rapidly, and for larger data set sizes the benefit levels off toward an asymptotic maximum predictive power.
Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

Year: 2012; PMID: 22264894; DOI: 10.1016/j.radonc.2011.12.006

Source DB: PubMed; Journal: Radiother Oncol; ISSN: 0167-8140; Impact factor: 6.280

