BACKGROUND: Missing data frequently create problems in the analysis of population-based data sets, such as those collected by cancer registries. Restricting the analysis to records with complete data may yield inferences that differ substantially from those that would have been obtained had no data been missing. 'Naive' methods for handling missing data, such as restriction of the analysis to complete records or creation of a 'missing' category, have drawbacks that can invalidate the conclusions of the analysis. We offer a tutorial on modern methods for handling missing data in relative survival analysis.
METHODS: We estimated relative survival for 29 563 colorectal cancer patients who were diagnosed between 1997 and 2004 and registered in the North West Cancer Intelligence Service. Multiple imputation (MI) was applied to account for the common example of incomplete stage at diagnosis, under the missing at random (MAR) assumption. Multivariable regression with a generalized linear model and Poisson error structure was then used to estimate the excess hazard of death of the colorectal cancer patients, over and above the background mortality, adjusting for significant predictors of mortality.
RESULTS: Incomplete information on stage, morphology and grade meant that only 55% of the data could be included in the 'complete-case' analysis. All cases could be included with either the indicator method (IM) or MI. Compared with complete-case analysis, handling missing data by MI produced significantly lower estimates of the excess mortality for stage, morphology and grade, with the largest reductions occurring for late-stage and high-grade tumours.
CONCLUSION: In complete-case analysis, almost half of the data could not be included, and with the IM, all records with missing values for stage were combined into a single 'missing' category. We show that MI greatly improved the results by exploiting all the information in the incomplete records. It also helped to ensure that efficient inferences about survival were made from the multivariable regression analyses.
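The abstract does not spell out how estimates from the imputed data sets are combined. A minimal sketch of the standard combination step (Rubin's rules) is given below, assuming each imputed data set has already produced a coefficient (for example a log excess hazard ratio) and its variance. The function name `pool_rubin` and the numbers in the usage lines are hypothetical and are not taken from the study.

```python
import statistics


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates with Rubin's rules.

    estimates: one coefficient per imputed data set
               (e.g. a log excess hazard ratio).
    variances: the squared standard error of each coefficient.
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    pooled = statistics.mean(estimates)      # average of the point estimates
    within = statistics.mean(variances)      # average within-imputation variance
    between = statistics.variance(estimates)  # sample variance across imputations
    total = within + (1 + 1 / m) * between   # Rubin's total variance
    return pooled, total


# Illustrative, made-up inputs: five imputed data sets.
log_ehr = [0.52, 0.48, 0.55, 0.50, 0.47]
var_log_ehr = [0.010, 0.011, 0.009, 0.010, 0.012]
pooled, total_var = pool_rubin(log_ehr, var_log_ehr)
print(f"pooled log EHR = {pooled:.3f}, SE = {total_var ** 0.5:.3f}")
```

The between-imputation term is what separates MI from filling in a single value: it carries the uncertainty about the missing stage into the final standard error.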