Stefan Dietrich (1), Anna Floegel (2), Martina Troll (3,4), Tilman Kühn (5), Wolfgang Rathmann (6,7), Anette Peters (4,7,8), Disorn Sookthai (5), Martin von Bergen (9), Rudolf Kaaks (5), Jerzy Adamski (7,10,11), Cornelia Prehn (10), Heiner Boeing (2), Matthias B Schulze (7,12), Thomas Illig (3,13), Tobias Pischon (2,14), Sven Knüppel (2), Rui Wang-Sattler (3,4,7), Dagmar Drogan (2).
1. Department of Epidemiology, German Institute of Human Nutrition, Nuthetal, Germany stefan.dietrich@dife.de.
2. Department of Epidemiology, German Institute of Human Nutrition, Nuthetal, Germany.
3. Research Unit of Molecular Epidemiology.
4. Institute of Epidemiology II, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany.
5. Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, Germany.
6. Institute for Biometrics and Epidemiology, Leibniz Center for Diabetes Research at Heinrich Heine University, Germany.
7. German Center for Diabetes Research (DZD), München-Neuherberg, Germany.
8. Department of Environmental Health, Harvard School of Public Health, Boston, MA, USA.
9. Department of Molecular Systems Biology, Helmholtz Centre for Environmental Research (UFZ), Institute of Biochemistry, Faculty of Biosciences, Pharmacy and Psychology, University of Leipzig, Leipzig, Germany and Department of Chemistry and Bioscience, University of Aalborg, Aalborg East, Denmark.
10. Institute of Experimental Genetics, Genome Analysis Center, Helmholtz Zentrum München, German Research Center for Environmental Health, München-Neuherberg, Germany.
11. Lehrstuhl für Experimentelle Genetik, Technische Universität München, Freising-Weihenstephan, Germany.
12. Department of Molecular Epidemiology, German Institute of Human Nutrition, Nuthetal, Germany.
13. Hannover Unified Biobank, and Institute for Human Genetics, Hannover, Germany.
14. Molecular Epidemiology Group, Max Delbruck Center for Molecular Medicine (MDC) Berlin-Buch, Berlin, Germany.
Abstract
BACKGROUND: Applying metabolomics in prospective cohort studies is statistically challenging. Because selecting disease-associated metabolites from highly correlated, complex data requires appropriate statistical methods, we combined random survival forest (RSF) with an automated backward elimination procedure that addresses these issues.
METHODS: We illustrated our RSF approach with data from the European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam study, using concentrations of 127 serum metabolites as exposure variables and time to development of type 2 diabetes mellitus (T2D) as the outcome variable. Results from a Cox regression with stepwise selection applied to this data set were recently published. The methodological comparison (RSF versus Cox regression) was replicated in two independent cohorts. Finally, the R code for implementing the metabolite selection procedure in the RSF syntax is provided.
RESULTS: Applying the RSF approach in EPIC-Potsdam identified 16 metabolites associated with incident T2D. These metabolites slightly improved prediction of T2D when added to traditional T2D risk factors, and also when used together with classical biomarkers. The identified metabolites partly agreed with previous findings from Cox regression, though RSF selected a larger number of highly correlated metabolites.
CONCLUSIONS: RSF appears to be a promising approach for identifying disease-associated variables in complex data with a time-to-event outcome. The demonstrated RSF approach yields findings comparable to those of the commonly used Cox regression, while also addressing the problem of multicollinearity and remaining suitable for high-dimensional data.
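The abstract does not spell out how the automated backward elimination works, so the following is only a minimal sketch of one common variant, written in Python rather than the authors' R code. It assumes that each round fits a model on the remaining variables, drops the variable with the lowest importance, and finally returns the subset with the lowest prediction error. The names `backward_eliminate`, `FitFn`, and `toy_fit` are hypothetical, and `toy_fit` is a stand-in for a real random survival forest fit.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: given variable names, fit a model and return
# (prediction_error, importance per variable). In practice this could wrap
# a random survival forest; here it is left pluggable.
FitFn = Callable[[Sequence[str]], Tuple[float, Dict[str, float]]]


def backward_eliminate(variables: Sequence[str], fit: FitFn,
                       min_vars: int = 1) -> Tuple[List[str], float]:
    """Drop the least important variable each round and return the subset
    with the lowest error seen (ties favour the smaller subset)."""
    current = list(variables)
    best_subset, best_error = list(current), float("inf")
    while len(current) >= min_vars:
        error, importance = fit(current)
        if error <= best_error:
            best_subset, best_error = list(current), error
        if len(current) == min_vars:
            break
        # Assumed elimination rule: remove the variable with lowest importance.
        weakest = min(current, key=lambda v: importance[v])
        current.remove(weakest)
    return best_subset, best_error


def toy_fit(names: Sequence[str]) -> Tuple[float, Dict[str, float]]:
    """Toy stand-in for a model fit: two 'signal' variables lower the error,
    every other variable adds a small penalty. Not real data."""
    signal = {"m1": 0.2, "m2": 0.1}
    noise_count = sum(1 for n in names if n not in signal)
    error = 0.5 - sum(signal.get(n, 0.0) for n in names) + 0.01 * noise_count
    importance = {n: signal.get(n, 0.0) for n in names}
    return error, importance


selected, err = backward_eliminate(["m1", "n1", "m2", "n2"], toy_fit)
print(selected, round(err, 3))
```

In a real analysis, `fit` would refit the survival forest on the reduced variable set each round and report an out-of-bag error together with a variable importance measure; the stopping and selection rules here are assumptions, not details taken from the study.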