Random Survival Forest in practice: a method for modelling complex metabolomics data in time to event analysis.

Stefan Dietrich, Anna Floegel, Martina Troll, Tilman Kühn, Wolfgang Rathmann, Anette Peters, Disorn Sookthai, Martin von Bergen, Rudolf Kaaks, Jerzy Adamski, Cornelia Prehn, Heiner Boeing, Matthias B Schulze, Thomas Illig, Tobias Pischon, Sven Knüppel, Rui Wang-Sattler, Dagmar Drogan.

Abstract

BACKGROUND: The application of metabolomics in prospective cohort studies is statistically challenging. Given the importance of appropriate statistical methods for selection of disease-associated metabolites in highly correlated complex data, we combined random survival forest (RSF) with an automated backward elimination procedure that addresses such issues.
METHODS: Our RSF approach was illustrated with data from the European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam study, with concentrations of 127 serum metabolites as exposure variables and time to development of type 2 diabetes mellitus (T2D) as the outcome variable. An analysis of this data set using Cox regression with a stepwise selection method was recently published. The methodological comparison (RSF versus Cox regression) was replicated in two independent cohorts. Finally, the R code for implementing the metabolite selection procedure in the RSF syntax is provided.
RESULTS: The application of the RSF approach in EPIC-Potsdam resulted in the identification of 16 incident T2D-associated metabolites which slightly improved prediction of T2D when used in addition to traditional T2D risk factors and also when used together with classical biomarkers. The identified metabolites partly agreed with previous findings using Cox regression, though RSF selected a higher number of highly correlated metabolites.
CONCLUSIONS: The RSF method appeared to be a promising approach for identifying disease-associated variables in complex data with time to event as the outcome. The demonstrated RSF approach provides findings comparable to those of the commonly used Cox regression, but also addresses the problem of multicollinearity and is suitable for high-dimensional data.
© The Author 2016; all rights reserved. Published by Oxford University Press on behalf of the International Epidemiological Association.
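
The selection idea summarized in the abstract (fit a survival model, rank the variables, drop the weakest, refit, and keep the variable set with the lowest prediction error) can be illustrated with a minimal sketch. This is not the authors' R code and it does not fit a random survival forest: `toy_fit_and_rank` is a hypothetical stand-in that scores a simple summed risk with Harrell's concordance index, and the data, function names, and the "drop one variable per round" rule are all assumptions made for illustration. In a real analysis the stand-in would be replaced by an actual RSF fit that returns an out-of-bag error and per-variable importance.

```python
from itertools import combinations


def concordance_index(times, events, risk):
    """Harrell's C for right-censored data (simplified: tied times are skipped)."""
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # pair is not comparable if the earlier subject is censored
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5
    return concordant / comparable if comparable else 0.5


def backward_eliminate(features, fit_and_rank, min_features=1):
    """Drop the least important variable each round; return the lowest-error set.

    `fit_and_rank(active)` must return (prediction_error, {name: importance}).
    """
    active = list(features)
    best_set, best_error, history = list(active), float("inf"), []
    while len(active) >= min_features:
        error, importance = fit_and_rank(active)
        history.append((list(active), error))
        if error <= best_error:  # ties favor the smaller variable set
            best_set, best_error = list(active), error
        if len(active) == min_features:
            break
        active.remove(min(active, key=lambda name: importance[name]))
    return best_set, best_error, history


# Hypothetical toy data: "signal" tracks risk, "noise" does not.
TIMES = [2, 4, 6, 8, 10, 12]
EVENTS = [1, 1, 0, 1, 1, 0]  # 1 = event observed, 0 = right-censored
DATA = {
    "signal": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    "noise": [1.0, 3.0, 2.0, 3.0, 1.0, 2.0],
}


def toy_fit_and_rank(active):
    """Stand-in for a forest fit (assumption): summed risk, univariate importance."""
    risk = [sum(DATA[name][k] for name in active) for k in range(len(TIMES))]
    error = 1.0 - concordance_index(TIMES, EVENTS, risk)
    importance = {
        name: abs(concordance_index(TIMES, EVENTS, DATA[name]) - 0.5)
        for name in active
    }
    return error, importance


selected, error, history = backward_eliminate(list(DATA), toy_fit_and_rank)
print(selected, error)
```

On this toy data the loop discards `noise` and keeps `signal`. Passing the model fit in as a callback keeps the elimination loop independent of how error and importance are computed, which is why a forest-based fit could be substituted without changing the loop.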

Keywords:  Cox proportional hazards regression; exploratory survival analysis; metabolomics; multicollinearity; random survival forest; right-censored data; type 2 diabetes mellitus; variable selection

Year: 2016  PMID: 27591264  DOI: 10.1093/ije/dyw145

Source DB: PubMed  Journal: Int J Epidemiol  ISSN: 0300-5771  Impact factor: 7.196


  21 in total

1.  SurvBenchmark: comprehensive benchmarking study of survival analysis methods using both omics data and clinical data.

Authors:  Yunwei Zhang; Germaine Wong; Graham Mann; Samuel Muller; Jean Y H Yang
Journal:  Gigascience       Date:  2022-07-30       Impact factor: 7.658

2.  A Selective Review on Random Survival Forests for High Dimensional Data.

Authors:  Hong Wang; Gang Li
Journal:  Quant Biosci       Date:  2017

3.  Prognostic value of a microRNA-pair signature in laryngeal squamous cell carcinoma patients.

Authors:  Shu Zhou; Qingchun Meng; Zexuan Wang
Journal:  Eur Arch Otorhinolaryngol       Date:  2022-04-27       Impact factor: 3.236

4.  Risk Prediction of Pancreatic Cancer in Patients With Recent-onset Hyperglycemia: A Machine-learning Approach.

Authors:  Wansu Chen; Rebecca K Butler; Eva Lustigova; Suresh T Chari; Anirban Maitra; Jo A Rinaudo; Bechien U Wu
Journal:  J Clin Gastroenterol       Date:  2022-04-21       Impact factor: 3.174

5.  Risk Prediction of Dyslipidemia for Chinese Han Adults Using Random Forest Survival Model.

Authors:  Xiaoshuai Zhang; Fang Tang; Jiadong Ji; Wenting Han; Peng Lu
Journal:  Clin Epidemiol       Date:  2019-12-10       Impact factor: 4.790

6.  Random survival forests using linked data to measure illness burden among individuals before or after a cancer diagnosis: Development and internal validation of the SEER-CAHPS illness burden index.

Authors:  Lisa M Lines; Julia Cohen; Justin Kirschner; Michael T Halpern; Erin E Kent; Michelle A Mollica; Ashley Wilder Smith
Journal:  Int J Med Inform       Date:  2020-10-21       Impact factor: 4.046

7.  The Chronic Lymphocytic Leukemia Comorbidity Index (CLL-CI): A Three-Factor Comorbidity Model.

Authors:  Max J Gordon; Andy Kaempf; Byung Park; Alexey V Danilov; Andrea Sitlinger; Geoffrey Shouse; Matthew Mei; Danielle M Brander; Tareq Salous; Brian T Hill; Hamood Alqahtani; Michael Choi; Michael C Churnetski; Jonathon B Cohen; Deborah M Stephens; Tanya Siddiqi; Xavier Rivera; Daniel Persky; Paul Wisniewski; Krish Patel; Mazyar Shadman
Journal:  Clin Cancer Res       Date:  2021-06-24       Impact factor: 12.531

8.  Ferroptosis-related gene signature predicts prognosis and immunotherapy in glioma.

Authors:  Rong-Jun Wan; Wang Peng; Qin-Xuan Xia; Hong-Hao Zhou; Xiao-Yuan Mao
Journal:  CNS Neurosci Ther       Date:  2021-05-10       Impact factor: 5.243

9.  Quantile Regression Forests to Identify Determinants of Neighborhood Stroke Prevalence in 500 Cities in the USA: Implications for Neighborhoods with High Prevalence.

Authors:  Liangyuan Hu; Jiayi Ji; Yan Li; Bian Liu; Yiyi Zhang
Journal:  J Urban Health       Date:  2021-04       Impact factor: 3.671

10.  Predictive scores for identifying patients with type 2 diabetes mellitus at risk of acute myocardial infarction and sudden cardiac death.

Authors:  Sharen Lee; Jiandong Zhou; Cosmos Liutao Guo; Wing Tak Wong; Tong Liu; Ian Chi Kei Wong; Kamalan Jeevaratnam; Qingpeng Zhang; Gary Tse
Journal:  Endocrinol Diabetes Metab       Date:  2021-02-19