Sebastian Pölsterl1, Sailesh Conjeti2, Nassir Navab3, Amin Katouzian4. 1. Computer Aided Medical Procedures, Technische Universität München, Boltzmannstraße 3, 85748 Garching bei München, Germany. Electronic address: sebastian.poelsterl@tum.de. 2. Computer Aided Medical Procedures, Technische Universität München, Boltzmannstraße 3, 85748 Garching bei München, Germany. Electronic address: conjeti@in.tum.de. 3. Computer Aided Medical Procedures, Technische Universität München, Boltzmannstraße 3, 85748 Garching bei München, Germany; Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA. Electronic address: nassir.navab@tum.de. 4. IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120, USA. Electronic address: akatouz@us.ibm.com.
Abstract
BACKGROUND: In clinical research, the primary interest is often the time until occurrence of an adverse event, i.e., survival analysis. Its application to electronic health records is challenging for two main reasons: (1) patient records comprise high-dimensional feature vectors, and (2) feature vectors are a mix of categorical and real-valued features, which implies varying statistical properties among features. To learn from high-dimensional data, researchers can choose from a wide range of methods in the fields of feature selection and feature extraction. Whereas feature selection is well studied, little work has focused on utilizing feature extraction techniques for survival analysis.
RESULTS: We investigate how well feature extraction methods can deal with features having varying statistical properties. In particular, we consider multiview spectral embedding algorithms, which have been developed specifically for these situations. We propose to use random survival forests to accurately determine local neighborhood relations from right-censored survival data. We evaluated 10 combinations of feature extraction methods and 6 survival models with and without intrinsic feature selection in the context of survival analysis on 3 clinical datasets. Our results demonstrate that for small sample sizes - less than 500 patients - models with built-in feature selection (Cox model with ℓ1 penalty, random survival forest, and gradient boosted models) outperform feature extraction methods by a median margin of 6.3% in concordance index (interquartile range: [-1.2%, 14.6%]).
CONCLUSIONS: If the number of samples is insufficient, feature extraction methods are unable to reliably identify the underlying manifold, which makes them of limited use in these situations. For large sample sizes - in our experiments, 2500 samples or more - feature extraction methods perform as well as feature selection methods.
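The spectral embedding methods evaluated in the abstract build a neighborhood graph over patients and embed it into a low-dimensional space. The following sketch, which is not taken from the paper, illustrates the single-view case (Laplacian eigenmaps) with plain NumPy; the multiview variants the authors consider extend this idea by combining neighborhood graphs from several feature groups. All function and parameter names here are illustrative assumptions.

```python
import numpy as np

def laplacian_eigenmaps(X, n_neighbors=5, n_components=2):
    """Minimal single-view spectral embedding (Laplacian eigenmaps) sketch.

    Builds a symmetric k-nearest-neighbor graph over the rows of X,
    forms the normalized graph Laplacian, and returns the eigenvectors
    associated with the smallest non-trivial eigenvalues as coordinates.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances between all rows.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Unweighted kNN adjacency, symmetrized so the graph is undirected.
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(d2[i])[1:n_neighbors + 1]  # skip self at position 0
        W[i, idx] = 1.0
    W = np.maximum(W, W.T)
    degrees = W.sum(axis=1)
    L = np.diag(degrees) - W                      # unnormalized Laplacian
    Ds = np.diag(1.0 / np.sqrt(degrees))          # D^{-1/2}
    L_sym = Ds @ L @ Ds                           # symmetric normalized Laplacian
    eigvals, eigvecs = np.linalg.eigh(L_sym)      # ascending eigenvalues
    # Drop the trivial constant eigenvector, map back to the D^{-1/2} scale.
    return Ds @ eigvecs[:, 1:n_components + 1]

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
embedding = laplacian_eigenmaps(X, n_neighbors=5, n_components=2)
print(embedding.shape)  # (30, 2)
```

In the paper's setting, the Euclidean kNN step above is the weak link for mixed categorical/real-valued features, which is why the authors propose deriving neighborhood relations from a random survival forest instead.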
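The abstract reports results in terms of the concordance index (c-index), the standard discrimination measure for right-censored survival models: among all comparable patient pairs, it is the fraction for which the model assigns higher risk to the patient who experiences the event earlier. A minimal self-contained implementation of Harrell's c-index, written here for illustration rather than taken from the paper, could look like this:

```python
def concordance_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable if the subject with the shorter observed
    time actually experienced the event (events[i] truthy); the pair is
    concordant when that subject also has the higher predicted risk.
    Tied risk scores count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Perfect ranking: the earliest event gets the highest risk score.
print(concordance_index([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 2, 1]))  # 1.0
```

A c-index of 0.5 corresponds to random predictions and 1.0 to a perfect ranking, so the reported 6.3% median margin is a substantial difference on this scale. Production code would typically use an existing implementation, e.g. `concordance_index_censored` in scikit-survival.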