Yunchuan Kong, Tianwei Yu. Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA.
Abstract
MOTIVATION: A unique challenge in building predictive models for omics data is the small number of samples (n) relative to the large number of features (p). This 'n≪p' property makes disease outcome classification with deep learning techniques difficult. Sparse learning that incorporates known functional relationships between biological units, as in the graph-embedded deep feedforward network (GEDFN) model, has been one solution to this issue. However, such methods require an existing feature graph, and mis-specification of that graph can harm both classification and feature selection.

RESULTS: To address this limitation and develop a robust classification model that does not rely on external knowledge, we propose the forest graph-embedded deep feedforward network (forgeNet) model, which integrates the GEDFN architecture with a forest feature graph extractor, so that the feature graph is learned in a supervised manner and constructed specifically for a given prediction task. To validate the method's capability, we tested the forgeNet model on both synthetic and real datasets. The resulting high classification accuracy suggests that the method is a valuable addition to sparse deep learning models for omics data.

AVAILABILITY AND IMPLEMENTATION: The method is available at https://github.com/yunchuankong/forgeNet.

CONTACT: tianwei.yu@emory.edu.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
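The idea of extracting a feature graph from a forest can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository above for that); it assumes one plausible reading of the abstract, in which each parent-to-child pair of splitting features in a fitted tree contributes an edge, and the edges from all trees are merged into a single symmetric adjacency matrix. The function names `tree_edges` and `forest_feature_graph` are hypothetical. The tree arrays follow the layout scikit-learn exposes on a fitted tree (`children_left`, `children_right`, `feature`, with -1 for a missing child and a negative feature index at leaves), and the two toy trees are written by hand so the example has no dependencies.

```python
from typing import List, Sequence, Set, Tuple

LEAF = -1  # scikit-learn's marker for "no child" (node is a leaf)

Tree = Tuple[Sequence[int], Sequence[int], Sequence[int]]


def tree_edges(children_left: Sequence[int],
               children_right: Sequence[int],
               feature: Sequence[int]) -> Set[Tuple[int, int]]:
    """Collect (parent feature, child feature) pairs from one decision tree."""
    edges: Set[Tuple[int, int]] = set()
    for node, parent_feat in enumerate(feature):
        for child in (children_left[node], children_right[node]):
            if child == LEAF:
                continue  # current node is a leaf, nothing below it
            child_feat = feature[child]
            if child_feat < 0:
                continue  # child is a leaf and has no splitting feature
            if child_feat != parent_feat:
                edges.add((parent_feat, child_feat))
    return edges


def forest_feature_graph(trees: Sequence[Tree], n_features: int) -> List[List[int]]:
    """Merge the edges of all trees into a symmetric 0/1 adjacency matrix.

    Assumption: the graph is treated as undirected and without self-loops.
    """
    adj = [[0] * n_features for _ in range(n_features)]
    for left, right, feat in trees:
        for a, b in tree_edges(left, right, feat):
            adj[a][b] = 1
            adj[b][a] = 1
    return adj


# Two hand-written toy trees over 5 features (hypothetical data).
# Tree 1: root splits on feature 0, its left child splits on feature 2.
tree1: Tree = ([1, 3, -1, -1, -1], [2, 4, -1, -1, -1], [0, 2, -2, -2, -2])
# Tree 2: root splits on feature 2, its right child splits on feature 4.
tree2: Tree = ([1, -1, 3, -1, -1], [2, -1, 4, -1, -1], [2, -2, 4, -2, -2])

adjacency = forest_feature_graph([tree1, tree2], n_features=5)

# Features with at least one edge are the ones the forest actually used together.
selected = [j for j, row in enumerate(adjacency) if any(row)]

for row in adjacency:
    print(row)
print("features in the graph:", selected)
```

In a graph-embedded network of the GEDFN kind, an adjacency matrix like this would typically act as a sparsity mask on the first-layer weights, so that only connections between linked features are trained. With a real forest, the three arrays could be read from each fitted estimator's `tree_` attribute in scikit-learn.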