Lichy Han, Mateusz Maciejewski, Christoph Brockel, William Gordon, Scott B Snapper, Joshua R Korzenik, Lovisa Afzelius, Russ B Altman.
Abstract
Summary: Gene-based supervised machine learning classification models have been widely used to differentiate disease states, predict disease progression and determine effective treatment options. However, many of these classifiers are sensitive to noise and frequently do not replicate in external validation sets. For complex, heterogeneous diseases, these classifiers are further limited by being unable to capture varying combinations of genes that lead to the same phenotype. Pathway-based classification can overcome these challenges by using robust, aggregate features to represent biological mechanisms. In this work, we developed a novel pathway-based approach, PRObabilistic Pathway Score (PROPS), which uses genes to calculate individualized pathway scores for classification. Unlike previous individualized pathway-based classification methods that use gene sets, we incorporate gene interactions using probabilistic graphical models to more accurately represent the underlying biology and achieve better performance. We apply our method to differentiate two similar complex diseases, ulcerative colitis (UC) and Crohn's disease (CD), which are the two main types of inflammatory bowel disease (IBD). Using five IBD datasets, we compare our method against four gene-based and four alternative pathway-based classifiers in distinguishing CD from UC. We demonstrate superior classification performance and provide biological insight into the top pathways separating CD from UC.
Availability and Implementation: PROPS is available as an R package, which can be downloaded at http://simtk.org/home/props or on Bioconductor.
Contact: rbaltman@stanford.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
Year: 2018 PMID: 29048458 PMCID: PMC5860179 DOI: 10.1093/bioinformatics/btx651
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Overview of PROPS feature engineering. (1) KEGG pathways are downloaded and represented as directed networks. (2) Edges are added to the pathway in random order, excluding edges that would result in cycles. (3) This results in a Bayesian network representation of each KEGG pathway. (4) Each pathway model is parameterized using the healthy and non-lesional tissue samples. (5) The parameterized network is applied to CD and UC data to (6) calculate log-likelihood values for each pathway for each patient, which are used for subsequent classification.
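The steps in Fig. 1 can be illustrated with a minimal Python sketch. It is not the PROPS R package or its API: the function names (`build_dag`, `fit_linear_gaussian`, `pathway_log_likelihood`) are hypothetical, and the choice of a linear-Gaussian model for each gene given its parents is an assumption made here for illustration, since the caption does not state the distribution used. Samples are assumed to be dictionaries mapping gene names to expression values.

```python
import math
import random


def build_dag(nodes, edges, seed=0):
    """Steps 2-3: add directed edges in random order, skipping any that would close a cycle."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    children = {n: set() for n in nodes}

    def reaches(start, target):
        # Depth-first search: is `target` reachable from `start`?
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(children[node])
        return False

    kept = []
    for src, dst in shuffled:
        # src -> dst closes a cycle exactly when src is already reachable from dst.
        if dst not in children[src] and not reaches(dst, src):
            children[src].add(dst)
            kept.append((src, dst))
    return kept


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_linear_gaussian(nodes, dag_edges, samples, ridge=1e-6):
    """Step 4 (assumed model): each gene = intercept + linear combination of parents + Gaussian noise."""
    parents = {n: [] for n in nodes}
    for src, dst in dag_edges:
        parents[dst].append(src)
    model = {}
    for node in nodes:
        rows = [[1.0] + [s[p] for p in parents[node]] for s in samples]
        y = [s[node] for s in samples]
        k = len(rows[0])
        # Ridge-stabilised normal equations: (X^T X + ridge * I) w = X^T y.
        xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
                for j in range(k)] for i in range(k)]
        xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
        w = solve(xtx, xty)
        resid = [t - sum(wi * ri for wi, ri in zip(w, r)) for r, t in zip(rows, y)]
        var = max(sum(e * e for e in resid) / len(y), 1e-6)  # floor avoids log(0)
        model[node] = (parents[node], w, var)
    return model


def pathway_log_likelihood(model, sample):
    """Steps 5-6: log-likelihood of one sample under the fitted pathway network."""
    total = 0.0
    for node, (pa, w, var) in model.items():
        mean = w[0] + sum(wi * sample[p] for wi, p in zip(w[1:], pa))
        total += -0.5 * math.log(2 * math.pi * var) - (sample[node] - mean) ** 2 / (2 * var)
    return total
```

In this sketch, a model would be fitted once per pathway on the healthy and non-lesional samples, and `pathway_log_likelihood` would then be evaluated for every CD and UC sample, giving one feature per pathway per patient for a downstream classifier.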
Fig. 2. AUC comparison between all methods on all four validation datasets. PROPS consistently performs well and outperforms nearly all other methods in all studies.
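For readers unfamiliar with the metric compared in Fig. 2, the AUC of a binary classifier can be computed as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The following is a generic sketch of that definition, not the evaluation code used in the study; the function name `auc` and the 0/1 label coding are assumptions.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.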
Fig. 3. (A) Aggregate ROC curves and (B) pairwise AUC comparison between all methods on all independent validation data. For GSE36807 five genes, only GSE10616 and GSE9686 were used, resulting in fewer samples for comparison. PROPS obtains the highest AUC and outperforms more methods than all its competitors, significantly outperforming all genes, LLR, CORG and GED, and trending towards significance against pathway genes and top 257 genes.
Fig. 4. (A) The top 15 important features from our model. (B) Visualization of classification results using multidimensional scaling. The majority of the misclassified samples are located at the border between CD and UC, with five UC samples that appear to be more similar to CD than UC.