Ting Jin1, Nam D Nguyen2, Flaminia Talos3,4, Daifeng Wang1,5. 1. Department of Biostatistics and Medical Informatics, University of Wisconsin - Madison, Madison, WI 53706, USA. 2. Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA. 3. Departments of Pathology and Urology, Stony Brook, NY 11794, USA. 4. Stony Brook Cancer Center, Stony Brook Medicine, Stony Brook, NY 11794, USA. 5. Waisman Center, University of Wisconsin - Madison, Madison, WI 53705, USA.
Abstract
MOTIVATION: Gene expression and regulation, a key molecular mechanism driving human disease development, remains elusive, especially at early stages. Integrating the growing volume of population-level genomic data and understanding gene regulatory mechanisms in disease development remain challenging. Machine learning has emerged as a way to address these challenges, but many machine learning methods are limited to building an accurate prediction model as a 'black box' that offers little biological or clinical interpretability. RESULTS: To address these challenges, we developed an interpretable and scalable machine learning model, ECMarker, to predict gene expression biomarkers for disease phenotypes and simultaneously reveal underlying regulatory mechanisms. In particular, ECMarker integrates semi-restricted and discriminative restricted Boltzmann machines into a neural network model for classification that allows lateral connections at the input gene layer. This interpretable model scales without requiring any prior feature selection, and it directly models and prioritizes genes and reveals potential gene networks (from the lateral connections) for the phenotypes. Applying ECMarker to gene expression data from non-small-cell lung cancer patients, we found that it not only achieved relatively high accuracy in predicting cancer stages but also identified biomarker genes and gene networks that imply regulatory mechanisms in lung cancer development. In addition, ECMarker demonstrates clinical interpretability, as its prioritized biomarker genes can predict survival rates of early lung cancer patients (P-value < 0.005). Finally, we identified a number of drugs currently in clinical use for late stages or other cancers that have effects on these early lung cancer biomarkers, suggesting potential novel candidates for early cancer medicine.
AVAILABILITY AND IMPLEMENTATION: ECMarker is open source as a general-purpose tool at https://github.com/daifengwanglab/ECMarker. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
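The model class named in the abstract can be illustrated with a minimal sketch. This is a generic textbook-style formulation of a discriminative restricted Boltzmann machine with an added lateral (gene-gene) term in the energy. It is not ECMarker's code, and the abstract does not specify the model at this level of detail. All names here (`W`, `U`, `L`, `b`, `c`, `d`, `class_posterior`, `brute_force_posterior`) and the random toy data are assumptions made for illustration. The sketch computes the class probability p(y | x) in closed form and checks it against brute-force enumeration over the binary hidden units.

```python
import itertools
import numpy as np

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)

def energy(x, y, h, W, U, L, b, c, d):
    # Assumed energy: hidden-gene, gene bias, hidden bias, class bias,
    # hidden-class, and lateral gene-gene terms.
    return -(h @ W @ x + b @ x + c @ h + d[y] + h @ U[:, y] + 0.5 * x @ L @ x)

def class_posterior(x, W, U, c, d):
    # Closed form: p(y|x) is proportional to
    # exp(d_y + sum_j softplus(c_j + U_jy + (W x)_j)).
    act = (W @ x + c)[:, None] + U           # shape (n_hidden, n_classes)
    log_unnorm = d + softplus(act).sum(axis=0)
    log_unnorm -= log_unnorm.max()           # stabilise before exponentiating
    p = np.exp(log_unnorm)
    return p / p.sum()

def brute_force_posterior(x, W, U, L, b, c, d):
    # Sum exp(-energy) over every binary hidden configuration, per class.
    n_hidden, n_classes = U.shape
    totals = np.zeros(n_classes)
    for y in range(n_classes):
        for bits in itertools.product([0.0, 1.0], repeat=n_hidden):
            h = np.array(bits)
            totals[y] += np.exp(-energy(x, y, h, W, U, L, b, c, d))
    return totals / totals.sum()

# Toy data (hypothetical sizes and random parameters, not real expression data).
rng = np.random.default_rng(0)
n_genes, n_hidden, n_classes = 6, 4, 2
x = rng.normal(size=n_genes)
W = rng.normal(scale=0.1, size=(n_hidden, n_genes))
U = rng.normal(scale=0.1, size=(n_hidden, n_classes))
A = rng.normal(scale=0.1, size=(n_genes, n_genes))
L = (A + A.T) / 2.0                          # symmetric lateral weights
np.fill_diagonal(L, 0.0)                     # no self-connections
b = rng.normal(scale=0.1, size=n_genes)
c = np.zeros(n_hidden)
d = np.zeros(n_classes)

print(class_posterior(x, W, U, c, d))
print(brute_force_posterior(x, W, U, L, b, c, d))
```

In this assumed formulation the lateral term and the gene bias do not depend on the class label, so they cancel from p(y | x) and only shape the model of the gene layer itself. That is why `class_posterior` does not take `L` or `b`. How ECMarker trains and reads out its lateral connections is not stated in the abstract.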