Abdul Karim1, Zheng Su1,2, Phillip K West1, Matthew Keon1, Jannah Shamsani1, Samuel Brennan1, Ted Wong1, Ognjen Milicevic1, Guus Teunisse1, Hima Nikafshan Rad3, Abdul Sattar3.
Abstract
Amyotrophic lateral sclerosis (ALS) is a prototypical neurodegenerative disease characterized by progressive degeneration of motor neurons, which severely affects the ability to control voluntary muscle movement. Most of the genetic aberrations responsible for ALS are non-additive, which, together with limited sample size, the curse of dimensionality, class imbalance and noise in the data, makes its molecular classification very challenging. Deep learning methods have been successful in many related areas but have low minority-class accuracy and lack explainability when applied directly to RNA expression features for ALS molecular classification. In this paper, we propose a deep-learning-based framework for molecular ALS classification and interpretation. Our framework trains a convolutional neural network (CNN) on images obtained by converting RNA expression values into pixels using the DeepInsight similarity technique. We then employed Shapley additive explanations (SHAP) to extract the pixels most relevant to ALS classification and mapped these pixels back to the genes that made them up. This enabled us to classify ALS samples with high accuracy for the minority class while identifying genes that might play an important role in ALS molecular classification. Taken together with the CNN classification of RNA expression images, our preliminary analysis of the genes identified by SHAP interpretation demonstrates the value of using machine learning to perform molecular classification of ALS and uncover disease-associated genes.
Keywords: ALS; classification; interpretation; machine learning; target identification
Year: 2021 PMID: 34828360 PMCID: PMC8626003 DOI: 10.3390/genes12111754
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. ALS and control sample images with 120 × 120 resolution obtained using DeepInsight for demonstration purposes.
Figure 2. DeepInsight pipeline. (a) An illustration of the transformation from feature vector to feature matrix. (b) An illustration of the DeepInsight methodology for transforming a feature vector into image pixels. Image taken from DeepInsight [23].
Figure 3. CNN architecture for classifying ALS and control images.
Figure 4. Left: gray-scale image of an ALS sample. Right: pixels in the image highlighted according to their SHAP values.
Figure 5. 12-fold cross-validation performance for images created at various resolutions.
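The feature-to-image step in Figure 2 and the pixel-to-gene mapping used for interpretation can be illustrated with a minimal sketch. It is not the DeepInsight implementation: plain PCA stands in for the embedding, no rotation or other layout optimization is applied, and the function names (`features_to_pixels`, `render_image`, `genes_at_pixels`) are hypothetical. The `attribution` array is a stand-in for per-pixel SHAP values, which are assumed to be computed elsewhere.

```python
import numpy as np

def features_to_pixels(X, size):
    """Assign each feature (gene) a pixel on a size x size grid.

    X has shape (n_samples, n_features), with at least 2 samples.
    Each feature is described by its values across samples; a 2-D PCA
    of these descriptions places similar features close together.
    (Assumption: PCA is used here only as a simple stand-in embedding.)
    """
    F = X.T - X.T.mean(axis=0)                   # one row per feature, centred
    U, S, _ = np.linalg.svd(F, full_matrices=False)
    coords = U[:, :2] * S[:2]                    # 2-D PCA scores per feature
    span = coords.max(axis=0) - coords.min(axis=0)
    span[span == 0] = 1.0                        # guard against a flat axis
    unit = (coords - coords.min(axis=0)) / span  # rescale to [0, 1]
    return np.minimum((unit * size).astype(int), size - 1)

def render_image(sample, pixels, size):
    """Build one image; features sharing a pixel are averaged."""
    total = np.zeros((size, size))
    count = np.zeros((size, size))
    np.add.at(total, (pixels[:, 0], pixels[:, 1]), sample)
    np.add.at(count, (pixels[:, 0], pixels[:, 1]), 1)
    return np.divide(total, count, out=np.zeros_like(total), where=count > 0)

def genes_at_pixels(pixels, gene_names, attribution, top_k):
    """Map the top_k pixels by absolute attribution back to gene names."""
    flat = np.argsort(np.abs(attribution), axis=None)[::-1][:top_k]
    rows, cols = np.unravel_index(flat, attribution.shape)
    hits = {}
    for r, c in zip(rows, cols):
        mask = (pixels[:, 0] == r) & (pixels[:, 1] == c)
        hits[(int(r), int(c))] = [gene_names[i] for i in np.flatnonzero(mask)]
    return hits
```

Because several genes can fall on the same pixel, a highlighted pixel maps back to a list of genes rather than a single one, so smaller images yield coarser gene-level interpretation.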
12-fold cross-validation performance comparison with classical machine learning methods: random forest (RF), support vector machine (SVM) and fully connected neural network (FCNN). For this comparison, our method used images with a resolution of 380 × 380 and all 33,153 RNA expression features of autosomal genes.
| Method | AUC | SPE | SEN | NPV | PPV | ACC | MCC | F1 |
|---|---|---|---|---|---|---|---|---|
| Our method | | | 0.947 ± 0.04 | 0.671 ± 0.18 | | | | |
| RF | 0.831 ± 0.04 | 0.155 ± 0.05 | 0.994 ± 0.00 | 0.798 ± 0.19 | 0.906 ± 0.00 | 0.575 ± 0.02 | 0.319 ± 0.09 | 0.602 ± 0.04 |
| SVM | 0.866 ± 0.05 | 0.083 ± 0.03 | | | 0.899 ± 0.00 | 0.541 ± 0.01 | 0.270 ± 0.04 | 0.549 ± 0.02 |
| FCNN | 0.805 ± 0.04 | 0.4 ± 0.12 | 0.974 ± 0.02 | 0.692 ± 0.20 | 0.930 ± 0.01 | 0.687 ± 0.06 | 0.478 ± 0.15 | 0.723 ± 0.07 |
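The column abbreviations can be made concrete with a short sketch that computes the standard definitions from a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical, and the averaging conventions behind the reported ACC and F1 are not stated in this section. In the rows with all values present, ACC equals the mean of SPE and SEN to three decimals, which suggests balanced accuracy, so both plain and balanced accuracy are returned.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    def ratio(num, den):
        # Undefined ratios (zero denominator) are reported as NaN.
        return num / den if den else float("nan")

    sen = ratio(tp, tp + fn)   # sensitivity: recall on the positive class
    spe = ratio(tn, tn + fp)   # specificity: recall on the negative class
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "SEN": sen,
        "SPE": spe,
        "PPV": ratio(tp, tp + fp),               # positive predictive value
        "NPV": ratio(tn, tn + fn),               # negative predictive value
        "ACC": ratio(tp + tn, tp + fp + tn + fn),
        "BACC": (sen + spe) / 2,                 # balanced accuracy
        "MCC": ratio(tp * tn - fp * fn, mcc_den),
        "F1": ratio(2 * tp, 2 * tp + fp + fn),   # F1 of the positive class
    }

# Hypothetical imbalanced example: 100 positives, 10 negatives.
example = binary_metrics(tp=90, fp=6, tn=4, fn=10)
```

With imbalanced classes, sensitivity and plain accuracy can stay high while specificity, balanced accuracy and MCC remain low, which is the pattern visible in the RF and SVM rows.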
12-fold cross-validation classification performance at a resolution of 350 × 350 using RNA features of high-expression genes and of protein-coding genes.
| RNA Features | AUC | SPE | SEN | NPV | PPV | ACC | MCC | F1 |
|---|---|---|---|---|---|---|---|---|
| High count genes | | | | | | | | |
| Protein-coding genes | 0.910 ± 0.04 | 0.646 ± 0.13 | 0.968 ± 0.02 | 0.720 ± 0.12 | 0.957 ± 0.01 | 0.807 ± 0.07 | 0.643 ± 0.13 | 0.819 ± 0.06 |