Nafis Irtiza Tripto¹, Mohimenul Kabir¹, Md Shamsuzzoha Bayzid¹, Atif Rahman¹.
Abstract
Time series gene expression data is widely used to study dynamic biological processes. Although gene expression datasets share many characteristics of time series data from other domains, most analyses in this field do not fully leverage the time-ordered nature of the data and instead focus on clustering genes based on their expression values. Other domains, such as stock market and weather prediction, use time series data for forecasting. Moreover, many studies have classified generic time series data based on trend, seasonality, and other patterns. An assessment of these approaches on gene expression data would therefore be of great interest to evaluate their adequacy in this domain. Here, we perform a comprehensive evaluation of traditional unsupervised and supervised machine learning approaches, as well as deep learning based techniques, for time series gene expression classification and forecasting on five real datasets. In addition, we propose deep learning based methods for both classification and forecasting and compare their performance with state-of-the-art methods. We find that deep learning based methods generally outperform traditional approaches for time series classification. Experiments also suggest that supervised classification of gene expression is more effective than clustering when labels are available. In time series gene expression forecasting, we observe that an autoregressive statistical approach performs best for short-term forecasting, whereas deep learning based methods are better suited for long-term forecasting.
Year: 2020 PMID: 33156855 PMCID: PMC7647064 DOI: 10.1371/journal.pone.0241686
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. CNN architecture.
CNN architecture with two hidden layers, followed by a dense layer and an output layer. The input vector is fed to the first convolution (hidden) layer, and the output layer produces a softmax probability distribution over the classes.
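The Fig 1 layout (two convolution layers, a dense layer, a softmax output) can be illustrated with a minimal pure-Python forward pass. This is a sketch under stated assumptions, not the authors' model: the filter count, kernel size, dense width, ReLU activations and random untrained weights are all hypothetical choices, and training is omitted entirely.

```python
import math
import random


def rand_matrix(rows, cols, rng):
    """Small random weights (hypothetical); a trained model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def conv1d_relu(signal, kernels, bias):
    """Valid 1-D convolution (cross-correlation, as in most deep-learning
    libraries) followed by ReLU.

    signal:  in_channels x length
    kernels: out_channels x in_channels x kernel_size
    returns: out_channels x (length - kernel_size + 1)
    """
    k = len(kernels[0][0])
    length = len(signal[0])
    out = []
    for oc, kernel in enumerate(kernels):
        row = []
        for t in range(length - k + 1):
            s = bias[oc]
            for ic, channel in enumerate(signal):
                for j in range(k):
                    s += kernel[ic][j] * channel[t + j]
            row.append(max(0.0, s))
        out.append(row)
    return out


def dense(vec, weights, bias):
    """Fully connected layer without activation."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def init_cnn(series_len, n_classes, filters=4, k=3, hidden=8, seed=0):
    rng = random.Random(seed)
    flat_len = filters * (series_len - 2 * (k - 1))  # length left after two valid convolutions
    return {
        "c1": [rand_matrix(1, k, rng) for _ in range(filters)],
        "b1": [0.0] * filters,
        "c2": [rand_matrix(filters, k, rng) for _ in range(filters)],
        "b2": [0.0] * filters,
        "d": rand_matrix(hidden, flat_len, rng),
        "bd": [0.0] * hidden,
        "o": rand_matrix(n_classes, hidden, rng),
        "bo": [0.0] * n_classes,
    }


def cnn_forward(series, p):
    h1 = conv1d_relu([series], p["c1"], p["b1"])  # 1st convolution (hidden) layer
    h2 = conv1d_relu(h1, p["c2"], p["b2"])        # 2nd convolution (hidden) layer
    flat = [v for channel in h2 for v in channel]
    h3 = [max(0.0, v) for v in dense(flat, p["d"], p["bd"])]  # dense layer (ReLU)
    return softmax(dense(h3, p["o"], p["bo"]))    # output layer: class probabilities


params = init_cnn(series_len=8, n_classes=3)
probs = cnn_forward([0.1, 0.4, 0.9, 1.2, 1.0, 0.7, 0.3, 0.2], params)
print([round(q, 3) for q in probs])
```

With untrained weights the probabilities are arbitrary; the point is the shape of the computation: one expression value per time point goes in, and one probability per class comes out.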
Fig 2. LSTM architecture.
LSTM architecture with two hidden layers, followed by a dense layer and an output layer. The input vector is fed to the first hidden layer, and the output layer produces a softmax probability distribution over the classes.
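A comparable pure-Python sketch of the Fig 2 layout follows: two stacked LSTM layers, a dense layer applied to the final hidden state, and a softmax output. It assumes a standard LSTM cell (input, forget, cell-candidate and output gates), a hidden size of 6, a dense width of 8, ReLU in the dense layer and random untrained weights; none of these settings come from the source.

```python
import math
import random


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights, bias, vec):
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def init_lstm_layer(n_in, n_hidden, rng):
    # One (weights, bias) pair per gate; each gate sees the concatenation [x_t, h_{t-1}].
    return {gate: (rand_matrix(n_hidden, n_in + n_hidden, rng), [0.0] * n_hidden)
            for gate in ("input", "forget", "cell", "output")}


def lstm_layer(sequence, layer, n_hidden):
    """Run one LSTM layer over a sequence and return the hidden state at every step."""
    h = [0.0] * n_hidden
    c = [0.0] * n_hidden
    states = []
    for x in sequence:
        z = x + h  # list concatenation: [x_t, h_{t-1}]
        i = [sigmoid(v) for v in affine(*layer["input"], z)]
        f = [sigmoid(v) for v in affine(*layer["forget"], z)]
        g = [math.tanh(v) for v in affine(*layer["cell"], z)]
        o = [sigmoid(v) for v in affine(*layer["output"], z)]
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        states.append(h)
    return states


def init_lstm_classifier(n_classes, hidden=6, dense_units=8, seed=0):
    rng = random.Random(seed)
    return {
        "hidden": hidden,
        "lstm1": init_lstm_layer(1, hidden, rng),
        "lstm2": init_lstm_layer(hidden, hidden, rng),
        "dense": (rand_matrix(dense_units, hidden, rng), [0.0] * dense_units),
        "out": (rand_matrix(n_classes, dense_units, rng), [0.0] * n_classes),
    }


def lstm_forward(series, p):
    seq = [[v] for v in series]                    # one feature per time point
    h1 = lstm_layer(seq, p["lstm1"], p["hidden"])  # 1st LSTM (hidden) layer
    h2 = lstm_layer(h1, p["lstm2"], p["hidden"])   # 2nd LSTM (hidden) layer
    d = [max(0.0, v) for v in affine(*p["dense"], h2[-1])]  # dense layer on final state
    return softmax(affine(*p["out"], d))           # output layer: class probabilities


params = init_lstm_classifier(n_classes=3)
probs = lstm_forward([0.1, 0.4, 0.9, 1.2, 1.0, 0.7, 0.3, 0.2], params)
print([round(q, 3) for q in probs])
```

Unlike the convolutional sketch, the recurrent layers carry a state from one time point to the next, so the final hidden state summarizes the whole ordered series before classification.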
Fig 3. Different classes of GSE6186 gene expression.
The X-axis denotes the time interval, and the Y-axis represents the corresponding gene expression value. (A): Maternal gene expression. (B): Transient gene expression. (C): Activated gene expression.
Accuracy of all methods on the different datasets.
All accuracy values are given in percent (%).
| Dataset | CNN | LSTM | SVM | One-Class SVM | DNN | DeepTrust |
|---|---|---|---|---|---|---|
| GSE6186 | 92.19 | 95.75 | 93.02 | 93.21 | 78.23 | |
| GSE3406 | 86.38 | 88.83 | 50.13 | 87.36 | 43.28 | |
| GSE1723 | 80.25 | 76.15 | 83.11 | 73.54 | 59.3 | |
| Patient | 64.21 | 68.14 | 63.45 | 63.07 | 53.55 | |
| Yeast | 47.37 | 63.17 | 47.36 | 78.94 | 55.56 | |
F1 score of all methods on the different datasets.
All F1 score values are given in percent (%).
| Dataset | CNN | LSTM | SVM | One-Class SVM | DNN | DeepTrust |
|---|---|---|---|---|---|---|
| GSE6186 | 92.33 | 93.59 | 92 | 93.18 | 39.18 | |
| GSE3406 | 85.86 | 86.45 | 50.1 | 87.1 | 16.46 | |
| GSE1723 | 80.25 | 76.15 | 83.03 | 73.54 | 18.61 | |
| Patient | 63.71 | 57.86 | 46.13 | 63.02 | 18.63 | |
| Yeast | 47.37 | 63.17 | 30 | 53.84 | 17.78 | |
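Accuracy and F1 score can diverge sharply, as in some rows of the tables above. The sketch below computes both. The source does not state how F1 is averaged over classes, so macro averaging (the unweighted mean of per-class F1) is an assumption here. The hypothetical example shows one general way such a gap arises: a classifier that always predicts the majority class.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging, assumed)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted, but wrongly
            fn[t] += 1  # true class t was missed
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        precision = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        recall = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels: a classifier that always predicts the majority class.
y_true = ["maternal"] * 6 + ["transient"] * 2 + ["activated"] * 2
y_pred = ["maternal"] * 10
print(accuracy(y_true, y_pred))            # 0.6
print(round(macro_f1(y_true, y_pred), 4))  # 0.25
```

Accuracy rewards getting the majority class right. Macro F1 gives each class equal weight, so the two minority classes, which are never predicted, pull it down to 0.25.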
Accuracy and F1 score of STEM on all datasets.
All values are given in percent (%).
| Metric | GSE6186 | GSE3406 | GSE1723 | Patient | Yeast |
|---|---|---|---|---|---|
| Accuracy | 75.5 | 30.45 | 51.64 | 52.12 | 42.55 |
| F1 score | 61.4 | 31.19 | 51.64 | 50.83 | 25.38 |
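STEM (Short Time-series Expression Miner) clusters genes by assigning each one to a predefined model temporal profile. The sketch below shows that core idea in simplified form. It enumerates every profile that starts at 0 and moves by at most `max_step` units between consecutive time points, then assigns each gene to the profile with the highest Pearson correlation. The full tool additionally selects a subset of distinct profiles and tests their significance, and both steps are omitted here. How cluster assignments were mapped to class labels to compute the accuracy above is not described in this excerpt, so the sketch stops at clustering.

```python
import itertools
import math


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0  # flat series: treat as uncorrelated


def model_profiles(n_points, max_step=1):
    """Every profile that starts at 0 and moves by an integer in
    [-max_step, max_step] between consecutive time points."""
    profiles = []
    for deltas in itertools.product(range(-max_step, max_step + 1), repeat=n_points - 1):
        profile = [0]
        for d in deltas:
            profile.append(profile[-1] + d)
        profiles.append(profile)
    return profiles


def assign_to_profiles(genes, max_step=1):
    """Map each gene name to the model profile with the highest Pearson correlation."""
    n_points = len(next(iter(genes.values())))
    profiles = model_profiles(n_points, max_step)
    return {
        name: max(profiles, key=lambda prof: pearson(values, prof))
        for name, values in genes.items()
    }


genes = {  # hypothetical expression values at four time points
    "rising": [0.1, 0.5, 0.9, 1.3],
    "falling": [1.2, 0.8, 0.4, 0.0],
    "peak": [0.0, 1.0, 1.0, 0.0],
}
for name, profile in assign_to_profiles(genes).items():
    print(name, profile)
```

Because the assignment uses only correlation with predefined shapes and ignores any labels, it is an unsupervised baseline. That is the setting the abstract contrasts with supervised classification.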
RMSE values of all methods on the different datasets.
RMSE values for each test percentage are grouped together; the best RMSE values are highlighted.
| Test percent | Method | GSE6186 | GSE3406 | GSE1723 | Patient | Yeast |
|---|---|---|---|---|---|---|
| 10 | Holt-Winters | 0.3 | 0.52 | 0.53 | - | 0.48 |
| ARIMA | - | |||||
| | ANN | 0.268 | 0.55 | 0.54 | - | 0.5 |
| | LSTM | 0.3 | 0.76 | - | - | - |
| | GluonTS | 0.361 | 0.76 | 0.831 | - | 0.599 |
| 20 | Holt-Winters | 0.538 | 0.7 | 0.72 | 0.8 | 0.79 |
| ARIMA | 0.54 | 0.414 | 0.57 | |||
| ANN | 0.5 | 0.65 | 0.65 | 0.64 | ||
| LSTM | 0.6 | 0.9 | - | |||
| | GluonTS | 0.561 | 0.798 | 1.02 | 0.835 | 0.874 |
| 30 | Holt-Winters | 0.665 | 0.87 | 0.93 | 0.94 | 1.13 |
| ARIMA | 0.64 | |||||
| ANN | 0.488 | 0.67 | - | 0.56 | ||
| LSTM | 0.746 | 0.86 | 0.73 | 1.1 | ||
| | GluonTS | 0.725 | 0.985 | 1.161 | 1.066 | 1.4 |
| 40 | Holt-Winters | 0.8 | 1.1 | 1.07 | 1.2 | 1.31 |
| | ARIMA | - | - | - | - | - |
| ANN | - | |||||
| | LSTM | 0.88 | 0.91 | 0.88 | 1.35 | 0.59 |
| | GluonTS | 1.28 | 1.29 | 1.90 | 2.2 | 1.93 |
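The forecasting evaluation can be sketched as follows. Hold out part of each series, forecast it, and score the forecast with RMSE. This sketch assumes that "test percent" means the last that-many percent of time points in a series. It uses two simplified stand-ins for methods in the table: an AR(1) model fitted by least squares, which is autoregressive but has no differencing or moving-average terms and so is not full ARIMA, and Holt's linear-trend smoothing, which is Holt-Winters without a seasonal component. The smoothing parameters `alpha` and `beta` and the example series are hypothetical.

```python
import math


def split_series(series, test_percent):
    """Hold out the last test_percent % of the time points (at least one)."""
    n_test = max(1, round(len(series) * test_percent / 100))
    return series[:-n_test], series[-n_test:]


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def ar1_forecast(train, horizon):
    """Fit x_t = c + phi * x_{t-1} by least squares, then iterate it forward."""
    prev, nxt = train[:-1], train[1:]
    m_prev, m_next = sum(prev) / len(prev), sum(nxt) / len(nxt)
    var = sum((x - m_prev) ** 2 for x in prev)
    cov = sum((x - m_prev) * (y - m_next) for x, y in zip(prev, nxt))
    phi = cov / var if var else 0.0
    c = m_next - phi * m_prev
    preds, last = [], train[-1]
    for _ in range(horizon):
        last = c + phi * last  # each forecast feeds the next step
        preds.append(last)
    return preds


def holt_forecast(train, horizon, alpha=0.5, beta=0.3):
    """Holt's linear-trend exponential smoothing (additive trend, no seasonal term)."""
    level, trend = train[0], train[1] - train[0]
    for x in train[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


# Hypothetical expression series with 10 time points.
series = [0.2, 0.5, 0.9, 1.1, 1.0, 0.8, 0.9, 1.2, 1.4, 1.3]
for test_percent in (10, 20, 30, 40):
    train, test = split_series(series, test_percent)
    ar = rmse(test, ar1_forecast(train, len(test)))
    holt = rmse(test, holt_forecast(train, len(test)))
    print(f"{test_percent}%: AR(1) RMSE={ar:.3f}  Holt RMSE={holt:.3f}")
```

Raising the test percentage both shortens the training data and lengthens the horizon that multi-step forecasts must cover. This is consistent with the general rise in RMSE from the 10% rows to the 40% rows of the table.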