| Literature DB >> 31340596 |
Kok Keng Tan, Nguyen Quoc Khanh Le, Hui-Yuan Yeh, Matthew Chin Heng Chua.
Abstract
Enhancers are short deoxyribonucleic acid (DNA) fragments that play an important role in regulating gene expression. Because they can act on genes from distant locations, enhancers are difficult to identify. Many published works have focused on identifying enhancers from their sequence information, but the resulting performance still leaves room for improvement. This study proposes an ensemble of classifiers built on deep recurrent neural networks for predicting enhancers. The input features of the deep ensemble networks were generated from six types of dinucleotide physicochemical properties, which outperformed the other feature types tested. In summary, the ensemble model identified enhancers with a sensitivity of 75.5%, specificity of 76%, accuracy of 75.5%, and MCC of 0.51. For classifying enhancers as strong or weak, the model reached a sensitivity of 83.15%, specificity of 45.61%, accuracy of 68.49%, and MCC of 0.312. Compared with the benchmark, these results are higher on most measurement metrics, showing that deep model ensembles hold the potential to improve on the best results achieved to date with shallow machine learning methods.
Keywords: biocomputing; dinucleotide physicochemical properties; enhancer DNA; ensemble deep learning; gene expression; high performance; transcription factor
Year: 2019 PMID: 31340596 PMCID: PMC6678823 DOI: 10.3390/cells8070767
Source DB: PubMed Journal: Cells ISSN: 2073-4409 Impact factor: 6.600
Figure 1. A neural network model with one 1D convolution layer and one max-pooling layer before bi-directional recurrent and fully-connected layers.
Figure 2. A neural network model with two single-direction gated recurrent unit (GRU) layers and one fully-connected layer.
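Both architectures are built around GRU layers. As a reference for what one GRU unit computes, here is a minimal numpy sketch of a single GRU cell step; the weights are random illustrative values (not the trained model), and the sizes (6 inputs, 8 hidden nodes) are borrowed from the feature dimension and the `1 × 8` configurations below only as an example.

```python
import numpy as np

def gru_step(x, h, W, U, b):
    """One GRU time step: x is the input vector, h the previous hidden
    state; W, U, b hold the update (z), reset (r), and candidate (c)
    parameters, keyed by gate name."""
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    z = sigmoid(W["z"] @ x + U["z"] @ h + b["z"])        # update gate
    r = sigmoid(W["r"] @ x + U["r"] @ h + b["r"])        # reset gate
    c = np.tanh(W["c"] @ x + U["c"] @ (r * h) + b["c"])  # candidate state
    return (1.0 - z) * h + z * c                         # new hidden state

rng = np.random.default_rng(0)
n_in, n_hid = 6, 8  # six physicochemical features in, 8 hidden nodes
W = {g: rng.standard_normal((n_hid, n_in)) * 0.1 for g in "zrc"}
U = {g: rng.standard_normal((n_hid, n_hid)) * 0.1 for g in "zrc"}
b = {g: np.zeros(n_hid) for g in "zrc"}

h = np.zeros(n_hid)
for x in rng.standard_normal((200, n_in)):  # one 200-step feature sequence
    h = gru_step(x, h, W, U, b)
print(h.shape)  # (8,)
```

A bi-directional layer (the `b` variants in the tables) runs a second such recurrence over the reversed sequence and concatenates the two hidden states.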
Physicochemical property values for each dinucleotide.
| Dinucleotide | Rise | Roll | Shift | Slide | Tilt | Twist |
|---|---|---|---|---|---|---|
| AA | 0.430303 | 0.403042 | 1.000000 | 0.545455 | 0.4 | 0.833333 |
| AC | 0.818182 | 0.695817 | 0.618557 | 1.000000 | 0.7 | 0.833333 |
| AG | 0.257576 | 0.315589 | 0.762887 | 0.772727 | 0.3 | 0.791667 |
| AT | 0.860606 | 1.000000 | 0.319588 | 0.863636 | 0.6 | 0.750000 |
| CA | 0.045455 | 0.220532 | 0.360825 | 0.090909 | 0.1 | 0.291667 |
| CC | 0.548485 | 0.171103 | 0.731959 | 0.545455 | 0.3 | 1.000000 |
| CG | 0.000000 | 0.304183 | 0.371134 | 0.000000 | 0.0 | 0.333333 |
| CT | 0.257576 | 0.315589 | 0.762887 | 0.772727 | 0.3 | 0.791667 |
| GA | 0.706061 | 0.277567 | 0.618557 | 0.500000 | 0.4 | 0.833333 |
| GC | 1.000000 | 0.536122 | 0.494845 | 0.500000 | 1.0 | 0.750000 |
| GG | 0.548485 | 0.171103 | 0.731959 | 0.545455 | 0.3 | 1.000000 |
| GT | 0.818182 | 0.695817 | 0.618557 | 1.000000 | 0.7 | 0.833333 |
| TA | 0.000000 | 0.000000 | 0.000000 | 0.136364 | 0.0 | 0.000000 |
| TC | 0.706061 | 0.277567 | 0.618557 | 0.500000 | 0.4 | 0.833333 |
| TG | 0.045455 | 0.220532 | 0.360825 | 0.090909 | 0.1 | 0.291667 |
| TT | 0.430303 | 0.403042 | 1.000000 | 0.545455 | 0.4 | 0.833333 |
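The table above maps each dinucleotide to six property values. A minimal sketch of how a sequence can be turned into a feature matrix from such a lookup; only four of the table's 16 entries are copied here for brevity, so `encode` would need the full dictionary for arbitrary sequences.

```python
# Dinucleotide -> (Rise, Roll, Shift, Slide, Tilt, Twist), values from the table.
PROPS = {
    "AA": (0.430303, 0.403042, 1.000000, 0.545455, 0.4, 0.833333),
    "AC": (0.818182, 0.695817, 0.618557, 1.000000, 0.7, 0.833333),
    "CA": (0.045455, 0.220532, 0.360825, 0.090909, 0.1, 0.291667),
    "AT": (0.860606, 1.000000, 0.319588, 0.863636, 0.6, 0.750000),
    # ... the remaining 12 dinucleotides follow the same pattern
}

def encode(seq):
    """Slide a 2-bp window over seq; each overlapping dinucleotide
    becomes one 6-dimensional feature vector."""
    return [PROPS[seq[i:i + 2]] for i in range(len(seq) - 1)]

features = encode("ACAT")  # dinucleotides AC, CA, AT
print(len(features), len(features[0]))  # 3 6
```

A sequence of length L thus yields an (L − 1) × 6 feature matrix, which is the per-timestep input to the recurrent layers.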
Model validation results.
| S/N | Conv1D/Maxpool | GRU | Dense | Acc (%) | @Epoch |
|---|---|---|---|---|---|
| Layer 1 | |||||
| #1 | - | 1 × 8 | - | 75.61 | 586 |
| #2 | - | 1 × 16 | - | 75.20 | 252 |
| #3 | - | 2 × 16 | - | 75.51 | 232 |
| #4 | - | 2 × 16 | 1 × 16 | 75.41 | 392 |
| #5 | - | 1 × 16b | 1 × 16 | 74.82 | 206 |
| #6 | 1 × 16(9)/2 | 2 × 8 | 1 × 8 | 74.49 | 44 |
| #7 | 1 × 16(9)/2 | 2 × 8b | 1 × 8 | 75.00 | 161 |
| Layer 2 | |||||
| #8 | - | 2 × 16 | 1 × 16 | 62.29 | 61 |
| #9 | 1 × 16(9)/2 | 2 × 8b | 1 × 8 | 60.27 | 68 |
Conv1D/Max pool = 1 × 16(9)/2 means 1 convolution layer of 16 channels using a filter of size 9 and 1 max-pooling layer of filter size 2; stride is 1 for both. GRU, Dense = 1 × 8 means 1 layer of 8 hidden nodes; b means bi-directional.
Fine-tuning validation results.
| S/N | Conv1D/Maxpool | GRU | Dense | Regularization | Acc (%) | @Epoch |
|---|---|---|---|---|---|---|
| Layer 1 | ||||||
| #1 | - | 2 × 16 | 1 × 16 | dp 0.1 | 77.65 | 865 |
| #2 | - | 2 × 16 | 1 × 16 | dp 0.2 | 76.22 | 1044 |
| #3 | - | 2 × 16 | 1 × 16 | dp 0.3 | 76.12 | 3744 |
| #4 | - | 2 × 16 | 1 × 16 | dp 0.4 | 75.10 | 1104 |
| #5 | 1 × 16(9)/2 | 2 × 16b | 1 × 8 | dp 0.6 | 77.24 | 693 |
| #6 | 1 × 16(9)/2 | 2 × 16b | 1 × 8 | dp 0.6; l1l2 3 × 10−5 | 77.14 | 604 |
| Layer 2 | ||||||
| #7 | - | 2 × 16 | 1 × 16 | dp 0.1 | 62.96 | 547 |
| #8 | 1 × 16(9)/2 | 2 × 16b | 1 × 8 | dp 0.6 | 61.28 | 2081 |
Regularization: dp means dropout with the following parameter value; l1l2 means weight regularization using both l1 and l2 penalty functions with the following parameter value; b means bi-directional.
Training (with warm restarts) validation results.
| S/N | Conv1D/Maxpool | GRU | Dense | Warm Restarts | Acc (%) | @Epoch |
|---|---|---|---|---|---|---|
| Layer 1 | ||||||
| #1 | - | 2 × 16 | 1 × 16 | dp 0.1; cyc 200; max_lr 0.003 | 76.73 | 1283 |
| #2 | 1 × 16(9)/2 | 2 × 16b | 1 × 8 | dp 0.6; cyc 200; max_lr 0.001 | 76.22 | 871 |
| Layer 2 | ||||||
| #3 | - | 2 × 16 | 1 × 16 | dp 0.1; cyc 200; max_lr 0.003 | 63.30 | 228 |
| #4 | 1 × 16(9)/2 | 2 × 16b | 1 × 8 | dp 0.6; cyc 200; max_lr 0.001 | 59.60 | 1224 |
Warm Restarts: dp means dropout with the following parameter value; cyc means the number of epochs in a cycle; max_lr means the maximum learning rate; the minimum learning rate is set at 1 × 10−4 for both models; b means bi-directional.
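Training with warm restarts is commonly implemented as a cosine-annealed learning rate that jumps back to max_lr at the start of every cycle (SGDR-style); the exact schedule shape used in the paper is an assumption here, but the defaults below (cyc 200, max_lr 0.003, min_lr 1 × 10−4) come from the table.

```python
import math

def warm_restart_lr(epoch, cycle=200, max_lr=3e-3, min_lr=1e-4):
    """Cosine-annealed learning rate that decays from max_lr toward
    min_lr over a cycle and restarts at max_lr at each cycle boundary."""
    t = epoch % cycle  # position within the current cycle
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t / cycle))

print(warm_restart_lr(0))    # 0.003 (start of a cycle)
print(warm_restart_lr(199))  # near 1e-4 (end of a cycle)
```

The periodic jumps back to max_lr are what produce the visible cycles in the training curve of Figure 3.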
Figure 3. An example of model training using warm restarts; the cycles are apparent in the plot. The y-axis is accuracy; the x-axis is epochs.
Ensembles validation results.
| S/N | Conv1D/Maxpool | GRU | Dense | # of Models | Acc (%) |
|---|---|---|---|---|---|
| Layer 1 | |||||
| #1 | - | 2 × 16 | 1 × 16 | 5 | 77.45 |
| #2 | 1 × 16(9)/2 | 2 × 16b | 1 × 8 | 3 | 76.43 |
| Layer 2 | |||||
| #3 | - | 2 × 16 | 1 × 16 | 3 | 63.64 |
| #4 | 1 × 16(9)/2 | 2 × 16b | 1 × 8 | 3 | 59.60 |
# of Models = number of individual models in the model ensemble. b means bi-directional.
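The ensembles above combine several independently trained copies of one architecture. The record does not spell out the combination rule, so the sketch below assumes the common choice of averaging the models' predicted probabilities and thresholding the mean:

```python
def ensemble_predict(prob_lists, threshold=0.5):
    """Average each sample's predicted probability across models,
    then threshold the mean to get the class label."""
    n_models = len(prob_lists)
    means = [sum(p[i] for p in prob_lists) / n_models
             for i in range(len(prob_lists[0]))]
    return [int(m >= threshold) for m in means], means

# Three hypothetical models scoring four sequences:
probs = [
    [0.9, 0.4, 0.2, 0.6],
    [0.8, 0.6, 0.1, 0.5],
    [0.7, 0.5, 0.3, 0.4],
]
labels, means = ensemble_predict(probs)
print(labels)  # [1, 1, 0, 1]
```

Averaging smooths out the variance of individual runs, which is consistent with the ensembles matching or beating their single-model counterparts in the layer-1 tables.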
Cross-validation results.
| Acc (%) | MCC | Sn (%) | Sp (%) | AUC (%) |
|---|---|---|---|---|
| Layer 1 | ||||
| 74.83 | 0.498 | 73.25 | 76.42 | 76.94 |
| Layer 2 | ||||
| 58.96 | 0.197 | 79.65 | 38.28 | 60.68 |
MCC = Matthews Correlation Coefficient.
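The reported metrics follow the standard confusion-matrix definitions; a small sketch with illustrative counts (not the paper's actual confusion matrix):

```python
import math

def metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy, and Matthews correlation
    coefficient from confusion-matrix counts."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + fp + tn + fn)
    mcc = ((tp * tn - fp * fn)
           / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return sn, sp, acc, mcc

# Illustrative counts for a balanced 200/200 test set:
sn, sp, acc, mcc = metrics(tp=151, fp=48, tn=152, fn=49)
print(f"Sn={sn:.3f} Sp={sp:.3f} Acc={acc:.4f} MCC={mcc:.3f}")
```

Note how MCC stays informative on the imbalanced-looking layer-2 results, where a high sensitivity (79.65%) coexists with a low specificity (38.28%) and the MCC of 0.197 exposes the weak overall correlation.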
Independent test results.
| S/N | Conv1D/Maxpool | GRU | Dense | Type | Acc (%) | MCC | Sn (%) | Sp (%) | AUC (%) |
|---|---|---|---|---|---|---|---|---|---|
| Layer 1 | |||||||||
| #1 | - | 2 × 16 | 1 × 16 | Single | 74 | 0.48 | 75.00 | 73.00 | 79.63 |
| #2 | 1 × 16(9)/2 | 2 × 16b | 1 × 8 | Single | 73.75 | 0.475 | 75.00 | 72.50 | 80.86 |
| #3 | - | 2 × 16 | 1 × 16 | Ensemble | 75.25 | 0.506 | 73.00 | 77.50 | 76.16 |
| #4 | 1 × 16(9)/2 | 2 × 16b | 1 × 8 | Ensemble | 75.5 | 0.51 | 75.50 | 76.00 | 77.04 |
| Layer 2 | |||||||||
| #5 | - | 2 × 16 | 1 × 16 | Single | 60.96 | 0.100 | 86.52 | 21.05 | 58.57 |
| #6 | 1 × 16(9)/2 | 2 × 16b | 1 × 8 | Single | 68.49 | 0.312 | 83.15 | 45.61 | 67.14 |
| #7 | - | 2 × 16 | 1 × 16 | Ensemble | 58.90 | 0.071 | 79.78 | 26.32 | 53.19 |
| #8 | 1 × 16(9)/2 | 2 × 16b | 1 × 8 | Ensemble | 62.33 | 0.201 | 70.79 | 49.12 | 60.48 |
MCC = Matthews Correlation Coefficient. b means bi-directional.
Figure 4. Receiver operating characteristic (ROC) curves for all single and ensemble models. (a) Layer 1 classification; (b) layer 2 classification.
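The AUC values reported alongside the ROC curves can be computed directly from model scores via the rank (Mann–Whitney) formulation; a self-contained sketch with toy scores:

```python
def auc(pos_scores, neg_scores):
    """AUC as the probability that a randomly chosen positive outscores
    a randomly chosen negative (ties count one half)."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))  # 8/9 ≈ 0.889
```

This pairwise formulation is equivalent to the area under the ROC curve traced by sweeping the decision threshold, which is how the curves in Figure 4 are drawn.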
Independent test results between our proposed method and the other state-of-the-art predictors.
| Predictors | Acc (%) | MCC | Sn (%) | Sp (%) | AUC (%) |
|---|---|---|---|---|---|
| Layer 1 | | | | | |
| Ours | 75.50 | 0.510 | 75.5 | 76.0 | 77.04 |
| iEnhancer-EL | 74.75 | 0.496 | 71.0 | 78.5 | 81.73 |
| iEnhancer-2L | 73.00 | 0.460 | 71.0 | 75.0 | 80.62 |
| EnhancerPred | 74.00 | 0.480 | 73.5 | 74.5 | 80.13 |
| Layer 2 | | | | | |
| Ours | 68.49 | 0.312 | 83.15 | 45.61 | 67.14 |
| iEnhancer-EL | 61.00 | 0.222 | 54.00 | 68.00 | 68.01 |
| iEnhancer-2L | 60.50 | 0.218 | 47.00 | 74.00 | 66.78 |
| EnhancerPred | 55.00 | 0.102 | 45.00 | 65.00 | 57.90 |
Figure 5. Comparative performance among the different predictors. (a) Comparison on layer 1; (b) comparison on layer 2.