Abstract
With the evolution of biotechnology and the introduction of high-throughput sequencing, researchers can produce and analyze vast amounts of genomics data. Since genomics produces big data, most bioinformatics algorithms are based on machine learning methodologies, and lately deep learning, to identify patterns, make predictions and model the progression or treatment of a disease. Advances in deep learning have created unprecedented momentum in biomedical informatics and have given rise to new bioinformatics and computational biology research areas. Deep learning models can provide higher accuracy than state-of-the-art methodologies in specific genomics tasks. Given the growing trend in the application of deep learning architectures in genomics research, in this mini review we outline the most prominent models, highlight possible pitfalls and discuss future directions. We foresee deep learning accelerating changes in the area of genomics, especially for multi-scale and multimodal data analysis for precision medicine.
Keywords: Bioinformatics; Computational biology; Deep learning; Gene expression and regulation; Genomics; Precision medicine
Year: 2020 PMID: 32637044 PMCID: PMC7327302 DOI: 10.1016/j.csbj.2020.06.017
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. Architecture of the main deep learning models.
List of deep learning methodologies in genomics. From left to right, the columns give the DL model acronym (if any), the respective publication, the DL model, the omics data used as input, the prediction or research question, the evaluation metrics, and the comparison with other classic ML methods (if any).
| Name | Publication | DL model | Omics data | Purpose / prediction | Accuracy | Performance gap over other methods |
|---|---|---|---|---|---|---|
| DeepTarget | | RNN | miRNA-mRNA pairing | target prediction | 0.96 | +25% F-measure |
| DeepMirGene | | LSTM | positive pre-miRNA and non-miRNA | miRNA target | 0.89 sensitivity | +4% F-measure |
| DeepNet | | ANN | RNA-Seq | control-cases | ~0.7 | same or worse AUC than LASSO |
| | | AE | time-series gene expression | pre-processing step for clustering | | Better than PCA |
| | | AE | cDNA microarrays | predict the organization of transcriptomic machinery | – | significant overlap with previous studies |
| ADAGE | | AE | gene expression | identification/reconstruction of biological signals | – | significant overlap with post-hoc analysis KEGG |
| eADAGE | | AE | gene expression | identification of biological patterns | – | significant overlap with post-hoc analysis KEGG |
| D-GEX | | RNN | expression of landmark genes | gene expression inference | overall error 0.3204 ± 0.0879 | outperforms linear regression (LR) (+15.33%) and KNN-GE on most of the target genes |
| DeepChrome | | CNN | histone modifications | classify gene expression | average area under the curve (AUC) = 0.80 | +5% over support vector machines (SVM), +21% over random forest (RF) |
| AttentiveChrome | | LSTM | histone modifications | classify gene expression | average AUC = 0.81 | marginally better than DeepChrome |
| Multimodal deep belief network | | DBN | gene expression, DNA methylation and miRNA expression | identification of key genes and miRNAs | average correlations 0.91, 0.73 and 0.69 for the GE, DM and ME | – |
| DeepVariant | | CNN | whole-genome sequence | variant caller | 99.45% F1 | produced more accurate results with greater consistency across a variety of quality metrics |
| | | ANN | cell-line with drug response | predict drug response | 0.65 AUC | outperformed RF (0.54 AUC) and elastic nets (0.51 AUC) |
| DeepFIGV | | CNN | whole-genome sequence | predict quantitative epigenetic variation | z-scores DNase rho = 0.0802, P = 5.32e-16 | |
| DeePathology | | Multiple AEs | mRNA and miRNA | predict tissue-of-origin, normal or disease state and cancer type | 99.4% accuracy for cancer subtype | 95.1% for SVM |
| DeepCpG | | CNN | single-cell methylation | predicts missing methylation states and detects sequence motifs | 89% AUC | 86% AUC for random forest |
| CNNC | | CNN | scRNA-seq | predicting transcription factor target | ~70% accuracy for multiple experiments | outperformed GBA (guilt by association) and DNN (fully connected DL) across a variety of experiments |
| DanQ | | CNN and RNN | DNA-seq | predicting the function of DNA directly from sequence alone | AUC ~70% | outperformed LR and DeepSEA (CNN DL), with over 10% improvement in AUC |
| FBGAN | | GANs | DNA-seq | optimize the synthetic gene sequences | train accuracy 0.94, test accuracy 0.84 | outperformed kmer and Wasserstein GAN trained directly on AMPs |
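
The table reports results as sensitivity, F-measure (F1) and AUC. As a reading aid, the sketch below shows how these three metrics are commonly computed. It is a minimal illustration using standard textbook definitions; the function names and the example counts and scores are hypothetical and are not taken from any of the listed studies, which may have computed their metrics differently.

```python
def sensitivity(tp, fn):
    """Fraction of actual positives that were predicted positive (recall)."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Fraction of predicted positives that are actual positives."""
    return tp / (tp + fp)


def f_measure(tp, fp, fn):
    """F1: harmonic mean of precision and sensitivity."""
    p = precision(tp, fp)
    r = sensitivity(tp, fn)
    return 2 * p * r / (p + r)


def auc(scores_pos, scores_neg):
    """AUC as the probability that a randomly chosen positive example
    scores higher than a randomly chosen negative one (ties count half)."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical confusion counts and classifier scores, for illustration only.
example_f1 = f_measure(tp=80, fp=20, fn=20)
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

Because these metrics measure different things, a gap such as "+25% F-measure" and a gap such as "+5% AUC" are not directly comparable across rows.
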
Fig. 2. Multi-level and multi-scale -omics models.