Dmitry Grapov, Johannes Fahrmann, Kwanjeera Wanichthanarak, Sakda Khoomrung.
Abstract
Machine learning (ML) is being ubiquitously incorporated into everyday products such as Internet search, email spam filters, product recommendations, image classification, and speech recognition. New approaches for highly integrated manufacturing and automation, such as Industry 4.0 and the Internet of Things, are also converging with ML methodologies. Many approaches incorporate complex artificial neural network architectures and are collectively referred to as deep learning (DL) applications. These methods have been shown capable of representing and learning predictable relationships in many diverse forms of data and hold promise for transforming the future of omics research and applications in precision medicine. Omics and electronic health record data pose considerable challenges for DL, owing to factors such as low signal-to-noise ratios, analytical variance, and complex data integration requirements. However, DL models have already been shown capable of improving both the ease of data encoding and predictive model performance over alternative approaches. It may not be surprising that concepts encountered in DL share similarities with those observed in biological message relay systems such as gene, protein, and metabolite networks. This expert review examines the challenges and opportunities for DL at a systems and biological scale for a precision medicine readership.
Keywords: artificial intelligence; biomarkers; deep learning; machine learning; multiomics data integration; precision medicine
Year: 2018 PMID: 30124358 PMCID: PMC6207407 DOI: 10.1089/omi.2018.0097
Source DB: PubMed Journal: OMICS ISSN: 1536-2310

Multiomics data integration utilizes empirical, functional, and other techniques to combine information from multiple omics domains. This systems approach enables robust characterization of biochemical signatures reflective of organismal phenotypes.

DL architectures may provide unique opportunities to encode locally optimal predictors in a variety of organisms (cellular, mouse, primate, and human) and then integrate their representations of omics layers. Through transfer learning, researchers may leverage larger expert-derived models to improve DL performance for their smaller data sets.
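
The transfer-learning idea above can be sketched in plain Python: a feature extractor learned elsewhere is kept frozen, and only a small output layer is fitted to the new, smaller data set. This is a minimal illustration under stated assumptions, not a method from the review. The names `PRETRAINED_ENCODER`, `encode`, `fit_head`, and `SMALL_DATASET`, the weight values, and the toy samples are all hypothetical.

```python
import math

# Hypothetical "pretrained" encoder weights (2 hidden units x 3 input features).
# In practice these would come from a model trained on a larger data set.
PRETRAINED_ENCODER = [[1.0, -1.0, 0.5], [-0.5, 1.0, 1.0]]

# Hypothetical small data set: (features, binary label).
SMALL_DATASET = [
    ([1.0, 0.0, 0.5], 1),
    ([0.0, 1.0, 0.2], 0),
    ([0.9, 0.1, 0.4], 1),
    ([0.1, 0.8, 0.0], 0),
]

def encode(x, encoder=PRETRAINED_ENCODER):
    """Frozen feature extractor: tanh of a linear map. Never updated below."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in encoder]

def predict(head, features):
    """Logistic output layer applied to the encoded features."""
    z = head["bias"] + sum(w * f for w, f in zip(head["weights"], features))
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(head, data):
    """Mean binary cross-entropy of the head on the encoded data."""
    total = 0.0
    for x, y in data:
        p = predict(head, encode(x))
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def fit_head(data, n_hidden=2, lr=0.1, steps=200):
    """Full-batch gradient descent on the output layer only."""
    head = {"weights": [0.0] * n_hidden, "bias": 0.0}
    for _ in range(steps):
        grad_w = [0.0] * n_hidden
        grad_b = 0.0
        for x, y in data:
            feats = encode(x)
            err = predict(head, feats) - y  # gradient of the loss w.r.t. the logit
            for j, f in enumerate(feats):
                grad_w[j] += err * f / len(data)
            grad_b += err / len(data)
        head["weights"] = [w - lr * g for w, g in zip(head["weights"], grad_w)]
        head["bias"] -= lr * grad_b
    return head

untrained_head = {"weights": [0.0, 0.0], "bias": 0.0}
trained_head = fit_head(SMALL_DATASET)
print(log_loss(untrained_head, SMALL_DATASET), log_loss(trained_head, SMALL_DATASET))
```

Freezing the encoder keeps the number of fitted parameters small (three here), which is the usual motivation when the new data set has few samples.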
Deep Learning Architectures and Approaches for Omics Analysis
| Architecture | Description | Data and applications |
|---|---|---|
| CNN | Hierarchical architecture commonly used for image classification | Multidimensional arrays such as DNA-seq, DNase-seq, protein-binding microarrays, and ChIP-seq |
| | Includes convolution and pooling layers (Miotto et al.) | |
| | Detection of locally and globally consistent features in the data (Min et al.) | |
| | Strength: established architectures useful for encoding complex local and global interactions (e.g., relationships between DNA motifs) (Angermueller et al.) | |
| RNN | Sequential architecture useful for text and time series data (Wenpeng et al.) | Sequential data such as genomic sequences or natural language |
| | Cyclic connections share information from previous and current state (Min et al.) | Prediction of protein structure, gene expression regulation, protein homology, and DNA methylation (Angermueller et al.) |
| | Strength: identification of latent relationships in sequential data (Angermueller et al.) | |
| AE | Unsupervised learning | Genome-scale omics data such as gene expression data |
| DNN-MDA (Date and Kikuchi) | Application of DNN for construction of classification and regression models, and estimation of variable importance by MDA | NMR-based metabolite profiling |
| | Strength: estimation of variable importance | |
| DeepNovo (Tran et al.) | Integrating CNN and LSTM RNN | Tandem mass spectra of proteomics data |
| | Strength: combining useful features from CNN and RNN | Prediction of novel peptide sequences |
AE, autoencoder; CNN, convolutional neural network; DNN, deep neural network; LSTM, long short-term memory; MDA, mean decrease accuracy; NMR, nuclear magnetic resonance; RNN, recurrent neural network.
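
To make the CNN entry in the table concrete, the sketch below applies one convolution filter and one pooling step to a one-hot-encoded DNA sequence. It is an illustration under stated assumptions rather than any cited model: the helper names (`one_hot`, `motif_filter`, `conv1d`, `global_max_pool`) are hypothetical, and the filter is set by hand to match the motif `TATA`, whereas a trained CNN would learn its filter weights from data.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]

def motif_filter(motif):
    """Hand-set filter (an assumption): +1 for the motif base, -1 otherwise."""
    return [[1.0 if base == b else -1.0 for b in BASES] for base in motif]

def conv1d(encoded, kernel):
    """Slide the kernel along the sequence and return one score per window."""
    k = len(kernel)
    scores = []
    for start in range(len(encoded) - k + 1):
        score = sum(encoded[start + i][c] * kernel[i][c]
                    for i in range(k) for c in range(len(BASES)))
        scores.append(score)
    return scores

def relu(values):
    """Keep positive activations and zero out the rest."""
    return [max(0.0, v) for v in values]

def global_max_pool(values):
    """Collapse per-position activations into one value for the sequence."""
    return max(values)

sequence = "GGCTATAAGC"
scores = conv1d(one_hot(sequence), motif_filter("TATA"))
best_position = scores.index(max(scores))
pooled = global_max_pool(relu(scores))
print(best_position, pooled)  # 3 4.0
```

The convolution detects the motif wherever it occurs (a local feature), and the pooling step summarizes the whole sequence with a single activation, which is the division of labor between the two layer types named in the table.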

DL model architectures and training techniques share many similarities with biological message-passing systems. DL models contain a minimum of three layers: input, hidden, and output. This layering could mimic the relationships between gene transcription, protein expression, and metabolite concentrations, and can also extend to other omics layers. Interesting parallels between computational and biological optimizations, such as backward propagation in DL and signal inhibition in omics, have also emerged.
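
The three-layer structure and backward propagation mentioned above can be sketched in plain Python. This is a minimal illustration, not the review's method: the function names (`init_network`, `forward`, `train_step`, `total_loss`), the sigmoid activation, the squared-error loss, the learning rate, and the toy data are all assumptions made for the example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_network(n_in, n_hidden, seed=0):
    """Random weights for an input -> hidden -> output network (one output unit)."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b_hidden": [0.0] * n_hidden,
        "w_out": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b_out": 0.0,
    }

def forward(net, x):
    """Forward pass: returns the hidden activations and the output activation."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(net["w_hidden"], net["b_hidden"])]
    output = sigmoid(sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"])
    return hidden, output

def train_step(net, x, y, lr=0.5):
    """One backward-propagation update for the loss 0.5 * (output - y) ** 2."""
    hidden, output = forward(net, x)
    # Error signal at the output unit (chain rule through the sigmoid).
    delta_out = (output - y) * output * (1.0 - output)
    # Propagate the error back to each hidden unit using the pre-update weights.
    delta_hidden = [delta_out * w * h * (1.0 - h)
                    for w, h in zip(net["w_out"], hidden)]
    for j, h in enumerate(hidden):
        net["w_out"][j] -= lr * delta_out * h
    net["b_out"] -= lr * delta_out
    for j, d in enumerate(delta_hidden):
        for i, xi in enumerate(x):
            net["w_hidden"][j][i] -= lr * d * xi
        net["b_hidden"][j] -= lr * d
    return 0.5 * (output - y) ** 2

def total_loss(net, data):
    return sum(0.5 * (forward(net, x)[1] - y) ** 2 for x, y in data)

# Toy data (logical OR), chosen only to show that the loss goes down.
TOY_DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

net = init_network(n_in=2, n_hidden=3)
loss_before = total_loss(net, TOY_DATA)
for _ in range(3000):
    for x, y in TOY_DATA:
        train_step(net, x, y)
loss_after = total_loss(net, TOY_DATA)
print(loss_before, loss_after)
```

The error computed at the output is passed backward to adjust the hidden-layer weights, a feedback step loosely comparable to the regulatory signaling the text draws a parallel with.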

Personalized medicine is a quickly growing area of research that requires complex data encoding and integration tasks, which are well suited for DL.