Bertil Schmidt, Andreas Hildebrandt.
Abstract
Next-generation sequencing (NGS) methods lie at the heart of large parts of biological and medical research. Their fundamental importance has created a continuously increasing demand for methods to process and analyze the data sets produced, addressing questions such as variant calling, metagenomic classification and quantification, genomic feature detection, or downstream analysis in larger biological or medical contexts. In addition to classical algorithmic approaches, machine-learning (ML) techniques are often used for such tasks. In particular, deep learning (DL) methods that use multilayered artificial neural networks (ANNs) for supervised, semisupervised, and unsupervised learning have gained significant traction for such applications. Here, we highlight important network architectures, application areas, and DL frameworks in an NGS context.
Year: 2020 PMID: 33059075 PMCID: PMC7550123 DOI: 10.1016/j.drudis.2020.10.002
Source DB: PubMed Journal: Drug Discov Today ISSN: 1359-6446 Impact factor: 7.851
Figure 1. Overview of ANN architectures: (a) An artificial neuron maps an input vector $x_i$, $0 \le i \le n$, to a scalar output $y$ by applying a nonlinear activation function $\varphi$ to a weighted sum, $y = \varphi\left(\sum_{i=0}^{n} w_i x_i\right)$, where the $w_i$ are the weights. (b) A multilayer perceptron (MLP) comprising an input layer, a fully connected hidden layer, and an output layer. (c) A single layer of a convolutional neural network (CNN), where matrix multiplication is replaced by a convolution with a small filter kernel matrix whose entries are learned during training, followed by a ReLU activation function and (max) pooling. (d) Recurrent neural networks (RNNs) feature feedback connections to earlier layers and can be trained to learn time-dependent relations. (e) Autoencoders (AEs) are designed to identify useful data encodings in an unsupervised setting. (f) Generative adversarial networks (GANs) train two networks simultaneously. The generator produces new data points, whereas the discriminator classifies data points as either genuine or fake.
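Panels (a) to (c) of Figure 1 can be illustrated with a minimal NumPy sketch. It is not taken from the article or from any of the listed tools: the function names (`neuron`, `mlp_forward`, `conv1d_relu_maxpool`), the bias terms, the choice of ReLU in the hidden layer, and the use of a 1D kernel instead of a kernel matrix are all assumptions made to keep the example short. As in most DL frameworks, the "convolution" here is computed as a sliding dot product without flipping the kernel.

```python
import numpy as np


def relu(z):
    """Rectified linear unit: max(z, 0), applied elementwise."""
    return np.maximum(z, 0.0)


def neuron(x, w, phi=np.tanh):
    """Panel (a): scalar output y = phi(sum_i w_i * x_i)."""
    return phi(np.dot(w, x))


def mlp_forward(x, W1, b1, W2, b2):
    """Panel (b): input layer -> one fully connected hidden layer -> output layer.

    The bias vectors b1 and b2 and the ReLU hidden activation are assumptions.
    """
    hidden = relu(W1 @ x + b1)
    return W2 @ hidden + b2


def conv1d_relu_maxpool(x, kernel, pool=2):
    """Panel (c), 1D variant: sliding dot product, ReLU, then max pooling."""
    k = len(kernel)
    # "Valid" convolution: the kernel is only placed where it fully overlaps x.
    conv = np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])
    act = relu(conv)
    # Drop trailing values that do not fill a complete pooling window.
    n = (len(act) // pool) * pool
    return act[:n].reshape(-1, pool).max(axis=1)


if __name__ == "__main__":
    x = np.array([1.0, -2.0, 3.0, 0.0, 2.0])
    print(neuron(x[:3], np.array([1.0, 0.0, 1.0]), phi=relu))
    print(conv1d_relu_maxpool(x, np.array([1.0, 1.0])))
```

In a trained network the entries of `w`, `W1`, `W2`, and `kernel` would be learned from data; here they are fixed by hand purely to show the forward computation.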
Summary of DL methods for the analysis of NGS data in the four selected application areas (TF: TensorFlow; LSTM: long short-term memory; MIL: multiple-instance learning)
| Application area | Method | NN architecture | Framework |
|---|---|---|---|
| Variant calling | DeepVariant | CNN | Nucleus and TF |
| Variant calling | NeuSomatic | CNN | PyTorch |
| Variant calling | Clairvoyante | CNN | TF |
| Variant calling | Clair | RNN | TF |
| Variant calling | DeepSC | CNN | TF |
| Variant calling | CNNScoreVariants | CNN | TF |
| Metagenomics | DeepMicrobes | LSTM | TF |
| Metagenomics | seq2species | CNN | TF |
| Metagenomics | GeNet | CNN | TF |
| Metagenomics | Meta2 | MIL | – |
| Transcriptomics | AutoImpute | AE | TF |
| Transcriptomics | DCA | AE | Keras and TF |
| Transcriptomics | scScope | AE | TF |
| Transcriptomics | scvis | AE | TF |
| Transcriptomics | scDeepCluster | AE | Keras and TF |
| Transcriptomics | DeepImpute | MLP | Keras and TF |
| Transcriptomics | scIGain | GAN | PyTorch |
| Epigenetics | DeepCpG | CNN & RNN | Keras and Theano |
| Epigenetics | MRCNN | CNN | Keras and TF |
| Epigenetics | FunDMDeep-m6A | CNN | Keras and TF |
| Epigenetics | DeepSEA | CNN | Torch |
| Epigenetics | DeepBind | CNN | C++ and Python |
| Epigenetics | DanQ | CNN & LSTM | Keras and TF |
| Epigenetics | DeepHistone | CNN | PyTorch |
| Epigenetics | DeepLIFT | CNN | Keras and TF |