Miguel Pérez-Enciso, Laura M. Zingaretti.
Abstract
Deep learning (DL) has emerged as a powerful tool to make accurate predictions from complex data such as image, text, or video. However, its ability to predict phenotypic values from molecular data is less well studied. Here, we describe the theoretical foundations of DL and provide generic code that can be easily modified to suit specific needs. DL comprises a wide variety of algorithms which depend on numerous hyperparameters. Careful optimization of hyperparameter values is critical to avoid overfitting. Among the DL architectures currently tested in genomic prediction, convolutional neural networks (CNNs) seem more promising than multilayer perceptrons (MLPs). A limitation of DL is in interpreting the results. This may not be relevant for genomic prediction in plant or animal breeding but can be critical when deciding the genetic risk to a disease. Although DL technologies are not "plug-and-play", they are easily implemented using the public Keras and TensorFlow software. To illustrate the principles described here, we implemented Keras-based code, available on GitHub.
Keywords: deep learning; genomic prediction; machine learning
Year: 2019 PMID: 31330861 PMCID: PMC6678200 DOI: 10.3390/genes10070553
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Common terms in deep learning (DL) methodology.
| Term | Definition |
|---|---|
| Activation function | The mathematical function f that produces a neuron’s output, f(w'x + b), where x are the inputs, w the weights, and b the bias. |
| Backpropagation | An efficient algorithm to compute the gradient of the loss with respect to the network weights; it propagates the error at the output layer backward through the network. |
| Batch | In the stochastic gradient descent (SGD) algorithm, each of the sample partitions within a given epoch. |
| Convolution | Mathematically, a convolution is defined as an “integral transform” between two functions, where one of the functions must be a kernel. The discrete version of the operation is simply a weighted sum of shifted copies of the original function. |
| Convolutional neural network | A CNN is a special case of neural networks which uses convolution instead of a full matrix multiplication in the hidden layers. A typical CNN is made up of “convolutional layers” and dense, fully connected layers. |
| Dropout | Dropout sets a given percentage of neuron outputs to zero. The percentage is kept constant, but the specific neurons are randomly sampled in every iteration. The goal of dropout is to avoid overfitting. |
| Early stopping | An anti-overfitting strategy that consists of stopping the algorithm before it converges. |
| Epoch | In SGD and related algorithms, an iteration comprising all batches in a given partition. In the next epoch, a different partition is employed. |
| Kernel = Filter = Tensor | In DL terminology, the kernel is a multidimensional array of weights. |
| Generative adversarial network (GAN) | GANs are based on a simple idea: train two networks simultaneously, the generator (G), which defines a probability distribution based on the information from the samples, and the discriminator (D), which distinguishes data produced by G from the real data. |
| Learning rate | Specifies the speed of the gradient update (α in Algorithm 1). |
| Loss | The loss function quantifies the difference between observed and predicted target variables. |
| Neuron | The basic unit of a DL algorithm. A “neuron” takes as input a list of variable values (x), combines them linearly using weights (w) and a bias (b), and applies an activation function to produce its output. |
| Neuron layer | “Neurons” are arranged in layers, i.e., groups of neurons that take the output of the previous group of neurons as input. |
| Multilayer perceptron (MLP) | Multilayer perceptron network is one of the most popular NN architectures, which consists of a series of fully connected layers, called input, hidden, and output layers. The layers are connected by a directed graph. |
| Optimizer | An algorithm to find the weights (w) and biases (b) that minimize the loss function. |
| Pooling | A pooling function substitutes the output of a network at a certain location with a summary statistic of the neighboring outputs. This is one of the crucial steps in the CNN architecture. The most common pooling operations are maximum, mean, and median. |
| Recurrent neural network (RNN) | The RNN architecture considers information from multiple previous layers. In an RNN, the current hidden layer is a nonlinear function of both the previous layer(s) and the current input (x). The model has memory, since the bias term is based on the “past”. These networks can be used with data that have a temporal structure. |
| Stochastic gradient descent (SGD) | An optimizing algorithm that consists of randomly partitioning the whole dataset into subsets called “batches” or “minibatches” and updates the gradient using only that data subset. The next batch is used in the next iteration. |
| Weight regularization | An excess of parameters (weights, w) can result in overfitting. To avoid this, weights can be constrained during training, typically by adding a penalty on their magnitude (e.g., an L1 or L2 norm) to the loss. |
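Several of the terms above (neuron, activation, loss, learning rate, batch, epoch, SGD) can be illustrated with a short NumPy sketch. This is not the paper's Keras code; the data are simulated and all parameter values are invented purely for illustration. A single linear "neuron" is fitted to a toy SNP matrix with minibatch SGD:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "SNP matrix": 200 individuals x 50 markers coded 0/1/2, with an
# additive phenotype plus noise (simulated purely for illustration).
X = rng.integers(0, 3, size=(200, 50)).astype(float)
true_w = rng.normal(0, 0.3, size=50)
y = X @ true_w + rng.normal(0, 0.5, size=200)

# A single linear "neuron" (identity activation): prediction = X w + b.
# Loss = mean squared error; optimizer = minibatch SGD.
w = np.zeros(50)
b = 0.0
lr = 0.005          # learning rate (alpha in the glossary)
batch_size = 20     # each epoch is split into batches of this size

for epoch in range(200):
    order = rng.permutation(len(y))              # new partition every epoch
    for start in range(0, len(y), batch_size):
        idx = order[start:start + batch_size]    # one batch
        err = X[idx] @ w + b - y[idx]            # prediction error
        # Gradient of the MSE loss, propagated back to the weights
        w -= lr * X[idx].T @ err / len(idx)
        b -= lr * err.mean()

mse = np.mean((X @ w + b - y) ** 2)              # training loss after fitting
```

Dropout would correspond to zeroing a random subset of activations at each iteration, and early stopping to halting the loop once a held-out loss stops improving.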
Figure 1 Multi-Layer Perceptron (MLP) diagram with four hidden layers, taking a collection of single nucleotide polymorphisms (SNPs) as input, together with a basic “neuron” with n inputs. A neuron’s output results from applying a nonlinear transformation to a linear combination of its inputs (xi), weights (wi), and bias (b). These figures were redrawn from tikz code at http://www.texample.net/tikz/examples/neural-network.
Figure 2 (a) Simple scheme of a one-dimension (1D) convolutional operation. (b) Full representation of a 1D convolutional neural network for a SNP matrix. The convolution outputs are represented in yellow. Pooling layers, which follow the convolutional operations and combine the output of the previous layer at certain locations into a single neuron, are represented in green. The final output is a standard MLP.
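To make the convolution and pooling operations of panel (a) concrete, here is a minimal NumPy version; the SNP vector and kernel weights below are made up for illustration and are not taken from the paper:

```python
import numpy as np

# Illustrative 1D convolution over a SNP vector (genotypes coded 0/1/2).
snps = np.array([0, 1, 2, 2, 1, 0, 0, 2], dtype=float)
kernel = np.array([0.5, -1.0, 0.5])   # width-3 filter with arbitrary weights
stride = 1

# "Valid" convolution: a weighted sum of the kernel over each window
out_len = (len(snps) - len(kernel)) // stride + 1
conv = np.array([snps[i * stride : i * stride + len(kernel)] @ kernel
                 for i in range(out_len)])

# Max pooling with window 2 summarizes neighboring outputs (green layer)
pooled = conv[: len(conv) // 2 * 2].reshape(-1, 2).max(axis=1)
```

In a real CNN, many such filters are learned jointly, and the pooled outputs feed the final MLP shown in panel (b).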
Figure 3 Scheme of recurrent neural networks (RNNs). The left part of the image (in colors) shows the whole network structure, whereas the recursive structure of the network is shown on the right, where x represents the inputs, h the hidden layers, o the outputs (Equation (4b)), y the target variables, and L the loss function.
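The recurrence depicted in Figure 3 can be sketched in a few lines of NumPy; the weight matrices and dimensions below are arbitrary placeholders, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Minimal RNN recurrence: the hidden state h carries "memory" from
# previous steps; o_t is the output at step t.
d_in, d_h = 3, 4
Wx = 0.5 * rng.normal(size=(d_h, d_in))   # input-to-hidden weights
Wh = 0.5 * rng.normal(size=(d_h, d_h))    # hidden-to-hidden weights
Wo = 0.5 * rng.normal(size=(1, d_h))      # hidden-to-output weights
b = np.zeros(d_h)

xs = rng.normal(size=(5, d_in))           # a sequence of 5 inputs
h = np.zeros(d_h)
outputs = []
for x in xs:
    h = np.tanh(Wh @ h + Wx @ x + b)      # current state depends on the past
    outputs.append((Wo @ h).item())
```

Because h at each step is a function of all earlier inputs, the same sketch applies to any temporally ordered data.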
Figure 4 Scheme of generative adversarial networks (GANs). The generator (G) defines a probability distribution based on the information from the samples, whereas the discriminator (D) distinguishes data produced by G from the real data. The figure was redrawn using code from http://www.texample.net/tikz/examples/neural-network.
Figure 5 Correlations between observed and predicted phenotypes in the validation dataset as a function of the number of layers and the number of neurons per layer for the CIMMYT wheat data. Each dot corresponds to a single phenotype in the validation dataset. The best combination was 32 neurons and four hidden layers (correlation = 0.55).
Figure 6 Box-plot representing loss values on the validation set with a convolutional neural network (CNN) architecture using different kernel sizes (3, 5, and 7), strides (1, 2, and 5), and numbers of filters (16 and 32).
Figure 7 Box-plot representing the loss in the validation set vs. the number of neurons using the RNN architecture.
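As a side note on the hyperparameters tried in Figure 6, kernel size and stride jointly determine how many outputs a "valid" (unpadded) 1D convolution produces. The helper below is illustrative only; the input length of 1000 SNPs is an arbitrary example:

```python
def conv1d_out_len(n_snps, kernel, stride):
    """Output length of a 'valid' (no padding) 1D convolution."""
    return (n_snps - kernel) // stride + 1

# For 1000 input SNPs, over the kernel sizes and strides in Figure 6:
sizes = {(k, s): conv1d_out_len(1000, k, s)
         for k in (3, 5, 7) for s in (1, 2, 5)}
```

Larger strides shrink the output (and hence the downstream layer sizes) much faster than larger kernels do.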
Applications of deep learning to genomic prediction.
| Study | Species | Approx. N | Approx. No. SNPs | Performance * |
|---|---|---|---|---|
| McDowell [ ] | Arabidopsis, maize, wheat | 270–400 | 70–1k | MLP ≥ PL |
| Liu and Wang [ ] | Soybean | 5k | 4k | CNN > RR-BLUP, Lasso-Bayes, Bayes A |
| Rachmatia et al. [ ] | Maize | 300 | 1k | PL > DBN |
| Bellot et al. [ ] | Human | 100k | 10k–50k | PL ≥ CNN > MLP |
| Ma et al. [ ] | Wheat | 2k | 33k | CNN ~ PL ~ GBLUP > MLP |
| Montesinos-López et al. [ ] | Maize, wheat | 250–2k | 12k–160k | GBLUP > MLP |
| Montesinos-López et al. [ ] | Wheat | 800–4k | 2k | GBLUP > MLP |
| Khaki and Wang [ ] | Maize | 2k genotypes (150k samples) | 20k | DL > PL |
| Waldmann [ ] | Pig | 3226 (simulated) | 10k, 50k | DL > GBLUP/BayesLasso |
* PL, penalized linear method; DBN, deep belief network; GBLUP, genomic best linear unbiased prediction; RR-BLUP, ridge-regression BLUP.