Abstract
Convolutional neural networks (CNNs) have been used to extract information from datasets of various dimensions. This approach has led to accurate interpretations in several subfields of biological research, such as pharmacogenomics, and has addressed issues faced by earlier computational methods. With rising interest in personalized and precision medicine, scientists and clinicians have turned to artificial intelligence systems for solutions in therapeutics development, and CNNs have already provided valuable insights into biological data transformation. In this review, we provide a brief overview of the possibilities of implementing CNNs as an effective tool for analyzing one-dimensional biological data, such as nucleotide and protein sequences, as well as small-molecule data, e.g., simplified molecular-input line-entry system (SMILES) strings, InChI, and binary fingerprints. We categorize the models by their objective and highlight various challenges. The review is organized into the specific research domains that participate in pharmacogenomics for a more comprehensive understanding. Furthermore, the future directions of deep learning are outlined.
Keywords: CNN; Convolutional neural networks; One-dimensional data; Pharmacogenomics; SMILES
Year: 2021 PMID: 34031788 PMCID: PMC8342355 DOI: 10.1007/s11030-021-10225-3
Source DB: PubMed Journal: Mol Divers ISSN: 1381-1991 Impact factor: 3.364
Fig. 1 Classification of CNN methods into five major subdomains, each corresponding to the final objective of the analysis
Fig. 2 Basic architecture of a CNN. The input layer extracts information from the input sequence by multiplying it with weights. The subsequent layers perform convolution and pooling: they extract local information and pool it, reducing the dimensions of the sequence vector. Each node in a fully connected layer is connected to all the nodes in the previous layer. The final activation function outputs the sequence classification. During training, this predicted value is compared with the actual annotated value. The prediction errors are assessed, and the model iteratively undergoes backpropagation, updating the existing parameters each time to reduce the prediction errors until the values converge
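
The forward pass described in this caption can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not any model from the review: the `TAT` motif kernel, the biases, the dense weights, and all function names are hypothetical and hand-set rather than learned.

```python
import math

ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot vectors."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq]

def conv1d_relu(x, kernel, bias=0.0):
    """Valid 1D convolution (cross-correlation) over the sequence, then ReLU.

    x is a list of L channel vectors, kernel a list of K channel vectors;
    the result has L - K + 1 activations.
    """
    k = len(kernel)
    out = []
    for start in range(len(x) - k + 1):
        total = bias
        for offset in range(k):
            for channel, weight in enumerate(kernel[offset]):
                total += x[start + offset][channel] * weight
        out.append(max(0.0, total))
    return out

def max_pool(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

def dense_sigmoid(features, weights, bias=0.0):
    """Fully connected output node with a sigmoid activation."""
    z = bias + sum(f * w for f, w in zip(features, weights))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical hand-set kernel that fires only on the 3-mer "TAT" (assumption).
MOTIF_KERNEL = [[0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 0, 1]]

def forward(seq, pool_size=2):
    """Run one-hot encoding, convolution, pooling and the dense output node."""
    feature_map = conv1d_relu(one_hot(seq), MOTIF_KERNEL, bias=-2.0)
    pooled = max_pool(feature_map, pool_size)
    weights = [4.0] * len(pooled)  # illustrative dense weights, not trained
    return feature_map, pooled, dense_sigmoid(pooled, weights, bias=-2.0)

feature_map, pooled, probability = forward("GGTATACC")
```

In a trained network the kernel and dense weights would be learned by backpropagation rather than fixed by hand; the sketch covers only the forward computation.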
Fig. 3 The production of datasets. (a) Different techniques are used to create annotations on the sequences, including ChIP-seq to identify protein binding sites, mass spectrometry to identify protein/drug structures, and qPCR to quantify gene expression. (b) Annotated sequences, SMILES codes, or interaction networks are uploaded to various databases such as the Protein Data Bank (PDB) and DrugBank, or to large-scale projects such as ENCODE and Roadmap Epigenomics. (c) Annotated sequences and SMILES representations are obtained from databases, or from medical texts containing unstructured data on drug–target or drug–drug interactions
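
Before a SMILES string taken from such a database can be fed to a 1D CNN, it has to become a fixed-length numeric vector. The sketch below shows one simple way to do that, assuming character-level label encoding with zero padding; the helper names `build_vocab` and `encode` are hypothetical, and splitting two-letter atoms such as `Cl` into separate characters is a deliberate simplification.

```python
def build_vocab(smiles_list):
    """Map every character seen in the SMILES strings to a positive integer.

    Index 0 is reserved for padding (and, in this sketch, unknown characters).
    """
    symbols = sorted({ch for smiles in smiles_list for ch in smiles})
    return {ch: index + 1 for index, ch in enumerate(symbols)}

def encode(smiles, vocab, max_len):
    """Label-encode a SMILES string, truncating or zero-padding to max_len."""
    ids = [vocab.get(ch, 0) for ch in smiles[:max_len]]
    return ids + [0] * (max_len - len(ids))

# Ethanol and benzene as toy inputs.
vocab = build_vocab(["CCO", "c1ccccc1"])
ethanol = encode("CCO", vocab, max_len=6)
benzene = encode("c1ccccc1", vocab, max_len=4)
```

The integer indices would normally pass through an embedding or one-hot layer before the first convolution.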
A summary of CNN models along with the applications and challenges
| Sl No | Model name | Description | Applications | Challenges addressed |
|---|---|---|---|---|
| 1 | DeepSEA | 3 layers with kernel numbers 320, 480 and 960, respectively | Predicting effects of non-coding variants, transcription factor binding, DNase I sensitivity, and histone marks | Achieving single-nucleotide sensitivity; flexibility in the model to address more complex mechanisms involved |
| 2 | DeepVariant | Learning rate 0.0015; momentum 0.8; output layer is a three-class softmax classifier | Variant calling in sequencing technologies | Manual adjustment of features in statistical models; assumption that the read errors are independent |
| 3 | NeuSomatic | 9 convolutional layers; initial learning rate and momentum of 0.01 and 0.9, respectively | Identification of the length and type of a somatic mutation | Achieved the best accuracy when compared to all the other tested models across multiple datasets for different tumor purities |
| 4 | Cancer classification | 2 hidden layers with sigmoid activation function in the output layer | Classification of leukemia, adenocarcinoma, breast cancer, and ovarian cancer | Data insufficiency problem |
| 5 | Detecting SNP sites | Bi-stream CNN model with 8 hidden layers that includes 4 convolutional layers; learning rate 0.01 and momentum 0.9; final fully connected layer consists of 512 nodes | Applied to datasets with human Down Syndrome samples | Limited number of machine learning algorithms available for human Down Syndrome studies |
| 6 | Deopen | CNN-three-layer FNN hybrid; filter size 20 and learning rate 0.7 | Prediction of chromatin accessibility and identification of functionally influencing SNPs | Greater ability to capture regulatory codes of DNA; potential to identify the impact of non-coding variants on gene expression |
| 7 | Identification of the conserved sequence motifs | - | Applied to enhancers across different mammalian species | Generalizing the model for all species while being trained for only a single species |
| 8 | iEnhancer-ECNN | 6 convolutional layers and 2 fully connected layers with 768 and 256 nodes; learning rate 0.0001; 20 epochs | Prediction of enhancers | Low Matthews correlation coefficients (MCCs) |
| 9 | BiREN | 3 convolutional layers with the first consisting of 320 kernels; 925 nodes in fully connected layer | Prediction of enhancers | Limited availability of enhancer data |
| 10 | DeepEnhancer | 4 convolutional layers with the first containing 128 kernels of size 1 × 8; final fully connected layer of 128 nodes; dropout rate 0.5; learning rate 0.0001; 30 epochs | Prediction of enhancers | Failure to record sophisticated features from enhancer sequences |
| 11 | CNNProm | 1 convolutional layer with 200 filters; fully connected layer of 128 nodes; 5 epochs | Classification of promoter sequences, given RNA samples | Poorly recorded universal characteristics of promoters |
| 12 | DeeReCT-PromID | 2 convolutional layers with filter length 15; dropout rate 0.5 | Identifying RNA polymerase II core promoters in human RNA sequences | Learning patterns for longer input sequences |
| 13 | Xpresso | 2 convolutional and fully connected layers; 10 epochs and a dropout rate of 0.5 | Evaluation of mRNA expression levels | The degree to which promoter sequences influence gene expression levels was unanswered |
| 14 | DNA binding site prediction | 4 convolutional layers with filter sizes 9 × 1 and 7 × 1; run for 100 iterations | DNA–protein binding sites datasets | Higher sensitivity, specificity, and accuracy than the models compared alongside |
| 15 | DeepBind | Motif lengths of 14, 20, 24, 32; learning rate and momentum in the ranges 0.0005–0.5 and 0.95–0.99, respectively | Identification of DNA-/RNA-binding sites; examination of SNVs in promoters | Applied to microarray and sequencing data; tolerance of noise and mislabeled data; automatic calibration of the parameters |
| 16 | DeepDBP-CNN | Convolutional layer uses 128 filters of size L × 31 to extract 128 feature maps (L is the length of the vector) | Identifying DNA-binding proteins | Manual feature extraction from other models |
| 17 | iDeepE | 2 layers of convolutional, max pooling and fully connected layers; filter length 16 and learning rates 0.001 and 0.0001 | RNA-binding protein (RBP) binding site prediction | Extracting crucial information from local sequences |
| 18 | iDeepS | 30 epochs; filter length 10 | RBP binding site prediction | Detection of sites in structure motifs was not possible in iDeepE |
| 19 | Calculation of KD values | 3 hidden layers with 12-nucleotide k-mer | Identification of miRNA target sites | Calculation of the relative KD for sequences of length ≤ 12 nucleotides |
| 20 | QSAR model | 2 convolutional layers, 5 max pooling layers, 2 fully connected layers | Identifying chemical molecules that target a given protein | SMILES codes can be represented as fixed-size features |
| 21 | FP2VEC CNN | 1 convolutional layer, max pooling, fully connected layer each; dropout rate 0.5 | QSAR model to predict the biological activity and properties of chemical compounds | Fast training, high accuracy, and effective as a multitask learning method |
| 22 | DeepACTION | Learning rate of 0.0001; 1483-dimensional feature vector | DTI prediction model | Integrated MMIB to handle imbalanced datasets and LASSO for high-dimensional data |
| 23 | Transformer-CNN | 100 epochs; learning rate 0.001; < 100 iterations | QSAR model to predict the biological activity and properties of chemical compounds | No adjustable parameters, so less overfitting |
| 24 | DeepDTA | 2 CNN blocks, each with 3 convolutional layers, 1 max pooling layer, 3 fully connected layers; dropout rate 0.1; learning rate of 0.001; 100 epochs | PCM model to predict drug–target interactions | Produces better accuracy with only raw sequences of compounds than methods that included structural data |
| 25 | FRnet-DTI | FRnet-Encode: 2 fully connected layers; learning rate of 0.001; dropout rate of 0.5 | Two-model architecture for DTI: FRnet-Encode for feature extraction and FRnet-Predict for the classification problem | Improved accuracy, although not the best among the models tested |
| 26 | Attention-based multi-scale convolutional encoder | 4 convolutional layers | Predicting drug sensitivity (IC50) values for a chemical compound | Higher significance of results produced due to strict training and evaluation; the cells and compounds were split and did not see each other during training |
| 27 | DeepPurpose | - | DTI prediction model that uses CNN on SMILES strings | Availability of a web interface |
| 28 | ConvS2S | Learning rate 0.00001 | Predicting a compound's aqueous solubility | No structural data or ‘engineered features’, which, if present, limit the applicability of the model |
| 29 | DeepConv-DTI | Learning rate 0.0001; 15 epochs; dropout rate of 0 | Detecting protein binding sites for drug–target interactions | Since protein structures are limited, an input of raw protein sequences provides a larger training dataset |
| 30 | DTI-CNN | 1 each of convolutional, max-pooling and fully connected layers; convolutional layer consisting of 4 kernels; learning rate of 0.001, dropout rate 0.5 and 35 epochs | Constructing heterogeneous networks of protein and drugs for DTI prediction | Dimensional reduction and improved accuracy |
| 31 | DDI extraction model | A ‘look-up’ table layer for position and word embedding representation; 3 hidden layers; dropout rate of 0.5 | DDI extraction from medical literature | First ever CNN model for DDI extraction; higher accuracy than other machine learning methods |
| 32 | Multi-channel CNN for DDI extraction | - | DDI extraction model consisting of multiple channels | Maximum coverage of sentences due to multiple channels |
| 33 | DDI extraction model | 1 each of convolutional, max-pooling and fully connected layers; 200 filters of each window size; dropout rate of 0.5; maximum sentence length of 128; 27 epochs | DDI extraction without using any external features | No external features, hence improved reliability of the learning process |
| 34 | Two stage learning Bi-LSTM CNN model | 1 each of convolutional, max-pooling and fully connected layers; 200 filters of each window size; dropout rate of 0.5, learning rate of 0.001 | DDI extraction from English and Spanish medical texts | Outperformed complex CNN models of 10 layers; can be used on different languages |
| 35 | SGRU-CNN | 1 each of convolutional, max-pooling and fully connected layers; maximum sentence length of 186; learning rate of 0.0005 and dropout rate of 0.8; feature vector dimensions: position embeddings as 50 and word embeddings as 300 | DDI extraction from medical literature | No external features or any linguistic tools |
| 36 | AGCN | 1 each of convolutional, max-pooling and fully connected layers; dropout rate of 0.5 | DDI extraction from medical literature | A self-attention technique to ignore irrelevant information |
| 37 | RHCNN | An embedding layer used, similar to the ‘look-up’ table layer; 2 each of convolutional and max-pooling layers; dropout rate of 0.5 | DDI extraction from medical literature | Novel method of using dilated convolutions for the given dataset |
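
Several entries in the table report a learning rate and a momentum value. As a rough illustration of how these two hyperparameters interact, the sketch below applies one common form of stochastic gradient descent with momentum to a toy one-parameter loss. The update rule, the toy loss, and the function names are assumptions made for illustration; individual frameworks and the cited models may define momentum slightly differently. The values 0.01 and 0.9 are borrowed from the NeuSomatic row purely as an example.

```python
def sgd_momentum_step(weight, velocity, grad, learning_rate, momentum):
    """One update: the velocity accumulates a decaying sum of past gradients."""
    velocity = momentum * velocity - learning_rate * grad
    return weight + velocity, velocity

def minimize(grad_fn, weight, learning_rate, momentum, steps):
    """Repeat the momentum update for a fixed number of steps."""
    velocity = 0.0
    for _ in range(steps):
        weight, velocity = sgd_momentum_step(
            weight, velocity, grad_fn(weight), learning_rate, momentum
        )
    return weight

def toy_grad(w):
    """Gradient of the toy loss (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)

first_weight, first_velocity = sgd_momentum_step(0.0, 0.0, toy_grad(0.0), 0.01, 0.9)
final_weight = minimize(toy_grad, weight=0.0, learning_rate=0.01, momentum=0.9, steps=500)
```

A larger momentum carries more of the previous step forward, which can speed up convergence on smooth losses but may overshoot when combined with a large learning rate.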