Lingling Zhao1, Yan Zhu1, Junjie Wang2, Naifeng Wen3, Chunyu Wang1, Liang Cheng4,5.
Abstract
The task of identifying protein-ligand interactions (PLIs) plays a prominent role in the field of drug discovery. However, it is infeasible to identify potential PLIs via costly and laborious in vitro experiments. There is a need to develop PLI computational prediction approaches to speed up the drug discovery process. In this review, we summarize a brief introduction to various computation-based PLIs. We discuss these approaches, in particular, machine learning-based methods, with illustrations of different emphases based on mainstream trends. Moreover, we analyzed three research dynamics that can be further explored in future studies.Entities:
Keywords: Drug discovery; Drug-target binding affinity; Machine learning; Protein–ligand interactions
Year: 2022 PMID: 35765652 PMCID: PMC9189993 DOI: 10.1016/j.csbj.2022.06.004
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155
Fig. 1 Workflow of ML methods used in PLI prediction, including (a) benchmark data collection and preprocessing; (b) framework building and model training; and (c) model evaluation.
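The three-stage workflow in Fig. 1 can be sketched end to end. The toy dataset, the simple composition-based featurization, and the logistic-regression model below are all illustrative assumptions introduced here, not methods from the review; real studies use benchmarks such as DrugBank or BindingDB and far richer encoders:

```python
import math

# (a) Benchmark data collection: a toy dataset of (protein sequence, SMILES, label).
# These entries are invented for illustration only.
DATA = [
    ("MKTAYIAKQR", "CCO",       1),
    ("GAVLIMFWPS", "c1ccccc1O", 0),
    ("MKKLLPTAAA", "CC(=O)O",   1),
    ("GGSSGGSSGG", "CCN(CC)CC", 0),
    ("MKTAYIAKQA", "CCOC",      1),
    ("GAVLIPFWPS", "c1ccccc1N", 0),
]

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SMILES_CHARS = "CNOc1()=#"

def featurize(seq, smiles):
    """(a) Preprocessing: amino-acid composition + SMILES character frequencies."""
    p = [seq.count(a) / len(seq) for a in AMINO_ACIDS]
    d = [smiles.count(ch) / len(smiles) for ch in SMILES_CHARS]
    return p + d

def train_logreg(X, y, lr=0.5, epochs=200):
    """(b) Model training: logistic regression fitted by stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

X = [featurize(s, m) for s, m, _ in DATA]
y = [lbl for _, _, lbl in DATA]
w, b = train_logreg(X, y)

# (c) Model evaluation: training-set accuracy; a real study would use held-out splits
# (e.g. cold-protein / cold-drug splits) and metrics such as AUROC and AUPR.
acc = sum((predict(w, b, xi) > 0.5) == bool(yi) for xi, yi in zip(X, y)) / len(y)
print(f"training accuracy: {acc:.2f}")
```

The point of the sketch is the pipeline shape, not the model: each method in the tables below swaps in its own featurization (stages a), encoder (stage b), and evaluation protocol (stage c).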
PLI prediction methods as classification tasks based on the ML framework in recent years.

| Method | Published | Protein feature | Compound feature | Protein encoding module | Compound encoding module | Prediction module |
| --- | --- | --- | --- | --- | --- | --- |
| DeepDTIs | 03/2017 | Protein sequence composition descriptors | Extended | – | – | DBN |
| DDR | 01/2018 | Similarity measures | Similarity measures | – | – | RF |
| CPI-GNN | 07/2018 | N-gram amino acids | Molecular graphs | CNN | GNN | Softmax classifier |
| DeepConv-DTI | 06/2019 | Local residue patterns | PubChem fingerprints | Convolution and global max-pooling layers | Fully connected layer | Fully connected layer |
| DTI-CDF | 12/2019 | Similarity-based features | Similarity-based features | – | – | Cascade deep forest |
| DEEPScreen | 01/2020 | – | 2-D compound images | – | Convolutional and pooling layers | Fully connected layers |
| TransformerCPI | 05/2020 | Amino acid sequence | Graph structure | CNN | GCNs | Transformer with self-attention mechanism |
| DTI-CNN | 08/2020 | Similarity matrix | Similarity matrix | Random walk with restart | Random walk with restart | Fully connected layer |
| MolTrans | 10/2020 | Substructure | Substructure | Transformer encoder | Transformer encoder | Linear layer |
| BridgeDPI | 02/2021 | K-mer/sequence features | Fingerprint/sequence features | Perceptron layers | Perceptron layers | GNN and a fully connected layer |
| CSConv2d | 04/2021 | – | 2-D structural representations | – | A channel and spatial attention mechanism | Fully connected layer |
| GADTI | 04/2021 | Similarity data | Similarity data | Heterogeneous network | Heterogeneous network | Graph autoencoder |
| LGDTI | 04/2021 | K-mer | Molecular fingerprint | Graph convolutional network and DeepWalk | Graph convolutional network and DeepWalk | RF |
| PretrainDPI | 05/2021 | Pretrained models | Molecular graph | CNN | GraphNet | Fully connected layers |
| X-DPI | 06/2021 | Structure and sequence features | Atomic features | TAPE embedding | Mol2vec embedding | Transformer decoder |
| MultiDTI | 07/2021 | N-gram embedding | N-gram embedding | Deep downsampling residual module | Deep downsampling residual module | Multilayer perceptron |
| HyperAttentionDTI | 10/2021 | Amino acid sequences | SMILES strings | CNN and attention mechanism | CNN and attention mechanism | Fully connected layer |
| DTIHNC | 02/2022 | Protein-protein interactions, protein-disease associations | Drug-drug interactions, drug-disease associations, drug-side-effects associations | Denoising autoencoder | Denoising autoencoder | CNN module |
| HIDTI | 03/2022 | Protein sequences, protein–protein similarities, protein–protein interactions, protein-disease interactions | SMILES strings, drug-drug interactions, drug-side effect associations, drug- | A residual block | A residual block | Fully connected layers |
| HGDTI | 04/2022 | Node features encoding (interactions, similarities, associations) | Node features encoding (interactions, similarities, associations) | BiLSTM | BiLSTM | Fully connected layers |
Note: “-” in the table indicates that there is no such information in the corresponding article.
Abbreviations: DBN – deep belief network; RF – random forest; CNN – convolutional neural network; GNN – graph neural network; GCNs – graph convolutional networks; TAPE – tasks assessing protein embeddings; SMILES – simplified molecular-input line-entry system; BiLSTM – bidirectional long short-term memory.
URL addresses for the listed tools: DeepDTIs – ; DDR – ; CPI-GNN – ; DeepConv-DTI – ; DTI-CDF – ; DEEPScreen – ; TransformerCPI – ; DTI-CNN – ; MolTrans – ; BridgeDPI – ; CSConv2d – https://doi.org/10.4121/uuid:547e8014-d662-4852-9840-c1ef065d03ef; GADTI – ; PretrainDPI – ; MultiDTI – ; HyperAttentionDTI – https://github.com/zhaoqichang/HpyerAttentionDTI; DTIHNC – https://github.com/ningq669/DTIHNC; HIDTI – https://github.com/DMCB-GIST/HIDTI; HGDTI – https://bioinfo.jcu.edu.cn/hgdti.
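Several methods in the table above (CPI-GNN, BridgeDPI, LGDTI) start from n-gram/k-mer features of the protein sequence. A minimal sketch of that featurization is below; the choice of k=2 and the count normalization are illustrative assumptions, and the cited methods differ in k and in how the counts are subsequently embedded:

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def kmer_features(sequence, k=2):
    """Map a protein sequence to normalized k-mer frequencies over the 20
    standard amino acids, giving a fixed-length vector of 20**k dimensions."""
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = max(1, len(sequence) - k + 1)
    return [counts[km] / total for km in vocab]

vec = kmer_features("MKTAYIAKQRQISFVK", k=2)
print(len(vec), round(sum(vec), 6))  # 400 dimensions; frequencies sum to 1.0
```

The fixed dimensionality is what makes such features convenient: proteins of any length map to the same vector size, which can feed a random forest (LGDTI) or perceptron layers (BridgeDPI) directly.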
PLI prediction methods as regression tasks based on the ML framework in recent years.

| Method | Published | Protein feature | Compound feature | Protein encoding module | Compound encoding module | Prediction module |
| --- | --- | --- | --- | --- | --- | --- |
| SimBoost | 04/2017 | Target similarity | Drug similarity | – | – | Gradient boosting tree model |
| ACNN | 2017 | Atomic coordinates | Atomic coordinates | Atomic convolution layer | Atomic convolution layer | Atomic fully connected layer |
| DeepDTA | 09/2018 | Label encoding | Label encoding | CNN blocks | CNN blocks | Fully connected layer |
| DeepAffinity | 02/2019 | Structural property sequence representation | Structural property sequence representation | Seq2seq autoencoders | Seq2seq autoencoders | Unified RNN-CNN |
| WideDTA | 02/2019 | Textual information | Textual information | CNN blocks | CNN blocks | Fully connected layers |
| GraphDTA | 06/2019 | One-hot encoding | Molecular graph | Convolutional layers | 4 graph neural network variants | Fully connected layers |
| RFScore | 08/2019 | 36 intermolecular features | 36 intermolecular features | – | – | Random forest |
| AttentionDTA | 11/2019 | Label encoding | Label encoding | CNN block | CNN block | Attention block-fully connected layers |
| Taba | 01/2020 | The average distance between pairs of atoms | The average distance between pairs of atoms | – | – | Machine-learning model |
| GAT_GCN | 04/2020 | Peptide frequency | Graph structure | CNN | GCN | Fully connected layers |
| SAnDReS | 05/2020 | Docking scores | Docking scores | – | – | Machine-learning model |
| DeepCDA | 05/2020 | N-gram embedding | SMILES sequence | CNN-LSTM-Two-sided attention mechanism | CNN-LSTM-Two-sided attention mechanism | Fully connected layers |
| DGraphDTA | 06/2020 | Protein graph | Molecular graph | GNN | GNN | Fully connected layers |
| JoVA | 08/2020 | Multiple unimodal representations | Multiple unimodal representations | Joint view attention module | Joint view attention module | Prediction model |
| Fusion | 11/2020 | Atomic representation | Atomic representation | CNNs | SG-CNNs | Fully connected layers |
| DeepGS | 2020 | Symbolic sequences | Molecular structure | Prot2Vec-CNN-BiGRU blocks | Smi2Vec-CNN-BiGRU blocks | Fully connected layer |
| DeepDTAF | 01/2021 | Sequence, structural property information | SMILES string | Dilated/traditional convolution layers | Dilated convolution layers | Fully connected layers |
| GanDTI | 03/2021 | Protein sequences | Molecule fingerprints-adjacency matrix | Attention module | Residual graph neural network | MLP |
| Multi-PLI | 04/2021 | One-hot vectors | One-hot vectors | CNN blocks | CNN blocks | Fully connected layers |
| ML-DTI | 04/2021 | Protein sequences | SMILES string | CNN block (mutual learning) | CNN block (mutual learning) | Linear transformation layers |
| DEELIG | 06/2021 | Atomic level-structural information-sequences | Physical properties-fingerprints | CNN | Fully connected layers | Fully connected layers |
| GEFA | 07/2021 | Sequence embedding features | Graph representation | GCN | GCN | Linear layers |
| SAG-DTA | 08/2021 | Label encoding | Molecular graph | CNN | Graph convolutional layer-SAGPooling layer | Fully connected layers |
| Tanoori et al. | 08/2021 | SW sequence similarity | CS similarity | – | – | GBM |
| EmbedDTI | 11/2021 | Amino acids | Structural information | CNN | Attention-GCNs | Fully connected layers |
| DeepPLA | 12/2021 | Protein sequences (ProSE) | SMILES strings (Mol2Vec) | Head CNN modules-ResNet-based CNN module | Head CNN modules-ResNet-based CNN module | BiLSTM module-MLP module |
| DeepGLSTM | 01/2022 | Amino acids | Adjacency representation | BiLSTM | GCN | Fully connected layers |
| MGraphDTA | 01/2022 | Integers | Graph structure | Multiscale convolutional neural network | GNN | MLP |
| FusionDTA | 01/2022 | Word embeddings | SMILES strings | BiLSTM | BiLSTM | Multi- |
| HoTS | 02/2022 | Protein sequences | Morgan/circular fingerprints | Transformer blocks | Transformer blocks | Fully connected layers |
| ELECTRA-DTA | 03/2022 | Protein sequences | SMILES string | Squeeze-and-excitation convolutional neural network blocks | Squeeze-and-excitation convolutional neural network blocks | Fully connected layers |
Note: “-” in the table indicates that there is no such information in the corresponding article.
Abbreviations: CNN – convolutional neural network; GNN – graph neural network; GCNs – graph convolutional networks; LSTM – long short-term memory; SG-CNNs – spatial graph convolutional neural networks; BiGRU – bidirectional gate recurrent unit; MLP – multilayer perceptron; GCN – graph convolutional network; SW – Smith-Waterman; CS – chemical structure; GBM – gradient boosting machine; BiLSTM – bidirectional long short-term memory.
URL addresses for the listed tools: SimBoost – ; ACNN – ; DeepDTA – ; DeepAffinity – ; GraphDTA – ; Taba – https://github.com/azevedolab/taba; SAnDReS – https://github.com/azevedolab/sandres; DeepCDA – ; Fusion – ; DeepGS – ; DeepDTAF – ; GanDTI – ; Multi-PLI – ; ML-DTI – ; DEELIG – ; GEFA – ; EmbedDTI – ; DeepPLA – ; DeepGLSTM – https://github.com/MLlab4CS/DeepGLSTM.git; MGraphDTA – https://github.com/guaguabujianle/MGraphDTA; FusionDTA – https://github.com/yuanweining/FusionDTA; HoTS – https://github.com/GIST-CSBL/HoTS; ELECTRA-DTA – https://github.com/IILab-Resource/ELECTRA-DTA.
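The regression methods above predict continuous binding affinities, and are commonly evaluated with mean squared error and the concordance index (CI) on affinity benchmarks. A minimal sketch of both metrics is below; the toy affinity values are invented for illustration:

```python
def mse(y_true, y_pred):
    """Mean squared error between measured and predicted binding affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def concordance_index(y_true, y_pred):
    """Concordance index: among all pairs with different true affinities, the
    fraction whose predictions are ranked in the same order (ties count 0.5)."""
    num, den = 0.0, 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue  # tied true affinities are not comparable
            den += 1
            d_true = y_true[i] - y_true[j]
            d_pred = y_pred[i] - y_pred[j]
            if d_true * d_pred > 0:
                num += 1.0       # concordant pair
            elif d_pred == 0:
                num += 0.5       # tied prediction
    return num / den if den else 0.0

# Toy example: four measured affinities (e.g. pKd values) and predictions.
true = [5.0, 6.2, 7.1, 8.3]
pred = [5.1, 6.0, 7.5, 8.0]
print(round(mse(true, pred), 4), concordance_index(true, pred))  # 0.075 1.0
```

MSE penalizes the magnitude of prediction error, while CI only rewards correct ranking of compound pairs, which is often what matters when prioritizing candidates for screening.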