Xun Wang, Jiali Liu, Chaogang Zhang, Shudong Wang.
Abstract
Accurately identifying compound-protein interactions (CPI), also known as drug-target interactions (DTI), is a key step in drug discovery. In applications such as virtual screening and drug repurposing, accurate CPI prediction can significantly reduce the time needed to identify drug candidates and help provide patients with timely and effective treatment. Recently, a growing number of deep learning models for CPI prediction have been developed, including models that represent a compound by applying a graph convolutional neural network to its 2D molecular graph; used alone, however, this representation loses much important information about the compound. In this paper, we propose a novel three-channel deep learning framework for CPI prediction, named SSGraphCPI, which is composed of recurrent neural networks with an attention mechanism and a graph convolutional neural network. In our model, compound features are extracted from both the 1D SMILES string and the 2D molecular graph, which together provide sequential and structural features for CPI prediction. Additionally, we use a 1D CNN module to learn hidden patterns in the sequence data and mine deeper information. Our model is therefore better suited to capturing informative features of compounds. Experimental results show that our method achieves RMSE (root mean square error) = 2.24 and R2 (degree of linear fitting of the model) = 0.039 on the GPCR (G protein-coupled receptor) dataset, and RMSE = 2.64 and R2 = 0.018 on the Kinase dataset, performing better than several classical deep learning models, including RNN/GCNN-CNN, GCNNet and GATNet.
Keywords: IC50 value; compound properties; compound-protein interactions; deep learning; protein properties
Year: 2022 PMID: 35409140 PMCID: PMC8998983 DOI: 10.3390/ijms23073780
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
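The abstract describes feeding the 1D SMILES string to a recurrent network as a sequence. A minimal sketch of how a SMILES string might be turned into integer tokens for such a model is given below. The tokenizer regex, the `<pad>` token and the function names `tokenize_smiles`, `build_vocab` and `encode` are illustrative assumptions, not the preprocessing used by SSGraphCPI.

```python
import re

# Hypothetical tokenizer (an assumption, not the paper's method):
# bracket atoms, two-letter halogens, two-digit ring closures,
# then any remaining single character.
_SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize_smiles(smiles):
    """Split a SMILES string into a list of tokens."""
    return _SMILES_TOKEN.findall(smiles)


def build_vocab(token_lists):
    """Map each distinct token to an integer id; id 0 is reserved for padding."""
    vocab = {"<pad>": 0}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to ids, truncating or right-padding to max_len."""
    ids = [vocab[t] for t in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))


tokens = tokenize_smiles("CC(=O)Cl")   # acetyl chloride
vocab = build_vocab([tokens])
encoded = encode(tokens, vocab, max_len=9)
```

The fixed-length integer sequence is the kind of input an embedding layer followed by an RNN would typically consume; the 2D molecular graph channel would need a separate atom/bond representation that is not sketched here.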
RMS errors for different models on the Test dataset.
| Models | Representation of Compound | Representation of Protein | RMSE | R2 |
|---|---|---|---|---|
| GRU/GCNN-CNN | molecular graph (GCNN) | SPS sequence | 1.62 | 0.28 |
| GCNNet | molecular graph (GCNN) + SMILES | Amino acid sequence | 0.95 | 0.51 |
| GATNet | molecular graph (GCNN + Attention) + SMILES | Amino acid sequence | 0.91 | 0.57 |
| SSGraphCPI | molecular graph (GCNN + Attention) + SMILES | SPS sequence | | |
| SSGraphCPI2 | molecular graph (GCNN + Attention) + SMILES | SPS sequence + Amino acid sequence | | |
RMS errors for different models on ER dataset.
| Models | Representation of Compound | Representation of Protein | RMSE | R2 |
|---|---|---|---|---|
| GRU/GCNN-CNN | molecular graph (GCNN) | SPS sequence | 2.33 | 0.036 |
| GCNNet | molecular graph (GCNN) + SMILES | Amino acid sequence | 2.29 | 0.048 |
| GATNet | molecular graph (GCNN + Attention) + SMILES | Amino acid sequence | 2.14 | 0.044 |
| SSGraphCPI | molecular graph (GCNN + Attention) + SMILES | SPS sequence | | |
| SSGraphCPI2 | molecular graph (GCNN + Attention) + SMILES | SPS sequence + Amino acid sequence | 2.12 | 0.045 |
RMS errors for different models on Channel dataset.
| Models | Representation of Compound | Representation of Protein | RMSE | R2 |
|---|---|---|---|---|
| GRU/GCNN-CNN | molecular graph (GCNN) | SPS sequence | 2.62 | 0.019 |
| GCNNet | molecular graph (GCNN) + SMILES | Amino acid sequence | 2.17 | 0.045 |
| GATNet | molecular graph (GCNN + Attention) + SMILES | Amino acid sequence | 2.26 | 0.039 |
| SSGraphCPI | molecular graph (GCNN + Attention) + SMILES | SPS sequence | | |
| SSGraphCPI2 | molecular graph (GCNN + Attention) + SMILES | SPS sequence + Amino acid sequence | 2.23 | 0.041 |
RMS errors of different models on the GPCR dataset.
| Models | Representation of Compound | Representation of Protein | RMSE | R2 |
|---|---|---|---|---|
| GRU/GCNN-CNN | molecular graph (GCNN) | SPS sequence | 2.44 | 0.026 |
| GCNNet | molecular graph (GCNN) + SMILES | Amino acid sequence | 2.45 | 0.026 |
| GATNet | molecular graph (GCNN + Attention) + SMILES | Amino acid sequence | 2.37 | 0.035 |
| SSGraphCPI | molecular graph (GCNN + Attention) + SMILES | SPS sequence | | |
| SSGraphCPI2 | molecular graph (GCNN + Attention) + SMILES | SPS sequence + Amino acid sequence | | |
RMS errors of different models on the Kinase dataset.
| Models | Representation of Compound | Representation of Protein | RMSE | R2 |
|---|---|---|---|---|
| GRU/GCNN-CNN | molecular graph (GCNN) | SPS sequence | 2.98 | 0.011 |
| GCNNet | molecular graph (GCNN) + SMILES | Amino acid sequence | 2.76 | 0.014 |
| GATNet | molecular graph (GCNN + Attention) + SMILES | Amino acid sequence | 2.73 | 0.015 |
| SSGraphCPI | molecular graph (GCNN + Attention) + SMILES | SPS sequence | | |
| SSGraphCPI2 | molecular graph (GCNN + Attention) + SMILES | SPS sequence + Amino acid sequence | | |
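The abstract reports two metrics for each model, RMSE and R2 (described there as the degree of linear fitting). A minimal sketch of how the two could be computed from measured and predicted affinities follows. The source does not state which R2 definition was used; the squared Pearson correlation shown here is an assumption, and the function names and sample values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def linear_fit_r2(y_true, y_pred):
    """Squared Pearson correlation (assumed definition of R2 here)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return (cov * cov) / (var_t * var_p)


# Hypothetical affinity labels and predictions, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]

error = rmse(measured, predicted)
fit = linear_fit_r2(measured, predicted)
```

In this toy case every prediction is off by exactly one unit, so the linear fit is perfect while the RMSE is not zero. The two columns therefore capture different aspects of model quality: lower is better for RMSE, higher is better for R2.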
Figure 1. Loss value variation diagram of different models.
The first 30 compounds known to interact with EGF proteins.
| id | BindingDB_id | PubChemCID | Compound Molecular Formula |
|---|---|---|---|
| 1 | BDBM4343 | 736236 | C11H9NO3 |
| 2 | | | |
| 3 | BDBM4279 | 37583 | C11H5N3 |
| 4 | BDBM4377 | 836 | C9H11NO4 |
| 5 | BDBM4320 | 746495 | C12H10N2O3 |
| 6 | BDBM4348 | 720879 | C11H9NO4 |
| 7 | BDBM4381 | 228618 | C10H8N4O |
| 8 | BDBM4325 | 5614 | C18H22N2O |
| 9 | BDBM4383 | 54212223 | C10H7N3O |
| 10 | BDBM4012 | 5328588 | C38H38N4O2S2 |
| 11 | BDBM4013 | 5328589 | C17H16N2OS |
| 12 | BDBM3320 | 5328066 | C15H16N6 |
| 13 | BDBM3972 | 5328551 | C22H20N2O4S2 |
| 14 | | | |
| 15 | | | |
| 16 | BDBM3991 | 5328570 | C20H14N4S2 |
| 17 | BDBM4041 | 5328617 | C30H24N6O2S2 |
| 18 | BDBM4014 | 5328590 | C11H12N2OS |
| 19 | BDBM4407 | 5328824 | C16H15N3 |
| 20 | BDBM3956 | 5328535 | C10H9NO2S |
| 21 | BDBM3356 | 5328102 | C20H18N4O2Se2 |
| 22 | BDBM3971 | 5328550 | C20H16N2O4S2 |
| 23 | BDBM3333 | 5328079 | C14H12N6O2 |
| 24 | BDBM3258 | 5328015 | C16H15N3O |
| 25 | BDBM3338 | 5328084 | C15H12F3N5 |
| 26 | BDBM3275 | 5328024 | C15H12N4O2 |
| 27 | BDBM3340 | 5328086 | C15H12F3N5 |
| 28 | BDBM4003 | 5328580 | C38H34N4O6S2 |
| 29 | BDBM4405 | 720610 | C17H17N3 |
| 30 | BDBM3964 | 5328543 | C12H13NO2S |
Figure 2. A 2D molecular diagram of the top three compounds.
Figure 3. A 3D molecular diagram of the top three compounds.
Figure 4. Docking diagram of the C34H30N4O2S2 molecule and the EGF receptor protein.
Representation of the protein SPS sequence.
| Category | Class | Symbol |
|---|---|---|
| Secondary Structure | Alpha | A |
| Secondary Structure | Beta | B |
| Secondary Structure | Coil | C |
| Solvent Exposure | Not Exposed | N |
| Solvent Exposure | Exposed | E |
| Property | Non-polar | G |
| Property | Polar | T |
| Property | Acidic | D |
| Property | Basic | K |
| Length | Short | S |
| Length | Medium | M |
| Length | Long | L |
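The SPS table assigns one letter per category (secondary structure, solvent exposure, property, length). As a minimal sketch, assuming one SPS "word" is formed by concatenating one letter from each category in the table's column order, the full alphabet can be enumerated as below. The word order and the names `sps_words` and `describe` are illustrative assumptions, not taken from the source.

```python
from itertools import product

# Letters exactly as listed in the SPS table.
SECONDARY = {"A": "alpha", "B": "beta", "C": "coil"}
EXPOSURE = {"N": "not exposed", "E": "exposed"}
PROPERTY = {"G": "non-polar", "T": "polar", "D": "acidic", "K": "basic"}
LENGTH = {"S": "short", "M": "medium", "L": "long"}


def sps_words():
    """Enumerate every 4-letter combination, one letter per category."""
    return ["".join(p) for p in product(SECONDARY, EXPOSURE, PROPERTY, LENGTH)]


def describe(word):
    """Decode a 4-letter word (assumed order) back into its four classes."""
    s, e, p, l = word
    return (SECONDARY[s], EXPOSURE[e], PROPERTY[p], LENGTH[l])


words = sps_words()
example = describe("BEKL")
```

Under this assumption the table yields 3 × 2 × 4 × 3 = 72 combinations, a far smaller vocabulary than raw amino acid n-grams, which is what makes an SPS sequence a compact input for a sequence model.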
Figure 5. The overall flow chart of the SSGraphCPI model.
Figure 6. Network diagram of the protein feature vector extraction.