Elena Rica, Susana Álvarez, Francesc Serratosa.
Abstract
Chemical compounds can be represented as attributed graphs. An attributed graph is a mathematical model of an object composed of two types of elements: nodes and edges. Nodes are individual components, and edges are relations between these components. In this case, nodes represent pharmacophore-type descriptions and edges represent chemical bonds. To obtain the bioactivity dissimilarity between two chemical compounds, a distance between attributed graphs can be used. The Graph Edit Distance computes this distance, which is defined as the cost of transforming one graph into another. Nevertheless, for this dissimilarity to be meaningful, the transformation costs must be properly tuned. The aim of this paper is to analyse the structural-based screening methods to verify the quality of the transformation costs proposed by Harper, and to present an algorithm that learns these transformation costs such that the bioactivity dissimilarity is properly defined in a ligand-based virtual screening application. The quality of the dissimilarity is measured by the classification accuracy. Six publicly available datasets (CAPST, DUD-E, GLL&GDD, NRLiSt-BDB, MUV and ULS-UDS) have been used to validate our methodology and show that, with our learned costs, we obtain the highest ratios in identifying the bioactivity similarity in a structurally diverse group of molecules.
Keywords: extended reduced graph; graph edit distance; machine learning; molecular similarity; structure activity relationships; virtual screening
Year: 2021 PMID: 34884555 PMCID: PMC8658044 DOI: 10.3390/ijms222312751
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Datasets used for the experiments. Each dataset on the left contains the targets on the right.
| Dataset | Used Targets |
|---|---|
| CAPST | CDK2, CHK1, PTP1B, UROKINASE |
| DUD-E | COX2, DHFR, EGFR, FGFR1, FXA, P38, PDGFRB, SRC, AA2AR |
| GLL&GDD | 5HT1A_Agonist, 5HT1A_Antagonist, 5HT1D_Agonist, 5HT1D_Antagonist, 5HT1F_Agonist, 5HT2A_Antagonist, 5HT2B_Antagonist, 5HT2C_Agonist, 5HT2C_Antagonist, 5HT4R_Agonist, 5HT4R_Antagonist, AA1R_Agonist, AA1R_Antagonist, AA2AR_Antagonist, AA2BR_Antagonist, ACM1_Agonist, ACM2_Antagonist, ACM3_Antagonist, ADA1A_Antagonist, ADA1B_Antagonist, ADA1D_Antagonist, ADA2A_Agonist, ADA2A_Antagonist, ADA2B_Agonist, ADA2B_Antagonist, ADA2C_Agonist, ADA2C_Antagonist, ADRB1_Agonist, ADRB1_Antagonist, ADRB2_Agonist, ADRB2_Antagonist, ADRB3_Agonist, ADRB3_Antagonist, AG2R_Antagonist, BKRB1_Antagonist, BKRB2_Antagonist, CCKAR_Antagonist, CLTR1_Antagonist, DRD1_Antagonist, DRD2_Agonist, DRD2_Antagonist, DRD3_Antagonist, DRD4_Antagonist, EDNRA_Antagonist, EDNRB_Antagonist, GASR_Antagonist, HRH2_Antagonist, HRH3_Antagonist, LSHR_Antagonist, LT4R1_Antagonist, LT4R2_Antagonist, MTR1A_Agonist, MTR1B_Agonist, MTR1L_Agonist, NK1R_Antagonist, NK2R_Antagonist, NK3R_Antagonist, OPRD_Agonist, OPRK_Agonist, OPRM_Agonist, OXYR_Antagonist, PE2R1_Antagonist, PE2R2_Antagonist, PE2R3_Antagonist, PE2R4_Antagonist, TA2R_Antagonist, V1AR_Antagonist, V1BR_Antagonist, V2R_Antagonist |
| MUV | 466, 548, 600, 644, 652, 689, 692, 712, 713, 733, 737, 810, 832, 846, 852, 858, 859 |
| NRLiSt_BDB | AR_Agonist, AR_Antagonist, ER_Alpha_Agonist, ER_Alpha_Antagonist, ER_Beta_Agonist, FXR_Alpha_Agonist, GR_Agonist, GR_Antagonist, LXR_Alpha_Agonist, LXR_Beta_Agonist, MR_Antagonist, PPAR_Alpha_Agonist, PPAR_Beta_Agonist, PPAR_Gamma_Agonist, PR_Agonist, PR_Antagonist, PXR_Agonist, RAR_Alpha_Agonist, RAR_Beta_Agonist, RAR_Gamma_Agonist, RXR_Alpha_Agonist, RXR_Alpha_Antagonist, RXR_Gamma_Agonist, VDR_Agonist |
| ULS-UDS | 5HT1F_Agonist, MTR1B_Agonist, OPRM_Agonist, PE2R3_Antagonist |
Figure 1. Example of molecule reduction using ErG. The original molecule is on the top and its ErG representation is below. Elements of the same colour on the top are reduced to nodes on the ErG. R: Ring system, Ac: Acyclic components.
Node and edge attributes description in an ErG.
| Node attribute |
|---|
| hydrogen-bond donor |
| hydrogen-bond acceptor |
| positive charge |
| negative charge |
| hydrophobic group |
| aromatic ring system |
| carbon link node |
| non-carbon link node |
| hydrogen-bond donor + hydrogen-bond acceptor |
| hydrogen-bond donor + positive charge |
| hydrogen-bond donor + negative charge |
| hydrogen-bond acceptor + positive charge |
| hydrogen-bond acceptor + negative charge |
| positive charge + negative charge |
| hydrogen-bond donor + hydrogen-bond acceptor + positive charge |

| Edge attribute | Description |
|---|---|
| - | single bond |
| = | double bond |
| ≡ | triple bond |
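
As an illustration of how such an attributed graph might be held in memory, the following is a minimal Python sketch. The class name `AttributedGraph`, its methods and the toy graph are hypothetical and not taken from the paper; only the feature names and bond symbols come from the attribute descriptions above.

```python
from dataclasses import dataclass, field

# Basic pharmacophore features and bond symbols, copied from the attribute table.
NODE_FEATURES = frozenset({
    "hydrogen-bond donor", "hydrogen-bond acceptor", "positive charge",
    "negative charge", "hydrophobic group", "aromatic ring system",
    "carbon link node", "non-carbon link node",
})
BOND_TYPES = {"-": "single bond", "=": "double bond", "≡": "triple bond"}


@dataclass
class AttributedGraph:
    """Hypothetical container: nodes carry feature sets, edges carry a bond symbol."""
    nodes: dict = field(default_factory=dict)  # node id -> frozenset of feature names
    edges: dict = field(default_factory=dict)  # frozenset({u, v}) -> bond symbol

    def add_node(self, node_id, *features):
        unknown = set(features) - NODE_FEATURES
        if unknown:
            raise ValueError(f"unknown node features: {sorted(unknown)}")
        # Combined attributes (e.g. donor + acceptor) are stored as a set of features.
        self.nodes[node_id] = frozenset(features)

    def add_edge(self, u, v, bond):
        if u not in self.nodes or v not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        if bond not in BOND_TYPES:
            raise ValueError(f"unknown bond type: {bond!r}")
        # Edges are undirected, so the key is an unordered pair.
        self.edges[frozenset((u, v))] = bond


# Toy graph (made up for illustration): a ring node bonded to a donor + acceptor node.
toy = AttributedGraph()
toy.add_node(0, "aromatic ring system")
toy.add_node(1, "hydrogen-bond donor", "hydrogen-bond acceptor")
toy.add_edge(0, 1, "-")
```

Storing combined attributes as sets keeps the eight basic features as the only vocabulary, which is one possible design choice among several.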
Substitution, insertion and deletion costs for nodes proposed by Harper et al. [33].
[Table body not recoverable: a 15 × 15 substitution cost matrix over the node attributes, followed by the node insertion and deletion costs.]
Substitution, insertion and deletion costs for edges proposed by Harper et al. [33].
| | - | = | ≡ |
|---|---|---|---|
| - | 0 | 3 | 3 |
| = | 3 | 0 | 3 |
| ≡ | 3 | 3 | 0 |

| insert | 0 |
| 1 |
| delete | 0 |
| 1 |
Figure 2. Transformation sequence from one graph to another.
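
To make the notion of a transformation cost concrete, here is a minimal Python sketch that sums the cost of a sequence of edge substitutions using the Harper edge substitution costs listed above (0 for identical bond types, 3 otherwise). The function name and the example sequence are hypothetical. The sketch covers edge substitutions only; node operations, insertions and deletions would be added in the same way, and the graph edit distance itself is usually taken as the cost of the cheapest such sequence.

```python
BOND_TYPES = ("-", "=", "≡")

# Edge substitution costs from the edge cost table: 0 on the diagonal, 3 elsewhere.
EDGE_SUBSTITUTION_COST = {
    (a, b): 0 if a == b else 3 for a in BOND_TYPES for b in BOND_TYPES
}


def sequence_cost(operations):
    """Sum the costs of a sequence of edge substitutions (old_bond, new_bond)."""
    return sum(EDGE_SUBSTITUTION_COST[(old, new)] for old, new in operations)


# Hypothetical sequence: one bond kept as it is, two bonds changed to another type.
example_sequence = [("-", "-"), ("-", "="), ("≡", "=")]
example_cost = sequence_cost(example_sequence)  # 0 + 3 + 3
```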
Costs obtained in [35]. Each row corresponds to one of their experiments.
| | Type of Cost | CAPST | DUD-E | GLL&GDD | MUV | NRLiSt_BDB | ULS-UDS |
|---|---|---|---|---|---|---|---|
| C1 | Ins/Del [6] | 0.000002 | 0.005 | 0.014 | 0.490 | 0.012 | 0.115 |
| C2 | Subs [5] by [6] | 0.013 | 0.145 | 0.333 | 0.867 | 0.104 | 0.500 |
| C3 | Ins/Del [-] | 0.004 | 0.001 | 0.003 | 0.327 | 0.003 | 0.011 |
| C4 | Subs [-] by [=] | 0.017 | 0.186 | 0.206 | 1.005 | 0.024 | 0.607 |
Figure 3. Classification of a molecule. The true classes are in solid colours. The molecule is classified in the wrong class (blue), but the correct class is the red one. The distance between the molecule and its nearest blue neighbour is lower than the distance between the molecule and its nearest red neighbour.
Figure 4. Stripped molecules have been improperly classified using the NN strategy. is the one that minimises being .
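
The nearest-neighbour (NN) strategy referred to in these captions can be sketched as follows. This is a generic illustration rather than the authors' implementation: the function name, the distances and the labels are made up, and the distances are assumed to have been computed beforehand, for example with a graph edit distance.

```python
def nearest_neighbour_class(distances, labels):
    """Return the label of the training molecule closest to the query.

    distances[i] is the precomputed distance between the query molecule and
    training molecule i; labels[i] is the class of training molecule i.
    """
    if not distances or len(distances) != len(labels):
        raise ValueError("distances and labels must be non-empty and equally long")
    nearest = min(range(len(distances)), key=lambda i: distances[i])
    return labels[nearest]


# Toy case mirroring a misclassification: the closest molecule is blue,
# so the query is assigned to blue even if its true class is red.
toy_distances = [4.0, 2.5, 3.0]
toy_labels = ["red", "blue", "red"]
predicted = nearest_neighbour_class(toy_distances, toy_labels)
```

Under this rule a query is misclassified exactly when its closest labelled molecule belongs to another class, which is why the transformation costs that define the distance matter.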
Accuracy (%) obtained in each dataset. In bold, the highest ones. The last column shows the mean accuracy.
| CAPST | DUD-E | GLL&GDD | MUV | NRLiSt_BDB | ULS-UDS | Mean | |
|---|---|---|---|---|---|---|---|
|
| 93.75 | 95.88 | 85.68 |
| 93.17 |
| 92.89 |
|
| 92.93 | 91.25 | 93.03 | 56.01 | 94.75 | 92.94 | 86.82 |
|
| 89.25 | 92.63 | 82.47 | 86.06 | 88.58 | 89.65 | 88.11 |
|
| 89.75 | 91.13 | 82.51 | 87.35 | 88.21 | 91.69 | 88.44 |
|
| 91.25 | 91.25 | 83.25 | 86.65 | 87.75 | 92.34 | 88.75 |
|
| 89.50 | 90.88 | 82.43 | 86.00 | 89.92 | 92.59 | 88.55 |
|
|
|
|
| 88.63 |
| 94.00 |
|
|
| |||||||
|
| 88.15 | 93.50 | 93.30 | 61.76 | 94.98 | 95.25 | 87.82 |
|
|
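
The Mean column appears to be the unweighted average of the six per-dataset accuracies. A short Python check on the first row that survives complete in the table above (92.93, 91.25, 93.03, 56.01, 94.75, 92.94) reproduces its reported mean; the variable names are illustrative.

```python
# Per-dataset accuracies (%) of one complete row, in column order:
# CAPST, DUD-E, GLL&GDD, MUV, NRLiSt_BDB, ULS-UDS.
row_accuracies = [92.93, 91.25, 93.03, 56.01, 94.75, 92.94]

# Unweighted mean over the six datasets, rounded to two decimals as in the table.
row_mean = round(sum(row_accuracies) / len(row_accuracies), 2)
```

The result matches the 86.82 listed in the Mean column for that row.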
Figure 5. Classification ratio in the test set over the 127 targets available in the six datasets. The horizontal axis represents the index of the targets presented in Table 1.
Figure 6. Percentage of times that each set of costs returns the best classification ratio.
Substitution, insertion and deletion costs of nodes obtained with our method. In bold, the ones that are different from Table 3 and Table 4.
[Table body not recoverable: a 15 × 15 substitution cost matrix over the node attributes, followed by the node insertion and deletion costs.]
Substitution, insertion and deletion costs of edges obtained with our method.
| | - | = | ≡ |
|---|---|---|---|
| - | 0 | 3.00 | 3.00 |
| = | 3.00 | 0 | 3.00 |
| ≡ | 3.00 | 3.00 | 0 |

| insert | 0 |
| 1.00 |
| delete | 0 |
| 1.00 |