Carlos Garcia-Hernandez¹, Alberto Fernández¹, Francesc Serratosa².
Abstract
BACKGROUND: Graph edit distance (GED) is a methodology used to solve error-tolerant graph matching. It estimates the distance between two graphs as the minimum number of modifications required to transform one graph into the other. Each of these modifications, known as edit operations, has an associated edit cost that must be determined for the problem at hand.
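The definition above can be made concrete with a brute-force sketch. It assumes a hypothetical representation (a list of node labels plus a dictionary of undirected, labelled edges) and user-supplied cost functions, enumerates every assignment of the nodes of one graph to nodes of the other graph or to deletion, and returns the cheapest total cost. It illustrates the definition only and is not the authors' method: exact GED is NP-hard, so this enumeration is feasible only for a handful of nodes. For real use, NetworkX's `graph_edit_distance` also accepts custom node and edge substitution, insertion and deletion cost functions.

```python
from itertools import permutations
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical representation: node labels plus undirected edges keyed (i, j), i < j.
Graph = Tuple[List[str], Dict[Tuple[int, int], str]]


def _key(a: int, b: int) -> Tuple[int, int]:
    """Normalise an undirected edge key so that the smaller index comes first."""
    return (a, b) if a < b else (b, a)


def graph_edit_distance(
    g1: Graph,
    g2: Graph,
    node_sub: Callable[[str, str], float],
    node_indel: Callable[[str], float],
    edge_sub: Callable[[str, str], float],
    edge_indel: Callable[[str], float],
) -> float:
    """Exact GED: minimum total cost over all node assignments (tiny graphs only)."""
    labels1, edges1 = g1
    labels2, edges2 = g2
    n1, n2 = len(labels1), len(labels2)
    # Every node of g1 is mapped to a node of g2 or to None, which means deletion.
    targets: List[Optional[int]] = list(range(n2)) + [None] * n1
    best = float("inf")
    for mapping in permutations(targets, n1):
        inverse = {j: i for i, j in enumerate(mapping) if j is not None}
        cost = 0.0
        # Node substitutions and deletions.
        for i, j in enumerate(mapping):
            if j is None:
                cost += node_indel(labels1[i])
            else:
                cost += node_sub(labels1[i], labels2[j])
        # Nodes of g2 that receive no g1 node are insertions.
        cost += sum(node_indel(labels2[j]) for j in range(n2) if j not in inverse)
        # Edges of g1: substituted if their image exists in g2, deleted otherwise.
        for (a, b), label in edges1.items():
            ma, mb = mapping[a], mapping[b]
            if ma is not None and mb is not None and _key(ma, mb) in edges2:
                cost += edge_sub(label, edges2[_key(ma, mb)])
            else:
                cost += edge_indel(label)
        # Edges of g2 without a pre-image in g1 are insertions.
        for (a, b), label in edges2.items():
            if a in inverse and b in inverse and _key(inverse[a], inverse[b]) in edges1:
                continue
            cost += edge_indel(label)
        best = min(best, cost)
    return best


def unit_sub(x: str, y: str) -> float:
    """Unit costs: substituting a label by itself is free, anything else costs 1."""
    return 0.0 if x == y else 1.0


def unit_indel(x: str) -> float:
    """Unit costs: every insertion or deletion costs 1."""
    return 1.0


if __name__ == "__main__":
    # Toy ErG-like graphs using node labels from Fig. (1) (Ac, Ar, Hf).
    g1 = (["Ac", "Ar", "Hf"], {(0, 1): "-", (1, 2): "-"})
    g2 = (["Ac", "Ar"], {(0, 1): "-"})
    print(graph_edit_distance(g1, g2, unit_sub, unit_indel, unit_sub, unit_indel))  # 2.0
```

With unit costs the distance reduces to the minimum number of edit operations: in the toy example, deleting the Hf node and its edge gives 2.0. Problem-specific costs, such as the tables below, are plugged in through the four cost functions.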
Keywords: Extended reduced graph; Graph edit distance; Machine learning; Molecular similarity; Structure-activity relationships; Virtual screening
Year: 2020 PMID: 32493194 PMCID: PMC7536799 DOI: 10.2174/1568026620666200603122000
Source DB: PubMed Journal: Curr Top Med Chem ISSN: 1568-0266 Impact factor: 3.295
Fig. (1) Example of molecule reduction using ErG. The original molecule is at the top and its ErG representation is at the bottom. Ac: H-bond acceptor; Hf: hydrophobic group; Ar: aromatic ring system; +: positive charge. Colors show how different parts of the original structure are reduced to nodes in the ErG. (A higher resolution / colour version of this figure is available in the electronic copy of the article).
Fig. (2) Comparison of two molecules, performed in two steps. First, we extract the ErGs; second, we apply the GED.
Fig. (3) An edit path that transforms graph A into graph B.
Fig. (4) Objective function.
Fig. (5) Training evolution for target FXA in the DUD-E dataset.
Fig. (6) Number of misclassifications on the test set for the 127 targets available across the six datasets combined. The scattered values on the left of the plot represent the number of classification errors (the lower, the better), with different colors and shapes depending on the edit costs used. Vertical dashed lines mark the limits between datasets (from left to right: ULS-UDS, GLL&GDD, CAPST, DUD-E, NRLiSt_BDB, and MUV). The box-and-whisker plots on the right show the distribution of the resulting values: each box spans the first to third quartiles, the line inside the box is the median (second quartile), and the whiskers extend from the box to show the range of the data, excluding outliers.
Fig. (7) Number of misclassifications for all available targets in the LBVS benchmarking platform, shown separately for each dataset and each experiment. The scattered values on the left of each subplot represent the number of classification errors (the lower, the better), with different colors and shapes depending on the edit costs used. The box-and-whisker plots on the right of each subplot show the distribution of the resulting values for each experiment.
Input data used for the experiments. The column entitled ‘Dataset’ contains the name of each dataset, and the column entitled ‘Targets used’ contains the name of the targets used during the experiments for each dataset. Note that in the result plots shown below, per-target points are arranged in the same order as they are in this table.
| Dataset | Targets used |
|---|---|
| ULS-UDS | 5HT1F_Agonist, MTR1B_Agonist, OPRM_Agonist, PE2R3_Antagonist |
| GLL&GDD | 5HT1A_Agonist, 5HT1A_Antagonist, 5HT1D_Agonist, 5HT1D_Antagonist, 5HT1F_Agonist, 5HT2A_Antagonist, 5HT2B_Antagonist, 5HT2C_Agonist, 5HT2C_Antagonist, 5HT4R_Agonist, 5HT4R_Antagonist, AA1R_Agonist, AA1R_Antagonist, AA2AR_Antagonist, AA2BR_Antagonist, ACM1_Agonist, ACM2_Antagonist, ACM3_Antagonist, ADA1A_Antagonist, ADA1B_Antagonist, ADA1D_Antagonist, ADA2A_Agonist, ADA2A_Antagonist, ADA2B_Agonist, ADA2B_Antagonist, ADA2C_Agonist, ADA2C_Antagonist, ADRB1_Agonist, ADRB1_Antagonist, ADRB2_Agonist, ADRB2_Antagonist, ADRB3_Agonist, ADRB3_Antagonist, AG2R_Antagonist, BKRB1_Antagonist, BKRB2_Antagonist, CCKAR_Antagonist, CLTR1_Antagonist, DRD1_Antagonist, DRD2_Agonist, DRD2_Antagonist, DRD3_Antagonist, DRD4_Antagonist, EDNRA_Antagonist, EDNRB_Antagonist, GASR_Antagonist, HRH2_Antagonist, HRH3_Antagonist, LSHR_Antagonist, LT4R1_Antagonist, LT4R2_Antagonist, MTR1A_Agonist, MTR1B_Agonist, MTR1L_Agonist, NK1R_Antagonist, NK2R_Antagonist, NK3R_Antagonist, OPRD_Agonist, OPRK_Agonist, OPRM_Agonist, OXYR_Antagonist, PE2R1_Antagonist, PE2R2_Antagonist, PE2R3_Antagonist, PE2R4_Antagonist, TA2R_Antagonist, V1AR_Antagonist, V1BR_Antagonist, V2R_Antagonist |
| CAPST | CDK2, CHK1, PTP1B, UROKINASE |
| DUD-E | COX2, DHFR, EGFR, FGFR1, FXA, P38, PDGFRB, SRC, AA2AR |
| NRLiSt_BDB | AR_Agonist, AR_Antagonist, ER_Alpha_Agonist, ER_Alpha_Antagonist, ER_Beta_Agonist, FXR_Alpha_Agonist, GR_Agonist, GR_Antagonist, LXR_Alpha_Agonist, LXR_Beta_Agonist, MR_Antagonist, PPAR_Alpha_Agonist, PPAR_Beta_Agonist, PPAR_Gamma_Agonist, PR_Agonist, PR_Antagonist, PXR_Agonist, RAR_Alpha_Agonist, RAR_Beta_Agonist, RAR_Gamma_Agonist, RXR_Alpha_Agonist, RXR_Alpha_Antagonist, RXR_Gamma_Agonist, VDR_Agonist |
| MUV | 466, 548, 600, 644, 652, 689, 692, 712, 713, 733, 737, 810, 832, 846, 852, 858, 859 |
Description of the node and edge attributes that compose an ErG.
| Attribute | Description |
|---|---|
| [0] | Hydrogen-bond donor |
| [1] | Hydrogen-bond acceptor |
| [2] | Positive charge |
| [3] | Negative charge |
| [4] | Hydrophobic group |
| [5] | Aromatic ring system |
| [6] | Carbon link node |
| [7] | Non-carbon link node |
| [0, 1] | Hydrogen-bond donor + hydrogen-bond acceptor |
| [0, 2] | Hydrogen-bond donor + positive charge |
| [0, 3] | Hydrogen-bond donor + negative charge |
| [1, 2] | Hydrogen-bond acceptor + positive charge |
| [1, 3] | Hydrogen-bond acceptor + negative charge |
| [2, 3] | Positive charge + negative charge |
| [0, 1, 2] | Hydrogen-bond donor + hydrogen-bond acceptor + positive charge |
| - | Single bond |
| = | Double bond |
| ≡ | Triple bond |
Substitution, insertion and deletion costs for nodes, as proposed by Harper et al. [38].
| | [0] | [1] | [2] | [3] | [4] | [5] | [6] | [7] | [0, 1] | [0, 2] | [0, 3] | [1, 2] | [1, 3] | [2, 3] | [0, 1, 2] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [0] | 0 | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 1 | 1 | 1 | 2 | 2 | 2 | 1 |
| [1] | 2 | 0 | 2 | 2 | 2 | 2 | 2 | 3 | 1 | 2 | 2 | 1 | 1 | 2 | 1 |
| [2] | 2 | 2 | 0 | 2 | 2 | 2 | 2 | 3 | 2 | 1 | 2 | 1 | 2 | 1 | 1 |
| [3] | 2 | 2 | 2 | 0 | 2 | 2 | 2 | 3 | 2 | 2 | 1 | 2 | 1 | 1 | 2 |
| [4] | 2 | 2 | 2 | 2 | 0 | 2 | 2 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| [5] | 2 | 2 | 2 | 2 | 2 | 0 | 2 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| [6] | 2 | 2 | 2 | 2 | 2 | 2 | 0 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| [7] | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 0 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| [0, 1] | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 3 | 0 | 2 | 2 | 2 | 2 | 2 | 2 |
| [0, 2] | 1 | 2 | 1 | 2 | 2 | 2 | 2 | 3 | 2 | 0 | 2 | 2 | 2 | 2 | 2 |
| [0, 3] | 1 | 2 | 2 | 1 | 2 | 2 | 2 | 3 | 2 | 2 | 0 | 2 | 2 | 2 | 2 |
| [1, 2] | 2 | 1 | 1 | 2 | 2 | 2 | 2 | 3 | 2 | 2 | 2 | 0 | 2 | 2 | 2 |
| [1, 3] | 2 | 1 | 2 | 1 | 2 | 2 | 2 | 3 | 2 | 2 | 2 | 2 | 0 | 2 | 2 |
| [2, 3] | 2 | 2 | 1 | 1 | 2 | 2 | 2 | 3 | 2 | 2 | 2 | 2 | 2 | 0 | 2 |
| [0, 1, 2] | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 0 |
| Insertion | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| Deletion | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
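To use the node cost table above inside a GED computation, the matrix can be stored as a lookup. The sketch below assumes that rows and columns follow the attribute-code order of the node attribute description table ([0] to [7], then the combined codes) and that the two final rows are the insertion and deletion costs, in the order named in the caption; since those rows are identical, swapping them would not change any result. `LABELS`, `SUB`, `INSERT`, `DELETE` and `node_cost` are hypothetical names.

```python
from typing import Optional, Sequence, Tuple

# ErG node attribute codes, assumed to match the row/column order of the node cost table.
LABELS: Tuple[Tuple[int, ...], ...] = (
    (0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,),
    (0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 1, 2),
)

# Substitution costs: SUB[i][j] is the cost of replacing LABELS[i] by LABELS[j].
SUB = (
    (0, 2, 2, 2, 2, 2, 2, 3, 1, 1, 1, 2, 2, 2, 1),
    (2, 0, 2, 2, 2, 2, 2, 3, 1, 2, 2, 1, 1, 2, 1),
    (2, 2, 0, 2, 2, 2, 2, 3, 2, 1, 2, 1, 2, 1, 1),
    (2, 2, 2, 0, 2, 2, 2, 3, 2, 2, 1, 2, 1, 1, 2),
    (2, 2, 2, 2, 0, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2),
    (2, 2, 2, 2, 2, 0, 2, 3, 2, 2, 2, 2, 2, 2, 2),
    (2, 2, 2, 2, 2, 2, 0, 3, 2, 2, 2, 2, 2, 2, 2),
    (3, 3, 3, 3, 3, 3, 3, 0, 3, 3, 3, 3, 3, 3, 3),
    (1, 1, 2, 2, 2, 2, 2, 3, 0, 2, 2, 2, 2, 2, 2),
    (1, 2, 1, 2, 2, 2, 2, 3, 2, 0, 2, 2, 2, 2, 2),
    (1, 2, 2, 1, 2, 2, 2, 3, 2, 2, 0, 2, 2, 2, 2),
    (2, 1, 1, 2, 2, 2, 2, 3, 2, 2, 2, 0, 2, 2, 2),
    (2, 1, 2, 1, 2, 2, 2, 3, 2, 2, 2, 2, 0, 2, 2),
    (2, 2, 1, 1, 2, 2, 2, 3, 2, 2, 2, 2, 2, 0, 2),
    (1, 1, 1, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 0),
)
# Assumed to be the insertion and deletion rows (identical in the table).
INSERT = (2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2)
DELETE = INSERT

INDEX = {label: k for k, label in enumerate(LABELS)}


def _idx(label: Sequence[int]) -> int:
    """Look up a node attribute code, ignoring the order of its elements."""
    return INDEX[tuple(sorted(label))]


def node_cost(a: Optional[Sequence[int]], b: Optional[Sequence[int]]) -> int:
    """Substitute a by b; a=None means inserting b, b=None means deleting a."""
    if a is None and b is None:
        raise ValueError("at least one node must be given")
    if a is None:
        return INSERT[_idx(b)]
    if b is None:
        return DELETE[_idx(a)]
    return SUB[_idx(a)][_idx(b)]
```

For instance, `node_cost((0,), (0, 1))` returns 1, since a donor node and a donor + acceptor node differ by a single feature, whereas substituting between two unrelated single features such as `(4,)` and `(5,)` costs 2. A function like this can serve as the node substitution cost in a GED solver.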
Substitution, insertion and deletion costs for edges, as proposed by Harper et al. [38].
| | - | = | ≡ |
|---|---|---|---|
| - | 0 | 3 | 3 |
| = | 3 | 0 | 3 |
| ≡ | 3 | 3 | 0 |
| Insertion | 0 | 1 | 1 |
| Deletion | 0 | 1 | 1 |
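The cost of an edit path, such as the one in Fig. (3), is the sum of the costs of its individual edit operations. The sketch below applies this to edges using the edge cost table above. It assumes that the two final rows of that table are the insertion and deletion costs (they are identical); `edge_cost`, `edit_path_cost` and the example path are hypothetical.

```python
from typing import List, Optional, Tuple

# Edge (bond) substitution costs from the edge cost table: EDGE_SUB[old][new].
EDGE_SUB = {
    "-": {"-": 0, "=": 3, "≡": 3},
    "=": {"-": 3, "=": 0, "≡": 3},
    "≡": {"-": 3, "=": 3, "≡": 0},
}
# Assumed to be the insertion and deletion rows (identical in the table).
EDGE_INSERT = {"-": 0, "=": 1, "≡": 1}
EDGE_DELETE = dict(EDGE_INSERT)

# One edit operation on an edge: (old, new); None on either side marks insertion/deletion.
EdgeOp = Tuple[Optional[str], Optional[str]]


def edge_cost(op: EdgeOp) -> int:
    """Cost of a single edge edit operation."""
    old, new = op
    if old is None and new is None:
        raise ValueError("an edit operation needs at least one edge")
    if old is None:
        return EDGE_INSERT[new]
    if new is None:
        return EDGE_DELETE[old]
    return EDGE_SUB[old][new]


def edit_path_cost(path: List[EdgeOp]) -> int:
    """The cost of an edit path is the sum of the costs of its operations."""
    return sum(edge_cost(op) for op in path)


if __name__ == "__main__":
    # Hypothetical path: substitute a single bond by a double bond,
    # insert a single bond, delete a triple bond.
    path = [("-", "="), (None, "-"), ("≡", None)]
    print(edit_path_cost(path))  # 3 + 0 + 1 = 4
```

The GED is then the minimum of this sum over all edit paths that transform one graph into the other, with the node costs added in the same way.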
Harper’s costs and learned values obtained per experiment.
| | | | | |
|---|---|---|---|---|
| Harper | 1 | 2 | 0 | 3 |
| | 0.000 | 0.013 | 0.004 | 0.017 |
| | 0.005 | 0.145 | 0.001 | 0.186 |
| | 0.014 | 0.333 | 0.003 | 0.206 |
| | 0.490 | 0.867 | 0.327 | 1.005 |
| | 0.012 | 0.104 | 0.003 | 0.024 |
| | 0.115 | 0.500 | 0.011 | 0.607 |