Francois Berenger, Arnout Voet, Xiao Yin Lee, Kam Yj Zhang.
Abstract
BACKGROUND: Measures of similarity for chemical molecules have been developed since the dawn of chemoinformatics. Molecular similarity has been measured by a variety of methods, including molecular descriptor-based similarity, common molecular fragments, graph matching and 3D methods such as shape matching. Similarity measures are widespread in practice and have proven to be useful in drug discovery. Because of our interest in electrostatics and high-throughput ligand-based virtual screening, we sought to exploit the information contained in the atomic coordinates and partial charges of a molecule.
Keywords: ACPC; Cross-correlation; Ligand-based virtual screening; Linear binning; Partial charges; RTI molecular descriptor; Spatial auto-correlation
Year: 2014 PMID: 24887178 PMCID: PMC4030740 DOI: 10.1186/1758-2946-6-23
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Figure 1. Overview of the ACPC method. It encodes a 3D molecule into a rotation-translation invariant molecular descriptor based on partial charges and inter-atomic distances. The descriptor consists of two vectors separating the positive from the negative autocorrelation values. These vectors are discretized by linear binning with discretization step dx to obtain the final pair of vectors (LBAC+ and LBAC-).
Figure 2. Autocorrelogram. Blue and red impulses: positive and negative values of the autocorrelation (AC±) of partial charges for the first ligand of the comt target. The ligand is depicted in 2D at the top right. The linearly binned versions (LBAC±) of the positive and negative parts of the autocorrelogram are the cyan and pink lines.
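The two captions above describe a pipeline that can be sketched in a few lines. The excerpt does not give the exact formulas, so the following is only a minimal sketch under stated assumptions: each atom pair contributes one impulse at its inter-atomic distance with value q_i * q_j, the sign split keeps strictly positive and strictly negative impulses in separate lists, and linear binning shares each impulse between its two nearest grid points in proportion to proximity. The function names, the toy molecule, and the choice to drop impulses that fall beyond the last bin are all hypothetical.

```python
import math
from itertools import combinations

def autocorrelation(atoms):
    """Return (distance, q_i * q_j) impulses for every unordered atom pair.

    atoms: list of (x, y, z, partial_charge) tuples (assumed input layout).
    """
    return [(math.dist(a[:3], b[:3]), a[3] * b[3])
            for a, b in combinations(atoms, 2)]

def sign_split(impulses):
    """Separate positive from negative autocorrelation values (AC+ and AC-)."""
    pos = [(d, v) for d, v in impulses if v > 0]
    neg = [(d, v) for d, v in impulses if v < 0]
    return pos, neg

def linear_binning(impulses, dx, n_bins):
    """Share each impulse between its two nearest grid points (spacing dx)."""
    bins = [0.0] * n_bins
    for d, v in impulses:
        pos = d / dx
        lo = math.floor(pos)
        frac = pos - lo
        # Assumption: weight falling past the last grid point is dropped.
        if lo < n_bins:
            bins[lo] += v * (1.0 - frac)
        if lo + 1 < n_bins:
            bins[lo + 1] += v * frac
    return bins

def descriptor(atoms, dx, n_bins):
    """Return the pair (LBAC+, LBAC-) for one molecule."""
    pos, neg = sign_split(autocorrelation(atoms))
    return linear_binning(pos, dx, n_bins), linear_binning(neg, dx, n_bins)

# Toy three-atom "molecule" with made-up coordinates and charges.
toy = [(0.0, 0.0, 0.0, 0.5), (1.0, 0.0, 0.0, -0.5), (0.0, 1.0, 0.0, 0.25)]
lbac_pos, lbac_neg = descriptor(toy, dx=0.5, n_bins=4)
```

Because the impulses depend only on pairwise distances and charge products, rotating or translating the molecule leaves the descriptor unchanged. Linear binning also varies continuously with distance, so a small coordinate change shifts weight gradually between neighbouring bins instead of jumping from one bin to the next.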
Effect of the dx parameter
| 0.005 | 0.73 | 0.75 |
| 0.01 | 0.69 | 0.71 |
| 0.05 | 0.68 | 0.70 |
| 0.1 | 0.67 | 0.70 |
| 0.5 | 0.66 | 0.66 |
Test protocol: 20 targets and five queries per target were randomly chosen on “1conf”.
Effect of the force field used to assign partial charges (OB stands for Open Babel)
| MOE’s MMFF94x | 0.77 | 0.84 |
| OB’s Gasteiger | 0.75 | 0.77 |
| OB’s MMFF94 | 0.72 | 0.75 |
| OB’s QEQ | 0.70 | 0.72 |
| OB’s QTPIE | 0.69 | 0.74 |
Test protocol: 20 targets and five queries per target were randomly chosen on “1conf”.
Effect of the sign_split function
| Average (AUCs) | 0.77 | 0.76 |
| Median (AUCs) | 0.83 | 0.75 |
Test protocol: 20 targets and five queries per target were randomly chosen on “1conf”.
Figure 3. 1st, 2nd and 3rd quartile plots for ACPC, MACCS, Pharao and Shape-it on all queries of the “1conf” dataset. For clarity, targets are sorted on the x axis by decreasing median value obtained by ACPC. The red dotted horizontal line at AUC = 0.5 indicates random performance.
Median AUCs on the “1conf” dataset
| ace (49/1753) | 0.47 | 0.55 | 0.48 | |
| ache (106/3711) | 0.56 | 0.67 | 0.63 | |
| ada (37/844) | 0.65 | 0.62 | ||
| alr2 (23/939) | 0.61 | 0.61 | 0.63 | |
| ampc (21/767) | 0.88 | 0.87 | 0.81 | |
| ar (74/2709) | 0.73 | 0.66 | 0.72 | |
| cdk2 (58/1866) | 0.62 | 0.55 | 0.60 | |
| comt (10/425) | 0.82 | 0.85 | 0.40 | |
| cox1 (25/885) | 0.29 | 0.42 | 0.46 | |
| cox2 (411/12281) | 0.92 | 0.85 | 0.90 | |
| dhfr (405/7418) | 0.87 | 0.95 | 0.62 | |
| egfr (458/14449) | 0.92 | 0.83 | 0.54 | |
| er+ (67/2387) | 0.76 | 0.77 | ||
| er- (39/1330) | 0.59 | 0.70 | 0.67 | |
| fgfr1 (120/4305) | 0.84 | 0.81 | 0.53 | |
| fxa (146/4969) | 0.79 | 0.85 | 0.72 | |
| gart (31/845) | 0.86 | 0.98 | 0.68 | |
| gpb (50/2072) | 0.95 | 0.89 | 0.62 | |
| gr (78/2803) | 0.75 | 0.77 | 0.69 | |
| hivpr (57/1797) | 0.70 | 0.72 | 0.60 | |
| hivrt (41/1433) | 0.51 | 0.47 | 0.65 | |
| hmga (35/1362) | 0.90 | 0.74 | 0.81 | |
| hsp90 (25/918) | 0.62 | 0.77 | 0.76 | |
| inha (79/3131) | 0.60 | 0.59 | 0.51 | |
| mr (14/561) | 0.77 | 0.71 | ||
| na (49/1826) | 0.68 | 0.91 | 0.59 | |
| p38 (366/8722) | 0.55 | 0.72 | 0.66 | |
| parp (35/1296) | 0.92 | 0.90 | 0.60 | |
| pde5 (76/1955) | 0.66 | 0.64 | 0.60 | |
| pdgfrb (169/5560) | 0.50 | 0.59 | 0.57 | |
| pnp (30/962) | 0.93 | 0.87 | 0.59 | |
| ppar | 0.91 | 0.88 | 0.83 | |
| pr (27/989) | 0.76 | 0.73 | 0.57 | |
| rxr | 0.85 | 0.99 | 0.62 | |
| sahh (32/1250) | 0.74 | 0.96 | 0.78 | |
| src (159/5904) | 0.80 | 0.74 | 0.53 | |
| thrombin (67/2308) | 0.79 | 0.77 | 0.69 | |
| tk (22/860) | 0.90 | 0.91 | 0.70 | |
| trypsin (46/1565) | 0.88 | 0.78 | 0.73 | |
| vegfr2 (77/2701) | 0.66 | 0.53 | 0.50 | |
| Average | 0.81 | 0.77 | 0.79 | 0.65 |
| Median | 0.86 | 0.77 | 0.82 | 0.63 |
| |Best method| | 20 | 7 | 13 | 3 |
For each of the 40 targets, each ligand in the ligands list was used in turn as the query. On each line, the maximum value is underlined and in bold font. L = number of ligands; D = number of decoys.
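The protocol in the note above (every ligand used in turn as the query, one AUC per query, then the median per target) can be sketched as follows. This is a hedged illustration, not the authors' evaluation code: `roc_auc` uses the standard pairwise (Mann-Whitney) formulation of the area under the ROC curve, while `median_auc_for_target`, the `similarity` callback, and the decision to remove the query from the ranked list are assumptions made for the example.

```python
from statistics import median

def roc_auc(active_scores, decoy_scores):
    """AUC as the fraction of (active, decoy) pairs ranked correctly.

    Ties count for one half, as in the Mann-Whitney U statistic.
    """
    wins = sum((a > d) + 0.5 * (a == d)
               for a in active_scores for d in decoy_scores)
    return wins / (len(active_scores) * len(decoy_scores))

def median_auc_for_target(ligands, decoys, similarity):
    """Use each ligand in turn as the query and return the median AUC.

    similarity(query, molecule) is a hypothetical scoring function:
    a higher score means the molecule is more similar to the query.
    """
    aucs = []
    for i, query in enumerate(ligands):
        # Assumption: the query itself is left out of the ranked actives.
        others = ligands[:i] + ligands[i + 1:]
        active_scores = [similarity(query, m) for m in others]
        decoy_scores = [similarity(query, m) for m in decoys]
        aucs.append(roc_auc(active_scores, decoy_scores))
    return median(aucs)

# Toy check with numbers standing in for molecules.
toy_auc = roc_auc([0.9, 0.3], [0.8, 0.1])
toy_median = median_auc_for_target(
    [1.0, 1.2, 5.0], [3.0, 4.0], lambda q, m: -abs(q - m))
```

An AUC of 0.5 corresponds to random ranking, which is why the figures mark that value as the baseline.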
Figure 4. Cumulative distribution functions for Shape-it, Pharao, MACCS and ACPC on all queries of the “1conf” dataset.
Average AUCs on the “25conf” dataset
| ace (1140/43653) | 0.49 | 0.48 | 0.45 | 0.58 | 0.68 | 0.57 | 0.34 | 0.44 | 0.60 | 0.62 | 0.61 | 0.57 | 0.49 | 0.60 | ||||
| ache (2450/89138) | 0.80 | 0.80 | 0.80 | 0.69 | 0.72 | 0.69 | 0.48 | 0.28 | 0.81 | 0.66 | 0.75 | 0.60 | 0.74 | |||||
| ada (876/18697) | 0.82 | 0.81 | 0.78 | 0.66 | 0.80 | 0.64 | 0.71 | 0.42 | 0.81 | 0.75 | 0.74 | 0.84 | 0.82 | 0.85 | ||||
| alr2 (348/16063) | 0.60 | 0.59 | 0.47 | 0.42 | 0.52 | 0.35 | 0.33 | 0.54 | 0.66 | 0.64 | 0.32 | 0.24 | 0.29 | 0.44 | ||||
| ampc (507/15069) | 0.84 | 0.83 | 0.89 | 0.86 | 0.88 | 0.71 | 0.82 | 0.59 | 0.87 | 0.87 | 0.75 | 0.83 | 0.84 | 0.75 | ||||
| ar (363/42557) | 0.75 | 0.73 | 0.72 | 0.67 | 0.80 | 0.75 | 0.68 | 0.44 | 0.71 | 0.76 | 0.71 | 0.66 | 0.80 | 0.67 | ||||
| cdk2 (1138/43682) | 0.88 | 0.88 | 0.62 | 0.49 | 0.54 | 0.42 | 0.50 | 0.66 | 0.68 | 0.67 | 0.34 | 0.41 | 0.34 | 0.46 | ||||
| comt (124/7148) | 0.83 | 0.81 | 0.80 | 0.77 | 0.65 | 0.38 | 0.34 | 0.76 | 0.62 | 0.75 | 0.43 | 0.46 | 0.46 | 0.53 | ||||
| cox1 (422/15672) | 0.30 | 0.29 | 0.27 | 0.55 | 0.51 | 0.46 | 0.47 | 0.38 | 0.50 | 0.40 | 0.48 | 0.49 | 0.61 | 0.50 | ||||
| cox2 (6483/270311) | 0.94 | 0.94 | 0.46 | 0.15 | 0.16 | 0.62 | 0.52 | 0.22 | 0.26 | 0.33 | 0.49 | 0.50 | 0.65 | 0.74 | ||||
| dhfr (9550/166944) | 1.00 | 1.00 | 0.94 | 0.96 | 0.56 | 0.63 | 0.46 | 0.98 | 0.93 | 0.98 | 0.99 | 0.98 | 0.91 | |||||
| egfr (9964/337283) | 0.91 | 0.89 | 0.90 | 0.78 | 0.57 | 0.49 | 0.25 | 0.89 | 0.69 | 0.89 | 0.50 | 0.66 | 0.46 | 0.53 | ||||
| er+ (289/39507) | 0.92 | 0.91 | 0.94 | 0.81 | 0.95 | 0.63 | 0.26 | 0.38 | 0.76 | 0.89 | 0.68 | |||||||
| er- (975/32961) | 0.62 | 0.60 | 0.71 | 0.75 | 0.89 | 0.66 | 0.29 | 0.40 | 0.83 | 0.84 | 0.69 | 0.92 | 0.92 | |||||
| fgfr1 (2360/106612) | 0.93 | 0.94 | 0.54 | 0.55 | 0.63 | 0.61 | 0.41 | 0.79 | 0.59 | 0.49 | 0.37 | 0.49 | 0.36 | 0.50 | ||||
| fxa (3647/123379) | 0.89 | 0.89 | 0.61 | 0.48 | 0.40 | 0.39 | 0.42 | 0.42 | 0.40 | 0.44 | 0.24 | 0.42 | 0.29 | 0.30 | ||||
| gart (775/20938) | 1.00 | 1.00 | 0.94 | 0.98 | 0.69 | 0.89 | 0.89 | 0.97 | 0.97 | 0.98 | 0.98 | 0.98 | 0.96 | 0.98 | ||||
| gpb (845/44604) | 0.97 | 0.96 | 0.23 | 0.68 | 0.42 | 0.24 | 0.29 | 0.34 | 0.16 | 0.32 | 0.18 | 0.20 | 0.17 | 0.16 | ||||
| gr (553/56086) | 0.73 | 0.72 | 0.75 | 0.75 | 0.55 | 0.50 | 0.48 | 0.79 | 0.82 | 0.85 | 0.64 | 0.67 | 0.62 | 0.60 | ||||
| hivpr (1404/44909) | 0.72 | 0.72 | 0.79 | 0.73 | 0.47 | 0.74 | 0.63 | 0.56 | 0.44 | 0.42 | 0.53 | 0.92 | 0.90 | 0.92 | ||||
| hivrt (822/32688) | 0.86 | 0.86 | 0.47 | 0.43 | 0.28 | 0.55 | 0.47 | 0.47 | 0.57 | 0.51 | 0.31 | 0.39 | 0.39 | 0.34 | ||||
| hmga (814/33684) | 0.89 | 0.89 | 0.87 | 0.83 | 0.94 | 0.85 | 0.57 | 0.53 | 0.94 | 0.93 | 0.94 | 0.98 | 0.97 | 0.98 | ||||
| hsp90 (572/20983) | 0.58 | 0.56 | 0.67 | 0.81 | 0.70 | 0.58 | 0.84 | 0.59 | 0.89 | 0.79 | 0.52 | 0.66 | 0.58 | 0.61 | ||||
| inha (1782/71064) | 0.58 | 0.58 | 0.59 | 0.48 | 0.48 | 0.24 | 0.30 | 0.33 | 0.35 | 0.36 | 0.41 | 0.66 | 0.49 | 0.71 | ||||
| mr (79/10177) | 0.73 | 0.70 | 0.54 | 0.18 | 0.69 | 0.72 | 0.60 | 0.75 | 0.65 | 0.48 | 0.49 | 0.43 | 0.46 | 0.58 | ||||
| na (987/44278) | 0.92 | 0.92 | 0.79 | 0.91 | 0.39 | 0.37 | 0.33 | 0.87 | 0.79 | 0.84 | 0.72 | 0.88 | 0.74 | 0.72 | ||||
| p38 (7014/186992) | 0.54 | 0.51 | 0.57 | 0.76 | 0.69 | 0.62 | 0.38 | 0.87 | 0.82 | 0.88 | 0.48 | 0.60 | 0.38 | 0.59 | ||||
| parp (245/12871) | 0.97 | 0.97 | 0.93 | 0.93 | 0.73 | 0.55 | 0.53 | 0.97 | 0.93 | 0.96 | 0.83 | 0.89 | 0.74 | 0.89 | ||||
| pde5 (1751/48431) | 0.66 | 0.65 | 0.67 | 0.68 | 0.69 | 0.39 | 0.44 | 0.67 | 0.62 | 0.67 | 0.31 | 0.35 | 0.36 | 0.34 | ||||
| pdgfrb (3206/134591) | 0.51 | 0.52 | 0.49 | 0.46 | 0.49 | 0.61 | 0.57 | 0.49 | 0.44 | 0.42 | 0.34 | 0.46 | 0.36 | 0.50 | ||||
| pnp (560/16694) | 0.99 | 0.98 | 0.95 | 0.90 | 0.63 | 0.36 | 0.52 | 0.90 | 0.76 | 0.91 | 0.82 | 0.86 | 0.87 | 0.91 | ||||
| ppar | 0.91 | 0.90 | 0.88 | 0.89 | 0.89 | 0.70 | 0.21 | 0.25 | 0.92 | 0.92 | 0.86 | 0.76 | 0.86 | |||||
| pr (178/15966) | 0.70 | 0.70 | 0.37 | 0.37 | 0.48 | 0.67 | 0.30 | 0.19 | 0.73 | 0.56 | 0.61 | 0.66 | 0.64 | 0.63 | ||||
| rxr | 0.77 | 0.77 | 0.86 | 0.99 | 0.70 | 0.53 | 0.49 | 0.95 | 0.97 | 0.99 | 0.94 | 0.95 | 0.99 | 0.90 | ||||
| sahh (586/25622) | 0.98 | 0.98 | 0.96 | 0.57 | 0.71 | 0.78 | 0.65 | 0.96 | 0.93 | 0.95 | 0.89 | 0.95 | 0.90 | 0.96 | ||||
| src (2945/145751) | 0.90 | 0.90 | 0.51 | 0.47 | 0.65 | 0.69 | 0.53 | 0.60 | 0.43 | 0.36 | 0.30 | 0.41 | 0.26 | 0.40 | ||||
| thrombin (1576/57564) | 0.87 | 0.88 | 0.74 | 0.51 | 0.65 | 0.54 | 0.58 | 0.65 | 0.48 | 0.36 | 0.71 | 0.68 | 0.62 | 0.66 | ||||
| tk (379/15017) | 0.98 | 0.98 | 0.84 | 0.92 | 0.54 | 0.47 | 0.37 | 0.89 | 0.86 | 0.87 | 0.86 | 0.86 | 0.85 | 0.89 | ||||
| trypsin (1128/39065) | 0.90 | 0.90 | 0.64 | 0.29 | 0.45 | 0.46 | 0.28 | 0.30 | 0.29 | 0.41 | 0.72 | 0.80 | 0.69 | 0.81 | ||||
| vegfr2 (1604/66098) | 0.79 | 0.78 | 0.47 | 0.42 | 0.44 | 0.32 | 0.24 | 0.57 | 0.54 | 0.48 | 0.37 | 0.41 | 0.33 | 0.38 | ||||
| Average | 0.80 | 0.79 | 0.78 | 0.68 | 0.70 | 0.59 | 0.51 | 0.44 | 0.72 | 0.68 | 0.68 | 0.62 | 0.67 | 0.62 | 0.67 | |||
| Median | 0.85 | 0.84 | 0.81 | 0.71 | 0.74 | 0.63 | 0.49 | 0.44 | 0.76 | 0.72 | 0.69 | 0.62 | 0.66 | 0.62 | 0.67 | |||
| |Best method| | N/A | N/A | 17 | 5 | 3 | 0 | 1 | 0 | 3 | 3 | 2 | 4 | 4 | 2 | 3 |
For each of the 40 targets, the query was the last ligand in the ligands list of that target. For each target, the maximum AUC reached is underlined and in bold font. L = number of ligands; D = number of decoys. The ACPC1 and ACPC25 columns were computed and added for comparison. ACPC1 (resp. ACPC25) shows the average AUC reached when using all active ligands as queries for ACPC on “1conf” (resp. “25conf”). Their |Best method| cells were not filled in because they come from different experiments than the other columns.
Figure 5. Average processing speed of ACPC, Pharao, Open Babel (OB) and Shape-it on the target with the most ligands and decoys (egfr).
Average number of distinct clusters found for active molecules among the top N ranked molecules for cox2 and egfr in “1conf” using 10 random queries
| 10 | 2.3 | 2.2 | 2.8 | 2.3 | 0.9 | 2.4 | 1.4 | 2.2 |
| 20 | 2.7 | 3.3 | 3.9 | 3.5 | 1.1 | 3.0 | 2.4 | 3.1 |
| 30 | 3.1 | 4.4 | 4.5 | 4.6 | 1.3 | 4.4 | 2.9 | 3.6 |
| 40 | 3.4 | 4.9 | 5.1 | 5.4 | 1.5 | 5.2 | 3.5 | 3.9 |
| 50 | 3.5 | 5.8 | 5.4 | 6.3 | 2.0 | 5.8 | 4.7 | 4.1 |
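The quantity tabulated above (the number of distinct clusters of actives recovered in the top N, averaged over queries) can be computed as in the sketch below. The excerpt does not say how the actives were clustered, so cluster labels are taken as a given input; `cluster_of_active`, the function names, and the toy rankings are hypothetical.

```python
def distinct_active_clusters(ranked_ids, cluster_of_active, top_n):
    """Count distinct clusters among the actives found in the top_n molecules.

    ranked_ids: molecule identifiers sorted by decreasing similarity.
    cluster_of_active: dict mapping each active's id to a cluster label;
    decoys are simply absent from the dict (assumed convention).
    """
    top = ranked_ids[:top_n]
    return len({cluster_of_active[m] for m in top if m in cluster_of_active})

def average_over_queries(rankings, cluster_of_active, top_n):
    """Average the distinct-cluster count over one ranking per query."""
    counts = [distinct_active_clusters(r, cluster_of_active, top_n)
              for r in rankings]
    return sum(counts) / len(counts)

# Toy example: three actives in two clusters, two decoys, two queries.
clusters = {"a1": 0, "a2": 0, "a3": 1}
rankings = [["a1", "d1", "a2", "a3"], ["d1", "d2", "a3", "a1"]]
avg_top3 = average_over_queries(rankings, clusters, top_n=3)
avg_top4 = average_over_queries(rankings, clusters, top_n=4)
```

A method that retrieves many actives from a single cluster scores low on this measure, so it complements AUC by rewarding chemically diverse hits.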