Abstract
The Superposing Significant Interaction Rules (SSIR) method is described. It is a general combinatorial and symbolic procedure able to rank compounds belonging to combinatorial analogue series. The procedure generates structure-activity relationship (SAR) models and also serves as an inverse SAR tool. The method is fast and can deal with large databases. SSIR operates from statistical significances calculated from the available library of compounds and according to the previously attached molecular labels of interest or non-interest. The required symbolic codification allows dealing with almost any combinatorial data set, even in a confidential manner, if desired. The application example categorizes molecules as binding or non-binding, and consensus ranking SAR models are generated from training and two distinct cross-validation methods: leave-one-out and balanced leave-two-out (BL2O), the latter being suited for the treatment of binary properties.
Keywords: SAR; SSIR method; analogue series; balanced leave-two-out (BL2O) cross-validation; inverse SAR; ranking
Year: 2016 PMID: 27240346 PMCID: PMC4926361 DOI: 10.3390/ijms17060827
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Molecular substitution codifications. Note that each letter represents a distinct substituent depending on the substitution site.
| Code | R1 | R2 | R3 | R4 |
|---|---|---|---|---|
| A | 4-Methyl-1-cyclohexyl-methyl | | | |
| B | Cyclohexylpropyl | | | |
| C | Cyclohexylmethyl | | | |
| D | Cyclopentylmethyl | | | |
| E | Cycloheptylmethyl | | | |
| F | Cyclobutylmethyl | | | |
| G | 3-Methylpentyl | | | |
| H | 2-Biphenyl-4-yl-ethyl | | | |
| I | 4- | | | |
| J | 2-(3-Methoxyphenyl)-ethyl | | | |
| K | 2-(4-Isobutylphenyl)-propyl | | | |
| L | | | | |
| M | | | | |
| N | 2-(4-Methoxyphenyl)-ethyl | | | |
| O | 2-(4-Ethoxyphenyl)-ethyl | | | |
| P | Phenethyl | | | |
| Q | 3-(3,4-Dimethoxyphenyl)-propyl | | | |
Codified analogues and original binding affinities (nM) for FPR1 (pKi1) and FPR2 (pKi2) properties. Values greater than 10,000 were set to pK = 4. Compounds declared of interest before application of the SSIR method are specified with asterisks.
| Item No. | Analogue | pKi1 (FPR1) a | pKi2 (FPR2) b |
|---|---|---|---|
| 1 | AAAA | 4.000 | 4.000 |
| 2 | AABB | 4.000 | 4.000 |
| 3 | AACA | 3.610 | 4.000 |
| 4 | ABAC | 4.000 | 4.000 |
| 5 | ABAA | 4.000 | 4.000 |
| 6 | ABBC | 4.000 | 4.000 |
| 7 | ABBB | 4.000 | 4.000 |
| 8 | ABCC | 4.000 | 4.000 |
| 9 | BBCA | 4.000 | 2.130 * |
| 10 | CBDD | 4.000 | 0.954 * |
| 11 | CBCE | 3.426 | 1.079 * |
| 12 | CBCF | 3.158 | 0.778 * |
| 13 | CCBG | 4.000 | 2.703 |
| 14 | CCBD | 4.000 | 2.262 * |
| 15 | CCBC | 4.000 | 1.839 * |
| 16 | CAAC | 2.877 | 1.000 * |
| 17 | CAAH | 2.550 * | 1.322 * |
| 18 | CABC | 3.527 | 0.778 * |
| 19 | CABH | 4.000 | 1.176 * |
| 20 | CACC | 2.978 | 0.699 * |
| 21 | CACH | 4.000 | 1.491 * |
| 22 | CDAC | 3.022 | 3.176 |
| 23 | CDAH | 2.519 * | 2.360 * |
| 24 | CDBC | 2.858 | 3.380 |
| 25 | CDBH | 1.663 * | 1.845 * |
| 26 | CDCC | 2.880 | 2.780 |
| 27 | CDCH | 2.446 * | 1.763 * |
| 28 | CBAC | 2.877 | 0.903 * |
| 29 | CBAH | 4.000 | 1.708 * |
| 30 | CBBC | 4.000 | 0.000 * |
| 31 | CBBH | 4.000 | 1.322 * |
| 32 | CBCC | 2.415 * | 0.000 * |
| 33 | CBCH | 2.822 | 1.041 * |
| 34 | BAAC | 2.585 * | 1.991 * |
| 35 | BAAH | 3.050 | 2.243 * |
| 36 | BABC | 2.639 | 2.021 * |
| 37 | BABH | 3.253 | 1.991 * |
| 38 | BACC | 3.126 | 2.212 * |
| 39 | BACH | 2.943 | 2.423 * |
| 40 | BDAC | 2.358 * | 4.000 |
| 41 | BDAH | 1.799 * | 2.772 |
| 42 | BDBC | 1.954 * | 4.000 |
| 43 | BDBH | 0.301 * | 2.210 * |
| 44 | BDCC | 2.985 | 3.778 |
| 45 | BDCH | 2.675 | 2.613 * |
| 46 | BBAC | 3.138 | 1.869 * |
| 47 | BBAH | 4.000 | 2.709 |
| 48 | BBBC | 3.366 | 1.519 * |
| 49 | BBBH | 4.000 | 2.648 |
| 50 | BBCC | 3.543 | 1.643 * |
| 51 | BBCH | 4.000 | 2.657 |
| 52 | DDBH | 0.301 * | 3.121 |
| 53 | EEEA | 3.472 | 4.000 |
| 54 | DEFA | 4.000 | 4.000 |
| 55 | DEGA | 4.000 | 4.000 |
| 56 | DDGA | 2.614 * | 3.930 |
| 57 | DEFC | 4.000 | 4.000 |
| 58 | DEFI | 3.266 | 4.000 |
| 59 | DDGF | 2.888 | 4.000 |
| 60 | DFGC | 3.368 | 4.000 |
| 61 | BEEC | 3.291 | 4.000 |
| 62 | BEEA | 3.349 | 4.000 |
| 63 | BEEH | 3.102 | 3.580 |
| 64 | DEEC | 3.740 | 4.000 |
| 65 | DEEA | 3.504 | 4.000 |
| 66 | DEEH | 3.177 | 4.000 |
| 67 | DEGH | 2.901 | 4.000 |
| 68 | BGEC | 2.941 | 4.000 |
| 69 | BGEA | 2.766 | 4.000 |
| 70 | BGEH | 2.083 * | 4.000 |
| 71 | BGGC | 2.748 | 4.000 |
| 72 | BGGA | 2.613 * | 4.000 |
| 73 | BGGH | 3.305 | 4.000 |
| 74 | BHEC | 3.788 | 3.513 |
| 75 | BHEA | 3.561 | 3.768 |
| 76 | BHEH | 2.822 | 4.000 |
| 77 | BHGC | 2.161 * | 4.000 |
| 78 | BHGA | 2.666 | 4.000 |
| 79 | BHGH | 3.054 | 4.000 |
| 80 | DGEC | 2.672 | 4.000 |
| 81 | DGEH | 1.716 * | 4.000 |
| 82 | DGGC | 2.574 * | 4.000 |
| 83 | DGGA | 2.336 * | 4.000 |
| 84 | DGGH | 2.775 | 4.000 |
| 85 | DHEC | 4.000 | 3.410 |
| 86 | DHEA | 3.226 | 4.000 |
| 87 | DHEH | 2.772 | 4.000 |
| 88 | DHGC | 2.238 * | 4.000 |
| 89 | DHGA | 2.708 | 4.000 |
| 90 | DHGH | 2.831 | 4.000 |
| 91 | EGEJ | 1.176 * | 4.000 |
| 92 | EGEK | 0.845 * | 4.000 |
| 93 | EGEH | 1.079 * | 4.000 |
| 94 | DGFJ | 1.924 * | 3.587 |
| 95 | DGEL | 1.301 * | 4.000 |
| 96 | DGEM | 1.568 * | 4.000 |
| 97 | DGEJ | 1.204 * | 4.000 |
| 98 | DGEN | 1.146 * | 4.000 |
| 99 | DGEO | 1.863 * | 4.000 |
| 100 | EGEP | 0.477 * | 4.000 |
| 101 | EGHQ | 2.691 | 4.000 |
| 102 | EGIP | 0.954 * | 4.000 |
| 103 | EGFP | 1.886 * | 4.000 |
| 104 | DDBA | 4.000 | 4.000 |
| 105 | DDBC | 1.447 * | 4.000 |
| 106 | BDBA | 4.000 | 4.000 |
a The 32 compounds of interest for FPR1 (Ki1 ≤ 411 nM) are marked with an asterisk. b The 32 compounds of interest for FPR2 (Ki2 ≤ 410 nM) are marked with an asterisk.
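
The labelling step described in the caption and footnotes can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the tabulated values are log10 of Ki in nM, which is consistent with the cap at 4 (log10 10,000) and with the marked value 2.614 (log10 411 ≈ 2.614). The names `log_ki` and `of_interest` are hypothetical.

```python
import math

CAP_NM = 10_000           # values above this are reported as 4 in the table
FPR1_THRESHOLD_NM = 411   # cut-off quoted in footnote a
FPR2_THRESHOLD_NM = 410   # cut-off quoted in footnote b


def log_ki(ki_nm: float) -> float:
    """Assumed transform: log10 of Ki in nM, capped at log10(10,000) = 4."""
    return math.log10(min(ki_nm, CAP_NM))


def of_interest(log_value: float, threshold_nm: float) -> bool:
    """True when a tabulated (3-decimal) value falls at or below the cut-off."""
    return log_value <= round(math.log10(threshold_nm), 3)


# Tabulated FPR1 values for items 56, 36 and 1 of the table above.
for item, value in [(56, 2.614), (36, 2.639), (1, 4.000)]:
    print(item, of_interest(value, FPR1_THRESHOLD_NM))
```

Rounding the threshold to three decimals mirrors the precision of the table, so borderline entries such as 2.614 are classified the same way as in the asterisk column.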
Figure 1. Distribution of p-values for all the definable rules of order 4 (negations allowed) for (a) FPR1; and (b) FPR2 properties. Note the logarithmic scale on both axes.
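
This record does not state which statistical test produces the rule p-values. Purely as an illustration, the sketch below assumes a one-sided hypergeometric tail: the probability that a rule covering `n` compounds would include at least `k` compounds of interest by chance, given `K` compounds of interest in a library of `N`. The function name and the choice of test are assumptions, not the published SSIR definition.

```python
from math import comb


def hypergeom_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n) (assumed significance test).

    N: library size, K: compounds of interest,
    n: compounds covered by the rule, k: covered compounds that are of interest.
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical rule covering 10 of the 106 compounds, all 10 of interest
# (32 compounds of interest in total, as in the table above).
p = hypergeom_tail(N=106, K=32, n=10, k=10)
print(p < 0.005)  # such a rule would pass the 0.005 threshold used in the article
```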
Area under the receiver operating characteristic (AU-ROC) values for several calculations for properties FPR1 and FPR2. The threshold p value was set to 0.005 and negation terms were allowed in rules. The number of accepted rules along the loops is given in brackets. For the balanced leave-two-out (BL2O) cross-validation process, the number of well classified pairs, ties and bad pair rankings encountered along the cycles are indicated between slashes. See text for more details.
| Property | Rule Order | Overall Fit | L1O | BL2O |
|---|---|---|---|---|
| FPR1 | 1 | 0.768 (4) | 0.761 (6) | 0.607 (6) 1408/783/177 |
| FPR1 | 2 | 0.894 (117) | 0.792 (171) | 0.788 (174) 1917/96/355 |
| FPR1 | 3 | 0.890 (960) | 0.802 (1379) | 0.777 (1433) 1909/2/457 |
| FPR2 | 1 | 0.934 (16) | 0.933 (18) | 0.909 (18) 2106/199/63 |
| FPR2 | 2 | 0.958 (447) | 0.947 (478) | 0.948 (485) 2254/2/112 |
| FPR2 | 3 | 0.967 (3428) | 0.950 (3756) | 0.947 (3811) 2253/0/115 |
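
Each slash-separated triplet in the BL2O column sums to 2368, which equals 32 × 74, the number of pairs formed by one compound of interest and one compound of non-interest among the 106 analogues. The sketch below illustrates that pair bookkeeping. It assumes that a pair counts as well classified when the compound of interest receives the higher score; unlike the real procedure, it uses fixed scores instead of rebuilding the model for every left-out pair. All function names are hypothetical.

```python
from itertools import product


def balanced_pairs(labels):
    """All (i, j) index pairs with i of interest and j of non-interest."""
    pos = [i for i, flag in enumerate(labels) if flag]
    neg = [i for i, flag in enumerate(labels) if not flag]
    return list(product(pos, neg))


def tally_pairs(scores, labels):
    """Count well ranked pairs, ties and badly ranked pairs."""
    counts = {"good": 0, "tie": 0, "bad": 0}
    for i, j in balanced_pairs(labels):
        if scores[i] > scores[j]:
            counts["good"] += 1
        elif scores[i] == scores[j]:
            counts["tie"] += 1
        else:
            counts["bad"] += 1
    return counts


# 32 compounds of interest and 74 of non-interest, as in the data set.
n_pairs = len(balanced_pairs([True] * 32 + [False] * 74))
print(n_pairs)  # 2368

# Tiny made-up example: two compounds of interest, two of non-interest.
toy = tally_pairs(scores=[3, 1, 1, 0], labels=[True, True, False, False])
print(toy)
```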
Figure 2. Receiver operating characteristic (ROC) curve and the area under it (AU-ROC) value for the FPR2 property calculated with the balanced leave-two-out (BL2O) cross-validation procedure (SSIR model involves rules of order 2, p = 0.005).
List of the 26 most significant rules (p < 10^−5.5) of order 2 for the FPR1 property. The vertical bar stands for the negation operator. Each point stands for the X wildcard.
| Rule # | Vote | R1 | R2 | R3 | R4 |
|---|---|---|---|---|---|
| 1 | +1 | . | G | . | \|C |
| 2 | +1 | \|B | G | . | . |
| 3 | +1 | . | G | . | \|Q |
| 4 | +1 | . | G | \|H | . |
| 5 | −1 | . | \|G | . | \|K |
| 6 | +1 | . | G | . | \|D |
| 7 | −1 | . | \|G | \|I | . |
| 8 | +1 | . | G | \|C | . |
| 9 | +1 | . | G | \|A | . |
| 10 | +1 | . | G | . | \|F |
| 11 | +1 | . | G | . | \|B |
| 12 | +1 | . | G | \|D | . |
| 13 | +1 | . | G | \|B | . |
| 14 | −1 | . | \|G | . | \|J |
| 15 | +1 | \|A | G | . | . |
| 16 | −1 | . | \|G | . | \|P |
| 17 | +1 | \|C | G | . | . |
| 18 | −1 | . | \|G | . | \|N |
| 19 | −1 | . | \|G | . | \|L |
| 20 | −1 | . | \|G | . | \|Q |
| 21 | +1 | . | G | . | \|G |
| 22 | +1 | . | G | . | \|I |
| 23 | −1 | . | \|G | . | \|M |
| 24 | +1 | . | G | . | \|E |
| 25 | −1 | . | \|G | \|H | . |
| 26 | −1 | . | \|G | . | \|O |
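
The rule notation in the table above (a letter, a point as wildcard, a vertical bar for negation) can be applied to a four-letter analogue code as in the following sketch. It assumes that a compound's score is the sum of the votes of the rules it satisfies; that superposition scheme and the helper names are illustrative assumptions and not taken from the article text shown here.

```python
def parse_rule(text):
    """Turn '. G . |C' into a list of (site_index, letter, negated) conditions."""
    conditions = []
    for site, token in enumerate(text.split()):
        if token == ".":          # wildcard: no constraint at this site
            continue
        negated = token.startswith("|")
        conditions.append((site, token.lstrip("|"), negated))
    return conditions


def matches(conditions, analogue):
    """True when the analogue satisfies every (possibly negated) condition."""
    return all((analogue[site] == letter) != negated
               for site, letter, negated in conditions)


def vote_score(analogue, voted_rules):
    """Sum of votes over the rules the analogue satisfies (assumed scheme)."""
    return sum(vote for vote, conditions in voted_rules
               if matches(conditions, analogue))


# Rules 1 and 5 from the FPR1 table above.
RULES = [(+1, parse_rule(". G . |C")), (-1, parse_rule(". |G . |K"))]
scores = {a: vote_score(a, RULES) for a in ("EGEP", "BGEC", "AAAA")}
print(scores)
```

With these two rules only, EGEP satisfies rule 1, AAAA satisfies rule 5, and BGEC satisfies neither because its R4 substituent is C.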
List of the 31 most significant rules (p < 10^−9.2) of order 2 for the FPR2 property. The vertical bar stands for the negation operator. The points stand for the X wildcard.
| Rule # | Vote | R1 | R2 | R3 | R4 |
|---|---|---|---|---|---|
| 1 | −1 | \|C | \|A | . | . |
| 2 | −1 | \|C | . | \|C | . |
| 3 | +1 | C | . | . | \|G |
| 4 | +1 | \|D | . | \|E | . |
| 5 | +1 | \|D | \|G | . | . |
| 6 | +1 | C | \|D | . | . |
| 7 | +1 | C | . | . | \|M |
| 8 | +1 | C | . | . | \|N |
| 9 | +1 | C | . | . | \|O |
| 10 | +1 | C | \|G | . | . |
| 11 | −1 | \|C | . | . | \|D |
| 12 | +1 | C | . | \|E | . |
| 13 | +1 | C | . | . | \|P |
| 14 | +1 | C | \|H | . | . |
| 15 | −1 | \|C | . | . | \|E |
| 16 | +1 | C | . | \|G | . |
| 17 | +1 | C | . | . | \|A |
| 18 | +1 | C | . | . | \|Q |
| 19 | −1 | \|C | . | . | \|G |
| 20 | +1 | C | . | \|H | . |
| 21 | +1 | C | . | \|F | . |
| 22 | +1 | C | . | . | \|B |
| 23 | −1 | \|C | \|C | . | . |
| 24 | +1 | C | . | \|I | . |
| 25 | +1 | C | . | . | \|J |
| 26 | +1 | C | \|F | . | . |
| 27 | +1 | C | . | . | \|K |
| 28 | +1 | C | . | . | \|I |
| 29 | +1 | C | . | . | \|L |
| 30 | +1 | C | \|E | . | . |
| 31 | −1 | \|C | . | \|D | . |
Figure 3. Randomization test leave-one-out (L1O) predictions of AU-ROC values that could be obtained for the (a) FPR1; and (b) FPR2 properties after 1000 cycles. Horizontal axes (logarithmic units) show the number of rules entering in each SSIR model.
Figure 4. Representation of the toy model of molecular scaffolding having three substitution sites that admit 2, 3 and 4 residues, respectively.
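
A toy scaffold of this kind spans 2 × 3 × 4 = 24 analogues. The sketch below enumerates them as symbolic codes, using the convention from the codification table that letters are assigned independently at each substitution site. The function name `enumerate_library` is hypothetical.

```python
from itertools import product
import string


def enumerate_library(residues_per_site):
    """List every analogue code for a scaffold with the given residue counts.

    Each site gets its own alphabet (A, B, ...), so the same letter at
    different sites may stand for different substituents.
    """
    alphabets = [string.ascii_uppercase[:n] for n in residues_per_site]
    return ["".join(code) for code in product(*alphabets)]


library = enumerate_library([2, 3, 4])
print(len(library))             # 24
print(library[0], library[-1])  # AAA BCD
```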