Chuleeporn Phanus-Umporn1, Watshara Shoombuatong1, Veda Prachayasittikul1, Nuttapat Anuwongcharoen1, Chanin Nantasenamat1.
Abstract
Sickle cell disease (SCD), an autosomal recessive genetic disorder, has been recognized by the World Health Organization (WHO) as a major public health problem as it affects 300 000 individuals worldwide. Complications arising from SCD include anemia, microvascular occlusion, severe pain, strokes, renal dysfunction and infections. A promising therapeutic strategy is to employ anti-sickling agents that can disrupt the formation of the HbS polymer. This study therefore employed cheminformatic approaches, encompassing classification structure-activity relationship (CSAR) modeling followed by substructure analysis, to deduce the privileged substructures giving rise to the anti-sickling activity of an investigated set of 115 compounds. Briefly, the compiled compounds were described by fingerprint descriptors and used in the construction of CSAR models via several machine learning algorithms. The modelability of the data set, as measured by the MODI index, was in the range of 0.70-0.84. The predictive performance, assessed by accuracy, sensitivity, specificity and the Matthews correlation coefficient (MCC), was found to be statistically robust: the first three parameters afforded values in excess of 0.7, while the MCC was greater than 0.5. An analysis of the top 20 important substructure descriptors for anti-sickling activity revealed that 10 features were significant in the differentiation of actives from inactives, as illustrated by aromaticity/conjugation (e.g. SubFPC287, SubFPC171 and SubFPC5), carbonyl groups (e.g. SubFPC137, SubFPC139, SubFPC49 and SubFPC135) and miscellaneous groups (e.g. SubFPC303, SubFPC302 and SubFPC275). Furthermore, an analysis of the structure-activity relationship revealed that the length of alkyl chains, the choice of functional moiety and the position of substitution on the benzene ring may affect the anti-sickling activity of these compounds. Thus, this knowledge is anticipated to be useful for guiding the design of robust compounds against the gelling activity of HbS, as preliminarily demonstrated in the data-driven compound design presented herein.
This journal is © The Royal Society of Chemistry.
Year: 2018 PMID: 35539618 PMCID: PMC9078244 DOI: 10.1039/c7ra12079f
Source DB: PubMed Journal: RSC Adv ISSN: 2046-2069 Impact factor: 4.036
Fig. 1 Cartoon illustration of the mechanism of action of an anti-sickling agent in the disruption of the HbS polymer.
Fig. 2 Workflow of CSAR modeling for investigating anti-sickling activity.
Summary of 12 sets of fingerprint descriptors
| Fingerprint | Number of descriptors | Description |
|---|---|---|
| CDK | 1024 | Fingerprint with a length of 1024 and a search depth of 8 |
| CDK extended | 1024 | Extends CDK with additional bits describing ring features |
| CDK graph only | 1024 | Special version of CDK that does not account for bond orders |
| E-state | 79 | Electrotopological state for the electronic and topological characterization of atoms |
| MACCS | 116 | Binary representation of the chemical substructure by MACCS keys |
| PubChem | 881 | Binary representation of the PubChem fingerprint |
| Substructure | 307 | Presence of SMARTS patterns for functional group classification |
| Substructure count | 307 | Count of SMARTS patterns for functional group classification |
| Klekota–Roth | 4860 | Presence of chemical substructures that enrich biological activity |
| Klekota–Roth count | 4860 | Count of chemical substructures that enrich biological activity |
| 2D atom pairs | 780 | Presence of atom pairs at various topological distances |
| 2D atom pair count | 780 | Count of atom pairs at various topological distances |
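The MODI index reported in the abstract can be illustrated with a minimal sketch. It assumes the usual definition of MODI: for each class, the fraction of compounds whose nearest neighbour belongs to the same class, averaged over the classes. The Tanimoto distance on sets of "on" bits and the toy fingerprints below are assumptions made for illustration, not the settings or data of the study.

```python
from typing import Sequence, Set


def tanimoto_distance(a: Set[int], b: Set[int]) -> float:
    """Return 1 - Tanimoto similarity for two sets of 'on' bit positions."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def modi(fingerprints: Sequence[Set[int]], labels: Sequence[int]) -> float:
    """Average, over classes, of the fraction of compounds whose nearest
    neighbour (excluding the compound itself) shares their class label."""
    classes = sorted(set(labels))
    hits = {c: 0 for c in classes}
    sizes = {c: 0 for c in classes}
    n = len(fingerprints)
    for i in range(n):
        # Index of the nearest other compound (ties go to the lowest index).
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: tanimoto_distance(fingerprints[i], fingerprints[j]),
        )
        sizes[labels[i]] += 1
        if labels[nearest] == labels[i]:
            hits[labels[i]] += 1
    return sum(hits[c] / sizes[c] for c in classes) / len(classes)


# Hypothetical toy data: two well-separated pairs of fingerprints.
toy_fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {7, 8, 10}]
separable = modi(toy_fps, [1, 1, 0, 0])    # neighbours share the class
scrambled = modi(toy_fps, [1, 0, 1, 0])    # neighbours never share the class
```

In this toy case a cleanly separable labelling gives a MODI of 1 and a scrambled labelling gives 0; the range of 0.70-0.84 reported in the abstract lies between these extremes.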
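Performance summary of CSAR models for predicting anti-sickling agents (Ac, accuracy; Sn, sensitivity; Sp, specificity; MCC, Matthews correlation coefficient; Ac, Sn and Sp are given in %)
| Descriptor class | No. of descriptors | Training Ac | Training Sn | Training Sp | Training MCC | 5-fold CV Ac | 5-fold CV Sn | 5-fold CV Sp | 5-fold CV MCC | External Ac | External Sn | External Sp | External MCC | Decoy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|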
| CDK | 885 | 99.81 ± 0.67 | 99.92 ± 0.56 | 99.72 ± 1.15 | 1.00 ± 0.01 | 79.25 ± 5.91 | 78.06 ± 6.29 | 81.08 ± 6.78 | 0.59 ± 0.12 | 81.06 ± 10.77 | 80.50 ± 12.84 | 84.26 ± 12.83 | 0.63 ± 0.22 | 87.00 ± 3.42 |
| CDK extended | 892 | 99.88 ± 0.49 | 99.88 ± 0.68 | 99.88 ± 0.68 | 1.00 ± 0.01 | 79.71 ± 5.57 | 79.48 ± 6.41 | 80.43 ± 6.11 | 0.60 ± 0.11 | 80.19 ± 9.36 | 80.75 ± 10.95 | 82.50 ± 11.69 | 0.62 ± 0.18 | 87.59 ± 3.41 |
| CDK graph only | 441 | 96.52 ± 2.48 | 95.93 ± 3.51 | 97.29 ± 2.58 | 0.93 ± 0.05 | 77.92 ± 5.37 | 77.08 ± 6.03 | 79.38 ± 6.10 | 0.56 ± 0.11 | 77.63 ± 11.39 | 77.80 ± 13.23 | 79.89 ± 12.93 | 0.56 ± 0.23 | 84.71 ± 2.99 |
| E-state | 18 | 90.69 ± 3.01 | 90.28 ± 4.74 | 91.56 ± 3.69 | 0.82 ± 0.06 | 80.44 ± 6.69 | 79.06 ± 7.84 | 82.64 ± 6.51 | 0.61 ± 0.13 | 82.13 ± 8.62 | 81.12 ± 10.91 | 86.25 ± 11.03 | 0.66 ± 0.17 | 84.90 ± 2.40 |
| MACCS | 103 | 97.23 ± 2.02 | 98.17 ± 2.50 | 96.53 ± 3.30 | 0.95 ± 0.04 | 77.31 ± 5.91 | 77.21 ± 6.49 | 77.84 ± 6.24 | 0.55 ± 0.12 | 79.19 ± 9.31 | 80.47 ± 11.31 | 80.32 ± 10.88 | 0.60 ± 0.19 | 85.77 ± 4.13 |
| PubChem | 299 | 97.10 ± 2.48 | 97.06 ± 3.31 | 97.30 ± 2.78 | 0.94 ± 0.05 | 79.63 ± 4.75 | 78.10 ± 5.70 | 81.84 ± 5.02 | 0.60 ± 0.09 | 78.75 ± 9.52 | 77.79 ± 11.44 | 83.01 ± 11.91 | 0.59 ± 0.19 | 84.79 ± 3.00 |
| Substructure | 38 | 92.75 ± 3.14 | 95.30 ± 4.18 | 90.78 ± 3.93 | 0.86 ± 0.06 | 80.96 ± 5.26 | 81.68 ± 6.10 | 80.79 ± 5.75 | 0.62 ± 0.10 | 81.56 ± 8.86 | 82.96 ± 11.35 | 82.81 ± 10.97 | 0.64 ± 0.18 | 88.13 ± 3.12 |
| Substructure count | 45 | 95.58 ± 2.80 | 98.52 ± 2.43 | 93.15 ± 4.02 | 0.91 ± 0.05 | 82.50 ± 5.05 | 83.57 ± 5.59 | 81.85 ± 5.53 | 0.65 ± 0.10 | 82.38 ± 8.99 | 84.82 ± 11.32 | 83.27 ± 11.29 | 0.66 ± 0.17 | 85.93 ± 3.29 |
| Klekota–Roth | 340 | 98.00 ± 1.95 | 98.80 ± 1.96 | 97.32 ± 2.86 | 0.96 ± 0.04 | 79.31 ± 6.00 | 79.54 ± 7.62 | 79.74 ± 5.66 | 0.59 ± 0.12 | 79.19 ± 9.44 | 81.84 ± 12.08 | 79.89 ± 11.67 | 0.60 ± 0.19 | 87.23 ± 3.22 |
| Klekota–Roth count | 366 | 98.88 ± 1.49 | 99.31 ± 1.52 | 98.53 ± 2.40 | 0.98 ± 0.03 | 78.33 ± 5.26 | 78.72 ± 6.47 | 78.63 ± 5.80 | 0.57 ± 0.11 | 78.81 ± 9.60 | 80.71 ± 13.37 | 80.88 ± 11.22 | 0.59 ± 0.19 | 87.72 ± 3.26 |
| 2D atom pairs | 133 | 94.58 ± 3.45 | 94.94 ± 4.51 | 94.46 ± 3.56 | 0.89 ± 0.07 | 79.04 ± 4.80 | 79.04 ± 5.83 | 79.43 ± 4.78 | 0.58 ± 0.10 | 78.69 ± 10.20 | 78.57 ± 11.45 | 81.47 ± 12.54 | 0.59 ± 0.20 | 84.00 ± 3.10 |
| 2D atom pair count | 167 | 99.19 ± 1.02 | 99.96 ± 0.40 | 98.48 ± 1.95 | 0.98 ± 0.02 | 78.00 ± 4.58 | 78.53 ± 5.72 | 77.89 ± 4.79 | 0.56 ± 0.09 | 77.63 ± 10.28 | 79.17 ± 12.19 | 78.96 ± 11.92 | 0.57 ± 0.20 | 88.00 ± 2.64 |
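The four statistics in the table above (Ac, Sn, Sp and MCC) all follow from the counts of a binary confusion matrix. The sketch below uses the standard formulas; the function name and the example counts are hypothetical and are not taken from the study.

```python
import math
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts.

    tp/fn count actives predicted as active/inactive; tn/fp count inactives
    predicted as inactive/active. Undefined ratios are reported as 0.0.
    """
    total = tp + tn + fp + fn
    ac = (tp + tn) / total if total else 0.0
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Ac": ac, "Sn": sn, "Sp": sp, "MCC": mcc}


# Hypothetical counts, for illustration only.
example = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

The function returns fractions between 0 and 1 for Ac, Sn and Sp, so multiplying by 100 gives the percentage scale used in the table; MCC ranges from -1 to 1.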
Fig. 3 Chemical space of the anti-sickling agents. Actives and inactives are shown in red and green, respectively.
Fig. 4 Box plot of the anti-sickling agents using Lipinski’s rule-of-five descriptors. Asterisks (*) denote significance at p ≤ 0.05.
Fig. 5 The applicability domain as analyzed using the PCA bounding box approach.
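A bounding box applicability domain can be sketched in a few lines. The sketch assumes that principal component scores have already been computed for the training and query compounds, and it treats a query compound as inside the domain when each of its scores falls within the minimum and maximum of the training scores on that component. The function names and the score values are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = List[Tuple[float, float]]


def bounding_box(train_scores: Sequence[Sequence[float]]) -> Box:
    """Per-component (min, max) of the training-set PC scores."""
    n_components = len(train_scores[0])
    return [
        (min(row[k] for row in train_scores), max(row[k] for row in train_scores))
        for k in range(n_components)
    ]


def in_domain(query_scores: Sequence[float], box: Box) -> bool:
    """True if every PC score of the query lies inside the training box."""
    return all(lo <= q <= hi for q, (lo, hi) in zip(query_scores, box))


# Hypothetical PC1/PC2 scores for three training compounds.
train = [(-1.0, 0.5), (2.0, -0.3), (0.4, 1.2)]
box = bounding_box(train)
inside = in_domain((0.0, 0.0), box)    # within both ranges
outside = in_domain((2.5, 0.0), box)   # PC1 exceeds the training maximum
```

A rectangular box is simple to compute but can enclose empty regions of chemical space, which is a known limitation of this kind of domain definition.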
Fig. 6 Descriptor importance of the substructure count fingerprints ranked by the mean decrease of Gini index.
List of the top substructure fingerprints and their descriptions
| Ranking | Fingerprints | Description |
|---|---|---|
| 1 | SubFP287 | Conjugated double bond |
| 2 | SubFP171 | Aryl chloride |
| 3 | SubFP303 | Michael acceptor |
| 4 | SubFP5 | Alkene |
| 5 | SubFP1 | Primary carbon |
| 6 | SubFP300 | 1,3-Tautomerizable |
| 7 | SubFP307 | Chiral center specified |
| 8 | SubFP301 | 1,5-Tautomerizable |
| 9 | SubFP16 | Dialkylether |
| 10 | SubFP173 | Arylbromide |
| 11 | SubFP302 | Rotatable bond |
| 12 | SubFP137 | Vinylogous ester |
| 13 | SubFP139 | Vinylogous halide |
| 14 | SubFP49 | Ketone |
| 15 | SubFP295 | C ONS bond |
| 16 | SubFP18 | Alkylarylether |
| 17 | SubFP2 | Secondary carbon |
| 18 | SubFP275 | Heterocyclic |
| 19 | SubFP135 | Vinylogous carbonyl |
| 20 | SubFP274 | Aromatic |
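The ranking above is based on the mean decrease of the Gini index. As a minimal sketch of the underlying quantity, the code below computes the Gini impurity decrease produced by a single split on one substructure count; tree ensembles such as random forests typically report importance by averaging such decreases over all splits that use a descriptor. The threshold, counts and labels are hypothetical.

```python
from collections import Counter
from typing import Sequence


def gini_impurity(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum(p_c^2) of a set of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Impurity of the parent node minus the size-weighted impurity of the
    two child nodes obtained by splitting at values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    children = (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - children


# Hypothetical counts of one substructure in four compounds (1 = active).
activity = [0, 0, 1, 1]
informative = gini_decrease([0, 0, 1, 2], activity, threshold=0)
uninformative = gini_decrease([0, 1, 0, 1], activity, threshold=0)
```

A split that separates actives from inactives perfectly removes all of the parent impurity, whereas a split that leaves both children as mixed as the parent contributes nothing to the descriptor's importance.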
Fig. 7 Box plots of anti-sickling agents using important substructure fingerprints. A single asterisk (*) denotes significance at p ≤ 0.05, double asterisks (**) denote significance at p ≤ 0.001 and triple asterisks (***) denote significance at p ≤ 0.0001.
Fig. 8 Chemical structures of the representative compounds as described in Table 4 from the analysis of the structure–activity relationship. It should be noted that the chemical structures of all compounds are provided in the ESI, Fig. 1–6.†
Fig. 9 Chemical structures of the designed compounds. Six template compounds (the top row of each box) representing four chemotypes (ethacrynic acid, benzyloxyacetic acid, phenoxyacetic acid and aromatic amide) served as chemical starting points for designing novel analogs (the bottom row of each box). Green circles represent the original moieties of the template compounds and pink circles represent the replacement moieties of the designed compounds.
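| Influential substructure | Chemotype | Trend | Compounds |
|---|---|---|---|
| Alkyl chain length | Ethacrynic acid | Short alkyl chain > long alkyl chain | 3a > 4a and 24a > 25a |
| Alkyl chain length | Aromatic amide | Long alkyl chain > short alkyl chain | 7d > 5d |
| Alkyl chain length | Proline | Long alkyl chain > short alkyl chain | 5e > 8e |
| Alkyl chain length | 2,2-Dimethylchroman | Long alkyl chain > short alkyl chain | 1f > 2f > 3f |
| Alkyl chain length | | Long alkyl chain > short alkyl chain | |
| Functional moiety | Ethacrynic acid | Cyclopentane > benzene | 24a > 26a |
| Functional moiety | Ethacrynic acid | Presence of vinyl moiety: ↑activity | 1a > 8a and 1a > 16a |
| Functional moiety | Ethacrynic acid | CH3S moiety: ↓activity | 16a > 17a |
| Functional moiety | Benzyloxyacetic acid | 2,3-Dihydrobenzofuran > indane | 21b > 20b |
| Functional moiety | Phenoxyacetic acid | Addition of central benzene: ↓activity | 6c > 7c |
| Functional moiety | Phenoxyacetic acid | Addition of peripheral benzene: ↑activity | 16c > 15c |
| Functional moiety | Aromatic amide | Benzene > alkyl chain | 1d > 5d |
| Functional moiety | Proline | Alkyl chain > benzene | 1e > 2e |
| Functional moiety | Proline | CH3NO2 moiety: ↓activity | 8e > 4e |
| Functional moiety | Proline | C6H5COO moiety: ↑activity | 5e > 7e |
| Functional moiety | Proline | C6H5COO moiety: ↓activity | 6e > 9e |
| Substitutions on benzene | Benzyloxyacetic acid | Halogen atoms: Br > Cl > I | 14b > 6b > 18b |
| Substitutions on benzene | Benzyloxyacetic acid | CH3 substitution: mono-CH3 > di-CH3 | 8b > 9b |
| Substitutions on benzene | Benzyloxyacetic acid | CH3 substitution: without CH3 > di-CH3 | 14b > 16b |
| Substitutions on benzene | Benzyloxyacetic acid | Nitrogen-containing substitution: (CH3)2NH > NH2 > NO2 | 29b > 28b > 27b |
| Substitutions on benzene | Phenoxyacetic acid | Halogen atoms: Cl > Br > I | 6c > 12c > 11c |
| Substitutions on benzene | Phenoxyacetic acid | Cl substitution: di-Cl > mono-Cl | 8c ≈ 9c ≈ 10c > 6c |
| Substitutions on benzene | Phenoxyacetic acid | CH3 substitution: di-CH3 > mono-CH3 | 3c ≈ 4c > 1c |
| Substitutions on benzene | Aromatic amide | Cl substitution: di-Cl > mono-Cl > without Cl | 3d > 2d > 1d |
| Substitutions on benzene | Aromatic amide | Halogen atoms: Cl > Br | 2d > 4d |
| Substitutions on benzene | 2-(Benzylthio)acetic acid | Cl substitution: mono-Cl > tri-Cl | 31b > 32b |
| Substitutions on benzene | 2-(Benzylthio)acetic acid | Halogen atoms: Cl > Br | 31b > 33b |
| Substitutions on benzene | 2-(Phenylthio)acetic acid | Br > NH2 | 19c > 20c |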