J Jesús Naveja, B Angélica Pilón-Jiménez, Jürgen Bajorath, José L Medina-Franco.
Abstract
Scaffold analysis of compound data sets has reemerged as a chemically interpretable alternative to machine learning for chemical space and structure–activity relationship analysis. In this context, analog series-based scaffolds (ASBS) are synthetically relevant core structures that represent individual series of analogs. As an extension of ASBS, we introduce a general conceptual framework that considers all putative cores of the molecules in a compound data set, thereby relaxing the commonly applied "single molecule–single scaffold" correspondence. A putative core is defined here as any substructure of a molecule that complies with two basic rules: (a) the core accounts for a substantial proportion of the size of the whole molecule, and (b) the substructure can be reached from the original molecule through a succession of retrosynthesis rules. A bipartite network of molecules and cores can then be constructed for a database of chemical structures, in which compounds linked to the same core are considered analogs. We present case studies illustrating the potential of the general framework, with applications that include inter- and intra-core diversity analysis of compound data sets, structure–property relationship analysis, and the identification of analog series and ASBS. The molecule–core network presented herein is a general methodology with multiple applications in scaffold analysis. We envision new statistical methods that draw quantitative conclusions from these data. The code implementing the method is freely available as an additional file. Follow-up applications include analog searching and core structure–property relationship analysis.
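The two rules for a putative core, and the analog relation they induce, can be illustrated with a small Python sketch. It rests on stated assumptions: molecules and candidate substructures are given as precomputed identifiers with heavy-atom counts, rule (b) is reduced to a boolean supplied by the caller, and the 0.5 size threshold (`MIN_CORE_FRACTION`) is a hypothetical value, not one taken from the paper.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical threshold for rule (a); the paper's actual criterion may differ.
MIN_CORE_FRACTION = 0.5


def is_putative_core(core_atoms, mol_atoms, reachable):
    """Rule (a): the core is a large fraction of the molecule (by heavy atoms).
    Rule (b): `reachable` says whether retrosynthetic cleavages lead to the core;
    it is supplied by the caller rather than computed here."""
    return reachable and core_atoms / mol_atoms >= MIN_CORE_FRACTION


def build_bipartite(candidates):
    """Map each putative core to the set of molecules linked to it."""
    core_to_mols = defaultdict(set)
    for mol, mol_atoms, core, core_atoms, reachable in candidates:
        if is_putative_core(core_atoms, mol_atoms, reachable):
            core_to_mols[core].add(mol)
    return core_to_mols


def analog_pairs(core_to_mols):
    """Two molecules are analogs if they are linked to at least one common core."""
    pairs = set()
    for mols in core_to_mols.values():
        pairs.update(combinations(sorted(mols), 2))
    return pairs


# (molecule, molecule heavy atoms, candidate core, core heavy atoms, reachable)
candidates = [
    ("m1", 20, "coreA", 14, True),
    ("m2", 22, "coreA", 14, True),
    ("m2", 22, "coreB", 6, True),    # fails rule (a): 6/22 < 0.5
    ("m3", 18, "coreB", 12, False),  # fails rule (b): not reachable
]
print(analog_pairs(build_bipartite(candidates)))  # {('m1', 'm2')}
```

Only `coreA` survives both rules, so `m1` and `m2` become analogs, while `m3` remains unlinked.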
Keywords: Analog searching; Analog series-based scaffold; Core structure–property relationships (CSPR); RECAP; Scaffold; Virtual screening
Year: 2019 PMID: 33430974 PMCID: PMC6760108 DOI: 10.1186/s13321-019-0380-5
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Fig. 1 Two scaffold definitions applied to two exemplary molecules (olanzapine and albendazole): a Bemis–Murcko scaffold; b putative cores
Comparison of the Bemis–Murcko scaffold and the core framework proposed in this work
| Feature | Bemis–Murcko scaffold | Core framework |
|---|---|---|
| Number of cores per molecule | 0 or 1 | 1 or more |
| Rings can be substituents | No | Yes |
| Considers retrosynthesis rules | No | Yes |
| The core is a major component of the molecule | Yes/no | Yes |
Fig. 2 Construction of a core–molecule network for an exemplary dataset. Each molecule is connected to all of its putative cores, so a series is formed whenever at least two molecules share a core. Not all molecules in a series need to be pairwise analogs of each other, but they must be connected through a sequence of analogs. In this example, only putative cores that map to more than one molecule are included
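Because a series requires only a chain of shared cores, not pairwise sharing, analog series can be read off as the connected components of the molecule–core network. The following is a minimal union-find sketch, assuming the core-to-molecule mapping has already been computed; the function name `analog_series` and all identifiers are hypothetical.

```python
def analog_series(core_to_mols):
    """Group molecules into series: connected components of the bipartite
    molecule-core network, using only cores shared by at least two molecules."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for mols in core_to_mols.values():
        mols = sorted(mols)
        if len(mols) < 2:
            continue  # a core mapping to a single molecule links nothing
        for other in mols[1:]:
            union(mols[0], other)

    groups = {}
    for mol in parent:
        groups.setdefault(find(mol), set()).add(mol)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


core_to_mols = {
    "coreA": {"m1", "m2"},
    "coreB": {"m2", "m3"},  # m1 and m3 share no core but are linked through m2
    "coreC": {"m4", "m5"},
    "coreD": {"m6"},        # maps to a single molecule: ignored
}
print(analog_series(core_to_mols))  # [['m1', 'm2', 'm3'], ['m4', 'm5']]
```

Here `m1` and `m3` end up in the same series although they are not direct analogs, mirroring the caption's point that a sequence of analogs is sufficient.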
Fig. 3 Algorithm steps for the generation of core–molecule associations
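The generation of core–molecule associations can be approximated as a breadth-first search over successive retrosynthetic cleavages that keeps every fragment still satisfying the size rule. This sketch is a toy built on explicit assumptions. Instead of applying actual RECAP rules to a chemical structure, each molecule is modeled as a tree of hypothetical building blocks with heavy-atom counts. Each edge stands for a bond that some retrosynthesis rule could cleave, and the 0.5 threshold (`MIN_CORE_FRACTION`) is illustrative. Counting the intact molecule as a core is also an assumption, chosen to match the "1 or more" cores per molecule in the comparison table.

```python
from collections import deque

# Hypothetical threshold for rule (a); not a value taken from the paper.
MIN_CORE_FRACTION = 0.5


def split_fragment(fragment, bonds, cut):
    """Return the connected pieces of `fragment` after cleaving bond `cut`."""
    adjacency = {block: set() for block in fragment}
    for a, b in bonds:
        if a in fragment and b in fragment and (a, b) != cut:
            adjacency[a].add(b)
            adjacency[b].add(a)
    pieces, seen = [], set()
    for start in fragment:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        pieces.append(frozenset(component))
    return pieces


def putative_cores(block_atoms, bonds, min_fraction=MIN_CORE_FRACTION):
    """Enumerate putative cores of one molecule in a toy fragment model.

    block_atoms: building-block id -> heavy-atom count (hypothetical representation).
    bonds: (block, block) pairs that some retrosynthesis rule could cleave;
           assumed to form a tree, so each cleavage yields two pieces.
    """
    total = sum(block_atoms.values())
    whole = frozenset(block_atoms)
    cores = {whole}  # the intact molecule counts as its own core (assumption)
    queue = deque([whole])
    while queue:
        fragment = queue.popleft()
        for bond in bonds:
            if bond[0] not in fragment or bond[1] not in fragment:
                continue  # bond already cleaved or outside this fragment
            for piece in split_fragment(fragment, bonds, bond):
                size = sum(block_atoms[block] for block in piece)
                # Rule (a); pieces below the threshold only shrink further, so prune them.
                if size / total >= min_fraction and piece not in cores:
                    cores.add(piece)
                    queue.append(piece)
    return cores


molecule = {"ring1": 6, "amide": 3, "ring2": 6, "methyl": 1}
bonds = [("ring1", "amide"), ("amide", "ring2"), ("ring2", "methyl")]
cores = putative_cores(molecule, bonds)
print(len(cores))  # 5
```

For this 16-atom toy molecule, fragments such as `ring1` alone (6 atoms) or `ring2` plus `methyl` (7 atoms) fall below half the molecule and are discarded. Pruning them early is safe because further cleavages can only make a fragment smaller.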
Core and Bemis–Murcko scaffold overlap of NuBBEDB vs BIOFACQUIM databases
| | Measurement | BIOFACQUIM | NuBBEDB | Both |
|---|---|---|---|---|
| Molecules | Unique molecules intraDB | 399 | 2018 | 2417 |
| | Unique molecules interDB | 344 | 1963 | 2362 (55 shared) |
| Cores | Cores intraDB | 1356 | 15,758 | 17,114 |
| | Unique cores intraDB | 1153 | 11,738 | 12,289 |
| | Unique cores interDB | 1047 | 11,632 | 12,785 (106 shared) |
| Bemis–Murcko scaffolds | Scaffolds intraDB | 396 | 1921 | 2317 |
| | Unique scaffolds intraDB | 176 | 754 | 930 |
| | Unique scaffolds interDB | 127 | 705 | 881 (49 shared) |
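The overlap counts follow simple set relations. For each database, the entries unique to it are its deduplicated entries minus those shared with the other database, and the "Both" interDB value is the size of the union. For the molecule rows, for example, 399 − 55 = 344, 2018 − 55 = 1963, and 344 + 1963 + 55 = 2362. The sketch below uses Python sets and assumes each molecule, core, or scaffold has already been reduced to a canonical string such as a canonical SMILES. The function name `overlap_summary` and the toy strings are placeholders, not database content.

```python
def overlap_summary(db1, db2):
    """db1, db2: lists of canonical identifiers (e.g. canonical SMILES of cores),
    possibly containing duplicates. Returns counts in the layout of the overlap table:
    (first database, second database, both)."""
    u1, u2 = set(db1), set(db2)
    shared = u1 & u2
    return {
        "all_intraDB": (len(db1), len(db2), len(db1) + len(db2)),
        "unique_intraDB": (len(u1), len(u2), len(u1) + len(u2)),
        "unique_interDB": (len(u1 - shared), len(u2 - shared), len(u1 | u2)),
        "shared": len(shared),
    }


# Placeholder identifiers, not real BIOFACQUIM or NuBBEDB entries.
first_db = ["c1ccccc1", "C1CCNCC1", "c1ccccc1", "O=C1CCCC1"]
second_db = ["c1ccccc1", "C1CCOC1", "C1CCOC1"]
summary = overlap_summary(first_db, second_db)
print(summary["unique_interDB"], summary["shared"])  # (2, 1, 4) 1
```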
Fig. 4 Exemplary overlapping cores and scaffolds from two datasets. a For any overlapping core, an analog series can be found with the core itself as its ASBS; b this is not necessarily the case for overlapping Bemis–Murcko scaffolds
Fig. 5 Core structure–activity relationship (CSAR) visualization for the largest series in a dataset of Akt2 inhibitors. a Molecule–core bipartite network. Molecules are shown as small red dots, and cores as larger dots colored by the median pIC50 of the molecules mapped to them; b core network obtained from the molecule–core bipartite network. Nodes are putative cores, and edges connect nodes that share at least one compound in the dataset; c final CSAR visualization, in which redundant cores were omitted and chemical structures were added to the core network
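Panels a and b of Fig. 5 amount to annotating each core with the median pIC50 of its molecules and projecting the bipartite network onto the cores. The following is a minimal standard-library sketch, assuming a precomputed core-to-molecule mapping and made-up pIC50 values; `core_network` is a hypothetical name, and the removal of redundant cores in panel c is not covered.

```python
from itertools import combinations
from statistics import median


def core_network(core_to_mols, pic50):
    """Project a molecule-core bipartite network onto the cores.

    Returns (node_colors, edges): node_colors maps each core to the median pIC50
    of its molecules; edges link two cores that share at least one molecule.
    """
    node_colors = {core: median(pic50[m] for m in mols) for core, mols in core_to_mols.items()}
    edges = set()
    for (c1, mols1), (c2, mols2) in combinations(sorted(core_to_mols.items()), 2):
        if mols1 & mols2:
            edges.add((c1, c2))
    return node_colors, edges


core_to_mols = {"coreA": {"m1", "m2", "m3"}, "coreB": {"m3", "m4"}, "coreC": {"m5", "m6"}}
pic50 = {"m1": 6.0, "m2": 7.0, "m3": 7.5, "m4": 8.0, "m5": 5.5, "m6": 6.5}  # hypothetical values
colors, edges = core_network(core_to_mols, pic50)
print(colors)  # {'coreA': 7.0, 'coreB': 7.75, 'coreC': 6.0}
print(edges)   # {('coreA', 'coreB')}
```

The median, rather than the mean, keeps a single outlying compound from dominating a core's color, which matches the caption's choice of statistic.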