Literature DB >> 35730680

Rings in Clinical Trials and Drugs: Present and Future.

Jonathan Shearer¹, Jose L Castro¹, Alastair D G Lawson¹, Malcolm MacCoss², Richard D Taylor¹.

Abstract

We present a comprehensive analysis of all ring systems (both heterocyclic and nonheterocyclic) in clinical trial compounds and FDA-approved drugs. We show 67% of small molecules in clinical trials comprise only ring systems found in marketed drugs, which mirrors previously published findings for newly approved drugs. We also show there are approximately 450 000 unique ring systems derived from 2.24 billion molecules currently available in synthesized chemical space, and molecules in clinical trials utilize only 0.1% of this available pool. Moreover, there are fewer ring systems in drugs compared with those in clinical trials, but this is balanced by the drug ring systems being reused more often. Furthermore, systematic changes of up to two atoms on existing drug and clinical trial ring systems give a set of 3902 future clinical trial ring systems, which are predicted to cover approximately 50% of the novel ring systems entering clinical trials.

Entities: Chemical

Year: 2022 PMID： 35730680 PMCID： PMC9289879 DOI： 10.1021/acs.jmedchem.2c00473

Source DB: PubMed Journal: J Med Chem ISSN： 0022-2623 Impact factor: 8.039

Introduction

Drug-like chemical space is a phrase ubiquitous in drug discovery. It is of fundamental importance in small molecule drug discovery and impacts all stages of the design cycle from screening library design, through to reagent selection, hit to lead, and lead optimization. However, for such an extensively used description, which impacts most decisions in drug discovery, there is no accepted gold standard measure to delineate “drug-like chemical space” which is universally applicable and unambiguously encompasses, without exception, any molecule that is drug-like. The term drug-like is fraught with ambiguity; for example, it could mean (i) an exact substructure found in a drug, (ii) a closely related substructure, (iii) a closely related full molecule structure, or (iv) a molecule that has similar properties to known drugs. How to calculate what is close or similar can be achieved using a plethora of computational techniques ranging from 1-dimensional (1D) properties such as calculated logP (clogP), polar surface area (PSA), and hydrogen-bonding groups (H–B donors or acceptors) to 2-dimensional (2D) metrics such as fingerprint similarity, the presence or absence of functional groups, up to 3-dimensional (3D) metrics such as shape or molecular electrostatic potentials.[1,2] A further complication of drug likeness is the nonbinary nature of the description where we consider the drug likeness not to be true or false but defined on a continuum with a relative probability.[3] Many early attempts at estimating drug-like space use weighted combinations of whole molecule properties, often referred to as 1D properties, such as molecular weight or polar surface area.[4−6] In parallel, 2D and 3D descriptors have been used to identify molecules that are in drug-like space.[7,8] Similar analysis to drug-like space has been applied to molecules in clinical trials showing in some cases that molecules in clinical trials have significantly different property space compared with molecules that have transitioned successfully to a marketed drug.[9]

Size of Drug-Like Space

The success of combining 1D properties is in part due to the ease and speed of calculations, enabling rapid assessment of libraries of molecules, and these approaches clearly had a significant impact on the field of drug discovery. Notable examples are the Lipinski ubiquitous “Rule of 5” (Ro5),[10] the work of Veber[11] based around PSA, and many others including the “GSK 4/400” for lead-like molecules and “Rule of 3” for fragment molecules.[12−14] However, there are widely accepted drawbacks, namely, the problem of successful drugs that lie outside of these models, and as a result many seasoned practitioners in small molecule drug discovery would typically find utility of these methods as a probabilistic guide rather than a binary cutoff. Another less widely discussed difficulty centers around the size of drug-like chemical space. Even using guides such as the Ro5 the enormity of drug-like space within these guides is still problematic. This is summarized in Figure , which highlights the size of chemical space for drug discovery along with related data to give context to the data size. Moreover, we will show an estimate of currently available chemical ring systems using our previously described ring system definitions[15,16] is approximately 5 × 105 ring systems (excluding macrocycles), which is also included in Figure .

Figure 1

Summary of predicted size of chemical space and key comparisons. Some images were sourced from third parties: “Stars in universe”, permission given by NASA;[17] “Grains of sand on Earth”, permission given by FreeImages;[18] “FDA Approved Drugs”, permission given by FreeImages;[19] “Miles to Alpha Centauri”, permission given by European Southern Observatory (ESO), Davide De Martin.[20] Although we have the capabilities to experimentally screen multimillion compound libraries and virtual screens of billions of molecules have been reported,[21] the size of the predicted chemical space is still beyond the reach for routine screening and even virtual enumeration. Some examples of current state of the art libraries include Enamine Real Space[22] (2 × 1010), GalaXi/Wuxi LabNetwork (2 × 109), and GDB 17 (2 × 1011).[23] The obvious question then is how can we reduce this size further with a measure that is unambiguous, clearly defined, and easy to calculate to identify pockets of useful molecules in a space that is larger than the number of stars in the universe.[24] As previously described,[15] we chose substructure analysis based around ring systems and frameworks rather than simple whole molecule properties or similarity metrics based on 2D or 3D descriptors to explore drug-like chemical space.

History of Scaffold Analysis

Ring systems are highly influential in determining the shape, electrostatics, and often bioactivity of compounds. The concept of such “privileged” or bioactive scaffolds has been widely explored in the drug discovery field.[25−28] There are many ways to define a molecular scaffold, but arguably, the most widely used way would be the Bemis–Murcko (BM) scaffold.[29] The BM scaffold is obtained by removing all terminal acyclic groups from the ring systems and frameworks. A further simplification of the BM scaffold can be obtained by ignoring atom types and bond orders in the graph to give the cyclic skeleton (CSK). The CSK allows for a more basic comparison of the underlying shape of each scaffold. BM and CSK scaffolds have been shown to be powerful tools for analyzing the diversity within any compound collection. In the original paper by Bemis and Murcko it was found that one-half of all drug molecules could be represented with 32 frameworks. A more recent paper by Lipkus et al.[30] defined an indicator of the innovation present in a new drug by comparing their scaffolds to those already in existing drugs. In this work, they defined an innovative drug, or “Pioneer”, by whether both the scaffold and the molecular shape had not been observed in a previous drug molecule. Their analysis was carried out on all approved drugs over the last 80 years, and they showed that the percentage of new scaffolds combined with molecular shape has increased over time, where the scaffolds and shapes are defined using their reduced representations. A key use of scaffold generation has been to identify bioactive ring systems or frameworks. Techniques such as scaffold trees,[31,32] hierarchical scaffold clustering,[33] and match molecular pairs analysis[34] (MMPAs) are commonly used to identify structure–activity relationships (SAR) across a compound collection and thus identify potential scaffold hops.[35] Other work has combined the use of quantitative structure–activity relationship (QSAR) models[36,37] of full molecules and a molecule generator to identify bioactive scaffolds. Visini et al.[38] generated a database of all possible rings (1–4 rings, <30 atoms), resulting in around 1 million virtual ring systems, 98.6% of which had not been observed in any publicly available compound collections (ZINC,[39] PubChem,[40] ChEMBL,[41] and Reaxys[42]). It should be noted that these rings were not filtered by whether they were drug-like or synthetically feasible; thus, it is not clear how many of these rings are biologically relevant. An alternative analysis by Pitt et al.[43] estimated that there could be over 3000 ring systems that have not been reported in the literature but are synthetically tractable. Another study enumerated all possible fused rings up to 3 rings to give around 570 000 virtual ring systems.[44] These 570 000 ring systems were then cross referenced with those available in the Synthetically Accessible Virtual Inventory[45] (SAVI, 1.75 billion molecules) to identify 39 036 ring systems.[46] In this study, data from ChEMBL was used to identify bioactive ring systems that were selective against certain target classes or generally bioactive. The chemical space of these ring systems was visualized by applying PCA on key scaffold descriptors; the scaffold descriptors are described in previous work.[47] The resulting analysis showed that bioactive scaffolds were spread across chemical space but with local regions of high density. It was hypothesized that the regions of high density could be used to identify future bioactive scaffolds. The identification of such bioactive islands in chemical space has been explored in the literature for both scaffolds[44,46] and drug-like compounds.[23,48] Analysis of kinase inhibitors by Zhao and Caflisch[48] demonstrated that a huge fraction of synthesizable kinase-relevant chemical space has been completely unexplored. By identifying biologically relevant areas of chemical space, the goal is to reduce the effective chemical space for hit discovery and increase hit quality while still generating novel chemical matter. We previously[15,16] chose ring systems and frameworks to analyze drugs and drug-like space based on the seminal work of Murcko et al.,[29] whereby we performed both an exhaustive and a recursive breakdown of molecules and systematically analyzed both the individual components and combinations of scaffolds. It was found that each year 70% of drugs are comprised of only ring systems found in previously marketed drugs. Most of the remaining drugs contain only one newly utilized ring system that has not been seen in marketed drugs. This observation has held true year on year for the last 30 years. This gives rise to the question facing seasoned practitioners of drug discovery: how much chemical novelty is required for a patentable and effective drug, and how is this novelty achieved? Since 70% of new drugs coming onto the market each year only contain ring systems from previously patented drug molecules then most new drugs achieve novel patent space through either the utilization of new growth vectors and/or novel combinations of growth vectors or simply novel combinations of drug ring systems. This suggests novel ring systems are not a prerequisite for new patent positions or to tackle new drug targets since we have also shown previously that known drug rings have been applied across different therapeutic targets and therapeutic areas.[15] In this work, we address the question of novelty and how it applies to new candidates by studying molecules before they make it as drugs, i.e., those in different clinical phases.

Clinical Trial Ring Systems

Assessing compounds that are earlier in the drug discovery cycle that have not yet made it to market and are currently in clinical trials can give a further insight into the successful design of future drug molecules. This is the focus of this study, whereby we analyzed the chemical novelty in clinical trials using an extended methodology that we previously applied to drug molecules to answer the following questions. Is the amount of molecular novelty (where novelty is assessed by new chemical ring systems) in drugs reflected in clinical trials or is there an attrition in clinical trials? Is the amount of novelty different across the different clinical phases? How important are new chemical ring systems to justify clinical investment and overall clinical success, and how do we use these data? Can we use the novelty from clinical trials to predict future drug ring systems and prioritize new areas of available chemical space?

Fragmentation Methodology and Classifications

To answer these questions, we used the same methodology as described in previous work[15,16] to deconstruct molecules, whereby the fragmentation algorithm recursively breaks each molecule into ring systems and frameworks with exocyclic double bonds retained while recording the growth vectors of each ring system from the frameworks (see Figure ). We applied this algorithm to a snapshot of molecules in Phase 1, 2, and 3 clinical trials from January 2020 and our updated drug data set reported in the US FDA Orange Book up to January 2020. This gave us an updated ring system from drugs and/or clinical trials and associated frequencies and growth vectors. The fragmentation workflow was implemented using a combination of RDKit[49] and Pipeline Pilot.[50]

Figure 2

Example of rings, ring systems, and frameworks for Chlorthalidone.

Example of rings, ring systems, and frameworks for Chlorthalidone. We filtered the sets so that they contain less than 10 bonds in a ring, no metal-containing molecules, and a molecular weight less than 1000. The molecular weight cutoff was chosen to capture larger small molecules that fall outside of Ro5 while removing excessively large molecules. Previous studies[51,52] have shown that applying a hard cutoff on weight in line with Ro5 can lead to the loss of a number oral drugs or clinical candidates for difficult target classes that may contain innovative scaffolds of interest to our analysis. We also ensured that the molecules in the different phases must not be present in higher phases or drugs. Using the sets of different ring systems from molecules in clinical trials and drug molecules, we propose a new classification system based on the ring systems (or scaffolds) present in a molecule. This classification is to give a simple and clear description of a molecule to show the degree of chemical novelty and the importance of reuse of existing scaffolds vs new scaffolds. This classification is given in Table , and as a point of clarification, a new ring system is one that has not been used previously in any other drug that has made it to market. For this analysis, we have not focused on molecules that do not contain any ring systems (Class 5), which typically accounts for less than 10% of drugs.

Table 1

Molecule Classification System Based on Historical Context of Ring Systems

molecule class	description of ring systems contained in the molecule
Class 1	only known drug ring systems, combined in a novel way
Class 2	known drug ring systems combined with only 1 new ring system
Class 3	known drug ring systems combined with more than 1 new ring system
Class 4a	only one ring system in the whole molecule, and that ring system is new
Class 4b	only new ring systems where the total number of ring systems is more than 1
Class 5	no ring systems present

Using the classification from Table , our previous work showed 70% of drugs each year come under the Class 1 molecules; the remaining drugs are formed predominantly from Class 2 molecules. Even with a significant representation of Class 2, Class 3, and Class 4 in commercially available molecules and literature-reported molecules, Class 1 molecules are still the most successful and dominant class in marketed drugs and have been, year on year, for the last 30 years. The benefit of Class 1 is that we can map out the space and cleanly define this data set, which we have previously enumerated, combined, and analyzed.[16] Class 2, Class 3, and Class 4 can be estimated, but full and complete enumeration is a significant computational undertaking. Using these classifications, we analyzed clinical trial compounds to see whether this distribution is the same for molecules in U.S. clinical trials.

Classification of Clinical Trial Molecules

The analysis of ring systems present in molecules in the different phases of clinical trials (for January 2020) is shown in Table . It can be seen from this analysis that most molecules still fall in the Class 1 bracket with an average of 67% of the compounds across all phases. This mirrors the 70% of Class 1 molecules in drugs. Although there is a slight increase in the different clinical phases, we do not believe this is significant. In terms of novelty in clinical trials, where novelty is assessed by the ring systems and the associated ratio of new vs old, it seems there is little difference between molecules that have made it as a drug and those in clinical trials.

Table 2

Classification of Molecules in Different Clinical Phases (January 2020)

clinical trials status	no. of compounds	Class 1(only drug ring systems)	Class 2(drug ring systems and 1 new ring system)	Class 3(drug ring systems and 2+ new ring systems)	Class 4a(single nondrug ring systems)	Class 4b(only 2+ nondrug ring systems)
Phase 1	277	178 (64%)	86 (31%)	4 (1%)	9 (3%)	0 (0%)
Phase 2	525	359 (68%)	123 (23%)	10 (2%)	32 (6%)	1 (<1%)
Phase 3	232	159 (69%)	44 (19%)	8 (3%)	17 (7%)	4 (2%)
all phases	1034	696 (67%)	253 (24%)	22 (2%)	58 (6%)	5 (<1%)

It is an interesting observation that the percentage of molecules containing just drug rings (Class 1) is broadly the same across different clinical trials, and this mirrors new drugs coming onto the market each year. One might have expected there to be more novelty in clinical trials, and this novelty may be reduced through the clinical trials, but this is not the case when novelty is defined by the ring system chemistry. A conclusion from these observations is that using just drug rings and combining them in novel ways is the principal strategy employed historically to generate most compounds for both new drugs and molecules that make it into clinical trials. Moreover, since on average it takes 9 years for a molecule to pass through clinical trials[53] to make it to market, this data set will encompass all new drug ring systems for the next 9 years.

Analysis of Ring Systems in Different Clinical Phases

We have demonstrated a classification of molecules based on ring systems and whether those ring systems have been used previously in drugs. To extend this analysis, we have taken the updated list of drugs and clinical trial molecules and analyzed the complete database of ring systems that are present in these molecules (see Table ). There are 378 ring systems used in drugs and 450 unique ring systems in clinical trials; 280 (62%) of these clinical trial ring systems have not been used in drugs before. This gives a total of 658 unique ring systems covering all clinical trial molecules and drugs. The fact that there are more ring systems in clinical trials than in drugs demonstrates that there is still a significant investment in new ring systems in drug discovery. However, what is clear is how they are assembled in real molecules is of equal importance when assessing novelty, and new ring systems are typically partnered with known drug ring systems for both clinical trial compounds and drugs. The top 100 ring systems in drugs and top 100 new ring systems in clinical trials are shown in Tables and 5, respectively. The complete lists, ordered by frequency, are available as a pdf download and smiles download from Zenodo (10.5281/zenodo.6556751).

Table 3

Classification of Ring Systems in Different Clinical Phases and in Drugs

status	total no. of unique ring systems	new ring systems not in higher phases nor in drugs	ring systems from drugs	ring systems present in higher phases
Phase 1	191	71 (37%)	101 (53%)	19 (10%)
Phase 2	278	131 (47%)	128 (46%)	19 (7%)
Phase 3	169	78 (46%)	91 (54%)
all phases	450	280 (62%)	170 (38%)
drugs	378

Table 4

Top 100 Most Frequently Used Ring Systems from Small Molecule Drugs Listed in the FDA Orange Book before January 2020 Sorted by Descending Frequency (f) and Then Ascending Molecular Weight

Table 5

Top 100 Most Frequently Used Ring Systems in U.S. Clinical Trial Compounds in January 2020 That Were Not Present in Drugs Sorted by Descending Frequency (f) and Then Ascending Molecular Weight

Table shows that the sets of ring systems in Phase 2 and Phase 3 have similar distributions between the new ring systems and the ring systems seen in drugs or higher clinical phases. Phase 1 is slightly lower, which could be accounted for by not all structures in Phase 1 being available, and typically structures are not released until Phase 2 or higher. From the pool of ring systems derived from clinical trial molecules, there are more new ring systems being used than ring systems from drugs (62% compared with 38%, respectively). However, the drug ring systems are used in multiple molecules and typically have a higher frequency in clinical trial molecules than the new ring systems. Thus, in the final molecules the drug ring systems dominate through reuse even though it is a smaller pool of ring systems, indicating how important this set is. If 170 drug ring systems are used in clinical trials, this means that 98 drug ring systems are currently not being utilized in clinical trial molecules. From this analysis, several questions arise. Is there a difference in those drug ring systems that are being reused and those that are not? Can we learn anything from those drug ring systems that have not found utility in current clinical trials? Moreover, is there a systematic difference between the properties of ring systems in drugs and those newer ring systems only found in clinical trials?

Comparison of Properties for Novel Clinical Trial Rings Compared with Drug Rings

To answer the questions regarding the differences between ring systems in clinical trials and drugs we calculated the distributions of ring sizes, the number of nitrogens, oxygens, sulfurs, and combined heteroatoms, as well as the number of sp3 centers. Many of these 1d properties are presented as percentages to allow for the comparison of ring systems with disparate sizes. For this analysis, we separated the ring systems into the following categories: ring systems from drugs, ring systems only in Phase 3, Phase 2, or Phase 1 (and not in higher phases or drugs), all ring systems combined from all clinical phases (but not in drugs), ring systems just from drugs that are not being used in clinical trials. The property distributions in Figure highlight that compounds from clinical trials and drugs tend to have quite similar 1d property distributions. We can use these distribution plots when predicting future ring systems to filter out ring systems where the heteroatom ratio is significantly outside of the typical distribution plots for drugs and clinical trials. For example, it is unlikely for a ring system to contain more than 20% sulfur atoms. Likewise, there is no bias toward the different percentages of sp3 centers. Furthermore, the most prevalent ring system size for both drugs and clinical trials is bicycles.

Figure 3

Histogram comparisons of ring systems in drugs, clinical trials, and those exclusively in drugs that are not currently in clinical trials for (a) number of rings per ring system, (b) percentage of heteroatoms per ring system, (c) percentage of nitrogens per ring system, (d) percentage of oxygens per ring system, (e) percentage of sulfurs per ring system, and (f) percentage of sp3 centers per ring system.

Future of Ring Systems in Drugs and Clinical Trials

Using our database of clinical trials and drug ring systems we can make a prediction of future clinical trial ring systems and possible new ring systems that will make it as drugs. By simple visual inspection it became apparent that many of the ring systems in clinical trials are very small changes on known drug rings, which could reflect the constraints of biological space and how we choose to effectively navigate chemical patent space or for synthetic tractability reasons. We explored this observation systematically by first restricting the ring systems to a maximum of five rings per ring system. We then assessed the relationship between the drug ring systems and new ring systems in clinical trials. Our systematic approach changes the drug rings by no more than two atoms (N + 2) where a change is a single atom substitution, for example, C to N, or the addition or removal of a new exocyclic double bond. We made these changes for all drug ring systems and then compared the N + 2 changes to see how many of the ring systems in clinical trials are covered by this simple change on the drug set. We also implemented valence bond checks and substructure-based stability filters on these sets. The results for this analysis are given in Table , where overall 36% of the new ring systems in clinical trials are single atom changes on ring systems in drugs. However, more importantly, approximately one-half of the new ring systems in clinical trials (47%) are at most two atom changes on ring systems in existing drugs. We can use this information to predict the ring systems that will make it into future drugs by applying the same two atom changes to all ring systems from drugs and clinical trials. From this we derived a set of future clinical trial ring systems. This means that from the 30% of drugs that are Class 2 or above (i.e., contain at least one novel ring system), approximately one-half of the molecules could be predicted to have the novel ring systems from our future clinical trial set. However, to ensure that these molecules are reasonable, we wanted to compare these ring systems to those that have been synthesized or reported in the literature, and so we required a full database of ring systems for the currently available chemical space.

Table 6

Analysis of New Ring Systems and Overlap with Drug Ring Systems after Applying One or Two Atom Changesa

status	no. of new ring systems	single atom change overlap with drug ring systems	two atom changes overlap with drug ring systems
Phase 1	71	31 (44%)	35 (49%)
Phase 2	129	36 (28%)	55 (43%)
Phase 3	76	31 (41%)	40 (53%)
all phases	276	98 (36%)	130 (47%)

For ring systems containing less than 6 rings.

Available Chemical Scaffold Space: RINGO Database

To fully understand the magnitude of chemical space associated with ring systems (or chemical scaffolds), we created a database of ring systems from real compounds that are either commercially available and/or synthesized and reported in the literature or patents. Our internal RINGO database of all available ring systems uses ring systems from a data set of approximately 2.24 billion unique molecules taken from commercial, literature, and academic sources including ChEMBL,[41] eMolecules,[54] Enamine Real,[22] SureChEMBL,[55] etc. We have not included the virtual databases from GDB[23,56] (Enamine Real is included as each compound has an associated synthetic route and corresponding reagents). The full molecules are first preprocessed and charges and tautomers are calculated for all 2.24 billion molecules using the tautomer and protonation plugins within Chemaxon.[57] This is an important step since the fragmentation rules for molecules require sp3 and sp2 centers to be correctly defined, for example, keto vs enol forms. These molecules are then recursively fragmented into the individual ring systems retaining the growth vectors from the original molecule and the frequency for each ring system from the 2.24 billion molecules. From this computation we derived our RINGO database of 458 748 ring systems which covers the known chemical ring space. It is worth noting that 167 668 of the ring systems in RINGO have only been recorded in one compound across our public and commercial sources. Over 80% of these singletons were predominately from compounds in the public database PubChem or the patent database SureChEMBL. Ring systems with high frequencies in RINGO are likely to be synthetically tractable, and while the reverse statement will not always be true, ring system frequency is another factor by which we can filter scaffold space. We then cross referenced the future clinical trial ring systems against our internal RINGO database of over 458 748 ring systems. This allows us to check whether these ring systems of two atom changes have ever been included in a synthesized molecule and are reported in the data set over 100 times. We also used our previous analysis of the content of drug rings to reduce this set further by applying cut offs for the maximum number of nitrogens, oxygens, and sulfurs in drug ring systems. This gives a “future clinical trials” set of 3902 ring systems out of a possible 458 748. We therefore reduced the set of ring systems to a focused set of around 0.85% of the available chemical ring systems. This can be compared with the drug ring systems which are a privileged set of approximately 0.082% of the chemical scaffolds available. We can thus predict 85% of all new drugs (70% of drugs being Class 1 and one-half of the remaining 30% of Class 2 and above) will come from a combination of ring systems from drugs (378), clinical trials (280), and future clinical trials (3902), which is approximately 1% of the currently reported ring systems. This analysis thus has practical application to library design and is summarized in Figure .

Figure 4

Summary of molecular ring systems (scaffolds) in drugs, clinical trials, reported/synthesized chemical space, and predicted future clinical trial ring systems.

Analysis of Growth Vectors

An additional layer of complexity to add to the previously defined classification system is not only the content of the ring systems but also how they are combined and the associated linking vectors, i.e., the points of attachment from the ring systems. There are subsets of Class 1, Class 2, and Class 3 where the known ring systems use either (i) only the known vectors for linking and growth or (ii) a combination of novel and known vectors or (iii) novel vectors only. From our database of clinical trials and drug molecules we recorded all unique growth vector combinations for each ring system when we generate the frameworks. We collated this set and compared the clinical growth vectors with those of drugs (see Table ), and in both cases, we recorded the enantiomeric form of each growth vector. There is approximately a 40% overlap of total ring space of clinical trial ring space with drug ring space, but if the growth vector combinations are included in the comparison then the overlap drops by about 15%, implying that for the drug rings being reused in clinical trials one-third of those utilize a different set of growth vectors which would achieve additional novelty. This suggests that around three-quarters of the drug rings that are used in clinical trials are not only the same rings but also the same points of attachment.

Table 7

Summary of Growth Vectors Used in Ring Systems from Molecules in Drugs and Clinical Trials

status	total no. of unique rings	rings from drugs	no. of unique rings and vector combinations	overlap of vectors and ring systems with drug vectors
Phase 1	191	101 (53%)	377	129 (34%)
Phase 2	278	128 (46%)	583	177 (30%)
Phase 3	169	91 (54%)	290	30 (37%)
all phases	450	170 (38%)	1003	239 (24%)
drugs	378		909

Another area of interest is the total number of growth vectors used per ring system within drugs and clinical trials. The number of vectors per scaffold can be used to guide how molecules can be assembled along with a suggestion for the number of growth vectors that are typically used for new ring systems. The average growth vector per ring per ring system was determined across “drugs”, “all phases”, and “drugs and clinical” sets (Figure ). One key observation was that the number of growth vectors per ring was disproportionately higher for monocycles (around 2.5). It is unclear whether the preceding observation just reflected the known chemistry and available synthetic handles on certain monocycles, or an increase in complexity in the rings is often balanced by a decrease in complexity for substitution patterns, or a more physical justification exists (e.g., ortho substitution in aromatic rings to influence rotamer populations or specific protein target interactions based on structure-based design or to prevent a metabolic process). For bicycles and above, our analysis suggested that ring systems in drugs and clinical trials usually have around 1 vector per ring, e.g., a bicycle ring system typically has 2 vectors and a tricycle has 3 vectors.

Figure 5

Average number of vectors used per ring vs number of rings per ring system for sets of molecules from drugs and clinical trials. Error bars are the standard deviation of each distribution.

Combining Ring Systems and Networks

We have shown that current drug and clinical trial ring systems have a combined total of 678 ring systems out of a possible 458 748 ring systems in available and reported chemical space. The potential number of combinations of these ring systems is huge as we have previously demonstrated by just combining two drug ring systems. A key question when optimizing compounds or indeed designing a library is which pool of ring systems should you pick from to maximize the probability of success while enabling a patent position. Furthermore, this question highlights the importance of the underlying ring systems as if they do not bind to the target or they have an intrinsic liability, i.e., are not productive, then the combinations of rings may also be nonproductive but on a much larger scale. In this section, we investigated how ring systems are combined to form molecules from drugs or clinical trials. To analyze how ring systems are combined, graph theory was used whereby a series of graph networks were built for each clinical phase, a combined clinical set, and drugs. Each node in the network graph represented a ring system, and if two ring systems were in the same molecule then they were directly connected in the graph. These network diagrams can be used to identify patterns in how ring systems are assembled to form full compounds that are present in the clinic or drugs. This can subsequently be used to bias the design of virtual or screening libraries to clinically relevant chemical space. Similar graph networks are used in social network analysis, and a common way to derive patterns across such complex networks is to cluster the network into sets of nodes (i.e., ring systems) that are densely connected. In this work, the goal of clustering each network was to identify groups of ring systems that frequently occurred together in drugs and/or clinical trials. All networks in this paper were built and then clustered with the Girvan–Newman algorithm[58] within NetworkX[59] and then visualized with Cytoscape.[60] The largest cluster in each network was positioned at the top left of each diagram. Common statistics for each network are outlined in Table , including the graph density and the isolated fraction. The graph density is the fraction of connections in the network compared to whether all nodes were connected, and the isolated fraction is the fraction of nodes that are not connected to anything, i.e., the fraction of molecules where the ring system is not connected to any other ring systems.

Table 8

Network Statistics for the Compounds in Each Clinical Phase (Phases 1–3), Combined Clinical Set, and Drug Compounds

status	no. of nodes	no. of edges	isolated fraction	density (×10^–2)
Phase 1	191	452	0.08	2.45
Phase 2	278	710	0.15	1.83
Phase 3	169	320	0.15	2.22
all phases	450	1184	0.14	1.17
drugs	377	590	0.26	0.83

The clustering of nodes within Phases 1–3 was quite similar, but the densities varied a lot. The density of the drug network was much smaller than that of the combined clinical set and had double the fraction of isolated nodes. It seems that the ring systems in drugs are more sparsely connected than their clinical trial counterparts. This implies greater complexity in connections for clinical trial rings. Next, the property space of the three largest clusters in the drug network (Figure S1) were compared to the overall scaffold space. The property distributions (see Figure S2) for ring count, the number of nitrogens, oxygens, sulfurs, heteroatoms, and the number of sp3 centers were calculated for the three largest clusters in the drugs network. There were not many significant differences between the properties of each cluster. Most notably, one cluster (cluster 3) was very deficient in oxygen atoms and monocycles, which could suggest that ring systems with high oxygen contents are not often paired with other ring systems. The property distributions of the three largest clusters in the clinical trial network (Figure a) were then determined (Figure S3). There were a few large differences between clusters in clinical trials and drugs: (1) a lower proportion of monocycles in clinical trials and 2) a greater proportion of nitrogen atoms in the clinical ring systems.

Figure 6

Network diagram showing how ring systems are connected in (a) all compounds in clinical trials. Color of the node represents the highest phase that each ring system can be found. Key: Phase 1 = blue, Phase 2 = green, Phase 3 = purple; drugs = black. (b) Network diagram depicting ring systems in clinical trials that are commonly found in kinase inhibitors (blue nodes); remaining black nodes are all other ring systems present in clinical trials. Central node of the top left cluster (largest cluster) in each subfigure represents benzene. One topic of interest was to determine how target-specific ring systems were distributed across the scaffold network. For the sake of simplicity, we focused on the distribution of “kinase-specific” ring systems within the clinical network. Here, kinase-specific referred to any ring system for which over one-half of the compounds it appeared in were kinase inhibitors. It can be seen in Figure b that the kinase-specific scaffolds were distributed across the entire network. In general, one kinase-specific ring system would be connected to a popular nonspecific kinase scaffold, e.g., benzene (top left cluster). However, there are a few small clusters in which kinase-specific ring systems are connected to each other. This network clearly identifies privileged ring system pairs that usually appear together in kinase inhibitors. These findings can be used to guide the design of target-specific virtual libraries. Graph topology measurements, such as how well connected each node is (degree centrality), can be used to identify key ring systems in the drug network (Figure S1). The overlap between the top 10, 20, and 50 scaffolds by frequency and degree centrality was 80%, 75%, and 68%, respectively. These results show that ordering nodes by graph topology measurements has some correlation to ordering by frequency, but these two approaches do not give identical results. For the sake of comparison, the top 10 ring systems by degree centrality are shown in Table . The frequency of a ring system in drug compounds gives an idea of how “privileged” a particular scaffold is with respect to “drug space”. However, a ring system could be frequently occurring in drugs but has only been combined with a limited number of ring systems. It could be argued that the design of a general hit ID library should be centered around scaffolds that frequently occur in drugs and those that have appeared in a variety of contexts. The use of network graphs allows for the simple identification of scaffold “hubs” that are frequently occurring and have been reported in a diverse range of compounds. There are a few other graph metrics of potential interest to library design: (1) prioritize scaffolds by how well connected the scaffold and its nearest neighbors are (eigenvalue centrality) or (2) identify scaffolds that connect different clusters (betweenness centrality).

Table 9

Top 10 Most Frequently Connected Ring Systems from Small Molecule Drugs Listed in the FDA Orange Book before January 2020 Sorted by Descending Frequency of Connections (fc) and Then Ascending Molecular Weight

The potential practical uses of these graphs fall into a few camps: library visualization, library generation, and compound design. The main utility of these graphs is likely for library visualization and interactive analysis. The graphs provide a visual way to determine how central each ring system is to a compound collection, not just by a simple count but by how many other scaffolds it is connected to and thus how integral each ring system is to the chemical diversity of the compound library. Furthermore, these graphs show how different regions of scaffold space are connected and the density of such connections.

Conclusion

In this work, an in-depth analysis of the scaffold chemical space of compounds in clinical trials has been carried out, and the results have been compared to ring systems in FDA drugs. It was found that around 70% of all clinical trial compounds only contain ring systems that are present in drugs, and we introduced a new classification system for these molecules based on the ring system origins, i.e., Class 1. This result mirrored findings from previous work[15] in which 70% of all newly released drugs were shown to only contain rings already in drugs. While we may have expected higher novelty in clinical trials when using our classification of molecules through the origin of the ring systems, this was not seen. However, when considering the complete set of ring systems used across all molecules in clinical trials there is a different conclusion in that the overall pool of new ring systems in clinical trials is greater than those ring systems from drugs; therefore, we are introducing more new ring systems in clinical trials. However, this is balanced by more frequent use of known drug ring systems compared with the new ring systems along with different growth vectors and combinations. One area we have not explored in this work is what ring systems are present in compounds that failed in the clinic. While failures in the clinic are of great interest to the field, the data presents a few issues that prohibit drawing meaningful conclusions. Compounds do not always fail at the clinic for scientific reasons and the reason for failure is often not included in clinical databases. Thus, any trends we could draw from failed compounds and the derived scaffolds are not necessarily a direct result of any underlying chemical issue. It was noted that many novel clinical ring systems were closely related to existing ring systems in drugs. To test this hypothesis, up to two atom changes were performed on all drug rings and the enumerated rings matched to novel ring systems. It was found that around 50% of novel ring systems in clinical trials were within two atom changes of an existing drug ring system. We carried out one of the largest recursive fragmentation protocols to date on over 2.24 billion compounds that cover all available public and commercial compounds. This “real” chemical space contained over 450 000 unique ring systems (named RINGO database). This data set provides an estimation for all of the rings that are available in synthesizable chemical space, where previous work in the literature has focused on virtual space[38,44] or bioactive ring space.[44,46] This data set builds on the work of others[44,46] to generate drug-like rings via virtual enumeration. Given our earlier observations that around 50% of future clinical trials scaffolds will be within two atom changes of rings in drugs or clinical trials, from 458 748 ring systems, 3902 ring systems were prioritized as future clinical trial scaffolds using not only the two atom change methodology but also heteroatom ratios derived from drugs and prevalence in public and commercial libraries. Using these simple ligand-based rules, we predicted around 1% of the “real” scaffold chemical space will encompass the ring systems used in 85% of new drugs. Moreover, we would highly recommend that the ratios of heteroatoms in ring systems and simple atom changes be used to help prioritize new ring systems that fall outside of the current analysis. Several analyses were performed to compare growth vectors in drugs and clinical trials and how compounds were built up. There was a 40% overlap between the rings in clinical trials and drugs, but if the growth vector combinations were included in the comparison then the overlap dropped by about 10%. This implied that around one-quarter of all drug ring systems in clinical trials explored novel growth vector combinations. To analyze how compounds were built up in drugs or clinical trials, a graph was built for each collection in which scaffolds that appeared in the same compound were connected by an edge. This analysis showed that, on average, ring systems in clinical trials had been combined with a much wider variety of scaffolds and were half as likely to have never been combined with another unique scaffold. These observations suggested that a greater variety of vector and scaffold combinations are used in clinical trials compared to drugs. This could be symptomatic of the introduction of more structure-based methods and modern synthetic routes in newer compounds found in clinical trials. The number of vectors per ring system as a function of ring systems remained the same in drug and clinical trials. It was noted that ring systems with more than one ring had an average of one growth vector per ring. We believe these observations on vector count per ring system are useful in focusing the directions for not only the optimization of molecules but also the number of vectors used during synthesis of novel ring systems in molecular libraries. Over the course of the work guidance on what clinically relevant clinical space is most likely to look like has been provided. The authors believe that the analysis described here will provide value through efficiently directed synthesis of clinical candidate molecules which feature fewer liabilities, reducing unknown risk in drug discovery.

47 in total

Review 1. Similarity searching using 2D structural fingerprints.

Authors: Peter Willett
Journal: Methods Mol Biol Date: 2011

2. Photometry-based estimation of the total number of stars in the Universe.

Authors: Lazo M Manojlović
Journal: Appl Opt Date: 2015-07-20 Impact factor: 1.980

3. The scaffold tree--visualization of the scaffold universe by hierarchical scaffold classification.

Authors: Ansgar Schuffenhauer; Peter Ertl; Silvio Roggo; Stefan Wetzel; Marcus A Koch; Herbert Waldmann
Journal: J Chem Inf Model Date: 2007 Jan-Feb Impact factor: 4.956

4. Generation of a set of simple, interpretable ADMET rules of thumb.

Authors: M Paul Gleeson
Journal: J Med Chem Date: 2008-01-31 Impact factor: 7.446

5. The 'rule of three' for fragment-based drug discovery: where are we now?

Authors: Harren Jhoti; Glyn Williams; David C Rees; Christopher W Murray
Journal: Nat Rev Drug Discov Date: 2013-07-12 Impact factor: 84.694

6. The properties of known drugs. 1. Molecular frameworks.

Authors: G W Bemis; M A Murcko
Journal: J Med Chem Date: 1996-07-19 Impact factor: 7.446

Review 7. Combining Molecular Scaffolds from FDA Approved Drugs: Application to Drug Discovery.

Authors: Richard D Taylor; Malcolm MacCoss; Alastair D G Lawson
Journal: J Med Chem Date: 2016-12-23 Impact factor: 7.446

8. Molecular properties that influence the oral bioavailability of drug candidates.

Authors: Daniel F Veber; Stephen R Johnson; Hung-Yuan Cheng; Brian R Smith; Keith W Ward; Kenneth D Kopple
Journal: J Med Chem Date: 2002-06-06 Impact factor: 7.446

9. Ultra-large library docking for discovering new chemotypes.

Authors: Jiankun Lyu; Sheng Wang; Trent E Balius; Isha Singh; Anat Levit; Yurii S Moroz; Matthew J O'Meara; Tao Che; Enkhjargal Algaa; Kateryna Tolmachova; Andrey A Tolmachev; Brian K Shoichet; Bryan L Roth; John J Irwin
Journal: Nature Date: 2019-02-06 Impact factor: 49.962

10. ZINC: a free tool to discover chemistry for biology.

Authors: John J Irwin; Teague Sterling; Michael M Mysinger; Erin S Bolstad; Ryan G Coleman
Journal: J Chem Inf Model Date: 2012-06-15 Impact factor: 4.956

1 in total

1. A pocket-based 3D molecule generative model fueled by experimental electron density.

Authors: Lvwei Wang; Rong Bai; Xiaoxuan Shi; Wei Zhang; Yinuo Cui; Xiaoman Wang; Cheng Wang; Haoyu Chang; Yingsheng Zhang; Jielong Zhou; Wei Peng; Wenbiao Zhou; Bo Huang
Journal: Sci Rep Date: 2022-09-06 Impact factor: 4.996

1 in total