Oleksandr O Grygorenko1,2, Dmytro S Radchenko1,2, Igor Dziuba3, Alexander Chuprina4, Kateryna E Gubina2, Yurii S Moroz2,3.
Abstract
An approach to the generation of ultra-large chemical libraries of readily accessible ("REAL") compounds is described. The strategy is based on two- or three-step three-component reaction sequences and available starting materials with pre-validated chemical reactivity. After preliminary parallel experiments, the methods with a synthesis success rate of at least ∼80% (such as acylation–deprotection–acylation of monoprotected diamines, or amide formation followed by a click reaction with functionalized azides) can be selected and used to generate the target chemical space. It is shown that using only the two aforementioned reaction sequences readily gives a library of nearly 29 billion compounds. According to the predicted physico-chemical descriptor values, the generated chemical space contains large fractions of both drug-like and "beyond rule-of-five" members, whereas the strictest lead-likeness criteria (the so-called Churcher's rules) are met by a much smaller fraction, which still exceeds 22 million compounds.
Keywords: Chemical Compound; Cheminformatics; Computational Chemistry by Subject
Year: 2020 PMID: 33145486 PMCID: PMC7593547 DOI: 10.1016/j.isci.2020.101681
Source DB: PubMed Journal: iScience ISSN: 2589-0042
Figure 1. A General Principle of the REAL Database Generation Using One-Step Two-Component Reactions
Figure 2. An Approach to the Generation of Ultra-large Chemical Space Described in This Work
Scheme 1. Parallel Reaction Sequences Studied in This Work.
See also Tables S1 and S2, Figures S1 and S2.
Validation Experiments for the Parallel Synthesis of Libraries 11–15
| # | Method | Conditions | Library | Success Rate, % | Average Yield (All Experiments), % | Average Yield (Successful Experiments), % |
|---|---|---|---|---|---|---|
| 1 | | 1. HATU, | | 77 | 34 | 44 |
| 2 | | 1-2. Same as for | | 60 | 26 | 43 |
| 3 | | 1-2. Same as for | | 53 | 16 | 31 |
| 4 | | 1. | | 81 | 30 | 38 |
| 5 | | 1. HATU, | | 80 | 41 | 51 |
See Also Tables S1 and S2, Figures S1 and S2.
Fraction of 400 experiments that allowed for the preparation of the target product.
NMP, N-methyl-2-pyrrolidone.
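The two average-yield columns of the validation table appear to be linked through the success rate. The sketch below checks this under one stated assumption that is not in the source: a failed experiment counts as 0% yield, so the yield over all experiments is the success rate multiplied by the yield over successful experiments. The function and variable names are illustrative only.

```python
# Rows of the validation table, in order:
# (success rate %, average yield over all experiments %,
#  average yield over successful experiments %)
ROWS = [(77, 34, 44), (60, 26, 43), (53, 16, 31), (81, 30, 38), (80, 41, 51)]


def expected_overall_yield(success_rate_pct, successful_yield_pct):
    """Overall average yield, ASSUMING failed experiments contribute 0%."""
    return success_rate_pct / 100 * successful_yield_pct


# Absolute difference between the estimate and the tabulated overall yield.
deviations = [
    abs(expected_overall_yield(rate, y_ok) - y_all)
    for rate, y_all, y_ok in ROWS
]

for row, dev in zip(ROWS, deviations):
    print(row, f"deviation = {dev:.2f} percentage points")
```

Under that assumption, every row agrees with the tabulated value to within one percentage point, which is what rounding of the published integers would allow.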
Figure 3. Examples of Reagents 1–3 Showing Excellent and Poor Efficiency in the Methods Studied (Relative Configurations Are Shown)
Figure 4. Examples of Synthons Generated from Reagents 1, 2, 4, 5, 8, and 10 (in the Corresponding SMILES Representations, Uncommon ["Dummy"] Atoms Are Used Instead of the Colored Asterisks [∗] to Denote Different Types of the Variation Points)
Scheme 2. Virtual Coupling of the Synthons Shown in Figure 4 (the Variation Points [∗] Are Connected according to Their Types)
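The idea of connecting variation points according to their types can be illustrated with a toy enumeration. This is only a sketch and not the authors' implementation: the `Synthon` type, the point types "A" and "B", and the fragment names are hypothetical placeholders, and no real SMILES handling or bond formation is attempted.

```python
from itertools import product
from typing import NamedTuple


class Synthon(NamedTuple):
    text: str          # placeholder fragment label (not validated chemistry)
    points: frozenset  # types of variation points carried by the fragment


def enumerate_products(cores, terminals):
    """Combine each core with one terminal synthon per variation-point type."""
    results = []
    for core in cores:
        # One pool of compatible single-point synthons per point type of the core.
        pools = [
            [t for t in terminals if t.points == frozenset({ptype})]
            for ptype in sorted(core.points)
        ]
        for combo in product(*pools):
            results.append((core.text,) + tuple(t.text for t in combo))
    return results


# Hypothetical example: two cores with points A and B, two A-caps, one B-cap.
cores = [
    Synthon("core-1", frozenset({"A", "B"})),
    Synthon("core-2", frozenset({"A", "B"})),
]
terminals = [
    Synthon("cap-A1", frozenset({"A"})),
    Synthon("cap-A2", frozenset({"A"})),
    Synthon("cap-B1", frozenset({"B"})),
]
library = enumerate_products(cores, terminals)
print(len(library))  # 2 cores x 2 type-A caps x 1 type-B cap
```

The library size is the product of the pool sizes, which is why a few tens of thousands of synthons can span billions of virtual products.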
Numbers of Synthon Types Generated for Reaction Sequences and
| # | Method | Reagents | No. of Synthons: No Reactivity Features | No. of Synthons: With Steric Features | No. of Synthons: Total |
|---|---|---|---|---|---|
| 1 | | | 467 | 196 | 663 |
| 2 | | | 6,706 | 1,271 | 7,977 |
| 3 | | | 5,451 | 1,063 | 6,514 |
| 4 | | | 41 | 0 | 41 |
| 5 | | | 52 | 0 | 52 |
| 6 | | | 17,944 | 550 | 18,494 |
| 7 | | | 26,434 | 646 | 27,080 |
| 8 | | | 807 | 0 | 807 |
With steric hindrance at the free amino group (103), the protected amino group (82), or both (11).
Figure 5. The Workflow of the Multibillion Chemical Space Generation
Results of the Multibillion Chemical Space Generation
| # | Method | No. of Synthons | Library Members after Virtual Coupling | Library Members after Exclusion Filters | Library Members after Duplicate Removal |
|---|---|---|---|---|---|
| 1 | | 15,154 | 34,450,924,014 | 32,733,348,058 | 27,297,397,644 |
| 2 | | 46,474 | 1,748,296,098 | 1,748,296,098 | 1,563,752,616 |
| 3 | Total | 60,431 | 36,199,220,112 | 34,481,644,156 | |
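The stage-by-stage counts in this table can be cross-checked with a few lines of arithmetic. The sketch below sums the two method rows and computes the fraction of virtual products that survives filtering and duplicate removal. The row labels are placeholders, and the summed figure for the duplicate-removal stage rests on an assumption: that no duplicates occur between the two methods. The table itself does not give this total, although the sum is consistent with the "nearly 29 billion" stated in the abstract.

```python
# (after virtual coupling, after exclusion filters, after duplicate removal)
FUNNEL = {
    "row_1": (34_450_924_014, 32_733_348_058, 27_297_397_644),
    "row_2": (1_748_296_098, 1_748_296_098, 1_563_752_616),
}

# Column-wise sums over both rows. The third sum ASSUMES that the two
# methods share no duplicate products; the source table leaves it blank.
totals = tuple(sum(stage) for stage in zip(*FUNNEL.values()))

# Fraction of virtually coupled products kept after all processing steps.
retained = {name: counts[2] / counts[0] for name, counts in FUNNEL.items()}

print("totals per stage:", totals)
for name, fraction in retained.items():
    print(f"{name}: {fraction:.1%} of coupled products retained")
```

The first two sums reproduce the "Total" row of the table exactly.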
Figure 6. Distribution of Physico-Chemical Descriptors Predicted for the Generated Chemical Space and Approved Drugs
See also Table S3.
Average Values of Physico-Chemical Descriptors Predicted for the Generated Chemical Space and Approved Drugs
| # | Method | MW | sLogP | HAcc | HDon | TPSA, Å² | RotB | F |
|---|---|---|---|---|---|---|---|---|
| 1 | | 440 | 2.61 | 5.3 | 1.5 | 93.7 | 6.4 | 0.56 |
| 2 | | 502 | 3.15 | 7.8 | 1.3 | 112.1 | 8.3 | 0.51 |
| 3 | Total | 444 | 2.64 | 5.4 | 1.5 | 94.6 | 6.5 | 0.55 |
| 4 | DrugBank | 395 | 2.05 | 5.1 | 2.4 | 96.9 | 6.4 | 0.47 |
Data for 2,470 drugs deposited in DrugBank (as of September 2020).
Fractions of the Generated Chemical Space (%) Compliant with the Drug- and Lead-likeness Rules
| # | Method | Rule of 5 | + Veber's Rules | Rule of 4.5 | Rule of 4 | Churcher's Rules |
|---|---|---|---|---|---|---|
| 1 | | 89.1 | 82.6 | 56.9 | 16.9 | 0.08 (21,167,934) |
| 2 | | 48.4 | 40.4 | 24.3 | 7.4 | 0.06 (952,402) |
| 3 | Total | 86.9 | 80.3 | 55.1 | 16.5 | 0.08 (22,120,336) |
MW < 500, LogP < 5, HAcc ≤ 10, HDon ≤ 5 (Lipinski et al., 1997).
RotB ≤ 10, TPSA < 140 (Veber et al., 2002).
MW < 450, LogP < 4.5 (Oprea et al., 2001).
MW < 400, LogP < 4 (Hann and Oprea, 2004).
MW 200 … 350, LogP −1 … 3 (Nadin et al., 2012).
Absolute numbers of the library members are given in brackets.
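The thresholds in the footnotes above translate directly into simple predicates. The sketch below encodes only the cut-offs listed here. The function names are illustrative, the inclusive bounds for the Churcher ranges are an assumption, and descriptor values are passed in as plain numbers rather than computed from structures.

```python
def rule_of_5(mw, logp, hacc, hdon):
    """Lipinski cut-offs as listed in the footnote."""
    return mw < 500 and logp < 5 and hacc <= 10 and hdon <= 5


def veber(rotb, tpsa):
    """Veber cut-offs as listed in the footnote."""
    return rotb <= 10 and tpsa < 140


def rule_of_4_5(mw, logp):
    return mw < 450 and logp < 4.5


def rule_of_4(mw, logp):
    return mw < 400 and logp < 4


def churcher(mw, logp):
    """Ranges from the footnote; inclusive bounds are an ASSUMPTION."""
    return 200 <= mw <= 350 and -1 <= logp <= 3


def classify(mw, logp, hacc, hdon, tpsa, rotb):
    """Return a dict of rule name -> pass/fail for one set of descriptors."""
    ro5 = rule_of_5(mw, logp, hacc, hdon)
    return {
        "rule_of_5": ro5,
        "rule_of_5_plus_veber": ro5 and veber(rotb, tpsa),
        "rule_of_4_5": rule_of_4_5(mw, logp),
        "rule_of_4": rule_of_4(mw, logp),
        "churcher": churcher(mw, logp),
    }


# Illustration only: the DrugBank AVERAGE descriptor values from the
# descriptor table, treated as if they belonged to a single compound.
result = classify(mw=395, logp=2.05, hacc=5.1, hdon=2.4, tpsa=96.9, rotb=6.4)
print(result)
```

In this illustration, the averaged descriptors pass every rule except Churcher's, because the molecular weight of 395 lies above the 350 upper bound.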
Figure 7. Relationship between the Size of the Generated Databases and the Size of the Synthon Subsets
Obtained by random selections from the initial synthon set for Method ; average from three independent selections; see also Table S4.
Figure 8. Properties of the Generated Chemical Space as a Function of the Molecular Weight Cut-offs Applied to the Initial Synthon Sets for Method
(A and B) (A) The size of the generated databases. (B) Distribution of physico-chemical descriptors (MW and sLogP) for the generated chemical space.
See also Table S5.
Selected Approaches to Generate (Ultra-)large Virtual Chemical Space
| Feature | Approach Described in This Work | Previous Feasibility-Based Approaches | Recent AI-Based Approaches |
|---|---|---|---|
| Virtual chemical space | Multibillion (over 3 × 10¹⁰) | Large (~10⁹) | Varied but typically less than 10⁹ |
| Synthetic methods | Experimentally validated three-component two- or three-step reaction sequences | Experimentally validated two-component one-step reactions (mostly) | Various; typically based on the literature data (not always validated experimentally) |
| Algorithm | Very straightforward | Very straightforward | Sophisticated |
| Synthetic feasibility | Average value for each method or synthon, described as average synthesis success rate | Average value for each method or synthon, described as average synthesis success rate | Varied; from unknown to predicted for each particular member |
| Building block reactivity assessment | Semi-qualitative; by a chemical expert aided by a computer | Semi-qualitative; by a chemical expert aided by a computer | Typically quantitative; by AI |
The previous version of our REAL methodology is referred to here; much larger datasets were also generated internally within big pharma companies (Hoffmann and Gastreich, 2019).
The subject was reviewed and critically assessed in a number of recent publications (Schneider, 2018; Schwaller and Laino, 2019; Brown et al., 2020; Lemonick, 2020).