Vincent Le Guilloux, Alban Arrault, Lionel Colliandre, Stéphane Bourg, Philippe Vayer, Luc Morin-Allory.
Abstract
BACKGROUND: High-throughput screening assays have become the starting point of many drug discovery programs for large pharmaceutical companies as well as academic organisations. Despite the increasing throughput of screening technologies, the almost infinite chemical space remains out of reach, calling for tools dedicated to the analysis and selection of the compound collections intended to be screened.
Year: 2012 PMID: 23327565 PMCID: PMC3547782 DOI: 10.1186/1758-2946-4-20
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Figure 1. Full workflow of the SDF import process in SA2.
List of properties and flags that are automatically calculated when importing new molecules
| | | | |
|---|---|---|---|
| SMARTS | Reactive | Reactive compounds (15) | [ |
| | Warhead | Warhead compounds (20) | [ |
| | PAINS <15 | Pan Assay Interference Compounds (409) | [ |
| | PAINS <150 | Pan Assay Interference Compounds (55) | [ |
| | PAINS >150 | Pan Assay Interference Compounds (16) | [ |
| Flags | RO5 | Lipinski’s rule of 5 * | [ |
| | RO3 | Fragment rule of 3 * | [ |
| | Exotic | Unrecognised atom type * | - |
| | Salt | Disconnected structures | - |
| Descriptors | Weight | Molecular weight | - |
| | LogP | Calculated logP * | - |
| | Heavy atoms | Number of heavy atoms | - |
| | HBA | Number of hydrogen bond acceptors * | - |
| | HBD | Number of hydrogen bond donors * | - |
| | Halogens | Number of halogen atoms | - |
| | Rot. Bonds | Number of rotatable bonds * | - |
| | Ring count | Number of rings (SSSR) | - |
| | Max Ring size | Maximum size of rings | - |
The definition of descriptors marked by an asterisk is handler-specific.
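The RO5 and RO3 flags in the table above can be illustrated with a minimal sketch. It assumes the commonly cited thresholds for Lipinski's rule of 5 and the fragment rule of 3; the source does not state which thresholds or violation tolerance SA2 applies, and the descriptor definitions themselves are handler-specific. The `Descriptors` class, the function names and the two example molecules are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Hypothetical container for a few descriptors from the table."""
    weight: float    # molecular weight (Da)
    logp: float      # calculated logP
    hba: int         # hydrogen bond acceptors
    hbd: int         # hydrogen bond donors
    rot_bonds: int   # rotatable bonds


def ro5_violations(d: Descriptors) -> int:
    """Count violations of the commonly cited rule-of-5 thresholds."""
    return sum([d.weight > 500, d.logp > 5, d.hbd > 5, d.hba > 10])


def passes_ro5(d: Descriptors, max_violations: int = 1) -> bool:
    # Assumption: at most one violation is tolerated; SA2 may differ.
    return ro5_violations(d) <= max_violations


def passes_ro3(d: Descriptors) -> bool:
    # Assumption: usual fragment thresholds, no violation allowed.
    return (d.weight <= 300 and d.logp <= 3 and d.hbd <= 3
            and d.hba <= 3 and d.rot_bonds <= 3)


# Made-up descriptor values, for illustration only.
small_fragment = Descriptors(weight=150.0, logp=1.0, hba=2, hbd=1, rot_bonds=1)
large_molecule = Descriptors(weight=650.0, logp=6.2, hba=12, hbd=6, rot_bonds=9)

print(passes_ro5(small_fragment), passes_ro3(small_fragment))
print(ro5_violations(large_molecule), passes_ro5(large_molecule))
```

Because HBA, HBD, logP and rotatable bond counts depend on the handler, the same molecule may be flagged differently from one toolkit to another.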
Figure 2. Graphical User Interface overview. Three types of windows are highlighted here: S1 and S2 (blue) are Singleton windows that display information on single molecules. G1 and G2 (red) are two group windows. G1 is an interactive X-Y plot where the Chembridge (dark gray) and AMRI kinase libraries (blue) are plotted. G2 lists the selected molecules that are highlighted in G1. D (green) is a database window that displays the list of providers.
Scaffold and framework representativity
| | Scaffolds (count) | Scaffolds (%) | Frameworks (count) | Frameworks (%) |
|---|---|---|---|---|
| Database | 1 084 411 | 15.24% | 247 689 | 3.47% |
| Singleton | 647 260 | 9% | 104 250 | 1.5% |
| Cumulative freq. (50%) | 11 565 | 1.06% | 447 | 0.18% |
| Cumulative freq. (80%) | 150 777 | 13.87% | 7 741 | 3.13% |
Singletons are core structures that are associated with only one molecule. Percentages for Database and Singleton rows are expressed as the number of scaffolds (Count column) divided by the total number of molecules in the database. Cumulative frequency values represent the number (resp. proportion) of scaffolds that are needed to obtain a certain percentage of compounds (in brackets) in the database.
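The singleton and cumulative frequency rows can be reproduced on a toy example. The sketch below is not the SA2 implementation: it assumes each molecule has already been reduced to a scaffold identifier (a plain string here), and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def scaffolds_needed(scaffold_per_molecule: Iterable[str],
                     coverage_percent: int) -> int:
    """Smallest number of scaffolds, taken from the most populated down,
    whose molecules cover at least coverage_percent of the database."""
    counts = Counter(scaffold_per_molecule)
    total = sum(counts.values())
    covered = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        # Integer comparison avoids floating-point rounding issues.
        if covered * 100 >= coverage_percent * total:
            return rank
    return len(counts)


def singleton_count(scaffold_per_molecule: Iterable[str]) -> int:
    """Number of scaffolds associated with exactly one molecule."""
    counts = Counter(scaffold_per_molecule)
    return sum(1 for n in counts.values() if n == 1)


# Toy database: 10 molecules spread over 4 scaffolds.
toy = ["A"] * 5 + ["B"] * 3 + ["C", "D"]

print(scaffolds_needed(toy, 50))   # scaffold A alone covers 5 of 10 molecules
print(scaffolds_needed(toy, 80))   # A and B together cover 8 of 10
print(singleton_count(toy))        # C and D each occur once
```

A small number of scaffolds at the 50% level, as in the table, indicates that a few highly populated core structures account for a large share of the database.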
Summary of the scaffold composition and unicity analysis
| | Molecules unicity | Scaffolds (%) | Scaffolds unicity | Frameworks (%) | Frameworks unicity |
|---|---|---|---|---|---|
| Min | 0% | 6.3% | 0% | 2.2% | 0% |
| Max | 100% | 84.2% | 87.9% | 57.8% | 49% |
| Average | 24.9% | 24.7% | 13.3% | 11.6% | 5.9% |
| > 10% | 37 | 69 | 27 | 26 | 13 |
| > 20% | 33 | 37 | 17 | 11 | 8 |
| > 50% | 15 | 4 | 4 | 1 | 0 |
Unicity is defined as the proportion of molecules (or scaffolds / frameworks) that are exclusive to a given provider, i.e. that cannot be found in any other provider. The proportion of scaffolds / frameworks is expressed as the number of molecules divided by the number of scaffolds / frameworks associated with a given provider. The minimum, maximum and average values across all vendors are given in this table. The number of vendors for which each index exceeds a given threshold is given in the second part of the table.
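The unicity definition given above maps directly onto set operations. The following is a minimal sketch, assuming each provider is represented by a set of canonical identifiers (molecules, scaffolds or frameworks); the `unicity` function and the provider names are hypothetical and do not reflect how SA2 stores its data.

```python
from typing import Dict, Set


def unicity(catalogues: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of each provider's items found in no other provider."""
    result = {}
    for name, items in catalogues.items():
        # Union of everything offered by the other providers.
        others = set().union(
            *(s for other, s in catalogues.items() if other != name)
        )
        exclusive = items - others
        result[name] = len(exclusive) / len(items) if items else 0.0
    return result


# Toy catalogues: identifiers stand in for canonical structures.
toy = {
    "P1": {"a", "b", "c", "d"},
    "P2": {"c", "d"},
    "P3": {"e"},
}

for provider, value in unicity(toy).items():
    print(provider, f"{value:.0%}")
```

In this toy case P2 has a unicity of 0% and P3 of 100%, which are the two extremes reported in the Min and Max rows of the table.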
Summary of the drug-like analysis
| | | | | | |
|---|---|---|---|---|---|
| Min | 1.3% | 0% | 0% | 0% | 0% |
| Max | 86.7% | 25.6% | 19.5% | 27.5% | 29% |
| Average | 27.7% | 5.7% | 6.9% | 5.9% | 12.3% |
| > 5% | 59 | 36 | 36 | 35 | 66 |
| > 10% | 47 | 13 | 21 | 14 | 48 |
| > 20% | 34 | 1 | 0 | 1 | 7 |
Percentages are expressed as the minimum, maximum and average proportion of molecules that are flagged for each criterion, computed over each provider. The number of providers for which each index exceeds a given threshold is given in the second part of the table.
Figure 3. List of the 20 most populated scaffolds (A) and frameworks (B) in the database of 6.7M compounds. The number of compounds associated with each core structure is displayed in brackets. These pictures were generated using the Scaffold report of SA2.
Figure 4. Comparative distribution of some physico-chemical properties. Chembridge Kinaset (red) and the AMRI kinase library (blue). HBA (resp. HBD) stands for Hydrogen Bond Acceptor (resp. Donor). These histograms can be obtained by simply clicking on the property to analyse in the Properties window of SA2.
Scaffold, framework and compound originality of the AMRI and Chembridge kinase libraries
| | AMRI (count) | AMRI (%) | Chembridge (count) | Chembridge (%) |
|---|---|---|---|---|
| Frameworks | 873 | 27.0% | 2 204 | 19.2% |
| Frameworks unicity | 747 | 85.6% | 2 078 | 94.3% |
| Scaffolds | 1 053 | 32.6% | 4 036 | 35.1% |
| Scaffolds unicity | 1 008 | 95.7% | 3 991 | 98.9% |
| Overlap (compounds) | 1 molecule | | | |
The proportion of scaffolds (resp. frameworks) is expressed as the number of unique scaffolds (resp. frameworks) divided by the number of molecules in the library. The scaffolds (resp. frameworks) unicity is expressed as the number of scaffolds (resp. frameworks) unique to the library divided by the total number of scaffolds (resp. frameworks) in the library.
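As a quick consistency check, the unicity percentages can be recomputed from the counts in the table using the definition above (count unique to the library divided by the total count in the library). The labels "first library" and "second library" simply refer to the first and second pair of value columns; the helper `percent` is hypothetical.

```python
def percent(part: int, whole: int) -> float:
    """Ratio expressed as a percentage, rounded to one decimal."""
    return round(100 * part / whole, 1)


# (unique frameworks, frameworks, unique scaffolds, scaffolds), from the table.
table = {
    "first library": (747, 873, 1008, 1053),
    "second library": (2078, 2204, 3991, 4036),
}

for name, (fw_unique, fw_total, sc_unique, sc_total) in table.items():
    print(name,
          "frameworks unicity:", percent(fw_unique, fw_total),
          "scaffolds unicity:", percent(sc_unique, sc_total))
```

The recomputed values agree with the percentages printed in the unicity rows.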
Figure 5. PCA projections of the Chembridge (black dots) and AMRI (red dots) libraries. The first reduced space (A) has been computed within SA2 on the entire kinase database using the CDK BCUT descriptors, which were computed upon import. The second reduced space (B) is the DRCS-MOE2D space, which is already available in new SA2 databases and for which descriptor values were imported. The contour shown in black encompasses the densest region spanned by HTS compounds (see [44] for details).
Figure 6. PCA projections of the diverse subset (red dots) in the two spaces described previously. The remaining molecules (AMRI + Chembridge) are drawn in black. The figure has been generated with the DRCS plot window of SA2.
Diversity evaluation for diverse and random subsets
| | | | |
|---|---|---|---|
| Scaffold% | 84% | 61% | 63% |
| Framework% | 52% | 44% | 44% |
| MACCS | | | |
| Avg. pairwise | 0.44 | 0.48 | 0.48 |
| Avg. NN | 0.76 | 0.88 | 0.88 |
| Max. sim. | 0.80 | 1.00 | 1.00 |
| Pubchem | | | |
| Avg. pairwise | 0.48 | 0.50 | 0.50 |
| Avg. NN | 0.82 | 0.87 | 0.87 |
| Max. sim. | 0.98 | 1.00 | 1.00 |
| Indigo | | | |
| Avg. pairwise | 0.26 | 0.29 | 0.29 |
| Avg. NN | 0.70 | 0.81 | 0.81 |
| Max. sim. | 1.00 | 1.00 | 1.00 |
The percentages of scaffolds and frameworks are reported for each library. The Tanimoto metric was used with three different fingerprints, all of which can be computed directly within SA2, to calculate the average pairwise similarity (Avg. pairwise), the average nearest neighbor similarity (Avg. NN), and the maximum pairwise similarity (Max. sim.) within each library. These data were generated with the Similarity report and the Scaffold report of SA2.
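The three similarity statistics can be sketched as follows. This is an illustration only, not the code behind the Similarity report of SA2: fingerprints are assumed to be sets of "on" bit positions, and the function names and toy fingerprints are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient between two sets of 'on' bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_report(fps: List[FrozenSet[int]]) -> Dict[str, float]:
    """Average pairwise, average nearest neighbor and maximum similarity."""
    n = len(fps)
    if n < 2:
        raise ValueError("at least two fingerprints are required")
    nearest = [0.0] * n   # best similarity seen so far for each molecule
    total = 0.0
    for i, j in combinations(range(n), 2):
        s = tanimoto(fps[i], fps[j])
        total += s
        nearest[i] = max(nearest[i], s)
        nearest[j] = max(nearest[j], s)
    pairs = n * (n - 1) // 2
    return {
        "avg_pairwise": total / pairs,
        "avg_nn": sum(nearest) / n,
        "max_sim": max(nearest),
    }


# Toy fingerprints, given as sets of bit positions.
toy = [frozenset({1, 2, 3, 4}), frozenset({1, 2}), frozenset({5})]
report = similarity_report(toy)
print(report)
```

A maximum similarity of 1.00, as reported for several columns of the table, means that at least two molecules in the library share an identical fingerprint. The pairwise loop is quadratic in the number of molecules, which is acceptable for a sketch but would need optimisation for large libraries.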