J Jesús Naveja1,2, José L Medina-Franco1.
Abstract
We present a novel approach, called ChemMaps, for visualizing chemical space based on the similarity matrix of compound datasets computed from molecular fingerprints. The method uses a 'satellites' approach, where satellites are, in principle, molecules whose similarity to the rest of the molecules in the database provides sufficient information for generating a visualization of the chemical space. Such an approach could make chemical space visualizations more efficient. We describe a proof-of-principle application of the method to various databases that differ in their diversity measures. Unsurprisingly, we found that the method works better with databases that have low 2D diversity. 3D diversity played a secondary role, although it seems to become more relevant as 2D diversity increases. For less diverse datasets, taking as few as 25% of the molecules as satellites seems to be sufficient for a fair depiction of the chemical space. We propose to iteratively increase the number of satellites in steps of 5% of the whole database, and to stop when the new and the prior chemical space correlate highly. This Research Note represents a first exploratory step, prior to the full application of this method to several datasets.
Keywords: chemical space; data visualization; epigenetics; principal components analysis; similarity matrix
Year: 2017 PMID: 28794856 PMCID: PMC5538041 DOI: 10.12688/f1000research.12095.2
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Compound data sets used in the study.
| Dataset | Description | Size | 2D similarity, MACCS (a) | 2D similarity, ECFP4 (b) | 3D similarity (c) |
|---|---|---|---|---|---|
| DNMT1 inhibitors | DNA-methyltransferase | 244 | 0.44 | 0.12 | 0.16 |
| SMARCA2 inhibitors | Chromatin remodeller | 220 | 0.51 | 0.15 | 0.23 |
| CREBBP inhibitors | Histone acetyltransferase | 178 | 0.67 | 0.22 | 0.16 |
| L3MBTL3 inhibitors | Histone methylation | 115 | 0.77 | 0.41 | 0.03 |
| HDAC1 inhibitors | Histone deacetylase | 3,257 | 0.49 | 0.16 | 0.12 |
| DrugBank | Approved drugs | 1,900 | 0.35 | NC | NC |
(a) Median of Tanimoto/MACCS similarity; (b) Median of Tanimoto/ECFP4 similarity; (c) Median of OMEGA-ROCS similarity; NC: not calculated.
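The 2D diversity values in the table are medians of pairwise Tanimoto similarities. As a minimal sketch of how such a median could be computed, the snippet below works on binary fingerprints stored as a NumPy array. The function names and the toy fingerprints are assumptions made for illustration; they are not the authors' code, and real MACCS or ECFP4 fingerprints would come from a cheminformatics toolkit.

```python
import numpy as np


def tanimoto_matrix(fps: np.ndarray) -> np.ndarray:
    """Pairwise Tanimoto similarity for binary fingerprints (one row per molecule)."""
    fps = fps.astype(np.int64)
    common = fps @ fps.T                       # bits set in both fingerprints
    counts = fps.sum(axis=1)                   # bits set in each fingerprint
    union = counts[:, None] + counts[None, :] - common
    # Assumption: two all-zero fingerprints (empty union) get similarity 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, common / union, 0.0)


def median_pairwise_similarity(sim: np.ndarray) -> float:
    """Median over the upper triangle, i.e. each pair once, self-pairs excluded."""
    upper = np.triu_indices_from(sim, k=1)
    return float(np.median(sim[upper]))


# Toy fingerprints (hypothetical, 4 bits each) standing in for MACCS or ECFP4.
toy_fps = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
])
toy_sim = tanimoto_matrix(toy_fps)
toy_median = median_pairwise_similarity(toy_sim)
print(round(toy_median, 3))
```

A higher median indicates a less diverse dataset, which is the regime where the abstract reports that fewer satellites are needed.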
Benchmark with larger databases.
| Database | Gold standard | Satellites | Correlation |
|---|---|---|---|
| DrugBank | 162 | 147 | 0.92 |
| HDAC1 | 406 | 287 | 0.99 |
Figure 1. Backwards analysis with 2PCs picking satellites by diversity.
The correlation with the results from the whole matrix was calculated with increasing numbers of satellites. Each colored line represents one of the five iterations.
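The caption does not say how satellites are picked "by diversity". One common option is a greedy MaxMin selection on the distance 1 − similarity, sketched below. This is an assumption chosen for illustration, not necessarily the procedure used in the study; the function name `pick_diverse_satellites` and the toy matrix are hypothetical.

```python
import numpy as np


def pick_diverse_satellites(sim: np.ndarray, n_satellites: int, seed: int = 0) -> np.ndarray:
    """Greedy MaxMin pick: each new satellite is the molecule farthest
    from its nearest already-chosen satellite (distance = 1 - similarity)."""
    n = sim.shape[0]
    if not 1 <= n_satellites <= n:
        raise ValueError("n_satellites must be between 1 and the number of molecules")
    dist = 1.0 - sim
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(n))]            # random starting molecule
    min_dist = dist[chosen[0]].copy()          # distance to the nearest chosen satellite
    while len(chosen) < n_satellites:
        min_dist[chosen] = -np.inf             # never pick the same molecule twice
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, dist[nxt])
    return np.array(chosen)


# Toy similarity matrix (hypothetical): molecules 0-1 and 2-3 form two tight clusters.
toy_sim = np.array([
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
])
picked = pick_diverse_satellites(toy_sim, n_satellites=2)
print(picked)
```

With two satellites requested, the sketch returns one molecule from each cluster, whichever molecule the random start lands on.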
Figure 2. Backwards analysis with 2PCs picking satellites at random.
The correlation with the results from the whole matrix was calculated with increasing numbers of satellites. Each colored line represents one of the five iterations.
Figure 3. Forward analysis with 2PCs picking satellites at random, with step sizes of 5%.
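The forward procedure can be sketched as a loop that adds randomly picked satellites in steps of 5% of the database and stops once two consecutive maps agree. The sketch below rests on several assumptions that the text does not spell out: each map is a 2-component PCA of the molecules-by-satellites similarity columns, agreement is the Pearson correlation between the pairwise Euclidean distances of consecutive maps, and the stopping threshold is 0.9. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def pca_scores(X: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project the rows of X onto the first principal components (via SVD)."""
    centered = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def pairwise_distances(coords: np.ndarray) -> np.ndarray:
    """Euclidean distances between all pairs of rows, each pair listed once."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist[np.triu_indices(len(coords), k=1)]


def adaptive_satellites(sim, step_fraction=0.05, threshold=0.9, seed=0):
    """Add random satellites in steps until consecutive maps correlate highly.

    Returns (2D coordinates, number of satellites used, last correlation).
    For simplicity the full matrix is sliced here; in practice only the
    columns for the chosen satellites would need to be computed.
    """
    n = sim.shape[0]
    order = np.random.default_rng(seed).permutation(n)
    step = max(2, int(np.ceil(step_fraction * n)))   # at least 2 columns for 2 PCs
    previous, k = None, step
    while True:
        k = min(k, n)
        coords = pca_scores(sim[:, order[:k]])
        current = pairwise_distances(coords)
        if previous is not None:
            r = float(np.corrcoef(previous, current)[0, 1])
            if r >= threshold or k == n:
                return coords, k, r
        elif k == n:
            return coords, k, float("nan")
        previous = current
        k += step


# Synthetic stand-in for a similarity matrix: three clusters of 2D points,
# with similarity defined as 1 / (1 + distance). Not real chemical data.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
points = np.vstack([c + rng.normal(size=(40, 2)) for c in centers])
d = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
toy_sim = 1.0 / (1.0 + d)

coords, n_used, last_r = adaptive_satellites(toy_sim)
print(n_used, round(last_r, 3))
```

Comparing pairwise distances rather than raw coordinates keeps the comparison insensitive to sign flips and rotations of the principal components between iterations, which is why it was chosen for this sketch.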
Figure 4. Chemical space of DrugBank using (A) the adaptive satellites approach or (B) the gold standard, and of HDAC1 using (C) the adaptive satellites approach or (D) the gold standard.