Jaroslaw Polanski, Urszula Kucia, Roksana Duszkiewicz, Agata Kurczyk, Tomasz Magdziarz, Johann Gasteiger.
Abstract
The relationship between the structure and a property of a chemical compound is an essential concept in chemistry, guiding, for example, drug design. However, economic considerations are also needed to fully understand the fate of drugs on the market. Here we explore, for the first time, quantitative structure-economy relationships (QSER) for a large dataset: a commercial building block library of over 2.2 million chemicals. This investigation provided molecular statistics showing that, on average, what we pay for is the quantity of matter. On the other hand, the influence of synthetic availability scores is also revealed. Finally, we buy substances by looking at their molecular graphs or molecular formulas. Thus, molecules that have a higher number of atoms look more attractive and are, on average, also more expensive. Our study shows how data binning can be used as an informative method when analyzing big data in chemistry.
Year: 2016 PMID: 27334348 PMCID: PMC4917835 DOI: 10.1038/srep28521
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Comparison of the weight-based (WBM) vs. molar-based metrics (MBM).
(a) MBM: one mole always contains the Avogadro number of molecules, so it weighs more for higher-MW molecules; (b) WBM: a unit weight contains either a larger number of lower-MW molecules or a smaller number of higher-MW molecules. Thus, because [18]annulene (C18H18) is three times heavier than benzene (C6H6), the ratio of benzene to [18]annulene molecules in the same unit weight amounts to 3 to 1 in the WBM.
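The difference between the two metrics can be illustrated with a short calculation. The sketch below is not taken from the paper: it assumes that the molar price is simply the per-gram price multiplied by MW, and the function names and the example price of 10 units per gram are hypothetical. It shows that one gram holds three times as many benzene molecules as [18]annulene molecules, while equal per-gram prices translate into a threefold difference in molar price.

```python
# Molecular weights in g/mol from standard atomic masses (C 12.011, H 1.008).
MW_BENZENE = 6 * 12.011 + 6 * 1.008      # C6H6
MW_ANNULENE = 18 * 12.011 + 18 * 1.008   # C18H18, [18]annulene

AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_per_gram(mw):
    """Number of molecules in one gram of a compound with molecular weight mw."""
    return AVOGADRO / mw


def molar_price(price_per_gram, mw):
    """Convert a weight-based price (per gram) to a molar price (per mole).

    Assumption: the molar price is the per-gram price times MW.
    """
    return price_per_gram * mw


# Number of benzene molecules per [18]annulene molecule in the same weight.
ratio = molecules_per_gram(MW_BENZENE) / molecules_per_gram(MW_ANNULENE)
print(f"benzene : [18]annulene molecules per gram = {ratio:.1f} : 1")

# Hypothetical per-gram price, identical for both compounds.
price = 10.0
print(f"molar price, benzene:      {molar_price(price, MW_BENZENE):.1f}")
print(f"molar price, [18]annulene: {molar_price(price, MW_ANNULENE):.1f}")
```

Under this assumption a flat weight-based price automatically produces a molar price that rises with MW, which is worth keeping in mind when comparing the WBM and MBM plots.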
Figure 2. Binned MW statistics of chemical and economic data for the commercial library of ca. 2.2 million compounds.
(a) Frequency distribution of MWs between 218.26 and 738.74 Da. Most compounds have a MW below 350 Da (96%) or 400 Da (99%). (b) Mean price (WBM) vs. MW bins. (c) Average atom counts vs. MW bins. (d) Mean price (MBM) vs. MW bins. (e) Random numbers (0–1) and (f) randomly shuffled mean WBM prices, plotted vs. MW bins. In (a–f) data were rendered at a resolution of 1 Da. (g) Mean price (WBM) vs. MW bins rendered at a resolution of 2 Da. Comparing the two binning modes, (g) vs. (b), does not reveal any substantial changes in the range below ca. 400 Da.
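The binning described in this caption can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the function `binned_means` and the six records are made up, and the bins are assumed to start at multiples of the bin width. It computes mean prices per MW bin at 1 Da and 2 Da resolution and repeats the calculation on shuffled prices as a control, in the spirit of panels (b), (g) and (f).

```python
import math
import random
from collections import defaultdict


def binned_means(mws, values, bin_width=1.0):
    """Group values by MW bin and return {bin_start: mean value}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for mw, value in zip(mws, values):
        # Assumption: bins start at multiples of bin_width.
        bin_start = math.floor(mw / bin_width) * bin_width
        sums[bin_start] += value
        counts[bin_start] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


# Made-up records: molecular weight (Da) and a price for each compound.
mws = [218.3, 218.9, 219.4, 220.1, 220.7, 221.2]
prices = [10.0, 14.0, 9.0, 20.0, 22.0, 30.0]

means_1da = binned_means(mws, prices, bin_width=1.0)
means_2da = binned_means(mws, prices, bin_width=2.0)

# Control: shuffling the prices breaks any link between price and MW.
rng = random.Random(0)
shuffled = prices[:]
rng.shuffle(shuffled)
means_shuffled = binned_means(mws, shuffled, bin_width=1.0)

print("1 Da bins:", means_1da)
print("2 Da bins:", means_2da)
print("shuffled :", means_shuffled)
```

Wider bins average over more compounds and so smooth the curve. Comparing the two resolutions is a quick way to check that a trend is not an artifact of the bin width.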
Correlation coefficients between molecular descriptors and the WBM and MBM prices (see note a).
| | P1 | P2 | MW | AC | SAS1 |
|---|---|---|---|---|---|
| P1 | 1 | 0.857 | −0.033 | 0.171 | 0.341 |
| P2 | 0.857 | 1 | 0.474 | 0.045 | 0.239 |
| MW | −0.033 | 0.474 | 1 | −0.213 | −0.116 |
| AC | 0.171 | 0.045 | −0.213 | 1 | 0.380 |
| SAS1 | 0.341 | 0.239 | −0.116 | 0.380 | 1 |
a Statistical credibility tests indicate much lower correlation values (R = 0.000526) for the shuffled price values; see Methods and the supplementary materials for additional data.
b P1: price (WBM).
c P2: molar price (MBM).
d MW: molecular weight.
e AC: atom count.
f SAS1: synthetic accessibility score.
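The shuffling check mentioned in note a can be sketched as follows. The source does not state which correlation coefficient was used, so Pearson's r is assumed here. The data are synthetic (a molar price loosely tied to MW plus noise) and every name in the block is hypothetical. The point is only that shuffling one variable should drive the coefficient close to zero.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(42)

# Synthetic data: MW in the range of the library, price rising with MW plus noise.
mws = [rng.uniform(218.0, 739.0) for _ in range(5000)]
molar_prices = [0.5 * mw + rng.gauss(0.0, 150.0) for mw in mws]

r_real = pearson_r(mws, molar_prices)

# Shuffling the prices destroys the pairing with MW.
shuffled = molar_prices[:]
rng.shuffle(shuffled)
r_shuffled = pearson_r(mws, shuffled)

print(f"r (original pairing): {r_real:.3f}")
print(f"r (shuffled prices):  {r_shuffled:.3f}")
```

A coefficient for the shuffled data that is orders of magnitude smaller than the observed one suggests that the observed correlation is not a chance effect of the sample.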
Figure 3. Binned MW statistics for the nitrogen sub-library.
(a) Mean price (WBM) vs. MW bins; (b) mean number of nitrogen atoms in a molecule plotted against MW bins and (c) mean price (MBM) plotted against MW bins.
Figure 4. Mean prices vs. mean atom counts.
(a) WBM; (b) MBM and (c) bin SAS1 statistics.