Michael Withnall1, Hongming Chen2, Igor V Tetko1,3,4.
Abstract
A matched molecular pair (MMP) analysis was used to examine the change in melting point (MP) between pairs of similar molecules in a set of ∼275k compounds. We found many cases in which the change in MP (ΔMP) of compounds correlates with changes in functional groups. In line with the results of a previous study, correlations between ΔMP and simple molecular descriptors, such as the number of hydrogen bond donors, were identified. Using a larger dataset that covers a wider chemical space and range of melting points, we observed that the method remains stable and scales well. This MMP-based method could find use as a simple privacy-preserving technique to analyze large proprietary databases and share findings between participating research groups.
Keywords: OCHEM; general solubility equation; matched molecular pairs; melting points
Year: 2017 PMID: 28650584 PMCID: PMC5900986 DOI: 10.1002/cmdc.201700303
Source DB: PubMed Journal: ChemMedChem ISSN: 1860-7179 Impact factor: 3.466
Figure 1. An example of a matched molecular pair: the structures differ by a hydroxy group (highlighted).
Descriptor results for all compounds.
| # of samples | Mean descriptor change | ΔMP [°C] | ±SEM [°C] |
|---|---|---|---|
[a] The PATENTS dataset comprises the ∼275 000 compound dataset we used in the study; the Schultes In‐House and Karthikeyan datasets are those used in the Schultes study [16]. [b] n.s. = non‐significant (p > 0.05).
CSP3 results for all compounds in the PATENTS database.[a]
| Experiment | Unconstrained descriptors | Descriptors unchanged | # of samples | Mean CSP3 change [%] | ΔMP [°C] | ±SEM [°C] |
|---|---|---|---|---|---|---|
| 1 | CSP3 | nRot | 29 874 | 8 | −7.3 | 5.6 |
| 2 | CSP3 | Halogen | 80 284 | 8 | −14 | 3.5 |
| 3 | CSP3 | nRot | 46 893 | 8 | −8.6 | 4.3 |
| 4 | CSP3 | nRot | 38 154 | 8 | −13 | 5.2 |
| 5 | CSP3 | nRot | 45 495 | 8 | −12 | 4.7 |
| 6 | CSP3 | nRot | 49 267 | 9 | −8.5 | 4.1 |
| 7 | CSP3 | n.a. | 641 192 | 10 | −19 | 1 |
[a] All p values <0.0001. The CSP3 fraction was considered to have changed when the difference was ≥2 % (0.02). The calculated logP was considered to be constrained if the change was ≤0.5. n.a.: not available.
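The thresholds in footnote [a] can be read as a filter over matched pairs. The following is a minimal sketch of that reading, not the authors' implementation: the `PairDelta` record, the `csp3_delta_mps` function, the choice to apply the logP and nRot constraints together, and the convention of orienting each pair so that CSP3 increases are all assumptions made for illustration.

```python
from dataclasses import dataclass

CSP3_MIN_CHANGE = 0.02  # footnote [a]: CSP3 "changed" means a difference of >= 2 %
LOGP_MAX_CHANGE = 0.5   # footnote [a]: logP "constrained" means a change of <= 0.5


@dataclass
class PairDelta:
    """Differences for one matched pair, computed as molecule B minus molecule A.

    Hypothetical record type; field names are not taken from the paper.
    """
    d_csp3: float  # change in the fraction of sp3 carbons (0.02 == 2 %)
    d_logp: float  # change in calculated logP
    d_nrot: int    # change in the number of rotatable bonds
    d_mp: float    # change in melting point, °C


def csp3_delta_mps(pairs, hold_nrot_fixed=True):
    """Return ΔMP values for pairs passing the filters, oriented so CSP3 increases."""
    result = []
    for p in pairs:
        if abs(p.d_csp3) < CSP3_MIN_CHANGE:
            continue  # CSP3 did not change enough to count
        if abs(p.d_logp) > LOGP_MAX_CHANGE:
            continue  # logP moved too far to count as constrained
        if hold_nrot_fixed and p.d_nrot != 0:
            continue  # assumed meaning of "descriptor unchanged"
        # Flip the sign when CSP3 decreased, so every ΔMP refers to rising CSP3.
        result.append(p.d_mp if p.d_csp3 > 0 else -p.d_mp)
    return result
```

Averaging the returned values would give a quantity comparable to the ΔMP column above; a negative mean corresponds to a lower melting point as CSP3 rises, which is the direction reported in every row of the table.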
Figure 2. Correlation between predicted and observed ΔlogS, and the results of a consensus model of the two approaches.
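The keywords name the general solubility equation (GSE), which links melting point to aqueous solubility. In its commonly cited form, logS = 0.5 − 0.01 (MP − 25) − logP, with the melting-point term set to zero for compounds that melt at or below 25 °C. This record does not state how the predicted ΔlogS values in Figure 2 were computed, so the sketch below only illustrates how a ΔMP would translate into a ΔlogS under the GSE; the function names are hypothetical.

```python
def gse_log_s(mp_celsius: float, log_p: float) -> float:
    """Estimate logS (molar) from the general solubility equation.

    The melting-point term is dropped for compounds melting at or below 25 °C.
    """
    return 0.5 - 0.01 * max(mp_celsius - 25.0, 0.0) - log_p


def gse_delta_log_s(mp_a: float, log_p_a: float, mp_b: float, log_p_b: float) -> float:
    """ΔlogS for a matched pair (molecule B minus molecule A) under the GSE."""
    return gse_log_s(mp_b, log_p_b) - gse_log_s(mp_a, log_p_a)
```

For two compounds that both melt above 25 °C, this reduces to ΔlogS = −0.01 ΔMP − ΔlogP, so a ΔMP of +65 °C at unchanged logP corresponds to a ΔlogS of about −0.65 log units.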
The most common and most influential results of the functional group analyses.
| Functional group 1 | Functional group 2 | # of samples | Mean ΔMP [°C] | ±SEM [°C] | p value |
|---|---|---|---|---|---|
| **Functional group pairs, most influential** | | | | | |
| sulfonamides | sulfonic acids | 39 | 90 | ±17 | <0.0001 |
| phosphonic acid esters | phosphonic acids | 37 | 85 | ±9 | <0.0001 |
| thiocarboxylic acid esters | thiocarboxylic acid amides | 22 | 73 | ±7 | <0.0001 |
| dialkyl ethers | carboxylic acid secondary amides | 20 | 72 | ±10 | <0.0001 |
| carboxylic acid esters | carboxylic acid primary amides | 176 | 68 | ±4 | <0.0001 |
| **Functional group pairs, most common** | | | | | |
| carboxylic acid esters | carboxylic acids | 7056 | 65 | ±0.6 | <0.0001 |
| aryl fluorides | aryl chlorides | 6039 | 7.0 | ±0.5 | <0.0001 |
| aryl chlorides | aryl bromides | 3322 | 5.1 | ±0.6 | <0.0001 |
| aryl fluorides | aryl bromides | 1883 | 13 | ±0.8 | <0.0001 |
| carboxylic acid tertiary amides | carboxylic acid secondary amides | 1570 | 31 | ±1.4 | <0.0001 |
| **Single functional groups, most influential** | | | | | |
| pyrazoles (HS)[a] | | 21 | −70 | ±17 | <0.0001 |
| sulfenic acid derivatives | | 49 | −55 | ±6 | <0.0001 |
| thiocarboxylic acids | | 25 | 52 | ±8 | <0.0001 |
| 1,3‐diphenols | | 22 | 51 | ±8.5 | <0.0001 |
| alkyl iodides | | 21 | 48 | ±13 | <0.0005 |
| **Single functional groups, most common** | | | | | |
| nitriles | | 4618 | 18 | ±0.8 | <0.0001 |
| arenes | | 4278 | 7.3 | ±0.7 | <0.0001 |
| nitro compounds | | 3842 | 22 | ±0.8 | <0.0001 |
| aryl chlorides | | 3499 | 6.2 | ±0.8 | <0.0001 |
| carboxylic acid esters | | 3486 | −18 | ±0.9 | <0.0001 |
[a] HS: shows high specificity, indicating that fusion with other rings is disallowed.
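The per-group statistics in the table (sample count, mean ΔMP, SEM) and its two rankings can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the input tuple layout, the function names, and the `min_samples` cutoff are hypothetical, and the significance test behind the p values is not reproduced because this record does not say which test was used.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def summarize_transformations(pairs, min_samples=2):
    """Group matched pairs by transformation and compute n, mean ΔMP and SEM.

    `pairs` is an iterable of (group_from, group_to, mp_from, mp_to) tuples,
    with melting points in °C. Groups with fewer than two pairs are skipped
    because the SEM is undefined for a single value.
    """
    deltas = defaultdict(list)
    for group_from, group_to, mp_from, mp_to in pairs:
        deltas[(group_from, group_to)].append(mp_to - mp_from)

    rows = []
    for transformation, values in deltas.items():
        n = len(values)
        if n < max(min_samples, 2):
            continue
        rows.append({
            "transformation": transformation,
            "n": n,
            "mean_dmp": mean(values),
            "sem": stdev(values) / sqrt(n),  # sample standard deviation / sqrt(n)
        })
    return rows


def most_influential(rows, k=5):
    """Rank by the absolute size of the mean melting-point change."""
    return sorted(rows, key=lambda r: abs(r["mean_dmp"]), reverse=True)[:k]


def most_common(rows, k=5):
    """Rank by the number of matched pairs showing the transformation."""
    return sorted(rows, key=lambda r: r["n"], reverse=True)[:k]
```

Ranking the same summary rows two ways mirrors the split in the table: rare transformations can carry the largest mean shifts, while frequent ones give the tightest SEM.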
Figure 3. Examples of functional group transformations.
Figure 4. A histogram of the melting points of all compounds used in the study. The majority of compounds involved were in the drug‐like range of 50–250 °C.
Figure 5. A PCA plot of the first two principal components of the eight descriptors used in the analysis. The change of color from blue to red indicates increasing compound melting point. The PCA plot was generated using the PAST [35] software package.