Stephen W C Walker, Ahdia Anwar, Jarrod M Psutka, Jeff Crouse, Chang Liu, J C Yves Le Blanc, Justin Montgomery, Gilles H Goetz, John S Janiszewski, J Larry Campbell, W Scott Hopkins.
Abstract
The fast and accurate determination of molecular properties is highly desirable for many facets of chemical research, particularly in drug discovery, where pre-clinical assays play an important role in paring down large sets of drug candidates. Here, we present the use of supervised machine learning to treat differential mobility spectrometry-mass spectrometry data for ten topological classes of drug candidates. We demonstrate that the gas-phase clustering behavior probed in our experiments can be used to predict the candidates' condensed-phase molecular properties, such as cell permeability, solubility, polar surface area, and water/octanol distribution coefficient. All of these measurements are performed in minutes and require mere nanograms of each drug examined. Moreover, by tuning the gas temperature within the differential mobility spectrometer, one can fine-tune the extent of ion-solvent clustering to separate subtly different molecular geometries and to discriminate between molecules with very similar physicochemical properties.
Year: 2018 PMID: 30504922 PMCID: PMC6269546 DOI: 10.1038/s41467-018-07616-w
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 DMS dispersion plots of molecule set A1. Measurements were recorded in an N₂ environment seeded with methanol vapor (1.5% mole ratio) at T = 150 °C and T = 300 °C. (Inset) Molecular topology A, set #1. Dispersion data for A1a and A1b are plotted in black and red, respectively. Compound A1a exhibits an intramolecular hydrogen bond (IMHB, highlighted in green), whereas compound A1b does not. Protonation sites, as determined by DFT calculations, are highlighted in red. In the protonated form of A1a, the proton is shared between the carbonyl oxygen atom and the ring nitrogen atom. Error bars (2σ) indicate the standard deviation of the Gaussian fit to the compensation voltage (CV) peak.
Fig. 2 DMS dispersion plots of molecule set C3. Measurements were recorded in an N₂ environment seeded with methanol vapor (1.5% mole ratio) at T = 150 °C, T = 225 °C, and T = 300 °C. Dispersion data for C3a and C3b are plotted in black and red, respectively. Molecule C3a exhibits an IMHB (highlighted in green), whereas molecule C3b does not. Protonation sites, as determined by DFT calculations, are highlighted in red. Errors are calculated as in Fig. 1 but are omitted for clarity.
Fig. 3 Random forest regression ML fits for CCS and pKb along with learning curves. Learning curves for CCS (a) and pKb (c) show the evolution of the mean absolute error (MAE) for the test set (black) and the training set (red) as the size of the ML training set is increased. Random forest results for CCS and pKb are shown in b and d, respectively. Individual points are colored according to molecular set, as indicated in the figure. The red solid line in b and d is a plot of y = x. Error bars (2σ) are calculated from the standard deviation of the fitted parameter.
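To make the learning-curve idea in the Fig. 3 caption concrete, the following is a minimal sketch of how training-set and test-set MAE can be tracked as the training set grows. It is not the authors' code or data. The features and target are synthetic, every function name (`build_tree`, `fit_forest`, `learning_curve`, and so on) is hypothetical, and the small pure-Python random forest is a simplified stand-in for whatever implementation was actually used in the study.

```python
import math
import random


def avg(values):
    """Arithmetic mean of a non-empty iterable."""
    vals = list(values)
    return sum(vals) / len(vals)


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return avg(abs(a - b) for a, b in zip(y_true, y_pred))


def build_tree(X, y, rng, depth=0, max_depth=3, min_leaf=2):
    """Grow one regression tree; a leaf is a float, a split is a tuple."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return avg(y)
    n_features = len(X[0])
    n_try = max(1, n_features // 3)  # random feature subset tried per split
    best = None
    for f in rng.sample(range(n_features), n_try):
        for t in sorted({row[f] for row in X})[1:]:  # candidate thresholds
            left = [i for i, row in enumerate(X) if row[f] < t]
            right = [i for i, row in enumerate(X) if row[f] >= t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            m_l, m_r = avg(y[i] for i in left), avg(y[i] for i in right)
            sse = (sum((y[i] - m_l) ** 2 for i in left)
                   + sum((y[i] - m_r) ** 2 for i in right))
            if best is None or sse < best[0]:
                best = (sse, f, t, left, right)
    if best is None:  # no admissible split was found
        return avg(y)
    _, f, t, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          rng, depth + 1, max_depth, min_leaf)

    return (f, t, grow(left), grow(right))


def predict_tree(node, row):
    """Walk from the root down to a leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] < t else right
    return node


def fit_forest(X, y, n_trees=15, seed=0):
    """Fit each tree on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return avg(predict_tree(tree, row) for tree in forest)


def learning_curve(X_train, y_train, X_test, y_test, sizes):
    """Return (n, train MAE, test MAE) for each training-set size n."""
    curve = []
    for n in sizes:
        forest = fit_forest(X_train[:n], y_train[:n])
        train_pred = [predict_forest(forest, r) for r in X_train[:n]]
        test_pred = [predict_forest(forest, r) for r in X_test]
        curve.append((n, mae(y_train[:n], train_pred), mae(y_test, test_pred)))
    return curve


# Synthetic stand-in data (ASSUMPTION): three made-up descriptors, one target.
data_rng = random.Random(42)
X = [[data_rng.uniform(0.0, 1.0) for _ in range(3)] for _ in range(60)]
y = [3.0 * r[0] + math.sin(6.0 * r[1]) + data_rng.gauss(0.0, 0.1) for r in X]

curve = learning_curve(X[:40], y[:40], X[40:], y[40:], sizes=[10, 20, 30, 40])
for n, train_mae, test_mae in curve:
    print(f"n={n:2d}  train MAE={train_mae:.3f}  test MAE={test_mae:.3f}")
```

Plotting the two MAE columns against `n` gives curves of the kind the caption describes. In practice, an established library implementation of random forest regression would normally replace the toy forest here.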