Alexander L Perryman1, Thomas P Stratton2, Sean Ekins3,4, Joel S Freundlich5,6. 1. Division of Infectious Disease, Department of Medicine, and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens, Rutgers University-New Jersey Medical School, Newark, New Jersey, 07103, USA. 2. Department of Pharmacology & Physiology, Rutgers University-New Jersey Medical School, Medical Sciences Building, I-503, 185 South Orange Ave., Newark, New Jersey, 07103, USA. 3. Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC, 27526, USA. 4. Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA, 94010, USA. 5. Division of Infectious Disease, Department of Medicine, and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens, Rutgers University-New Jersey Medical School, Newark, New Jersey, 07103, USA. freundjs@rutgers.edu. 6. Department of Pharmacology & Physiology, Rutgers University-New Jersey Medical School, Medical Sciences Building, I-503, 185 South Orange Ave., Newark, New Jersey, 07103, USA. freundjs@rutgers.edu.
Abstract
PURPOSE: Mouse efficacy studies are a critical hurdle in advancing translational research on potential therapeutic compounds for many diseases. Although mouse liver microsomal (MLM) stability studies are not a perfect surrogate for in vivo studies of metabolic clearance, they are the initial model system used to assess metabolic stability. Consequently, we explored the development of machine learning models that can enhance the probability of identifying compounds possessing MLM stability.
METHODS: Published assays on MLM half-life values were identified in PubChem, reformatted, and curated to create a training set with 894 unique small molecules. These data were used to construct machine learning models, which were assessed with internal cross-validation, external tests with a published set of antitubercular compounds, and independent validation with an additional diverse set of 571 compounds (PubChem data on percent metabolism).
RESULTS: "Pruning" out the moderately unstable / moderately stable compounds from the training set produced models with superior predictive power. Bayesian models displayed the best predictive power for identifying compounds with a half-life ≥1 h.
CONCLUSIONS: Our results suggest the pruning strategy may be of general benefit to improve test set enrichment and provide machine learning models with enhanced predictive value for the MLM stability of small organic molecules. This work represents the most exhaustive study to date of machine learning approaches applied to MLM data from public sources.
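The pruning idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the only value taken from the abstract is the 1 h (60 min) stability cutoff, while the pruning band (`PRUNE_LOW_MIN`, `PRUNE_HIGH_MIN`), the function names, and the toy compound identifiers and half-lives are hypothetical choices made purely for illustration.

```python
from typing import Dict

# From the abstract: a compound counts as stable if its half-life is >= 1 h.
STABLE_CUTOFF_MIN = 60.0

# Hypothetical pruning band (minutes). The abstract does not state the bounds
# used to define "moderately unstable / moderately stable" compounds.
PRUNE_LOW_MIN = 30.0
PRUNE_HIGH_MIN = 120.0


def label_by_cutoff(half_lives: Dict[str, float],
                    cutoff: float = STABLE_CUTOFF_MIN) -> Dict[str, int]:
    """Label every compound: 1 if half-life >= cutoff, else 0 (no pruning)."""
    return {cid: int(t >= cutoff) for cid, t in half_lives.items()}


def prune_training_set(half_lives: Dict[str, float],
                       low: float = PRUNE_LOW_MIN,
                       high: float = PRUNE_HIGH_MIN) -> Dict[str, int]:
    """Keep only clearly unstable (< low) and clearly stable (>= high) compounds.

    Compounds whose half-life falls in [low, high) are dropped from the
    training set, so a classifier only sees well-separated examples.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    pruned: Dict[str, int] = {}
    for cid, t in half_lives.items():
        if t < low:
            pruned[cid] = 0   # clearly unstable
        elif t >= high:
            pruned[cid] = 1   # clearly stable
        # otherwise: intermediate half-life, excluded from training
    return pruned


# Toy half-lives in minutes (made-up values, not data from the study).
toy_half_lives = {
    "cmpd_A": 5.0,
    "cmpd_B": 45.0,
    "cmpd_C": 60.0,
    "cmpd_D": 90.0,
    "cmpd_E": 240.0,
}

full_labels = label_by_cutoff(toy_half_lives)
pruned_labels = prune_training_set(toy_half_lives)
print("unpruned:", full_labels)
print("pruned:  ", pruned_labels)
```

In this sketch the pruned set would be used only for training; test compounds would still be labeled with the plain 60 min cutoff, so that evaluation reflects the stability definition given in the abstract.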