
pkCSM: Predicting Small-Molecule Pharmacokinetic and Toxicity Properties Using Graph-Based Signatures.

Douglas E V Pires¹,², Tom L Blundell¹, David B Ascher¹.

Abstract

Drug development has a high attrition rate, with poor pharmacokinetic and safety properties a significant hurdle. Computational approaches may help minimize these risks. We have developed a novel approach (pkCSM) that uses graph-based signatures to develop predictive models of central ADMET properties for drug development. pkCSM performs as well as or better than current methods. A freely accessible web server (http://structure.bioc.cam.ac.uk/pkcsm), which retains no information submitted to it, provides an integrated platform to rapidly evaluate pharmacokinetic and toxicity properties.

Entities: Chemical

Substances: Small Molecule Libraries

Year: 2015; PMID: 25860834; PMCID: PMC4434528; DOI: 10.1021/acs.jmedchem.5b00104

Source DB: PubMed; Journal: J Med Chem; ISSN: 0022-2623; Impact factor: 7.446

Introduction

Developing new drugs has become an increasingly challenging, costly, and risky endeavor with a low success rate. The vast majority of drugs evaluated in clinical trials do not reach the market because of either a lack of efficacy or unacceptable side effects.[1] Drug development is a fine balance of optimizing drug-like properties to maximize efficacy, safety, and pharmacokinetics. Many early stage drug discovery programs focus on identifying molecules that bind to a target of interest. While potency is a driving factor in these early stages, it is ultimately a compound's pharmacokinetic and toxicity properties that determine whether it will succeed therapeutically. The interplay between pharmacokinetics, toxicity, and potency is crucial for effective drugs. The pharmacokinetic profile of a compound defines its absorption, distribution, metabolism, and excretion (ADME) properties; together with toxicity, these are referred to as ADMET properties. While optimal binding of a new drug to its therapeutic target is crucial, ensuring that it can reach the target site in sufficient concentration to produce the physiological effect safely is essential for its introduction into the clinic. Appreciation of the importance of ADMET properties has led to their consideration in early stage drug development, significantly reducing the number of compounds that fail in clinical trials because of poor ADMET properties.[2−6] One widely employed strategy is the introduction of physicochemical filters, such as Lipinski's "Rule of 5"[7] or the PAINS filters,[8] as guidelines for what may constitute a successful drug.
These filters try to identify broad chemical properties that may increase a molecule's chances of reaching the market; however, they can also have the converse effect of excluding unexplored regions of chemical space from which successful drugs have originated.[9] Even the extensive data available within pharmaceutical companies can lead to conflicting rules,[10,11] highlighting the difficulty of applying these filters. Ultimately, irrespective of filters, early ADMET profiling of drug candidates is a crucial component in determining the potential success of a new compound and, when integrated into the drug development process, can help mitigate the risk of attrition. Experimental evaluation of small-molecule ADMET properties is both time-consuming and expensive and does not always translate between animal models and humans. Computational approaches to optimizing pharmacokinetic and toxicity properties may enable discovery leads to progress effectively and swiftly to drug candidates. Predicting the ADMET-associated properties of new chemicals, however, is challenging, as there are only tenuous links between many physicochemical characteristics and pharmacokinetic and toxicity properties. This has created a need for novel approaches to understand, explore, and predict the ADMET properties of small molecules as a way to improve compound quality and success rate.[12] Many in silico approaches for predicting the pharmacokinetic and toxicity properties of compounds from their chemical structure have been developed,[13] ranging from data-based approaches such as quantitative structure–activity relationship (QSAR) modeling,[14,15] similarity searches,[16,17] and 3-dimensional QSAR,[18] to structure-based methods such as ligand–protein docking[19] and pharmacophore modeling.[20] Unfortunately, many of these are not freely available, which limits their utility for the scientific community.
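As an illustration of how such physicochemical filters work, the Rule of 5 flags a compound when its molecular weight exceeds 500, its log P exceeds 5, it has more than 5 hydrogen-bond donors, or it has more than 10 hydrogen-bond acceptors, with poor absorption considered more likely when more than one criterion is violated. The Python sketch below is a minimal, hypothetical check that assumes the four descriptors have already been computed elsewhere (for example, with a cheminformatics toolkit); the `Descriptors` class and the function names are illustrative and are not part of pkCSM.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (supplied by the caller)."""
    mol_weight: float   # molecular weight in daltons
    log_p: float        # octanol/water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def rule_of_five_violations(d: Descriptors) -> list[str]:
    """Return the names of the Rule of 5 criteria that the compound violates."""
    violations = []
    if d.mol_weight > 500:
        violations.append("MW > 500")
    if d.log_p > 5:
        violations.append("logP > 5")
    if d.h_donors > 5:
        violations.append("HBD > 5")
    if d.h_acceptors > 10:
        violations.append("HBA > 10")
    return violations


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Poor absorption is considered more likely once more than one criterion fails."""
    return len(rule_of_five_violations(d)) <= max_violations
```

A hypothetical compound with molecular weight 650 and log P 6.2 violates two criteria and is flagged, which is exactly the kind of hard cutoff that can exclude unexplored chemical space.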
Numerous databases of experimentally measured ADMET properties have been compiled,[21−30] some of which are freely available. Using these databases, a number of QSAR models have been generated to predict some of these properties.[22,31−36] These methods, however, tend to focus on recognizing certain substructural elements and are often of limited use when exploring novel chemical entities beyond the scope of the experimental data used to generate the original models. Machine learning approaches, by contrast, learn patterns linking chemical composition and similarity to pharmacokinetic and safety properties in order to build predictive models capable of generalization, i.e., discovering implicit patterns that remain consistent and valid for unseen data. Here we use the concept of graph-based structural signatures to study and predict a range of ADMET properties for novel chemical entities. We show that these signatures can be used successfully to train predictive models for a variety of ADMET properties. The approach, called pkCSM, also provides a platform for the analysis and optimization of pharmacokinetic and toxicity properties, implemented in a user-friendly, freely available web interface (http://structure.bioc.cam.ac.uk/pkcsm), to help medicinal chemists find the balance between potency, safety, and pharmacokinetic properties. We have conducted a series of comparative experiments indicating that pkCSM performs as well as or better than several other widely used methods.

Results

pkCSM: Graph-Based Signatures

Graph modeling is an intuitive and well-established mathematical representation of chemical entities, from which different descriptors encompassing both molecular structure and chemistry can be extracted. A natural graph representation of a compound uses atoms as nodes and covalent bonds as edges. This simple representation can be decorated with labels denoting, for instance, physicochemical properties of atoms and bonds, from which structural patterns can be mined. Substructure matching (implemented, for instance, as a toxicophore search[37]), frequent subgraph mining,[38] and graph kernels[39] are examples of approaches for extracting patterns from these graphs. Together with experimental data on particular properties of interest (e.g., ADMET properties), these descriptors can then be used as evidence to train highly accurate predictive models via machine learning methods. Such a predictive capability may be an essential computational tool for property optimization and for guiding screening initiatives. An alternative way of extracting relevant patterns from molecular graphs is to use structural signatures. In da Silveira et al.,[40] we introduced the Cutoff Scanning algorithm to extract distance patterns from protein structure graphs and summarize them into a signature vector. These signatures have been shown to be a general, powerful, and scalable way to represent the geometry and physicochemical properties of protein structures and have been successfully adapted for different purposes, including protein structural classification and function prediction,[41] receptor-based ligand prediction,[42] and, more recently, as a component of structure-based mutation analysis approaches.[43−47] Here, we propose pkCSM, a novel method for predicting and optimizing small-molecule pharmacokinetic and toxicity properties that relies on distance-based graph signatures.
We adapted the Cutoff Scanning concept to represent small-molecule structure and chemistry (with atomic pharmacophores as node labels) in order to model and predict their pharmacokinetic and toxicity properties, building 30 predictors divided into five major classes: absorption (seven predictors), distribution (four predictors), metabolism (seven predictors), excretion (two predictors), and toxicity (10 predictors). Figure 1 shows the pkCSM workflow. Given a set of input molecules, two main sets of descriptors are calculated and combined for use in the subsequent machine learning step: general molecular properties and a distance-based graph signature.
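The atom-and-bond graph representation described above can be sketched with a small, toolkit-free data structure: nodes carry element labels, edges carry bond-order labels, and the simplest possible substructure query finds every bond with given end-atom labels and bond order. This is a hypothetical illustration (the `MolGraph` class and the three-atom fragment are assumptions for demonstration); real toxicophore searches, such as SMARTS-based ones, match multi-atom substructures.

```python
from collections import defaultdict


class MolGraph:
    """Undirected molecular graph: atoms are nodes, covalent bonds are edges."""

    def __init__(self) -> None:
        self.atom_labels: dict[int, str] = {}         # node label: element symbol
        self.bonds: dict[frozenset[int], int] = {}    # edge label: bond order
        self.adjacency: defaultdict[int, set[int]] = defaultdict(set)

    def add_atom(self, idx: int, element: str) -> None:
        self.atom_labels[idx] = element

    def add_bond(self, a: int, b: int, order: int = 1) -> None:
        self.bonds[frozenset((a, b))] = order
        self.adjacency[a].add(b)
        self.adjacency[b].add(a)

    def match_bond(self, elem_a: str, elem_b: str, order: int) -> list[tuple[int, int]]:
        """Single-edge substructure query, insensitive to the order of the two elements."""
        wanted = sorted((elem_a, elem_b))
        hits = []
        for pair, bond_order in self.bonds.items():
            a, b = sorted(pair)
            ends = sorted((self.atom_labels[a], self.atom_labels[b]))
            if bond_order == order and ends == wanted:
                hits.append((a, b))
        return sorted(hits)
```

For a hypothetical C–C=O fragment (atoms 0, 1, 2), querying a carbon–oxygen double bond returns the single hit `(1, 2)`, regardless of whether the query lists carbon or oxygen first.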

Figure 1

pkCSM workflow. Given an input molecule, two main sources of information are used to train and test machine learning-based predictors: compound general properties (including molecular properties, toxicophores, and pharmacophores) and distance-based graph signatures.

The first major component of the pkCSM signature comprises molecular properties: (i) a toxicophore fingerprint; (ii) atomic pharmacophore frequency counts; and (iii) general molecular properties including lipophilicity (log P), molecular weight, surface area, and number of rotatable bonds, among others. The toxicophore fingerprint was calculated by substructure matching against SMARTS queries originally proposed in ref (37) as potential indicators of AMES mutagenicity (available as Supporting Information). The toxicophore substructure matching, molecular properties, and pharmacophore calculations were obtained using the RDKit cheminformatics toolkit. A complete list of calculated properties can be found in the Supporting Information (Table S1). Six nonexclusive pharmacophore classes are considered (i.e., an atom can belong to more than one class): hydrophobic, aromatic, hydrogen acceptor, hydrogen donor, positive ionizable, and negative ionizable. The second major component is a set of distance-based patterns, represented as a cumulative distribution function and encoded in a small-molecule graph-based signature adapted from the Cutoff Scanning algorithm.[41,42] Each dimension of the signature thus denotes the number of atoms (categorized by pharmacophore type) within a certain distance in the molecular graph. The distance between any two nodes of the graph is given by the cost of their shortest path, calculated by Johnson's algorithm.[48] The cost of a shortest path is the sum of the weights of the edges on this path. All edges are given unit weight, so the cost of a shortest path is simply the number of edges in it.
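The distance computation and cumulative signature described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the pkCSM implementation: it assumes each signature entry counts pairs of atoms whose pharmacophore labels match a given class pair and whose graph distance is at most a given cutoff, and it uses only three of the six pharmacophore classes. Because every edge has unit weight, Johnson's reweighting step leaves the weights unchanged, so the sketch runs Dijkstra's algorithm from every atom, which is what Johnson's algorithm reduces to in this case.

```python
import heapq
from itertools import combinations, combinations_with_replacement


def all_pairs_shortest_paths(adjacency: dict[int, list[int]]) -> dict[int, dict[int, int]]:
    """Shortest-path cost between every pair of atoms, with every bond weighted 1.

    Johnson's algorithm reweights edges with Bellman-Ford potentials; for
    non-negative weights those potentials are all zero, so it reduces to one
    Dijkstra run per atom, which is what this function does.
    """
    dist: dict[int, dict[int, int]] = {}
    for source in adjacency:
        best = {source: 0}
        heap = [(0, source)]
        while heap:
            d, node = heapq.heappop(heap)
            if d > best[node]:
                continue  # stale heap entry
            for nbr in adjacency[node]:
                nd = d + 1  # unit weight: cost equals the number of bonds
                if nd < best.get(nbr, float("inf")):
                    best[nbr] = nd
                    heapq.heappush(heap, (nd, nbr))
        dist[source] = best  # atoms in other components are simply absent
    return dist


def cumulative_signature(
    adjacency: dict[int, list[int]],
    labels: dict[int, set[str]],
    classes: list[str],
    max_cutoff: int,
) -> dict[tuple[str, str, int], int]:
    """Count atom pairs per pharmacophore-class pair within each distance cutoff.

    Counts are cumulative (a pair two bonds apart counts for cutoffs 2, 3, ...),
    and labels are non-exclusive, so one atom may contribute to several class pairs.
    """
    dist = all_pairs_shortest_paths(adjacency)
    atom_pairs = list(combinations(sorted(adjacency), 2))
    signature: dict[tuple[str, str, int], int] = {}
    for p, q in combinations_with_replacement(classes, 2):
        for cutoff in range(1, max_cutoff + 1):
            count = 0
            for i, j in atom_pairs:
                d = dist[i].get(j)
                if d is None or d > cutoff:
                    continue  # disconnected or farther apart than the cutoff
                if (p in labels[i] and q in labels[j]) or (q in labels[i] and p in labels[j]):
                    count += 1
            signature[(p, q, cutoff)] = count
    return signature
```

For a hypothetical three-atom chain 0–1–2 in which atom 0 is hydrophobic and atom 2 is both a hydrogen acceptor and a donor, the hydrophobic/acceptor entry is 0 at cutoff 1 and 1 at cutoff 2, because those two atoms are two bonds apart.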

Predicting Small-Molecule ADMET Properties

To build pkCSM, we carefully selected data sets and recently published methods from the literature. For comparison purposes, the validation method chosen for each data set is consistent with the original work; these methods are listed in Table S2 of the Supporting Information. The pkCSM platform for ADMET property prediction can be divided into two groups of highly predictive models: (a) 14 regression models that predict a numeric quantification of the pharmacokinetic or toxicity property and (b) 16 classification models that categorize the outcome into two classes. A description of the models in pkCSM and how to interpret their predictions can be found in the Supporting Information. Table 1 shows the comparative prediction performance of the regression models. Further information on the data sets used, number of data points, references, and validation procedures (i.e., cross-validation, external test set) can also be found in the Supporting Information (Table S2). The performance of the classification models is shown in Table 2. pkCSM outperformed well-established tools. For example, the pkCSM AMES test achieved an accuracy of 83.8%, compared with 75.8% for ToxTree.[49]
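Table 2 reports classification performance as Q and AUC; the AMES comparison above identifies Q with accuracy (0.838 for pkCSM). A minimal sketch of how these two metrics are conventionally computed is shown below, using the Mann–Whitney formulation of the area under the ROC curve: the probability that a randomly chosen positive compound is scored above a randomly chosen negative one, with ties counting one half. The 0.5 decision threshold, the function names, and the toy labels are assumptions for illustration only.

```python
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of compounds whose predicted class matches the experimental class (Q)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """AUC as the Mann-Whitney statistic: P(positive scored above negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def threshold(scores: list[float], cutoff: float = 0.5) -> list[int]:
    """Turn continuous scores into class labels; the cutoff is an assumed parameter."""
    return [1 if s >= cutoff else 0 for s in scores]
```

The two metrics can disagree: for labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.6, 0.1]`, accuracy at the 0.5 cutoff is 0.5, while the AUC, which ignores the cutoff and looks only at the ranking, is 0.75. The pairwise loop is quadratic; a sort-based rank computation would scale better to data sets of thousands of compounds.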

Table 1

Comparative Regression Performance between pkCSM and Other Available Methods

data set	method	ref	previous std error	previous R²	pkCSM std error	pkCSM R²
water solubility	admetSAR	(22)	0.823	0.810	0.692/0.497	0.943/0.967a
Caco2 permeability	admetSAR	(22)	0.339	0.564	0.605/0.466	0.733/0.828a
intestinal absorption (human)	Hou et al.	(50)	10.28	0.890	12.80/9.51	0.846/0.902
skin permeability	Alves et al.	(51)	0.490	0.720d	0.758/0.539	0.683/0.801
steady state volume of distribution	Berellini et al.	(52)	1.287	0.613	1.104/0.803	0.637/0.706
fraction unbound (human) (Fu)	Del Amo et al.	(53)	NA	0.737	0.248/0.189	0.693/0.824
blood–brain barrier permeability	Suenderhauf et al.	(54)	0.580	0.900a	0.379/0.287	0.807/0.862
CNS permeability	Suenderhauf et al.	(54)	NA	NAc	0.825/0.665	0.690/0.794
total clearance	Yap et al.	(55)	NA	0.636	0.300/0.245	0.600/0.755
maximum recommended tolerated dose (MRTD) (human)	Liu et al.	(56)	0.560	0.790a,b	0.885/0.641	0.633/0.741
oral rat acute toxicity (LD₅₀)	admetSAR	(22)	0.324	0.613	0.683/0.470	0.663/0.779a
oral rat chronic toxicity, lowest observed adverse effect level (LOAEL)	Mazzatorta et al.	(57)	0.727	0.500	0.744/0.591	0.683/0.776a
T. pyriformis toxicity	admetSAR	(22)	0.256	0.761	0.535/0.349	0.855/0.933a
flathead minnow toxicity (LC₅₀)	admetSAR	(22)	0.666	0.574	0.836/0.587	0.743/0.853a

Two values are shown per column for pkCSM, denoting the performance on the entire data set and the performance after 10% outlier removal. NA: not available.

a Denotes a statistically significant performance difference obtained via a Fisher r-to-z transformation, by calculating the z value, using a threshold of p ≤ 0.05 for significance.

b Results for 40-fold cross-validation.

c Only classification methods were available.

d Results reported for 0.77 data set coverage.

Table 2

Comparative Classification Performance between pkCSM and Related Methods

data set	method	ref	previous Q	previous AUC	pkCSM Q	pkCSM AUC
P-glycoprotein substrate	admetSAR	(22)	0.735	0.768	0.780	0.814
P-glycoprotein inhibitor I	admetSAR	(22)	0.786	0.853	0.844	0.906a
P-glycoprotein inhibitor II	admetSAR	(22)	0.866	0.922	0.898	0.948a
CYP450 1A2 inhibitor	admetSAR	(22)	0.815	0.815	0.802	0.876a
CYP450 2C19 inhibitor	admetSAR	(22)	0.805	0.805	0.808	0.879a
CYP450 2C9 inhibitor	admetSAR	(22)	0.802	0.802	0.807	0.868a
CYP450 2D6 inhibitor	admetSAR	(22)	0.855	0.855a	0.853	0.843
CYP450 3A4 inhibitor	admetSAR	(22)	0.645	0.848	0.780	0.847
CYP450 2D6 substrate	admetSAR	(22)	0.759	0.759	0.766	0.787
CYP450 3A4 substrate	admetSAR	(22)	0.638	0.638	0.656	0.676
hERG I inhibitor	admetSAR	(22)	0.870	0.820	0.853	0.881
hERG II inhibitor	admetSAR	(22)	0.784	0.849	0.813	0.876
renal organic cation transporter	admetSAR	(22)	0.795	0.807	0.797	0.810
AMES toxicity	admetSAR	(22)	0.851	0.908	0.838	0.909
AMES toxicity	ToxTree	(49)	0.758	NA	0.838	0.909
hepatotoxicity	Fourches et al.	(58)	0.639a	NA	0.658	0.687
skin sensitization	Alves et al.	(59)	NA	0.820	0.810	0.850

Q: accuracy. AUC: area under the receiver operating characteristic curve. NA: not available.

a Denotes a statistically significant performance difference calculated by the nonparametric Wilcoxon statistic,[60] using a threshold of p ≤ 0.05 for significance.

pkCSM regression models achieved Pearson correlation coefficients ranging from 0.6 to 0.9, under both cross-validation schemes and on external validation data sets. In comparison with available methods, for most data sets it presents a statistically significant improvement in predictive power. Compounds were ranked by absolute prediction error, and the worst 10% were considered outliers for the purposes of regression analysis. Performance increases noticeably when these outliers are removed. For instance, pkCSM achieves a correlation of R² = 0.779 on 90% of the data for oral rat acute toxicity and R² = 0.828 for Caco2 permeability, a significant improvement over the correlations for the whole data sets (R² = 0.663 and R² = 0.733, respectively). Where previous methods exhibit a better correlation coefficient than pkCSM, we observed that, after outlier removal, pkCSM presented comparable performance and/or a lower standard error, as is the case for the blood–brain barrier permeability (BBB) data set. No distinguishable trends were identified when comparing the physicochemical properties of outlier compounds with those of the remaining data set. Figure 2 plots experimental against predicted values for the regression absorption predictors.
Figures S1 and S2 of the Supporting Information depict the results for the distribution and toxicity predictors, respectively. The pkCSM models achieved good correlations despite variability in data set size and in the distribution of experimental values.
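The outlier-removal procedure described above (rank compounds by absolute prediction error and discard the worst 10%) can be sketched as follows. The sketch computes Pearson's correlation coefficient r before and after removal; the tables label their correlation column R², and squaring r gives r² if that is the quantity of interest. The function names and the synthetic data in the usage note are illustrative assumptions.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between experimental and predicted values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_worst_fraction(
    y_true: list[float], y_pred: list[float], fraction: float = 0.10
) -> tuple[list[float], list[float]]:
    """Rank compounds by absolute prediction error and discard the worst `fraction`."""
    n_drop = round(len(y_true) * fraction)
    by_error = sorted(range(len(y_true)), key=lambda i: abs(y_true[i] - y_pred[i]))
    keep = sorted(by_error[: len(y_true) - n_drop])  # restore the original order
    return [y_true[i] for i in keep], [y_pred[i] for i in keep]
```

With ten synthetic points lying on the line y = x except for one badly mispredicted compound (experimental 10, predicted 2), r rises from about 0.62 to 1.0 once that compound is discarded, showing how strongly a few large errors can depress a correlation coefficient.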

Figure 2

Regression analysis for absorption predictors under cross-validation. Pearson's correlation coefficients and standard errors are shown in the top-left corner of each graph. The left graph shows the correlation between experimental and predicted values for Caco2 permeability, and the right graph shows it for water solubility.

An external validation data set available for volume of distribution at steady state (VDss) yielded a correlation of R² = 0.637 (R² = 0.706 after 10% outlier removal), consistent with the cross-validation results depicted in the left graph of Figure S1 of the Supporting Information (R² = 0.66).
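The table notes state that significant differences between correlation coefficients were assessed with a Fisher r-to-z transformation. A minimal sketch of the standard two-sided test for two independent correlation coefficients r1 and r2, computed on n1 and n2 compounds, is shown below; it is not necessarily the exact procedure used for pkCSM. The statistic is z = (atanh(r1) − atanh(r2)) / sqrt(1/(n1 − 3) + 1/(n2 − 3)). If a table reports R² rather than r, the square root would need to be taken first. The function name and the sample sizes in the usage note are hypothetical.

```python
import math


def fisher_z_test(r1: float, n1: int, r2: float, n2: int) -> tuple[float, float]:
    """Two-sided Fisher r-to-z test for two independent Pearson correlations.

    Returns (z, p); p <= 0.05 would be read as a significant difference.
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each sample needs more than 3 observations")
    if not (-1.0 < r1 < 1.0 and -1.0 < r2 < 1.0):
        raise ValueError("correlations must lie strictly between -1 and 1")
    z1, z2 = math.atanh(r1), math.atanh(r2)          # Fisher transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # standard error of z1 - z2
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))            # two-sided normal p-value
    return z, p
```

With 50 compounds per sample, correlations of 0.6 and 0.5 are not distinguishable at p ≤ 0.05, whereas with about 1000 compounds per sample, 0.9 versus 0.5 is overwhelmingly significant; the test's power depends heavily on data set size.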

Discussion and Conclusions

In summary, we have described a novel approach to predicting pharmacokinetic and toxicity outcomes using graph-based signatures to represent small-molecule chemistry and topology. Using these signatures, we have developed and implemented 14 quantitative regression models with numeric outputs and 16 classification models with categorical outputs for predicting a wide range of ADMET properties for novel, diverse molecules. We show that pkCSM achieves performance as good as or better than similar methods currently available, with a significant improvement in performance for 11 data sets (water solubility, Caco2 permeability, rat, Tetrahymena pyriformis, and minnow toxicity, P-glycoprotein inhibitors, and CYP450 1A2, 2C19, and 2C9 inhibitors). While chemical modifications and drug carriers can improve a compound's ADMET properties,[61−64] pkCSM provides a rapid and easy method for the early evaluation of compounds. In the Supporting Information, we apply these predictive models to understanding the pharmacokinetic and toxicity properties of diverse, challenging chemical sets, including macrocycles and antineoplastic drugs. Another useful aspect of pkCSM is its scalability: it can handle large data sets, an important requirement for its application as a filter in screening initiatives. The rat toxicity data set comprises over 10,000 molecules (prediction correlation depicted in the right graph of Figure S2 of the Supporting Information), and the metabolism classifier data sets contain up to 18,000 compounds. We have implemented a user-friendly web server that enables researchers to freely predict ADMET properties for their molecules of interest, including in large batch formats. Considering the sensitive nature of many medicinal chemistry projects, the web server does not retain any information submitted to it.
This will hopefully facilitate the drug development process by enabling the rapid design, evaluation, and prioritization of compounds.

Experimental Section

Available in the Supporting Information.

References

Review 1. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings.

Authors: C A Lipinski; F Lombardo; B W Dominy; P J Feeney
Journal: Adv Drug Deliv Rev Date: 2001-03-01 Impact factor: 15.470

2. Screening for human ADME/Tox drug properties in drug discovery.

Authors: A P Li
Journal: Drug Discov Today Date: 2001-04-01 Impact factor: 7.851

3. ADME/PK as part of a rational approach to drug discovery.

Authors:
Journal: Drug Discov Today Date: 2000-09 Impact factor: 7.851

Review 4. Early ADME in support of drug discovery: the role of metabolic stability studies.

Authors: T N Thompson
Journal: Curr Drug Metab Date: 2000-11 Impact factor: 3.731

Review 5. ADMET in silico modelling: towards prediction paradise?

Authors: Han van de Waterbeemd; Eric Gifford
Journal: Nat Rev Drug Discov Date: 2003-03 Impact factor: 84.694

Review 6. The role of absorption, distribution, metabolism, excretion and toxicity in drug discovery.

Authors: Jing Lin; Diana C Sahakian; Sonia M F de Morais; Jinghai J Xu; Robert J Polzer; Steven M Winter
Journal: Curr Top Med Chem Date: 2003 Impact factor: 3.295

7. ADME-AP: a database of ADME associated proteins.

Authors: L Z Sun; Z L Ji; X Chen; J F Wang; Y Z Chen
Journal: Bioinformatics Date: 2002-12 Impact factor: 6.937

Review 8. Can the pharmaceutical industry reduce attrition rates?

Authors: Ismail Kola; John Landis
Journal: Nat Rev Drug Discov Date: 2004-08 Impact factor: 84.694

9. Derivation and validation of toxicophores for mutagenicity prediction.

Authors: Jeroen Kazius; Ross McGuire; Roberta Bursi
Journal: J Med Chem Date: 2005-01-13 Impact factor: 7.446

10. Quantitative structure-pharmacokinetic relationships for drug clearance by using statistical learning methods.

Authors: C W Yap; Z R Li; Y Z Chen
Journal: J Mol Graph Model Date: 2005-11-14 Impact factor: 2.518
