Predicting Total Drug Clearance and Volumes of Distribution Using the Machine Learning-Mediated Multimodal Method through the Imputation of Various Nonclinical Data.

Hiroaki Iwata¹, Tatsuru Matsuo², Hideaki Mamada³, Takahisa Motomura⁴, Mayumi Matsushita², Takeshi Fujiwara¹, Kazuya Maeda⁵, Koichi Handa⁶.

Abstract

Pharmacokinetic research plays an important role in the development of new drugs. Accurate predictions of human pharmacokinetic parameters are essential for the success of clinical trials. Clearance (CL) and volume of distribution (Vd) are important factors for evaluating pharmacokinetic properties, and many previous studies have used computational methods to extrapolate these values from nonclinical laboratory animal models to human subjects. However, it is difficult to obtain sufficient, comprehensive experimental data from these animal models, and many critical values are missing. As a result, studies that use nonclinical data as explanatory variables can use only a small number of compounds for model training. In this study, we performed missing-value imputation and feature selection on nonclinical data to increase the number of training compounds and the nonclinical data available for these kinds of studies. We obtained novel models for total body clearance (CLtot) and steady-state Vd (Vdss) (CLtot: geometric mean fold error [GMFE], 1.92; percentage within 2-fold error, 66.5%; Vdss: GMFE, 1.64; percentage within 2-fold error, 71.1%). These accuracies were comparable to those of conventional animal scale-up models. Moreover, unlike animal scale-up methods, this method does not require animal experiments, which are becoming increasingly strictly regulated.

Year: 2022 PMID: 35993595 PMCID: PMC9472274 DOI: 10.1021/acs.jcim.2c00318

Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 6.162

Introduction

Pharmacokinetic evaluations play an important role throughout the entire process of new drug development.[1] Clinical trials are particularly important, and improving their success rate requires estimating effective clinical doses that produce the best drug effect profile. It is therefore necessary to accurately predict human pharmacokinetic parameters from nonclinical experimental data before moving to human clinical trials.[2] In general, the parameters that most strongly affect the blood concentration profile of an intravenously administered drug are the volume of distribution (Vd), which quantifies the distribution of the drug within the body, and total body clearance (CLtot), which represents the body's overall capacity to eliminate the drug. Vd is determined by properties of the drug such as protein binding and membrane permeability, and machine learning models based on chemical structures (CS) have predicted it relatively accurately.[3] When nonclinical animal data are available, animal scale-up methods keep the difference between predicted and experimental values within approximately 2-fold.[4] However, because such highly accurate methods rely on data from large animals such as dogs and monkeys, they are difficult to apply given the high cost and ethical implications of large-animal experiments.[5] Predicting CLtot is much more difficult than predicting Vd because drugs are cleared through multiple pathways, including metabolism (mainly in the liver and gastrointestinal tract), biliary excretion of the unmetabolized drug, and urinary excretion. In one approach, the intrinsic CL obtained from in vitro studies with human hepatocytes and microsomes is scaled up to estimate hepatic CL.
However, in many cases the data cannot be scaled accurately because of differences between experimental systems and lot-to-lot variation among human specimens. Furthermore, there are currently no suitable in vitro experimental systems for other organs.[6] Empirical CLtot prediction using the power law of animal body weight has shown an average accuracy of approximately 2-fold error, but it has not been verified on external datasets.[7,8] Several studies have already applied machine learning to CLtot prediction,[9] and some have used experimental values related to CLtot as explanatory variables.[10] We previously proposed a machine learning method based on multimodal learning that takes CS and nonclinical data as inputs to predict human CLtot.[11] The key finding of that method was that human CLtot prediction accuracy improved when both CS data and animal experimental data were used. This suggests that accuracy could be improved further by using as explanatory variables not only rat CLtot but also CLtot values from other animals (e.g., dogs and monkeys) and in vitro experimental values such as the protein binding ratio for each species. However, these experimental values are often missing from compound datasets. Missing-value imputation is a well-known way to resolve this issue. Machine learning-based methods such as kNN imputation,[12] multivariate imputation by chained equations,[13] and missForest[14] are established imputation methods known to provide improved accuracy in these types of applications.
Highly accurate drug repositioning predictions have been achieved by adding missing-value imputation based on the similarity of compound structures.[15] In another study, missing activity values for different compound targets were predicted using the Random Forest method, and a QSAR model was constructed that uses these imputed values as explanatory variables.[16] In this study, we first constructed machine learning models that predict missing nonclinical data from chemical compound descriptors. We then imputed the missing nonclinical data and constructed machine learning models for predicting human CLtot and Vdss that use the imputed nonclinical data and CS data as explanatory variables. XGBoost and Deep Tensor,[17] a deep learning method that can learn from graph data, were used as the bases for these models. The prediction accuracy of this method was comparable to that of many animal scale-up methods. Unlike conventional methods, these models do not require new experimental data, so they appear suitable for predicting human parameters not only at the clinical stage but also in the early drug discovery stage.

Materials and Methods

Workflow

The workflow used in this study consisted of three steps: (i) gathering the chemical compound and nonclinical data; (ii) imputing the missing values in the nonclinical data using ADMEWORKS (https://www.fujitsu.com/jp/solutions/business-technology/tc/sol/admeworks/index.html); and (iii) selecting features with XGBoost or Random Forest and constructing the prediction model (Figure 1).

Figure 1

Workflow of our novel human CLtot and Vdss prediction method. (A) CLtot analysis flow. (i) There were 741 compounds with human CLtot data, 46 of which had values for all 11 features. (ii) All feature values were estimated by prediction using ADMEWORKS. (iii) Feature selection was performed using XGBoost or Random Forest, and a prediction model was constructed. (B) Vdss analysis flow. (i) There were 751 compounds with human Vdss data, 45 of which had values for all 11 features. (ii) All feature values were estimated by prediction using ADMEWORKS. (iii) Feature selection was performed using XGBoost or Random Forest, and a prediction model was constructed.

Gathering Chemical Compounds, Nonclinical Data, and Data Preprocessing

We obtained 741 sets of human CLtot data and 751 sets of human Vdss data, each with CS data, from JCP2013[4,7] and ChEMBL23.[18] We also obtained animal experimental data (CLtot, Vdss, and fraction unbound (fu) for rats, dogs, and monkeys) and human fu data for each of these compounds.[4,7] In addition, we collected pKa acid, pKa base, solubility, and caco-2 permeability data, including calculated values, for each compound from PubChem[19] and DrugBank.[20] Caco-2 permeability is a positive/negative binary value, and values labeled as predicted were also collected (Table 1). This left 46 CLtot and 45 Vdss compounds with all 11 data items recorded. For CLtot these items were rat CLtot, dog CLtot, monkey CLtot, human fu, rat fu, dog fu, monkey fu, pKa acid, pKa base, solubility, and caco-2 permeability; for Vdss they were rat Vdss, dog Vdss, monkey Vdss, human fu, rat fu, dog fu, monkey fu, pKa acid, pKa base, solubility, and caco-2 permeability (hereinafter “nonclinical data”). These compounds were labeled the “evaluation dataset,” and the compounds not in the “evaluation dataset” were labeled the “training dataset” for human CLtot or Vdss (dataset.xlsx). In addition, the compounds that had rat data, excluding the “evaluation dataset,” were labeled the “training dataset (rat)” for human CLtot and Vdss. Although the excretion pathways have not been identified for most of the compounds used in this study, a comparison of the data gathered by Lombardo et al.[7] with those of Varma et al.,[21] who investigated human renal excretion, showed that renal excretion accounted for 50% or less of CLtot in 157 of the 231 remaining compounds. Furthermore, although the eptaloprost data were recorded as intravenous administration, the cited references contain errors, and the human data have been shown to be for oral administration.
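The dataset definitions above amount to simple set operations. The following is a minimal Python sketch, not the authors' code, assuming each compound is stored as a dictionary of its available nonclinical values under hypothetical feature names:

```python
def split_datasets(compounds, feature_names, rat_feature):
    """Split compounds into evaluation, training, and training (rat) ID sets.

    compounds: dict mapping compound ID -> dict of available nonclinical
    values (missing values are absent or None). Feature names are hypothetical.
    """
    # Evaluation dataset: compounds with every nonclinical item recorded.
    evaluation = {cid for cid, feats in compounds.items()
                  if all(feats.get(f) is not None for f in feature_names)}
    # Training dataset: every remaining compound.
    training = set(compounds) - evaluation
    # Training dataset (rat): remaining compounds that have the rat value.
    training_rat = {cid for cid in training
                    if compounds[cid].get(rat_feature) is not None}
    return evaluation, training, training_rat
```

With the full 11-item feature list, this rule yields the 46 CLtot and 45 Vdss evaluation compounds described in the text.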

Table 1

Details of the Compound Data

feature	number of compounds	source
human CL_tot	741	JCP2013, ChEMBL23
rat CL_tot	387	JCP2013, ChEMBL23
dog CL_tot	284	JCP2013, ChEMBL23
monkey CL_tot	129	JCP2013, ChEMBL23
human Vd_ss	751	JCP2013, ChEMBL23
rat Vd_ss	351	JCP2013, ChEMBL23
dog Vd_ss	274	JCP2013, ChEMBL23
monkey Vd_ss	125	JCP2013, ChEMBL23
human fu	577	JCP2013, ChEMBL23
rat fu	237	JCP2013, ChEMBL23
dog fu	179	JCP2013, ChEMBL23
monkey fu	88	JCP2013, ChEMBL23
pK_a acid	334	PubChem, DrugBank
pK_a base	335	PubChem, DrugBank
solubility	339	PubChem, DrugBank
caco-2 permeability	307	PubChem, DrugBank

The gathered data were preprocessed as follows before each of the machine learning calculations described in the next section. First, the CLtot, Vdss, and solubility data were log-transformed, and then all data except the caco-2 permeability values were standardized. Note that the animal scale-up evaluations used the raw data without preprocessing.
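A minimal sketch of this preprocessing in plain Python. The text specifies neither the logarithm base nor the standardization formula, so base-10 logarithms and z-scores with the population standard deviation are assumptions here, as are the column names:

```python
import math
import statistics

# Hypothetical column names. Per the text, CLtot, Vdss, and solubility are log-transformed.
LOG_COLUMNS = {"human_CLtot", "rat_CLtot", "dog_CLtot", "monkey_CLtot", "solubility"}
# Caco-2 permeability is a binary label and is left unstandardized.
BINARY_COLUMNS = {"caco2"}


def preprocess(table):
    """table: dict mapping column name -> list of floats (no missing values).

    Assumes every non-binary column is non-constant (nonzero standard deviation).
    """
    out = {}
    for name, values in table.items():
        if name in BINARY_COLUMNS:
            out[name] = list(values)
            continue
        if name in LOG_COLUMNS:
            values = [math.log10(v) for v in values]  # assumed base 10
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)  # population SD (assumption)
        out[name] = [(v - mean) / sd for v in values]
    return out
```

The text does not say whether the mean and standard deviation were computed on the training data only. In practice, they are usually fitted on the training split and reused for evaluation data.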

Missing-Value Imputation Using ADMEWORKS

Some of the 11 nonclinical features were missing for some compounds in the “training dataset” (Table 1). We therefore built machine learning models from the existing data for each item and used them to predict the unknown values. ADMEWORKS was used for descriptor calculation, descriptor extraction, prediction model construction, and prediction. First, 1465-dimensional compound descriptors were calculated. These comprised 555 dimensions derived from atom/bond-related, topology-related, and physicochemical parameters and 910 dimensions obtained by counting substructures found in partial structure searches of the CS. Next, descriptors containing missing values or calculation errors were excluded, as were descriptors with a correlation coefficient of 0.9 or higher (the default setting in ADMEWORKS). High-level feature extraction (particle swarm optimization)[22] was then performed, and a model was constructed using the remaining descriptors. The model that maximized the percentage within a 2-fold error in 5-fold cross-validation (or, for caco-2 permeability only, the two-class accuracy) was adopted for downstream analyses. The nonclinical data for each compound were then predicted with each prediction model. The machine learning methods used for each item and the model accuracies are listed in Table S1.
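The correlation-based exclusion step can be illustrated with a greedy filter. This is only a sketch, because the exact ADMEWORKS procedure is not described here. Several details are assumptions: the use of the absolute Pearson coefficient, keeping whichever descriptor of a correlated pair comes first, and treating constant columns as calculation errors.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(descriptors, threshold=0.9):
    """Return the names of descriptors that survive the filter.

    descriptors: dict name -> list of values. Insertion order decides which
    member of a highly correlated pair is kept (assumption).
    """
    kept = {}
    for name, values in descriptors.items():
        # Drop descriptors with missing values or zero variance.
        if any(v is None for v in values) or len(set(values)) < 2:
            continue
        # Keep only if |r| < threshold against every descriptor already kept.
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return list(kept)
```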

Feature Selection and Prediction Model Construction

Feature Selection

Some nonclinical data may not be useful for prediction, and inappropriately imputed values may harm prediction. Specific items among the 11 nonclinical data categories were therefore selected as explanatory variables. The selection was based on the importance of each explanatory variable obtained when building a prediction model with the XGBoost or Random Forest method.[23] First, prediction models for human CLtot and Vdss were constructed on the “training dataset” with XGBoost or Random Forest using all 11 nonclinical items as explanatory variables, which gave the importance of each of the 11 variables. Next, the top k nonclinical items by importance were selected as explanatory variables. The search for the best k started from the smallest k for which the total importance of the top k items exceeded 0.5 and proceeded in increments of one. For each k, 5-fold cross-validation on the “training dataset” was performed with the multimodal model described below, whose explanatory variables were the CS and the top k nonclinical items. Each k was evaluated using the geometric mean fold error (GMFE) and the percentage within a 2-fold error, both described below. The search stopped when either the GMFE or the percentage within a 2-fold error became worse than at the previous value (k – 1). Finally, the k that gave the best GMFE and percentage within a 2-fold error in the search range was taken as the best k, and the corresponding nonclinical items were selected for further evaluation.
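The stopping rule of this search is easy to misread, so a sketch may help. In this hypothetical Python version, `evaluate` stands in for the 5-fold cross-validation of the multimodal model and returns `(GMFE, percentage within 2-fold error)`. The final choice ranks by GMFE first and breaks ties by percentage. This follows the priority stated under Performance Evaluation and is an interpretation, not the authors' code.

```python
def select_top_k(importance, evaluate, threshold=0.5):
    """Return the selected nonclinical features, ordered by importance.

    importance: dict feature -> importance (assumed to sum to 1).
    evaluate: hypothetical callable(list_of_features) -> (gmfe, pct_within_2fold).
    """
    ranked = sorted(importance, key=importance.get, reverse=True)
    # Initial k: smallest k whose cumulative importance exceeds the threshold.
    cumulative, k = 0.0, 0
    while cumulative <= threshold and k < len(ranked):
        cumulative += importance[ranked[k]]
        k += 1
    results = {}
    previous = None
    while k <= len(ranked):
        gmfe, pct = evaluate(ranked[:k])
        results[k] = (gmfe, pct)
        # Stop as soon as either metric is worse than at k - 1.
        if previous is not None and (gmfe > previous[0] or pct < previous[1]):
            break
        previous = (gmfe, pct)
        k += 1
    # Best k: lowest GMFE, ties broken by higher percentage within 2-fold error.
    best_k = min(results, key=lambda n: (results[n][0], -results[n][1]))
    return ranked[:best_k]
```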

Deep Tensor Model

To evaluate the effectiveness of the missing-value imputation, a prediction model that uses only the CS data as explanatory variables was constructed using Deep Tensor for comparison. The “training dataset” was used to construct this prediction model and the other conditions were set to the same values as those used in the multimodal Deep Tensor model described below.

Multimodal Deep Tensor Model

Prediction models for human CLtot and Vdss were constructed with the CS and nonclinical data as explanatory variables, following our previously described method[11] for Deep Tensor,[17] a deep learning technology for structured graph data (Figure 2). For both CLtot and Vdss, four combinations of nonclinical data and training data were used to construct the prediction model:

(1) rat data only + “training dataset (rat)” (CS + rat CLtot or Vdss in Table S2);
(2) rat data only + “training dataset” (CS + rat CLtot or Vdss imputed in Table S2);
(3) all 11 nonclinical data items + “training dataset” (CS + 11 features in Table S3);
(4) nonclinical data chosen by the feature selection described above + “training dataset” (CS + selected features in Table S3).

The core tensor size was set to 50 × 50. The neural network consisted of two intermediate layers of 1000 neurons each and one output neuron. The ReLU function[24] was used as the activation function. Batch normalization[25] (moving-average decay rate 0.9, epsilon 2 × 10^−5) and dropout[26] (rate 0.5) were applied to the intermediate layers. During training, the number of epochs was set to 50 and the minibatch size to 100.

Figure 2

Overview of the multimodal Deep Tensor model.

XGBoost Model

To evaluate the effectiveness of the missing-value imputation, we also built XGBoost models as a traditional machine learning approach. XGBoost was implemented in Python using scikit-learn.[27] Prediction models were constructed using either the CS data alone or the CS and nonclinical data as explanatory variables. The CS data were converted into extended connectivity fingerprints with a bond diameter of four (ECFP4), calculated with RDKit using a radius of 2 and 2048 dimensions.

Animal Scale-Up and Conventional Machine Learning Methods

Three methods are commonly used as conventional approaches to human CLtot prediction: the single-species allometric scaling (SSS) method, which uses the CLtot values of a single species (rat, dog, or monkey); the simple allometry (SA) method, which uses all three species; and the fraction-unbound corrected intercept method (FCIM).[28] Each SSS model was constructed using the compounds in the “training dataset” that have the required values and was then evaluated on the “evaluation dataset.” SA and FCIM have no training process using the training dataset. Instead, their parameters are fitted to the data of each compound to be predicted. The number of compounds used for parameter selection and the equations for each method are shown in Table S4. For Vdss, the SSS and SA methods use Vdss values in the same way as the CLtot models described above. In addition, the Øie–Tozer method[29] was used as a conventional method for human Vdss prediction. Each SSS model for Vdss was constructed and evaluated in the same way as for CLtot. The SA and Øie–Tozer methods, which have no training process, were fitted to the data of each compound to be predicted. The number of compounds in the training dataset for each parameter and method and the equations for each method are shown in Table S5.
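For reference, simple allometry is usually written as CL = a·W^b. The coefficients are obtained by linear regression of log CL on log body weight across species, and the fit is extrapolated to human body weight. The sketch below assumes that textbook form, absolute clearance units (e.g., mL/min rather than per kg), and illustrative body weights (rat 0.25 kg, monkey 5 kg, dog 10 kg, human 70 kg). The exact equations and parameters used in this study are given in Table S4 and may differ.

```python
import math


def fit_allometry(weights_kg, cl_values):
    """Least-squares fit of log10(CL) = log10(a) + b * log10(W); returns (a, b)."""
    xs = [math.log10(w) for w in weights_kg]
    ys = [math.log10(c) for c in cl_values]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    log_a = my - b * mx
    return 10 ** log_a, b


def predict_human(a, b, human_weight_kg=70.0):
    """Extrapolate the fitted power law to an assumed human body weight."""
    return a * human_weight_kg ** b


# Illustrative rat, monkey, and dog body weights (assumed values).
SPECIES_WEIGHTS_KG = [0.25, 5.0, 10.0]
```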

Performance Evaluation

GMFE and the percentage within a 2-fold error were used as indicators of prediction accuracy. When GMFE = X, the mean error between the measured and predicted values can be interpreted as an X-fold error. GMFE is expressed by the following equation:

$$\mathrm{GMFE} = 10^{\frac{1}{N}\sum_{i=1}^{N}\left|\log_{10}\frac{\mathrm{predicted}_i}{\mathrm{observed}_i}\right|}$$

where N is the number of compounds. GMFE values closer to 1 indicate better accuracy. The percentage within a 2-fold error is the proportion of compounds whose predicted value lies within a 2-fold error of the observed value (1/2 × observed value ≤ predicted value ≤ 2 × observed value). Values closer to 100% indicate better accuracy. When the two indicators disagreed, GMFE was used as the primary measure of accuracy because it is the more comprehensive indicator.
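Both indicators can be computed directly from paired observed and predicted values. The following is a minimal sketch in plain Python that follows the definitions in this section. The function names are illustrative.

```python
import math


def gmfe(observed, predicted):
    """Geometric mean fold error: 10 ** mean(|log10(predicted / observed)|)."""
    errs = [abs(math.log10(p / o)) for o, p in zip(observed, predicted)]
    return 10 ** (sum(errs) / len(errs))


def pct_within_2fold(observed, predicted):
    """Percentage of predictions with observed / 2 <= predicted <= 2 * observed."""
    hits = sum(1 for o, p in zip(observed, predicted) if o / 2 <= p <= 2 * o)
    return 100.0 * hits / len(observed)
```

For example, predictions of 2, 0.5, 1, and 4 against observed values of 1 give a GMFE of 2.0 and a percentage within 2-fold error of 75%. The boundary values 2 and 0.5 both count as within 2-fold.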

Results

Evaluation of the Usefulness of Missing-Value Imputation

To clarify the effectiveness of missing-value imputation, accuracy was evaluated using rat data, which had the fewest missing values. Three models were created (Table S2):

- a model trained using only CS data;
- a multimodal model using CS and measured rat CLtot or rat Vdss (CS + rat CLtot or rat Vdss);
- a multimodal model using CS and rat CLtot or rat Vdss with missing values imputed by predicted values (CS + rat CLtot imputed or rat Vdss imputed).

Evaluation used the evaluation dataset, which contained established human, rat, dog, and monkey values for 46 compounds for CLtot prediction and 45 compounds for Vdss prediction. Because the evaluation dataset was small, model construction and evaluation were repeated five times to reduce variation, and the mean values were used. Table 2 shows the results of missing-value imputation for CLtot prediction. First, the model trained using only CS was compared with the multimodal model using CS and rat CLtot (CS + rat CLtot). The model using only CS was trained on the 695 compounds not in the evaluation dataset (only CS). Its GMFE was XGBoost: 2.53 and Deep Tensor: 2.44, and its percentage within 2-fold error was XGBoost: 45.7% and Deep Tensor: 45.7%. The multimodal model including CS and rat CLtot was trained on the 343 of these 695 compounds that had a rat CLtot value (CS + rat CLtot). Its GMFE was XGBoost: 2.15 and Deep Tensor: 2.15, and its percentage within 2-fold error was XGBoost: 52.2% and Deep Tensor: 54.8%. This confirmed that introducing rat CLtot values improved the accuracy of CLtot prediction.
This result is consistent with our previous report.[11] Although accuracy generally increases with more training data, nonclinical data such as rat CLtot values had been measured for relatively few compounds. We therefore used ADMEWORKS to predict rat CLtot for the compounds without measured values (see Materials and Methods for details). This allowed the multimodal model using CS and imputed rat CLtot to be trained on all 695 compounds (CS + rat CLtot imputed). This model produced a GMFE of XGBoost: 2.09 and Deep Tensor: 2.09 and a percentage within 2-fold error of XGBoost: 54.3% and Deep Tensor: 54.3%, making it the most accurate model by GMFE. These results suggest that model accuracy can be improved by increasing the amount of training data, even when some of the data are imputed.

Table 2

Results of the Accuracy Evaluations for Imputations of Rat CLtot Data

	method (training compounds)	GMFE	% within 2-fold error
CL_tot	XGBoost: only CS (695 compounds)	2.53	45.7
	XGBoost: CS + rat CL_tot (343 compounds)	2.15	52.2
	XGBoost: CS + rat CL_tot imputed (695 compounds)	2.09	54.3
	Deep Tensor: only CS (695 compounds)	2.44	45.7
	Deep Tensor: CS + rat CL_tot (343 compounds)	2.15	54.8
	Deep Tensor: CS + rat CL_tot imputed (695 compounds)	2.09	54.3
Vd_ss	XGBoost: only CS (706 compounds)	1.66	82.2
	XGBoost: CS + rat Vd_ss (306 compounds)	1.72	75.6
	XGBoost: CS + rat Vd_ss imputed (706 compounds)	1.73	68.9
	Deep Tensor: only CS (706 compounds)	1.85	62.2
	Deep Tensor: CS + rat Vd_ss (306 compounds)	1.89	56.9
	Deep Tensor: CS + rat Vd_ss imputed (706 compounds)	1.75	64.4

Table 2 also shows the results of missing-value imputation for Vdss. Two models were compared. The model using only CS (only CS) was trained on the 706 compounds not in the evaluation dataset. The multimodal model using CS and rat Vdss (CS + rat Vdss) was trained on the 306 of these compounds with measured rat Vdss. Their GMFEs were XGBoost: 1.66 and 1.72 and Deep Tensor: 1.85 and 1.89, and their percentages within 2-fold error were XGBoost: 82.2% and 75.6% and Deep Tensor: 62.2% and 56.9%, respectively. Thus, the model using only CS was more accurate than the multimodal model using CS and measured nonclinical data (CS + rat Vdss). Vdss is known to have a stronger structure–activity relationship than CLtot when predicted from structural information,[3] which suggests that simply increasing the number of structures in the training data improves the accuracy of the prediction model. We then imputed rat Vdss for the compounds lacking it so that all 706 compounds could be used. The resulting multimodal model (CS + rat Vdss imputed) had a GMFE of XGBoost: 1.73 and Deep Tensor: 1.75 and a percentage within 2-fold error of XGBoost: 68.9% and Deep Tensor: 64.4%. This made it the most accurate of the three Deep Tensor models, whereas for XGBoost the model using only CS remained the most accurate.

Improving Accuracy Using Feature Selection

Improved accuracy was achieved using multimodal machine learning models with imputed values for the missing rat CLtot/Vdss data. We then evaluated multimodal machine learning models that use features selected from the 11 items of nonclinical data in each dataset (Table S3). We first created learning models for each of the 11 nonclinical items using ADMEWORKS and imputed missing values by prediction. Details of the prediction model and its accuracy for each item are shown in Table S1. Multiple models were constructed for each item, and missing values were imputed by predicting the unknown nonclinical data with the most accurate model. We then performed feature selection on these 11 items with their imputed values. The importance of each nonclinical item is shown in Figures S1–S4. Based on XGBoost importance, we selected four items (rat CLtot, dog CLtot, human fu, and pKa acid) for CLtot prediction (Figure S1 and Table S6) and five items (rat Vdss, dog Vdss, pKa acid, pKa base, and human fu) for Vdss prediction (Figure S2 and Table S7). Based on Random Forest importance, we selected the same four items (rat CLtot, dog CLtot, human fu, and pKa acid) for CLtot prediction (Figure S3 and Table S8) and six items (dog Vdss, rat Vdss, pKa acid, pKa base, solubility, and human fu) for Vdss prediction (Figure S4 and Table S9). Finally, multimodal machine learning models were constructed using the selected nonclinical data and the CS data.
The accuracies of these models, five conventional methods, the model using only CS, and the multimodal model using CS and all 11 nonclinical items were evaluated using the “evaluation dataset.” As in the evaluation of missing-value imputation, the evaluations of the multimodal models and the models using only CS were repeated five times, and the mean values were used. Table 3 shows the results of CLtot prediction. Among the five conventional methods, those using monkey CLtot data were the most accurate: SSS monkey (GMFE: 1.93, percentage within 2-fold error: 58.7%) and FCIM (GMFE: 1.99, percentage within 2-fold error: 52.2%). Among the proposed multimodal models using CS and nonclinical data with imputed missing values, the model using all 11 items (CS + 11 features) gave a GMFE of XGBoost: 2.06 and Deep Tensor: 2.11 and a percentage within 2-fold error of XGBoost: 58.7% and Deep Tensor: 52.2%, equivalent to the conventional methods. The model using feature selection (CS + selected features) gave a GMFE of XGBoost: 1.98 and Deep Tensor: 1.92 and a percentage within 2-fold error of XGBoost: 50.0% and Deep Tensor: 66.5%, and its Deep Tensor version was the most accurate of all the methods. These results indicate that prediction models can be improved by increasing the number of compounds used for training, which is made possible by first imputing missing data with predicted values. They further suggest that a better model can be obtained by performing feature selection and training on only the important nonclinical features.

Table 3

Results of Accuracy Evaluations

	method	GMFE	% within 2-fold error
CL_tot	SSS rat	2.36	43.5
	SSS dog	2.30	39.1
	SSS monkey	1.93	58.7
	SA	2.33	45.7
	FCIM	1.99	52.2
	XGBoost: only CS	2.40	50.0
	XGBoost: CS + 11 features	2.06	58.7
	XGBoost: CS + selected features	1.98	50.0
	Deep Tensor: only CS	2.44	45.7
	Deep Tensor: CS + 11 features	2.11	52.2
	Deep Tensor: CS + selected features	1.92	66.5
Vd_ss	SSS rat	1.91	62.2
	SSS dog	1.93	71.1
	SSS monkey	1.60	80.0
	SA	2.07	68.9
	Øie–Tozer	1.46	84.4
	XGBoost: only CS	1.70	77.8
	XGBoost: CS + 11 features	1.64	71.1
	XGBoost: CS + selected features	1.66	71.1
	Deep Tensor: only CS	1.85	62.2
	Deep Tensor: CS + 11 features	1.75	69.8
	Deep Tensor: CS + selected features	1.74	74.2

SSS: single-species allometric scaling; SA: simple allometry; FCIM: fu-corrected intercept method; CS: chemical structure.

GMFE: geometric mean fold error.

Table 3 also summarizes the results of the Vdss predictions. Among the conventional methods, the Øie–Tozer method (GMFE: 1.46, percentage within 2-fold error: 84.4%) was the most accurate, closely followed by SSS monkey (GMFE: 1.60, percentage within 2-fold error: 80.0%). The multimodal models using CS and nonclinical data with imputed missing values did not differ significantly from one another. CS + 11 features gave a GMFE of XGBoost: 1.64 and Deep Tensor: 1.75 and a percentage within 2-fold error of XGBoost: 71.1% and Deep Tensor: 69.8%. The feature selection model (CS + selected features) gave a GMFE of XGBoost: 1.66 and Deep Tensor: 1.74 and a percentage within 2-fold error of XGBoost: 71.1% and Deep Tensor: 74.2%, a small improvement in the percentage within 2-fold error for Deep Tensor only. Unlike CLtot prediction, Vdss prediction remained more accurate with the conventional Øie–Tozer method, which is based on animal scale-up data.

Discussion

Missing-Value Imputation (NA Imputation)

This research demonstrates that CLtot and Vdss can be predicted with high accuracy through data extension by missing-value imputation. Because this method does not require new experimental values, it can be used from the initial stages of drug development. Although the effectiveness of imputation with this method was confirmed for rat data, it was not evaluated for the other nonclinical data. For some nonclinical data, imputation may therefore be ineffective or may even reduce prediction accuracy. For others, imputation may be appropriate but the data may not be useful for prediction. With these possibilities in mind, we selected the explanatory variables carefully. Strictly speaking, selecting the truly optimal explanatory variables for the multimodal Deep Tensor model would require evaluating all combinations of the 12 candidate explanatory variables (the CS data and the 11 nonclinical items). Because this would take a huge amount of time, we used a simplified procedure. Specifically, the CS data were always selected, and only the nonclinical items were varied. The number of nonclinical items to select was determined from the importance of each item obtained with XGBoost or Random Forest (Figures S1–S4 and Tables S6–S9), which reduced the number of combinations to be evaluated. Note that the importance obtained from the XGBoost or Random Forest method assumes a prediction model built with that method.
These importances may therefore not match those in the multimodal Deep Tensor model. Furthermore, because the XGBoost and Random Forest importances were calculated using only the nonclinical data, the importance of each item may change when it is used together with the CS data in the multimodal Deep Tensor model. We therefore cannot say definitively that we selected the best explanatory variables.
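The following worked count, which was not part of the original analysis, shows how much this simplification saves. It counts the candidate subsets under each strategy for 12 candidate variables (CS plus 11 nonclinical items):

```latex
\begin{aligned}
\text{all nonempty subsets of the 12 candidates:}\quad & 2^{12} - 1 = 4095 \\
\text{subsets that always include CS:}\quad & 2^{11} = 2048 \\
\text{nested top-}k\text{ subsets, } k = 1,\dots,11:\quad & \le 11
\end{aligned}
```

Ranking by importance turns an exponential search into a linear one. The stopping rule in the feature selection procedure can shorten it further.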

Selected Explanatory Variables

The explanatory variables selected for CLtot prediction (CS + selected features), which gave the highest prediction accuracy, were rat CLtot, dog CLtot, human fu, and pKa acid for both XGBoost and Deep Tensor (Figures S1 and S3 and Tables S6 and S8). Selecting CLtot for multiple species may help reflect inherent metabolic differences between species.[30] Monkey CLtot may not have been selected because of limited imputation accuracy, as far fewer monkey values were available for missing-value imputation than for the other species. We also consider these selections valid because human fu, a common consideration in human CLtot prediction, was selected.[7] The pKa acid variable may help reflect how metabolism and excretion pathways are allocated depending on the compound's physical properties.[31] The explanatory variables selected for the most accurate Vdss prediction (CS + selected features) were rat Vdss, dog Vdss, human fu, pKa acid, and pKa base for XGBoost (Figure S2 and Table S7). For Deep Tensor they were rat Vdss, dog Vdss, human fu, pKa acid, pKa base, and solubility (Figure S4 and Table S9). The Vd of multiple species is likely to reflect inherent interspecies differences in distribution,[32] and the validity of these selections is similarly supported by the inclusion of human fu.[4] However, considerably more physicochemical parameters (e.g., pKa acid, pKa base, solubility) were selected for Vdss. This likely reflects the fact that the Vd of a compound is determined by interactions between the compound and tissue components (e.g., lipids, phospholipids, acidic glycoprotein).[33]

Comparison with Animal Scale-Up Methods

Conventional animal scale-up methods are known to be useful, offer good accuracy, and are often used for human pharmacokinetic parameter prediction during drug development.[4,7] For CLtot prediction, the most accurate method was the multimodal model with missing-value imputation + feature selection (Deep Tensor: CS + selected features). It was followed by SSS monkey, which uses monkey data, and then by the FCIM method, which uses data from rats, dogs, and monkeys, and by XGBoost: CS + selected features. For Vdss prediction, the Øie–Tozer method had the highest accuracy. This method is calculated from plasma and tissue binding in rats, dogs, and monkeys. It was followed by the animal scale-up method using monkey data (SSS monkey). The proposed multimodal models with missing-value imputation and added nonclinical features (XGBoost: CS + 11 features and Deep Tensor: CS + selected features) had slightly lower accuracy (Table ). These results suggest that monkey data give valid predictions of human CLtot and Vdss, consistent with the close genetic relationship between the two species.[34,35] However, because of their experimental cost and ethical concerns, monkey experiments tend to be performed at later stages of nonclinical development, which makes them difficult to use in the initial stages of drug development.[34,36] In contrast, the proposed multimodal model applies missing-value imputation and feature selection to existing data, so it can be used in the initial stages of drug development and is expected to contribute substantially to efficient drug development.
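The Øie–Tozer approach mentioned above can be sketched as follows. This is a minimal illustration that assumes the commonly cited form of the equation, Vss = Vp(1 + RE/I) + fu·Vp(VE/Vp − RE/I) + VR·fu/fut, where fut is the unbound fraction in tissue. In the sketch, fut is back-calculated for each animal species from its observed Vss and plasma fu. The values are then averaged across species and combined with human fu to predict human Vss. The averaging step and the function names (`oie_tozer_vss`, `tissue_fu`, `predict_human_vss`) are assumptions, not the implementation used in this study. The human constants (Vp 0.0436, VE 0.151, VR 0.380 L/kg; RE/I 1.4) are the values commonly used with this method. Animal physiological volumes must be supplied by the user.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence, Tuple


@dataclass(frozen=True)
class PhysVolumes:
    """Physiological volumes (L/kg) and the extravascular/intravascular binding-protein ratio."""
    vp: float          # plasma volume
    ve: float          # extracellular fluid volume outside plasma
    vr: float          # remaining body water outside the extracellular space
    re_i: float = 1.4  # extravascular-to-intravascular ratio of binding proteins


# Commonly cited human values for the Øie–Tozer equation (assumed here).
HUMAN = PhysVolumes(vp=0.0436, ve=0.151, vr=0.380, re_i=1.4)


def oie_tozer_vss(fu: float, fut: float, v: PhysVolumes) -> float:
    """Vss = Vp(1 + RE/I) + fu*Vp*(VE/Vp - RE/I) + VR*fu/fut."""
    return v.vp * (1 + v.re_i) + fu * v.vp * (v.ve / v.vp - v.re_i) + v.vr * fu / fut


def tissue_fu(vss: float, fu: float, v: PhysVolumes) -> float:
    """Rearrange the Øie–Tozer equation to solve for fut from an observed Vss."""
    plasma_ecf_term = v.vp * (1 + v.re_i) + fu * v.vp * (v.ve / v.vp - v.re_i)
    if vss <= plasma_ecf_term:
        raise ValueError("Vss too small to give a positive tissue unbound fraction")
    return v.vr * fu / (vss - plasma_ecf_term)


def predict_human_vss(
    animals: Sequence[Tuple[float, float, PhysVolumes]], human_fu: float
) -> float:
    """Average fut over (Vss, fu, volumes) animal records, then predict human Vss."""
    fut = mean(tissue_fu(vss, fu, v) for vss, fu, v in animals)
    return oie_tozer_vss(human_fu, fut, HUMAN)
```

In practice, each species (rat, dog, monkey) would carry its own `PhysVolumes`. Those literature values are deliberately not reproduced here, so callers must supply them from an authoritative source.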

Conclusions

This study constructed a set of high-accuracy CLtot and Vdss prediction models by applying missing-value imputation and feature selection to nonclinical data. Previous evaluations that used nonclinical data as explanatory variables were less effective because missing values left too few evaluable compounds for accurate machine learning. We confirmed that performing missing-value imputation on nonclinical data improves model accuracy. It does so by increasing both the number of compounds available for training and the number of nonclinical explanatory variables that can be used. This method differs from animal scale-up methods in that it does not require animal experiments, which have become more strictly regulated in recent years. Although we used the XGBoost and Deep Tensor algorithms in this research, other machine learning algorithms could also be applied, because the proposed imputation approach does not depend on a particular learning algorithm. The increased accuracy of the CLtot and Vdss predictions produced by this method is expected to facilitate the evaluation and identification of candidate structures with improved pharmacokinetic properties at earlier stages of drug discovery.
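The argument that imputation enlarges the training set can be illustrated with a toy table. The hypothetical sketch below compares two approaches. Complete-case filtering drops any compound with a missing nonclinical value; column-mean imputation fills the gaps instead. Mean imputation is only a simple stand-in chosen for brevity; it is not the imputation method used in this study. The table values and the functions `complete_cases` and `mean_impute` are illustrative.

```python
from statistics import mean
from typing import List, Optional

Row = List[Optional[float]]  # one compound; None marks a missing nonclinical value


def complete_cases(rows: List[Row]) -> List[Row]:
    """Keep only compounds with no missing values (the situation without imputation)."""
    return [r for r in rows if all(x is not None for x in r)]


def mean_impute(rows: List[Row]) -> List[List[float]]:
    """Fill each missing value with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        col_means.append(mean(observed))
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


# Toy nonclinical table: columns might stand for, e.g., rat, dog, and monkey CLtot.
table: List[Row] = [
    [10.0, 5.0, None],
    [20.0, None, 4.0],
    [15.0, 7.0, 6.0],
    [None, 6.0, 2.0],
]

print(len(complete_cases(table)))  # 1 compound usable without imputation
print(len(mean_impute(table)))     # 4 compounds usable after imputation
```

A more faithful imputer would model relationships between columns rather than using column means. The bookkeeping point is the same either way: every compound stays in the training set.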

References (39 in total)

Andrade EL, Bento AF, Cavalli J, Oliveira SK, Schwanke RC, Siqueira JM, Freitas CS, Marcon R, Calixto JB. Non-clinical studies in the process of new drug development - Part II: Good laboratory practice, metabolism, pharmacokinetics, safety and dose translation to clinical studies. Braz J Med Biol Res. 2016-12-12. Review.