
Predicting hERG channel blockers with directed message passing neural networks.

Mengyi Shan1, Chen Jiang1,2, Jing Chen1,3, Lu-Ping Qin1, Jiang-Jiang Qin4, Gang Cheng1.   

Abstract

Compounds with human ether-à-go-go related gene (hERG) blockade activity may cause severe cardiotoxicity. Assessing hERG liability in the early stages of the drug discovery process is therefore important, and in silico methods for predicting hERG channel blockers are actively pursued. In the present study, the directed message passing neural network (D-MPNN) was applied to construct classification models for identifying hERG blockers based on diverse datasets. Several descriptors and fingerprints were tested along with the D-MPNN model. Among all these combinations, D-MPNN with the moe206 descriptors generated from MOE (D-MPNN + moe206) showed significantly improved performance. The AUC-ROC values of the D-MPNN + moe206 model reached 0.956 ± 0.005 under random split and 0.922 ± 0.015 under scaffold split on Cai's hERG dataset. Moreover, comparisons between our models and several recently reported machine learning models were made on various datasets. Our results indicate that the D-MPNN + moe206 model is among the best classification models. Overall, the excellent performance of the D-MPNN + moe206 model achieved in this study highlights its potential application in the discovery of novel and effective hERG blockers. This journal is © The Royal Society of Chemistry.

Year:  2022        PMID: 35425351      PMCID: PMC8979305          DOI: 10.1039/d1ra07956e

Source DB:  PubMed          Journal:  RSC Adv        ISSN: 2046-2069            Impact factor:   3.361


Introduction

The human ether-à-go-go-related gene (hERG) encodes a voltage-gated potassium channel that plays a major role in the cardiac action potential.[1] Blockade of the hERG potassium channel prolongs the QT interval and can lead to severe cardiotoxicity such as cardiac arrhythmia.[2] Several marketed drugs, including astemizole,[3] terfenadine,[4] and cisapride,[5] have been withdrawn due to unintended hERG-related cardiotoxicity. Consequently, assessing hERG-blocking activity is essential in drug discovery projects. Various in vitro and in vivo assays, including fluorescence measurements,[6] radioligand binding assays,[7] patch-clamp electrophysiology,[8] and QTc assays,[9] have been developed to evaluate the inhibitory effects of small molecules on the hERG channel. However, these assays are usually time-consuming and expensive. Thus, it is important to develop reliable in silico methods for predicting hERG inhibition at an early stage of the drug discovery and development process.

Over the past decade, many in silico models for predicting hERG channel inhibition have been reported.[10] Most of these methods can be categorized as structure-based methods (including quantitative structure–activity relationships (QSAR) and pharmacophore searching) or machine learning methods (such as support vector machines (SVM) and random forests (RF)).[11-13] More recently, with the accumulation of data on the hERG inhibition or hERG binding affinity of small molecules, several large-scale datasets comprising thousands of hERG blockers and non-blockers have become publicly available.[14-16] Many machine learning (ML) algorithms have been applied to these datasets for predicting hERG-blocking activity. For instance, Hou et al. developed pharmacophore-based models with naive Bayes (NB) and SVM; their best SVM model achieved prediction accuracies of 84.7% on the training set and 82.1% on the external test set.[17] Siramshetty et al. employed three machine learning methods, k-nearest neighbors (kNN), RF, and SVM, with different molecular descriptors; the area under the receiver operating characteristic curve (AUC-ROC) of their best model reached 0.94.[18] Ogura et al. established an SVM classification model with descriptors selected by the non-dominated sorting genetic algorithm II (NSGA-II), reaching an accuracy of 0.984 on their test dataset.[19] With the development of deep learning techniques, Cai et al. collected 7889 compounds with well-defined experimental hERG data, built multi-task deep learning prediction models (deephERG) on the open-source DeepChem platform,[20] and obtained an accuracy of 0.93 and an AUC-ROC of 0.967 for their best model.[21] In addition, multiple other deep learning-based models have been reported, including capsule networks and platforms such as DeepHIT,[22] hERG-Att,[23] hERG-ml,[24] and CardioTox net.[25] Despite this substantial progress in hERG blocker prediction, predictive accuracy still needs to be improved for real drug discovery settings.

Recently, Yang et al. proposed the directed message passing neural network (D-MPNN),[26] which showed excellent performance across 19 public datasets, including QM7, QM8, QM9, ESOL, FreeSolv, lipophilicity, BBBP, PDBbind-F, PCBA, BACE, Tox21, and ClinTox. Moreover, it is worth noting that Stokes et al. successfully applied D-MPNN to identify novel antibacterial compounds from the ZINC15 database, further demonstrating its strength.[27] In this study, we applied the D-MPNN model to hERG datasets for predicting hERG blockers. We tested different combinations of D-MPNN with several fingerprints or descriptors (ECFP4,[28] ECFP6,[28] FCFP4,[28] MACCS,[29] PubchemFP,[30] RDKit 2D normalized,[31] Mol2vec,[32] MOE53,[33] and moe206[34] (206 2D descriptors generated by MOE)) and found that some of these descriptors significantly boost the performance of D-MPNN. The best combination, D-MPNN incorporated with the moe206 descriptors (D-MPNN + moe206), obtained AUC-ROC values up to 0.956 ± 0.005 and 0.922 ± 0.015 in five-fold cross-validation under random split and scaffold split, respectively. Various comparisons were also made between our models and those developed with other machine learning methods on their own datasets. All results confirmed that the D-MPNN + moe206 model is one of the best classification models for predicting hERG blockers.

Experimental section

Dataset

Cai's hERG dataset[21] was first used to evaluate the D-MPNN model and to search for the best descriptor combinations. It contains a total of 7889 compounds and uses six thresholds (10, 20, 40, 60, 80, and 100 μM) for distinguishing hERG blockers from non-blockers. To compare the performance of our best model (D-MPNN + moe206) with other machine learning models, we also used the hERG datasets reported by Doddareddy,[11] Hou,[17] Ogura,[19] Siramshetty,[24] and Karim.[25] Additionally, we conducted principal component analysis (PCA) on Cai's,[21] Siramshetty's,[24] and Hou's[17] datasets (Table S1, Fig. S1 and S2†).

D-MPNN model

Yang et al. have described the directed message passing neural network (D-MPNN) in detail and built the open-source package Chemprop for its implementation (see https://github.com/chemprop/chemprop).[26] Like other message passing neural networks (MPNN), D-MPNN constructs a learned molecular representation by operating on the graph structure of the molecule. Unlike the generic MPNN, which passes messages through an atom-dependent neural network, D-MPNN passes messages through an edge-dependent neural network. In Chemprop, each SMILES string is first converted to its corresponding molecular graph G via the open-source package RDKit,[35] where the atoms of the chemical structure are described as nodes (v) and the bonds between atoms are treated as edges. Next, node (atom) features x_v and edge (bond) features e_vw are computed. Prior to the first message passing step, the edge hidden states h⁰_vw are initialized as

  h⁰_vw = τ(W_i · cat(x_v, e_vw))

where τ is the rectified linear unit (ReLU) activation function, W_i is a learned matrix, and cat(x_v, e_vw) is the concatenation of the atom features x_v for atom v and the bond features e_vw for bond vw. The D-MPNN then operates on hidden states h_vw and messages m_vw to propagate information along the graph structure. On each directed message passing step t, each bond's message is obtained by summing the hidden states of its neighboring inbound bonds, and the bond's hidden state is updated from its initial state and this message:

  m^{t+1}_vw = Σ_{k ∈ N(v)\{w}} h^t_kv
  h^{t+1}_vw = τ(h⁰_vw + W_m · m^{t+1}_vw)

where N(v) is the set of neighbors of v in graph G, t ∈ {1, …, T}, T is the total number of message passing steps, and W_m is a learned matrix. Each atom representation of the molecule is then obtained by aggregating the incoming bond hidden states:

  m_v = Σ_{k ∈ N(v)} h^T_kv
  h_v = τ(W_a · cat(x_v, m_v))

In the readout phase of D-MPNN, the feature vector h for the whole molecule is obtained by summing the hidden states of all atoms, h = Σ_v h_v. Finally, h is fed through a feed-forward neural network to generate property predictions.
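The update equations above can be sketched in a few lines of pure Python. This is a toy illustration only: the triangle graph, the 2-dimensional atom features, the 1-dimensional bond features, and the random weight matrices are all made up for the example, whereas Chemprop learns the weights and uses much richer featurizations.

```python
# Toy sketch of D-MPNN directed message passing on a 3-atom "triangle" graph.
# All features and weights are hypothetical; only the update structure matters.
import random

random.seed(0)
D = 4  # hidden size

def relu(v): return [max(0.0, x) for x in v]

def matvec(W, v): return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_mat(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

atoms = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}               # x_v (toy)
bonds = {(u, v): [0.5] for u in atoms for v in atoms if u != v}     # e_vw, directed

W_i = rand_mat(D, 3)   # acts on cat(x_v, e_vw), length 2 + 1
W_m = rand_mat(D, D)

# h0_vw = ReLU(W_i . cat(x_v, e_vw))
h0 = {(u, v): relu(matvec(W_i, atoms[u] + e)) for (u, v), e in bonds.items()}
h = dict(h0)

for _ in range(3):  # T = 3 message passing steps
    new_h = {}
    for (v, w) in h:
        # m_vw = sum of h_kv over inbound bonds k->v, excluding the reverse bond w->v
        m = [0.0] * D
        for (k, v2) in h:
            if v2 == v and k != w:
                m = [a + b for a, b in zip(m, h[(k, v2)])]
        # h_vw = ReLU(h0_vw + W_m . m_vw)
        new_h[(v, w)] = relu([a + b for a, b in zip(h0[(v, w)], matvec(W_m, m))])
    h = new_h

# Readout: per-atom vector = sum of incoming bond states; molecule vector = sum over atoms
atom_h = {a: [sum(h[(k, v)][i] for (k, v) in h if v == a) for i in range(D)] for a in atoms}
mol_h = [sum(atom_h[a][i] for a in atoms) for i in range(D)]
```

In the real model, `mol_h` would then pass through the feed-forward network described above.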

D-MPNN with molecular characterization

As noted by Yang et al., the original D-MPNN model may not be efficient at extracting global molecular features, especially for large molecules. To enhance D-MPNN performance, various molecular features[26,27] can be incorporated into the model. Here, nine types of molecular descriptors or fingerprints are used to represent the characteristics of the compounds: mol2vec,[32] RDKit 2D normalized,[31] MACCS,[29] PubChemFP,[30] ECFP4,[28] ECFP6,[28] FCFP4,[28] MOE53,[33] and moe206.[34] The first eight descriptors or fingerprints were computed with RDKit,[31] MolMap,[36] or mol2vec.[32] The last one, moe206, was calculated by the Molecular Operating Environment (MOE),[34] and comprises 206 2D descriptors per molecule, mainly covering physicochemical properties, kappa shape indices, surface areas, atom and bond counts, etc. The list of descriptors is shown in Table S2.†
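Mechanically, such extra features are concatenated with the learned molecule vector before the feed-forward classifier. The sketch below illustrates that idea only; the vectors are toy stand-ins (not real D-MPNN outputs or MOE descriptors), and the min–max normalization is an illustrative choice.

```python
# Sketch: concatenating extra molecular descriptors with the learned
# D-MPNN molecule vector before the feed-forward network. Toy data only.
learned = [0.3, 1.2, 0.0, 0.7]    # hypothetical D-MPNN readout vector
descriptors = [2.1, -0.4, 5.0]    # truncated stand-in for the 206 MOE descriptors

def normalize(v):
    # Min-max scale descriptors so they are commensurate with learned features.
    lo, hi = min(v), max(v)
    return [(x - lo) / (hi - lo) for x in v] if hi > lo else [0.0] * len(v)

ffn_input = learned + normalize(descriptors)  # fed to the feed-forward network
```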

Model assessment

Model performance is mainly measured by AUC-ROC. In addition, sensitivity (SE), specificity (SP), negative predictive value (NPV), positive predictive value (PPV), accuracy (ACC), balanced accuracy (BA), and the Matthews correlation coefficient (MCC) were evaluated. These indices are computed from the confusion-matrix counts: true positives (TP), actually positive samples correctly predicted as positive; true negatives (TN), actually negative samples correctly predicted as negative; false positives (FP), actually negative samples wrongly predicted as positive; and false negatives (FN), actually positive samples wrongly predicted as negative. For example, SE = TP/(TP + FN), SP = TN/(TN + FP), ACC = (TP + TN)/(TP + TN + FP + FN), and BA = (SE + SP)/2.
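These metrics follow directly from the four confusion-matrix counts; a minimal implementation with example counts (the counts themselves are arbitrary, chosen only for illustration):

```python
# Classification metrics used in this paper, computed from confusion-matrix counts.
import math

def metrics(tp, tn, fp, fn):
    se  = tp / (tp + fn)                      # sensitivity (recall for blockers)
    sp  = tn / (tn + fp)                      # specificity
    ppv = tp / (tp + fp)                      # positive predictive value
    npv = tn / (tn + fn)                      # negative predictive value
    acc = (tp + tn) / (tp + tn + fp + fn)     # accuracy
    ba  = (se + sp) / 2                       # balanced accuracy
    mcc = ((tp * tn - fp * fn) /
           math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return dict(SE=se, SP=sp, PPV=ppv, NPV=npv, ACC=acc, BA=ba, MCC=mcc)

m = metrics(tp=80, tn=90, fp=10, fn=20)   # arbitrary example counts
```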

Hyperparameter optimization

As with other machine learning models, hyperparameter optimization is an important step for the D-MPNN model. Bayesian hyperparameter optimization[37] as implemented in Chemprop[26] was adopted to search for the optimal parameters, including the depth of the model (the number of message passing steps), the hidden size of the neural network, the dropout probability, and the number of feed-forward network layers.
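The paper uses Chemprop's Bayesian (TPE-based) search; as a stand-in, the sketch below runs a plain random search over the same four hyperparameters. The search space values and the `score` function are hypothetical placeholders; a real run would train a D-MPNN per configuration and return its cross-validated AUC-ROC.

```python
# Random-search stand-in for the Bayesian hyperparameter optimization step.
# Search space mirrors the four hyperparameters named above; `score` is a
# hypothetical placeholder for a cross-validated AUC-ROC evaluation.
import random

random.seed(42)
space = {
    "depth":          [2, 3, 4, 5, 6],        # message passing steps
    "hidden_size":    [300, 600, 900, 1200],
    "dropout":        [0.0, 0.1, 0.2, 0.3],
    "ffn_num_layers": [1, 2, 3],
}

def score(cfg):
    # Placeholder objective; a real run would train D-MPNN and return mean AUC-ROC.
    return 0.9 - 0.01 * cfg["dropout"] + 0.001 * cfg["depth"]

best = max((dict((k, random.choice(v)) for k, v in space.items()) for _ in range(20)),
           key=score)
```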

Results

Performance on Cai's dataset

Recently, Cai et al. collected an hERG dataset containing 7889 compounds and proposed a deep learning model (deephERG) based on the DeepChem module for hERG blocker prediction.[21] Initially, we compared the performance of the D-MPNN model against the deephERG model. To make a fair comparison, we used the same datasets to train and test the D-MPNN models. Using the same validation dataset and evaluation metric as reported by Cai et al., the D-MPNN model (AUC-ROC from 0.989 to 0.998) outperformed deephERG (AUC-ROC from 0.883 to 0.958) (Fig. 1). The distribution of the prediction probabilities of the D-MPNN model for all compounds in Cai's dataset at the 10 μM threshold was also plotted (Fig. S3†): most non-blockers (red) were predicted with probabilities near zero, while the majority of blockers (blue) had probabilities near 100%.
Fig. 1

Comparison of the AUC-ROC values of D-MPNN and deephERG model across different decoy thresholds on Cai's validation set.

Inspired by the excellent performance of the original D-MPNN, we next evaluated the D-MPNN model coupled with different molecular characterizations and their combinations (ECFP4, ECFP6, FCFP4, MACCS, PubchemFP, RDKit 2D normalized, MOE53, moe206, mol2vec, MOE53 + mol2vec, and MOE53 + RDKit 2D normalized). Here we again used the same 7889 compounds from Cai's dataset, but applied five-fold cross-validation instead of the pre-split training and test data. In the five-fold cross-validation mode, the 7889 compounds were divided into five subsets by random selection or scaffold split. In each run, four of the five subsets were used as training data and the remaining subset as test data for evaluating performance. This process was repeated five times, until each of the five subsets had been used as the test data once, and the average test result over the five runs gave the final AUC-ROC. The five-fold cross-validation results under the random split are given in Fig. 2A. It is believed that k-fold cross-validation[38] can effectively guard against overfitting and underfitting. As expected, D-MPNN under five-fold cross-validation (set as the control in Fig. 2A) showed significantly poorer performance than the above D-MPNN trained on the pre-split data (AUC-ROC: 0.947 ± 0.005 versus 0.996 ± 0.004). With the incorporation of the molecular characterizations or their combinations, D-MPNN performance fluctuated by about 2% (AUC-ROC from 0.935 to 0.956). D-MPNN coupled with the moe206 descriptors (D-MPNN + moe206) provided the best performance, with AUC-ROC reaching 0.956 ± 0.005, better than the original D-MPNN (AUC-ROC 0.947 ± 0.005). The combination of D-MPNN with the FCFP4 fingerprint provided the worst results.
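The five-fold procedure described above can be sketched as follows. The fold construction is generic; the AUC value appended inside the loop is a placeholder so the sketch runs (the hypothetical `train_dmpnn`/`auc_roc` calls are shown only as comments).

```python
# Five-fold cross-validation indices under a random split (pure-Python sketch).
import random

def kfold_indices(n, k=5, seed=0):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

aucs = []
for train, test in kfold_indices(7889, k=5):   # 7889 compounds in Cai's dataset
    # model = train_dmpnn(train); aucs.append(auc_roc(model, test))  # hypothetical
    aucs.append(len(test) / 7889)              # placeholder so the sketch runs
mean_auc = sum(aucs) / len(aucs)
```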
Fig. 2

(A) Performances of the D-MPNN with the single molecular characterization and the combinations of molecular characterizations (control, ECFP4, ECFP6, FCFP4, MACCS, PubchemFP, RDkit 2D normalized, MOE53, moe206, mol2vec, MOE53 + mol2vec, MOE53 + RDkit 2D normalized) on the validation set under random split. (B) Performances on the validation set under scaffold balanced split.

Similarly, all these models with molecular characterizations were tested using five-fold cross-validation under the scaffold split mode (Fig. 2B). In this setting, D-MPNN with MOE53, moe206, and RDKit 2D normalized obtained slightly higher performance than the others, with AUC-ROC ranging from 0.922 to 0.925. Since the AUC-ROC under random split fluctuates less than under scaffold split, the five-fold cross-validation random split mode was chosen for further investigation, and moe206 was considered the best descriptor set to incorporate into D-MPNN.
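A scaffold split differs from a random split in that whole scaffold groups are assigned to either train or test, so no scaffold appears in both. The sketch below assumes the scaffold keys are precomputed (in practice via RDKit Murcko scaffolds); the SMILES strings and the greedy largest-first assignment are illustrative choices, not the exact Chemprop procedure.

```python
# Scaffold-split sketch: whole scaffold groups go to train or test, never both.
# Scaffold keys are assumed precomputed; the strings below are toy examples.
from collections import defaultdict

def scaffold_split(scaffolds, test_frac=0.2):
    groups = defaultdict(list)
    for i, s in enumerate(scaffolds):
        groups[s].append(i)
    # Greedy: visit largest scaffold groups first, fill test up to its quota.
    ordered = sorted(groups.values(), key=len, reverse=True)
    n_test = int(test_frac * len(scaffolds))
    test, train = [], []
    for g in ordered:
        (test if len(test) + len(g) <= n_test else train).extend(g)
    return train, test

scaffs = ["c1ccccc1"] * 4 + ["C1CCNCC1"] * 3 + ["c1ccncc1"] * 2 + ["C1CC1"]
train, test = scaffold_split(scaffs, test_frac=0.3)
```

Because test-set scaffolds are unseen during training, scaffold-split scores are typically lower and less stable than random-split scores, matching the fluctuation noted above.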

Performance comparison with other models

To compare the performance of D-MPNN + moe206 with other machine learning methods reported in the literature, we applied the D-MPNN + moe206 model to the same datasets used by those methods, including Hou et al.,[17,39] Zhang et al.,[40] and Sun et al.[41] To make a fair comparison, the datasets (training, validation, and test sets) were kept identical to the reference methods in each comparison. AUC-ROC, SE, SP, NPV, PPV, ACC, BA, and MCC were selected as performance indexes. To our delight, D-MPNN + moe206 showed generally better or comparable performance relative to all these models (Table 1), and it was significantly better than the original D-MPNN in all six entries.

Comparison of the performance of the D-MPNN + moe206 model with other best models

Entry | Model | Training set | Test set | AUC-ROC | SE | SP | ACC
1a | SVM + FCFP | D3 training | D3 test | 0.93 | 0.81 | 0.89 | 0.86
   | D-MPNN + moe206 | | | 0.958 ± 0.005 | 0.900 ± 0.019 | 0.913 ± 0.016 | 0.907 ± 0.010
   | D-MPNN | | | 0.955 ± 0.005 | 0.881 ± 0.032 | 0.907 ± 0.027 | 0.896 ± 0.002
2b | Consensus model | Training | Validation | NAc | 0.74 | 0.86 | NAc
   | D-MPNN + moe206 | | | 0.864 ± 0.021 | 0.808 ± 0.077 | 0.798 ± 0.039 | 0.798 ± 0.033
   | D-MPNN | | | 0.819 ± 0.012 | 0.638 ± 0.065 | 0.844 ± 0.037 | 0.831 ± 0.031
3b | Consensus model | Training | FDA-1 | 0.79 | 0.71 | 0.78 | NAc
   | D-MPNN + moe206 | | | 0.882 ± 0.013 | 0.613 ± 0.110 | 0.856 ± 0.023 | 0.835 ± 0.018
   | D-MPNN | | | 0.813 ± 0.032 | 0.413 ± 0.099 | 0.884 ± 0.019 | 0.844 ± 0.013
4d | SVM | Training I | Test I | 0.842 | 0.907 | 0.652 | 0.821
   | D-MPNN + moe206 | | | 0.871 ± 0.010 | 0.916 ± 0.014 | 0.667 ± 0.049 | 0.832 ± 0.010
   | D-MPNN | | | 0.776 ± 0.026 | 0.907 ± 0.031 | 0.539 ± 0.065 | 0.783 ± 0.021
5e | SVM | Training II | Test II | 0.839 | 0.850 | 0.745 | 0.821
   | D-MPNN + moe206 | | | 0.876 ± 0.015 | 0.890 ± 0.025 | 0.676 ± 0.039 | 0.830 ± 0.010
   | D-MPNN | | | 0.806 ± 0.010 | 0.909 ± 0.037 | 0.553 ± 0.040 | 0.808 ± 0.018
6f | SVM + 72 descriptors + ECFP4 | Training | Test | 0.962 | 0.670 | 0.995 | 0.984
   | D-MPNN + moe206 | | | 0.968 ± 0.001 | 0.656 ± 0.033 | 0.994 ± 0.001 | 0.983 ± 0.001
   | D-MPNN | | | 0.954 ± 0.001 | 0.627 ± 0.038 | 0.992 ± 0.001 | 0.979 ± 0.000

a Doddareddy's dataset.
b Siramshetty's dataset.
c Not available; this value could not be found in the original literature.
d Hou's training set I and test set I.
e Hou's training set II and test set II.
f Ogura's training and test dataset. For each D-MPNN model, the average over the different folds (N = 5) and the corresponding standard deviation are listed.

Table 1 entry 1 used the dataset from Doddareddy et al.,[11] whose representative model performed best on Dataset 3 (D3), comprising a training set (blockers: 1004, non-blockers: 1385) and a test set (blockers: 108, non-blockers: 147). Blockers are defined by IC50 values <10 μM and non-blockers by IC50 values >30 μM. The optimal model of Doddareddy et al. is an SVM with functional class fingerprints (FCFP6), which gave an AUC-ROC of 0.93 and an ACC of 0.86 under five-fold cross-validation. Our D-MPNN + moe206 model showed improved prediction accuracy on exactly the same training and test datasets, achieving an AUC-ROC of 0.958 ± 0.005 and an ACC of 0.907 ± 0.010. The original D-MPNN model gave a slightly poorer performance, with an AUC-ROC of 0.955 ± 0.005 and an ACC of 0.896 ± 0.002, confirming that incorporating the moe206 descriptors is important for boosting the prediction power of D-MPNN. Table 1 entry 2 lists a result from a recent article by Siramshetty et al.[24] They combined the best models of RF + RDKit, XGBoost + RDKit, DNN + MorganFP, and long short-term memory (LSTM) + ATN-SMILES, each with its own merits, into a consensus model, providing superior performance.
Siramshetty's dataset consists of a training set (blockers: 2164, non-blockers: 5990), a prospective validation set (threshold = 30 μM; blockers: 53, non-blockers: 786), and FDA-approved drugs (threshold = 1 μM: blockers: 15, non-blockers: 162; threshold = 10 μM: blockers: 46, non-blockers: 131). The training set comes from public-domain hERG bioactivity data (ChEMBL), using binary thresholds of 1 and 10 μM, and a thallium flux assay (NCATS), using a threshold of 30 μM. We ran the D-MPNN + moe206 model on the same training dataset as Siramshetty's consensus model and then evaluated performance on Siramshetty's validation dataset and FDA-1 dataset. As shown in Table 1 entry 2, on Siramshetty's validation dataset D-MPNN + moe206 performs better in SE (0.808 ± 0.077 versus 0.77), while Siramshetty's consensus model has better SP (0.798 ± 0.039 versus 0.86). On the FDA-1 dataset (Table 1 entry 3), D-MPNN + moe206 performs much better in AUC-ROC (0.882 ± 0.013 versus 0.79) and SP (0.856 ± 0.023 versus 0.78), but not as well as the consensus model in SE (0.613 ± 0.110 versus 0.71). The models of entries 4 and 5 in Table 1 are based on Hou's dataset,[17] which contains training set I (blockers: 352, non-blockers: 40), training set II (blockers: 352, non-blockers: 40), test set I (blockers: 175, non-blockers: 20), and test set II (blockers: 175, non-blockers: 20); blockers and non-blockers are distinguished by a threshold of 40 μM. According to Hou's report,[17] their optimal SVM model reached the highest AUC-ROC of 0.842 and 0.839 on test sets I and II, respectively. Our D-MPNN + moe206 model showed superior performance, giving an AUC-ROC of 0.871 ± 0.010 and an ACC of 0.832 ± 0.010 for test set I, and an AUC-ROC of 0.876 ± 0.015 and an ACC of 0.830 ± 0.010 for test set II. Entry 6 of Table 1 used Ogura's dataset,[19] the largest hERG database so far.
It contains a training set (blockers: 6923, non-blockers: 196 918) and a test set (blockers: 2966, non-blockers: 84 395); compounds with IC50 values ≤10 μM are considered hERG blockers and those with IC50 values >10 μM non-blockers. Ogura et al. constructed a prediction model using the non-dominated sorting genetic algorithm II (NSGA-II) to maximize prediction performance with 72 selected descriptors and the ECFP4 structural fingerprint, reaching an ACC of 0.984. D-MPNN + moe206 achieves comparable performance, with an AUC-ROC of 0.968 ± 0.001 and an ACC of 0.983 ± 0.001 on the same training and test datasets as Ogura's model. Very recently, Karim et al. proposed a robust predictor based on deep learning meta-feature ensembles, called CardioTox net, which showed excellent performance on various classification indexes for three external test sets.[25] We compared D-MPNN + moe206 with CardioTox on Karim's[25] datasets in Table 2, which comprise a training set (blockers: 6643, non-blockers: 5977), test set I (blockers: 30, non-blockers: 14), test set II (blockers: 11, non-blockers: 30), and test set III (blockers: 30, non-blockers: 710). Blockers are defined by IC50 values <10 μM and non-blockers by IC50 values ≥10 μM. As shown in Table 2, D-MPNN + moe206 is comparable to CardioTox on all three test sets. Although D-MPNN + moe206 is slightly inferior to CardioTox on test sets I and III in MCC (0.567 ± 0.061 versus 0.599 and 0.214 ± 0.016 versus 0.220) and ACC (0.800 ± 0.030 versus 0.810 and 0.696 ± 0.030 versus 0.746), it showed improved performance for all metrics except ACC on test set II, reaching an MCC of 0.470 ± 0.053, NPV of 0.950 ± 0.032, PPV of 0.467 ± 0.020, SP of 0.620 ± 0.030, SE of 0.909 ± 0.064, and B-ACC of 0.764 ± 0.030.

Comparison of the performance of the D-MPNN + moe206 model with Karim's best model

Entry | Model | Evaluation data | AUC-ROC | MCC | NPV | ACC | PPV | SP | SE | B-ACC
1a | CardioTox | Test set I | NAb | 0.599 | 0.688 | 0.810 | 0.893 | 0.786 | 0.833 | 0.810
   | D-MPNN + moe206 | | 0.849 ± 0.042 | 0.567 ± 0.061 | 0.656 ± 0.044 | 0.800 ± 0.030 | 0.890 ± 0.023 | 0.786 ± 0.051 | 0.807 ± 0.037 | 0.796 ± 0.031
2a | CardioTox | Test set II | NAb | 0.452 | 0.947 | 0.755 | 0.455 | 0.600 | 0.909 | 0.754
   | D-MPNN + moe206 | | 0.810 ± 0.055 | 0.470 ± 0.053 | 0.950 ± 0.032 | 0.698 ± 0.022 | 0.467 ± 0.020 | 0.620 ± 0.030 | 0.909 ± 0.064 | 0.764 ± 0.030
3a | CardioTox | Test set III | NAb | 0.220 | 0.986 | 0.746 | 0.113 | 0.698 | 0.794 | 0.746
   | D-MPNN + moe206 | | 0.830 ± 0.010 | 0.214 ± 0.016 | 0.986 ± 0.037 | 0.696 ± 0.030 | 0.110 ± 0.006 | 0.692 ± 0.033 | 0.788 ± 0.064 | 0.740 ± 0.020

a Karim's dataset.
b Not available. For each D-MPNN model, the average over the different folds (N = 5) and the corresponding standard deviation are listed.

According to these comparative analyses, the D-MPNN + moe206 model showed excellent AUC-ROC, accuracy, MCC, sensitivity, and NPV across the different datasets examined here, providing one of the best classification tools for predicting the potential hERG-induced cardiotoxicity of compounds.

Feature importance with Shapley additive explanations

Why can the moe206 descriptors boost the performance of D-MPNN in predicting hERG blockers? To answer this question, we analyzed the relationship between the moe206 descriptors and the predictive probability of the model using Shapley additive explanations (SHAP) (https://github.com/slundberg/shap),[42] which provide rich visualization of individual features. Lundberg et al.[42] proposed SHAP as a unified measure of feature importance and incorporated their algorithm into extreme gradient boosting (XGBoost)[43] and the Light Gradient Boosting Machine (LightGBM) in subsequent studies.[44] We first evaluated the performance of XGBoost with the moe206 descriptors and with the ECFP4 fingerprint on Cai's hERG dataset. The AUC-ROC of XGBoost with moe206 is only slightly lower than that of D-MPNN + moe206 (0.950 ± 0.006 versus 0.956 ± 0.005) and significantly better than that of XGBoost with ECFP4 (0.950 ± 0.006 versus 0.933 ± 0.001) (Fig. 3A). This further confirms the importance of the moe206 descriptors.
Fig. 3

(A) Comparative prediction performance of XGBoost with D-MPNN under random split through five-fold cross-validation on Cai's dataset. (B) Relative importance and the SHAP values of the 20 highest ranked molecular descriptors of XGBoost with moe206. Each molecule represents a point to form a descriptor line. A molecule with a high (red) SHAP value will increase the probability of being predicted by the model as a blocker.

Next, we performed SHAP analysis on the moe206 descriptors. As shown in Fig. 3B, h_pavgQ, logP(o/w), SlogP, vsa_pol, h_logD, and h_pkB are the six most important descriptors of the model. The higher the h_pavgQ, logP(o/w), SlogP, and h_logD values, and the lower the vsa_pol and h_pkB values (the list of descriptors is shown in Table S2†), the more likely a compound is to be predicted as a blocker. According to Honma's research,[45] positively charged atoms are the main distinguishing factor: approximately 80% of hERG non-blockers contain no positively charged atom, while more than half of the blockers contain at least one. The ranking of h_pavgQ at the top of the list not only reflects this feature but also demonstrates the soundness of the model's feature selection. Similarly, logP(o/w), SlogP, and h_logD (ranked second, third, and fifth) relate to the log octanol/water partition coefficient, reflecting that lipophilic fragments or aromatic heterocycles in blockers form π-stacking interactions with Phe-656 and Tyr-652,[46] and vsa_pol is also related to hydrophobicity to some extent. Regarding h_pkB, many hERG blockers have an aromatic ring at one end of the ligand and a basic amine that protonates readily at physiological pH, forming a cation–π interaction with Tyr-652.[47] In short, the top-ranked descriptors are consistent with the characteristics reported in previous studies: most hERG blockers contain a positively charged atom and are more lipophilic and more basic.[2,48]
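The idea behind SHAP can be made concrete on a tiny example. The sketch below computes exact Shapley values for a hypothetical three-descriptor additive model (the model `f` and its contributions are invented for illustration; they are not the paper's XGBoost model, which SHAP handles via its tree-specific algorithm). For an additive model, each feature's Shapley value equals its own contribution, and the values sum to the gap between the full prediction and the baseline.

```python
# Exact Shapley values for a toy 3-feature additive model (pure-Python
# illustration of the idea behind SHAP; the model f is hypothetical).
from itertools import combinations
from math import factorial

FEATURES = ["h_pavgQ", "logP(o/w)", "vsa_pol"]

def f(present):
    # Hypothetical model: baseline 0.5; positive charge and logP raise the
    # blocker probability, polar surface area lowers it.
    contrib = {"h_pavgQ": 0.2, "logP(o/w)": 0.15, "vsa_pol": -0.1}
    return 0.5 + sum(contrib[x] for x in present)

def shapley(feature):
    # Average the feature's marginal contribution over all coalitions.
    others = [x for x in FEATURES if x != feature]
    n, total = len(FEATURES), 0.0
    for r in range(len(others) + 1):
        for S in combinations(others, r):
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            total += weight * (f(set(S) | {feature}) - f(set(S)))
    return total

phi = {x: shapley(x) for x in FEATURES}
```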

Discussion

In this study, we established a series of hERG blocker classifiers based on D-MPNN. Although our models achieved relatively better predictive performance than previous studies, some limitations remain. The main limitation lies in the collection and selection of datasets:[35,49] even larger, unbiased, and high-quality datasets are required to train the D-MPNN + moe206 model. In addition, compared with D-MPNN without any descriptors, D-MPNN with certain descriptors performs even worse; some descriptor values may remain constant across compounds, and some descriptors may be meaningless for reflecting molecular characteristics. This is likely because some descriptors cannot significantly separate blockers from non-blockers, hampering the deep learning model and calling for manual intervention in descriptor selection. Such selection reduces model complexity, improves generalization ability, and saves computational resources and time.[50]

Conclusions

Cardiotoxicity is one of the common side effects of drugs in clinical practice. Since traditional detection methods are time-consuming and laborious, a variety of in silico models for predicting hERG cardiotoxicity have emerged in recent years. In this study, we used D-MPNN to build models on different datasets and screened different descriptors to incorporate into D-MPNN to further improve performance. D-MPNN with the moe206 descriptors was found to perform better than with other descriptors; its AUC-ROC reached 0.922 ± 0.015 under scaffold split and 0.956 ± 0.005 under random split on Cai's hERG dataset. In addition, we made a comprehensive comparison with five other recently reported models. All results showed that the D-MPNN + moe206 model is among the best classification models for evaluating hERG blockers. In conclusion, this study provides a new approach for predicting compounds with potential hERG-induced cardiotoxicity.

Data availability

All datasets and the molecules with binding data used in this paper are provided in the All_datasets.zip file. The datasets and the scripts for reproducing these results can be found at https://github.com/AI-amateur/DMPNN-hERG.

Author's contributions

M. S. and C. J. were the key contributors in drafting the manuscript and in designing and conducting the computations. J. C. took part in conceiving and directing the project. L.-P. Q., J.-J. Q. and G. C. analysed the results, and reviewed and finalized the manuscript.

Conflicts of interest

There are no conflicts to declare.