
Support Vector Machine Classification and Regression Prioritize Different Structural Features for Binary Compound Activity and Potency Value Prediction.

Raquel Rodríguez-Pérez¹, Martin Vogt¹, Jürgen Bajorath¹.

Abstract

In computational chemistry and chemoinformatics, the support vector machine (SVM) algorithm is among the most widely used machine learning methods for the identification of new active compounds. In addition, support vector regression (SVR) has become a preferred approach for modeling nonlinear structure-activity relationships and predicting compound potency values. For the closely related SVM and SVR methods, fingerprints (i.e., bit string or feature set representations of chemical structure and properties) are generally preferred descriptors. Herein, we have compared SVM and SVR calculations for the same compound data sets to evaluate which features are responsible for predictions. On the basis of systematic feature weight analysis, rather surprising results were obtained. Fingerprint features were frequently identified that contributed differently to the corresponding SVM and SVR models. The overlap between feature sets determining the predictive performance of SVM and SVR was very small. Furthermore, features were identified that had opposite effects on SVM and SVR predictions. Feature weight analysis in combination with feature mapping also made it possible to interpret individual predictions, thus offsetting the black box character of SVM/SVR modeling.


Year: 2017 PMID: 30023518 PMCID: PMC6045367 DOI: 10.1021/acsomega.7b01079

Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343

Introduction

Supervised machine learning is a preferred approach for the prediction of compound properties including biological activity.[1,2] Among machine learning approaches, support vector machines (SVM) have become increasingly popular.[3−5] The SVM methodology was originally conceived for binary class label prediction of objects[6−8] on the basis of training data. In a given feature space, SVM learning aims to construct a hyperplane to best separate training data with different class labels.[7,8] The hyperplane is derived on the basis of a limited number of training instances, so-called support vectors, to maximize a margin on each side of the plane. If the data are not separable by a hyperplane, the data can be projected into feature spaces of higher dimensionality where linear separation of positive and negative examples might be possible.[7,8] For a given feature space, a successfully derived hyperplane represents a classification model that can then be used to predict the class label of test objects in this space, depending on which side of the hyperplane (i.e., the positive or negative) they fall. In chemoinformatics, binary class label prediction is used for compound classification, for example, to distinguish active from inactive compounds.[3,4] In addition to class label prediction, SVM models can also be used for compound database ranking by calculating their distance from the “active” or “inactive side” of the hyperplane.[9] Support vector regression (SVR), an extension of the SVM algorithm, has been introduced for predicting numerical property values[10,11] such as compound potency. In SVR, instead of generating a hyperplane for class label prediction, a different function is derived on the basis of training data to predict numerical values. 
In analogy to SVM, SVR also projects training data with nonlinear structure–activity relationships (SARs) in a given feature space into higher-dimensional space representations where a linear regression function may be derived. In this case, compounds with different potency values are used to fit a regression model that can then be used to predict the potency of new candidate compounds. SVR typically produces statistically accurate regression models when predictions over all potency ranges are analyzed.[5,12] However, SVR also displays the tendency to underpredict highly potent compounds in data sets and hence eliminates activity cliffs from their activity landscape.[12] In SVM and SVR, mapping into higher-dimensional feature spaces, which is a signature of these algorithms, is accomplished through the use of kernel functions, the so-called “kernel trick”.[13] When using nonlinear kernel functions, SVM and SVR can resolve nonlinear SARs in original feature spaces through dimensionality extension. This makes SVR especially attractive for potency prediction because it is not confined to the applicability domain of conventional quantitative SAR analysis methods.[14] On the other hand, both SVM and SVR modeling have a black box character, meaning that the predictions cannot be directly interpreted in chemical terms. Hence, it is generally difficult to rationalize model performance. Only a few attempts have thus far been made to aid in SVM model interpretation in high-dimensional kernel spaces. For example, support vectors with the largest contributions to SVM models have been visualized.[15] In addition, descriptor features have been organized in polar coordinate systems according to their contributions to SVM predictions.[16] To increase model interpretability and reduce the black box character of SVM and SVR, we aimed to identify descriptor features that determine model performance on individual compound data sets. 
Given the close methodological relationship between SVM and SVR, relevant features of classification and regression models were also compared. Intuitively, one might expect that SVM and SVR would prioritize similar features for a given compound data set because most informative chemical features for predicting whether a compound is active or not might also be relevant for predicting the magnitude of activity. For this purpose, feature weighting and mapping techniques were systematically applied. Feature mapping helped to rationalize the performance of SVM and SVR models.

Results and Discussion

Global Performance of SVM and SVR Models

A prerequisite for feature weight analysis is the assessment of the prediction accuracy of SVM and SVR models. This is the case because the evaluation of features that contribute to predictions is only meaningful if the underlying models reach a reasonably high level of performance. Figure 1 summarizes the performance of our SVM and SVR models on the 15 activity classes using different figures of merit appropriate for assessing classification and regression calculations. Results are presented for two molecular representations, the MACCS fingerprint and the extended connectivity fingerprint with bond diameter 4 (ECFP4). Figure 1a shows that the median F1 scores and the area under the ROC curve (AUC) values of the SVM models were clearly above 0.95 for both MACCS and ECFP4 fingerprints, reflecting accurate classification of active and inactive compounds. Furthermore, recall rates of the active compounds reached a median value of 0.77 for MACCS and 0.94 for ECFP4 among the top 1% of the ranked compounds. These results also reflected the usually observed higher performance of ECFP4 relative to MACCS.

Figure 1

Global performance. Box plots report the prediction accuracy of (a) SVM and (b) SVR calculations over all activity classes and 10 independent trials per class. For SVM calculations, the F1 score, AUC, and recall of active compounds among the top 1% of the ranked test set are reported. For SVR calculations, the MAE and MSE values and the Pearson correlation coefficient (r) for the observed and predicted potency values are given.

Figure 1b reports the performance of the SVR models across the different activity classes. The median values of the mean absolute error (MAE) and mean squared error (MSE) were between 0.5 and 0.6, and the median values of the Pearson correlation coefficient (r) between the predicted and observed pKi values were above 0.7 for MACCS and above 0.8 for ECFP4. In addition, errors of potency predictions were consistently limited to less than 1 order of magnitude. Thus, the SVR models also exhibited an overall reasonable performance.

Feature Relevance

A second condition for informative feature weight analysis is demonstrating the relevance of individual fingerprint features. Therefore, features were removed from SVM and SVR models either randomly or in the order of decreasing feature weights, and the calculations were repeated. Figure 2 shows the results for exemplary activity classes and the MACCS (Figure 2a) and ECFP4 (Figure 2b) fingerprints. For MACCS containing 166 features, both random and weight-based feature removal decreased compound recall and increased MSE values. The magnitude of errors was greater for weight-based feature removal than for random feature removal. For ECFP4 comprising much larger numbers of possible features, random feature removal affected the calculations only marginally, if at all, whereas removal of highly weighted features led to a substantial reduction in compound recall and a gradual increase in MSE values. Thus, as anticipated, removal of features obtaining high weights during model building consistently reduced the model performance.

Figure 2

Effects of feature removal. For SVM and SVR, the effects of iterative fingerprint feature removal on recall of active compounds and MSE are reported for three exemplary activity classes (with TID values according to Table 1) and the (a) MACCS and (b) ECFP4 fingerprints. Features were randomly removed (dashed lines) or in the order of decreasing feature weights (solid lines).

Table 1

Compound Data Sets

TID	accession no.	target name	CPDs	median pK_i	IQR pK_i
11	P00734	thrombin	839	6.33	1.86
51	P08908	serotonin 1A (5-HT1A) receptor	1904	7.62	1.50
72	P14416	dopamine D2 receptor	2876	7.00	1.29
100	P23975	norepinephrine transporter	1099	6.82	1.60
129	P35372	mu-opioid receptor	2026	7.26	1.95
136	P41143	delta-opioid receptor	1547	7.11	1.97
137	P41145	kappa-opioid receptor	1930	7.28	2.07
138	P41146	nociceptin receptor	844	7.85	1.43
165	Q12809	HERG Homo sapiens	956	5.93	1.05
194	P00742	coagulation factor X	1476	8.05	2.80
278	P29275	adenosine A2b receptor	1187	7.23	1.43
10280	Q9Y5N1	histamine H3 receptor	2434	8.00	1.43
11362	P42336	PI3-kinase p110-α subunit	885	7.68	1.39
12968	O43614	orexin receptor 2	1040	6.70	1.57
20174	Q9Y5Y4	G protein-coupled receptor 44	833	7.65	1.90

Composition of 15 compound activity classes is reported that were selected for SVM and SVR modeling. For each class, the ChEMBL target ID (TID), accession number, target name, and number of compounds (CPDs) are given. In addition, median and interquartile range (IQR) pKi values are reported, which were calculated from the pKi distribution of each activity class.

Global Feature Weight Analysis

For SVM and SVR models, weights of fingerprint features were systematically determined over 10 independent trials and compared. In some instances, feature weights were consistently high or low over different trials, as further detailed below; in others, they varied depending on the training data. In addition, feature weights generally varied for different activity classes, as expected. Furthermore, it was observed that some individual features were equally important for SVM and SVR for a given class, consistent with their shared methodological framework. However, a striking finding was that the importance of many features for classification and regression fundamentally differed. Figures 3 and 4 show representative examples for different activity classes and MACCS and ECFP4, respectively. Feature weights were assigned to three different categories (i.e., high, medium, and low), as detailed in the Materials and Methods section. Figures 3a and 4a show examples of MACCS and ECFP4 features, respectively, which had very different weights in SVM and SVR models, including features with consistently (or mostly) low weights in classification and high weights in regression models and vice versa. Thus, many features were only relevant for either classification or regression. On average, 7 MACCS and 18 ECFP4 features were identified per activity class that had a high weight in at least 5 of the 10 SVM trials and a low weight in at least 5 SVR trials or vice versa. Among these, there were no MACCS features and on average one ECFP4 feature that exclusively had high/low weights in all SVM/SVR trials or vice versa. One possible explanation for such differences in feature relevance might be the composition of support vectors in SVM and SVR. Although SVM and SVR share a closely related methodological framework, support vectors for SVM and SVR are determined in different ways. 
To derive support vectors for regression, only active compounds are considered, whereas classification models are trained with active and inactive compounds, which also contribute to support vectors. Given these intrinsic differences, SVM and SVR models may prioritize different chemical descriptors for support vector compounds during the training stage.

Figure 3

Distribution of MACCS feature weights and feature mapping. For an exemplary activity class (thrombin inhibitors, TID 11), (a) reports the distribution of weights of the selected features for SVM (classification, blue color) and SVR (regression, red color) over 10 trials. The color gradient represents the magnitude of feature weights (low, medium, or high). In (b), features that were highly weighted in SVM (blue color) and SVR (red color) are mapped on the same correctly predicted compound. In feature labels, “A” stands for any atom.

Figure 4

Distribution of ECFP4 feature weights and feature mapping. For an exemplary activity class (serotonin 1A (5-HT1A) receptor agonists, TID 51), (a) reports the distribution of weights of selected features for SVM (classification, blue color) and SVR (regression, red color) calculations over 10 trials. The color gradient represents the magnitude of feature weights (low, medium, or high). In (b), features that were highly weighted in SVM (blue color) and SVR (red color) are mapped on the same correctly predicted compound.

In Figures 3b and 4b, exemplary MACCS and ECFP4 features are mapped onto the structures of compounds that were correctly predicted. In Figure 3b, MACCS features that were highly weighted in classification (blue color) or regression (red color) were mapped onto the same molecule, a thrombin inhibitor, illustrating that features critical for SVM or SVR are often mapped to different parts of the same substructure. In Figure 4b, ECFP4 features critical for classification (blue color) or regression (red color) are mapped to a serotonin 1A (5-HT1A) receptor agonist, showing that features important for classification (feature 638) or regression (201) are mapped to distant parts of this compound. In principle, features relevant for SVM and SVR might be activity class-specific or shared by different classes. 
To identify features common to different classes, MACCS and ECFP4 features were determined that had a high weight in at least 5 of the 10 SVM or SVR trials per class. For SVM, on average, 9 such MACCS and 15 such ECFP4 features were identified per activity class; for SVR, 14 MACCS and 35 ECFP4 features were identified. For SVM, a total of 38 MACCS and 47 ECFP4 highly weighted features were shared by two activity classes. For SVR, 56 MACCS and 116 ECFP4 features were shared by two classes. However, for SVM (SVR), only five (seven) MACCS and nine (three) ECFP4 features with at least five high weights were common to five or more activity classes. Thus, most features determining SVM and SVR predictions were weighted in a compound class-specific manner. Furthermore, we also determined the number of features that were consistently highly weighted in all trials per activity class. For SVM, on average, only two such MACCS and five such ECFP4 features were identified; for SVR, two MACCS and four ECFP4 features were identified. Thus, weights of most features with strong contributions to SVM and SVR predictions displayed some variations in different activity classes depending on the training sets.
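The counting scheme described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names, the input layout (one dictionary of weight categories per trial), and the reading of "shared by two classes" as "present in at least `min_classes` classes" are assumptions made here.

```python
from collections import Counter

def consistently_high(trial_categories, min_trials=5):
    """Features labeled 'high' in at least `min_trials` trials.

    trial_categories: list with one dict per trial, mapping a feature
    identifier to 'high', 'medium', or 'low' (hypothetical layout).
    """
    counts = Counter()
    for categories in trial_categories:
        counts.update(f for f, c in categories.items() if c == "high")
    return {f for f, n in counts.items() if n >= min_trials}

def shared_across_classes(high_by_class, min_classes=2):
    """Features that are highly weighted in at least `min_classes` classes."""
    counts = Counter(f for feats in high_by_class.values() for f in set(feats))
    return {f for f, n in counts.items() if n >= min_classes}

# Toy data: f1 is 'high' in 6 of 10 trials, f2 in only 4.
trials = [{"f1": "high", "f2": "low"}] * 6 + [{"f1": "low", "f2": "high"}] * 4
high_features = consistently_high(trials)
shared = shared_across_classes({"11": {"f1", "f2"}, "51": {"f1"}, "72": {"f3"}})
```

Setting `min_trials=10` gives the stricter criterion of features that are highly weighted in all trials.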

Features with Different Signs

So far, only absolute feature weights were analyzed, which revealed many features that contributed differently to SVM and SVR. However, in SVM and SVR, feature weights may carry a positive or negative sign depending on how they influence the predictions. Features with a positive weight contribute to the prediction of active compounds in SVM and high potency values in SVR, whereas features with a negative weight contribute to the prediction of inactive compounds in classification and low potency values in regression. Thus, taking these signs into account further refines the view of differential feature contributions to SVM and SVR. Therefore, we also searched for features with high weights and different signs. Such features have opposite effects in SVM and SVR. Only a few features were identified that had high weights in corresponding SVM and SVR trials but consistently different signs. Exemplary features with opposite effects in SVM and SVR are shown in Figure 5. For example, three MACCS features in Figure 5a contributed to the prediction of active compounds but low potency values (dark green/light orange bars) and two to the prediction of inactive compounds but high potency values of active compounds (light green/dark orange bars). In Figure 5b, four ECFP4 features are shown that contributed to the prediction of active compounds and low potency values and one that contributed to the prediction of inactive compounds and high potency values. Among features with high weights in both SVM and SVR, as discussed above, sign inversion and opposite effects in SVM and SVR were exceptions.

Figure 5

Highly weighted features with different signs. For selected activity classes and (a) MACCS and (b) ECFP4 features (TID/feature), the number of trials is reported in which the features had high weights but different signs (+, −) in SVM and SVR. Features with positive weights contribute to the correct prediction of active compounds (dark green color) or high potency values (light green color), whereas features with negative weights contribute to the prediction of inactive compounds (dark orange bars) or low potency values (light orange bars). Bars are labeled with MACCS features (A, any atom and Q, heteroatom) or mapped ECFP4 atom environments (pink color).

Mapping of Highly Weighted Features

In Figure 6, highly weighted ECFP4 features are mapped on compounds from different activity classes that were correctly predicted using SVM and SVR. Atom environments were chosen for exemplary mapping because, by definition, they have a greater tendency to overlap than discrete MACCS features. For an exemplary trial, features that had a high weight in the SVM and/or SVR model were mapped to the compounds shown. Figure 6a illustrates that only partly overlapping yet distinct atom environments led to the correct classification and potency value prediction of each compound.

Figure 6

Mapping of highly weighted features. ECFP4 atom environments with high weights in classification and regression are mapped onto compounds that were correctly classified and whose potency was predicted within 0.2 pKi units. (a) shows individual compounds from three activity classes; (b,c) show pairs of analogues from two activity classes. Each compound is shown twice (side-by-side). On the left and right, features from classification (blue color) and regression (red color) are mapped, respectively. Single carbon atoms are displayed if they are a part of a mapped atom environment. In (b,c), substructures of analogues with feature differences are highlighted in gray color.

The two thrombin inhibitors in Figure 6b are close structural analogues that are only distinguished by a heteroatom replacement in a ring and a fluorine substituent. As anticipated for highly similar compounds, these inhibitors shared a number of features that were highly weighted in classification and regression models. However, two features highly weighted for regression but not classification were mapped to the ring substructure distinguishing these compounds. Clearly, in contrast to the SVM model that assigned the same highly weighted features to both inhibitors, in accordance with their common activity, the SVR model accounted for the structural difference between these compounds. Hence, feature mapping also indicated that the fluorine substitution might be responsible for the higher potency of the inhibitor at the bottom, given its positive weight. The two mu-opioid receptor ligands in Figure 6c are also analogues but are distinguished from each other by multiple substitutions at the upper and lower ring. In this case, few highly weighted features were present, only one of which was shared by the classification and regression models, covering the methyl substituent at the upper phenyl ring. Other highly weighted features in the models were distinct and mapped to different substructures. 
In the SVR model, a highly weighted feature with negative contribution matched a part of the upper phenyl ring including the methoxy substituent of the compound at the top, indicating that this substructure (but not the lower ring) was important for potency variation among analogues. Taken together, these examples illustrate that comparative mapping of features highly weighted in SVM and SVR helps to rationalize predictions made by classification and regression models and may reveal SAR information.

Conclusions

In this work, we have investigated and compared the relevance of different fingerprint features for the corresponding SVM and SVR models. The MACCS and ECFP4 fingerprints used herein capture the structural features of compounds in different ways. To this end, feature weight analysis was carried out for well-performing classification and regression models over different compound classes. Because SVM and SVR share a common methodological framework, one might hypothesize that there should be considerable overlap between structural features that determine binary activity and potency value predictions. By contrast, systematic feature weight analysis revealed that features with high weights in SVM and SVR predominantly differed, a rather unexpected finding. In many instances, individual features contributed very differently to classification and regression, although features with strongly opposing effects were rare, as revealed by the analysis of positive and negative weights. SVM and SVR predictions are usually determined by feature combinations rather than individual features with high weights. Thus, features with medium weights also make contributions to predictions, albeit at a lesser magnitude than the most important ones. Therefore, as also demonstrated herein, mapping of highly weighted features is usually sufficient to identify molecular regions that are important for the activity-based classification and structural differences between compounds that are responsible for potency variation. Accordingly, mapping and comparing features that are highly weighted in SVM and SVR models helps to better understand how individual features influence or determine predictions and thus alleviates the often-cited black box character of SVM, SVR, and other machine learning approaches, which hinders model interpretation. 
Moreover, mapping of features that are highly weighted in SVR models onto compounds with correctly predicted potency values also points at SAR-informative regions in active compounds.

Materials and Methods

Compound Data Sets

Different sets of compounds with activity against human targets were extracted from ChEMBL version 22.[17] Only compounds with numerically specified equilibrium constants (Ki values) for single human proteins with the highest assay confidence score were selected. If multiple Ki values for a compound and a target were available, they were averaged provided all values fell within the same order of magnitude; otherwise, the compound was discarded. Furthermore, compounds with a pKi value below 5 were not selected to exclude borderline active compounds from modeling. In addition, this pKi threshold also limited the range of potency values for SVR model building. Table 1 summarizes the 15 large activity classes that were selected. Each class contained at least 800 active compounds. In addition, for SVM modeling, 250 000 compounds were randomly selected from ZINC[18] as a pool of negative (inactive) training and test instances. From this pool, negative training and test sets were randomly sampled for all classification calculations.
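The curation rule for multiple measurements can be illustrated with a minimal sketch. Several details are assumptions made here rather than statements from the text: Ki values are taken to be given in nM, the arithmetic mean is taken over the Ki values (not over pKi values), and "same order of magnitude" is read as a max/min ratio below 10. The function name `curate_pki` is hypothetical.

```python
import math
from statistics import mean

def curate_pki(ki_values_nm, pki_threshold=5.0):
    """Return one pKi value for a compound/target pair, or None if discarded.

    ki_values_nm: Ki measurements in nM (assumed unit).
    """
    if not ki_values_nm or min(ki_values_nm) <= 0:
        return None
    # Assumed reading of "same order of magnitude": max/min ratio below 10.
    if max(ki_values_nm) / min(ki_values_nm) >= 10:
        return None
    ki_molar = mean(ki_values_nm) * 1e-9   # nM -> M
    pki = -math.log10(ki_molar)
    # Compounds below the potency threshold are not selected.
    return pki if pki >= pki_threshold else None

consistent = curate_pki([10.0, 20.0])          # averaged, about pKi 7.82
inconsistent = curate_pki([1.0, 50.0])         # spread too large, discarded
weak = curate_pki([20000.0, 30000.0])          # pKi below 5, discarded
```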

Molecular Representation

Compounds were represented as MACCS[19] and ECFP4 fingerprints.[20] MACCS is a prototypic binary-keyed fingerprint comprising 166 bits, each of which accounts for the presence or absence of a structural fragment or pattern. ECFP4 is a representative feature set fingerprint enumerating layered atom environments, which are encoded by integers using a hashing function. By design, ECFP4 has variable sizes, but it can be folded to obtain a fixed-length representation. For our calculations, ECFP4 was folded into a 1024-bit format using modulo mapping. Feature-to-bit mapping was recorded to enable mapping of fingerprint bits to compound structural features. Although modulo mapping assigns different features (atom environments) to identical bits, it is possible to trace environments and map them. Fingerprint representations were generated using in-house Python scripts based upon the OEChem toolkit.[21]
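Folding by modulo mapping while recording the feature-to-bit assignment might look as follows. The sketch uses plain integers as stand-ins for hashed atom environment identifiers and does not involve the OEChem toolkit; the function name and the example identifiers are hypothetical.

```python
from collections import defaultdict

def fold_features(feature_ids, n_bits=1024):
    """Fold integer feature identifiers into a fixed-length bit set.

    Returns (on_bits, bit_to_features) so that each set bit can be traced
    back to the original feature identifiers that were folded onto it.
    """
    bit_to_features = defaultdict(set)
    for fid in feature_ids:
        bit_to_features[fid % n_bits].add(fid)
    on_bits = sorted(bit_to_features)
    return on_bits, dict(bit_to_features)

# Hypothetical identifiers; 5 and 1029 collide on bit 5.
bits, mapping = fold_features([5, 1029, 2048, 4000])
```

Keeping `bit_to_features` alongside the folded fingerprint is what makes it possible to map a highly weighted bit back to one or more atom environments, even when several environments share a bit.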

Support Vector Machine

For binary classification, training instances defined by a feature vector $x \in X$ and a class label $y \in \{-1, 1\}$ are projected into the feature space X. For activity prediction, negative and positive examples represent inactive and active compounds for a given target, respectively. The SVM algorithm attempts to construct a hyperplane H such that the distance between the classes, the so-called margin, is maximized. This hyperplane is defined by a normal vector w and a scalar b using the expression $H = \{x \mid \langle w, x\rangle + b = 0\}$. For data that cannot be separated using a linear function, slack variables are added that permit training instances to fall within the margin or on the incorrect side of the hyperplane. To control the magnitude of allowed training errors, the cost or regularization hyperparameter C is introduced to balance margin size and classification errors. This represents a primal optimization problem that can be expressed in a dual form using Lagrange multipliers $\alpha_i$ (Lagrangian dual problem). Its solution yields the normal vector of the hyperplane $w = \sum_i \alpha_i y_i x_i$. Training examples with nonzero coefficients represent the support vectors and correspond to data points of one class that are closest to the other, that is, those that lie on the margin of the hyperplane. Once the hyperplane is derived, test data are projected into the feature space and classified according to the side of the plane on which they fall, that is, $f(x) = \operatorname{sgn}\left(\sum_i \alpha_i y_i \langle x_i, x\rangle + b\right)$, or ranked using the real value, that is, $g(x) = \sum_i \alpha_i y_i \langle x_i, x\rangle + b$.[9]
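As a small illustration of the classification and ranking functions f(x) and g(x), the following sketch evaluates them for hand-picked support vectors and coefficients. The values are toy numbers, not a trained model, and the function names are hypothetical. Assigning a score of exactly zero to the positive class is a convention chosen here.

```python
def dot(u, v):
    """Standard scalar product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def svm_score(x, support_vectors, alphas, labels, b, kernel=dot):
    """g(x) = sum_i alpha_i * y_i * K(x_i, x) + b, usable for ranking."""
    return sum(a * y * kernel(sv, x)
               for sv, a, y in zip(support_vectors, alphas, labels)) + b

def svm_classify(x, support_vectors, alphas, labels, b, kernel=dot):
    """f(x) = sgn(g(x)); a score of exactly 0 is assigned to +1 here."""
    score = svm_score(x, support_vectors, alphas, labels, b, kernel)
    return 1 if score >= 0 else -1

# Toy support vectors with hand-picked coefficients (illustrative only).
svs = [(1, 1, 0), (0, 1, 1)]
alphas = [0.5, 0.5]
labels = [1, -1]
bias = 0.0

score_active = svm_score((1, 0, 0), svs, alphas, labels, bias)
label_inactive = svm_classify((0, 0, 1), svs, alphas, labels, bias)
```

Passing a different `kernel` callable replaces the scalar product without changing the rest of the computation.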

Support Vector Regression

Training samples for SVR are defined by a feature vector $x \in X$ and a numerical label $y \in \mathbb{R}$.[10,11] If SVR is applied to potency prediction, the numerical label is the pKi value of the compound. SVR maps the training data as close as possible to the quantitative output y by deriving a regression function of the type $f(x) = \langle w, x\rangle + b$. Deviations between the observed and predicted values of training data are tolerated up to at most ε, and larger errors are penalized. In SVR, the relaxation of the error minimization problem is also controlled by a hyperparameter C, which penalizes large slack variables or deviations from the so-called ε tube. By solving the optimization problem with a Lagrange reformulation, the normal vector is derived and the prediction function is expressed as $f(x) = \sum_i \alpha_i \langle x_i, x\rangle + b$.
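The role of the ε tube can be made concrete with the ε-insensitive loss, which is zero for deviations up to ε and grows linearly beyond it. The sketch below is illustrative only; the value ε = 0.1 is a placeholder, since the ε used in the calculations is not stated in the text, and the function names are hypothetical.

```python
def epsilon_insensitive_loss(y_observed, y_predicted, epsilon=0.1):
    """Loss that ignores deviations of at most epsilon (the epsilon tube)."""
    return max(0.0, abs(y_observed - y_predicted) - epsilon)

def svr_predict(x, support_vectors, coefs, b):
    """f(x) = sum_i alpha_i * <x_i, x> + b with signed coefficients alpha_i."""
    return sum(c * sum(p * q for p, q in zip(sv, x))
               for sv, c in zip(support_vectors, coefs)) + b

inside_tube = epsilon_insensitive_loss(7.0, 7.05)    # within the tube
outside_tube = epsilon_insensitive_loss(7.0, 7.5)    # penalized beyond epsilon

# Toy coefficients (illustrative only, not a fitted model).
predicted = svr_predict((1, 0, 1), [(1, 1, 0), (0, 1, 1)], [0.8, -0.3], 6.5)
```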

Kernel Function

When accurate data separation is not feasible in the X space, the standard scalar product ⟨·,·⟩ is replaced by a kernel function K(·,·). Conceptually, the kernel function represents the scalar product in a high-dimensional space W in which the data might become linearly separable, without the need to compute an explicit mapping to W. This approach is known as the “kernel trick”[13] that is applied in both SVM and SVR. In chemoinformatics, one of the most popular kernels for fingerprint representations is the Tanimoto kernel,[22] which was also used herein: $K(u, v) = \frac{\langle u, v\rangle}{\langle u, u\rangle + \langle v, v\rangle - \langle u, v\rangle}$.
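For binary fingerprints, the Tanimoto kernel reduces to the number of shared on-bits divided by the number of bits set in either fingerprint. A minimal sketch, assuming fingerprints are given as collections of on-bit indices (a representation chosen here for illustration):

```python
def tanimoto_kernel(u, v):
    """Tanimoto kernel for two fingerprints given as on-bit index collections."""
    u, v = set(u), set(v)
    common = len(u & v)
    denominator = len(u) + len(v) - common
    # Two empty fingerprints are assigned a value of 0 here (a convention).
    return common / denominator if denominator else 0.0

similarity = tanimoto_kernel({1, 2, 3}, {2, 3, 4})
```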

Feature Weight Analysis

In the SVM model, different weights are assigned to molecular descriptors (features), which correspond to the coefficients of the primal optimization problem. The linear kernel (scalar product) allows direct determination of feature weights from the dual problem coefficients and support vectors. By contrast, direct access to feature weights is not possible when using nonlinear kernel functions because an explicit mapping into the high-dimensional feature space is not computed. However, for the Tanimoto kernel, feature weight analysis can be adapted from the linear case according to which the importance of a feature depends on the coefficients of those support vectors that contain the feature.[16] To account for the nonlinearity of the Tanimoto formalism, a normalization factor is included for each individual support vector by dividing the feature weight contribution by the total number of features present in each support vector: $\mathrm{FW}(d) = \sum_{i=1}^{m} \frac{\alpha_i y_i \, v_{i,d}}{\sum_{j=1}^{D} v_{i,j}}$. Here, FW(d) is the feature weight for feature d, D is the dimensionality, m is the number of support vectors, and $v_i$ and $\alpha_i$ are the support vectors and their coefficients from the dual problem solution (for SVR, the signed dual coefficient takes the place of $\alpha_i y_i$). Feature contributions are not constant across feature space and depend on the fingerprint that is used.[16] However, adaptation of feature weight analysis from the linear case with normalization yields an average weight, indicating the importance of each feature. Highly weighted fingerprint features can then be mapped to compound structures.[16]
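The normalized feature weight described in this section can be sketched as follows, assuming binary fingerprints stored as sets of on-bit indices and one signed coefficient per support vector. Both the data layout and the function name are assumptions made for illustration and are not taken from the original implementation.

```python
def feature_weights(support_vectors, signed_coefs, n_features):
    """Normalized feature weights for binary fingerprints.

    support_vectors: list of sets of on-bit indices (one set per support vector).
    signed_coefs: one signed dual coefficient per support vector.
    Each support vector spreads its coefficient evenly over its on-bits.
    """
    weights = [0.0] * n_features
    for on_bits, coef in zip(support_vectors, signed_coefs):
        if not on_bits:
            continue  # an empty fingerprint contributes nothing
        share = coef / len(on_bits)  # normalize by number of features present
        for d in on_bits:
            weights[d] += share
    return weights

# Toy example: two support vectors over a 4-bit fingerprint.
fw = feature_weights([{0, 1}, {1, 2, 3}], [1.0, -0.6], 4)
```

In this toy case, bit 1 occurs in both support vectors, so its weight combines a positive and a negative contribution.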

Calculations and Data Analysis

Each activity class was randomly divided into training and test (prediction) sets comprising 700 and 100 compounds, respectively, following previously derived guidelines for relative training and test set composition.[23] For SVM, 700 and 100 compounds from the ZINC database were randomly selected as negative training and test instances, respectively. For SVR, the same positive training data were used in each case (but no negative data). For each activity class and SVM/SVR calculation protocol, 10 independent trials were carried out, and the results were averaged. For SVM and SVR models, the hyperparameter C was optimized using 10-fold cross-validation on training data using candidate values of 0.01, 0.1, 1, 5, 10, 20, 50, and 100. For SVM, hyperparameter optimization was guided by maximizing the F1 score; for SVR, optimization aimed to minimize the MAE, $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$. Here, n is the number of samples, and $y_i$ and $\hat{y}_i$ are the observed and predicted values, respectively (see also MSE given below). Following hyperparameter optimization, feature weight analysis was carried out for classification and regression models. Weights were categorized as high, medium, or low, depending on whether their absolute value was at least 50%, 25–50%, or less than 25% of the maximum weight observed for a given SVM model, respectively. Binary activity (active/inactive) and potency values of test compounds were predicted, and model performance was estimated using different figures of merit. For SVM, the F1 score, AUC, and the recall of active compounds among the top 1% of the ranked test set were determined. For SVR, MAE, MSE, and the Pearson correlation coefficient between the observed and predicted pKi values were calculated, with $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$. Calculation and data analysis protocols were implemented in Python using Scikit-learn.[24]
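The weight categories and the two regression error measures are simple to state in code. The sketch below uses only the Python standard library and is meant as an illustration of the definitions, not as the Scikit-learn-based protocol itself. Function names are hypothetical, and a weight at exactly 50% of the maximum is counted as high, following the "at least 50%" wording.

```python
def categorize_weights(weights):
    """Label each weight 'high', 'medium', or 'low' relative to the largest |weight|."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    labels = []
    for w in weights:
        fraction = abs(w) / max_abs if max_abs else 0.0
        if fraction >= 0.5:
            labels.append("high")
        elif fraction >= 0.25:
            labels.append("medium")
        else:
            labels.append("low")
    return labels

def mae(observed, predicted):
    """Mean absolute error over paired observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def mse(observed, predicted):
    """Mean squared error over paired observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

categories = categorize_weights([1.0, -0.5, 0.3, 0.1])
error_abs = mae([7.0, 6.0], [6.5, 6.5])
error_sq = mse([7.0, 6.0], [6.5, 6.5])
```

Because the categories are based on absolute values, a strongly negative weight is also labeled as high; the sign is analyzed separately.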

1. Extended-connectivity fingerprints.

Authors: David Rogers; Mathew Hahn
Journal: J Chem Inf Model Date: 2010-05-24 Impact factor: 4.956

2. Virtual screening of molecular databases using a support vector machine.

Authors: Robert N Jorissen; Michael K Gilson
Journal: J Chem Inf Model Date: 2005 May-Jun Impact factor: 4.956

3. Machine learning methods for property prediction in chemoinformatics: Quo Vadis?

Authors: Alexandre Varnek; Igor Baskin
Journal: J Chem Inf Model Date: 2012-05-25 Impact factor: 4.956

4. QSAR modeling: where have you been? Where are you going to?

Authors: Artem Cherkasov; Eugene N Muratov; Denis Fourches; Alexandre Varnek; Igor I Baskin; Mark Cronin; John Dearden; Paola Gramatica; Yvonne C Martin; Roberto Todeschini; Viviana Consonni; Victor E Kuz'min; Richard Cramer; Romualdo Benigni; Chihae Yang; James Rathman; Lothar Terfloth; Johann Gasteiger; Ann Richard; Alexander Tropsha
Journal: J Med Chem Date: 2014-01-06 Impact factor: 7.446

5. Support vector machines for drug discovery (review).

Authors: Kathrin Heikamp; Jürgen Bajorath
Journal: Expert Opin Drug Discov Date: 2013-12-05 Impact factor: 6.098

6. Current trends in ligand-based virtual screening: molecular representations, data mining methods, new application areas, and performance evaluation (review).

Authors: Hanna Geppert; Martin Vogt; Jürgen Bajorath
Journal: J Chem Inf Model Date: 2010-02-22 Impact factor: 4.956

7. Visualization and Interpretation of Support Vector Machine Activity Predictions.

Authors: Jenny Balfer; Jürgen Bajorath
Journal: J Chem Inf Model Date: 2015-06-02 Impact factor: 4.956

8. ChEMBL: a large-scale bioactivity database for drug discovery.

Authors: Anna Gaulton; Louisa J Bellis; A Patricia Bento; Jon Chambers; Mark Davies; Anne Hersey; Yvonne Light; Shaun McGlinchey; David Michalovich; Bissan Al-Lazikani; John P Overington
Journal: Nucleic Acids Res Date: 2011-09-23 Impact factor: 16.971

9. Systematic artifacts in support vector regression-based compound potency prediction revealed by statistical and activity landscape analysis.

Authors: Jenny Balfer; Jürgen Bajorath
Journal: PLoS One Date: 2015-03-05 Impact factor: 3.240

10. Influence of Varying Training Set Composition and Size on Support Vector Machine-Based Prediction of Active Compounds.

Authors: Raquel Rodríguez-Pérez; Martin Vogt; Jürgen Bajorath
Journal: J Chem Inf Model Date: 2017-04-10 Impact factor: 4.956
