Average Information Content Maximization--A New Approach for Fingerprint Hybridization and Reduction.

Marek Śmieja, Dawid Warszycki.

Abstract

Fingerprints, bit-string representations of chemical structure, have been widely used in cheminformatics for many years. Although fingerprints with the highest resolution perform satisfactorily in virtual screening campaigns, the relatively high number of irrelevant bits introduces noise into the data and makes their application more time-consuming. In this study, we present a new method for constructing hybrid reduced fingerprints, the Average Information Content Maximization algorithm (AIC-Max algorithm), which selects the most informative bits from a collection of fingerprints. Applied to the ligands of five cognate serotonin receptors (5-HT2A, 5-HT2B, 5-HT2C, 5-HT5A, 5-HT6), this methodology showed that 100 bits selected from four non-hashed fingerprints reflect almost all of the structural information required for a successful in silico discrimination test. A classification experiment indicated that the reduced representation can achieve slightly better performance than state-of-the-art fingerprints that are ten times longer, in significantly shorter time.


Year:  2016        PMID: 26784447      PMCID: PMC4718645          DOI: 10.1371/journal.pone.0146666

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Fingerprints are one of the most popular methods of converting chemical structures into a form that can be used in, e.g., machine learning experiments. They encode a compound’s structural features into a bitstring, where “1” and “0” denote the presence and absence, respectively, of a particular pattern. Fingerprints are divided into two subgroups: non-hashed fingerprints (e.g., the Substructure and Klekota-Roth fingerprints), which encode precisely defined structural patterns, and hashed fingerprints (e.g., the Extended and Graph-only fingerprints), in which individual bits have no assigned meaning (Fig 1). Fingerprints are widely used in classification problems and similarity searching and have therefore found application in computer-aided drug design campaigns [1-8].
Fig 1

Exemplary hashed (A) and non-hashed (B) fingerprints.

“1” and “0” correspond to the presence and absence, respectively, of a particular pattern. The hashed fingerprint (A) illustrates the bit collision phenomenon: one bit encodes more than one motif.

The multitude of structural features present in chemical compounds results in long fingerprints; the longest contains 4860 bits [9]. Because hundreds of the encoded substructures cannot physically occur in low-molecular-weight compounds, and many bits are biologically insignificant, such fingerprints raise the noise level in classification experiments. Moreover, high-dimensional data increase the computational time, which is crucial in large virtual screening cascades. Therefore, reducing fingerprint length without losing meaningful information has become an important cheminformatics challenge in recent years. Several methodologies, e.g., consensus fingerprints [10], bit scaling [11], reverse fingerprints [12] and bit silencing [13], were introduced to reduce fingerprints by weighting particular bits. Another approach, proposed by Nisius et al., selects fingerprint bits according to their discrimination power, measured by the Kullback-Leibler divergence [14]. The method was applied to single fingerprints as well as to collections of fingerprints, leading to a successful attempt at fingerprint hybridization [15]. In this study, we introduce a new method for fingerprint hybridization and reduction, Average Information Content Maximization (the AIC-Max algorithm). The algorithm uses an extended version of mutual information, hereafter referred to as the Average Information Content (AIC), to select the bits of different fingerprints that are most informative for separating active from inactive compounds. In contrast to the aforementioned techniques, the AIC-Max algorithm can construct a single optimal fingerprint for several biological targets at once, which substantially extends its application area.
The strength of the AIC-Max algorithm stems from the fact that the selection process evaluates the discrimination power of entire groups of bits instead of single bits. Consequently, the algorithm does not select two features that carry similar information. The proposed methodology was applied to create a reduced representation dedicated to the analysis of five closely related serotonin receptors, 5-HT2A, 5-HT2B, 5-HT2C, 5-HT5A and 5-HT6 (members of the G-protein coupled receptor superfamily), which play an important role in, e.g., the central nervous system (CNS) [16]. The algorithm was additionally tested on four other target families: carbonic anhydrases, cathepsins, histamine receptors and kinases (see S1 File). Although the advantages of hashed fingerprints cannot be denied, only non-hashed fingerprints were considered in the current study. Hashed fingerprints were deliberately excluded because they lack predefined substructural features and commonly exhibit the bit collision phenomenon (the same bit is set by multiple patterns) [17]. Both properties make the structural interpretation of particular fingerprint coordinates nearly impossible. A hybrid fingerprint reduced to 100 bits reflects 99.77% of the information needed to distinguish active compounds from inactive ones (Fig 2). It also contains structural patterns typical of serotonin receptor ligands, such as positively polarizable nitrogen atoms and aromatic systems.
Fig 2

The relationship between the number of bits selected by the AIC-Max algorithm and the information related to activity.

The information, measured by AIC Eq (1), was averaged over all datasets used in the underlying study.

The reduced representation significantly outperformed four standard non-hashed fingerprints in a classification experiment that used a random forest classifier [19]. It also achieved slightly better results than the hashed fingerprints generated by the PaDEL software [18]. Moreover, the average training time of the random forest predictor was almost 20 times shorter than with the Extended fingerprint. Additional tests showed that the constructed fingerprint generalized well to related biological targets such as the 5-HT1A receptor. The results indicate that the AIC-Max algorithm is an efficient method for fingerprint reduction and hybridization. It opens new perspectives both for virtual screening campaigns and for the structural analysis of the chemical space covered by ligands acting on similar targets.

Materials and Methods

The Average Information Content Maximization algorithm (AIC-Max algorithm) uses the notion of Average Information Content (AIC) to rank features by their significance. The AIC quantifies the fraction of information about activity that a set of features X = {X1, …, XN} carries with respect to a set of k biological receptors. The corresponding set of activity variables is denoted by Y = {Y1, …, Yk}. The AIC is defined as the mutual information normalized by the entropy SE(Yi) [20-22], averaged over the receptors:

$$\mathrm{AIC}(X_1,\dots,X_N;\,Y_1,\dots,Y_k) = \frac{1}{k}\sum_{i=1}^{k}\frac{\mathrm{MI}(X_1,\dots,X_N;\,Y_i)}{\mathrm{SE}(Y_i)}, \qquad (1)$$

where

$$\mathrm{MI}(X_1,\dots,X_N;\,Y) = \sum_{x\in S}\sum_{y\in\{0,1\}} P(x;y)\,\log_2\frac{P(x;y)}{P(x)\,P(y)}.$$

Here S = {0,1}^N is the set of all binary sequences of length N. P(y), P(x) and P(x;y) denote the probabilities of {Y = y}, {X1 = x1, …, XN = xN} and {X1 = x1, …, XN = xN, Y = y}, respectively. If X fully determines the activity of all receptors, then AIC = 1; if X is independent of all elements of Y, then AIC = 0. A set of features that reflects all the information about activity against l receptors, and none of the information for the remaining (k − l) receptors, gives AIC = l/k, as demonstrated in Table 1. For closely related biological targets, however, the most informative features usually overlap to a large extent.
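The AIC definition above (mutual information normalized by SE(Yi), averaged over receptors) can be estimated directly from data. Below is a minimal Python sketch under two assumptions. First, each receptor comes with its own dataset of binary fingerprints and 0/1 activity labels, so compounds need not be shared across receptors. Second, all probabilities are plain frequency counts. The names `entropy`, `mutual_information` and `aic` are illustrative and are not taken from the authors' code.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy SE (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def mutual_information(rows, labels):
    """MI (bits) between joint feature patterns (tuples) and activity labels."""
    n = len(labels)
    p_x, p_y = Counter(rows), Counter(labels)
    p_xy = Counter(zip(rows, labels))
    # sum over observed (x, y) of P(x;y) * log2(P(x;y) / (P(x) P(y)))
    return sum(c / n * log2(c * n / (p_x[x] * p_y[y]))
               for (x, y), c in p_xy.items())


def aic(datasets, features):
    """Average over receptors of MI(X_features; Y_i) / SE(Y_i).

    datasets: one (fingerprints, labels) pair per receptor; each fingerprint
    is a sequence of 0/1 bits. features: indices of the selected bits.
    Assumes every dataset contains both actives and inactives (SE(Y_i) > 0).
    """
    total = 0.0
    for fingerprints, labels in datasets:
        rows = [tuple(fp[j] for j in features) for fp in fingerprints]
        total += mutual_information(rows, labels) / entropy(labels)
    return total / len(datasets)


# Table 1: all 3-bit patterns, with Y_i = X_i for receptors i = 1, 2, 3
fps = [((k >> 2) & 1, (k >> 1) & 1, k & 1) for k in range(8)]
table1 = [(fps, [fp[i] for fp in fps]) for i in range(3)]
print(aic(table1, [0, 1]))  # prints 0.6666666666666666 (= 2/3)
```

Because each receptor's term is computed from its own dataset, this sketch reflects the property the method relies on: a compound only needs a label for the receptor whose dataset it belongs to.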
Table 1

Minimal and maximal values of AIC.

The 3-bit fingerprint representation X1 X2 X3 of eight compounds and their activity labels Y1, Y2, Y3 for three biological targets. Since the activity of the i-th receptor is fully determined by the single feature Xi, AIC(Xi; Yi) = 1 for i = 1, 2, 3. In contrast, AIC(Xj; Yi) = 0 for i ≠ j, because Yi is independent of Xj. Finally, AIC(X1, X2; Y1, Y2, Y3) = 2/3, since the activity of two out of three receptors is fully reflected by the two bits.

compound no.   X1   X2   X3   Y1 = X1   Y2 = X2   Y3 = X3
1              0    0    0    0         0         0
2              0    0    1    0         0         1
3              0    1    0    0         1         0
4              0    1    1    0         1         1
5              1    0    0    1         0         0
6              1    0    1    1         0         1
7              1    1    0    1         1         0
8              1    1    1    1         1         1

The important point is that the value of AIC depends on the joint information contained in all features included in X. In particular, if X1 = X2, then

$$\mathrm{AIC}(X_1, X_2;\,Y_1,\dots,Y_k) = \mathrm{AIC}(X_1;\,Y_1,\dots,Y_k).$$

The above equality always holds if X1 and X2 are completely correlated (correlation of ±1, e.g., X2 = NOT(X1), as in Table 2). In other words, repeatedly adding the same feature does not increase the value of AIC. Conversely, extending the set of features by an additional element never decreases AIC, as illustrated in Table 2.
Table 2

Influence of dependent and independent bits on AIC.

The activity of a given receptor depends on only two of the four features, X1 and X2. Adding feature X3 to X1 does not change AIC because X3 is independent of Y, so AIC(X1) = AIC(X1, X3) = 0.38. The same holds for X4, which is completely (negatively) correlated with X1: AIC(X1) = AIC(X1, X4) = 0.38.

compound no.   X1   X2   X3   X4 = NOT(X1)   Y = X1 AND X2
1              0    0    0    1              0
2              0    0    1    1              0
3              0    1    0    1              0
4              0    1    1    1              0
5              1    0    0    0              0
6              1    0    1    0              0
7              1    1    0    0              1
8              1    1    1    0              1
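The Table 2 values can be reproduced with a short Python sketch. With a single receptor, AIC reduces to MI(X;Y)/SE(Y). Here MI is computed through the standard identity MI(X;Y) = SE(Y) − SE(Y|X) rather than the explicit double sum, so the example also shows that the two formulations agree. The helper names are illustrative, and probabilities are assumed to be frequency counts over the eight compounds.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def conditional_entropy(rows, labels):
    """SE(Y | X): label entropy within each feature pattern, weighted by P(x)."""
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[row].append(y)
    n = len(labels)
    return sum(len(ys) / n * entropy(ys) for ys in groups.values())


def aic_single(fingerprints, labels, features):
    """AIC for one receptor: (SE(Y) - SE(Y|X)) / SE(Y)."""
    rows = [tuple(fp[j] for j in features) for fp in fingerprints]
    h_y = entropy(labels)
    return (h_y - conditional_entropy(rows, labels)) / h_y


# Table 2: X1, X2, X3 enumerate all patterns, X4 = NOT(X1), Y = X1 AND X2
table2 = [(x1, x2, x3, 1 - x1) for x1 in (0, 1) for x2 in (0, 1) for x3 in (0, 1)]
y = [x1 & x2 for x1, x2, _, _ in table2]

for name, feats in [("X1", [0]), ("X1,X3", [0, 2]), ("X1,X4", [0, 3]), ("X1,X2", [0, 1])]:
    print(name, round(aic_single(table2, y, feats), 2))
```

Adding the independent bit X3 or the negated copy X4 leaves AIC at 0.38. Adding X2 raises it to 1, because X1 and X2 together determine Y.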

To calculate AIC for a given set of receptors Y, a separate dataset of compounds can be created for each Yi. Consequently, a single instance (compound) does not need a known activity label for all of the considered receptors. This is an important property, because most compounds have proven activity (or inactivity) for only one receptor. This reasoning cannot be applied to classical mutual information, where the activity of every compound must be provided to perform an analogous evaluation.

Given the set F of all features (fingerprint coordinates), the goal is to find an N-element subset X ⊆ F such that AIC(X; Y) is maximal. In practice, it is impossible to calculate AIC for all subsets of features to determine the most informative one. The number of m-element subsets of n features equals $\binom{n}{m} = \frac{n!}{m!\,(n-m)!}$, which even for n = 1000 and m = 10 gives approximately 2.6 ⋅ 10^23. The proposed AIC-Max algorithm therefore uses a heuristic search in the space of all features to reduce the computational time of the entire selection process. At each step, it picks the coordinate that maximizes AIC of the already chosen features together with that coordinate. The selection of N features proceeds as follows:

AIC-Max algorithm:
Input: F, the set of given features
Output: X, the set of selected features
1. Initialize X = ∅.
2. Iterate N times:
(a) find f ∈ F \ X that maximizes AIC(X ∪ {f}; Y),
(b) update X = X ∪ {f}.

To make the computations more efficient, AIC in step 2a can be calculated for a randomly selected subset of n ≤ N of the already selected features; in the experiments, n = 10 was used. The concept of AIC is based on information theory and is partially related to the Asymmetric Clustering Index [23].
The most fundamental concept in information theory is Shannon entropy (SE), which quantifies the information contained in a given feature X [20]. Formally, if X takes values in {1, …, k}, then

$$\mathrm{SE}(X) = -\sum_{i=1}^{k} P(i)\,\log_2 P(i),$$

where P(i) is the probability of the observation {X = i}. Note that SE(X) = 0 if X is constant. In contrast, if all values of X are equally probable, SE attains its maximal value of log2 k. To measure the joint information shared by two features, the notion of mutual information (MI) is used [20]. For X and Y taking values in {1, …, k}, MI is formulated as follows:

$$\mathrm{MI}(X;Y) = \sum_{i=1}^{k}\sum_{j=1}^{k} P(i;j)\,\log_2\frac{P(i;j)}{P(i)\,P(j)},$$

where P(i;j) is the probability of {X = i, Y = j}, and P(i), P(j) are the marginal probabilities of {X = i} and {Y = j}. MI can be naturally extended to sets of features X = {X1, …, XN} and Y = {Y1, …, Yk}. In that case, the indexes i and j in the above expression are replaced by the sequences of indexes (i1, …, iN) and (j1, …, jk), respectively [20]. Evaluating MI for a set of features and a set of receptors requires a single dataset of chemical compounds with activity labels for all receptors. This makes it practically impossible to use MI to determine the most informative subset of features with respect to several receptors. There is usually no representative dataset in which each compound has proven activity or inactivity for every receptor in Y. To overcome this problem, the calculation of MI(X1, …, XN; Y1, …, Yk) was replaced by the computation of the individual factors MI(X1, …, XN; Yi). These partial results are combined by averaging:

$$\mathrm{AIC}(X_1,\dots,X_N;\,Y_1,\dots,Y_k) = \frac{1}{k}\sum_{i=1}^{k}\frac{\mathrm{MI}(X_1,\dots,X_N;\,Y_i)}{\mathrm{SE}(Y_i)}.$$

The normalization by the entropy of Yi ensures that every factor describes the fraction of joint information rather than its absolute amount. In particular, 0 ≤ MI(X1, …, XN; Yi)/SE(Yi) ≤ 1, because the mutual information between X and Yi cannot exceed the entropy of Yi.
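The claim that each normalized factor MI/SE(Yi) is a fraction between 0 and 1 follows from the standard identity MI(X;Y) = SE(Y) − SE(Y|X), where SE(Y|X) is the conditional entropy. As a worked check, the same identity reproduces the value AIC(X1) = 0.38 reported in Table 2. That example has a single receptor with Y = X1 AND X2, so P(Y = 1) = 1/4.

```latex
\begin{aligned}
\mathrm{MI}(X;Y) &= \mathrm{SE}(Y) - \mathrm{SE}(Y \mid X), \qquad
  0 \le \mathrm{SE}(Y \mid X) \le \mathrm{SE}(Y)
  \;\Rightarrow\; 0 \le \frac{\mathrm{MI}(X;Y)}{\mathrm{SE}(Y)} \le 1,\\[4pt]
\mathrm{SE}(Y) &= -\tfrac{3}{4}\log_2\tfrac{3}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4} \approx 0.811,\\
\mathrm{SE}(Y \mid X_1) &= \tfrac{1}{2}\cdot 0 + \tfrac{1}{2}\cdot 1 = 0.5
  \qquad (X_1 = 0 \Rightarrow Y = 0;\ X_1 = 1 \Rightarrow Y = X_2),\\
\mathrm{AIC}(X_1; Y) &= \frac{0.811 - 0.5}{0.811} \approx 0.38.
\end{aligned}
```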

Results and Discussion

The experiments applied the AIC-Max algorithm to select the most significant bits for ligands acting on five closely related biological receptors: 5-HT2A, 5-HT2B, 5-HT2C, 5-HT5A and 5-HT6. Among all fingerprints generated by the PaDEL software, only the non-hashed EState, MACCS, PubChem and Substructure fingerprints (1434 bits in total) were considered, to enable structural analysis of the selected bits (Table 3). Although hashed representations can be more efficient for classification purposes, their coordinates do not have a straightforward meaning; therefore, they were not incorporated into the selection process. The longest fingerprint (KRFP) was also skipped, although it is non-hashed, because its large number of bits would sharply increase the computational time of the feature selection process. Some chemical patterns may be duplicated when the above four fingerprints are concatenated. Nevertheless, since repeatedly adding the same feature does not increase AIC, there is no risk that the algorithm will pick two identical (or even very similar) bits for the final representation.
Table 3

Fingerprints generated by the PaDEL software [18].

Fingerprint                     Abbreviation   Hashed   Length
EState fingerprint [24]         estate         NO       79
MACCS fingerprint [25]          maccs          NO       166
PubChem fingerprint [18]        pubchem        NO       881
Substructure fingerprint [18]   substructure   NO       308
Klekota-Roth fingerprint [9]    KRFP           NO       4860
Fingerprint [26]                fingerprint    YES      1024
Extended fingerprint [18]       extended       YES      1024
Graph-only fingerprint [18]     graph only     YES      1024
All ligands were extracted from the ChEMBL database, version 20 (February 2015) [27]. Ligands with an inhibition constant (Ki) less than or equal to 100 nM were considered active; ligands with Ki higher than 1000 nM were used as inactives. Putative inactive compounds were randomly selected from the ZINC database [28] in a ratio of 9 putative inactives per active (Table 4) [29].
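The labeling rule and the 9:1 decoy ratio can be written down as a small Python sketch. Two details are assumptions rather than statements from the text. First, compounds with 100 nM < Ki ≤ 1000 nM belong to neither class, so the sketch leaves them out. Second, "randomly selected" is taken to mean uniform sampling without replacement. The compound identifiers and Ki values in the usage lines are hypothetical placeholders.

```python
import random

ACTIVE_MAX_KI_NM = 100.0     # Ki <= 100 nM  -> active
INACTIVE_MIN_KI_NM = 1000.0  # Ki > 1000 nM  -> inactive
DECOY_RATIO = 9              # putative inactives (ZINC) per active


def label_by_ki(ki_nm):
    """Return 1 (active), 0 (inactive) or None (intermediate Ki, left out)."""
    if ki_nm <= ACTIVE_MAX_KI_NM:
        return 1
    if ki_nm > INACTIVE_MIN_KI_NM:
        return 0
    return None


def sample_putative_inactives(zinc_ids, n_actives, ratio=DECOY_RATIO, seed=0):
    """Draw ratio * n_actives distinct ZINC compounds uniformly at random."""
    return random.Random(seed).sample(list(zinc_ids), ratio * n_actives)


ki_values = {"cpd_a": 12.0, "cpd_b": 450.0, "cpd_c": 5200.0}  # hypothetical
print({cid: label_by_ki(ki) for cid, ki in ki_values.items()})
print(len(sample_putative_inactives(range(10_000), n_actives=69)))  # 621, as for 5-HT5A in Table 4
```

A fixed seed keeps the decoy set reproducible across runs, which matters when the same putative inactives must be reused for bit selection and classification.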
Table 4

The summary of datasets used in the selection process.

Receptor   Actives   Inactives   ZINC
5-HT2A     2060      1081        18540
5-HT2B     428       341         3852
5-HT2C     1303      1050        11727
5-HT5A     69        146         621
5-HT6      1626      426         14634
5-HT1A     4427      1230        39843
To evaluate the significance of the selected features, 10-fold cross-validation was performed [30]. In this approach, a dataset is randomly partitioned into 10 equally sized subsets. A single subset is retained as test data, while the remaining 9 subsets are used for training. This process is repeated 10 times, so that each of the 10 subsamples is used exactly once as test data, and the results are averaged. The AIC-Max algorithm was run on the training data (including actives, inactives and putative inactives), and the selected features were evaluated on the test set. The score was measured by the normalized mutual information, Eq (2), between the constructed representation and the true activity labels for each receptor. The information stored in a reduced fingerprint grows gradually as the AIC-Max algorithm selects more features (Fig 3). A representation of approximately 20 bits rapidly reached the 90% level, both for datasets containing true inactives and for those with compounds selected from ZINC. Nevertheless, distinguishing almost all active compounds from true inactives requires 100 bits (more than 99% of the information), while for putative inactives only 30 bits suffice (close to 100% of the information). This outcome has two causes: the close structural similarity between actives and true inactives, and the small number of compounds with confirmed inactivity (Table 4).
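The protocol above reruns AIC-Max inside every fold, so the test fold never influences bit selection. It can be sketched in Python with two caller-supplied functions. `select_bits` and `score` are hypothetical placeholders, for example the AIC-Max search and the normalized mutual information. When the dataset size is not divisible by 10, the folds in this sketch differ in size by at most one sample.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def cross_validate(n_samples, select_bits, score, k=10):
    """Average test-fold score; bit selection only ever sees training indices."""
    scores = []
    for train, test in k_fold_splits(n_samples, k):
        bits = select_bits(train)         # e.g. AIC-Max run on the training part
        scores.append(score(bits, test))  # e.g. normalized MI on the test part
    return sum(scores) / len(scores)
```

Running the selection inside each fold is what makes the reported test-set information an honest estimate. Selecting bits once on the full dataset would leak test compounds into the representation.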
Fig 3

The relationship between the number of bits selected by the AIC-Max algorithm and the associated information about activity.

The information score was measured by the normalized mutual information calculated for constructed representations for every receptor averaged over all folds reported on a test set.

Because the AIC-Max algorithm returned slightly different subsets of bits in each fold, the algorithm was additionally applied to the entire dataset to obtain a single set of features. The reduced fingerprint (see S1 File for details) contained features that are crucial in ligand-protein interactions for serotonin receptors: a positively polarizable nitrogen atom and an aromatic system [31]. Moreover, the bit encoding a tertiary nitrogen atom was ranked highest in the reduction and hybridization process. Polarizable nitrogen atoms are encoded by several of the top-scored bits. The same holds for the aromatic system, which appears three times among the 10 highest-ranked bits. Amide and sulfonamide moieties (and their subelements) are other popular patterns present in the universal fingerprint, reflecting current trends in medicinal chemistry [32-36].

The quality of the bits chosen by the AIC-Max algorithm was verified in a classification experiment conducted for ligands of the five underlying serotonin receptors. A random forest classifier [19], as implemented in the randomForest R package, was used because it is known to be one of the state-of-the-art approaches in activity prediction [6]. Classification accuracy was evaluated with the Matthews Correlation Coefficient (MCC), a well-known validation measure that is particularly suitable for imbalanced datasets. It is defined as [37]:

$$\mathrm{MCC} = \frac{TP\cdot TN - FP\cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}},$$

where TP stands for the number of true positives (actives labeled as actives), TN for true negatives, FP for false positives (inactives labeled as actives) and FN for false negatives. MCC takes values from −1 to +1: +1 represents a perfect prediction, 0 a random prediction and −1 an inverse prediction.
The experiment also used 10-fold cross-validation: the training set was used for bit selection and classifier training, and the classifier was then evaluated on the test set. In each fold, the AIC-Max algorithm was run on a merged set of actives, inactives and putative inactives to enforce the generality of the representation. The classifier, on the other hand, was trained and tested separately on compounds of proven activity and on datasets containing active and putative inactive compounds. Adding features improves the classification results (Fig 4). The largest increase was observed for representations with fewer than 20 bits; beyond that, classification accuracy changes only slightly. Because the gain in MCC for representations containing more than 100 bits is negligible, longer representations were not considered further.
Fig 4

Classification performance.

The relationship between the number of bits selected by the AIC-Max algorithm and the associated MCC score for every receptor, averaged over all folds and reported on a test set.

The classification performance of the representations with 25, 50 and 100 bits was then compared with the original (raw) fingerprints (Tables 5 and 6). With putative inactive compounds, the 100-bit representation outperformed all raw fingerprints on every receptor. The 50-bit representation outperformed them on four of the five receptors and on average; on 5-HT6, the MACCS, PubChem and Extended fingerprints scored slightly higher. This case is considered the most important one because it reflects virtual screening campaigns [29]. In the case of true inactives, the average MCC of the 100-bit representation (0.731) was comparable to that of the best-performing hashed fingerprints (0.730). Moreover, classifier training was approximately 17 times faster with the reduced 100-bit representation than with any of the hashed fingerprints (Fig 5).
Table 5

Classification performance on a dataset containing actives and inactives.

fingerprint     5-HT2A   5-HT2B   5-HT2C   5-HT5A   5-HT6   mean
reduced(25)     0.679    0.521    0.708    0.698    0.737   0.669
reduced(50)     0.731    0.558    0.743    0.724    0.746   0.701
reduced(100)    0.736    0.620    0.761    0.759    0.778   0.731
estate          0.425    0.448    0.501    0.614    0.584   0.514
maccs           0.713    0.607    0.741    0.760    0.755   0.715
pubchem         0.730    0.545    0.739    0.790    0.739   0.709
substructure    0.500    0.483    0.551    0.647    0.595   0.555
KRFP            0.697    0.565    0.707    0.766    0.742   0.695
extended        0.744    0.596    0.774    0.736    0.803   0.730
fingerprinter   0.733    0.591    0.773    0.745    0.806   0.730
graphonly       0.703    0.559    0.716    0.788    0.774   0.708
Table 6

Classification performance on a dataset containing actives and putative inactives.

fingerprint     5-HT2A   5-HT2B   5-HT2C   5-HT5A   5-HT6   mean
reduced(25)     0.889    0.828    0.887    0.876    0.933   0.883
reduced(50)     0.939    0.878    0.939    0.926    0.966   0.929
reduced(100)    0.959    0.885    0.952    0.919    0.971   0.937
estate          0.604    0.503    0.563    0.725    0.844   0.648
maccs           0.936    0.877    0.932    0.894    0.970   0.922
pubchem         0.931    0.839    0.916    0.886    0.967   0.908
substructure    0.820    0.660    0.743    0.783    0.906   0.782
KRFP            0.932    0.841    0.925    0.862    0.965   0.905
extended        0.936    0.858    0.920    0.884    0.967   0.913
fingerprinter   0.932    0.852    0.918    0.868    0.966   0.907
graphonly       0.916    0.823    0.896    0.888    0.954   0.895
Fig 5

Classification times.

Mean training times of a random forest classifier for various fingerprint representations averaged over all data sets of active and inactive compounds.

Finally, the ability of the constructed representation to generalize to another serotonin receptor was examined. A classification experiment was conducted on 5-HT1A receptor ligands using the reduced representation selected for the five base receptors. Surprisingly, the Extended fingerprint achieved a perfect MCC of 1.000 on the first dataset, which includes compounds with proven activity or inactivity (Table 7). The reduced representation gave a significantly lower result (MCC = 0.663), but it still performed better than any of the non-hashed fingerprints. In the case of putative inactives, the constructed representation performed slightly better than the MACCS and Extended fingerprints.
Table 7

Classification performance on a dataset containing active and inactive compounds of the 5-HT1A receptor (middle column) and on a dataset of actives and putative inactives (last column).

The reduced representation was constructed from four non-hashed fingerprints based on five biological targets (first 3 rows). The reduced representation from all fingerprints (except KRFP) was also evaluated (last row).

fingerprint                                 inactives   ZINC
reduced(25)                                 0.553       0.893
reduced(50)                                 0.632       0.950
reduced(100)                                0.663       0.963
estate                                      0.250       0.566
maccs                                       0.630       0.961
pubchem                                     0.659       0.948
substructure                                0.332       0.886
KRFP                                        0.650       0.958
extended                                    1.000       0.960
fingerprinter                               0.713       0.957
graphonly                                   0.627       0.933
reduced (100) formed from all fingerprints  0.998       0.961

To complement the study and examine the discriminative power of the Extended fingerprint more closely, we also considered a reduced representation built from all fingerprints in Table 3 except KRFP, including the hashed ones. The results (Table 7) showed that adding bits from the hashed fingerprints significantly improved the statistics and gave almost ideal separation of actives from inactives (MCC = 0.998). Analogous experiments were also conducted for four other families of biological targets: carbonic anhydrases, cathepsins, histamine receptors and kinases (see S1 File).

Conclusion

This paper introduced the AIC-Max algorithm as a method for fingerprint reduction and hybridization. The algorithm iteratively picks mutually non-redundant features to maximize AIC, a modified version of mutual information. In the present study, the algorithm was applied to construct an essential representation of the ligands of five closely related targets. Such a representation can compete with raw fingerprints in classification experiments while significantly reducing CPU time. The obtained results confirm that existing fingerprints contain much irrelevant information that may negatively influence screening performance. The conducted experiments indicate that generating and applying reduced, hybridized fingerprints allows rapid and effective calculations. The power of the methodology is underlined by the presence, in the universal representation, of bits that encode the most important structural features of serotonin receptor ligands: a polarizable nitrogen atom and an aromatic system.

The additional file (S1 File), which can be retrieved from http://www.ii.uj.edu.pl/~smieja/aic, contains the full list of the 100 most informative bits selected from four non-hashed fingerprints for the five GPCRs (Table A in S1 File). It also contains the results of experiments conducted for the families of carbonic anhydrases (Tables B, F, J and K in S1 File), cathepsins (Tables C, G, L and M in S1 File), histamine receptors (Tables D, H, N and O in S1 File) and kinases (Tables E, I, Q and P in S1 File).


1.  Quinoline- and isoquinoline-sulfonamide derivatives of LCAP as potent CNS multi-receptor-5-HT1A/5-HT2A/5-HT7 and D2/D3/D4-agents: the synthesis and pharmacological evaluation.

Authors:  Paweł Zajdel; Krzysztof Marciniec; Andrzej Maślankiewicz; Grzegorz Satała; Beata Duszyńska; Andrzej J Bojarski; Anna Partyka; Magdalena Jastrzębska-Więsek; Dagmara Wróbel; Anna Wesołowska; Maciej Pawłowski
Journal:  Bioorg Med Chem       Date:  2012-01-04       Impact factor: 3.641

2.  Novel 2D fingerprints for ligand-based virtual screening.

Authors:  Todd Ewing; J Christian Baber; Miklos Feher
Journal:  J Chem Inf Model       Date:  2006 Nov-Dec       Impact factor: 4.956

3.  Bit silencing in fingerprints enables the derivation of compound class-directed similarity metrics.

Authors:  Yuan Wang; Jürgen Bajorath
Journal:  J Chem Inf Model       Date:  2008-08-13       Impact factor: 4.956

4.  Combinatorial chemistry on solid support in the search for central nervous system agents.

Authors:  Paweł Zajdel; Maciej Pawłowski; Jean Martinez; Gilles Subra
Journal:  Comb Chem High Throughput Screen       Date:  2009-08-01       Impact factor: 1.339

5.  PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints.

Authors:  Chun Wei Yap
Journal:  J Comput Chem       Date:  2010-12-17       Impact factor: 3.376

6.  The development and validation of a novel virtual screening cascade protocol to identify potential serotonin 5-HT(7)R antagonists.

Authors:  Rafał Kurczab; Mateusz Nowak; Zdzisław Chilmonczyk; Ingebrigt Sylte; Andrzej J Bojarski
Journal:  Bioorg Med Chem Lett       Date:  2010-03-06       Impact factor: 2.823

7.  Multi-Step Protocol for Automatic Evaluation of Docking Results Based on Machine Learning Methods--A Case Study of Serotonin Receptors 5-HT(6) and 5-HT(7).

Authors:  Sabina Smusz; Stefan Mordalski; Jagna Witek; Krzysztof Rataj; Rafał Kafel; Andrzej J Bojarski
Journal:  J Chem Inf Model       Date:  2015-04-08       Impact factor: 4.956

8.  New arylpiperazinylalkyl derivatives of 8-alkoxy-purine-2,6-dione and dihydro[1,3]oxazolo[2,3-f]purinedione targeting the serotonin 5-HT1A /5-HT2A /5-HT7 and dopamine D2 receptors.

Authors:  Grażyna Chłoń-Rzepa; Agnieszka Zagórska; Adam Bucki; Marcin Kołaczkowski; Maciej Pawłowski; Grzegorz Satała; Andrzej J Bojarski; Anna Partyka; Anna Wesołowska; Elżbieta Pękala; Karolina Słoczyńska
Journal:  Arch Pharm (Weinheim)       Date:  2015-03-13       Impact factor: 3.751

9.  Chemical substructures that enrich for biological activity.

Authors:  Justin Klekota; Frederick P Roth
Journal:  Bioinformatics       Date:  2008-09-10       Impact factor: 6.937

10.  The influence of negative training set size on machine learning-based virtual screening.

Authors:  Rafał Kurczab; Sabina Smusz; Andrzej J Bojarski
Journal:  J Cheminform       Date:  2014-06-11       Impact factor: 5.514
