jCompoundMapper: An open source Java library and command-line tool for chemical fingerprints.

Georg Hinselmann, Lars Rosenbaum, Andreas Jahn, Nikolas Fechner, Andreas Zell.

Abstract

BACKGROUND: The decomposition of a chemical graph is a convenient approach to encode information about the corresponding organic compound. While several commercial toolkits exist to encode molecules as so-called fingerprints, only a few open source implementations are available. The aim of this work is to introduce a library for exactly defined molecular decompositions, with a strong focus on the application of these features in machine learning and data mining. It provides several options such as search depth, distance cut-offs, and atom and pharmacophore typing. Furthermore, it provides the functionality to combine, compare, or export the fingerprints into several formats.

RESULTS: We provide a Java 1.6 library for the decomposition of chemical graphs based on the open source Chemistry Development Kit toolkit. We reimplemented popular fingerprinting algorithms such as depth-first search fingerprints, extended connectivity fingerprints, autocorrelation fingerprints (e.g. CATS2D), radial fingerprints (e.g. Molprint2D), geometrical Molprint, atom pairs, and pharmacophore fingerprints. We also implemented custom fingerprints such as the all-shortest path fingerprint, which includes only the shortest paths from the full set of paths of the depth-first search fingerprint. As an application of jCompoundMapper, we provide a command-line executable binary. We measured the conversion speed and number of features for each encoding and described the composition of the features in detail. The quality of the encodings was tested using the default parametrizations in combination with a support vector machine on the Sutherland QSAR data sets. Additionally, we benchmarked the fingerprint encodings on the large-scale Ames toxicity benchmark using a large-scale linear support vector machine. The results were promising and could often compete with literature results. On the large Ames benchmark, for example, we obtained an AUC ROC performance of 0.87 with a reimplementation of the extended connectivity fingerprint. This result is comparable to the performance achieved by a non-linear support vector machine using state-of-the-art descriptors. On the Sutherland QSAR data sets, the best fingerprint encodings showed a comparable or better performance on 5 of the 8 benchmarks when compared against the results of the best descriptors published in the paper of Sutherland et al.

CONCLUSIONS: jCompoundMapper is a library for chemical graph fingerprints with several tweaking possibilities and exporting options for open source data mining toolkits. The quality of the data mining results, the conversion speed, the LGPL software license, the command-line interface, and the exporters should be useful for many applications in cheminformatics, such as benchmarks against literature methods, comparison of data mining algorithms, similarity searching, and similarity-based data mining.
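To make the relationship between the depth-first search fingerprint and the all-shortest path fingerprint concrete, the following is a minimal Python sketch on a toy graph. It is not jCompoundMapper code and does not use its API or the Chemistry Development Kit: the molecule, the function names, the element-only atom labels, the CRC32 hashing, and the 64-bit vector length are all assumptions chosen for illustration. It enumerates all simple paths up to a search depth, keeps those whose length equals the graph distance between their end atoms, and folds the resulting path labels into a bit vector.

```python
import zlib
from collections import deque

# Hypothetical toy molecule: a four-membered ring (atoms 0-3) with an
# oxygen attached to atom 2. Atom labels are plain element symbols.
ATOMS = {0: "C", 1: "C", 2: "C", 3: "N", 4: "O"}
BONDS = [(0, 1), (1, 2), (2, 3), (3, 0), (2, 4)]


def adjacency(atoms, bonds):
    """Build an undirected adjacency list from a bond list."""
    adj = {i: [] for i in atoms}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def dfs_paths(adj, max_depth):
    """Enumerate all simple paths with at most max_depth bonds."""
    found = set()

    def extend(path):
        # Store each path once, independent of traversal direction.
        found.add(min(path, path[::-1]))
        if len(path) - 1 == max_depth:
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:
                extend(path + (nxt,))

    for start in adj:
        extend((start,))
    return found


def bfs_distances(adj, source):
    """Topological distance (in bonds) from source to every reachable atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def shortest_paths_only(paths, adj):
    """Keep the paths whose bond count equals the distance of their ends."""
    dist = {a: bfs_distances(adj, a) for a in adj}
    return {p for p in paths if len(p) - 1 == dist[p[0]][p[-1]]}


def path_label(path, atoms):
    """Direction-independent string label for a path."""
    fwd = "-".join(atoms[i] for i in path)
    rev = "-".join(atoms[i] for i in reversed(path))
    return min(fwd, rev)


def fold(labels, n_bits=64):
    """Hash string features into a fixed-length bit vector (assumed scheme)."""
    bits = [0] * n_bits
    for lab in labels:
        bits[zlib.crc32(lab.encode("utf-8")) % n_bits] = 1
    return bits


ADJ = adjacency(ATOMS, BONDS)
dfs = dfs_paths(ADJ, max_depth=3)
sp = shortest_paths_only(dfs, ADJ)
sp_labels = {path_label(p, ATOMS) for p in sp}
bits = fold(sp_labels)
print(len(dfs), len(sp))
```

On this toy graph the depth-3 search yields 22 paths, of which 18 are shortest paths; the four that drop out are the three-bond paths that wrap around the ring between two directly bonded atoms. The library itself additionally offers configurable atom typing and distance cut-offs, which this sketch leaves out.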

Year:  2011        PMID: 21219648      PMCID: PMC3033338          DOI: 10.1186/1758-2946-3-3

Source DB:  PubMed          Journal:  J Cheminform        ISSN: 1758-2946            Impact factor:   5.514


