Data mining PubChem using a support vector machine with the Signature molecular descriptor: classification of factor XIa inhibitors.

Derick C Weis, Donald P Visco, Jean-Loup Faulon.

Abstract

The amount of high-throughput screening (HTS) data readily available has significantly increased because of the PubChem project (http://pubchem.ncbi.nlm.nih.gov/). There is considerable opportunity for data mining of small molecules for a variety of biological systems using cheminformatic tools and the resources available through PubChem. In this work, we trained a support vector machine (SVM) classifier using the Signature molecular descriptor on factor XIa inhibitor HTS data. The optimal number of Signatures was selected by implementing a feature selection algorithm of highly correlated clusters. Our method included an improvement that allowed clusters to work together for accuracy improvement, where previous methods have scored clusters on an individual basis. The resulting model had a 10-fold cross-validation accuracy of 89%, and additional validation was provided by two independent test sets. We applied the SVM to rapidly predict activity for approximately 12 million compounds also deposited in PubChem. Confidence in these predictions was assessed by considering the number of Signatures within the training set range for a given compound, defined as the overlap metric. To further evaluate compounds identified as active by the SVM, docking studies were performed using AutoDock. A focused database of compounds predicted to be active was obtained with several of the compounds appreciably dissimilar to those used in training the SVM. This focused database is suitable for further study. The data mining technique presented here is not specific to factor XIa inhibitors, and could be applied to other bioassays in PubChem where one is looking to expand the search for small molecules as chemical probes.
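The overlap metric mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the version below is an assumption: each compound is represented as a mapping from Signature string to occurrence count, the "training set range" is taken as the minimum and maximum count of each Signature across the training compounds, and the overlap is reported as the fraction of a query compound's Signatures whose counts fall inside that range. The function names and the Signature strings are hypothetical placeholders, not the authors' code or real Signature syntax.

```python
from typing import Dict, Iterable, Tuple

Ranges = Dict[str, Tuple[int, int]]


def training_ranges(training_set: Iterable[Dict[str, int]]) -> Ranges:
    """Record the (min, max) occurrence count of each Signature over the training set.

    A Signature absent from a training compound counts as 0 for that compound.
    """
    compounds = list(training_set)
    all_signatures = set().union(*compounds)  # iterating a dict yields its keys
    ranges: Ranges = {}
    for sig in all_signatures:
        counts = [compound.get(sig, 0) for compound in compounds]
        ranges[sig] = (min(counts), max(counts))
    return ranges


def overlap(compound: Dict[str, int], ranges: Ranges) -> float:
    """Fraction of the compound's Signatures whose counts lie within the training range.

    Assumed definition: Signatures never seen in training count as outside the range.
    """
    if not compound:
        return 0.0
    inside = sum(
        1
        for sig, count in compound.items()
        if sig in ranges and ranges[sig][0] <= count <= ranges[sig][1]
    )
    return inside / len(compound)


# Hypothetical toy data: keys stand in for Signature strings.
training = [
    {"sig_A": 2, "sig_B": 1},
    {"sig_A": 4, "sig_C": 1},
]
ranges = training_ranges(training)

query = {"sig_A": 3, "sig_B": 1, "sig_D": 1}  # sig_D was never seen in training
score = overlap(query, ranges)
```

Under this assumed definition, a compound with a low score contains many Signatures outside the model's experience, so its predicted activity would deserve less confidence than that of a compound with a score near 1.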

Year:  2008        PMID: 18829357     DOI: 10.1016/j.jmgm.2008.08.004

Source DB:  PubMed          Journal:  J Mol Graph Model        ISSN: 1093-3263            Impact factor:   2.518


  8 in total

1.  A machine learning-based method to improve docking scoring functions and its application to drug repurposing.

Authors:  Sarah L Kinnings; Nina Liu; Peter J Tonge; Richard M Jackson; Lei Xie; Philip E Bourne
Journal:  J Chem Inf Model       Date:  2011-02-03       Impact factor: 4.956

2.  Exploiting PubChem for Virtual Screening.

Authors:  Xiang-Qun Xie
Journal:  Expert Opin Drug Discov       Date:  2010-12       Impact factor: 6.098

3.  REST is a novel prognostic factor and therapeutic target for medulloblastoma.

Authors:  Pete Taylor; Jason Fangusaro; Veena Rajaram; Stewart Goldman; Irene B Helenowski; Tobey MacDonald; Martin Hasselblatt; Lars Riedemann; Alvaro Laureano; Laurence Cooper; Vidya Gopalakrishnan
Journal:  Mol Cancer Ther       Date:  2012-07-30       Impact factor: 6.261

4.  A novel method for mining highly imbalanced high-throughput screening data in PubChem.

Authors:  Qingliang Li; Yanli Wang; Stephen H Bryant
Journal:  Bioinformatics       Date:  2009-10-13       Impact factor: 6.937

5.  Investigating the correlations among the chemical structures, bioactivity profiles and molecular targets of small molecules.

Authors:  Tiejun Cheng; Yanli Wang; Stephen H Bryant
Journal:  Bioinformatics       Date:  2010-10-13       Impact factor: 6.937

6.  QSAR models for predicting cathepsin B inhibition by small molecules--continuous and binary QSAR models to classify cathepsin B inhibition activities of small molecules.

Authors:  Zhigang Zhou; Yanli Wang; Stephen H Bryant
Journal:  J Mol Graph Model       Date:  2010-02-01       Impact factor: 2.518

7.  Profiling animal toxicants by automatically mining public bioassay data: a big data approach for computational toxicology.

Authors:  Jun Zhang; Jui-Hua Hsieh; Hao Zhu
Journal:  PLoS One       Date:  2014-06-20       Impact factor: 3.240

8.  Pharmaceutical Machine Learning: Virtual High-Throughput Screens Identifying Promising and Economical Small Molecule Inhibitors of Complement Factor C1s.

Authors:  Jonathan J Chen; Lyndsey N Schmucker; Donald P Visco
Journal:  Biomolecules       Date:  2018-05-07
