Data mining a small molecule drug screening representative subset from NIH PubChem.

Xiang-Qun Xie, Jian-Zhong Chen.

Abstract

PubChem is a scientific showcase of the NIH Roadmap Initiatives. It is a compound repository created to facilitate information exchange and data sharing among the NIH Roadmap-funded Molecular Library Screening Center Network (MLSCN) and the scientific community. However, PubChem holds more than 10 million compound records, which makes drug screening of the whole database challenging. The purpose of the present study was therefore to develop a data mining cheminformatics approach for constructing a representative, structurally diverse sublibrary from the large PubChem database. In this study, a new chemically diverse representative subset, rePubChem, was selected by whole-molecule chemistry-space matrix calculation using the cell-based partition algorithm. The subset was then evaluated by compound property analyses based on 1D and 2D molecular descriptors, and its self-similarity was assessed with 2D molecular fingerprints in comparison with the source compound library. The new subset is much smaller (540K compounds) and has minimal similarity and redundancy, yet it retains the structural diversity and basic molecular properties of its parent library (5.3 million compounds). The representative subset could be a valuable structurally diverse compound resource for in silico virtual screening and in vitro HTS drug screening. In addition, the subset generation method, which combines cell-based chemistry-space partition metrics with pairwise 2D fingerprint-based similarity searching, should also be important to the broad scientific community interested in acquiring structurally diverse compounds for efficient drug screening, building representative virtual combinatorial chemistry libraries for synthesis, and data mining large compound databases such as PubChem.
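The two ideas named in the abstract, cell-based partitioning of a chemistry space and fingerprint-based self-similarity, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The two-dimensional descriptor vectors, the bin count, the rule of keeping the compound nearest each cell centre, the use of the Tanimoto coefficient, and all function names and toy data are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def cell_index(point, lows, highs, bins):
    """Map a descriptor vector to the tuple of bin indices of its cell."""
    idx = []
    for x, lo, hi in zip(point, lows, highs):
        span = (hi - lo) or 1.0               # guard against a constant axis
        i = int((x - lo) / span * bins)
        idx.append(min(max(i, 0), bins - 1))  # clamp the upper edge into the last bin
    return tuple(idx)


def cell_based_subset(points, bins):
    """Keep one compound per occupied cell (assumed rule: nearest to the cell centre)."""
    dims = len(next(iter(points.values())))
    lows = [min(p[d] for p in points.values()) for d in range(dims)]
    highs = [max(p[d] for p in points.values()) for d in range(dims)]

    # Group compound names by the cell their descriptor vector falls into.
    cells = defaultdict(list)
    for name, p in points.items():
        cells[cell_index(p, lows, highs, bins)].append(name)

    def dist_to_centre(name, cell):
        total = 0.0
        for x, lo, hi, i in zip(points[name], lows, highs, cell):
            width = ((hi - lo) or 1.0) / bins
            centre = lo + (i + 0.5) * width
            total += (x - centre) ** 2
        return total

    return sorted(
        min(members, key=lambda n: dist_to_centre(n, cell))
        for cell, members in cells.items()
    )


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_pairwise_similarity(names, fingerprints):
    """Highest Tanimoto value over all pairs, a crude redundancy indicator."""
    return max(
        (tanimoto(fingerprints[x], fingerprints[y]) for x, y in combinations(names, 2)),
        default=0.0,
    )


# Hypothetical toy library: 2D descriptor coordinates and fingerprint on-bits.
points = {"A": (0.1, 0.1), "B": (0.15, 0.2), "C": (0.9, 0.9),
          "D": (0.85, 0.8), "E": (0.1, 0.9)}
fingerprints = {"A": {1, 2, 3}, "B": {1, 2, 3, 4}, "C": {7, 8},
                "D": {7, 8, 9}, "E": {1, 9}}

subset = cell_based_subset(points, bins=2)
full_max = max_pairwise_similarity(list(points), fingerprints)
subset_max = max_pairwise_similarity(subset, fingerprints)
print(subset, full_max, subset_max)
```

In this toy case the near-duplicate pairs fall into the same cell, so only one member of each pair survives and the highest pairwise similarity in the subset is lower than in the full library. A real workflow would compute descriptors and fingerprints with a cheminformatics toolkit and use far more dimensions and bins.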

Year:  2008        PMID: 18302356     DOI: 10.1021/ci700193u

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956

