Ye Hu, Jurgen Bajorath.
Abstract
We have generated a number of compound data sets and programs for different types of applications in pharmaceutical research. These data sets and programs were originally designed for our research projects and are made publicly available. Without consulting original literature sources, it is difficult to understand specific features of data sets and software tools, basic ideas underlying their design, and applicability domains. Currently, 30 different entries are available for download from our website. In this data article, we provide an overview of the data and tools we make available and designate the areas of research for which they should be useful. For selected data sets and methods/programs, detailed descriptions are given. This article should help interested readers to select data and tools for specific computational investigations.
Year: 2012 PMID: 24358818 PMCID: PMC3782340 DOI: 10.12688/f1000research.1-11.v1
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Publicly available data sets and programs.
A list of 30 entries providing data sets and/or methods/programs is shown. For each entry, research area indices are assigned as described in the text, i.e. area ‘A’ indicates virtual screening (similarity searching), fingerprint engineering and machine learning; area ‘B’ represents molecular selectivity analysis, area ‘C’ SAR visualization, and area ‘D’ structure-activity or -selectivity relationship-oriented data mining. In addition, publication information is given. For compound data sets, short descriptions are provided. Selected compound data sets are highlighted in red and discussed in the text.
| Entry | Year | Area Index | Provided | Data set description |
|---|---|---|---|---|
| 1 | 2007 | A | Data sets | Nine activity classes (ACs) with increasing structural diversity |
| 2 | 2007 | A | Data sets | A list of ~1.44 million ZINC compounds used for various virtual screening trials |
| 3 | 2007 | A | Methods | – |
| 4 | 2007 | B | Data sets | Four SD files including 26 selectivity sets where compounds are annotated with selectivity values for different targets |
| 5 | 2008 | A; B | Data sets | Seven compound selectivity sets containing 267 biogenic amine GPCR antagonists |
| 6 | 2008 | A; B | Data sets | 18 selectivity sets involving targets from four protein families |
| 7 | 2008 | A | Data sets | 25 data sets with compounds of increasing complexity and size |
| 8 | 2009 | A | Data sets | A set of 242 compounds with hERG inhibition data |
| 9 | 2009 | A; B | Data sets | A set of 243 ionotropic glutamate ion channel antagonists |
| 10 | 2009 | C | Data sets; Methods | A sample data set consisting of 51 thrombin inhibitors |
| 11 | 2009 | A | Data sets | 20 ACs assembled from the literature and 15 ACs collected from MDDR |
| 12 | 2010 | A | Data sets | Eight ACs |
| 13 | 2010 | B; D | Methods | – |
| 14 | 2010 | C | Data sets; Methods | A sample data set containing 33 kinase inhibitors |
| 15 | 2010 | C | Methods | – |
| 16 | 2010 | C | Data sets; Methods | A sample data set containing 248 cathepsin S inhibitors |
| 17 | 2010 | D | Data sets | Two sets of MMPs identified from BindingDB and ChEMBL, respectively |
| 18 | 2010 | C | Data sets; Methods | A sample data set consisting of 874 factor Xa inhibitors |
| 19 | 2010 | A | Data sets | 17 target-directed scaffold sets where each set contains a minimum of 10 distinct scaffolds and each scaffold represents five compounds |
| 20 | 2011 | C | Data sets | A list of 10,489 GSK malaria screening hits |
| 21 | 2011 | D | Data sets | A total of 458 target sets with scaffolds and scaffold hierarchies |
| 22 | 2011 | C | Data sets | Four data sets containing compounds active against three or four targets |
| 23 | 2011 | C | Data sets | A set of 881 factor Xa inhibitors |
| 24 | 2011 | A | Data sets | 50 prioritized ACs for similarity search benchmarking |
| 25 | 2011 | A | Data sets | 25 data sets from successful prospective ligand-based virtual screening applications |
| 26 | 2011 | D | Data sets | A list of 26 conserved scaffolds in activity profile sequences of length four |
| 27 | 2011 | A | Methods | – |
| 28 | 2011 | D | Data sets | Two data sets with exclusive Ki and IC50 measurements |
| 29 | 2012 | C | Data sets | Four ACs |
| 30 | 2012 | D | Data sets | Five sets of activity cliffs representing different cliff types |
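
Because an entry can carry more than one area index (for example `A; B`), selecting entries for a given research area means splitting that cell before matching. The following Python sketch illustrates one way to do this on a handful of rows copied from the table. The names `AREAS`, `ROWS`, `split_field`, and `entries_for` are hypothetical helpers introduced here for illustration; they are not part of the data sets or programs described in this article.

```python
# Area labels as defined in the table caption.
AREAS = {
    "A": "virtual screening (similarity searching), fingerprint engineering and machine learning",
    "B": "molecular selectivity analysis",
    "C": "SAR visualization",
    "D": "structure-activity or -selectivity relationship-oriented data mining",
}

# A few rows copied from the table: (entry, year, area index, provided).
# This is an illustrative subset, not the full list of 30 entries.
ROWS = [
    (3, 2007, "A", "Methods"),
    (8, 2009, "A", "Data sets"),
    (9, 2009, "A; B", "Data sets"),
    (13, 2010, "B; D", "Methods"),
    (16, 2010, "C", "Data sets; Methods"),
]


def split_field(value):
    """Split a semicolon-separated table cell such as 'A; B' into a set."""
    return {part.strip() for part in value.split(";") if part.strip()}


def entries_for(rows, area, provided=None):
    """Return entry numbers assigned to `area`, optionally restricted
    to rows that provide a given content type ('Data sets' or 'Methods')."""
    if area not in AREAS:
        raise ValueError(f"unknown area index: {area!r}")
    hits = []
    for entry, _year, area_index, content in rows:
        if area not in split_field(area_index):
            continue
        if provided is not None and provided not in split_field(content):
            continue
        hits.append(entry)
    return hits


print(entries_for(ROWS, "B"))             # entries assigned to selectivity analysis
print(entries_for(ROWS, "C", "Methods"))  # visualization entries that include a method
```

Matching on the split set rather than on the raw string keeps a query for one area from missing entries that belong to several areas at once.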
Description of programs and methods.
Eight entries with methods/programs are listed. For each entry, a brief description is provided. Selected entries are highlighted in red and discussed in the text.
| Entry | Topic | Description |
|---|---|---|
| 3 | Histogram filtering method | A molecular similarity-based method for the identification of active compounds |
| 10 | Combinatorial analog graph (CAG) | A methodology that systematically organizes compound analogue series according to substitution sites and identifies combinations of sites that determine SAR discontinuity |
| 13 | Target-selectivity patterns of scaffolds | A data mining analysis to identify target-selective scaffolds and their corresponding target-selectivity patterns |
| 14 | Multi-target CAG | A methodology for the study of multi-target SARs and identification of substitution sites in analogue series |
| 15 | SARANEA | A freely available program to mine structure-activity and selectivity relationship information in compound data sets |
| 16 | 3D activity landscape | A computational approach to derive 3D activity landscapes for compound data sets |
| 18 | Similarity potency tree (SPT) | An intuitive method for visualizing local SARs and prioritizing subsets of compounds of high structural similarity and high SAR information content |
| 27 | Scaffold distance function | A quantitative measure of structural distance between molecular scaffolds |