Mariusz Butkiewicz, Yanli Wang, Stephen H Bryant, Edward W Lowe, David C Weaver, Jens Meiler.
Abstract
The availability of high-throughput screening (HTS) data in the public domain offers great potential to foster the development of ligand-based computer-aided drug discovery (LB-CADD) methods, which are crucial for drug discovery efforts in academia and industry. LB-CADD method development depends on high-quality HTS assay data, i.e., datasets that contain both active and inactive compounds. The active compounds are hits from primary screens that have been tested in concentration-response experiments and whose target specificity has been validated through suitable secondary screening experiments. Publicly available HTS repositories such as PubChem often provide such data in a convoluted way: compounds classified as inactive need to be extracted from the primary screening record. However, compounds classified as active in the primary screening record are not suitable as a set of active compounds for LB-CADD experiments because of the high false-positive rate. A suitable set of actives can be derived by carefully analysing the results of what are often five or more assays used to confirm and classify the activity of compounds. These assays, in part, build on each other. However, often not all hit compounds from the previous screen have been tested. Sometimes a compound is labelled 'active' in an assay even though it is effectively 'inactive' on the target of interest, because the assay measured activity on a different target protein. Here, a curation process for hierarchically related confirmatory screens is illustrated using two specifically chosen protein use-cases. The subsequent re-upload procedure into PubChem is described for the findings of those two scenarios. Further, we provide nine publicly accessible, high-quality datasets for future LB-CADD method development that give the scientific community a common baseline for comparing future methods. We also provide a protocol researchers can follow to upload additional datasets for benchmarking.
Keywords: Datasets; HTS; LB-CADD; PubChem
Year: 2017 PMID: 29795804 PMCID: PMC5962024
Source DB: PubMed Journal: Chem Inform ISSN: 2470-6973
Listing of datasets containing curated compounds uploaded to PubChem.
| Protein Target | Target Class | Internal ID | Number of Actives | PubChem AID |
|---|---|---|---|---|
| Orexin1 Receptor | GPCR | SAID_435008 | 234* | 743306 |
| M1 Muscarinic Receptor agonists | GPCR | SAID_1798 | 188 | 652178 |
| M1 Muscarinic Receptor antagonists | GPCR | SAID_435034 | 447* | 1053187 |
| Potassium Ion Channel Kir2.1 | Ion Channel | SAID_1843 | 172 | 743120 |
| KCNQ2 potassium channel | Ion Channel | SAID_2258 | 287* | 1159610 |
| Cav3 T-type Calcium Channels | Ion Channel | SAID_463087 | 703 | 1053190 |
| Choline Transporter | Transporter | SAID_488997 | 256* | 1053196 |
| Serine/Threonine Kinase 33 | Kinase Inhibitor | SAID_2689 | 172 | 743321 |
| Tyrosyl-DNA Phosphodiesterase | Enzyme | SAID_485290 | 292 | 489007 |
| NPY-Y1 Receptor | GPCR | SAID_1040 | 801 | 1159609 |
| NPY-Y2 Receptor | GPCR | SAID_793 | 699 | 1159608 |
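The PubChem AIDs in the table can be turned into download URLs for the compounds recorded under each assay. The following is a minimal sketch, assuming the PubChem PUG REST URL pattern `assay/aid/<AID>/cids/JSON` with a `cids_type` query parameter; the helper name `compound_ids_url` and the dictionary `CURATED_AIDS` are hypothetical, and the URL pattern should be checked against the current PUG REST documentation before use. The code only builds URLs and makes no network request.

```python
from urllib.parse import urlencode

# Base address of PubChem's PUG REST service (assumption: unchanged since writing).
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# A few AIDs copied from the table above; the remaining rows follow the same pattern.
CURATED_AIDS = {
    "Orexin1 Receptor": 743306,
    "M1 Muscarinic Receptor agonists": 652178,
    "NPY-Y1 Receptor": 1159609,
    "NPY-Y2 Receptor": 1159608,
}


def compound_ids_url(aid: int, outcome: str = "active") -> str:
    """Build a URL listing compound IDs (CIDs) with the given outcome for one assay.

    `outcome` is restricted here to "active" or "inactive"; this restriction is a
    choice made for the sketch, not a statement about everything the service accepts.
    """
    if outcome not in ("active", "inactive"):
        raise ValueError(f"unsupported outcome: {outcome!r}")
    query = urlencode({"cids_type": outcome})
    return f"{PUG_REST}/assay/aid/{aid}/cids/JSON?{query}"


if __name__ == "__main__":
    for target, aid in CURATED_AIDS.items():
        print(target, compound_ids_url(aid))
```

Keeping URL construction separate from any download step makes it easy to inspect the requests for all listed assays before fetching anything.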
Figure 1. Curation process of AID1040. The center green arrow represents the initial set of active compounds while red arrows symbolize a specific subtraction of compounds.
Figure 2. Curation process of AID793. The center green arrow leads to the final set of active compounds while red arrows and numbers and type of compounds in red mark compound subtractions.
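The curation logic described in the abstract and the two figures (start from primary-screen hits, keep only those confirmed in follow-up assays, and subtract compounds that are active on a different target) can be sketched with plain set operations. This is an illustrative sketch, not the authors' protocol: the function names, the three-stage layout, and the toy compound IDs are all assumptions, and a real curation may involve five or more hierarchically related assays.

```python
def curate_inactives(primary_tested, primary_hits):
    """Inactives come from the primary screen: everything tested that was not a hit."""
    return set(primary_tested) - set(primary_hits)


def curate_actives(primary_hits, confirmed, off_target_actives):
    """Actives are primary hits that were confirmed and are not active off-target.

    Primary hits that were never retested appear in neither `confirmed` nor the
    result, so they end up in neither the active nor the inactive set (an
    assumption of this sketch).
    """
    actives = set(primary_hits) & set(confirmed)   # keep confirmed hits only
    return actives - set(off_target_actives)       # the "red arrow" subtraction


if __name__ == "__main__":
    # Toy compound IDs, purely illustrative.
    primary_tested = {1, 2, 3, 4, 5, 6, 7, 8}
    primary_hits = {1, 2, 3, 4}
    confirmed = {1, 2, 3}        # hit 4 was not confirmed or not retested
    off_target_actives = {3}     # "active" in a counter screen, i.e. on another target

    print(sorted(curate_actives(primary_hits, confirmed, off_target_actives)))
    print(sorted(curate_inactives(primary_tested, primary_hits)))
```

In this toy run, compound 4 lands in neither set, which mirrors the point that unconfirmed primary hits are unsuitable as actives because of the high false-positive rate.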