Yanli Wang, Tiejun Cheng, Stephen H Bryant.
Abstract
High-throughput screening (HTS) is now routinely conducted for drug discovery by both pharmaceutical companies and screening centers at academic institutions and universities. Rapid advances in assay development, robotic automation, and computer technology have led to the generation of terabytes of data in screening laboratories. Despite this technological progress in HTS productivity, less effort has been devoted to HTS data integration and sharing. As a result, the large volume of HTS data was rarely made available to the public. To fill this gap, the PubChem BioAssay database (https://www.ncbi.nlm.nih.gov/pcassay/) was set up in 2004 to provide open access to screening results for chemicals and RNAi reagents. After more than 10 years of development and contributions from the community, PubChem has become the largest public repository for chemical structures and biological data, providing an information platform that supports researchers worldwide in drug development, medicinal chemistry, and chemical biology. This work reviews the HTS data content in the PubChem BioAssay database and the progress of data deposition, with the aim of stimulating knowledge discovery and data sharing. It also describes the database's data standard and the basic utilities that facilitate information access and use for new users.
Keywords: PubChem BioAssay; data sharing; high-throughput screening; open access
Year: 2017 PMID: 28346087 PMCID: PMC5480605 DOI: 10.1177/2472555216685069
Source DB: PubMed Journal: SLAS Discov ISSN: 2472-5552 Impact factor: 3.341
PubChem BioAssay Statistics (as of October 10, 2016).
| Description | Small-Molecule Assays | RNAi Assays |
|---|---|---|
| Assay records (AIDs) | 1,218,601 | 91 |
| Substance samples (SIDs) | 3,224,025 | 352,044 |
| Chemical structures (CIDs) | 2,283,536 | — |
| Bioactivity outcomes | 230,270,094 | 1,033,519 |
| Data points | 1,499,625,480 | 14,598,030 |
| Species | 3,543 | 7 |
| Protein targets | 10,182 | — |
| Protein targets (human) | 4,784 | — |
| Gene targets | — | 55,714 |
| Gene targets (human) | — | 24,888 |
| Gene targets with phenotype | — | 15,866 |
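
The totals above can be related to one another with simple arithmetic. As a minimal sketch (the dictionary layout and the function name `per_unit_ratios` are illustrative choices, not part of PubChem), the following Python computes the mean number of data points per bioactivity outcome and the mean number of outcomes per assay record, using only figures copied from the table:

```python
# Figures copied from the statistics table (as of October 10, 2016).
# The dictionary keys are illustrative names, not PubChem field names.
SMALL_MOLECULE = {
    "assay_records": 1_218_601,
    "bioactivity_outcomes": 230_270_094,
    "data_points": 1_499_625_480,
}
RNAI = {
    "assay_records": 91,
    "bioactivity_outcomes": 1_033_519,
    "data_points": 14_598_030,
}


def per_unit_ratios(stats: dict) -> dict:
    """Return simple averages derived from one column of the table."""
    return {
        "data_points_per_outcome": stats["data_points"] / stats["bioactivity_outcomes"],
        "outcomes_per_assay": stats["bioactivity_outcomes"] / stats["assay_records"],
    }


sm_ratios = per_unit_ratios(SMALL_MOLECULE)
rnai_ratios = per_unit_ratios(RNAI)

for label, ratios in (("Small-molecule", sm_ratios), ("RNAi", rnai_ratios)):
    print(
        f"{label}: {ratios['data_points_per_outcome']:.2f} data points per outcome, "
        f"{ratios['outcomes_per_assay']:.1f} outcomes per assay"
    )
```

These are plain averages over the whole database; they say nothing about how unevenly outcomes are distributed across individual assays.
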
Summary of MLP’s HTS Assay Projects.
| Screening Center | Summary Assays | Primary Assays | Confirmatory Assays | Compounds Tested | Compounds Active | Chemical Probe | Protein Target Count |
|---|---|---|---|---|---|---|---|
| Broad Institute | 103 | 136 | 950 | 500,665 | 129,547 | 27 | 233 |
| Burnham Center for Chemical Genomics | 102 | 206 | 651 | 419,794 | 143,200 | 36 | 450 |
| Columbia University Molecular Screening Center | 19 | 10 | 197,092 | 9,067 | 9 | ||
| Emory University Molecular Libraries Screening Center | 2 | 22 | 29 | 348,780 | 24,326 | 20 | |
| Johns Hopkins Ion Channel Center | 25 | 103 | 106 | 345,281 | 37,359 | 4 | 23 |
| Molecular Libraries Program, Specialized Chemistry Center, University of Kansas | 2 | 22 | 2,941 | 312 | 10 | ||
| NIH Chemical Genomics Center (NCGC) | 179 | 36 | 976 | 443,829 | 244,064 | 35 | 255 |
| New Mexico Molecular Libraries Screening Center (NMMLSC) | 30 | 167 | 206 | 375,901 | 40,549 | 15 | 69 |
| Penn Center for Molecular Discovery (PCMD) | 26 | 31 | 224,377 | 4,424 | 16 | ||
| Southern Research Specialized Biocontainment Screening Center | 14 | 1 | 272 | 355,238 | 16,350 | 5 | 11 |
| Southern Research Molecular Libraries Screening Center (SRMLSC) | 1 | 47 | 40 | 224,571 | 31,718 | 2 | 11 |
| Scripps Research Institute Molecular Screening Center | 150 | 468 | 703 | 397,994 | 136,876 | 54 | 574 |
| University of Pittsburgh Molecular Library Screening Center | 1 | 32 | 48 | 223,277 | 25,711 | 1 | 16 |
| Vanderbilt Screening Center for GPCRs, Ion Channels and Transporters | 13 | 15 | 73 | 222,812 | 20,078 | 6 | 94 |
| Vanderbilt Specialized Chemistry Center | 10 | 14 | 125 | 1,750 | 683 | 63 | 132 |
Assay counts are AID counts; compound counts are CID counts.
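
As a rough illustration of how the Tested and Active columns relate, the sketch below computes the fraction of tested compounds reported as active for three centers whose rows are complete in the table. The function name `active_fraction` is hypothetical, and the result is an aggregate over all of a center's assays, so it should not be read as the per-assay hit rate plotted in Figure 4.

```python
# (tested, active) compound counts copied from the MLP table above.
MLP_COUNTS = {
    "Broad Institute": (500_665, 129_547),
    "NIH Chemical Genomics Center (NCGC)": (443_829, 244_064),
    "Scripps Research Institute Molecular Screening Center": (397_994, 136_876),
}


def active_fraction(tested: int, active: int) -> float:
    """Fraction of tested compounds that were reported active."""
    if tested <= 0:
        raise ValueError("tested must be a positive count")
    return active / tested


fractions = {
    center: active_fraction(tested, active)
    for center, (tested, active) in MLP_COUNTS.items()
}

# Print centers from highest to lowest aggregate active fraction.
for center, fraction in sorted(fractions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{center}: {fraction:.1%}")
```
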
Summary of Small-Molecule HTS Screens (Excluding MLP).
| Data Source | Assay Count | Compounds Tested | Compounds Active | Protein Target Count |
|---|---|---|---|---|
| Abbott Labs | 2 | 7,567 | 4,912 | |
| ChemBank | 106 | 5,201 | 1,629 | |
| Chemical genetic matrix | 2 | 13,048 | 1,568 | |
| Cheminformatics & Chemogenomics Research Group (CCRG), Indiana University School of Informatics | 36 | 2,500 | 970 | |
| Chen Lab, School of Medicine, Emory University | 1 | 1,947 | 15 | 1 |
| Circadian Research, Kay Laboratory, University of California at San Diego (UCSD) | 2 | 1,276 | 15 | |
| UCLA Molecular Screening Shared Resource | 1 | 1,385 | 5 | |
| NCI’s Developmental Therapeutics Program (DTP/NCI) | 173 | 176,929 | 25,036 | |
| GlaxoSmithKline (GSK) | 15 | 14,038 | 14,038 | 2 |
| Genomics Institute of the Novartis Research Foundation (GNF)/Scripps Winzeler Lab | 1 | 5,662 | 274 | |
| Gregory J. Crowther | 6 | 13,451 | 227 | 6 |
| ICCB–Longwood/NSRB Screening Facility, Harvard Medical School | 28 | 528,893 | 10,426 | 15 |
| Meiler Lab, Vanderbilt University | 10 | 11,385 | 3,259 | 4 |
| Milwaukee Institute for Drug Discovery | 13 | 17,808 | 1,251 | 1 |
| NCI’s Molecular Targets Development Program (MTDP) | 4 | 99,858 | 861 | 4 |
| NINDS Approved Drug Screening Program | 34 | 1,033 | 190 | |
| NIMH’s Psychoactive Drug Screening Program (PDSP) | 2 | 2,730 | 603 | 2 |
| Southern Research Institute | 10 | 361,147 | 4,871 | 4 |
| Tox21 | 105 | 8,747 | 4,661 | 20 |
| UW Madison, Small Molecule Screening Facility | 1 | 69,794 | 380 | |
| ChEMBL::Novartis Malaria Screening | 6 | 5,614 | 5,014 | |
| ChEMBL::St. Jude Malaria Screening | 16 | 1,523 | ||
Only HTS screens testing more than 1,000 samples are included.
Summary of RNAi HTS Projects.
| Data Source | Assay Count | RNAi Reagents Tested | RNAi Reagents Showing Phenotype | Gene Target Count |
|---|---|---|---|---|
| Cancer Research UK Cambridge Research Institute | 1 | 331 | 331 | 97 |
| Department of Molecular Cell Biology, Weizmann Institute of Science | 1 | 85 | 85 | 20 |
| Drosophila RNAi Screening Center (DRSC) | 37 | 31,356 | 14,276 | 3,894 |
| GE Healthcare Dharmacon RNAi Technologies | 1 | 840 | 840 | 5 |
| Iain Fraser | 14 | 1,512 | 252 | 239 |
| InfectX Consortium | 1 | 115,372 | 18,612 | |
| INSERM, Institut National de la Sante et de la Recherche Medicale | 2 | 22,950 | ||
| Peterson Lab, Genentech | 1 | 158 | 157 | 33 |
| siGENOME Human KINOME Library (BTR reporter screen) | 1 | 714 | 713 | 49 |
| Genomics Institute of the Novartis Research Foundation (GNF) | 1 | 33,364 | 17,453 | 268 |
| Victorian Centre for Functional Genomics, Peter MacCallum Cancer Centre | 12 | 39,160 | 34,619 | 3,690 |
| VTT Technical Research Centre of Finland (CSMA) | 1 | 1,380 | 660 | 422 |
Figure 1. Browsing HTS projects using the PubChem BioAssay Classification Tree. A subtree node can be expanded by clicking the triangle icon. The count of BioAssay records associated with each node is shown and links to the corresponding list of BioAssay records in Entrez.
Figure 2. Summary of compounds in the MLSMR library that are associated with biological data. The x axis gives the count of BioAssay accessions (AIDs); the y axis gives the percentage of the substance samples in MLSMR that were tested in the given number of assays. The x axis counts (a) all tested assays and (b) only active assays.
Figure 3. Growth of the MLP’s HTS data, including BioAssay records, tested substances, unique chemical structures, bioactivity outcomes, data points, protein targets, and species.
Figure 4. Hit rates for MLP centers. The red dot shows the median hit rate for each center. Only primary assays that screened more than 100,000 substance samples were included.
Figure 5. Summary of the MLP assay records (AID count) and chemical probes (probe count) by class of assay target. The numbers of assay records in the two phases of the MLP are indicated by MLSCN and MLPCN; the count of chemical probes is indicated by “Probe.”