Jeremy J Yang, Oleg Ursu, Christopher A Lipinski, Larry A Sklar, Tudor I Oprea, Cristian G Bologa.
Abstract
BACKGROUND: Bioassay data analysis continues to be an essential, routine, yet challenging task in modern drug discovery and chemical biology research. The challenge is to infer reliable knowledge from big and noisy data. Some aspects of this problem are general, with solutions informed by existing and emerging data science best practices. Other aspects are domain specific and rely on expertise in bioassay methodology and chemical biology. Testing compounds for biological activity requires complex and innovative methodology, producing results that vary widely in accuracy, precision, and information content. Hit selection criteria involve optimizing so that the overall probability of success in a project is maximized and resource-wasteful "false trails" are avoided. This "fail-early" approach is embraced in both pharmaceutical and academic drug discovery, since follow-up capacity is resource-limited. Thus, early identification of likely promiscuous compounds has practical value.
Keywords: Compound promiscuity; Drug discovery informatics; High-throughput screening (HTS); Molecular scaffolds; Statistical learning
Year: 2016 PMID: 27239230 PMCID: PMC4884375 DOI: 10.1186/s13321-016-0137-3
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Badapple pScore ranges
| pScore range | Advisory |
|---|---|
| ~ | Unknown; no data |
| 0–99 | Low pScore; no indication |
| 100–299 | Moderate pScore; weak indication of promiscuity |
| 300+ | High pScore; strong indication of promiscuity |
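The ranges above amount to a simple threshold lookup. A minimal sketch follows; the function name `pscore_advisory` is hypothetical, representing the "~" (no data) row as `None` is an assumption, and so is the treatment of fractional or negative values, which the table does not address.

```python
from typing import Optional


def pscore_advisory(pscore: Optional[float]) -> str:
    """Map a Badapple pScore to the advisory text from the ranges table.

    Assumptions (not stated in the source): None stands for the "~" row,
    negative values are rejected, and the cut points are exactly 100 and 300.
    """
    if pscore is None:
        return "Unknown; no data"
    if pscore < 0:
        raise ValueError("pScore is expected to be non-negative")
    if pscore < 100:
        return "Low pScore; no indication"
    if pscore < 300:
        return "Moderate pScore; weak indication of promiscuity"
    return "High pScore; strong indication of promiscuity"
```

For example, `pscore_advisory(99)` falls in the low range, while `pscore_advisory(300)` falls in the high range.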
Fig. 1 Badapple score dependence on assay-active and assay-tested statistics
Fig. 2 Top promiscuous scaffolds, ranked by Badapple score (see Additional file 2 for full statistics)
Fig. 3 Promiscuity score distribution
Fig. 4 ROC curves, total bioactivity versus ranked top scaffolds for top 5%
Badapple datasets
| Dataset | #scaffolds | Tested | Nonzero | #assays | Activities | Date | Source |
|---|---|---|---|---|---|---|---|
| Bard1 | 146,024 | 141,642 | 54,136 | 510 | 30M | 2013-01 | BARD |
| Bard2 | 143,098 | 137,668 | 52,328 | 383 | 46M | 2014-06 | BARD |
| Pc1 | 143,098 | 141,533 | 60,200 | 822 | 223M | 2014-06 | PubChem |
| Pc2 | 143,098 | 125,940 | 50,912 | 527 | 113M | 2010-12 | PubChem |
BARD assay counts are experiment counts. Tested means bioassay data exist for the scaffold. Nonzero means tested with a nonzero score.
Badapple dataset comparison: scaffolds in common (total/non-zero)
| | Bard2 | Pc2 | Pc1 |
|---|---|---|---|
| Bard1 | 141,896 | 141,629 | 141,629 |
| Bard2 | | 142,817 | 142,817 |
| Pc2 | | | 143,087 |
Badapple dataset comparison: PScore correlation, Pearson/Spearman-rank
| | Bard2 | Pc2 | Pc1 |
|---|---|---|---|
| Bard1 | 0.85 | 0.95 | 0.92 |
| Bard2 | | 0.86 | 0.85 |
| Pc2 | | | 0.96 |
Retrospective comparison of high scores, pc2 versus pc1
| Scaffold | pscore_pc2 | pscore_pc1 | pscore_diff | wTested_diff |
|---|---|---|---|---|
| | 436 | 395 | 41 | 7,234,388 |
| | 374 | 343 | 31 | 5,511,793 |
| | 443 | 369 | 74 | 2,644,122 |
| | 432 | 392 | 40 | 1,701,524 |
| | 578 | 468 | 110 | 1,652,980 |
| | 618 | 627 | −9 | 1,019,584 |
| | 375 | 315 | 60 | 899,779 |
| | 463 | 366 | 97 | 654,420 |
| | 365 | 361 | 4 | 524,438 |
| | 461 | 367 | 94 | 496,504 |
| | 358 | 303 | 55 | 422,280 |
| | 805 | 696 | 109 | 376,575 |
| | 403 | 331 | 72 | 358,012 |
| | 459 | 487 | −28 | 319,746 |
| | 841 | 721 | 120 | 318,307 |
Scaffolds ranked by number of new activity data (wells tested) after pc1 and in pc2. The small changes in score confirm initial trends across many new targets and assays.
K-fold cross validation (K = 5): Ntotal = 389,533, Pearson correlation, all test scores
| k | Ntrain | Ntest | Correlation |
|---|---|---|---|
| 1 | 311,540 | 77,993 | 0.895 |
| 2 | 311,463 | 78,070 | 0.891 |
| 3 | 311,652 | 77,881 | 0.898 |
| 4 | 311,565 | 77,968 | 0.901 |
| 5 | 311,454 | 78,079 | 0.903 |
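The table reports one train/test split and one Pearson correlation per fold, but not the details of how the folds were drawn or scored. The sketch below shows one plausible reading, under stated assumptions: records are shuffled into K folds, a per-scaffold score is computed separately on the training and test portions, and the two score sets are correlated over scaffolds present in both. The record layout `(scaffold_id, n_active, n_tested)` and the functions `k_fold_indices`, `toy_scores`, and `cross_validate` are hypothetical, and `toy_scores` is a simple active fraction standing in for the Badapple pScore, not the published formula.

```python
import math
import random
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def k_fold_indices(n_items, k, seed=0):
    """Yield (train, test) index lists; each item is in exactly one test fold."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def toy_scores(records):
    """Hypothetical stand-in score: active fraction per scaffold (not pScore)."""
    active = defaultdict(int)
    tested = defaultdict(int)
    for scaffold_id, n_active, n_tested in records:
        active[scaffold_id] += n_active
        tested[scaffold_id] += n_tested
    return {s: active[s] / tested[s] for s in tested if tested[s] > 0}


def cross_validate(records, k=5, seed=0):
    """Return one (n_train, n_test, correlation) row per fold."""
    rows = []
    for train_idx, test_idx in k_fold_indices(len(records), k, seed):
        train = toy_scores(records[i] for i in train_idx)
        test = toy_scores(records[i] for i in test_idx)
        # Correlate only over scaffolds scored in both partitions.
        shared = sorted(set(train) & set(test))
        r = pearson([train[s] for s in shared], [test[s] for s in shared])
        rows.append((len(train_idx), len(test_idx), r))
    return rows
```

In every row returned by `cross_validate`, the train and test counts sum to `len(records)`, mirroring Ntrain + Ntest = Ntotal = 389,533 in the table. The sketch assumes at least two shared scaffolds with differing scores per fold; otherwise the correlation is undefined.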
Medchem analysis of selected high scoring, promiscuous scaffolds
| Scaffold of well-known toxin toxoflavin present in |
| Scaffold a 6 |
| Scaffold likely made by reaction of orthophenylene diamine with the corresponding furanyl alpha diketone. It could be a false positive if contaminated with the furanyl alpha diketone. It has only weak metal coordinating activity |
| Scaffold the synthesis of the tricyclic scaffold by a malononitrile cyclization with a 2-amino-3-formyl-4-oxo-4 |
| Scaffold is reported to possess strong fluorescence, UV absorbance as well as strong mutagenic activity [ |
Fig. 5 HScaf scaffolds of quinine
Fig. 6 Badapple public web app, available at http://datascience.unm.edu/public-biocomputing-apps
Fig. 7 Database workflow