
A CROC stronger than ROC: measuring, visualizing and optimizing early retrieval.

S Joshua Swamidass, Chloé-Agathe Azencott, Kenny Daily, Pierre Baldi.

Abstract

MOTIVATION: The performance of classifiers is often assessed using Receiver Operating Characteristic (ROC) curves [or accumulation curves (AC), also called enrichment curves] and the corresponding areas under the curves (AUCs). However, in many fundamental problems ranging from information retrieval to drug discovery, only the very top of the ranked list of predictions is of any interest, and ROCs and AUCs are not very useful. New metrics, visualizations and optimization tools are needed to address this 'early retrieval' problem.
RESULTS: To address the early retrieval problem, we develop the general concentrated ROC (CROC) framework. In this framework, any relevant portion of the ROC (or AC) curve is magnified smoothly by an appropriate continuous transformation of the coordinates with a corresponding magnification factor. Appropriate families of magnification functions confined to the unit square are derived and their properties are analyzed together with the resulting CROC curves. The area under the CROC curve (AUC[CROC]) can be used to assess early retrieval. The general framework is demonstrated on a drug discovery problem and used to discriminate more accurately between the early retrieval performance of five different predictors. From this framework, we propose a novel metric and visualization, the CROC(exp), an exponential transform of the ROC curve, as an alternative to other methods. The CROC(exp) provides a principled, flexible and effective way for measuring and visualizing early retrieval performance with excellent statistical power. Corresponding methods for optimizing early retrieval are also described in the Appendix.
AVAILABILITY: Datasets are publicly available. Python code and command-line utilities implementing CROC curves and metrics are available at http://pypi.python.org/pypi/CROC/
CONTACT: pfbaldi@ics.uci.edu
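
The idea of magnifying the early part of the ROC curve can be illustrated with a small sketch. The abstract does not state the transform itself, so the exponential magnification function used here, f(x) = (1 - exp(-alpha*x)) / (1 - exp(-alpha)), and the value alpha = 7 are assumptions made for illustration. The function names are hypothetical and are not the API of the CROC package; the sketch also assumes binary labels and no tied scores.

```python
import math

def exp_magnify(x, alpha=7.0):
    """Assumed exponential magnification: maps [0, 1] onto [0, 1],
    stretching the region near x = 0 (the top of the ranked list)."""
    return (1.0 - math.exp(-alpha * x)) / (1.0 - math.exp(-alpha))

def roc_points(scores, labels):
    """Return the (false positive rate, true positive rate) points of the
    ROC curve, walking the list from highest to lowest score.
    Assumes labels are 0/1 and that no two scores are tied."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def area(points):
    """Trapezoidal area under a piecewise-linear curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def auc_and_croc_auc(scores, labels, alpha=7.0):
    """Return (AUC of the ROC curve, AUC of the transformed curve).
    Only the x-axis (false positive rate) is transformed."""
    roc = roc_points(scores, labels)
    croc = [(exp_magnify(x, alpha), y) for x, y in roc]
    return area(roc), area(croc)

# Two toy rankings with the same ROC AUC but different early retrieval.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
early_hit = [1, 0, 0, 0, 0, 1]   # one positive ranked first
late_hits = [0, 0, 1, 1, 0, 0]   # both positives in the middle
print(auc_and_croc_auc(scores, early_hit))
print(auc_and_croc_auc(scores, late_hits))
```

Both toy rankings have the same area under the ordinary ROC curve, but the ranking that places a positive at the very top gets the larger area after the transform, which is the behaviour the early retrieval setting calls for.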


Year: 2010; PMID: 20378557; PMCID: PMC2865862; DOI: 10.1093/bioinformatics/btq140

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937


