
Exploring classification strategies with the CoEPrA 2006 contest.

Ozgur Demir-Kavuk, Henning Riedesel, Ernst-Walter Knapp.

Abstract

MOTIVATION: In silico methods that classify compounds as potential drugs binding to a specific target are becoming increasingly important for drug design. Building such classifiers requires training sets of drugs with known activities. For many such classification problems, not only qualitative but also quantitative information on a specific property (e.g. binding affinity) is available. The latter can be used to build a regression scheme that predicts this property for new compounds. Predicting a compound property explicitly is generally more difficult than deciding whether the property lies below or above a given threshold value. Hence, an indirect classification based on regression may lead to poorer results than a direct classification scheme. In fact, researchers are initially interested only in classifying compounds as potential drugs; the activities of these compounds are subsequently measured in the wet lab.
RESULTS: We propose a novel approach that uses the available quantitative information directly for classification rather than first building a regression scheme. It uses a new type of loss function called weighted biased regression. Application of this method to four widely studied datasets of the CoEPrA contest (Comparative Evaluation of Prediction Algorithms, http://coepra.org) shows that it can outperform simple classification methods that do not make use of this additional quantitative information.
AVAILABILITY: A standalone application that builds a model for a submitted peptide training set is available at http://agknapp.chemie.fu-berlin.de/agknapp/index.php?menu=software&page=PeptideClassifier.
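The two baseline routes contrasted in the abstract, indirect classification (regress on the quantitative property, then apply the threshold) and direct classification (learn from the thresholded labels only), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not define the weighted biased regression loss, so it is not reproduced here. Closed-form ridge regression stands in for both the regressor and the classifier, and all function names, the toy data and the parameter `lam` are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression with an unpenalized intercept (assumed stand-in model)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    penalty = lam * np.eye(Xb.shape[1])
    penalty[-1, -1] = 0.0  # do not shrink the intercept
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def ridge_predict(w, X):
    """Apply a weight vector from ridge_fit to new feature rows."""
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w


def indirect_classify(X_train, y_train, X_test, threshold, lam=1.0):
    """Indirect route: regress on the quantitative property, then threshold the prediction."""
    w = ridge_fit(X_train, y_train, lam)
    return ridge_predict(w, X_test) >= threshold


def direct_classify(X_train, y_train, X_test, threshold, lam=1.0):
    """Direct route: discard the quantitative values and fit to +1/-1 class labels only."""
    labels = np.where(y_train >= threshold, 1.0, -1.0)
    w = ridge_fit(X_train, labels, lam)
    return ridge_predict(w, X_test) >= 0.0


def accuracy(pred, y_true, threshold):
    """Fraction of predictions that agree with the thresholded true property."""
    return float(np.mean(pred == (y_true >= threshold)))


# Toy usage with made-up numbers: one feature, property equal to the feature value.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 1.0, 2.0, 3.0])
X_test = np.array([[1.0], [2.0]])
y_test = np.array([1.0, 2.0])

for name, route in [("indirect", indirect_classify), ("direct", direct_classify)]:
    pred = route(X_train, y_train, X_test, threshold=1.5, lam=0.0)
    print(name, pred.tolist(), accuracy(pred, y_test, threshold=1.5))
```

The direct route ignores how far each training compound lies from the threshold, while the indirect route spends model capacity on reproducing the exact values. The approach described in the abstract aims to use the quantitative information for classification without that intermediate regression step.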

Year: 2010   PMID: 20097914   PMCID: PMC2828124   DOI: 10.1093/bioinformatics/btq021

Source DB: PubMed   Journal: Bioinformatics   ISSN: 1367-4803   Impact factor: 6.937


