
Adjusting the outputs of a classifier to new a priori probabilities: a simple procedure.

Marco Saerens, Patrice Latinne, Christine Decaestecker.

Abstract

It sometimes happens (for instance in case control studies) that a classifier is trained on a data set that does not reflect the true a priori probabilities of the target classes on real-world data. This may have a negative effect on the classification accuracy obtained on the real-world data set, especially when the classifier's decisions are based on the a posteriori probabilities of class membership. Indeed, in this case, the trained classifier provides estimates of the a posteriori probabilities that are not valid for this real-world data set (they rely on the a priori probabilities of the training set). Applying the classifier as is (without correcting its outputs with respect to these new conditions) on this new data set may thus be suboptimal. In this note, we present a simple iterative procedure for adjusting the outputs of the trained classifier with respect to these new a priori probabilities without having to refit the model, even when these probabilities are not known in advance. As a by-product, estimates of the new a priori probabilities are also obtained. This iterative algorithm is a straightforward instance of the expectation-maximization (EM) algorithm and is shown to maximize the likelihood of the new data. Thereafter, we discuss a statistical test that can be applied to decide if the a priori class probabilities have changed from the training set to the real-world data. The procedure is illustrated on different classification problems involving a multilayer neural network, and comparisons with a standard procedure for a priori probability estimation are provided. Our original method, based on the EM algorithm, is shown to be superior to the standard one for a priori probability estimation. Experimental results also indicate that the classifier with adjusted outputs always performs better than the original one in terms of classification accuracy, when the a priori probability conditions differ from the training set to the real-world data. The gain in classification accuracy can be significant.
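The iterative procedure sketched in the abstract alternates two steps: an E-step that reweights each example's posterior by the ratio of the current prior estimate to the training-set prior (then renormalizes), and an M-step that re-estimates the priors as the mean adjusted posterior per class. A minimal NumPy sketch of that iteration follows; the function name and the convergence settings (`tol`, `max_iter`) are our own choices, not taken from the paper:

```python
import numpy as np

def adjust_priors_em(posteriors, train_priors, tol=1e-8, max_iter=1000):
    """EM re-estimation of class priors from classifier outputs.

    posteriors   : (N, C) a posteriori probabilities produced by a
                   classifier trained under `train_priors`.
    train_priors : (C,) a priori probabilities of the training set.
    Returns (estimated_new_priors, adjusted_posteriors).
    """
    priors = np.asarray(train_priors, dtype=float).copy()
    train_priors = np.asarray(train_priors, dtype=float)
    adjusted = np.asarray(posteriors, dtype=float)
    for _ in range(max_iter):
        # E-step: reweight each posterior by the ratio of the current
        # prior estimate to the training prior, renormalize per example.
        w = posteriors * (priors / train_priors)
        adjusted = w / w.sum(axis=1, keepdims=True)
        # M-step: new priors are the mean adjusted posterior per class.
        new_priors = adjusted.mean(axis=0)
        if np.abs(new_priors - priors).max() < tol:
            priors = new_priors
            break
        priors = new_priors
    return priors, adjusted
```

No refitting of the classifier is needed: only its output posteriors on the new data enter the iteration, and the corrected posteriors fall out of the final E-step as a by-product.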


Year:  2002        PMID: 11747533     DOI: 10.1162/089976602753284446

Source DB:  PubMed          Journal:  Neural Comput        ISSN: 0899-7667            Impact factor:   2.026


Related articles: 11 in total

1.  Automatic identification of bird targets with radar via patterns produced by wing flapping.

Authors:  Serge Zaugg; Gilbert Saporta; Emiel van Loon; Heiko Schmaljohann; Felix Liechti
Journal:  J R Soc Interface       Date:  2008-09-06       Impact factor: 4.118

2.  Automatic Assignment of Non-Leaf MeSH Terms to Biomedical Articles.

Authors:  Ramakanth Kavuluru; Anthony Rios
Journal:  AMIA Annu Symp Proc       Date:  2015-11-05

3.  Plant recognition by AI: Deep neural nets, transformers, and kNN in deep embeddings.

Authors:  Lukáš Picek; Milan Šulc; Yash Patel; Jiří Matas
Journal:  Front Plant Sci       Date:  2022-09-27       Impact factor: 6.627

4.  XLSearch: a Probabilistic Database Search Algorithm for Identifying Cross-Linked Peptides.

Authors:  Chao Ji; Sujun Li; James P Reilly; Predrag Radivojac; Haixu Tang
Journal:  J Proteome Res       Date:  2016-05-06       Impact factor: 4.466

5.  The importance of intrinsic disorder for protein phosphorylation.

Authors:  Lilia M Iakoucheva; Predrag Radivojac; Celeste J Brown; Timothy R O'Connor; Jason G Sikes; Zoran Obradovic; A Keith Dunker
Journal:  Nucleic Acids Res       Date:  2004-02-11       Impact factor: 16.971

6.  Protein function in precision medicine: deep understanding with machine learning. (Review)

Authors:  Burkhard Rost; Predrag Radivojac; Yana Bromberg
Journal:  FEBS Lett       Date:  2016-08-06       Impact factor: 4.124

7.  Osteoporotic hip fracture prediction from risk factors available in administrative claims data - A machine learning approach.

Authors:  Alexander Engels; Katrin C Reber; Ivonne Lindlbauer; Kilian Rapp; Gisela Büchele; Jochen Klenk; Andreas Meid; Clemens Becker; Hans-Helmut König
Journal:  PLoS One       Date:  2020-05-19       Impact factor: 3.240

8.  Machine learning techniques to identify putative genes involved in nitrogen catabolite repression in the yeast Saccharomyces cerevisiae.

Authors:  Kevin Kontos; Patrice Godard; Bruno André; Jacques van Helden; Gianluca Bontempi
Journal:  BMC Proc       Date:  2008-12-17

9.  Factors influencing the statistical power of complex data analysis protocols for molecular signature development from microarray data.

Authors:  Constantin F Aliferis; Alexander Statnikov; Ioannis Tsamardinos; Jonathan S Schildcrout; Bryan E Shepherd; Frank E Harrell
Journal:  PLoS One       Date:  2009-03-17       Impact factor: 3.240

10.  Automatic Fungi Recognition: Deep Learning Meets Mycology.

Authors:  Lukáš Picek; Milan Šulc; Jiří Matas; Jacob Heilmann-Clausen; Thomas S Jeppesen; Emil Lind
Journal:  Sensors (Basel)       Date:  2022-01-14       Impact factor: 3.576

