A D Forbes. Medical Department, Hewlett-Packard Laboratories, Palo Alto, CA 94303-0867, USA.
Abstract
OBJECTIVE: The objective of this paper is to introduce, explain, and extend methods for comparing the performance of classification algorithms using error tallies obtained on properly sized, populated, and labeled data sets.

METHODS: Two distinct contexts of classification are defined, involving "objects-by-inspection" and "objects-by-segmentation." In the former context, the total number of objects to be classified is unambiguously and self-evidently defined; in the latter, there is troublesome ambiguity. All five of the measures of performance considered here are based on confusion matrices: tables of counts revealing the extent of an algorithm's "confusion" regarding the true classifications. A proper measure of classification-algorithm performance must meet four requirements and should obey six additional constraints.

RESULTS: Four traditional measures of performance are critiqued in terms of the requirements and constraints. Each measure meets the requirements, but fails to obey at least one of the constraints. A nontraditional measure of algorithm performance, the normalized mutual information (NMI), is therefore introduced. Based on the NMI, methods for comparing algorithm performance using confusion matrices are devised.

CONCLUSIONS: The five performance measures lead to similar inferences when comparing a trio of QRS-detection algorithms using a large data set. The modified NMI is preferred, however, because it obeys each of the constraints and is the most conservative measure of performance.
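The abstract describes the NMI as a confusion-matrix-based performance measure. A minimal sketch of one common NMI convention, the mutual information between true and assigned classes divided by the entropy of the true-class distribution, is shown below; the paper's "modified NMI" may differ in its exact normalization, so this is an illustrative assumption, not the author's definition.

```python
import numpy as np

def normalized_mutual_information(cm):
    """NMI of a confusion matrix cm, where cm[i][j] counts objects of
    true class i assigned to class j. Normalized by the entropy of the
    true-class distribution (one common convention; the paper's
    'modified NMI' may differ in detail)."""
    cm = np.asarray(cm, dtype=float)
    p = cm / cm.sum()               # joint distribution p(true, assigned)
    pt = p.sum(axis=1)              # marginal over true classes
    pa = p.sum(axis=0)              # marginal over assigned classes
    nz = p > 0                      # skip zero cells in the MI sum
    mi = np.sum(p[nz] * np.log(p[nz] / (pt[:, None] * pa[None, :])[nz]))
    ht = -np.sum(pt[pt > 0] * np.log(pt[pt > 0]))  # entropy H(true)
    return mi / ht if ht > 0 else 0.0
```

Under this convention a perfect classifier (purely diagonal confusion matrix) scores 1.0 and a classifier whose assignments are independent of the true classes scores 0.0, which is what makes the measure attractively conservative.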