Chuanlei Zhang¹, Ralph L Kodell. ¹Department of Applied Mathematics and Computer Science, Philander Smith College, 900 W. Daisy L. Gatson Bates Dr., Little Rock, AR 72202, USA.
Abstract
OBJECTIVE: Classification algorithms are promising tools for supporting the clinical diagnosis and treatment of disease. However, they usually carry an implicit assumption that all patients are homogeneous with respect to the characteristics of interest, and this assumption is unsatisfactory. The objective here is to exploit population heterogeneity reflected by characteristics that may not be apparent, and thus are not controlled, in order to differentiate levels of classification accuracy between subpopulations and to further the goal of tailoring therapies to individual patients. METHODS AND MATERIALS: A new subpopulation-based confidence approach is developed in the context of a selective voting algorithm defined by an ensemble of convex-hull classifiers. The training samples are divided into three internally homogeneous subpopulations with different levels of predictivity. Two distance measures are used to cluster training samples into subpopulations and to assign test samples to these subpopulations. RESULTS: The confidence levels designated by the new approach are validated on six publicly available datasets. The approach shows a positive correspondence between the predictivity designations derived from the training samples and the classification accuracy of the test samples. Across the six datasets, the difference in accuracy between the highest- and lowest-confidence subpopulations averages 17.8%, with a minimum of 11.3% and a maximum of 24.1%. CONCLUSION: Classification accuracy increases as the designated confidence increases.
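The abstract does not spell out how a test sample is mapped to a subpopulation or how the accuracy gap is tallied. The Python sketch below illustrates one plausible reading under loudly stated assumptions: each training sample already carries one of three predictivity labels ("high", "medium", "low"); a test sample inherits the label of the nearest subpopulation centroid; and Euclidean and Manhattan distance merely stand in for the two distance measures, which the abstract does not name. The convex-hull selective voting ensemble is not implemented here; the per-sample correctness flags passed to `accuracy_by_level` are assumed to come from whatever classifier is being evaluated. All function names are hypothetical.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (L1) distance; an illustrative alternative, not necessarily the paper's."""
    return sum(abs(x - y) for x, y in zip(a, b))


def centroids(train_x, train_level):
    """Mean feature vector of the training samples in each subpopulation.

    Assumption: the predictivity label of every training sample is already known.
    """
    groups = defaultdict(list)
    for x, lvl in zip(train_x, train_level):
        groups[lvl].append(x)
    return {lvl: [sum(col) / len(pts) for col in zip(*pts)]
            for lvl, pts in groups.items()}


def assign_confidence(x, cents, dist=euclidean):
    """Designate a test sample's confidence as the label of the nearest centroid."""
    return min(cents, key=lambda lvl: dist(x, cents[lvl]))


def accuracy_by_level(levels, correct):
    """Fraction of correctly classified test samples within each confidence level."""
    hits, totals = defaultdict(int), defaultdict(int)
    for lvl, ok in zip(levels, correct):
        totals[lvl] += 1
        hits[lvl] += int(ok)
    return {lvl: hits[lvl] / totals[lvl] for lvl in totals}


# Toy 2-D example with made-up numbers, for illustration only.
train_x = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 10], [10, 11]]
train_level = ["high", "high", "medium", "medium", "low", "low"]
cents = centroids(train_x, train_level)
print([assign_confidence(x, cents) for x in [[0, 0.5], [5, 5.5], [9, 10]]])
# ['high', 'medium', 'low']

acc = accuracy_by_level(["high", "high", "low", "low"], [True, True, True, False])
print(acc["high"] - acc["low"])  # 0.5
```

Computing `acc["high"] - acc["low"]` for each dataset mirrors the highest-minus-lowest accuracy gap that the abstract summarizes across its six datasets. Nearest-centroid assignment is used only for brevity; a nearest-neighbor rule over individual training samples would be an equally reasonable reading of "assign test samples to these subpopulations."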