
Comparative analysis of instance selection algorithms for instance-based classifiers in the context of medical decision support.

Maciej A Mazurowski, Jordan M Malof, Georgia D Tourassi.

Abstract

When constructing a pattern classifier, it is important to make the best use of the instances (a.k.a. cases, examples, patterns or prototypes) available for its development. In this paper we present an extensive comparative analysis of algorithms that, given a pool of previously acquired instances, attempt to select those that will be the most effective for constructing an instance-based classifier in terms of classification performance, time efficiency and storage requirements. We evaluate seven previously proposed instance selection algorithms and compare their performance to simple random selection of instances. We perform the evaluation using a k-nearest neighbor classifier and three classification problems: one with simulated Gaussian data and two based on clinical databases for breast cancer detection and diagnosis, respectively. Finally, we evaluate the impact of the number of instances available for selection on the performance of the selection algorithms and conduct an initial analysis of the selected instances. The experiments show that for all investigated classification problems, it was possible to reduce the size of the original development dataset to less than 3% of its initial size while maintaining or improving the classification performance. Random mutation hill climbing emerges as the superior selection algorithm. Furthermore, we show that some previously proposed algorithms perform worse than random selection. Regarding the impact of the number of instances available for classifier development on the performance of the selection algorithms, we confirm that the selection algorithms are generally more effective as the pool of available instances increases. In conclusion, instance selection is generally beneficial for instance-based classifiers, as it can improve their performance, reduce their storage requirements and improve their response time. However, choosing the right selection algorithm is crucial.
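As a rough illustration of the idea behind random mutation hill climbing for instance selection, the following Python sketch picks a small fixed-size subset of a pool for a k-nearest neighbor classifier. The abstract does not give implementation details, so everything here is an assumption: the use of plain accuracy on a held-out validation set as the objective, the fixed subset size, the single-swap mutation, the synthetic two-class Gaussian data, and all function names (`knn_predict`, `subset_accuracy`, `rmhc_select`, `make_data`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def knn_predict(train_X, train_y, query_X, k=1):
    """Binary (0/1) k-NN prediction using Euclidean distance."""
    dists = np.linalg.norm(query_X[:, None, :] - train_X[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = train_y[nearest]  # shape: (n_query, k)
    # Majority vote; ties (possible for even k) go to class 1.
    return (votes.mean(axis=1) >= 0.5).astype(int)


def subset_accuracy(sel, X, y, val_X, val_y, k=1):
    """Validation accuracy of a k-NN built only from the instances in sel."""
    return float(np.mean(knn_predict(X[sel], y[sel], val_X, k) == val_y))


def rmhc_select(X, y, val_X, val_y, n_select=8, n_iter=100, k=1, seed=0):
    """Assumed RMHC variant: swap one selected instance for an unused one
    and keep the change whenever validation accuracy does not decrease."""
    rng = np.random.default_rng(seed)
    sel = rng.choice(len(X), size=n_select, replace=False)
    initial = best = subset_accuracy(sel, X, y, val_X, val_y, k)
    for _ in range(n_iter):
        cand = sel.copy()
        unused = np.setdiff1d(np.arange(len(X)), sel)
        cand[rng.integers(n_select)] = rng.choice(unused)
        score = subset_accuracy(cand, X, y, val_X, val_y, k)
        if score >= best:
            sel, best = cand, score
    return sel, initial, best


# Synthetic two-class Gaussian data (illustrative only, not the study's data).
data_rng = np.random.default_rng(42)


def make_data(n):
    y = data_rng.integers(0, 2, size=n)
    X = data_rng.normal(loc=y[:, None] * 1.5, scale=1.0, size=(n, 2))
    return X, y


pool_X, pool_y = make_data(300)
val_X, val_y = make_data(200)
selected, initial_acc, final_acc = rmhc_select(pool_X, pool_y, val_X, val_y)
print(f"kept {len(selected)} of {len(pool_X)} instances; "
      f"validation accuracy {initial_acc:.3f} -> {final_acc:.3f}")
```

In this sketch the starting subset is itself a random selection, so the two printed accuracies give a loose sense of the comparison against random selection described above. A fair evaluation would score the final subset on data not used during the search.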

Year: 2010   PMID: 21191152   DOI: 10.1088/0031-9155/56/2/012

Source DB: PubMed   Journal: Phys Med Biol   ISSN: 0031-9155   Impact factor: 3.609


  2 in total

1.  Retrieval boosted computer-aided diagnosis of clustered microcalcifications for breast cancer.

Authors:  Hao Jing; Yongyi Yang; Robert M Nishikawa
Journal:  Med Phys       Date:  2012-02       Impact factor: 4.071

2.  The effect of class imbalance on case selection for case-based classifiers: an empirical study in the context of medical decision support.

Authors:  Jordan M Malof; Maciej A Mazurowski; Georgia D Tourassi
Journal:  Neural Netw       Date:  2011-07-18
