RATIONALE AND OBJECTIVES: To investigate optimization of feature selection for computerized mass detection in digitized mammograms, and to compare the effectiveness of a genetic algorithm (GA) in such optimization with that of an "exhaustive" search of all feature permutations.

MATERIALS AND METHODS: A Bayesian belief network (BBN) was used to classify positive and negative regions for masses depicted in digitized mammograms; 20 features were computed for each of 592 positive and 3,790 negative regions in two databases. Conditional probabilities for the BBN were computed from a "training" database of 288 positive and 2,204 negative regions. Performance was measured as the area under the receiver operating characteristic curve (Az) on the remaining database (304 positive and 1,586 negative regions). The optimal feature set was first found with an "exhaustive" (complete permutation) search. A GA-based search for the optimal set was then applied, and the results of the two approaches were compared.

RESULTS: As the number of features in the classifier increased, the Az value increased until it reached a maximum of 0.876 ± 0.008 with 11 features. The Az value then decreased monotonically as the number of features increased from 11 to 20. Starting from 100 random chromosomes (seeds) in the first generation, the GA identified the same optimal set of features but reduced the total computation time by a factor of 65.

CONCLUSION: A GA-based search might be an efficient and effective approach to selecting an optimal feature set.
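The comparison described above can be illustrated with a minimal sketch. It is not the study's implementation. The fitness function here is a toy stand-in for the ROC area of the BBN on the test database. The per-feature weights, the GA operators (elitism, tournament selection, single-point crossover, bit-flip mutation), and every parameter other than the 100 initial chromosomes are assumptions chosen for illustration. The feature count is also reduced from 20 to 12 so the exhaustive search runs quickly.

```python
import random
from itertools import product

N_FEATURES = 12  # assumption: the study used 20; reduced so exhaustive search is fast

# Hypothetical per-feature contributions: positive values help, negative values hurt.
WEIGHTS = [0.9, -0.2, 0.5, 0.7, -0.4, 0.3, -0.1, 0.6, 0.2, -0.3, 0.4, -0.5]


def fitness(mask):
    """Toy stand-in for the ROC area of a classifier using the selected features."""
    return sum(w for w, bit in zip(WEIGHTS, mask) if bit)


def exhaustive_search():
    """Score every non-empty feature subset and return the best one."""
    masks = (m for m in product((0, 1), repeat=N_FEATURES) if any(m))
    best = max(masks, key=fitness)
    return best, 2 ** N_FEATURES - 1  # best mask, number of subsets scored


def ga_search(pop_size=100, generations=40, mutation_rate=0.02, seed=0):
    """Search feature subsets with a simple GA; each chromosome is a 0/1 mask."""
    rng = random.Random(seed)
    cache = {}  # distinct masks scored so far

    def score(mask):
        if mask not in cache:
            cache[mask] = fitness(mask)
        return cache[mask]

    # First generation: random chromosomes (100 by default, as in the abstract).
    pop = [tuple(rng.randint(0, 1) for _ in range(N_FEATURES))
           for _ in range(pop_size)]

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        next_pop = pop[:2]  # elitism: carry the two best masks over unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: best of three randomly drawn chromosomes.
            p1 = max(rng.sample(pop, 3), key=score)
            p2 = max(rng.sample(pop, 3), key=score)
            # Single-point crossover followed by bit-flip mutation.
            cut = rng.randint(1, N_FEATURES - 1)
            child = p1[:cut] + p2[cut:]
            child = tuple(b ^ 1 if rng.random() < mutation_rate else b
                          for b in child)
            next_pop.append(child)
        pop = next_pop

    best = max(pop, key=score)
    return best, len(cache)  # best mask, number of distinct subsets scored


if __name__ == "__main__":
    best_ex, n_ex = exhaustive_search()
    best_ga, n_ga = ga_search()
    print("exhaustive:", best_ex, fitness(best_ex), "subsets scored:", n_ex)
    print("GA:        ", best_ga, fitness(best_ga), "subsets scored:", n_ga)
```

Running the script prints the best mask and the number of subsets scored by each method. In a real setting each fitness call would retrain and evaluate the classifier, so the number of distinct subsets scored is a rough proxy for computation time.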
Authors: Bin Zheng; Dror Lederman; Jules H Sumkin; Margarita L Zuley; Michelle Z Gruss; Linda S Lovy; David Gur
Journal: Acad Radiol
Date: 2010-12-03
Impact factor: 3.173