RATIONALE AND OBJECTIVES: The authors assessed the performance changes of a computer-assisted diagnosis (CAD) scheme as a function of the number of regions used for training (rule-setting). MATERIALS AND METHODS: One hundred twenty regions depicting actual masses and 400 suspicious but actually negative regions were selected as a testing data set from a database of 2,146 regions identified as suspicious on 618 mammograms. An artificial neural network using 24 and 16 region-based features as input neurons was applied to classify the regions as positive or negative for the presence of a mass. CAD scheme performance was evaluated on the testing data set as the number of regions used for training increased from 60 to 496. RESULTS: As the number of regions in the training sets increased, performance measured on the training sets themselves decreased and then plateaued beyond a sample size of approximately 200 regions. Performance with the testing data set continued to improve as the training data set increased in size. CONCLUSION: A trend in a system's performance as a function of training set size can be used to assess adequacy of the training data set in the development of a CAD scheme.
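The evaluation procedure described above (train on progressively larger sets, score each model on its own training set and on a fixed testing set) can be illustrated with a minimal sketch. Everything below other than the set sizes (120 positive and 400 negative testing regions, training sizes from 60 to 496, 16 features) is an assumption made for illustration: the feature vectors are synthetic Gaussian draws, a nearest-centroid linear scorer stands in for the artificial neural network, the training sets are assumed to be class-balanced, and area under the ROC curve is assumed as the performance measure. The function names are hypothetical.

```python
import random


def make_regions(count, label, n_features, rng):
    """Synthetic stand-in for region feature vectors (not real mammography data).

    Positive regions are shifted slightly so the two classes overlap but are separable.
    """
    shift = 0.3 if label == 1 else 0.0
    return [([rng.gauss(shift, 1.0) for _ in range(n_features)], label)
            for _ in range(count)]


def fit_centroid_scorer(train):
    """Nearest-centroid linear scorer; an assumed stand-in for the neural network."""
    n = len(train[0][0])
    sums = {0: [0.0] * n, 1: [0.0] * n}
    counts = {0: 0, 1: 0}
    for x, y in train:
        counts[y] += 1
        for i, v in enumerate(x):
            sums[y][i] += v
    # Weight vector = difference between the positive and negative class means.
    w = [sums[1][i] / counts[1] - sums[0][i] / counts[0] for i in range(n)]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))


def auc(scorer, data):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [scorer(x) for x, y in data if y == 1]
    neg = [scorer(x) for x, y in data if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def learning_curve(sizes, n_features=16, seed=0):
    """Return (training size, training AUC, testing AUC) for each training size."""
    rng = random.Random(seed)
    # Fixed testing set with the class counts given in the text.
    test = make_regions(120, 1, n_features, rng) + make_regions(400, 0, n_features, rng)
    # Hypothetical training pool; balanced classes are an assumption.
    pool_pos = make_regions(248, 1, n_features, rng)
    pool_neg = make_regions(248, 0, n_features, rng)
    rows = []
    for n in sizes:
        train = pool_pos[: n // 2] + pool_neg[: n // 2]
        scorer = fit_centroid_scorer(train)
        rows.append((n, auc(scorer, train), auc(scorer, test)))
    return rows


if __name__ == "__main__":
    for n, train_auc, test_auc in learning_curve([60, 120, 200, 300, 400, 496]):
        print(f"{n:4d} regions  training AUC={train_auc:.3f}  testing AUC={test_auc:.3f}")
```

Plotting the two columns against training size gives the kind of trend the conclusion refers to: a training set may be judged adequate once the testing curve stops improving appreciably. The numbers this sketch prints reflect only the synthetic data and say nothing about the study's actual results.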