Maxine Tan, Jiantao Pu, Bin Zheng.
Abstract
In the field of computer-aided mammographic mass detection, many different features and classifiers have been tested. Frequently, the relevant features and the optimal topology of artificial neural network (ANN)-based approaches at the classification stage are unknown and are therefore determined by trial-and-error experiments. In this study, we analyzed a classifier that evolves ANNs using genetic algorithms (GAs) and combines feature selection with the learning task. The classifier, named "Phased Searching with NEAT in a Time-Scaled Framework", was analyzed using a dataset of 800 malignant and 800 normal tissue regions in a 10-fold cross-validation framework. The classification performance, measured by the area under the receiver operating characteristic (ROC) curve (AUC), was 0.856 ± 0.029. The result was also compared with four other well-established classifiers: fixed-topology ANNs, support vector machines (SVMs), linear discriminant analysis (LDA), and bagged decision trees. The results show that Phased Searching outperformed the LDA and bagged decision tree classifiers and was significantly outperformed only by SVM. Furthermore, the Phased Searching method required fewer features and discarded superfluous structure or topology, thus reducing both the feature computation and the training and validation time. Analyses of the network complexities evolved by Phased Searching indicate that it can evolve optimal network topologies through its selection of complexification and simplification parameters. From the results, the study also concluded that three classifiers (SVM, fixed-topology ANN, and Phased Searching with NeuroEvolution of Augmenting Topologies (NEAT) in a Time-Scaled Framework) perform comparably well in our mammographic mass detection scheme.
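The 10-fold cross-validation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the folds are stratified so that each keeps the 1:1 malignant-to-normal ratio (80 + 80 regions per fold for the 800 + 800 dataset), and the function names `stratified_kfold` and `train_test_splits` are hypothetical. The abstract does not say how folds were formed, for example whether regions from the same patient were kept in the same fold.

```python
import random


def stratified_kfold(labels, k=10, seed=0):
    """Split sample indices into k folds, dealing each class out round-robin
    so every fold gets (almost) the same number of samples of each class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def train_test_splits(folds):
    """Yield (train, test) index lists; each fold serves once as the test set."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


labels = [1] * 800 + [0] * 800  # 1 = malignant region, 0 = normal region
folds = stratified_kfold(labels, k=10)
for train_idx, test_idx in train_test_splits(folds):
    pass  # fit a classifier on train_idx, then compute its AUC on test_idx
```

Each of the ten test folds then holds 160 regions, and the ten per-fold AUCs are what a mean ± standard deviation such as 0.856 ± 0.029 summarizes.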
Keywords: Computer-aided detection (CAD); Machine Learning; Mammographic mass detection; NeuroEvolution of Augmenting Topologies (NEAT); Optimal Feature Selection
Year: 2014 PMID: 25392680 PMCID: PMC4216038 DOI: 10.4137/CIN.S13885
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Figure 1. Example of a malignant mass ROI (A) and its corresponding segmentation mask (B).
Figure 2. Example of a false-positive (FP) ROI (A) and its corresponding segmentation mask (B).
Figure 3. A flow diagram of our mass detection scheme.
Summary of computed image features for our mass detection scheme.
| FEATURE GROUP/TYPE | DESCRIPTION |
|---|---|
| Shape | Eccentricity, equivalent diameter, extent, convex area, major axis length, minor axis length, orientation, solidity, shape factor ratio, ratio of major to minor axis length, modified compactness |
| Fat | Size (pixel number), size factor ratio (size/mass area), region number, average distance to the mass center (average distance/mean radial length of mass region) |
| Calcifications | Size (pixel number), size factor ratio (size/mass area), region number |
| Texture (lesion segment only) | 4 gray level co-occurrence matrix based features, 22 average and maximum values of gray level run length based texture features |
| Texture (dilated lesion segments) | 24 average and maximum values of gray level co-occurrence matrix based features, 66 average and maximum values of gray level run length based texture features |
| Spiculation | Features computed on the maxima points and on the whole image of the divergence of the normalized gradient (DNG) and the curl of the normalized gradient (CNG) |
| Contrast | Contrast based features (previously defined in Refs. |
| Isodensity | Isodensity based features (previously defined in Ref. |
| Previously-computed features | 27 intensity, contrast, shape, border segment, and local topology based features previously described in Refs. |
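Several texture features in the table are derived from gray-level co-occurrence matrices (GLCMs). The sketch below shows the general idea for a single pixel offset on a region that has already been quantized to a small number of gray levels. It is illustrative only: the offsets, number of gray levels, and the exact GLCM features used in the study are not listed here, so contrast, angular second moment, and homogeneity are chosen as common examples; restricting pixel pairs to the lesion mask or its dilated segments is omitted; and `glcm` and `glcm_features` are hypothetical names.

```python
def glcm(image, levels, offset=(0, 1), symmetric=True):
    """Normalized gray-level co-occurrence matrix for one pixel offset (d_row, d_col).

    image: 2-D list of integer gray levels already quantized to 0 .. levels-1.
    """
    d_row, d_col = offset
    counts = [[0] * levels for _ in range(levels)]
    n_rows, n_cols = len(image), len(image[0])
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    counts[j][i] += 1  # also count the pair in the opposite direction
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Three common Haralick-style statistics of a normalized GLCM p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "asm": sum(v * v for _, _, v in cells),  # angular second moment
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }
```

For real images, scikit-image provides `skimage.feature.graycomatrix` and `graycoprops`; note that its `"energy"` property is the square root of the angular second moment computed above.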
Figure 4. ROC curves of the five compared classifiers computed over the 10-fold cross-validation experiments: (1) Phased Searching with NEAT in a Time-Scaled Framework using maximization of the AUC as the fitness function, (2) fixed-topology ANNs, (3) SVMs, (4) bagged decision trees, and (5) LDA. The error bars are symmetric and two standard deviation units in length.
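An ROC curve such as those in Figure 4 is traced by sweeping a decision threshold over the classifier's output scores, and the AUC is the area under that curve. A minimal sketch, assuming higher scores mean "more likely malignant" and labels 1/0 for malignant/normal regions; `roc_points` and `trapezoid_auc` are hypothetical helper names, and how the study pooled or averaged curves across folds is not specified here.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by lowering the threshold past each distinct score.

    labels: 1 = malignant (positive), 0 = normal (negative).
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Count every sample tied at this score together, so tied positives and
        # negatives produce one diagonal step (worth half credit in the AUC).
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

The trapezoidal area of this empirical curve equals the Mann-Whitney estimate of the probability that a randomly chosen malignant region scores higher than a randomly chosen normal one, with ties counting one half.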
Average AUC values and the corresponding standard deviations for the five compared classifiers computed using the 10-fold cross-validation experiments.
| METHOD | AUC |
|---|---|
| Phased Searching | 0.856 ± 0.029 |
| ANN | 0.871 ± 0.025 |
| SVM | 0.886 ± 0.026 |
| Decision trees | 0.807 ± 0.015 |
| LDA | 0.841 ± 0.028 |
Student’s t-tests, performed at the 5% significance level, of whether the AUC results of the different classifiers differ significantly from each other. The table gives the P-value for rejecting the null hypothesis of equal AUCs. Because the table is symmetric, the diagonal entries and the duplicate entries below the diagonal are omitted (–).
| METHOD | PHASED SEARCHING | ANN | SVM | DECISION TREES | LDA |
|---|---|---|---|---|---|
| Phased Searching | – | 0.242 | 0.026 | <0.001 | 0.270 |
| ANN | – | – | 0.196 | <0.001 | 0.024 |
| SVM | – | – | – | <0.001 | 0.002 |
| Decision trees | – | – | – | – | 0.004 |
| LDA | – | – | – | – | – |
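A Student's t-test on two classifiers' AUCs can be sketched from the mean, standard deviation, and number of folds of each. This assumes an unpaired, equal-variance two-sample test on the 10 per-fold AUCs; the caption does not say whether the test was paired across folds, so treat the choice as an assumption. The P-value uses the classical finite-series form of the t distribution for integer degrees of freedom, keeping the sketch dependency-free; `t_two_sided_p` and `pooled_t_test` are hypothetical names.

```python
import math


def t_two_sided_p(t, df):
    """Two-sided P-value for Student's t with an integer number of degrees of freedom.

    Computes A = P(|T| < |t|) with the finite-series form of the t distribution; P = 1 - A.
    """
    theta = math.atan(abs(t) / math.sqrt(df))
    s = math.sin(theta)
    c2 = math.cos(theta) ** 2
    if df == 1:
        a = 2 * theta / math.pi
    elif df % 2 == 0:
        # even df: A = sin(theta) * [1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ...]
        term = total = 1.0
        for k in range(1, (df - 2) // 2 + 1):
            term *= (2 * k - 1) / (2 * k) * c2
            total += term
        a = s * total
    else:
        # odd df > 1: A = (2/pi) * (theta + sin(theta) * [cos + (2/3)cos^3 + ...])
        term = total = math.cos(theta)
        for k in range(1, (df - 3) // 2 + 1):
            term *= (2 * k) / (2 * k + 1) * c2
            total += term
        a = 2 / math.pi * (theta + s * total)
    return 1.0 - a


def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Unpaired two-sample Student's t-test (equal variances) from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df, t_two_sided_p(t, df)


# Example: Phased Searching (0.856 ± 0.029) vs. SVM (0.886 ± 0.026), 10 folds each
t, df, p = pooled_t_test(0.856, 0.029, 10, 0.886, 0.026, 10)
```

With SciPy available, `scipy.stats.ttest_ind_from_stats(mean1, std1, nobs1, mean2, std2, nobs2)` performs the same equal-variance test. Because the reported standard deviations are rounded, values computed this way need not reproduce the table exactly.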
Features selected or retained by Phased Searching with NEAT in a Time-Scaled Framework. The 271 proposed features are divided into the nine feature groups or types listed in the far-left column. The middle column gives the number of features in each group, and the far-right column gives the average percentage of each group's features selected by Phased Searching, with standard deviation intervals.
| FEATURE GROUP/TYPE | NUMBER OF FEATURES | AVERAGE PERCENTAGE AND STD. DEV. INTERVALS |
|---|---|---|
| Shape | 11 | 77.3 ± 13.0% |
| Fat | 4 | 80.0 ± 23.0% |
| Calcifications | 3 | 80.0 ± 17.2% |
| Texture (mass segment only) | 26 | 68.1 ± 10.1% |
| Texture (dilated mass segments) | 90 | 75.4 ± 5.0% |
| Spiculation | 20 | 65.5 ± 13.8% |
| Contrast | 60 | 75.8 ± 5.6% |
| Previously-computed morphological features | 27 | 74.1 ± 7.4% |
| Isodensity | 30 | 28.3 ± 4.8% |
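In NEAT-style methods, feature selection can fall out of the evolved topology: an input that has no enabled connection cannot influence the network's output. Assuming Phased Searching counts a feature as "selected" in exactly this way (the table caption does not define it), the per-group percentages could be computed as sketched below. The genome representation as `(source_id, target_id, enabled)` tuples, the numbering of the 271 inputs in table order, and the names `used_inputs` and `group_selection_stats` are all hypothetical.

```python
import statistics


def used_inputs(connections, n_inputs):
    """Input node ids (0 .. n_inputs-1) that have at least one enabled outgoing connection.

    connections: iterable of (source_id, target_id, enabled) tuples.
    Simplification: an input wired only to a dead-end hidden node still counts as used.
    """
    return {src for src, _dst, enabled in connections if enabled and src < n_inputs}


def group_selection_stats(per_fold_used, groups):
    """Mean and sample SD (across folds) of the percentage of each group's features used.

    groups: (name, size) pairs in input-id order, e.g. [("Shape", 11), ("Fat", 4), ...].
    """
    stats = {}
    start = 0
    for name, size in groups:
        ids = range(start, start + size)
        pcts = [100.0 * sum(1 for i in ids if i in used) / size for used in per_fold_used]
        stats[name] = (statistics.mean(pcts), statistics.stdev(pcts))
        start += size
    return stats
```

With the nine groups above (11 + 4 + 3 + 26 + 90 + 20 + 60 + 27 + 30 = 271 inputs) and one best genome per fold, `group_selection_stats` returns one (mean, SD) pair per row of the table.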
Figure 5. Graphs of the fitness, computed as the AUC on the training subsets, of the best network per generation in the run of the best-performing network (selected out of five runs), averaged over the 10 folds of Phased Searching with NEAT in a Time-Scaled Framework with alternating complexification and simplification phases.
Figure 7. Graphs of the average network complexity (average number of connections) per generation in the run of the best-performing network (selected out of five runs), averaged over the 10 folds of Phased Searching with NEAT in a Time-Scaled Framework with alternating complexification and simplification phases.
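The nested averaging described in these captions (the best of five runs in each fold, then a mean over the 10 folds, generation by generation) can be sketched as below. The captions do not state how the "best-performing network" is chosen, so the sketch assumes a single selection score per run, such as a validation AUC; `averaged_best_run_curve` is a hypothetical name.

```python
import statistics


def averaged_best_run_curve(fold_runs):
    """Average, over folds, the per-generation curve of each fold's best run.

    fold_runs[f] is a list of (selection_score, curve) pairs, one per independent
    run in fold f; curve[g] is the value plotted for generation g (for example the
    best network's training AUC or its number of connections).
    """
    best_curves = [max(runs, key=lambda run: run[0])[1] for runs in fold_runs]
    n_gens = len(best_curves[0])
    if any(len(curve) != n_gens for curve in best_curves):
        raise ValueError("every run must report the same number of generations")
    return [statistics.mean(curve[g] for curve in best_curves) for g in range(n_gens)]
```

The same function serves all three figures; only the quantity recorded in each curve changes.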
Average AUC values and standard deviations obtained by varying the numbers of complexification and simplification generations of Phased Searching with NEAT in a Time-Scaled Framework (the complexification and simplification phases were alternated over an 800-generation evolutionary time scale). The AUC results correspond to the best-fitness and network-complexity analyses shown in Figures 5–7.
| ALTERNATING COMPLEXIFICATION/SIMPLIFICATION GENERATIONS | AUC |
|---|---|
| 200 gens. complexify/200 gens. simplify | 0.853 ± 0.020 |
| 100 gens. complexify/100 gens. simplify | 0.855 ± 0.026 |
| 50 gens. complexify/50 gens. simplify | 0.854 ± 0.027 |
| 50 gens. complexify/150 gens. simplify | 0.856 ± 0.029 |
| 20 gens. complexify/180 gens. simplify | 0.853 ± 0.021 |
| 35 gens. complexify/165 gens. simplify | 0.855 ± 0.027 |
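The alternating schedules in the table can be expressed as a simple mapping from generation number to phase. This sketch assumes each cycle starts with the complexification phase and that phase lengths stay fixed over the 800 generations; what each phase does (for example, favoring structure-adding mutations while complexifying and structure-removing ones while simplifying) is described here only in general terms and is not modeled. `phase_schedule` and `SETTINGS` are hypothetical names.

```python
def phase_schedule(total_gens, complexify_gens, simplify_gens):
    """Label each generation as 'complexify' or 'simplify'.

    Assumes each cycle begins with complexify_gens generations of complexification
    followed by simplify_gens generations of simplification, repeated until
    total_gens generations have been run.
    """
    cycle = complexify_gens + simplify_gens
    return ["complexify" if g % cycle < complexify_gens else "simplify" for g in range(total_gens)]


# The six settings in the table, each run over 800 generations
SETTINGS = [(200, 200), (100, 100), (50, 50), (50, 150), (20, 180), (35, 165)]
for c, s in SETTINGS:
    schedule = phase_schedule(800, c, s)
    print(f"{c}/{s}: {schedule.count('complexify')} of 800 generations complexify")
```

Under this assumption, the 50/150 setting, which gave the reported 0.856 ± 0.029, spends 200 of its 800 generations complexifying, compared with 400 for each of the three symmetric settings.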
Figure 6. Graphs of the number of connections of the best network per generation in the run of the best-performing network (selected out of five runs), averaged over the 10 folds of Phased Searching with NEAT in a Time-Scaled Framework with alternating complexification and simplification phases.