
Ensemble Supervised Classification Method Using the Regions of Interest and Grey Level Co-Occurrence Matrices Features for Mammograms Data.

Hossein Yousefi Banaem¹, Alireza Mehri Dehnavi², Makhtum Shahnazi³.

Abstract

BACKGROUND: Breast cancer is one of the most frequently encountered cancers in women. Detecting the cancer and classifying it as malignant or benign is one of the challenging tasks in pathology.
OBJECTIVES: Our aim was to classify mammogram data as normal or abnormal using an ensemble classification method.
PATIENTS AND METHODS: In this method, we first extract texture features from cancerous and normal breasts using the Gray-Level Co-occurrence Matrices (GLCM) method. To obtain better results, we select a region of the breast with a high probability of cancer occurrence before feature extraction. After feature extraction, we use the maximum difference method to select the features that differ most between the normal and abnormal data sets. The six selected features served as the input to the proposed ensemble supervised algorithm. For classification, the data were first classified by three supervised classifiers, and the classification was then finalized by a simple voting policy.
RESULTS: After classification with the ensemble supervised algorithm, the performance of the proposed method was evaluated by the perfect test method, which gave a sensitivity and specificity of 96.66% and 97.50%, respectively.
CONCLUSIONS: In this study, we proposed a new computer-aided diagnostic tool for the detection and classification of breast cancer. The obtained results showed that the proposed method is reliable enough to assist radiologists in the detection of abnormal data and to improve diagnostic accuracy.


Keywords: Breast Cancer; Classification; Mammogram

Year: 2015 PMID: 26557265 PMCID: PMC4632564 DOI: 10.5812/iranjradiol.11656

Source DB: PubMed Journal: Iran J Radiol ISSN: 1735-1065 Impact factor: 0.212

1. Background

Breast cancer is one of the most frequent cancers among women throughout the world. Between 1974 and 1978, one in every 1000 women suffered from breast cancer; nowadays, it occurs in one in every 10 women. This means that effective preventive actions must be taken to reduce the rate of this dangerous cancer (1). The commonly used diagnostic techniques for breast cancer screening are mammography, thermography and ultrasound imaging. Among these techniques, mammography is the gold standard for early detection, since it can reveal breast cancer before it becomes apparent on physical examination. In the early stage, microcalcifications appear in the breast tissue. Microcalcifications are small calcium deposits that appear as groups of radio-opaque spots in most cancerous mammograms. Detection and classification of mammogram abnormalities is a challenging field of breast cancer diagnosis. There are different techniques for breast cancer detection, such as neural network, fuzzy logic and wavelet based algorithms (2, 3). Several features in mammography help physicians to detect abnormalities at an early stage, and these features can be directly extracted by image processing methods (4). The symptoms of a cancerous breast comprise a mass and changes in the shape, color and dimensions of the breast. If the cancer is detected at an earlier stage, a better treatment can be provided. Recently, computer aided diagnosis (CAD) systems have been developed to detect breast cancer automatically. Normal tissues typically have smooth boundaries and surfaces, whereas abnormal tissue presents rough surfaces and jagged boundaries (5). The goal of diagnosis is to distinguish between normal and abnormal images.
For this purpose, several kinds of features can be extracted from the digital image, such as region-based, shape-based, texture-based and position-based features. In digital mammography, the most commonly used feature for classifying normal and abnormal patterns is the texture feature. In this paper, we used the texture-based Gray-Level Co-occurrence Matrices (GLCM) features for this purpose. For breast cancer classification, several features should be selected according to specific criteria. Numerous research methods and algorithms exist for breast cancer detection and classification. Yachoub et al. (6) used a hypothesis test to determine whether a feature can discriminate or not. Verma et al. (7) developed a diagnosis algorithm based on a neural-genetic algorithm feature selection method for digital mammograms and obtained an accuracy of 85%. Alolfe et al. (8) used the filter model and the wrapper model for feature selection. Chen et al. (9) proposed rough set-based feature selection. Vasantha et al. (10) proposed a hybrid feature selection method for mammogram classification; the highest classification accuracy obtained by this approach was 96%. Huang et al. (11) used support vector machine based feature selection and obtained an accuracy of 86%. Prathibha et al. (12) used Sequential Floating Forward Selection (SFFS) to reduce the feature dimensionality. Luo et al. (13) used two well-known feature selection techniques, forward selection and backward selection, and two classifiers for ensemble classification; they used a decision tree and a support vector machine as the initial classifiers. Wei et al. (14) used a sequential backward selection method to select the most relevant features. In their work, 18 features were extracted, of which 12 were finally selected for the classification of benign and malignant patterns.

2. Objectives

To overcome the problem of overfitting and underfitting encountered in other studies, we present the ensemble supervised classification method with simple voting policy for the detection of normal and abnormal pattern in the mammogram data, with reasonable accuracy.

3. Patients and Methods

The proposed method consists of three main steps: 1) feature extraction, 2) feature selection and 3) classification using an ensemble supervised classification technique. Our methodology is summarized in Figure 1.

Figure 1.

Block diagram of the proposed method.

3.1. Data

We obtained the required data from the Digital Database for Screening Mammography (DDSM). The resolution of the obtained images was 42 microns with 4964 × 2900 pixels, and the breast density rating was up to 3 in the Breast Imaging-Reporting and Data System (BI-RADS). In this study, we used 300 mammograms for classification. More than 2500 DDSM data sets were available at http://marathon.csee.usf/edu/Mammography/DDSM (15). An original image downloaded from DDSM is shown in Figure 2.

Figure 2.

Original images obtained from the Digital Database for Screening Mammography: A, cancerous image; B, normal image.

3.2. Feature Extraction

Feature extraction is a crucial step in mammogram classification. If the extracted features are not appropriate, overfitting, underfitting and misclassification can occur. To obtain relevant features, after reading the image, we restricted ourselves to the region of the mammogram with the highest probability of cancer occurrence. We selected a rectangular region of interest (ROI) window of 512 × 512 pixels and then extracted the features from this region. The resulting image is shown in Figure 3.

Figure 3.

Region of interest selected image: A, cancerous image; B, normal image.

In this approach, the obtained parameters are more reliable for classification procedures. We used GLCM as the feature extraction method. The GLCM features are calculated in four directions (0°, 45°, 90° and 135°) and at four distances (1, 2, 3 and 4). The expressions of the 20 GLCM descriptors are listed in Table 1, and the features extracted from the data are shown in Table 2. Many of these features are redundant for classification and several of them are unnecessary; therefore, we applied a feature selection method.

Table 1.

Expression of Gray-Level Co-Occurrence Matrices Descriptors

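Features Computed	Formulation
Autocorrelation	$\sum_{i=0}^{G-1}\sum_{j=0}^{G-1}\frac{(p_x-\mu_x)(p_y-\mu_y)}{\sigma_x\sigma_y}$
Contrast	$\sum_{i=0}^{G-1}\sum_{j=0}^{G-1}P(i,j)\,(i-j)^2$
Correlation	$\frac{\sum_{i=0}^{G-1}\sum_{j=0}^{G-1}P(i,j)\,(ij)-\mu_x\mu_y}{\sigma_x\sigma_y}$
Cluster prominence	$\sum_{i=0}^{G-1}\sum_{j=0}^{G-1}P(i,j)\,(i+j-\mu_x-\mu_y)^4$
Cluster shade	$\sum_{i=0}^{G-1}\sum_{j=0}^{G-1}P(i,j)\,(i+j-\mu_x-\mu_y)^3$
Dissimilarity	$\sum_{i=0}^{G-1}\sum_{j=0}^{G-1}|i-j|\,P(i,j)$
Energy	$\sum_{i=0}^{G-1}\sum_{j=0}^{G-1}P(i,j)^2$
Entropy	$-\sum_{i=0}^{G-1}\sum_{j=0}^{G-1}P(i,j)\log P(i,j)$
Homogeneity	$\sum_{i=0}^{G-1}\sum_{j=0}^{G-1}\frac{P(i,j)}{1+|i-j|}$
Maximum probability	$\max_{i,j}P(i,j)$
Sum of squares variance	$\sum_{i=0}^{G-1}\sum_{j=0}^{G-1}P(i,j)\,(i-\mu)^2$
Sum average	$\sum_{i=0}^{2G-2}i\,P_{x+y}(i)$
Sum variance	$\sum_{i=0}^{2G-2}\left(i-\sum_{k=0}^{2G-2}k\,P_{x+y}(k)\right)^2P_{x+y}(i)$
Sum entropy	$-\sum_{i=0}^{2G-2}P_{x+y}(i)\log P_{x+y}(i)$
Difference variance	$\sum_{i=0}^{G-1}i^2\,P_{x-y}(i)$
Difference entropy	$-\sum_{i=0}^{G-1}P_{x-y}(i)\log P_{x-y}(i)$
Information measure of correlation 1	$\frac{HXY-HXY1}{\max(HX,HY)}$
Information measure of correlation 2	$\left(1-\exp[-2.0\,(HXY2-HXY)]\right)^{1/2}$
Inverse difference normalized (INN)	$\sum_{i=0}^{G-1}\sum_{j=0}^{G-1}\frac{P(i,j)}{1+|i-j|}$
Inverse difference moment normalized	$\sum_{i=0}^{G-1}\sum_{j=0}^{G-1}\frac{P(i,j)}{1+(i-j)^2}$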

Table 2.

List of Gray-Level Co-occurrence Matrices Features Sets that Were Obtained From Selected Regions of Interest

Features Computed	Normal, Sample 1	Normal, Sample 2	Abnormal, Sample 3	Abnormal, Sample 4
Autocorrelation	1.20143	1.20115	7.28408	7.28177
Contrast	1.72836	1.67006	1.83504	1.73961
Correlation	9.70845	9.71811	9.42761	9.45678
Cluster prominence	3.52433	3.52576	1.01281	1.01238
Cluster shade	2.69382	2.69782	8.70367	8.72303
Dissimilarity	1.71778	1.66048	1.80913	1.72401
Energy	1.36384	1.37367	1.81605	1.83811
Entropy	2.31528	2.30567	2.05505	2.03751
Homogeneity	9.14278	9.17125	9.09961	9.14058
Maximum probability	2.20597	2.20477	3.08118	3.09041
Sum of squares variance	1.20107	1.19866	7.30073	7.28414
Sum average	6.04537	6.04403	4.80535	4.80318
Sum variance	2.65242	2.65528	1.45384	1.45814
Sum entropy	2.19301	2.18787	1.92265	1.91338
Difference variance	1.72836	1.67137	1.83509	1.73961
Difference entropy	4.61256	4.51838	4.78272	4.63405
Information measure of correlation 1	-7.2285	-7.27798	-6.5625	-6.66892
Information measure of correlation 2	9.62953	9.63631	9.30402	9.32727
Inverse difference normalized (INN)	9.80925	9.81561	9.79926	9.88614
Inverse difference moment normalized	9.97342	9.97431	9.97181	9.97341

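Here, i and j are the coordinates of an entry in the co-occurrence matrix $P(i,j)$, G is the number of gray levels, and $\mu_x$, $\mu_y$, $\sigma_x$ and $\sigma_y$ are the means and standard deviations of the marginal distributions $p_x$ and $p_y$. The partial probability function $P_{x+y}(i)$ is the probability that the coordinates of a co-occurrence matrix entry sum to i, and $P_{x-y}(i)$ is the corresponding probability for their absolute difference. HX and HY are the entropies of $p_x$ and $p_y$: $HX=-\sum_i p_x(i)\log p_x(i)$ and $HY=-\sum_j p_y(j)\log p_y(j)$.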
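The construction of the co-occurrence matrix and two of the descriptors in Table 1 (contrast and energy) can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the symmetric counting of pixel pairs, and the small 4 × 4 test image with four gray levels are all assumptions made for the example. Only distance 1 is used here because the toy image is too small for larger offsets.

```python
def glcm(image, levels, angle, distance=1):
    """Return the normalized, symmetric co-occurrence matrix P(i, j)."""
    # (row, column) step for each of the four standard directions.
    steps = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}
    dr, dc = (s * distance for s in steps[angle])
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # assumption: count each pair in both orders
    total = sum(sum(row) for row in counts)
    return [[n / total for n in row] for row in counts]


def contrast(P):
    """Sum over i, j of P(i, j) * (i - j)^2."""
    G = len(P)
    return sum(P[i][j] * (i - j) ** 2 for i in range(G) for j in range(G))


def energy(P):
    """Sum over i, j of P(i, j)^2."""
    return sum(p * p for row in P for p in row)


# Hypothetical 4 x 4 image with gray levels 0..3 (not taken from DDSM).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

# One (contrast, energy) pair per direction at distance 1.
features = {}
for angle in (0, 45, 90, 135):
    P = glcm(image, levels=4, angle=angle)
    features[angle] = (contrast(P), energy(P))
```

For a 512 × 512 ROI, the same loop would be run for each of the four directions and each distance, and the remaining descriptors would be computed from the same matrix P.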

3.3. Feature Selection

Feature selection is an important step for feature dimension reduction in the classification procedure. After feature extraction, feature selection method was applied to select the best features. The maximum difference method was used as a feature selection method in this paper. This method selects the features that have maximum difference between two groups of data. Therefore, the selected features show more differences between normal and abnormal data. At the end, six dominant features of 20 features were selected, as shown in Table 3. In the next step, the selected features were used for classification.

Table 3.

Selected Features for Classification From 20 Features

Features Computed	Normal, Sample 1	Normal, Sample 2	Abnormal, Sample 3	Abnormal, Sample 4	Mathematical Equation
Autocorrelation	1.20143	1.20115	7.28408	7.28177	$\sum_{i=0}^{G-1}\sum_{j=0}^{G-1}\frac{(p_x-\mu_x)(p_y-\mu_y)}{\sigma_x\sigma_y}$
Sum of squares variance	1.20107	1.19866	7.30073	7.28414	$\sum_{i=0}^{G-1}\sum_{j=0}^{G-1}P(i,j)\,(i-\mu)^2$
Sum average	6.04537	6.04403	4.80535	4.80318	$\sum_{i=0}^{2G-2}i\,P_{x+y}(i)$
Sum variance	2.65242	2.65528	1.45384	1.45814	$\sum_{i=0}^{2G-2}\left(i-\sum_{k=0}^{2G-2}k\,P_{x+y}(k)\right)^2P_{x+y}(i)$
Cluster prominence	3.52433	3.52576	1.01281	1.01238	$\sum_{i=0}^{G-1}\sum_{j=0}^{G-1}P(i,j)\,(i+j-\mu_x-\mu_y)^4$
Cluster shade	2.69382	2.69782	8.70367	8.72303	$\sum_{i=0}^{G-1}\sum_{j=0}^{G-1}P(i,j)\,(i+j-\mu_x-\mu_y)^3$
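The text does not give a formula for the maximum difference method. One plausible reading, used here purely as an assumption, is to rank the features by the absolute difference between the mean value in the normal group and the mean value in the abnormal group, and to keep the top six. The sketch below applies that reading to the four sample values printed in Table 2; the function and variable names are illustrative.

```python
# Values as printed in Table 2:
# (normal sample 1, normal sample 2, abnormal sample 3, abnormal sample 4).
TABLE_2 = {
    "Autocorrelation": (1.20143, 1.20115, 7.28408, 7.28177),
    "Contrast": (1.72836, 1.67006, 1.83504, 1.73961),
    "Correlation": (9.70845, 9.71811, 9.42761, 9.45678),
    "Cluster prominence": (3.52433, 3.52576, 1.01281, 1.01238),
    "Cluster shade": (2.69382, 2.69782, 8.70367, 8.72303),
    "Dissimilarity": (1.71778, 1.66048, 1.80913, 1.72401),
    "Energy": (1.36384, 1.37367, 1.81605, 1.83811),
    "Entropy": (2.31528, 2.30567, 2.05505, 2.03751),
    "Homogeneity": (9.14278, 9.17125, 9.09961, 9.14058),
    "Maximum probability": (2.20597, 2.20477, 3.08118, 3.09041),
    "Sum of squares variance": (1.20107, 1.19866, 7.30073, 7.28414),
    "Sum average": (6.04537, 6.04403, 4.80535, 4.80318),
    "Sum variance": (2.65242, 2.65528, 1.45384, 1.45814),
    "Sum entropy": (2.19301, 2.18787, 1.92265, 1.91338),
    "Difference variance": (1.72836, 1.67137, 1.83509, 1.73961),
    "Difference entropy": (4.61256, 4.51838, 4.78272, 4.63405),
    "Information measure of correlation 1": (-7.2285, -7.27798, -6.5625, -6.66892),
    "Information measure of correlation 2": (9.62953, 9.63631, 9.30402, 9.32727),
    "Inverse difference normalized (INN)": (9.80925, 9.81561, 9.79926, 9.88614),
    "Inverse difference moment normalized": (9.97342, 9.97431, 9.97181, 9.97341),
}


def mean(values):
    return sum(values) / len(values)


def max_difference_selection(table, n_selected=6):
    """Rank features by |mean(normal) - mean(abnormal)| and keep the top ones.

    Assumption: the first two values of each row are normal samples and the
    last two are abnormal samples, as in Table 2.
    """
    scores = {
        name: abs(mean(values[:2]) - mean(values[2:]))
        for name, values in table.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_selected]


selected = max_difference_selection(TABLE_2)
```

Under this reading, the six features returned for the printed sample values are the same six that appear in Table 3, which suggests the interpretation is at least compatible with the reported selection.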

3.4. Classification

The classification process includes two steps: 1) initial classification and 2) ensemble. In the initial step, the K-nearest neighbors (KNN), naive Bayes and support vector machine (SVM) algorithms were used as supervised classifiers for classification of normal and abnormal data.

The KNN algorithm classifies objects based on the closest training data in the feature space: an object is assigned to a class by a majority vote of its neighbors. In this paper, we used the KNN algorithm with K = 5 (16).

The naive Bayes classifier can handle an arbitrary number of independent variables, whether continuous or categorical. Given a set of variables, $X=\{x_1,x_2,\ldots,x_d\}$, we want to construct the posterior probability for the class $C_j$ among a set of possible outcomes $C=\{c_1,c_2,\ldots,c_d\}$. Using Bayes' rule (17):

$$p(C_j\mid x_1,x_2,\ldots,x_d)\propto p(x_1,x_2,\ldots,x_d\mid C_j)\,p(C_j)\qquad(1)$$

where $p(C_j\mid x_1,x_2,\ldots,x_d)$ is the posterior probability of class membership. Since naive Bayes assumes that the conditional probabilities of the independent variables are statistically independent, we can decompose the likelihood into a product of terms:

$$p(X\mid C_j)\propto\prod_{k=1}^{d}p(x_k\mid C_j)\qquad(2)$$

and rewrite the posterior as:

$$p(C_j\mid X)\propto p(C_j)\prod_{k=1}^{d}p(x_k\mid C_j)\qquad(3)$$

Using Bayes' rule above, we label a new case X with the class level $C_j$ that achieves the highest posterior probability.

The SVM constructs a decision surface in the feature space that bisects the two categories and maximizes the margin of separation between the two classes of points. This decision surface can then be used as a basis for classifying points of unknown class (18). Suppose we have N training data points $\{(x_1,y_1),(x_2,y_2),\ldots,(x_N,y_N)\}$, where $x_i\in\mathbb{R}^d$ and $y_i\in\{-1,+1\}$. The problem of finding a maximal margin separating hyperplane can be written as:

$$\min_{w,b}\ \tfrac{1}{2}\|w\|^2\quad\text{subject to}\quad y_i\,(w^{T}x_i+b)\ge 1,\quad i=1,\ldots,N\qquad(4)$$

This is a convex quadratic programming problem. Introducing Lagrange multipliers $\alpha_i$ and solving to get the Wolfe dual, we obtain:

$$\max_{\alpha}\ \sum_{i=1}^{N}\alpha_i-\tfrac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j\,y_i y_j\,x_i^{T}x_j\qquad(5)$$

subject to:

$$\alpha_i\ge 0,\qquad\sum_{i=1}^{N}\alpha_i y_i=0\qquad(6)$$

The solution of the primal problem is given by:

$$w=\sum_{i=1}^{N}\alpha_i y_i x_i\qquad(7)$$

To train the SVM, we search through the feasible region of the dual problem and maximize the objective function.
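As an illustration of the first of the three classifiers, the KNN rule with K = 5 can be sketched as follows. This is a minimal sketch, assuming Euclidean distance in the feature space (the text does not state the distance measure); the function name and the two-dimensional toy training points are hypothetical and are not taken from the mammogram data.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=5):
    """Label x by a majority vote among its k nearest training points."""
    # Assumption: Euclidean distance between feature vectors.
    nearest = sorted(range(len(train_X)), key=lambda n: dist(train_X[n], x))[:k]
    votes = Counter(train_y[n] for n in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set with four samples per class.
train_X = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2),
    (7.0, 7.2), (7.3, 6.9), (6.8, 7.1), (7.1, 7.0),
]
train_y = ["normal"] * 4 + ["abnormal"] * 4

prediction = knn_predict(train_X, train_y, (6.9, 7.0))
```

With K = 5 and two classes, the five neighbors can never split evenly, so the vote always produces a single label.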
To classify the mammograms, the features of the first 200 data sets were used for classifier training, and the remaining data were used for classifier evaluation. To obtain acceptable accuracy, we used an ensemble classifier approach. In this approach, we initially classify the data with the three classification algorithms and then apply a simple voting policy to finalize the classification. This policy is carried out in two steps. In the first step, we assign a temporary label to each data item. After temporary labels have been assigned to all data, the second step commences: the final label for each data item is the one that obtains the maximum number of votes among the temporary labels of its surrounding neighbors. Equation 8 presents this process for each data set. In Equation 8, Final_class (i) is the final label allocated to data (i), chosen from the set of labels associated with the two classes (normal and abnormal), and s is the variable that defines the neighborhood around data (i), as given by Equation 9. Equation 9 in turn uses the temporary label for data (i), which is defined through Equation 10. In Equation 10, Label is a matrix of size M × 3, where M is the length of the data; it contains all the labels that the different classifiers assign to the data. For example, the labels for data (M) are defined as follows: Label (M, 1) = KNN_Label (M), Label (M, 2) = Bayes_Label (M), Label (M, 3) = SVM_Label (M).
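The core of the voting policy, taking the most frequent label in each row of the M × 3 label matrix, can be sketched as below. This is a simplified illustration under stated assumptions: it votes only across the three classifier outputs for each data item and leaves out the neighborhood variable s, whose definition is not recoverable from the text. The function name and the example labels are hypothetical.

```python
from collections import Counter


def majority_vote(label_matrix):
    """Return one final label per row of an M x 3 label matrix.

    Each row holds (KNN label, naive Bayes label, SVM label) for one data item.
    """
    return [Counter(row).most_common(1)[0][0] for row in label_matrix]


# Hypothetical classifier outputs for three data items.
labels = [
    ("abnormal", "abnormal", "normal"),
    ("normal", "abnormal", "normal"),
    ("normal", "normal", "normal"),
]

final = majority_vote(labels)
```

Because there are three voters and two classes, a tie is impossible, so no tie-breaking rule is needed in this simplified form.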

4. Results

In this section, the performance of our ensemble-supervised classifier is investigated using the DDSM dataset provided by Massachusetts General Hospital, Boston, MA, USA, the University of South Florida, Tampa, FL, USA, and Sandia National Laboratories, Albuquerque, NM, USA. We used 200 data sets for the training process and 100 randomly chosen data sets for evaluation of the classifier. The evaluation data consisted of sixty abnormal and forty normal data sets. By applying the feature extraction and selection methods to the ROI in the training data, six salient features were selected, which led to appropriate accuracy. In the evaluation step, we extracted only these features from the ROI in the test data and fed them to the classifier. Finally, the obtained results were compared with the gold standard, in which each data set had been labeled as normal or abnormal by an expert. Sensitivity and specificity were used to investigate classifier performance:

$$\text{Sensitivity}=\frac{TP}{TP+FN},\qquad\text{Specificity}=\frac{TN}{TN+FP}$$

where TP = true positive, TN = true negative, FP = false positive, and FN = false negative. A sensitivity of 100% is the theoretically desired prediction for the cancerous data, and a specificity of 100% is the theoretically desired prediction for the non-cancerous data. The sensitivity and specificity of the proposed system are shown in Table 4.

Table 4.

Measured Sensitivity and Specificity of the Proposed System With Maximum Difference Feature Selection [a]

Test Outcome	Condition Positive	Condition Negative	PV
Positive	58 (TP)	1 (FP)	98.30 (PPV) [b]
Negative	2 (FN)	39 (TN)	95.12 (NPV) [c]
Sensitivity/Specificity	96.66	97.50

a Abbreviations: PV, Predictive Value; TP, True Positive; FP, False Positive; FN, False Negative; TN, True Negative; PPV, Positive Predictive Value; NPV, Negative Predictive Value.

b PPV = TP/(TP+FP).

c NPV = TN/(TN+FN).
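The figures in Table 4 can be re-derived from its four counts. The short sketch below uses the standard definitions of sensitivity, TP/(TP+FN), and specificity, TN/(TN+FP), together with the PPV and NPV formulas from the footnotes; the variable names are illustrative.

```python
# Counts taken from Table 4.
tp, fp, fn, tn = 58, 1, 2, 39

sensitivity = 100 * tp / (tp + fn)   # share of abnormal cases detected
specificity = 100 * tn / (tn + fp)   # share of normal cases recognized
ppv = 100 * tp / (tp + fp)           # positive predictive value
npv = 100 * tn / (tn + fn)           # negative predictive value
accuracy = 100 * (tp + tn) / (tp + fp + fn + tn)

for name, value in [
    ("Sensitivity", sensitivity),
    ("Specificity", specificity),
    ("PPV", ppv),
    ("NPV", npv),
    ("Accuracy", accuracy),
]:
    print(f"{name}: {value:.3f}%")
```

The computed sensitivity (58/60) and PPV (58/59) are slightly above the tabulated 96.66 and 98.30, which is consistent with the table values having been truncated rather than rounded.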

To verify that our selected features are robust and that our feature selection method is acceptable, we compared the results of the proposed method with those of a random feature selection method. In the random selection method, we assumed that the data distribution is normal and therefore selected random features using “randm”. The obtained results are shown in Table 5.

Table 5.

Comparison of the Proposed Method With a Different Feature Selection Method

Breast Cancer Classifier Performance	Percent
With maximum difference feature selection
Specificity	97.50
Sensitivity	96.66
Accuracy	97
With random feature selection
Specificity	92.30
Sensitivity	91.80
Accuracy	92


5. Discussion

In this paper, we have proposed a new CAD method to classify tumoral mammograms. This method is fully automatic and does not need operator manipulation. First, we select the area of the mammogram with a high cancer probability. The selected area contains the suspected region, which is passed to the feature extraction process. The extracted features are classified as normal or abnormal using the ensemble supervised classification method. The performance of the proposed method is evaluated by the perfect test method, which gives the sensitivity and specificity of the result. The sensitivity and specificity of the proposed method are 96.66% and 97.50%, respectively. The proposed classification method gave a correct classification rate of 97% for the division into two categories according to the BI-RADS standard on the DDSM. The accuracy of the proposed method, 97%, compares favorably with that of KNN (96%), SVM (87%) and naive Bayes (89%). In this paper, by assembling three classifiers and applying a simple voting policy, we improve the classification results in comparison to the method proposed by Luo et al. (13). The obtained results show that our method offers a slight improvement over the other proposed methods on this publicly available dataset. Therefore, the proposed method is reliable enough to assist the radiologist in the detection of abnormal data and to improve diagnostic accuracy.
