Literature DB >> 26167385

Content-based image retrieval of digitized histopathology in boosted spectrally embedded spaces.

Akshay Sridhar¹, Scott Doyle¹, Anant Madabhushi¹.

Abstract

CONTEXT: Content-based image retrieval (CBIR) systems allow for retrieval of images from within a database that are similar in visual content to a query image. This is useful for digital pathology, where text-based descriptors alone might be inadequate to accurately describe image content. By representing images via a set of quantitative image descriptors, the similarity between a query image with respect to archived, annotated images in a database can be computed and the most similar images retrieved. Recently, non-linear dimensionality reduction methods have become popular for embedding high-dimensional data into a reduced-dimensional space while preserving local object adjacencies, thereby allowing for object similarity to be determined more accurately in the reduced-dimensional space. However, most dimensionality reduction methods implicitly assume, in computing the reduced-dimensional representation, that all features are equally important. AIMS: In this paper we present boosted spectral embedding(BoSE), which utilizes a boosted distance metric to selectively weight individual features (based on training data) to subsequently map the data into a reduced-dimensional space. SETTINGS AND
DESIGN: BoSE is evaluated against spectral embedding (SE) (which employs equal feature weighting) in the context of CBIR of digitized prostate and breast cancer histopathology images.
MATERIALS AND METHODS: The following datasets, which were comprised of a total of 154 hematoxylin and eosin stained histopathology images, were used: (1) Prostate cancer histopathology (benign vs. malignant), (2) estrogen receptor (ER) + breast cancer histopathology (low vs. high grade), and (3) HER2+ breast cancer histopathology (low vs. high levels of lymphocytic infiltration). STATISTICAL ANALYSIS USED: We plotted and calculated the area under precision-recall curves (AUPRC) and calculated classification accuracy using the Random Forest classifier.
RESULTS: BoSE outperformed SE both in terms of CBIR-based (area under the precision-recall curve) and classifier-based (classification accuracy) on average across all of the dimensions tested for all three datasets: (1) Prostate cancer histopathology (AUPRC: BoSE = 0.79, SE = 0.63; Accuracy: BoSE = 0.93, SE = 0.80), (2) ER + breast cancer histopathology (AUPRC: BoSE = 0.79, SE = 0.68; Accuracy: BoSE = 0.96, SE = 0.96), and (3) HER2+ breast cancer histopathology (AUPRC: BoSE = 0.54, SE = 0.44; Accuracy: BoSE = 0.93, SE = 0.91).
CONCLUSION: Our results suggest that BoSE could serve as an important tool for CBIR and classification of high-dimensional biomedical data.

Entities: Chemical

Keywords: Boosted; breast cancer; content-based image retrieval; histopathology; non-linear dimensionality reduction; prostate cancer; spectral embedding

Year: 2015 PMID： 26167385 PMCID： PMC4498317 DOI： 10.4103/2153-3539.159441

Source DB: PubMed Journal: J Pathol Inform

INTRODUCTION

Content-based image retrieval (CBIR) systems allow a user to retrieve images from a database based on visual similarity to the query image. This is particularly useful for digital pathology and medical imaging databases, where text-based descriptors alone might be inadequate to accurately describe image content.[1234567] In CBIR systems, a query image is used as the input and based on image attribute matching; the most similar images from within a database are retrieved. Two main components of a CBIR system are (a) the image (or feature) representation, and (b) choice of similarity metric for performing retrieval. An ideal similarity metric (distance measure) would yield a large value when comparing visually dissimilar images and a small value when similar images are compared. For any given query image, the most similar images in the database as determined by the similarity metric are retrieved in decreasing order of relevance. However, in cases where images are represented by a large number of image attributes, the similarity measure might be affected by the so called “curse of dimensionality” problem, wherein the number of attributes may be greater than the total number of instances in the database. Dimensionality reduction (DR) is a technique that is used to project high-dimensional data into a reduced-dimensional embedding space. The low-dimensional data representation allows for more consistent and accurate similarity computations, compared to the high-dimensional space, to help determine image similarity[89] DR techniques can be broadly categorized as linear or nonlinear. Linear DR techniques such as principal component analysis (PCA)[10] fail to accurately capture object (image) relationships where the data reside on some non-linear manifold.[11] Objects residing on different ends of the manifold could potentially be mapped closer to each other in the lower-dimensional space, since linear DR methods use the Euclidean norm as opposed to the geodesic distance (appropriate for adjacency determination for objects residing on nonlinear manifolds). Nonlinear dimensionality reduction (NLDR) methods[12131415] attempt to capture object adjacency on nonlinear manifolds by preservation of the local linear neighborhood structure.[16] However, NLDR methods such as Isomap[12] and locally linear embedding (LLE)[13] are sensitive to the choice of the size of the local neighborhood (κ) within which linearity is assumed. Diffusion Maps,[14] another NLDR method, is sensitive to the number of time steps specified for the random walk. spectral embedding (SE)[15] is a NLDR method that unlike neighborhood preserving NLDR schemes (such as LLE, Isomap), defines object adjacency by using a Gaussian kernel in conjunction with the Euclidean distance metric (EDM) to yield a similarity matrix for all objects. The eigenvalue decomposition of this similarity matrix is then determined to yield the low-dimensional representation (eigenvectors) of the data. While SE is still sensitive to the parameters of the kernel, it has been shown to be more robust compared to LLE and Isomap.[17] CBIR could be performed in conjunction with SE by mapping the query and database images into a reduced-dimensional space and then retrieving relevant images as those in the neighborhood of the query instance. A key shortcoming of the EDM, however, is that it implicitly assumes all features (dimensions) are equally relevant. In the context of CBIR, features that are poor in discriminating between two image classes could potentially map dissimilar images close to each other in the low-dimensional space. Hence, in order to determine a more optimal low-dimensional representation of the data, it is desirable to weight the discriminatory attributes higher compared to the erroneous or noisy features prior to computing the similarity matrix. There has been some previous work in the development of SE variants. Tiwari et al. proposed a weighted multi-kernel learning scheme to yield an improved weight matrix for use in conjunction with SE.[18] ElGhawalby and Hancock[19] formulated a variant of SE that used an edge-based wave kernel that embedded the nodes of a graph as points on the surface of a manifold, and used the resulting point-set to compute graph characteristics. Robles-Kelly and Hancock[20] used the Kruskal coordinates to compute the edge-weights for a weight matrix and used it to embed the nodes of the graph onto a Riemannian manifold. In this paper we employ a novel variant of SE called boosted spectral embedding (BoSE), a supervised NLDR technique that utilizes a boosted distance metric (BDM) in place of the EDM. The BDM, which was first introduced,[21] employs the AdaBoost algorithm developed by Freund and Schapire.[22] AdaBoost is an ensemble approach that allows for implicit feature weighting based on class discriminability. The difference between SE and BoSE is that BDM actively places importance on discriminatory features while mitigating the role of weaker features, yielding an embedding which encourages same class objects to be embedded closer to each other and dissimilar class objects to be mapped farther apart. The main purpose of BoSE is to improve the lower-dimensional embedding so that classification and image retrieval will be more accurate in the lower-dimensional spaces. Feature weighting prior to DR makes BoSE a supervised method. Similar to SE, the methods[1920] are unweighted and unsupervised. The primary contributions of this work are twofold. First we present a new NLDR scheme boosted spectral embedding (BoSE) that employs AdaBoost with SE to generate lower-dimensional data representations with greater class separability. Second, BoSE is employed in conjunction with a CBIR scheme (CBIR-BoSE) to perform accurate retrieval of database images with respect to a query instance. An overview of the CBIR-BoSE system is illustrated in Figure 1. For a database of N annotated images, feature extraction is performed to yield N corresponding high-dimensional feature vectors. A subset of the N high-dimensional feature vectors is used as a training set for a boosted classifier to compute the weights for each feature in this two-class CBIR problem. A low-dimensional embedding of the entire dataset (MBoSE) is then created via BoSE. The Euclidean distance between the query image and the database images is then computed in MBoSE and the most similar (lowest distance) images are first retrieved. Images retrieved from the same class as the query instance are considered as “relevant”. Evaluation is done by constructing precision-recall (PR) curves, where a large area under precision-recall curves (AUPRC) reflects that CBIR-BoSE is retrieving the most relevant images first.

Figure 1

A flowchart illustrating the different components of the content-based image retrieval/boosted spectral embedding (CBIR-BoSE) system. Initially quantitative feature extraction is performed on a query image Q to yield a set of K image descriptors F1,….Fk. The database contains N annotated images (with corresponding class labels) with their corresponding feature-based representations. For the particular problem of interest, the image features are assigned weights corresponding to their class separability. A weighted similarity matrix is then created via the boosted distance metric, which is then used with BoSE to project the data into a lower-dimensional space. In the reduced space, the distance between the query Q and the database images is calculated and the database images most similar to the query are retrieved (R1, …., R5) In this work we evaluated our CBIR-BoSE system on three different two-class problems, illustrated in Figure 2. The three datasets comprised (1) 58 hematoxylin and eosin (H and E) stained prostate cancer tissue biopsy samples classified as benign [Figure 2a] or malignant [Figure 2d] and these images were represented by Gabor, Haralick, and first-order statistical features; (2) 55 H and E stained estrogen receptor (ER) + breast cancer histology specimens classified as low [Figure 2b] or high [Figure 2e] grade and these images were represented by Haralick features; and (3) 41 H and E stained HER2+ breast cancer tissue specimens classified as having low [Figure 2c] or high [Figure 2f] levels of lymphocytic infiltration (LI) and these images were represented by architectural features using the delaunay triangulation, minimum spanning tree, and the voronoi diagram. The choice of these datasets was dictated by the fact that manual inspection of both prostate and breast cancer histology suffers from high inter- and intra-pathologist variability.[232425] Typically the pathologist first determines if the histology sample is benign or malignant. If it is found to be malignant, the cancer is assigned a grade based on the morphologic and architectural attributes; cancer grade being highly correlated to patient outcome[2326] In the progression of solid tumors, local and systemic inflammation tends to play an important role.[27] Tumor infiltrating lymphocytes represent a local immune response and the degree of LI in a tumor is considered as being prognostic of patient outcome in several different disease states.[282930] The development of CBIR tools with applications in digital pathology[31] could assist pathologists by providing a quantitative, reproducible and accurate image-based risk score, indicative of disease aggressiveness and patient outcome.[23] Additionally, a CBIR system for digitized histopathology could serve as a teaching, training, and instructional tool for pathology residents and fellows.

Figure 2

Example images of (a) benign and (d) malignant prostate tissue, (b) low and (e) high grade estrogen receptor (ER+) breast cancer tissue, and HER2+ breast cancer tissue with (c) low and (f) high levels of lymphocytic infiltration. The histology images were obtained by digitizing biopsy samples previously stained with hematoxylin and eosin (H&E). In (a) the nucleoli are less prominent and the glands are more open, whereas in (d) the nucleoli are more apparent and the glands are shriveled due to increased cell proliferation. There is a greater amount of nuclear proliferation in (e) high grade ER+ breast cancer when compared to (b). A similar phenomenon can be observed when looking at HER2+ breast cancer tissue with low versus high levels of lymphocytic infiltration. In (f) there are more lymphocytes that have infiltrated the cancerous tissue compared to (c)

Table 1

List of mathematical symbols and notations used throughout the paper

BOOSTED DISTANCE METRIC

Brief Overview of Boosted Distance Metric

We define a set of objects as X = {x1, x2,…, xN} where N is the number of objects and each image represents one object. Each image xi, i ∈ {1,…, N} belongs to one of two classes + 1 or − 1. The ground truth label of xi is denoted L (xi) ∈ {+1, −1} where L (xi) = −1 indicates membership in class − 1 and L (xi) =1 indicates membership in class + 1. Let Φd (xi) for d ∈ {1, 2,…, D} represent the value of feature d from xi. The BDM construction is comprised of three main steps: Step 1: Constructing weak classifiers: Weak classifier hd(xi) ∈ {−1, 1} predicts the class label of xi based on feature operator Φd. In this work, a weak classifier is one that outputs a class label for the object under consideration. The weak learner may be one that outputs a probabalistic likelihood that an object (in this case, an image) belongs to a specific class based solely on a single attribute. These probabilities can be thresholded to obtain the class label. Multiple different weak learners derived from various image features can be constructed and evaluated in terms of classifier accuracy (assuming that a training set with class labels is available). Weak classifiers were constructed by using only a subset (training set) of the entire dataset Step 2: Implicit Feature Weighting: The T most accurate weak classifiers, ht, t ∈ {1, 2,…, T} are identified and weights associated with each ht are learned via the AdaBoost[22] algorithm, thereby enabling implicit feature weighting Step 3: BDM Construction: The BDM is then defined using the features Φt (xi) and associated weights obtained in Step 2.

Construction of Weak Classifiers

Each individual feature (weak classifier) is used to classify an image and its classification accuracy is leveraged in determining its class separability. The construction of the weak classifiers employed in this work is outlined below: Step 1: Calculate Φd (xi) for all d ∈ {1, 2,…, D}, i ∈ {1, 2,…, N}, in order to obtain corresponding feature values for each of the images. Step 2: Create training set Xtr ⊂ X containing N objects by randomly sampling half of the entire dataset X Step 3: Let X + indicate all objects in Xtr belonging to class + 1. Similarly, X − is the set of all samples in Xtr that belong to class − 1. We can obtain an appropriate probability distribution function (PDF), the integral of the density function, which predicts the likelihood of observing a feature value given a class label as: for a ∈ {+,−}, ωb ∈ {+1,−1} Γ is the gamma function, and τ, η >0 are scale and shape parameters. Equation 1 is a gamma function estimation of the PDF,[32] and is preferred to a Gaussian distribution because the feature histograms are asymmetric about the mean and the gamma function models the distribution more accurately. Step 4: Obtain the a posteriori probability P (+1|Φd (xi)) which computes the likelihood that an object with feature value Φd (xi) belongs to the positive class + 1 by solving, Step 5: Once the a posteriori probabilities have been computed for each image based on a single feature, the weak classifiers are defined based on the individual features. The weak classifiers may now be defined as follows: If the probability, which based on a single feature, of the image xi belonging to class +1 is greater than its probability of belonging to class −1, it will be given a class label of 1. Otherwise, it will be given a classification label of −1.

Implicit Feature Weighting

We use the AdaBoost[22] algorithm to perform implicit weighting of the weak classifiers (in turn reflecting the importance of the individual image attributes) in order to distinguish between the positive and negative classes. Our feature weighting algorithm is illustrated in Figure 3. AdaBoost works in an iterative fashion by first identifying the best-performing weak classifiers and then assigning weights based on the discriminative value of that feature.[22] The weights of the training images are initialized by taking the reciprocal of the number of images there are in the training set (Line 0). For each weak classifier (feature), its classification error is computed (Line 2). At each iteration, the weak classifier with the lowest classification error is chosen and its weight is determined (Line 4). The weights of the training images are updated such that the images that were frequently classified properly received lower weights, while the images that were frequently misclassified received higher weights (Line 5). This ensures that subsequent weak classifiers are picked based on their ability to classify these hard to classify instances. The process repeats T for iterations. The output of the algorithm is a set of weak classifiers ht and their associated normalized weights , t ∈ {1, 2,…, T} where 1≤T≤D and 0<<1. is the operator for the feature selected at iteration t of AdaBoost. The algorithm stops when εt>0.5.

Figure 3

The BoostFeatWeights algorithm for implicitly weighting the top performing image features for a specific task. All samples were initialized with equal weights. The weights for the weak classifiers are computed based on the classification error ε. At each iteration, weights (Π (i)) increase for samples that are difficult to classify. This forces the weak classifiers to concentrate on the images that are frequently misclassified. Once all the weights (α) for the weak classifiers are found, they are normalized so that they would range from 0 to 1. The T best performing classifiers and their weights are computed

Constructing the Boosted Distance Metric

The BDM is constructed after the weights and features have been chosen. To find the distance between two points in the high-dimensional space, we calculate, This is essentially a weighted Euclidean distance, where the weights influence the contribution of each feature. If ≈ 0, then will not affect the value of the similarity measure. Proposition 2.1 given that is the Euclidean distance metric, is also a distance metric. Proof since DEu is a metric, it is (1) positive, (2) symmetric, (3) definite, and (4) the triangle inequality holds. DBDM must also be a metric since ∈ [0.1] is positive and real valued. Therefore properties (1)-(4) are satisfied for DBDM. A look at the simple case where T = 2, and where a, b ∈ R2 can provide some insight into DBDM. If L (a) = L (b) then on average, over the entire training set, DEu (a, b) >DBDM (a, b). Ideally, if L (a) = L (b), then D (a, b) ≈0. We denote the distance between a and b in the first dimension as Δ1 and the second dimension as Δ2. Assume that feature dimension Δ1 is on average, over the entire training set, more discriminating than Δ2; more specifically that ||δ1 (a) −δ1 (b) ||<|| δ2 (a) −δ2 (b)|| where δ1 and δ2 represent the positions of the objects in feature spaces Δ1 and Δ2 respectively. Thus, via the learned feature weights. Recall that D(a,b)= and D(a,b)=. It can be seen that on average, over the entire training set, the following holds: The BoSE algorithm. The weak classifiers are built using the training samples (Xtr) and the weights are calculated via AdaBoost. The boosted distance metric is then employed with the weights to calculate the distances between all the objects in X. The distances are used in conjunction with the Gaussian kernel to obtain the weight matrix W. The lower-dimensional embedding Y is then obtained by solving the eigenvalue decomposition in Equation 8 Recall that and . Therefore, the left hand side of the inequality would yield a positive number and the right hand side would yield a negative number. Therefore on average, over the entire training set, D(a, b) < D(a, b) if L(a) = L(b). Note that it is similarly possible to show that under the same assumptions made for proposition 2.1 if L(a) ≠ L(b), then on average, over the entire training set, D(a, b) < D(a, b).

BOOSTED SPECTRAL EMBEDDING FOR CONTENT-BASED IMAGE RETRIEVAL

Boosted Spectral Embedding

The goal of SE is to project the feature vectors from a D dimensional space to a k dimensional space, where k << D. The low-dimensional representation of X is denoted Y = {y1, y2,…, yN}. The first step in SE is to create a weight matrix W where each element (i, j) in W is denoted by wij and represents the distance between xi and xj defined by some metric D. The low-dimensional representation of X is then found by solving the eigenvalue decomposition problem: where L is the diagonal matrix, L= The typical formulation of W involves the use of the EDM, where wij = exp (−DEu (xi, xj)/σ), and σ is the standard deviation of X. However, in BoSE, we replace the EDM with the BDM to obtain, Since SE seeks to preserve object adjacencies as defined by W by improving the description of adjacency via the BDM, we should improve the resulting low-dimensional embedding (achieve greater class separability in the reduced embedding space). Since DBDM is a metric, W is positive, semi-definite, and symmetric.

Content-Based Image Retrieval - Boosted Spectral Embedding

The high-dimensional feature data extracted from each of the datasets is reduced to a fewer number of dimensions via BoSE, the intent being to perform retrieval in the BoSE reduced space. Briefly, the retrieval is performed as follows. The query sample and all existing annotated database samples are aggregated and the BoSE representation for all images (following feature extraction and weighting) is determined. Using the EDM, the distance between the query image and all of the database images is calculated in the BoSE space. The resulting distance vector is sorted in ascending order and the most similar database images in terms of distance are outputted. The CBIR-BoSE algorithm is illustrated in Figure 5.

Figure 5

The content-based image retrieval/boosted spectral embedding algorithm

Experimental Design and Evaluation Dataset Description

We considered three datasets [Table 2]. Slides from all three datasets were stained with H and E and scanned into a computer via a whole-slide digital scanner at the University of Pennsylvania (prostate cancer) and the Cancer Institute of New Jersey (breast cancer). The prostate and breast cancer images were taken at magnifications of ×40 and ×20, respectively, and were saved in the SVS format. Pathologists were instructed to manually place a contour around homogeneous regions of tissue corresponding to either “cancer” or “noncancer” regions. Annotation was performed on the scanned SVS biopsy image files using the Image Scope software platform (Aperio ePathology, Leica Biosystems). No confounding tissue types (e.g., atrophy, prostatic intraepithelial neoplasia) were included. The entire set of tissue biopsy images were then divided into 30-by-30 square pixel regions; within these 900 pixels, if over 50% of the pixels (450) contained the expert's annotation, those regions were included in the dataset. All of the images were converted from the RGB color space to the hue, saturation, value (HSV) space to mitigate the effect of varying stain intensities. By converting images to the HSV space, we ensure that any potential stain intensity variation across images is confined to a single channel (the “value” channel). All three channels are still evaluated. However, if the variation in the intensity is detrimental, the features in the “value” channel will not be selected. The objective of experiment 1 (D1) was to distinguish between malignant and benign prostate tissue patches [Table 2, Figure 6] from biopsy samples obtained from 58 patients. In experiment 2 (D2), we aimed to distinguish between high and low grade breast cancer tissue patches from biopsy samples obtained from 55 patients. Lastly, the objective of Experiment 3 (D3) was to distinguish between high and low levels of LI in breast cancer tissue patches from 41 biopsy samples obtained from 12 patients. The final diagnosis and grade for each of the datasets was obtained as a consensus of two expert pathologists.

Table 2

List of the breast cancer and prostate cancer datasets used in this study

Figure 6

Examples of (a) benign and (e) Gleason grade 3 prostate cancer images and their corresponding feature images obtained at the level of sub-regions of the whole image: (b) (f)First-order statistics (range using a 5 × 5 window, Hue color channel), (c) (g) Haralick (Correlation using a 5 × 5 window, Hue color channel), and (d) (h) Gabor features (5 × 5 window, Hue color channel)

List of mathematical symbols and notations used throughout the paper List of the breast cancer and prostate cancer datasets used in this study Examples of (a) benign and (e) Gleason grade 3 prostate cancer images and their corresponding feature images obtained at the level of sub-regions of the whole image: (b) (f)First-order statistics (range using a 5 × 5 window, Hue color channel), (c) (g) Haralick (Correlation using a 5 × 5 window, Hue color channel), and (d) (h) Gabor features (5 × 5 window, Hue color channel)

Content-Based Image Retrieval Comparisons

For each of the experiments detailed below, we compared the performance of three CBIR paradigms. (1) CBIR-BoSE: The proposed system, whereby the BDM is used to perform SE and obtain a low-dimensional representation of objects for retrieval [Figure 5]. (2) CBIR-SE: Traditional SE, whereby the EDM is used to obtain the low-dimensional representation. (3) CBIR-BDM: CBIR performed using BDM to identify object similarities in the original high-dimensional space. These three paradigms are evaluated in terms of CBIR measures of performance. CBIR-BoSE and CBIR-SE are also evaluated in terms of classification accuracy. These are detailed below in Section 4.6 “Evaluation”.

Experiment 1: Distinguishing Malignant from Benign Prostate Histopathology

In,[32] Doyle, et al. found that a weighted combination of Gabor filter features, Haralick co-occurrence features, and first-order statistics could accurately distinguish between benign and malignant patterns of H and E-stained prostate tissue. In malignant tissue cell proliferation is increased, leading to enlarged nuclei and visible nucleoli, which cause malignant tissue to absorb an increased amount of hematoxylin. These changes lead to clear texture differences [Figure 6], which are captured by the following features: Gabor filter features:[33] Gabor filter banks are created by modulating a Gaussian function by a sinusoid, parameterized by an orientation and a frequency parameter. The filter provides a large response when convolved with directional image intensity patterns that correspond to the filter's parameters. The increased size and number of nuclei, in addition to the appearance of clear nucleoli, cause significant changes in the response at small frequency parameters Haralick co-occurrence features: [34] Co-occurrence image features are based on the adjacency of pixel values in an image. An adjacency matrix is created where the value of the ith row and the jth column equals the number of times pixel values i and j appear within a fixed distance of one another. By calculating these feature values for a local neighborhood, we can discriminate between the different texture patterns produced by benign and malignant nuclei First-order statistical features: Simple statistics calculated on image values are used to quantify intensity variations. These texture features include the mean, median, and standard deviation of local neighborhoods as well as gradient features and directional derivatives, which indicate transitions between high intensity values (the stroma/lumen) and the low intensity values (nuclei and nucleoli). These changes should be pronounced in malignant tissue, where nuclei are stained more heavily. Feature operators extract these three families of quantitative features from each channel in an image, yielding a total dataset of over 900 features. In,[32] 14 highly discriminating pixel-wise features were learned via AdaBoost[22] out of a feature set that comprised over 900 features. AdaBoost assigned a weight to all of the features and these weights were thresholded in that features with α >0.05were retained while the noninformative features were discarded. In the current study these 14 features were extracted for each image, generating 14 corresponding feature images, three of which are illustrated in Figure 6. The pixel values for each feature image were averaged, generating a 14 element feature vector to characterize each prostate image [Table 3].

Table 3

Texture features extracted from the prostate tissue sample images

Experiment 2: Distinguishing High from Low Grade Breast Histopathology

Two of the defining histological features of breast cancer are the disorganization of the tissue and the structure of the cells. The severity of the cancer is given a Bloom Richardson (BR) grade level.[26] Breast cancer tissue samples with greater disorganization and increasingly irregular structure are given higher grades. High grade samples exhibit more nuclear proliferation than low grade samples. As with the prostate cancer samples, the breast cancer biopsy samples were stained with H and E. Haralick features were extracted and used to describe the degree of nuclear proliferation by quantifying the variations in the intensity values in the images. The objective of this experiment was to retrieve images corresponding to the grade of the query image. To define a two-class problem, all images are first separated into either low (BR 4, 5) and high (BR 7, 8) grade classes [Table 2]. From each image, 12 Haralick feature images were generated (contrast energy, contrast inverse moment, contrast average, contrast variance, contrast entropy, intensity average, intensity variance, intensity entropy, entropy, energy, correlation, and one information measure of correlation) and the following statistics were computed from the pixel values from each feature image: mean, standard deviation, and entropy. This was done for all three color channels in the HSV space.

Experiment 3: Distinguishing High Lymphocytic Infiltration from Low Lymphocytic Infiltration Breast Histopathology

The class problem is defined as follows: Images were separated into either low LI or high LI classes [Table 2]. To quantify the arrangement of lymphocytic nuclei in the histology images, architectural features were computed for each image. The centroids of the lymphocytic nuclei are used to construct the delaunay triangulation GD [Figure 7b and f], the minimum spanning tree GM [Figure 7c and g], and the Voronoi Diagram GV [Figure 7d and h]. Automated nuclear detection was performed to identify the nuclear centers as centroids of the different graphs. However, the cancer and lymphocytic nuclei are similar in appearance. In general, lymphocytic nuclei differ in appearance from cancer cell nuclei by their smaller size, more circular shape, and a darker homogeneous staining.[35] We took these differences into account and performed automated nuclear detection in the following manner.

Figure 7

Example breast histopathology images that contain (a) low and (e) high levels of lymphocytic infiltration with their corresponding feature images: (b) (f) Delaunay Triangulation, (c) (g) Minimum Spanning Tree, and (d) (h) Voronoi Diagram. Quantitative graph features were calculated using the graphs constructed on the image Step 1: On each image, M candidate nuclear centers M = {m1, m2, …, mM} were found by convolving the image xi with a Gaussian (smoothing) kernel at multiple scales. This was done to account for the variation in lymphocyte size. The darkest pixels were found on the smoothed image based on local differences in luminance and these were the candidate lymphocytic nuclear centers. Step 2: Using Hojjatoleslami and Kittler's region-growing scheme,[36] each of the M candidate lymphocytic nuclear centers was grown into a corresponding region. The optimal regions were identified when the boundary strength, which is defined as the difference in the mean intensity of the pixels in the internal boundary and the current boundary of the region, was at a maximum. See[35] for a more detailed description. Step 3: Each of r ∈ R contained two random variables: Ar ∈ {ωc, ωl} which is the classification of the candidate nuclear centers as either a cancer (ωc) or lymphocytic (ωl) nucleus and Br ≡ [Cr,ɸr]T ∈ R(+2) where Cr is the square root of the nuclear area and ɸr is the standard deviation of the luminance in the nuclear region. The labels, Ar, given the feature vectors Br are estimated via a maximum a posteriori (MAP) estimation by finding the Ar that maximizes the posterior probability. where p (Br|Ar) is the likelihood term and p (Ar) and p (Br) are prior probabilities. p (Br) is ignored because maximization was done with respect to p (Ar). Step 4: p (Br|Ar) is computed from PDFs, where Ar is provided by manual delineation of lymphocytes in a training set. Step 5: The prior probabilities p (Ar) is defined by a Markov random field) andcomputed. The iterated conditional modes algorithm,[37] a deterministic relaxation procedure, was used to compute the MAP estimation and classify each r ∈ R. The regions classified as cancer nuclei were discarded and the centroids of the lymphocytic nuclei were calculated, yielding O={o1, o2,…, oL} where O ⊆ M. Details of the automated nuclear detection can be found in.[35] Using the O centroids, we constructed a graph G = (V, E, J), where V represent the vertices of the graph which correspond to the number of centroids, E are the set of edges, and J are the weights of the edges, proportional to edge length. The set of vertices, edges, and weights make up a unique graph on the image. From each graph, we extracted a set of features listed in Table 4. A detailed description of the graph construction and feature extraction can be found in.[38]

Table 4

List of the features extracted to quantify the degree of LI. A detailed description of the feature extraction and graph construction can be found in[38]

EVALUATION

The performance of a CBIR system is determined by how many retrieved images for a given query image are relevant to the query, defined as images which belong to the same class as the query image, and also the order in which they appear. Precision is defined as , where ξ(β) denotes the number of relevant objects in the β closest objects. Recall is defined as . Precision-recall curves were generated by plotting p(β) versus r(β) for β∈ {1,2...,N–1}. Area under the AUPRC was measured and used to evaluate the CBIR system. The AUPRC ∈ [0,1] values where an AUPRC 1 indicates that the CBIR system only retrieved relevant images and an AUPRC 0 indicates that the CBIR system only retrieved irrelevant images. Therefore, the higher the AUPRC, the better the CBIR system. We denote and as the AUPRC values for CBIR-BoSE, CBIR-SE, and CBIR-BDM, respectively. CBIR-BDM retrieves images from the database using the BDM without DR.

Classifier Evaluation of Boosted Spectral Embedding and Spectral Embedding

A second performance measure for evaluating BoSE is classifier accuracy. Of the classifiers available (Support Vector Machines, Neural Nets, etc.), the Random Forest (RF) classifier was chosen due to its ability to accurately and efficiently run on large databases with minimal training time and lower overall computational time. The RF classifier (obtained by bagging decision trees)[39] is trained on both MBoSE and MSE [Figure 10]. The accuracy of the RF classifier should reflect the class discriminability of MBoSE. A RF classifier is an ensemble of decision trees (i.e., weak learners) combined via bootstrap aggregation. Averaging decisions across weak learners creates a strong learner that reduces overall bias and variance.[39] We define and as the classification accuracy when performing classification in the lower-dimensional spaces created by BoSE and SE, respectively. The classification accuracy is defined as where TP are the true positives, TN are the true negatives, FP are the false positives, and FN are the false negatives.

Figure 10

MBoSE and MSE shown for (a), (d) D1, (b), (e) D2, and (c), (f) D3 using (a), (b), (c) BoSE and (d), (e), (f) SE. Although the low-dimensional data do not appear as a set of ‘clusters’, we can see a clear class separation on the manifold when using BoSE (top row) compared to SE (bottom row). BoSE: Boosted spectral embedding, SE: Spectral embedding

Let S+1 ⊂ X and where for any S+1, L(a) =+1 and for any b ∈ S–1, L(b) =–1. S+1 and S–1 are subsets of the total number of the specific class objects we have in X. S+1 and are S–1 randomly sampled with replacement from X, ensuring that each of S+1 and S–1 only comprise of instances from either of +1and −1. Each random sampling of S+1 and S–1 is used to train a decision tree classifier Ω, where v ∈ {1,2…,V} and so that Ωv (x) ∈ {+1, −1}. Randomized, 3-fold cross-validation was used to determine training and testing inputs for the RF classifier. First, the entire dataset X was randomly divided into three equally-sized subsets X1, X2, X3 ⊂ X. Two of the subsets were used for training the RF classifier, which was then evaluated on the remaining subset. The subsets were subsequently rotated until each subset was used for evaluation exactly once. The entire cross-validation scheme was repeated over 50 iterations, over which mean and standard deviation classification accuracy were reported.

Evaluating Intrinsic Dimensionality for Content-Based Image Retrieval - Boosted Spectral Embedding

When performing retrieval and classification in the lower-dimensional space, identifying the optimal number of dimensions within which to embed the data is a nontrivial task. Each dataset possesses an intrinsic dimensionality in which the classification accuracy and the retrieval performance will be optimal. In order to evaluate the effect of the total number of embedding dimensions to be considered, for the purpose of maximizing classification accuracy and the AUPRC, each dataset was reduced to lower-dimensional embeddings. The corresponding number of dimensions associated with these reduced-dimensional embeddings was varied and BoSE was evaluated in these different spaces [Table 5]. We define and as the accuracy and AUPRC using BoSE in k dimensions, where k ∈ {1,2...,K} and similarly and for SE The maximum, minimum, and average AUPRC and classification accuracy is reported and calculated in the following manner: where μ ∈ {Acc, AU} and v ∈{BoSE, SE}.

Table 5

The original dimensionality of the datasets and their reduced dimensionality employed for evaluating CBIR-BoSE and CBIR-SE. Both CBIR systems were evaluated after projecting the original high-dimensional data into spaces of progressively different reduced dimensions

RESULTS AND DISCUSSION

Quantitative evaluation

Figure 8 and Table 8 reveal that over a range of dimensions, CBIR-BoSE consistently outperforms CBIR-SE in terms of (a) AUPRC, and (b) accuracy. For D1, and were greater than and [Table 6]. The average AUPRC for CBIR-BoSE () across the all the dimensionalities evaluated was greater than the average AUPRC for CBIR-SE () [Table 6]. and were greater than and [Table 8]. was greater than and unlike the AUPRC values, the accuracy values remain relatively invariant to the number of dimensions that D1 is embedded into via BoSE and SE.

Figure 8

Quantitative results displaying and over the dimensions for Experiment 1. A second order polynomial was fitted to the data to illustrate the trends in

Table 8

Quantitative results showing the maximum, minimum, and mean classification accuracies for Experiment 1 (D1), Experiment 2 (D2), and Experiment 3 (D3). is greater than or D1 and D3 and is statistically significant using a P<0.05

Table 6

Quantitative results showing the maximum, minimum, and mean AUPRC values for Experiment 1 (D1), Experiment 2 (D2), and Experiment 3 (D3). is greater than for D1, D2, and D3 and is statistically significant using a P<0.0.5

Quantitative results displaying and over the dimensions for Experiment 1. A second order polynomial was fitted to the data to illustrate the trends in Quantitative results showing the maximum, minimum, and mean AUPRC values for Experiment 1 (D1), Experiment 2 (D2), and Experiment 3 (D3). is greater than for D1, D2, and D3 and is statistically significant using a P<0.0.5 Quantitative results showing the maximum, minimum, and mean AUPRC values for Experiment 1 (D1), Experiment 2 (D2), and Experiment 3 (D3). is comparable to for D1 and D2. is greater than for D2 and is statistically significant using a P<0.5 Quantitative results showing the maximum, minimum, and mean classification accuracies for Experiment 1 (D1), Experiment 2 (D2), and Experiment 3 (D3). is greater than or D1 and D3 and is statistically significant using a P<0.05

Qualitative evaluation

For each of the top five images retrieved, CBIR-BoSE yielded more relevant images compared to CBIR-SE [Figure 9] reflecting that objects from the same class are mapped closer to each other in MBoSE. Figure 10a and d display MSE and MBoSE, respectively, showing a much greater separation between the malignant and benign classes in MBoSE compared to MSE.

Figure 9

The illustration shows the retrieved images using (b) boosted spectral embedding (BoSE) and (c) spectral embedding (SE) for (a) the query image (prostate cancer tissue sample). The images that are outlined in green and blue are from the cancer and benign classes, respectively. For the top five retrieved images, content-based image retrieval (CBIR)-BoSE returned more relevant images compared to CBIR-SE MBoSE and MSE shown for (a), (d) D1, (b), (e) D2, and (c), (f) D3 using (a), (b), (c) BoSE and (d), (e), (f) SE. Although the low-dimensional data do not appear as a set of ‘clusters’, we can see a clear class separation on the manifold when using BoSE (top row) compared to SE (bottom row). BoSE: Boosted spectral embedding, SE: Spectral embedding For D2, and decreased as the dimensionality of the data increased [Figure 11]. While and occurred when D2 was reduced to two dimensions and were similar, decreased more drastically compared to . This resulted in being greater than [Table 6]. Another consequence of the difference in the rate of decrease of θ between CBIR-BoSE and CBIR-SE was that was greater compared to [Table 6]. θ, θ, and ψ yielded similar values for both BoSE and SE and no appreciable difference was observed [Table 8].

Figure 11

Quantitative results displaying and over all the dimensions for the breast cancer images is greater than. A second order polynomial was fitted to the data to illustrate the trends in. BoSE: Boosted spectral embedding, SE: Spectral embedding Figure 12 displays the top five images for both the CBIR-BoSE and CBIR-SE systems. CBIR-BoSE retrieved more relevant images and thus illustrated that images from similar classes are mapped closer to each other in MBoSE compared to MSE. MBoSE [Figure 10b] appears to suggest better class separability compared to SE [Figure 10e].

Figure 12

The illustration shows the retrieved images using (b) BoSE and (c) SE for (a) the query image (high grade breast cancer tissue sample). The images that are outlined in green and blue are from high and low grade breast cancer classes, respectively. For the top five retrieved images, CBIR-BoSE returned more relevant images compared to CBIR-SE. BoSE: Boosted spectral embedding, SE: Spectral embedding, CBIR: Content-based image retrieval For D3, was greater compared to . , and were greater compared to , and [Figure 13, Table 6]. and were higher compared to and , but was similar to [Table 8]. The dimensionality of the data had little effect on the and .

Figure 13

Quantitative results displaying and over all the dimensions for the lymphocytic infiltration images. θ for BoSE were greater compared to SE. A second order polynomial was fitted to the data to illustrate the trends in θ. BoSE: Boosted spectral embedding, SE: Spectral embedding Figure 14 displays the top five images for both the CBIR-BoSE and CBIR-SE systems. MBoSE [Figure 10c] appears to show better separation between the images that have low LI and images that have high LI than MSE [Figure 10f].

Figure 14

The illustration shows the retrieved images using (b) BoSE and (c) SE for (a) the query image (low LI breast cancer tissue sample). The images that are outlined in green and blue are from the high LI and low LI classes, respectively. In the top five retrieved images, CBIR-BoSE returned more relevant images compared to CBIR-SE. BoSE: Boosted spectral embedding, SE: Spectral embedding, CBIR: Content-based image retrieval

A Comparison of Content-Based Image Retrieval-BoSE and Content-Based Image Retrieval - Boosted Distance Metric

When comparing CBIR-BoSE against CBIR-BDM, and for D1 and D2 were in general comparable to each other with CBIR-BoSE outperforming CBIR-BDM most of the time [Table 7]. However, this comparison is possibly not a fair comparison because the two metrics are being evaluated in different dimensional spaces. Additionally, apart from the marginal outperformance of the BDM via BoSE, it is highly likely that given the high dimensionality of the feature space, CBIR-BDM will most likely also be more unstable compared to CBIR-BoSE.

Table 7

Quantitative results showing the maximum, minimum, and mean AUPRC values for Experiment 1 (D1), Experiment 2 (D2), and Experiment 3 (D3). is comparable to for D1 and D2. is greater than for D2 and is statistically significant using a P<0.5

Area Under Precision-Recall Curves as a Function of Increasing Dimensionality of MBoSE

decreased as the dimensionality of MBoSE increased for all three experiments. We offer some intuition as to why this happens. Let the blue triangle in Figure 15 denote the query image. When the dataset is embedded into a one-dimensional space, seven of the eight nearest samples are from the same class. Thus, when performing image retrieval, the majority of the top eight retrieved images will be relevant. When the dataset is embedded into a two-dimensional space, only four of the eight nearest images are from the same class. If image retrieval is performed in this space, only half of the top eight images retrieved will be relevant, reducing precision for that query image; however, classification accuracy for the whole dataset is unchanged. Lastly, when the dataset is embedded into a three-dimensional space, a similar situation is encountered. It should be noted that because classification and training is performed each time a dataset is reduced in dimensionality, it is very possible that all of these spaces will yield either similar classification accuracies or improvements in classification accuracy. Consequently, the apparent discrepancy between the trends in AUPRC and accuracy for BoSE and SE across a different number of dimensions exists because in CBIR the order of the retrieved data points affects the AUPRC while the accuracy is unaffected.

Figure 15

The lymphocytic infiltration data embedded into MBoSE in (a) R1, (b) R2, and (c) R3. The filled in blue triangle denotes the query image and the arrows denote its eight nearest images. When the dimensionality of MBoSE is low, most of the eight nearest images are from the same class as the query image. However, as the dimensions are increased more irrelevant images are part of the query image's eight nearest neighbors. Hence, the area under precision-recall curves decreases as the number of dimensions is increased

CONCLUDING REMARKS

In this paper, we presented a CBIR system that utilized BoSE, which employed the BDM in conjunction with SE. The BDM preferentially weights features that discriminate between objects of different classes allowing for a similarity matrix which better describes object similarity. We have created a task-specific embedding technique that improves class separability, yielding better classification and retrieval. In this work we applied the CBIR-BoSE framework in the context of problems in digital pathology. SE has been shown to be less sensitive to the choice of system parameters compared to other popular manifold learning schemes (e.g. Isomap,[12] LLE[13]). The CBIR system presented here could be employed as a teaching tool for pathology residents and fellows. Specifically, we focused on distinguishing between[1] benign and malignant prostate histology, (2) low and high grade ER + breast cancer histology, and (3) low and high levels of LI in HER2 + breast tissue. We compared CBIR-BoSE to CBIR-SE, which uses the EDM to define object similarity. For different numbers of dimensions of the low-dimensional space, for different datasets, for different performance measures (CBIR and classifier based), CBIR-BoSE outperformed CBIR-SE a majority of the time. One of the current limitations of our CBIR system is that for every new query image, the manifold for the query along with all existing database images needs to be computed. This procedure needs to be repeated for each new query instance. In future work we are looking to incorporate out of sample extrapolation schemes[40] which allow for the mapping of a new query instance into an existing lower-dimensional space, without having to recompute the eigenvalue decomposition; thus reducing the overall computational cost of a new retrieval task. We also intend to extend our current scheme to the multi-class case.

21 in total

1. Nonlinear dimensionality reduction by locally linear embedding.

Authors: S T Roweis; L K Saul
Journal: Science Date: 2000-12-22 Impact factor: 47.728

2. A global geometric framework for nonlinear dimensionality reduction.

Authors: J B Tenenbaum; V de Silva; J C Langford
Journal: Science Date: 2000-12-22 Impact factor: 47.728

3. Diffusion maps and coarse-graining: A unified framework for dimensionality reduction, graph partitioning, and data set parameterization.

Authors: Stéphane Lafon; Ann B Lee
Journal: IEEE Trans Pattern Anal Mach Intell Date: 2006-09 Impact factor: 6.226

4. Region growing: a new approach.

Authors: S A Hojjatoleslami; J Kittler
Journal: IEEE Trans Image Process Date: 1998 Impact factor: 10.856

5. Primary lung adenocarcinoma with massive lymphocyte infiltration.

Authors: Koji Tsuta; Genichiro Ishii; Eigaku Kim; Satoshi Shiono; Yutaka Nishiwaki; Yasushi Endoh; Tetsuro Kodama; Kanji Nagai; Kanji Nagai
Journal: Am J Clin Pathol Date: 2005-04 Impact factor: 2.493

6. Inflammation and cancer, the mastocytoma P815 tumor model revisited: triggering of macrophage activation in vivo with pro-tumorigenic consequences.

Authors: Caroline Kamate; Samia Baloul; Sabine Grootenboer; Eric Pessis; Alain Chevrot; Micheline Tulliez; Carmen Marchiol; Mireille Viguier; Didier Fradelizi
Journal: Int J Cancer Date: 2002-08-10 Impact factor: 7.396

7. High expression of lymphocyte-associated genes in node-negative HER2+ breast cancers correlates with lower recurrence rates.

Authors: Gabriela Alexe; Gul S Dalgin; Daniel Scanfeld; Pablo Tamayo; Jill P Mesirov; Charles DeLisi; Lyndsay Harris; Nicola Barnard; Maritza Martel; Arnold J Levine; Shridar Ganesan; Gyan Bhanot
Journal: Cancer Res Date: 2007-11-15 Impact factor: 12.701

8. Intratumoral T cells, recurrence, and survival in epithelial ovarian cancer.

Authors: Lin Zhang; Jose R Conejo-Garcia; Dionyssios Katsaros; Phyllis A Gimotty; Marco Massobrio; Giorgia Regnani; Antonis Makrigiannakis; Heidi Gray; Katia Schlienger; Michael N Liebman; Stephen C Rubin; George Coukos
Journal: N Engl J Med Date: 2003-01-16 Impact factor: 91.245

9. Spectral embedding finds meaningful (relevant) structure in image and microarray data.

Authors: Brandon W Higgs; Jennifer Weller; Jeffrey L Solka
Journal: BMC Bioinformatics Date: 2006-02-16 Impact factor: 3.169

10. Histological grading and prognosis in breast cancer; a study of 1409 cases of which 359 have been followed for 15 years.

Authors: H J BLOOM; W W RICHARDSON
Journal: Br J Cancer Date: 1957-09 Impact factor: 7.640

6 in total

1. Evaluating stability of histomorphometric features across scanner and staining variations: prostate cancer diagnosis from whole slide images.

Authors: Patrick Leo; George Lee; Natalie N C Shih; Robin Elliott; Michael D Feldman; Anant Madabhushi
Journal: J Med Imaging (Bellingham) Date: 2016-10-24

2. Fast and scalable search of whole-slide images via self-supervised deep learning.

Authors: Ming Y Lu; Drew F K Williamson; Chengkuan Chen; Tiffany Y Chen; Andrew J Schaumberg; Faisal Mahmood
Journal: Nat Biomed Eng Date: 2022-10-10 Impact factor: 29.234

3. Computer-Aided Laser Dissection: A Microdissection Workflow Leveraging Image Analysis Tools.

Authors: Jason D Hipp; Donald J Johann; Yun Chen; Anant Madabhushi; James Monaco; Jerome Cheng; Jaime Rodriguez-Canales; Martin C Stumpe; Greg Riedlinger; Avi Z Rosenberg; Jeffrey C Hanson; Lakshmi P Kunju; Michael R Emmert-Buck; Ulysses J Balis; Michael A Tangrea
Journal: J Pathol Inform Date: 2018-12-11

Review 4. Deep Learning Approaches and Applications in Toxicologic Histopathology: Current Status and Future Perspectives.

Authors: Shima Mehrvar; Lauren E Himmel; Pradeep Babburi; Andrew L Goldberg; Magali Guffroy; Kyathanahalli Janardhan; Amanda L Krempley; Bhupinder Bawa
Journal: J Pathol Inform Date: 2021-11-01

5. Multi-Pass Adaptive Voting for Nuclei Detection in Histopathological Images.

Authors: Cheng Lu; Hongming Xu; Jun Xu; Hannah Gilmore; Mrinal Mandal; Anant Madabhushi
Journal: Sci Rep Date: 2016-10-03 Impact factor: 4.379

Review 6. Machine Learning Methods for Histopathological Image Analysis.

Authors: Daisuke Komura; Shumpei Ishikawa
Journal: Comput Struct Biotechnol J Date: 2018-02-09 Impact factor: 7.271

6 in total