
Compressing pathology whole-slide images using a human and model observer evaluation.

Elizabeth A Krupinski¹, Jeffrey P Johnson, Stacey Jaw, Anna R Graham, Ronald S Weinstein.

Abstract

INTRODUCTION: We aim to determine to what degree whole-slide images (WSI) can be compressed without impacting the ability of the pathologist to distinguish benign from malignant tissues. An underlying goal is to demonstrate the utility of a visual discrimination model (VDM) for predicting observer performance.
MATERIALS AND METHODS: A total of 100 regions of interest (ROIs) from breast biopsy whole-slide images at five levels of JPEG 2000 compression (8:1, 16:1, 32:1, 64:1, and 128:1) plus the uncompressed version were shown to six pathologists to determine benign versus malignant status.
RESULTS: There was a significant decrease in performance as a function of compression ratio (F = 14.58, P < 0.0001). The visibility of compression artifacts in the test images was predicted using a VDM. Just-noticeable difference (JND) metrics were computed for each image, including the mean, median, ≥90th percentiles, and maximum values. For comparison, PSNR (peak signal-to-noise ratio) and Structural Similarity (SSIM) were also computed. Image distortion metrics were computed as a function of compression ratio and averaged across test images. All of the JND metrics were found to be highly correlated and differed primarily in magnitude. Both PSNR and SSIM decreased with bit rate, correctly reflecting a loss of image fidelity with increasing compression. Observer performance as measured by the Receiver Operating Characteristic area under the curve (ROC Az) was nearly constant up to a compression ratio of 32:1, then decreased significantly for 64:1 and 128:1 compression levels. The initial decline in Az occurred around a mean JND of 3, Minkowski JND of 4, and 99th percentile JND of 6.5.
CONCLUSION: Whole-slide images may be compressible to relatively high levels before impacting WSI interpretation performance. The VDM metrics correlated well with artifact conspicuity and human performance.


Keywords: Compression; human visual system discrimination model; observer performance; pathology whole slide images

Year: 2012 PMID: 22616029 PMCID: PMC3352607 DOI: 10.4103/2153-3539.95129

Source DB: PubMed Journal: J Pathol Inform

BACKGROUND

Despite a great deal of research and technological development in the past few years, important technological issues remain to be resolved regarding the practical clinical use of whole-slide images (WSI) in pathology.[1-7] One of the major challenges is the size of the digitized images. The image files are quite large, which affects both the transmission rates at which they are retrieved for display from a server or storage device and the amount of storage space they occupy. The issue is complicated even further by the clinical task: some cases require only a low-resolution (40× objective) scan, while others require resolutions significantly higher (80× or 100× objective).[8] Some scanners create even larger images[9] (especially emerging scanners with z-axis capabilities[3] that create a series of images), and there is concern in the DICOM (Digital Imaging and Communications in Medicine) Pathology Working Group (WG-26) that DICOM cannot handle images larger than 64,000 pixels and 2 GB total size.[10] Compression is one way to deal with this massive amount of data, but it is difficult to define a single acceptable level of compression (hence image quality) for use across all clinical questions.[8,11] There have been few rigorous studies of the effects of image compression on diagnostic performance with WSI slides. Most studies have been concerned either with the compression schemes[12-14] or with assessing the visibility of compression artifacts.[15,16] Our goal in a series of experiments has been to demonstrate the utility of the JPEG 2000 compression/decompression standard in telepathology so that other clinical specialties may utilize the resulting information and perhaps the methods to verify its utility in those applications. Another goal is to demonstrate the utility of a visual discrimination model (VDM) for predicting observer performance.
Our overriding hypothesis was that it is possible to improve the presentation of compressed telepathology images for accurate diagnoses by tailoring image compression schemes and displays based on information about the capabilities and limitations of the human visual system. This scenario primarily applies to static telepathology applications, but optimizing compression for real-time and hybrid systems could be accomplished with the same techniques as well. To test this hypothesis, we have had two goals: (1) measure the visibility of lossy compression artifacts and evaluate the utility of a VDM for predicting visually lossless compression levels with telepathology WSI slides; and (2) determine with human and model observers the point at which visible compression artifacts negatively impact interpretation and visual search performance with WSI slides. In our first experiment,[16] bit rates corresponding to visually lossless JPEG 2000 compression were measured with human observers for image regions selected from pathology WSI slides. Observer performance in 2AFC (two-alternative forced choice) trials showed that compression ratios of about 7:1, or four times the reversible compression ratio, could be achieved before losses were detectable. Significant differences in visually lossless bit rates and PSNR (peak signal-to-noise ratio) were observed across test images due to normal variations in tissue structures, which affect image compressibility. VDM metrics computed for bit rates at the visually lossless thresholds were nearly constant, however, corresponding to equal JND (just-noticeable difference) visibility for compression losses. This uniformity suggests that a JND target level corresponding to visually lossless compression could be applied to adaptively compress diverse images to different bit rates and different PSNR values and still achieve uniform image quality defined in terms of compression artifact visibility.
In the next study,[17] threshold likelihood functions for visually lossless JPEG 2000 compression were determined experimentally using the Bayesian adaptive QUEST psychometric procedure. Mean thresholds ranged from 8.5:1 to 21.1:1 for 20 test images with a mean of 12.9:1. The statistical significance of variations in threshold likelihood functions across test images was evaluated for various threshold metrics using a likelihood ratio chi-square test. The threshold metric showing the greatest uniformity across images (likelihood ratio chi-square P = 0.84 and ΔJND < 0.5) was the 99th percentile JND computed by the VDM. This result was likely due to a combination of image, encoder, and observer task characteristics, primarily the spatially nonuniform emergence at threshold of noticeable artifacts embedded in highly structured images with significant contrast/texture masking effects, and a discrimination task requiring visual search among distractors. Although both of the previous studies determined at what level of compression artifacts were noticeable, neither of the studies directly examined the impact of compression on pathologists’ ability to render a diagnostic decision. In the present study, an ROC (Receiver Operating Characteristic) experiment was conducted to determine the effects of lossy JPEG 2000 compression on the discrimination of benign and malignant breast tissue in WSI slides. Simulations were performed to determine the correlation between human performance (area under the ROC curve, Az) and image distortion metrics derived from a model of human visual perception. The Siemens Visual Discrimination Model (VDM) was used to compute just-noticeable difference (JND) metrics for the ROC test images as a function of compression bit rate. The primary aim of this study was to establish the rate of compression at which discrimination performance decreases significantly and determine the corresponding image distortion visibility in terms of JNDs.

MATERIALS AND METHODS

A set of 100 regions of interest (ROIs) of 512 × 512 pixels was cropped by an experienced pathologist (not participating in the ROC study) from a set of breast biopsy WSI slides (half benign, half malignant) acquired with the DMetrix scanner (DMetrix, Inc., Tucson, AZ, USA). All images were initially zoomed to the same level of magnification before cropping. They were then compressed using the Kakadu 6.0[18] implementation of JPEG 2000 to 6 levels (original uncompressed, 8:1, 16:1, 32:1, 64:1, and 128:1) and randomized to create a set of 600 test images. Six pathologists (three Board Certified pathologists; two Fellows; one senior level (PGY4) pathology resident) viewed each set of images on a Barco Coronis Fusion 6MP (Barco NV, Belgium) color display (maximum luminance 400 cd/m²). Their task was to determine whether each image was benign or malignant and report their confidence in that decision using a 6-point scale. They did not have access to the original glass slides. The results were analyzed using the MultiReader MultiCase (MRMC) ROC technique.[19]

The Siemens VDM simulates factors in the ocular and early cortical processing of luminance and chrominance stimuli by the human visual system.[20,21] Color images were transformed from RGB space to three opponent color channels: black-white (luminance), red-green, and blue-yellow.[22] Initial stages of the model account for the effects of the ocular modulation transfer function and luminance adaptation. The resulting image is processed by a 2D Fourier transform and filtered in the frequency domain by a set of biologically inspired spatial frequency- and orientation-tuned channels using bandpass log-Gabor filters.[23] Local band-limited contrast[24] is computed by dividing the output of each bandpass channel by the output of a low-pass, isotropic Gaussian filter applied to the image.
Channel contrasts are then normalized to 1 JND at the detection threshold using a contrast sensitivity function that depends on spatial frequency and luminance.[25] Contrast discrimination sensitivity at suprathreshold (JND>1) contrast levels and interactions between channels are modeled by a combination of nonlinear excitatory and inhibitory (divisive suppression) factors associated with contrast or texture masking.[26] In the final stage, channel JND maps are max-pooled over orientation and frequency at each pixel. Summary JND metrics can then be evaluated by spatial pooling across pixels, typically by computing the mean, a histogram percentile, or Minkowski summation[24] with an exponent of 4. Metrics can be computed across an entire image or within regions or frequency/orientation channels containing specific features of interest. When applied to a pair of uncompressed and compressed images, the VDM generates objective measures of the visibility of compression artifacts in perceptually linear JND units. VDM simulations were performed by pairing each of the 100 uncompressed images selected for the ROC experiment with the same image after lossy JPEG 2000 compression. We used the Kakadu 6.0 implementation of JPEG 2000 with rate control based on mean squared error (MSE) minimization. Images were compressed to the five ratios used in the ROC study: 8:1, 16:1, 32:1, 64:1, and 128:1. JND metrics were computed for each compressed image and then averaged for the 100 images at each compression ratio. Metrics included the mean, median, Minkowski-pooled, 90th, 95th, and 99th percentiles, and maximum JND. Two additional error metrics were computed for comparison: peak signal-to-noise ratio (PSNR) and structural similarity (SSIM).[27] Rate-distortion plots were generated with values of each metric as a function of compression rate in bits per pixel. 
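The spatial pooling step described above can be sketched in a few lines of Python. This is a minimal illustration, not the Siemens VDM itself: the function names, the linear-interpolation percentile, the normalization of the Minkowski sum by the pixel count, and the toy 3 × 3 map are all assumptions introduced here.

```python
import math
import statistics


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a list of numbers."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    pos = (len(ordered) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def minkowski_pool(values, exponent=4.0):
    """Minkowski pooling, normalized by pixel count: (mean(|v|^p))^(1/p)."""
    total = sum(abs(v) ** exponent for v in values)
    return (total / len(values)) ** (1.0 / exponent)


def summarize_jnd_map(jnd_map):
    """Pool a 2D JND map (list of rows) into the summary metrics named in the text."""
    pixels = [v for row in jnd_map for v in row]
    return {
        "mean": statistics.fmean(pixels),
        "median": statistics.median(pixels),
        "minkowski4": minkowski_pool(pixels, 4.0),
        "p90": percentile(pixels, 90),
        "p95": percentile(pixels, 95),
        "p99": percentile(pixels, 99),
        "max": max(pixels),
    }


# Hypothetical 3x3 JND map, for illustration only (not data from the study).
toy_map = [[0.0, 1.0, 2.0], [1.0, 1.0, 3.0], [0.0, 2.0, 8.0]]
summary = summarize_jnd_map(toy_map)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With an exponent of 4, the Minkowski value always falls between the mean and the maximum, so it weights localized, highly visible artifacts more heavily than the mean does.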
Parametric functions from the image compression literature were fit to the rate-distortion data using a least-squares error criterion. A log-linear function with three parameters (Equation 1),

$$d(r) = a_1 \log(r) + a_2 r + a_3 \qquad (1)$$

was fit to the JND metrics and PSNR. A five-parameter logistic function (Equation 2),

$$d(r) = a_1 \left\{ 0.5 - \left[ 1 + \exp\left( a_2 (r - a_3) \right) \right]^{-1} \right\} + a_4 r + a_5 \qquad (2)$$

was fit to the SSIM metrics.
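Because the three-parameter log-linear function is linear in its coefficients, a least-squares fit reduces to a small linear system. The Python sketch below illustrates this under several assumptions that are not stated in the text: the third term is treated as a constant offset a3, the logarithm is natural, bit rates are derived from a 24-bit RGB source, and the distortion values are synthetic. It is not the fitting code used in the study.

```python
import math


def solve_linear_system(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def log_linear(r, a1, a2, a3):
    """Evaluate d(r) = a1*log(r) + a2*r + a3 (natural log assumed)."""
    return a1 * math.log(r) + a2 * r + a3


def fit_log_linear(rates, distortions):
    """Least-squares fit of the log-linear model via the normal equations."""
    basis = [[math.log(r), r, 1.0] for r in rates]
    normal = [[sum(b[i] * b[j] for b in basis) for j in range(3)] for i in range(3)]
    rhs = [sum(b[i] * d for b, d in zip(basis, distortions)) for i in range(3)]
    return solve_linear_system(normal, rhs)


# Bit rates (bits per pixel) for 8:1 ... 128:1, assuming 24-bit RGB pixels.
rates = [24.0 / ratio for ratio in (8, 16, 32, 64, 128)]
# Synthetic distortion values generated from known parameters, not study data.
true_params = (-2.0, 0.5, 4.0)
distortions = [log_linear(r, *true_params) for r in rates]

a1, a2, a3 = fit_log_linear(rates, distortions)
print(f"a1={a1:.4f}, a2={a2:.4f}, a3={a3:.4f}")
```

The five-parameter logistic function is nonlinear in a2 and a3, so fitting it would need an iterative optimizer rather than a single linear solve.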

RESULTS

Three Board Certified pathologists (27, 39, 5 years), 2 fellows (Board certified 1 year), and 1 senior level (PGY4) pathology resident served as readers. There were three males (average age = 52.33, sd = 15.62, range = 31–68) and three females (average age = 33.33, sd = 2.05, range = 31–36). Four of the six readers wore corrective lenses and on average their last eye examination was 34.17 months ago (sd = 39.76, range = 1–120 months). Five of the six readers had been reading WSI slides for 3–5 years, and one for more than 10 years; with two having read 1–100, one having read 101–500, one having read 501–1000, one having read 1001–5000, and one having read 5001–10,000. In the human study, there was a significant decrease in performance as a function of compression level (F = 14.58, P < 0.0001) even though performance at each level was high (uncompressed mean Az = 0.959; 8:1 mean Az = 0.960; 16:1 mean Az = 0.959; 32:1 mean Az = 0.957; 64:1 mean Az = 0.937; 128:1 mean Az = 0.877). Post hoc analyses revealed that performance at 64:1 and 128:1 was significantly lower than at the lower compression levels. Figure 1 shows the same image in the original uncompressed state (left), compressed to 16:1 (center, does not affect performance), and compressed to 64:1 (right, starts to impact performance). There were no significant differences between the experienced Board Certified pathologists, fellows, and the resident in terms of where diagnostic performance decreased as a function of compression level.
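To make the Az figures concrete, the sketch below computes an area under the ROC curve from 6-point confidence ratings. Note the substitution: it uses the nonparametric Mann-Whitney (trapezoidal) estimate rather than the MRMC analysis reported here, and the rating orientation (1 = definitely benign, 6 = definitely malignant) and the ratings themselves are hypothetical.

```python
def empirical_auc(benign_ratings, malignant_ratings):
    """Mann-Whitney estimate: P(malignant rated higher) + 0.5 * P(tie)."""
    wins = 0.0
    for m in malignant_ratings:
        for b in benign_ratings:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant_ratings) * len(benign_ratings))


# Hypothetical ratings for one reader at one compression level (not study data).
benign = [1, 1, 2, 2, 3, 4]
malignant = [3, 4, 5, 5, 6, 6]
auc = empirical_auc(benign, malignant)
print(f"empirical AUC = {auc:.3f}")
```

An area of 0.5 corresponds to chance discrimination and 1.0 to perfect separation of benign and malignant cases, which is why the values near 0.96 at low compression ratios indicate high performance.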

Figure 1

Example of an image ROI at the original uncompressed level (left), compressed to 16:1 (center) and compressed to 64:1 (right)

Rate-distortion data are presented in Figures 2 and 3. Three representative JND metrics (mean, Minkowski-pooled, and 99th percentile) spanning a wide range of values are plotted in Figure 2. As expected, JNDs increased with decreasing bit rate, corresponding to higher compression ratios. Mean and Minkowski JNDs have been shown in previous studies to be well correlated with human task performance.[7] The 99th percentile JND was found to be the best predictor of bit rates for visually lossless compression in a previously conducted observer performance experiment.[7] All of the JND metrics in the present study were found to be highly correlated and differed primarily in magnitude. Both PSNR and SSIM decreased with bit rate [Figure 3], correctly reflecting a loss of image fidelity with increasing compression.
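For reference, PSNR follows directly from the mean squared error between the uncompressed and compressed pixel values. The Python sketch below assumes 8-bit samples (peak value 255) stored as flat lists; the function name and the toy pixel values are illustrative and do not come from the study.

```python
import math


def psnr(reference, distorted, peak=255.0):
    """PSNR in dB between two equal-length pixel sequences; inf if identical."""
    if len(reference) != len(distorted):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical 8-bit pixel values, not data from the study.
original = [52, 55, 61, 59, 79, 61, 76, 61]
mild = [p + 1 for p in original]    # every pixel off by 1, so MSE = 1
severe = [p + 8 for p in original]  # every pixel off by 8, so MSE = 64

print(f"mild:   {psnr(original, mild):.2f} dB")
print(f"severe: {psnr(original, severe):.2f} dB")
```

Larger pixel errors give a lower PSNR, which matches the direction of the trend in Figure 3. Unlike the JND metrics, this calculation takes no account of where in the image the errors fall or how visible they are.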

Figure 2

JND metrics as a function of compression bit rate, averaged for 100 test images. Error bars show the standard deviation

Figure 3

PSNR and SSIM as a function of compression bit rate, averaged for 100 test images. Error bars show the standard deviation

The correlation of observer performance in the ROC experiment with image distortion metrics is shown in Figures 4–6. Observer performance (Az) was nearly constant up to a compression ratio of 32:1, and then decreased significantly for 64:1 and 128:1 compression. The initial decline in Az occurred around a mean JND of 3, Minkowski JND of 4, and 99th percentile JND of 6.5 [Figure 4]. JND values can be interpreted by their correspondence to observer performance in a 2AFC detection or discrimination task:

Figure 4

Correlation between ROC observer performance and JND metrics, averaged for 100 test images at each compression ratio

Figure 6

Correlation between ROC observer performance and SSIM, averaged for 100 test images at each compression ratio. Error bars show the standard deviations

1 JND: 75% correct - barely detectable
2 JND: 94% correct - evident but sometimes missed
3 JND: 98% correct - conspicuous, rarely missed

A mean value of 3 JND corresponds to compression losses that are readily visible in side-by-side comparisons with uncompressed images. This interpretation is consistent with our observation that images compressed by 64:1 had conspicuous artifacts (primarily blurring) over much of the image area, while the distortions at 32:1 were evident but generally more subtle and localized. The decrease in Az between 32:1 and 64:1 compression corresponded to PSNR of about 32 dB and SSIM near 0.90 [Figures 5 and 6]. Unlike JNDs, however, there is no established correspondence of these metrics with artifact conspicuity or detection task performance.
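The 2AFC percentages quoted for this interpretation (75%, 94%, and 98% correct) are reproduced by a simple rule in which each additional JND divides the miss rate by four, giving P = 1 - 0.25^n for n JNDs. The text does not state this formula, so the Python sketch below is an assumption offered only as an aid to reading JND values, and it is restricted to n of at least 1.

```python
def percent_correct_2afc(jnd):
    """Assumed mapping from JND units (>= 1) to 2AFC percent correct."""
    if jnd < 1:
        raise ValueError("this rule is only applied here for 1 JND and above")
    return 100.0 * (1.0 - 0.25 ** jnd)


# The three anchor points given in the text.
for n in (1, 2, 3):
    print(f"{n} JND -> {percent_correct_2afc(n):.1f}% correct")
```

Under the same assumption, the Minkowski JND of 4 at which Az began to decline would correspond to more than 99% correct detection of the compression artifacts in a side-by-side comparison.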

Figure 5

Correlation between ROC observer performance and PSNR, averaged for 100 test images at each compression ratio. Error bars show the standard deviations

CONCLUSIONS

The previous observer performance study[7] established the superior utility of high-percentile JNDs for predicting variable compression bit rates corresponding to visually lossless compression. The predictive value of the 99th percentile JND was significantly better for that purpose than compression ratio, PSNR, SSIM, or lower JND metrics. A similar comparison of metrics is not possible for the current ROC study, however, because the Az values and distortion metrics are averaged over 100 test images to produce a single value for each compression ratio. If the ROC experiment were repeated with different sets of test images, ideally with different structural characteristics affecting their compressibility and different compression ratios at which a significant decline in discrimination performance occurs, we could hypothesize greater consistency in JND metric values at that performance threshold compared to PSNR, SSIM, or other nonperceptual metrics. If this hypothesis were confirmed, it would establish the greater utility of a perceptual metric over compression ratio or other metrics for maximizing the compression of individual images or WSI slide regions without sacrificing interpretation accuracy. Although discrimination performance was quite high with all levels of compression, there was a steady and ultimately statistically significant drop in performance at the 64:1 compression level. What was interesting and somewhat surprising was the fact that the presence of compression artifacts did not impact discrimination at lower levels of compression. It was encouraging that there were no significant differences between the experienced Board Certified pathologists, Fellows and the resident in terms of where performance decreased as a function of compression level. However, it should be noted that the task used in this study was only one of discriminating benign vs malignant status, and the overall pathologic interpretation task is much more complex. 
It is quite likely that more significant differences would have been observed between the Board Certified pathologists and the Fellows and residents if we had evaluated performance in a more complex task. These results combined with the VDM results suggest that it may be possible to compress regions of diagnostically relevant tissue in breast biopsy virtual slides to at least 32:1 before impacting diagnosis. The potential for compressing entire virtual slides using scalable, region-of-interest methods, such as JPEG 2000, is much greater due to the large fraction of the total area that typically consists of irrelevant tissue and air. The actual benefit of adaptive, ROI compression is likely to be much higher. Given the very high levels of performance, the question is whether higher levels of compression could be achieved before impacting performance clinically. Further study is obviously required, especially using the entire image rather than select ROIs as well as asking readers to provide a complete diagnosis rather than just a discrimination of benign vs malignant, but an earlier study we conducted suggested that higher levels might be appropriate depending on the image content. In the future we will be testing higher levels of compression and their impact on observer performance. One limitation of our study is that it is based solely on breast tissue evaluation. Breast tissue has a range of proliferative patterns ranging from clearly benign to conclusively malignant. Based on this spectrum, even minor compression artifacts make it difficult to reach a definitive diagnosis. In the future, it may be worthwhile to consider carrying out the study with images having fewer nuances of diagnostic patterns (e.g., a moderately to well-differentiated squamous cell carcinoma). More overtly malignant cell features seen in these tumors may be perceptible despite compression artifacts.
Likewise, in evaluating tissue invasion by squamous cell carcinoma, the diagnostic pattern may continue to be recognizable despite image degradation by compression. As noted above, another limitation is the use of select ROIs rather than the entire image, and the limited task of discriminating benign from malignant rather than addressing the complete interpretation task.

19 in total

1. Receiver operating characteristic rating analysis. Generalization to the population of readers and patients with the jackknife method.

Authors: D D Dorfman; K S Berbaum; C E Metz
Journal: Invest Radiol Date: 1992-09 Impact factor: 6.016

2. Image quality assessment: from error visibility to structural similarity.

Authors: Zhou Wang; Alan Conrad Bovik; Hamid Rahim Sheikh; Eero P Simoncelli
Journal: IEEE Trans Image Process Date: 2004-04 Impact factor: 10.856

3. Contrast in complex images.

Authors: E Peli
Journal: J Opt Soc Am A Date: 1990-10 Impact factor: 2.129

4. Digital imaging in pathology: the case for standardization.

Authors: Yukako Yagi; John R Gilbertson
Journal: J Telemed Telecare Date: 2005 Impact factor: 6.184

Review 5. Digital slides: present status of a tool for consultation, teaching, and quality control in pathology.

Authors: Rafael Rocha; José Vassallo; Fernando Soares; Keith Miller; Helenice Gobbi
Journal: Pathol Res Pract Date: 2009-06-07 Impact factor: 3.250

6. Overview of telepathology, virtual microscopy, and whole slide imaging: prospects for the future.

Authors: Ronald S Weinstein; Anna R Graham; Lynne C Richter; Gail P Barker; Elizabeth A Krupinski; Ana Maria Lopez; Kristine A Erps; Achyut K Bhattacharyya; Yukako Yagi; John R Gilbertson
Journal: Hum Pathol Date: 2009-06-24 Impact factor: 3.466

7. Standardizing the use of whole slide images in digital pathology.

Authors: Christel Daniel; Marcial García Rojo; Jacques Klossa; Vincenzo Della Mea; David Booker; Bruce A Beckwith; Thomas Schrader
Journal: Comput Med Imaging Graph Date: 2011-01-15 Impact factor: 4.790

8. Creation of a fully digital pathology slide archive by high-volume tissue slide scanning.

Authors: André Huisman; Arnoud Looijen; Steven M van den Brink; Paul J van Diest
Journal: Hum Pathol Date: 2010-02-04 Impact factor: 3.466

Review 9. Informatics for practicing anatomical pathologists: marking a new era in pathology practice.

Authors: Manal Y Gabril; George M Yousef
Journal: Mod Pathol Date: 2010-01-15 Impact factor: 7.842

10. Whole-slide imaging digital pathology as a platform for teleconsultation: a pilot study using paired subspecialist correlations.

Authors: David C Wilbur; Kalil Madi; Robert B Colvin; Lyn M Duncan; William C Faquin; Judith A Ferry; Matthew P Frosch; Stuart L Houser; Richard L Kradin; Gregory Y Lauwers; David N Louis; Eugene J Mark; Mari Mino-Kenudson; Joseph Misdraji; Gunnlauger P Nielsen; Martha B Pitman; Andrew E Rosenberg; R Neal Smith; Aliyah R Sohani; James R Stone; Rosemary H Tambouret; Chin-Lee Wu; Robert H Young; Artur Zembowicz; Wolfgang Klietmann
Journal: Arch Pathol Lab Med Date: 2009-12 Impact factor: 5.534
