Whereas FDA-approved methods of assessment of estrogen receptor (ER) are 'fit for purpose', they represent a 30-year-old technology. New quantitative methods, both chromogenic and fluorescent, have been developed, and studies have shown that these methods increase the accuracy of assessment of ER. Here, we compare three methods of ER detection and assessment on two retrospective tissue microarray (TMA) cohorts of breast cancer patients: estimates of percent nuclei positive by pathologists and by Aperio's nuclear algorithm (standard chromogenic immunostaining), and immunofluorescence as quantified with the automated quantitative analysis (AQUA) method of quantitative immunofluorescence (QIF). Reproducibility was excellent (R² > 0.95) between users for both automated analysis methods, and the Aperio and QIF scoring results were also highly correlated, despite the different detection systems. The subjective readings show lower levels of reproducibility and a discontinuous, bimodal distribution of scores not seen by either mechanized method. Kaplan-Meier analysis of 10-year disease-free survival was significant for each method (Pathologist, P=0.0019; Aperio, P=0.0053; AQUA, P=0.0026); however, there were discrepancies in patient classification in 19 out of 233 cases analyzed. Out of these, 11 were visually positive by both chromogenic and fluorescent detection. In 10 cases, the Aperio nuclear algorithm labeled the nuclei as negative; in 1 case, the AQUA score was just under the cutoff for positivity (determined by an Index TMA). In contrast, 8 out of 19 discrepant cases had clear nuclear positivity by fluorescence that could not be visualized by chromogenic detection, perhaps because of low positivity masked by the hematoxylin counterstain. These results demonstrate that automated systems enable objective, precise quantification of ER. Furthermore, immunofluorescence detection offers the additional advantage of a signal that cannot be masked by a counterstaining agent. These data support the usage of automated methods for measurement of this and other biomarkers that may be used in companion diagnostic tests.
For decades, the value of estrogen receptor (ER) as a prognostic and predictive marker in breast cancer has been an unparalleled example of the impact of biomarker research on patient care (1–3). Its importance is such that recent discoveries of high error rates in clinical testing for ER, in both Canada and the United States, spurred an immediate reaction towards improved standardization in ER assessment (4–7), resulting in publication of guidelines for tissue processing and analysis to optimize companion diagnostic testing of ER in breast cancer specimens. As a result, research into pre-analytical variables that may influence biomarker test results has expanded dramatically (8, 9), though somewhat less attention has been paid to analytical variables, specifically those concerned with methods of estrogen receptor detection and quantification/measurement.
Prior to the current IHC-based standard, estrogen receptor expression was widely evaluated using the ligand-binding assay (LBA). This test incubated breast tissue lysate with radiolabeled estradiol and resulted in an absolute quantification (fmol/mg) of the estrogen receptor (3). However, LBAs are limited by the large tissue requirement and their inability to provide contextual information, including the capability to distinguish ER expression in benign versus malignant cells (10). Upon development of specific monoclonal antibodies (11, 12), the practical ease and cost effectiveness of immunohistochemistry (IHC) led to rapid implementation of a new clinical standard for in situ assessment of protein expression after demonstration of their prognostic and predictive value (13, 14). However, the advantages of this in situ detection method of ER were confounded by the introduction of the human eye as a measurement tool, resulting in significant reader variability (15, 16).
Over the past few decades, many platforms have endeavored to eliminate this intra- and inter-observer variability and achieve consistent evaluation of diagnostic specimens. Systems such as the CAS-200 (10) and ChromaVision’s ACIS (17) function on a principle of color deconvolution; for estrogen receptor and other nuclear markers, this allows optical density measurements of positive target staining within a nuclear counterstain (18, 19). Recently, technology has allowed the development of more rapid and sophisticated methods of digital image analysis. One such platform, the Aperio ScanScope and Digital Image Analysis Suite, combines both high-resolution image capture and quantitative assessment, and is FDA-approved to assist pathologists in ER, PR, HER2, and Ki67 measurement in breast cancer (20–22). In spite of the FDA approval, adoption is still limited. A recent CAP survey (2014) shows that fewer than 25% of more than 1100 labs surveyed use automated assessment for ER.
Despite these advances, any system relying on chromogenic immunostaining is subject to the inherent limitations of absorbance measurement, such as a low dynamic range and saturation of the signal intensity based on enzymatic visualization of the antibody. Most widely used is 3,3′-diaminobenzidine (DAB), a highly thermochemically stable polybenzimidazole that provides brown-colored staining (23). The chromogen deposition occurs through a redox reaction catalyzed by an enzyme that allows direct bright-field light microscopy assessment (24–27). Fluorescent systems of visualization and measurement are not subject to the limitations of high density and saturation. Optical detection and quantification of fluorescent signal depends on excitation and photon emission of specific wavelengths, resulting in signal intensity directly proportional to the concentration of the target of interest (28). The dynamic range of common assays with fluorophores that emit in the visible region of the spectrum is 2 to 3 times the dynamic range of chromogenic stains. Multicolor detection using spectrally resolvable fluorescent target labels makes it possible to examine several markers at once (29, 30). Several methods of quantification of fluorescent staining have been described (31). Here we use the AQUA technology because it does not require feature-based image fractionation; rather, it allows detection of biomarker expression within specific subcellular compartments, as defined by antibody-conjugated fluorophore labelling and co-localization of the target of interest with cytoplasmic or nuclear staining (32). The fluorescent intensity is measured and divided by the compartment area to yield a quantitative, continuous, and reproducible score for each field of view. This technology has been extensively validated previously in tissue microarrays as well as whole tissue sections (33, 34).
To assess the problem of user and methodological bias in quantification of estrogen receptor expression in breast cancer, we chose a three-pronged experimental approach to compare both automated (Aperio) and visual (pathologist) scoring of chromogenic staining, as well as to evaluate both of these techniques against QIF-based ER detection. Each method of staining and detection was performed with two common clinical ER antibody clones (1D5 and SP1).
Materials and Methods
Patient Cohorts and Tissue Microarray Construction
Two retrospective breast cancer cohorts were constructed from tissue obtained from the Archives of the Pathology Department at Yale University (New Haven, CT) and used to create two representative tissue microarrays (TMAs), as previously described. Briefly, YTMA 49 consists of 621 patients diagnosed between 1962 and 1982. This cohort is completely annotated with clinicopathological and follow-up information. YTMA 128 contains 235 patients diagnosed between 2003 and 2008. Cohort characteristics are summarized in Supplemental Tables 1 and 2. For both cohorts, 0.6 mm cores were taken from each specimen and combined into randomized tissue microarrays, which were cut into 5 µm sections and adhered to glass slides for immunostaining. An Index TMA consisting of cell lines with known concentrations of ER and of patient samples with variable ER expression patterns (described previously in Welsh et al. (35)) was run alongside each experiment for standardization and reproducibility purposes and to determine the threshold of detection for ER positivity for the different staining and reading methods described here.
Immunostaining with SP1
To visualize estrogen receptor expression with the rabbit monoclonal SP1 antibody (ThermoScientific, Waltham, MA), slides were baked at 60°C for 30 minutes to remove excess paraffin. Deparaffinization was performed in xylenes for two periods of 20 minutes each, after which slides were transferred to 100% ethanol and rehydrated to water in grades of ethanol. Heat-induced antigen retrieval took place in a PT module (LabVision, Kalamazoo, MI), where slides were immersed in sodium citrate buffer (pH 6) for 20 minutes at 97°C. Slides were then rinsed in distilled water, transferred to a solution of 0.75% H2O2 in methanol for 30 minutes at room temperature to block endogenous peroxidases, and rinsed again in distilled water. They were then transferred to a LabVision autostainer, where the remaining staining steps were performed at room temperature, with rinses in tris-buffered saline/0.05% Tween-20 (TBST) between each stage. Nonspecific binding was blocked by a 30-minute incubation in 0.3% bovine serum albumin (BSA) diluted in TBST.
For chromogenic visualization, slides were incubated for 1 hour with SP1 antibody (1:100) in BSA-TBST, then with anti-rabbit EnVision (Dako) for 1 hour. Signal was developed for 5 minutes in 3,3′-diaminobenzidine solution (Dako; prepared according to manufacturer instructions), followed by counterstaining for 1 minute with hematoxylin (Tacha’s automated hematoxylin, BioCare Medical, Concord, CA). Slides were removed from the autostainer and coverslipped with Prolong Gold mounting medium (Life Technologies).
For slides to be visualized with fluorescence, a cocktail of SP1 antibody (1:100) and mouse pan-cytokeratin (Dako, Carpinteria, CA; 1:100) in 0.3% BSA-TBST was added for 1 hour. The slides were then incubated with a secondary antibody cocktail of goat anti-mouse AlexaFluor 546 (Life Technologies) diluted 1:100 in anti-rabbit EnVision (Dako) for 1 hour. Signal was amplified with Cy5-tyramide (Perkin Elmer, Waltham, MA) for 10 minutes, and nuclear staining was accomplished with 10 µg/mL DAPI (Life Technologies) in BSA-TBST for 20 minutes. Slides were then removed from the autostainer and coverslipped using Prolong Gold mounting medium (Life Technologies).
Immunostaining with 1D5
For estrogen receptor visualization with the 1D5 antibody (Dako) and subsequent analysis with Aperio’s FDA-approved nuclear algorithm, slides were stained according to the clinical site protocol for 1D5 as described previously (22).
ER 1D5 slides intended for fluorescent visualization were immunostained according to the same protocol as described for ER SP1. Slides were incubated in a primary antibody cocktail containing 1D5 (1:50) and pan-cytokeratin (rabbit polyclonal, Dako) at 1:100 in BSA-TBST for 30 minutes, followed by a secondary cocktail of goat anti-rabbit AlexaFluor 546 (1:100) in anti-mouse EnVision (Dako) for 30 minutes, as well as signal amplification with Cy5 and DAPI staining.
Aperio Nuclear Algorithm
For analysis with Aperio’s nuclear algorithms, chromogenic slides were scanned to create bright field digital images using the ScanScope CS (Aperio, Vista, CA). All digital images were viewed in ImageScope and analysis performed in Spectrum, elements of the Aperio image review and analysis suite. Slide images were first segmented to obtain a single image for each tissue microarray spot, after which the pen tool was used to circle (“annotate”) tumor areas for each spot. This was refined by use of a negative pen tool to subtract stromal areas enclosed by tumor, to ensure analysis would be restricted to tumor only.
For ER 1D5 scoring with the FDA-approved nuclear algorithm on YTMA 128, tissue microarray spot images were first annotated to exclude stroma and restrict analysis to tumor areas only. The algorithm was then run on each spot to generate both a markup image (showing scoring for individual nuclei) and a percent positive nuclei score for each spot.
For ER SP1 scoring, the unlocked nuclear algorithm was modified to take into account a darker counterstain and improve color deconvolution, but was otherwise not altered from the settings of the FDA-approved nuclear algorithm. The nuclear algorithm input includes a section for red, green, and blue absorbance (OD) values for the hematoxylin counterstain in order to facilitate deconvolution from the nuclear stain, which has its own set of OD values. ImageScope’s Image Quality feature was used to measure the RGB OD values within negative control spots. These were then averaged for the slide, substituted for the defaults, and the resultant algorithm saved and used to generate ER scores as percent positive nuclei in annotated spot images. The counterstain RGB values were determined separately for each slide stained with SP1, to account for subtle variations in hematoxylin counterstaining between slides.
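The counterstain calibration step can be illustrated with a short sketch. It assumes 8-bit RGB pixels, a white background level of 255, and the standard Beer-Lambert definition of optical density, OD = -log10(I / I0). The function names and the sample pixel values are hypothetical, and this is not Aperio's implementation, only an illustration of averaging per-channel OD over pixels from a negative control spot.

```python
import math

def optical_density(intensity, background=255.0):
    """Beer-Lambert optical density for a single 8-bit channel value."""
    # Clamp to 1 so a fully black pixel does not produce log(0).
    clamped = max(float(intensity), 1.0)
    return -math.log10(clamped / background)

def mean_rgb_od(pixels, background=255.0):
    """Average the per-channel OD over a list of (R, G, B) pixels."""
    if not pixels:
        raise ValueError("no pixels supplied")
    sums = [0.0, 0.0, 0.0]
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            sums[channel] += optical_density(value, background)
    return tuple(total / len(pixels) for total in sums)

# Hypothetical hematoxylin-only pixels sampled from a negative control spot.
control_pixels = [(120, 110, 180), (130, 115, 185), (125, 112, 178)]
counterstain_od = mean_rgb_od(control_pixels)
```

In this sketch the resulting triple would play the role of the slide-specific counterstain OD values that replace the algorithm defaults.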
Pathologist Scoring
YTMA 49 and YTMA 128 slides with estrogen receptor staining visualized by DAB were submitted to 3 board-certified pathologists (Path1, Path2 and Path3), who estimated percent positive nuclei using the digital images acquired by Aperio’s ScanScope CS. Tissue microarray spots denoted by a pathologist to contain no invasive breast cancer were excluded from further analysis in all three ER assessment methods, as were spots with diffuse cytoplasmic staining instead of specific nuclear signal.
Automated Quantitative Analysis (AQUA)
Immunofluorescence staining for both SP1 and 1D5 antibodies was quantified using automated quantitative analysis (AQUA) as previously described (32). Briefly, monochromatic images for each of the DAPI, Cy3, and Cy5 channels were captured for each tissue microarray spot, using an automated PM-2000 microscope platform (Genoptix/Novartis). The cytokeratin expression (Cy3) was used to binarize pixels to create an epithelial tumor mask. DAPI staining within this tumor mask was used to create a nuclear compartment, in which estrogen receptor expression (Cy5) was measured as the sum of all pixel intensities, divided by the area of the nuclear compartment. Scores were then individually normalized according to exposure time, bit depth, and lamp hours to allow direct comparison between spots on the same slide.
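The compartment-based scoring described above can be sketched in a few lines. This is a simplified illustration, not the AQUA software: it assumes fixed intensity thresholds for the cytokeratin and DAPI channels (the thresholds, the function name, and the toy images are hypothetical) and omits the normalization for exposure time, bit depth, and lamp hours.

```python
def nuclear_compartment_score(cytokeratin, dapi, target,
                              ck_threshold, dapi_threshold):
    """Sum target intensity inside the nuclear compartment, divided by its area.

    Each image is a list of rows of pixel intensities with identical shape.
    """
    total = 0.0
    area = 0
    for ck_row, dapi_row, target_row in zip(cytokeratin, dapi, target):
        for ck, nuc, signal in zip(ck_row, dapi_row, target_row):
            in_tumor_mask = ck >= ck_threshold                    # epithelial mask
            in_nucleus = in_tumor_mask and nuc >= dapi_threshold  # nuclei in mask
            if in_nucleus:
                total += signal
                area += 1
    if area == 0:
        return None  # no nuclear compartment, so the spot cannot be scored
    return total / area

# Toy 2 x 3 images for the Cy3 (cytokeratin), DAPI, and Cy5 (ER) channels.
cytokeratin = [[200, 200, 10], [200, 10, 10]]
dapi = [[150, 20, 150], [150, 150, 20]]
er_signal = [[100, 50, 900], [300, 700, 5]]

score = nuclear_compartment_score(cytokeratin, dapi, er_signal,
                                  ck_threshold=100, dapi_threshold=100)
```

Only the two pixels that are both cytokeratin-positive and DAPI-positive contribute, so bright target signal in stromal nuclei (the 900 and 700 values here) is excluded from the score.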
Statistical Analysis
Regression analysis to assess method and assay reproducibility was performed in Microsoft Excel 2010, and results were confirmed in the StatView software platform (SAS Institute, Inc., Cary, NC), by means of Pearson coefficients and ANOVA testing. Kaplan-Meier survival analysis was performed using StatView for each ER scoring method, and statistical significance was assessed using the log-rank test.
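For readers unfamiliar with the survival estimate, the Kaplan-Meier product-limit calculation can be sketched as follows. This is a minimal illustration in plain Python rather than the StatView procedure used in the study; the function name and the example follow-up data are hypothetical, and the log-rank test is not implemented here.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    events[i] is True if the event (e.g. recurrence) occurred at times[i],
    and False if the patient was censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        event_count = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == current_time:
            if data[i][1]:
                event_count += 1
            leaving += 1
            i += 1
        if event_count:
            survival *= (at_risk - event_count) / at_risk
            curve.append((current_time, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up times (years) and event indicators for five patients.
curve = kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False])
```

Censored patients reduce the number at risk without lowering the survival estimate, which is why the curve only steps down at times where an event occurred.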
Results
Fluorescent and Chromogenic Assessment
To evaluate methods of estrogen receptor visualization and measurement, immunostaining was performed on serial sections of two breast cancer tissue microarray cohorts collected at Yale, as previously described (35). Figure 1 shows examples of low and high estrogen receptor expression with both chromogenic and fluorescent detection methods on serial sections. Digital images of each slide were then captured for further analysis (Figure 2).
Figure 1
Examples of estrogen receptor staining in breast cancer tissue microarrays by both chromogenic and fluorescent methods. a) Low and b) high expression as visualized by 3,3′-diaminobenzidine; on corresponding serial sections, c) low and d) high expression as seen via conjugation with Cy5-tyramide.
Figure 2
A demonstration of the components of fluorescent and chromogenic quantification as utilized by AQUA and Aperio’s nuclear algorithm, respectively. Panel A shows simultaneous visualization of nuclei (blue, DAPI), pan-cytokeratin (green, AlexaFluor 546), and estrogen receptor (red, Cy5-tyramide) in a single tissue microarray spot. The AQUA program generates a tumor mask compartment from cytokeratin expression, further refines it into a nuclear compartment using DAPI positivity, and measures target signal intensity in the nuclear compartment. Panel B illustrates typical chromogenic staining for estrogen receptor in a strongly-positive case, as visualized by diaminobenzidine (DAB) and counterstained with hematoxylin. The tumor areas are manually outlined (annotated; green line) by the user to exclude stromal nuclei. Aperio’s nuclear algorithm then uses morphological characteristics and the hematoxylin counterstain to identify nuclei. DAB intensity is then measured on a per-cell basis to determine positivity, and a markup image generated to illustrate results. Nuclei are binned into four categories to mimic pathologist intensity scoring: negative (blue = 0), weak positive (yellow = 1), positive (orange = 2), and strong positive (red = 3).
Fluorescent detection slides were scanned at 20X to collect images from the DAPI, Cy3 (cytokeratin) and Cy5 (ER) channels (Figure 2A). These images were then analyzed with the AQUA software, which created an epithelial tumor mask from cytokeratin expression, then used DAPI expression within this mask to form a nuclear compartment. ER signal was quantified as the sum of pixel intensities divided by the nuclear compartment area and normalized to generate a Nuclear AQUA Score for each patient.
Chromogenic detection slides were scanned using Aperio’s ScanScope CS digital image acquisition system, and board-certified pathologists scored percent positive nuclei for each tissue microarray spot using these digital images. The images were then manually annotated by a trained technician to exclude stromal areas, and analyzed with Aperio’s nuclear algorithm. Nuclei are binned into four categories (negative nuclei or weak, medium, and strong positive nuclei), and a markup image is created to reflect scoring results (Figure 2B). Aperio’s nuclear algorithm quantifies the annotated tissue for percent positive nuclei as well as staining intensity according to four predefined categories, resulting in a semi-quantitative scoring system.
Antibody and User Variability
Our first step was to examine the relationship between ER 1D5 and ER SP1 scoring on YTMA 128 by all three methods of assessment (Figure 3). While all methods show a correlation between the 1D5 and SP1 scores (Figure 3C), the relationship changes as a function of the method. Despite following the clinical site protocol precisely, we observed a titration-independent, light brown haze over the tissue stained with the 1D5 antibody that was not present with SP1. As we wished to omit antibody-specific variables confounding reading and interpretation of the slides, all further analysis was performed using the ER SP1 clone.
Figure 3
A comparison of antibody clones 1D5 and SP1 as they affect manual and automated assessment of estrogen receptor expression on YTMA 128. a) Pathologist scoring of 1D5 vs. pathologist scoring of SP1. b) Aperio’s FDA-approved nuclear algorithm scoring of 1D5 vs. scoring of SP1 by a modified version of Aperio’s nuclear algorithm. c) AQUA scoring of 1D5 vs. SP1 in the nuclear compartment.
To assess operator-based reproducibility, each analysis method was completed by two different operators, allowing assessment of the subjective component of each scoring method (Figure 4). The Pearson coefficients (R) were above 0.9 for all methods, but both automated scoring methods had higher reproducibility (R > 0.95) between different operators. The regression R² between pathologists 1 and 2, as assessed by traditional visual scoring methods, was 0.92. The non-continuity of the scores can also be seen in Figure 4A. The regression R² between the Aperio scores for two users was 0.96, showing better performance than traditional scoring but still suggesting some element of subjectivity. When two different users completed the AQUA scoring, the regression was nearly perfect (0.995), suggesting minimal user variation.
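The inter-operator comparison can be illustrated with a short calculation of the Pearson coefficient and the corresponding R² (for a simple linear regression with one predictor, R² equals the square of Pearson's R). This is a sketch in plain Python, not the Excel or StatView workflow used in the study, and the two score lists are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / math.sqrt(var_x * var_y)

# Hypothetical percent-positive scores from two operators on the same spots.
operator_a = [0, 5, 40, 80, 95, 100]
operator_b = [0, 10, 35, 85, 90, 100]

r = pearson_r(operator_a, operator_b)
r_squared = r ** 2
```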
Figure 4
Inter-user reproducibility for methods used to quantify estrogen receptor expression. a) Pathologist scoring and b) Aperio nuclear algorithm assessment of ER positivity were reported as percent positive nuclei (chromogenic visualization), and c) AQUA quantification as Nuclear AQUA Score (fluorescent visualization).
Assessment Methods Comparison
We then examined variability between methods using a linear regression analysis for continuous data (Figure 5). Although the pathologist data are not truly continuous, the estimates of percentage of positive nuclei were assumed to be continuous for the purposes of this analysis. The regression between either pathologist’s percent positive nuclei scores and the scores from Aperio’s nuclear algorithm showed a non-linear relationship in which the pathologist scores were consistently higher than those generated by the Aperio nuclear algorithm (Figure 5A). There were essentially no cases where the pathologist estimate was below the Aperio score. A similar pattern was seen with AQUA scores. Although AQUA measures pixel intensity of the target of interest (ER in this study) as opposed to percent positivity, it has a similar relationship when compared to pathologist scoring (Figure 5B). The closest relationship between any two methods is clearly between the two types of automated scoring, despite the different detection techniques (Figure 5C). However, comparing the two automated scoring methods reveals the lower dynamic range and enzymatic saturation of the DAB signal as compared to fluorescent measurement.
Figure 5
Relationships between methods used to assess estrogen receptor. a) Aperio’s nuclear algorithm vs. pathologist scoring; b) AQUA vs. pathologist scoring; and c) AQUA vs. Aperio’s nuclear algorithm.
Survival Analysis and Discordance
While regressions help us examine the similarities and differences in ER quantification methods, they do not provide any case-specific information on patient classification into the ER-negative or ER-positive groups. Furthermore, comparison of tests is more valuable when it can be assessed as a function of patient outcome. To see how the three assessment methods compared on this basis, we looked at their determination of ER status for patients on YTMA 49, a large, historic cohort collected at Yale between 1962 and 1982. The 10-year disease-free survival Kaplan-Meier curves are very similar between all three methods (Figure 6), but their differences can be seen in the summary table (Table 1). When the continuous scores are binarized to generate positive or negative output, only 19 of 233 total cases were discordant. There was only 1 case that was positive by pathologist and Aperio scoring but negative by AQUA. In contrast, there were 10 cases that were positive by pathologist and AQUA but negative by Aperio. There were 3 cases that were positive by pathologist and negative by the AQUA and Aperio methods; finally, 5 cases were positive by AQUA and negative by pathologist and Aperio scoring. The number of discordant cases is too small to evaluate which method better correlates with outcome.
Figure 6
Kaplan-Meier survival analysis of breast cancer patients on YTMA 49 with estrogen receptor negative (blue) and positive (red) tissue, as measured by: a) pathologist, b) Aperio’s nuclear algorithm, and c) AQUA. The cutoff used for pathologist scoring and Aperio’s nuclear algorithm was 1% positive nuclei, as per ASCO-CAP guidelines. The ER positivity threshold for AQUA was determined using an Index TMA with positive and negative cell lines stained alongside YTMA 49. The numbers of positive and negative cases in each group are summarized in Table 1.
Table 1
Summary of ER assessment method discordance on YTMA 49.
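| Pathologist | Aperio | AQUA Positive | AQUA Negative | Total |
|---|---|---|---|---|
| Positive | Positive | 170 | 1 | 171 |
| Positive | Negative | 10 | 3 | 13 |
| Positive | Total | 180 | 4 | 184 |
| Negative | Positive | 0 | 0 | 0 |
| Negative | Negative | 5 | 44 | 49 |
| Negative | Total | 5 | 44 | 49 |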
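The tallies in Table 1 can be cross-checked with a short script. The sketch below binarizes each case with the 1% cutoff for pathologist and Aperio scores and the AQUA threshold of 110 reported for this cohort, then counts discordant calls. Treating a score equal to the cutoff as positive is an assumption, and the function name is hypothetical; the counts themselves are transcribed from Table 1.

```python
def er_calls(pathologist_pct, aperio_pct, aqua_score,
             pct_cutoff=1.0, aqua_cutoff=110.0):
    """Binarize one case into (pathologist, Aperio, AQUA) positive calls."""
    # Assumption: a score exactly at the cutoff counts as positive.
    return (pathologist_pct >= pct_cutoff,
            aperio_pct >= pct_cutoff,
            aqua_score >= aqua_cutoff)

# Counts transcribed from Table 1, keyed by (pathologist, Aperio, AQUA) calls.
table1 = {
    (True, True, True): 170,
    (True, True, False): 1,
    (True, False, True): 10,
    (True, False, False): 3,
    (False, True, True): 0,
    (False, True, False): 0,
    (False, False, True): 5,
    (False, False, False): 44,
}

total_cases = sum(table1.values())
# A case is discordant when the three methods do not all agree.
discordant = sum(n for calls, n in table1.items() if len(set(calls)) > 1)
discordance_pct = 100.0 * discordant / total_cases
```

With these counts the script gives 19 discordant cases out of 233, which rounds to the 8.2% discordance quoted in the text.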
These discordant cases were carefully reviewed by an independent pathologist, who was not involved in previous readings, to determine reasons for discordance (images not shown). In the 1 case positive by the pathologist and Aperio but negative by AQUA, there was clear nuclear fluorescent staining visible by eye, but the nuclear AQUA score for that case was 107, just barely below the threshold of 110 (in a set of scores which ranged from 0 to 12,500). In contrast, for the 5 cases positive by AQUA and negative by pathologist and Aperio scoring, low but clearly positive fluorescent nuclear staining can be seen by eye, whereas by chromogenic detection no nuclear staining is detectable. This may be due to masking by the hematoxylin counterstain on these particular spots. Similarly, the 10 cases positive by pathologist and AQUA but negative by Aperio have clearly visible nuclear staining on both the fluorescent and chromogenic detection slides, but, for unknown reasons, the hematoxylin counterstain appears somewhat darker than on most spots on the slide, and the nuclear staining was not detected by the Aperio algorithm. Finally, in the 3 cases which were positive by pathologist scoring and negative by the AQUA and Aperio algorithms, closer pathologist examination was unable to determine whether the cells considered positive contained extremely strong hematoxylin or were in fact positive for diaminobenzidine (spots appeared black).
In an effort to test the flexibility and performance of the Aperio nuclear algorithm, we attempted to further adjust the RGB values for the counterstain levels to see if the algorithm would pick up the 10 false-negative cases. However, we were unable to find a set of values that would satisfy all cases. When settings were changed to allow the algorithm to recognize these 10 cases as positive, the altered algorithm then classified clearly negative nuclei as positive in other cases, or picked up far fewer nuclei than were actually present.
Discussion
The 2010 ASCO-CAP guidelines for estrogen receptor assessment recommend image analysis to quantify percent positive tumor cells (5), especially as it is difficult to reliably score to a 1% threshold without laboriously counting individual cells. Aside from assisting pathologists, automated analysis systems such as the Aperio ScanScope XT and its associated algorithms have also been shown to be useful in discovery of more complex relationships between biomarkers (36). Here we show that one method of automated chromogenic assessment shows good reproducibility and prognostic value, but compared to fluorescence, is limited by the nature of chromogenic staining itself. Chromogenic staining requires a counterstain to provide context, but this counterstain introduces inherent complications to objective scoring. It is well-known that the quality and intensity of hematoxylin counterstaining varies among preparations, vendors and protocols, over the lifetime of the reagent, and also between cell and tissue types. The CAS-200 platform is an example of a system that required adjustments to account for counterstain differences between slides and batches (37). In the clinic, when a patient case has an obvious problem with the counterstain, the slide can be sent back and another stain requested. But, there is still a chance that even “acceptable” counterstaining can mask low-level chromogenic staining, whether by eye or by automated color-deconvolution (or spectral unmixing) analysis, as occurred in 5 cases in this study (38). Previous unpublished work from our lab suggests that there are a number of cases where dark staining with hematoxylin, due either to tissue variation or pathologist preference, has obscured low level ER expression to generate a false negative test.Fluorescent detection avoids the disadvantages and limitations of the hematoxylin counterstain, but has other limitations. 
Specifically, the absence of hematoxylin makes it challenging to generate the cellular context with a conventional IHC appearance. While additional fluorophores can be used to visualize other tissue features, the image is still quite different from conventional IHC. QIF is also generally costlier than traditional IHC.Unfortunately the cost analysis of automated ER evaluation in clinical lab settings is beyond the scope of this manuscript and this information is not accessible to us. One could imagine though that routine ER assessment might be performed using regular DAB based immunohistochemistry as established and just the cases that are negative by this assay could be sent out to laboratories that offer fluorescent based assays, taking advantage of increased sensitivity of this assay for low expressing biomarkers. Other advantages of QIF consist of broader dynamic range, dynamic adjustment of exposure time and decreased requirement for human interface for tumor selection.Perhaps the greatest advantage of quantitative immunofluorescence lies in the potential to generate a standard curve which can be used to establish a defined, reproducible cutoff for every assay. This method also has the potential to enable more accurate quantification of biomarker expression (38). Recent studies have demonstrated that quantification by ELISA can provide more accurate assessment of patient outcome than qualitative immunohistochemistry, and may even demonstrate a distinct benefit between negative, moderate, and strong ER positivity rather than just between positive and negative groups (39). This advantage extends beyond analysis of estrogen receptor in breast cancer to most accurate quantification of biomarker expression levels in various cancer and tissue types.While this study of comparison of different methods of ER analysis was performed in a rigorous and tightly controlled manner, it is subject to a number of limitations. 
ER expression was evaluated on TMAs, which allow a high-throughput approach but do not truly represent the clinical setting, where biopsies or whole tissue sections are routinely stained and evaluated for the biomarker in question. One can argue that discordances in ER assessment are due to the small amount of tumor represented in a 0.6 mm TMA core. This might be a valid argument regarding ER heterogeneity, as a 0.6 mm core might not always represent the ER status of a whole tissue section. However, the different staining methods were performed on serial sections, reducing heterogeneity between methods to a minimum. Also, this argument does not account for false-negative readings caused by variability in hematoxylin staining intensity. Moreover, the three methods of ER analysis were also compared on a number of whole tissue sections (around 25 samples for this study). These data are not shown in the manuscript because they did not provide additional information. The results of ER analysis on whole tissue sections did not show any discrepancies between the methods, probably because of the low number of cases.

Another limitation of this study is that staining and analysis were performed within a single institution. While this approach guarantees consistency in pre-analytical tissue processing and analytical procedures, the results would be more robust if more than one laboratory had participated in the study.

Also, this study does not reveal a significant difference among the ER reading methods with regard to survival analysis. However, this observation might be due to the relatively small number of patients included in the survival analysis. To determine the best prognostic and predictive value of these tests by Kaplan-Meier analysis, a larger number of patients would need to be analyzed with all three methods.

In summary, each of the methods of in situ protein detection in FFPE tissue samples has its strengths and weaknesses.
While conventional DAB-based IHC is a well-established and inexpensive procedure, the reproducibility and sensitivity of the scoring depend on the counterstain and on the reading method (by eye or automated). Quantitative immunofluorescence, on the other hand, offers an automated and standardized approach to biomarker evaluation. The higher sensitivity of the assay and its broader dynamic range facilitate more exact measurements of protein concentrations. The increased cost of QIF and the absence of a hematoxylin counterstain to provide cellular context with a conventional IHC appearance need to be considered.

In theory, quantitative immunofluorescence can combine the best of both worlds: in situ evaluation of a biomarker and rigorous quantification. Our data here, together with previous work by us and others, suggest that patient care may be improved with quantitative assessment. While the percentage of discordant cases in this study (8.2%) is relatively low, and in keeping with the variability reported in other studies (40), a more objective estimate of ER positivity could benefit hundreds of thousands of women worldwide.
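
As a rough check on the discordance figure quoted above, the arithmetic can be sketched in a few lines of Python. The counts are those reported in the abstract (19 discrepant classifications among 233 cases, split into 10, 1, and 8). The variable names and category labels are illustrative assumptions and are not part of the study's analysis pipeline.

```python
# Counts as reported in the abstract; labels are illustrative, not the authors' terms.
total_cases = 233
discordant = {
    "visually_positive_but_aperio_negative": 10,
    "aqua_score_just_under_cutoff": 1,
    "positive_by_fluorescence_only": 8,
}

# Total number of discordant cases and the overall discordance rate.
n_discordant = sum(discordant.values())
discordance_rate = n_discordant / total_cases

print(f"{n_discordant} of {total_cases} cases discordant ({discordance_rate:.1%})")

# Share of each category among the discordant cases.
for label, n in discordant.items():
    print(f"  {label}: {n} ({n / n_discordant:.0%} of discordant cases)")
```

The three categories sum to 19, and 19/233 rounds to the 8.2% given in the text.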
Table 2
Hazard Ratios for ER positivity in unselected breast cancer cohort YTMA 49 as diagnosed by different reading methods: