Miglena Komforti,1 Erinn Downs-Kelly,1 Francisco Sapunar,2 Sameera R Wijayawardana,3 Aaron M Gruver,4 Sunil S Badve5,6
1. Robert J. Tomsich Pathology & Laboratory Medicine Institute, Cleveland Clinic, Cleveland, OH.
2. Global Medical Affairs, Eli Lilly and Company, Basingstoke, UK.
3. Global Statistical Sciences - Oncology.
4. Clinical Laboratory Sciences, Eli Lilly and Company.
5. Department of Pathology & Laboratory Medicine, Indiana University School of Medicine, Indianapolis, IN.
6. Pathology & Laboratory Medicine, Emory University School of Medicine, Atlanta, GA.
Abstract
The objective of this study was to measure concordance of results obtained from the US Food and Drug Administration-approved Ki-67 immunohistochemistry MIB-1 pharmDx assay performed on the Dako Omnis automated staining instrument (Omnis) versus results produced from the assay reagents applied using an optimized protocol on the more widely available Autostainer Link 48 (ASL48) platform. Tissue sections obtained from 40 formalin-fixed paraffin-embedded breast carcinoma samples, with available Oncotype DX Breast Recurrence Score (RS) results, were stained. Three certified pathologists scored slides at 3 timepoints, totaling 360 observations for each instrument (N=720 total) using the approved scoring approach. Using the ≥20% cutoff, agreement was calculated with corresponding 2-sided 95% percentile bootstrap confidence intervals (CIs). Pairwise comparisons (N=360) from the interinstrument evaluation, performed with all observers, resulted in 325 (90.3%) concordant outcomes (244 negative and 81 positive) and 35 (9.7%) discordant outcomes. The overall agreement was 90.3% (95% confidence interval, 85.6% to 94.4%). No significant systematic differences were observed between instruments. Specimens scored from the Omnis were on average <1% higher than ASL48, with high correlation and little bias between the continuous Ki-67 scores (concordance correlation coefficient=0.916). Most specimens with a Ki-67 score ≥20% had a RS >25. This study demonstrated that good concordance can be achieved with the reagents run on the ASL48 instrument when using an optimized protocol and standardized scoring.
Approximately 90% of patients with breast cancer are diagnosed at an early disease stage, and the hormone receptor-positive (HR+), human epidermal growth factor receptor 2-negative (HER2−) subtype accounts for nearly 73% of all breast cancers.1 These patients are treated with curative intent and are candidates for surgery with or without radiotherapy depending upon the extent of regional disease.2,3 After surgery, adjuvant treatment is based on the estimated risk of disease recurrence and predicted sensitivity to available systemic therapies.2,3 Validated clinical and pathologic features that indicate a higher risk of distant disease recurrence include large primary tumor size, axillary lymph node involvement, and high histologic grade.4–6 As many as 41% of women with HR+ breast cancer initially diagnosed with early-stage disease experience distant recurrence after standard of care, including adjuvant endocrine therapy (ET), depending upon lymph node status and tumor grade.7
The Ki-67 antigen is a nuclear protein expressed during all active phases of the mammalian cell cycle (G1, S, G2, and M-phases) and downregulated in resting cells (G0-phase).8 Ki-67 expression has been widely studied as a marker of cell proliferation and as an independent prognostic factor in early breast cancer (EBC).9 In HR+ breast cancer, patients with high levels of Ki-67 after surgery have been shown to have higher recurrence rates while receiving adjuvant ET.10
Abemaciclib was the first cyclin-dependent kinase 4 and 6 (CDK 4/6) inhibitor to receive US Food and Drug Administration (FDA) approval, in combination with ET (tamoxifen or an aromatase inhibitor), for the adjuvant treatment of adult patients with HR+, HER2−, node-positive EBC at high risk of recurrence and a Ki-67 score of ≥20%, as determined by an FDA-approved test.11 This approval was based on data from the monarchE trial, a randomized, open-label, global phase 3 study comparing treatment with abemaciclib combined with standard adjuvant ET versus ET alone in patients with high-risk, node-positive, early-stage, HR+, HER2− breast cancer.12 In this population, adjuvant abemaciclib combined with ET resulted in clinically meaningful improvements in invasive disease-free and distant recurrence-free survival outcomes.13,14
Until recently, the lack of standardized procedures or accepted cutoff definitions has limited the application of Ki-67 assessment during breast cancer workup in some geographies.9,15 Therefore, an investigational assay performed with a standardized protocol and interpretation guidelines was developed to help determine the eligibility of patients with high-risk EBC as part of the monarchE trial.16 With the abemaciclib approval described earlier came the simultaneous US FDA approval of the investigational Ki-67 immunohistochemistry (IHC) assay used in monarchE to serve as a companion diagnostic to aid in identifying patients with EBC at high risk of disease recurrence for whom adjuvant treatment with abemaciclib in combination with ET is being considered.17 Although Ki-67 IHC is a commonly performed assay using a variety of commercially available in vitro diagnostic reagents, many diagnostic laboratories do not have access to specific automated IHC instruments, which are a part of Class III devices approved by the US FDA (ie, the Dako Omnis). However, similar instruments such as the ASL48 are more widely available in US laboratories because of the proliferation of PD-L1 IHC testing.18,19 Given this limitation, this study was conducted to evaluate the analytic performance of Ki-67 IHC MIB-1 pharmDx reagents applied using the Dako Omnis automated staining instrument (Omnis) compared with an optimized protocol on the more prevalent Dako Autostainer Link 48 (ASL48) instrument. Here, we test the hypothesis that these reagents can produce similar results on either staining platform.
MATERIALS AND METHODS
Deidentified human tissue was obtained according to the protocols and procedures of the Institutional Review Board at Indiana University School of Medicine (#2001022404). The specimens selected were HR+, HER2−, as determined by the IHC staining for estrogen receptor, progesterone receptor, and HER2 expression before study inclusion. All specimens were also previously tested via the Oncotype DX assay (Exact Sciences, Redwood City, CA), and historical RS information was used for comparisons.20 Sections from these blocks were cut at 4 µm thickness, mounted on positively charged glass slides, oven dried at 58±2°C for ~1 hour, and stored in the dark at 2 to 8°C until staining (within 1 mo of microtomy). The Ki-67 IHC MIB-1 pharmDx (Dako Omnis) (GE020) kit, necessary additional materials referenced in the package insert,21 and automated IHC instruments were obtained from Agilent Technologies Inc. (Santa Clara, CA). An optimal protocol for the ASL48 instrument was selected before study initiation, and staining was performed as described in Table 1. The FDA-approved assay was performed according to the instructions for use.
TABLE 1
Immunohistochemistry Procedure on the Omnis Versus ASL48 Automated IHC Instrument
Staining Step | Reagent | Omnis Procedure Incubation, min:s (Number of Cycles) | ASL48 Procedure Incubation, min:s (Number of Cycles)
ASL48 indicates Dako Autostainer Link 48; DI, deionized; HRP, horseradish peroxidase; IHC, immunohistochemistry; min, minutes; s, seconds.
The final test set comprised 30 negative (Ki-67 score <20%) and 10 positive (Ki-67 score ≥20%) samples, as determined by the consensus IHC status of the Omnis scores across all observers and reads. Of the 40 samples, 19 (47.5%) were near the diagnostic cutoff established for the monarchE phase 3 clinical study (defined as a Ki-67 score in the 10% to 30% range16). Specimens were examined by 3 certified pathologists, at 3 different timepoints (40 slides × 9 observations/slide=360 readings for each instrument or 720 data points total) (Fig. 1). Evaluations were performed utilizing whole-slide images captured using a PANNORAMIC 250 scanner (3DHISTECH, Budapest, Hungary) at ×40 magnification and displayed utilizing the PathoTrainer whole-slide image viewer (CellCarta, Belgium). Immunoreactivity was assessed using the Ki-67 pharmDx Score, as described.22 The run-to-run variation for the Ki-67 IHC MIB-1 pharmDx assay (eg, interday, interinstrument, interlot, and interrack) has been previously described.16 For the accuracy of results and the reduction of recall bias, observers were blinded to the identifiers for specimens used in all studies. A minimum 14-day washout period was applied between reads (Fig. 1).
FIGURE 1
Study design for comparison of the Ki-67 immunohistochemistry (IHC) MIB-1 pharmDx reagents run on the Dako Omnis and Dako Autostainer Link 48 instruments. (a) A minimum 14-day washout period was applied between reads. ASL48 indicates Dako Autostainer Link 48; BC, breast carcinoma; H&E, hematoxylin and eosin; NCR, negative control reagent; OMNIS, Dako Omnis automated staining instrument.
For the concordance analysis, 2-sided 95% confidence intervals (CIs) were calculated using intraobserver and intraread pairwise comparisons. Comparisons were made on the IHC status (positive/negative) between each test condition (ASL48) and the consensus reference condition (Omnis) for each specimen within each read by each observer. The total number of comparisons per specimen is therefore equal to the total number of test scores, excluding the reference score (3 observers × 3 reads per observer=9 comparisons per sample). Calculations of negative percent agreement, positive percent agreement, and overall percent agreement were performed. As multiple ASL48 sections per specimen were compared with the reference, a nonparametric percentile bootstrap was used to calculate 2-sided 95% CIs. Continuous Ki-67 percentage scores from both instruments were evaluated using scatter and difference plots. The concordance correlation coefficient (CCC) with corresponding 95% CI was also calculated to evaluate correlation and bias in the data. The CCC values close to 1 represent paired data in which high correlation and no bias between the 2 conditions is observed.23 Mean Ki-67 percentage scores (instrument/observer/read) were used to evaluate the relationship with RS.
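The percentile bootstrap for clustered agreement data can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The source does not state the resampling unit, so resampling whole specimens (so that the 9 paired reads of a specimen stay together) is an assumption, as are the function names, the number of resamples, the seed, and the synthetic toy data.

```python
import random

def overall_agreement(pairs):
    """Share of (reference_positive, test_positive) pairs with the same status."""
    return sum(ref == test for ref, test in pairs) / len(pairs)

def cluster_percentile_bootstrap(by_specimen, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling specimens (clusters) with replacement.

    Assumption: the specimen is the resampling unit; the source only says a
    nonparametric percentile bootstrap was used.
    """
    rng = random.Random(seed)
    specimens = list(by_specimen)
    estimates = []
    for _ in range(n_boot):
        drawn = [rng.choice(specimens) for _ in specimens]
        pooled = [pair for s in drawn for pair in by_specimen[s]]
        estimates.append(stat(pooled))
    estimates.sort()
    lower = estimates[round((alpha / 2) * n_boot)]
    upper = estimates[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Synthetic toy data (not study data): 40 specimens x 9 paired reads,
# with 3 discordant pairs in each of 4 specimens.
toy = {}
for i in range(40):
    ref = i < 10
    if i % 10 == 0:
        toy[i] = [(ref, not ref)] * 3 + [(ref, ref)] * 6
    else:
        toy[i] = [(ref, ref)] * 9

point = overall_agreement([p for pairs in toy.values() for p in pairs])
lo, hi = cluster_percentile_bootstrap(toy, overall_agreement)
print(f"OPA = {point:.3f}, 95% CI {lo:.3f} to {hi:.3f}")
```

Resampling at the specimen level keeps correlated reads of the same tissue together, which generally gives wider and more honest intervals than treating all 360 comparisons as independent.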
RESULTS AND DISCUSSION
Qualitative assessment of the stained slides demonstrated the immunoreactivity produced with reagents run on the ASL48 was similar but slightly more intense when compared with the Omnis instrument on average (Fig. 2). Concordance analysis resulted in a point estimate for positive percent agreement of 79.4% (95% CI, 69.3% to 88.5%), negative percent agreement of 94.6% (95% CI, 91.1% to 97.5%), and overall percent agreement of 90.3% (95% CI, 85.6% to 94.4%), indicating good overall concordance in samples when stained using either the Omnis or ASL48 instruments based on consensus reference scores (Table 2).
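The agreement point estimates can be checked with a short Python sketch. The concordant counts (81 positive, 244 negative) and the 35 discordant total come from the text. The split of the discordant pairs into 21 reference-positive/ASL48-negative and 14 reference-negative/ASL48-positive is not stated in the source; it is inferred here as the only split consistent with the reported percentages. The function name is illustrative.

```python
def percent_agreement(tp, fp, fn, tn):
    """Return (PPA, NPA, OPA) with the Omnis result as the reference."""
    ppa = tp / (tp + fn)                   # agreement among reference-positive pairs
    npa = tn / (tn + fp)                   # agreement among reference-negative pairs
    opa = (tp + tn) / (tp + fp + fn + tn)  # agreement over all pairs
    return ppa, npa, opa

# tp and tn are reported; fn=21 and fp=14 are inferred (assumption), summing to 35.
ppa, npa, opa = percent_agreement(tp=81, fp=14, fn=21, tn=244)
print(f"PPA {ppa:.1%}, NPA {npa:.1%}, OPA {opa:.1%}")
```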
FIGURE 2
Representative images captured from tumors with low (A and B), medium (C and D), and high (E and F) Ki-67 immunoreactivity. Tissues shown in A, C, and E were stained with the ASL48, whereas those in B, D, and F were stained with the Omnis platform. Scanned slide images were obtained at ×10 zoom level.
TABLE 2
Summary Interinstrument Comparison Results
NPA indicates negative percent agreement; OA, overall percent agreement; PPA, positive percent agreement.
Continuous score analysis indicates high correlation and little to no bias between the 2 staining platforms. The CCC point estimate is 0.916 (Fig. 3A), and the average score obtained from the Omnis instrument was <1% higher than the corresponding ASL48 score (Fig. 3B). There was also no apparent bias in score variability between or within observers associated with the overall mean Ki-67 percentage score when results were evaluated by specimen (Fig. 3C). When comparing the mean Ki-67 percentage score against the RS (Fig. 4), the Ki-67 percentage score tended to increase with greater RS when RS ≥25. In this limited data set, this tendency is suggestive of possible higher concordance of Ki-67 (using the assay cutoff of ≥20%) and the corresponding RS when RS ≥25.
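The CCC used in the continuous score analysis penalizes both scatter and systematic shift, so it can be lower than the Pearson correlation when one platform reads consistently higher. A minimal Python sketch follows, assuming the standard form of Lin's coefficient with 1/n moments. The source does not say which estimator or software was used, and the example scores are invented for illustration.

```python
def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired scores x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # The squared mean difference in the denominator is the bias penalty.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)

# Invented example scores (percent), not study data.
omnis = [10.0, 20.0, 30.0]
identical = concordance_correlation(omnis, omnis)
shifted = concordance_correlation(omnis, [s + 5.0 for s in omnis])
print(identical, shifted)
```

In the shifted case the two series are perfectly correlated, yet the coefficient drops below 1 because of the constant 5-point offset.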
FIGURE 3
Continuous score analysis to evaluate correlation and bias between 2 staining platforms. A, The scatter plot visualizes the paired Omnis and ASL48 Ki-67 scores within each read and observer. Points that fall along the dotted diagonal line represent perfectly correlated scores between the instruments. The Ki-67 ≥20% cutoff is visualized by the horizontal and vertical solid black lines. Points falling within the bottom left and top right quadrant of the cutoff lines represent true negative and true positive comparisons, respectively, when using the outcome of the Omnis score as a reference. Points that fall within the top left and bottom right quadrants represent false positive and false negative comparisons, respectively, when using the Omnis score as reference. CCC indicates concordance correlation coefficient. B, The difference plot compares the delta in paired scores (y-axis = Omnis − ASL48) against the corresponding Omnis score (x-axis) for a given pair. The dashed horizontal line represents a delta of 0, meaning no difference in Ki-67 score between Omnis and ASL48. The solid horizontal line represents the mean delta across all observations. C, Intraobserver scores per specimen, split by instrument and observer.
FIGURE 4
Mean Ki-67 percentage scores plotted with RS obtained from the same tissue blocks. The box insert highlights scores in the 11 to 25 RS range. RS, recurrence score.
The recent FDA-approved Ki-67 IHC assay is a qualitative test using a MIB-1 monoclonal mouse anti–Ki-67 antibody with a polymer-based detection system on the Dako Omnis platform.24 Although the results from the UK National External Quality study highlight the importance of matching primary antibody clones and other factors to ensure quality of Ki-67 testing,25 access to specific automated IHC instruments is a real-world limitation for many laboratories. This report demonstrates that good concordance is achievable when samples are stained with these reagents using the ASL48 instrument. Furthermore, these data suggest that there is no inherent bias in pathologist scores from the same sample stained using the Ki-67 IHC MIB-1 pharmDx reagents on the ASL48 instrument when applying the ≥20% cutoff with the standardized scoring instructions.
The results of this technical study should be considered within the context of some limitations, including a relatively small sample size adequate for the initial evaluation of nonpredictive factor assays,26 and a distribution enriched toward cases with lower levels of Ki-67 immunoreactivity around the diagnostic cutoff. These limitations were counterbalanced by the use of multiple pathologists who provided independent reads for each tissue, assessed by both automated IHC instruments, during multiple reading sessions using controls to reduce recall bias. Pathologists should assess individual IHC assay performance in their local environment when evaluating the purpose and potential role of a laboratory-developed test.27 Future studies to evaluate how the FDA-approved test performs compared with other commonly used in vitro diagnostic assays, and its correlation with the RS, will further inform local testing procedures for the assessment of Ki-67 IHC in high-risk EBC.
Soonmyung Paik; Steven Shak; Gong Tang; Chungyeul Kim; Joffre Baker; Maureen Cronin; Frederick L Baehner; Michael G Walker; Drew Watson; Taesung Park; William Hiller; Edwin R Fisher; D Lawrence Wickerham; John Bryant; Norman Wolmark. N Engl J Med. 2004-12-10.
Emina E Torlakovic; Carol C Cheung; Corrado D'Arrigo; Manfred Dietel; Glenn D Francis; C Blake Gilks; Jacqueline A Hall; Jason L Hornick; Merdol Ibrahim; Antonio Marchetti; Keith Miller; J Han van Krieken; Soren Nielsen; Paul E Swanson; Mogens Vyberg; Xiaoge Zhou; Clive R Taylor. Appl Immunohistochem Mol Morphol. 2017-03.
Eleftherios P Mamounas; Gong Tang; Soonmyung Paik; Frederick L Baehner; Qing Liu; Jong-Hyeon Jeong; S Rim Kim; Steven M Butler; Farid Jamshidian; Diana B Cherbavaz; Amy P Sing; Steven Shak; Thomas B Julian; Barry C Lembersky; D Lawrence Wickerham; Joseph P Costantino; Norman Wolmark. Breast Cancer Res Treat. 2017-11-11.
Hongchao Pan; Richard Gray; Jeremy Braybrooke; Christina Davies; Carolyn Taylor; Paul McGale; Richard Peto; Kathleen I Pritchard; Jonas Bergh; Mitch Dowsett; Daniel F Hayes. N Engl J Med. 2017-11-09.
Mitch Dowsett; Torsten O Nielsen; Roger A'Hern; John Bartlett; R Charles Coombes; Jack Cuzick; Matthew Ellis; N Lynn Henry; Judith C Hugh; Tracy Lively; Lisa McShane; Soon Paik; Frederique Penault-Llorca; Ljudmila Prudkin; Meredith Regan; Janine Salter; Christos Sotiriou; Ian E Smith; Giuseppe Viale; Jo Anne Zujewski; Daniel F Hayes. J Natl Cancer Inst. 2011-09-29.