Samuel C Y Leung, Torsten O Nielsen, Lila Zabaglo, Indu Arun, Sunil S Badve, Anita L Bane, John M S Bartlett, Signe Borgquist, Martin C Chang, Andrew Dodson, Rebecca A Enos, Susan Fineberg, Cornelia M Focke, Dongxia Gao, Allen M Gown, Dorthe Grabau, Carolina Gutierrez, Judith C Hugh, Zuzana Kos, Anne-Vibeke Lænkholm, Ming-Gang Lin, Mauro G Mastropasqua, Takuya Moriya, Sharon Nofech-Mozes, C Kent Osborne, Frédérique M Penault-Llorca, Tammy Piper, Takashi Sakatani, Roberto Salgado, Jane Starczynski, Giuseppe Viale, Daniel F Hayes, Lisa M McShane, Mitch Dowsett.
Abstract
Pathological analysis of the nuclear proliferation biomarker Ki67 has multiple potential roles in breast and other cancers. However, the clinical utility of the immunohistochemical (IHC) assay for Ki67 has been hampered by unacceptable between-laboratory analytical variability. The International Ki67 Working Group has conducted a series of studies aiming to decrease this variability and improve the evaluation of Ki67. This study assessed whether acceptable performance can be achieved on prestained core-cut biopsies using a standardized scoring method. Sections from 30 primary ER+ breast cancer core biopsies were centrally stained for Ki67 and circulated among 22 laboratories in 11 countries. Each laboratory scored Ki67 using three methods: (1) global (4 fields of 100 cells each); (2) weighted global (same as global but weighted by estimated percentages of total area); and (3) hot-spot (single field of 500 cells). The intraclass correlation coefficient (ICC), a measure of interlaboratory agreement, for the unweighted global method (0.87; 95% credible interval (CI): 0.81–0.93) met the prespecified success criterion for scoring reproducibility, whereas those for the weighted global (0.87; 95% CI: 0.7999–0.93) and hot-spot methods (0.84; 95% CI: 0.77–0.92) marginally failed to do so. The unweighted global assessment of Ki67 IHC analysis on core biopsies met the prespecified criterion of success for scoring reproducibility. A few cases still showed large scoring discrepancies. Establishment of external quality assessment schemes is likely to improve the agreement between laboratories further. Additional evaluations are needed to assess staining variability and clinical validity in appropriate cohorts of samples.
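The three scoring methods named in the abstract can be illustrated with a small sketch. It assumes that each field yields a Ki67 index (positive nuclei as a percentage of counted nuclei), that the unweighted global score is the plain mean of the field indices, and that the weighted global score uses the estimated area percentages as weights. The `Field` type, the function names and the example counts are hypothetical and do not come from the study protocol.

```python
from dataclasses import dataclass


@dataclass
class Field:
    positive: int          # Ki67-positive nuclei counted in the field
    total: int             # all nuclei counted in the field
    area_pct: float = 0.0  # estimated % of total area the field represents (assumed input)


def field_index(field):
    """Ki67 index of one field, as a percentage."""
    return 100.0 * field.positive / field.total


def unweighted_global(fields):
    """Plain mean of the per-field Ki67 indices."""
    return sum(field_index(f) for f in fields) / len(fields)


def weighted_global(fields):
    """Mean of the per-field indices, weighted by estimated area percentage."""
    total_area = sum(f.area_pct for f in fields)
    if total_area <= 0:
        raise ValueError("area percentages must sum to a positive value")
    return sum(field_index(f) * f.area_pct for f in fields) / total_area


def hot_spot(field):
    """Ki67 index of the single hot-spot field."""
    return field_index(field)


# Invented counts: four fields of 100 cells, one hot-spot field of 500 cells.
fields = [Field(5, 100, 40), Field(10, 100, 30), Field(20, 100, 20), Field(45, 100, 10)]
print(unweighted_global(fields))  # 20.0
print(weighted_global(fields))    # 13.5
print(hot_spot(Field(210, 500)))  # 42.0
```

In this invented case the weighted score is lower than the unweighted one because the high-index field is assumed to cover only a small share of the area.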
Year: 2016 PMID: 28721378 PMCID: PMC5515324 DOI: 10.1038/npjbcancer.2016.14
Source DB: PubMed Journal: NPJ Breast Cancer ISSN: 2374-4677
Summary of ICC values for different scoring methods
| Unweighted global | 0.87 (95% CI: 0.81–0.93) | 0.88 (95% CI: 0.81–0.93) |
| Weighted global | 0.87 (95% CI: 0.7999–0.93) | 0.87 (95% CI: 0.80–0.93) |
| Hot-spot | 0.84 (95% CI: 0.77–0.92) | 0.84 (95% CI: 0.77–0.92) |
Abbreviations: CI, credible interval; ICC, intraclass correlation coefficient.
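As a rough illustration of what an ICC measures, the sketch below computes the classical two-way random-effects, absolute-agreement, single-rater ICC (often written ICC(2,1)) from a cases × laboratories score matrix using ANOVA mean squares. This is a substitute technique chosen for brevity. The study reports credible intervals, which points to a Bayesian analysis that this code does not reproduce, so it will not regenerate the table values. The function name and the toy scores are hypothetical.

```python
def icc_two_way(scores):
    """ICC(2,1) from a matrix with one row per case and one column per laboratory."""
    n = len(scores)      # number of cases
    k = len(scores[0])   # number of laboratories
    if n < 2 or k < 2:
        raise ValueError("need at least two cases and two laboratories")

    grand = sum(sum(row) for row in scores) / (n * k)
    case_means = [sum(row) / k for row in scores]
    lab_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    # Sums of squares for cases, laboratories and the residual.
    ss_cases = k * sum((m - grand) ** 2 for m in case_means)
    ss_labs = n * sum((m - grand) ** 2 for m in lab_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_cases - ss_labs

    ms_cases = ss_cases / (n - 1)
    ms_labs = ss_labs / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_cases - ms_error) / (
        ms_cases + (k - 1) * ms_error + k * (ms_labs - ms_error) / n
    )


# Toy data: three cases scored by three laboratories (invented values).
toy_scores = [[5, 6, 4], [12, 15, 11], [30, 28, 35]]
print(round(icc_two_way(toy_scores), 3))
```

Values close to 1 mean that most of the score variation comes from real differences between cases rather than from differences between laboratories or residual noise.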
Figure 1. Ki67 scores (a, unweighted global; b, weighted global; c, hot-spot) of all 22 laboratories (by group): black for Group 1, medium gray for Group 2, and light gray for Group 3. Laboratories are ordered (within each group) by the median scores. The bottom and top of the box in each box plot represent the first (Q1) and third (Q3) quartiles, the bold line inside the box represents the median, and the two bars outside the box represent the lowest and highest data points still within 1.5 × the interquartile range (Q3–Q1). Outliers are represented with empty circles.
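The box plot construction described in this caption can be sketched as follows. The sketch assumes that the whiskers are measured from the box edges (Q1 − 1.5 × IQR and Q3 + 1.5 × IQR) and uses the "inclusive" quartile method of Python's `statistics.quantiles`. Plotting software may define quartiles slightly differently, so the numbers can deviate from the published figure. The function name and the scores are hypothetical.

```python
import statistics


def box_plot_summary(scores):
    """Quartiles, whisker ends and outliers under the 1.5 x IQR rule."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr

    # Whiskers end at the most extreme data points still inside the fences.
    inside = [s for s in scores if low_fence <= s <= high_fence]
    outliers = sorted(s for s in scores if s < low_fence or s > high_fence)

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Invented Ki67 scores (%) from one laboratory across nine cases.
summary = box_plot_summary([2, 4, 5, 6, 8, 10, 12, 14, 60])
print(summary["q1"], summary["median"], summary["q3"])  # 5.0 8.0 12.0
print(summary["whisker_low"], summary["whisker_high"])  # 2 14
print(summary["outliers"])                              # [60]
```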
Figure 2. Variance component analysis. Variation due to different components is presented in a bar plot to show the relative differences in magnitude between them. Numeric values of the variance component estimates and the corresponding credible intervals are shown in Supplementary Table 5.
Figure 3. Variability in Ki67 scores (a, c and e correspond to Group 1; b, d and f correspond to Group 3). Each line represents Ki67 scores from one laboratory. The shaded region indicates Ki67 scores between 10 and 20%. Scores from Group 2 are not shown because there are only two laboratories in this group.
Figure 4. Heat map of Ki67 scores (a: unweighted global; b: weighted global; c: hot-spot). Rows represent cases and columns represent laboratories. Green indicates that the score is <10%, yellow 10–20%, and red >20%. Cases are ordered by the median scores (across laboratories), which are shown in parentheses beside the specimen number. Laboratories are ordered (within each group) by the median scores (across cases). The three colon-separated numbers to the right of the table represent the number of laboratories giving scores falling into different ranges: <10% (left-most), 10–20% (middle) and >20% (right-most). For example, ‘15:6:1’ indicates that 15 laboratories gave a score of <10%, six laboratories between 10 and 20% and one laboratory >20%.
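The colon-separated tally described in this caption can be reproduced with a short sketch. It assumes that scores of exactly 10% and exactly 20% fall in the middle (yellow) range, which the caption does not state explicitly. The function names and the example scores are hypothetical.

```python
def ki67_category(score):
    """Map a Ki67 score (%) to one of the three heat-map ranges."""
    if score < 10:
        return "<10%"
    if score <= 20:  # assumption: both boundary values belong to the middle range
        return "10-20%"
    return ">20%"


def tally_case(scores):
    """Count laboratories per range and format as 'low:middle:high'."""
    counts = {"<10%": 0, "10-20%": 0, ">20%": 0}
    for score in scores:
        counts[ki67_category(score)] += 1
    return f"{counts['<10%']}:{counts['10-20%']}:{counts['>20%']}"


# Invented scores for one case from 22 laboratories.
case_scores = [4.0] * 15 + [12.5] * 6 + [23.0]
print(tally_case(case_scores))  # 15:6:1
```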
Figure 5. Hot-spot field selection by different laboratories on the same core-cut biopsy slide. (a) Selections (indicated by red circles) on some example core biopsies. (b) Example of a single-core biopsy (median score: 12%) with zoomed-in fields. Each laboratory was asked to circle the area considered by that laboratory to be the hot-spot (b-i). Most pathologists homed in on the same area of the core, although the individually selected circular scoring fields do not always overlap. (b-iii, b-iv) Segments of the same area chosen by two different laboratories to read Ki67. (b-v) The ‘outlier’ field selected by only one laboratory as the hot-spot.