William J Howat, Fiona M Blows, Elena Provenzano, Mark N Brook, Lorna Morris, Patrycja Gazinska, Nicola Johnson, Leigh-Anne McDuffus, Jodi Miller, Elinor J Sawyer, Sarah Pinder, Carolien H M van Deurzen, Louise Jones, Reijo Sironen, Daniel Visscher, Carlos Caldas, Frances Daley, Penny Coulson, Annegien Broeks, Joyce Sanders, Jelle Wesseling, Heli Nevanlinna, Rainer Fagerholm, Carl Blomqvist, Päivi Heikkilä, H Raza Ali, Sarah-Jane Dawson, Jonine Figueroa, Jolanta Lissowska, Louise Brinton, Arto Mannermaa, Vesa Kataja, Veli-Matti Kosma, Angela Cox, Ian W Brock, Simon S Cross, Malcolm W Reed, Fergus J Couch, Janet E Olson, Peter Devillee, Wilma E Mesker, Caroline M Seyaneve, Antoinette Hollestelle, Javier Benitez, Jose Ignacio Arias Perez, Primitiva Menéndez, Manjeet K Bolla, Douglas F Easton, Marjanka K Schmidt, Paul D Pharoah, Mark E Sherman, Montserrat García-Closas.
Abstract
Breast cancer risk factors and clinical outcomes vary by tumour marker expression. However, individual studies often lack the power required to assess these relationships, and large-scale analyses are limited by the need for high throughput, standardized scoring methods. To address these limitations, we assessed whether automated image analysis of immunohistochemically stained tissue microarrays can permit rapid, standardized scoring of tumour markers from multiple studies. Tissue microarray sections prepared in nine studies containing 20 263 cores from 8267 breast cancers stained for two nuclear (oestrogen receptor, progesterone receptor), two membranous (human epidermal growth factor receptor 2 and epidermal growth factor receptor) and one cytoplasmic (cytokeratin 5/6) marker were scanned as digital images. Automated algorithms were used to score markers in tumour cells using the Ariol system. We compared automated scores against visual reads, and their associations with breast cancer survival. Approximately 65-70% of tissue microarray cores were satisfactory for scoring. Among satisfactory cores, agreement between dichotomous automated and visual scores was highest for oestrogen receptor (Kappa = 0.76), followed by human epidermal growth factor receptor 2 (Kappa = 0.69) and progesterone receptor (Kappa = 0.67). Automated quantitative scores for these markers were associated with hazard ratios for breast cancer mortality in a dose-response manner. Considering visual scores of epidermal growth factor receptor or cytokeratin 5/6 as the reference, automated scoring achieved excellent negative predictive value (96-98%), but yielded many false positives (positive predictive value = 30-32%). For all markers, we observed substantial heterogeneity in automated scoring performance across tissue microarrays. Automated analysis is a potentially useful tool for large-scale, quantitative scoring of immunohistochemically stained tissue microarrays available in consortia. 
However, continued optimization, rigorous marker-specific quality control measures and standardization of tissue microarray designs, staining and scoring protocols are needed to enhance results.
Keywords: automated scoring; breast tumours; digital pathology; immunohistochemistry; tissue microarrays
Year: 2014 PMID: 27499890 PMCID: PMC4858117 DOI: 10.1002/cjp2.3
Source DB: PubMed Journal: J Pathol Clin Res ISSN: 2056-4538
Description of study populations and TMA designs used by participating studies
| Study Acronym | Country | Cases | Age at diagnosis, mean (range) | TMA blocks | Cores per case | Cores per TMA | Core size (mm) | Total cores per study |
|---|---|---|---|---|---|---|---|---|
| ABCS | Netherlands | 1000 | 43 (23–50) | 26 | 1–6 | 12–241 | 0.6 | 3 314 |
| CNIO‐BCS | Spain | 171 | 60 (35–81) | 3 | 2–2 | 86–148 | 1.0 | 342 |
| HEBCS | Finland | 1154 | 56 (22–95) | 17 | 2–8 | 56–400 | 0.6 | 4 880 |
| KBCP | Finland | 392 | 59 (23–92) | 12 | 3–3 | 96–99 | 1.0 | 1 176 |
| MCBCS | USA | 348 | 58 (26–87) | 4 | 4–4 | 280–400 | 0.6 | 1 392 |
| ORIGO | Netherlands | 233 | 56 (27–88) | 3 | 3–9 | 237–310 | 0.6 | 841 |
| PBCS | Poland | 1406 | 56 (27–75) | 9 | 2–7 | 363–474 | 0.6 | 3 790 |
| SBCS | UK | 358 | 60 (30–92) | 11 | 3–8 | 90–156 | 0.6 | 1 320 |
| SEARCH | UK | 3205 | 52 (24–70) | 19 | 1–2 | 152–172 | 0.6 | 3 208 |
| Totals | | 8267 | 53 (22–95) | 104 | 1–9 | 12–474 | 0.6–1.0 | 20 263 |
Distribution of quality control measures for tissue cores stained for ER, PR and HER2 in the virtual TMAs
| Quality control category | ER, N | ER, % | PR, N | PR, % | HER2, N | HER2, % |
|---|---|---|---|---|---|---|
| Satisfactory core (invasive tumour) | 649 | 69 | 679 | 71 | 672 | 67 |
| DCIS only | 61 | 6 | 52 | 5 | 82 | 8 |
| No tumour, few tumour cells | 123 | 13 | 98 | 10 | 126 | 13 |
| No core | 38 | 4 | 32 | 3 | 48 | 5 |
| Unsatisfactory core for other reasons | 71 | 8 | 91 | 10 | 70 | 7 |
| Total | 942 | | 952 | | 998 | |
Figure 1. Distribution of ER continuous automated scores and ER visual ordinal scores in virtual TMAs. (A) Scatter plot of the intensity and percentage automated scores colour coded according to the Allred score for the corresponding core by visual scoring. The red curve represents the cut‐off point for positive/negative status by the ROC method. (B) Distribution of Allred visual scores (rater 1). (C) Distribution of intensity*percent automated scores used in the ROC method. (D) Boxplot of the distribution of the intensity*percent automated score by categories of the Allred visual score. (E) Boxplot of the distribution of the intensity*percent automated score by visual positive/negative status. Red lines in C–E show the positive/negative cut‐off points for the corresponding automated score.
Figure 2. Distribution of PR continuous automated scores and PR visual ordinal scores in virtual TMAs. (A) Scatter plot of the intensity and percentage automated scores colour coded according to the Allred score for the corresponding core by visual scoring. The red curve represents the cut‐off point for positive/negative status by the ROC method. (B) Distribution of Allred visual scores (rater 1). (C) Distribution of intensity*percent automated scores. (D) Boxplot of the intensity*percent automated score by categories of the Allred visual score. (E) Boxplot of the intensity*percent automated score by visual positive/negative status. Red lines in C–E show the positive/negative cut‐off points for the corresponding automated score.
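The captions for Figures 1 and 2 refer to a positive/negative cut‐off on the intensity*percent score chosen "by the ROC method", without stating the selection criterion. The sketch below illustrates one common choice, maximising Youden's index (sensitivity + specificity − 1); that criterion, the function name `youden_cutoff` and the example data are all assumptions for illustration, not the study's procedure or data. Note that a constant value of intensity × percent traces a curve in the intensity/percentage plane, which is consistent with the red curve described for panel A.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximising Se + Sp - 1.

    A core is called positive when its score is >= cutoff.
    `labels` holds the visual reference calls: 1 = positive, 0 = negative.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    # Every observed score is a candidate cut-off.
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        se = tp / n_pos
        sp = tn / n_neg
        j = se + sp - 1
        if best is None or j > best[0]:
            best = (j, cutoff, se, sp)
    return best[1], best[2], best[3]


# Hypothetical cores (NOT study data): automated intensity and percentage
# of stained tumour nuclei, plus the visual positive/negative call.
intensity = [10, 20, 30, 120, 150, 200, 40, 180]
percent = [1, 2, 5, 60, 80, 90, 10, 70]
visual_positive = [0, 0, 0, 1, 1, 1, 0, 1]

# Combined score as used in the figures: intensity * percent.
product_scores = [i * p for i, p in zip(intensity, percent)]
cutoff, se, sp = youden_cutoff(product_scores, visual_positive)
print(cutoff, se, sp)
```

With real data the two classes overlap, so the chosen cut‐off trades sensitivity against specificity rather than separating the groups cleanly as in this toy example.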
Inter‐rater agreement and agreement between each rater and Ariol automated quantitative ER, PR scores for cores in the virtual TMA
| Marker | Comparison | N | % Pos. | Continuous score: AUC (95% CI) | Dichotomous score: observed agreement (%) | Dichotomous score: Kappa (95% CI) | Dichotomous score: Se (%) | Dichotomous score: Sp (%) | Dichotomous score: PPV (%) | Dichotomous score: NPV (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| ER | Rater 1 vs rater 2 | 615 | 76.3 | n/a | 96.7 | 0.91 (0.83, 0.99) | 98.3 | 91.8 | 97.5 | 94.4 |
| | Ariol vs rater 1 | 587 | 75.0 | 0.97 (0.95, 0.98) | 90.1 | 0.76 (0.68, 0.84) | 89.5 | 91.8 | 97.0 | 74.6 |
| | Ariol vs rater 2 | 636 | 76.4 | 0.96 (0.95, 0.98) | 90.1 | 0.75 (0.67, 0.83) | 88.9 | 94.0 | 98.0 | 72.3 |
| PR | Rater 1 vs rater 2 | 655 | 67.0 | n/a | 96.8 | 0.93 (0.85, 1.00) | 97.5 | 95.4 | 97.7 | 94.9 |
| | Ariol vs rater 1 | 624 | 67.3 | 0.93 (0.91, 0.95) | 83.8 | 0.65 (0.57, 0.73) | 82.9 | 85.8 | 92.3 | 70.9 |
| | Ariol vs rater 2 | 634 | 66.6 | 0.93 (0.91, 0.95) | 84.4 | 0.66 (0.59, 0.74) | 83.6 | 85.8 | 92.2 | 72.5 |
Raters' scores are dichotomous (positive/negative), and Ariol automated scores are considered both as continuous and as dichotomous.
% Pos., % positive cores for reference rater; Se, sensitivity; Sp, specificity; PPV, positive predictive value; NPV, negative predictive value.
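The dichotomous agreement measures reported in this table (observed agreement, Kappa, Se, Sp, PPV, NPV) can all be derived from a 2×2 table of automated versus reference calls. The following is a minimal sketch of those standard definitions; the function name `agreement_metrics` and the counts are hypothetical and are not taken from the study, and confidence intervals are not computed.

```python
def agreement_metrics(tp, fp, fn, tn):
    """Agreement statistics for a test (e.g. automated score) against a
    reference (e.g. a rater), from the four cells of a 2x2 table.

    tp: test positive, reference positive
    fp: test positive, reference negative
    fn: test negative, reference positive
    tn: test negative, reference negative
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n          # observed agreement
    ref_pos = (tp + fn) / n           # fraction positive by reference
    test_pos = (tp + fp) / n          # fraction positive by test
    # Agreement expected by chance if the two calls were independent.
    expected = ref_pos * test_pos + (1 - ref_pos) * (1 - test_pos)
    return {
        "pct_pos": ref_pos,
        "observed_agreement": observed,
        "kappa": (observed - expected) / (1 - expected),  # Cohen's kappa
        "se": tp / (tp + fn),         # sensitivity
        "sp": tn / (tn + fp),         # specificity
        "ppv": tp / (tp + fp),        # positive predictive value
        "npv": tn / (tn + fn),        # negative predictive value
    }


# Hypothetical counts for illustration only (NOT study data).
metrics = agreement_metrics(tp=90, fp=5, fn=10, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Kappa discounts the agreement expected by chance, which is why it can be markedly lower than the observed agreement when one category dominates.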
Inter‐rater agreement and agreement between each rater and Ariol automated quantitative HER2 scores for cores in the virtual TMA
| Comparison (HER2 semi‐quantitative score: 0/1, 2, 3) | N | Observed agreement (%) | Kappa (95% CI) |
|---|---|---|---|
| Rater 1 vs rater 2 | 660 | 92.7 | 0.71 (0.65, 0.78) |
| Ariol vs rater 1 | 693 | 90.7 | 0.62 (0.56, 0.68) |
| Ariol vs rater 2 | 716 | 93.7 | 0.71 (0.65, 0.77) |
% Pos.=% positive cores for reference rater
Se, sensitivity; Sp, Specificity; PPV, positive predictive value; NPV, negative predictive value.
Figure 3. Distribution of ER (A–C) and PR (D–F) continuous automated scores (subject level) and positive/negative status in BCAC database, including 6424 cases for ER and 6385 cases for PR from nine studies. (A) Scatter plot of the intensity and percentage automated scores colour coded according to the BCAC ER status (red for positive and blue for negative). The red curve represents the cut‐off point for positive/negative status by the ROC method. The smaller inserted plots show ER‐positive and ER‐negative cases separately. (B) Distribution of intensity*percent automated scores. (C) Boxplot of the intensity*percent automated score by BCAC ER status. Red lines show the positive/negative cut‐off points for the corresponding automated score. Panels D–F show similar plots for PR.
Agreement between Ariol automated quantitative ER, PR and HER2 scores for each subject and marker status from clinical/study records
| Marker | N | % Pos. | Continuous score: AUC (95% CI) | Dichotomous score: observed agreement (%) | Dichotomous score: Kappa (95% CI) | Dichotomous score: Se (%) | Dichotomous score: Sp (%) | Dichotomous score: PPV (%) | Dichotomous score: NPV (%) |
|---|---|---|---|---|---|---|---|---|---|
| ER | 6424 | 74.5 | 0.89 (0.89, 0.90) | 84.1 | 0.62 (0.59, 0.64) | 84.6 | 82.7 | 93.4 | 64.7 |
| PR | 6385 | 63.6 | 0.87 (0.86, 0.88) | 80.0 | 0.57 (0.55, 0.60) | 82.5 | 75.7 | 85.6 | 71.2 |
| HER2 2+ | 6322 | 15.5 | – | 88.9 | 0.62 (0.59, 0.64) | 77.2 | 91.0 | 61.3 | 95.6 |
| HER2 3+ | 6322 | 15.5 | – | 89.2 | 0.43 (0.41, 0.44) | 31.8 | 99.7 | 95.4 | 88.8 |
Clinical/study scores are dichotomous (positive/negative), and ER, PR Ariol scores are considered both as continuous and dichotomous.
% Pos., % positive cores for reference rater; Se, sensitivity; Sp, Specificity; PPV, positive predictive value; NPV, negative predictive value.
Figure 4. Kaplan–Meier survival curves and hazard ratios (HR) for 10‐year breast cancer survival by ER, based on 6135 subjects and 981 breast cancer specific deaths, using (A) pathologists' data from study sites (positive/negative), (B) dichotomized (positive/negative) automated scores and (C) automated scores classified in quintiles.
Figure 5. Kaplan–Meier survival curves and hazard ratios (HR) for 10‐year breast cancer survival by PR, based on 6115 subjects and 998 breast cancer specific deaths, using (A) pathologists' data from study sites (positive/negative), (B) dichotomized (positive/negative) automated scores and (C) automated scores classified in quintiles.
Figure 6. Kaplan–Meier survival curves and hazard ratios (HR) for 10‐year breast cancer survival by HER2, based on 6039 subjects and 997 breast cancer specific deaths, using (A) pathologists' data from study sites (positive/negative), and (B) semi‐quantitative automated scores.
Inter‐rater agreement and agreement between Ariol automated CK5/6 and EGFR scores (dichotomized using the ROC method) for cores in TMAs from participating studies
| Marker | Comparison | N | % Pos. | Observed Agreement | Kappa (95%CI) | Se (%) | Sp (%) | PPV (%) | NPV (%) |
|---|---|---|---|---|---|---|---|---|---|
| CK5/6 | Inter‐rater agreement | 357 | 11.9 | 91.6 | 0.74 (0.66, 0.83) | 96.6 | 90.6 | 67.1 | 99.3 |
| | Ariol vs rater | 1897 | 10.4 | 89.4 | 0.49 (0.44, 0.53) | 61.6 | 92.6 | 49.2 | 95.4 |
| | Ariol vs rater | 1107 | 6.4 | 89.1 | 0.41 (0.35, 0.46) | 71.8 | 90.3 | 33.6 | 97.9 |
| | Ariol vs rater | 360 | 21.1 | 86.9 | 0.57 (0.47, 0.67) | 56.6 | 95.1 | 75.4 | 89.1 |
| EGFR | Inter‐rater agreement | 760 | 10.5 | 94.5 | 0.73 (0.66, 0.81) | 90.7 | 94.9 | 66.0 | 98.9 |
| | Ariol vs rater | 1914 | 9.8 | 84.1 | 0.44 (0.40, 0.48) | 87.7 | 83.7 | 36.9 | 98.4 |
| | Ariol vs rater | 1041 | 1.3 | 86.2 | 0.14 (0.11, 0.17) | 100.0 | 86.0 | 8.9 | 100.0 |
| | Ariol vs rater | 342 | 39.5 | 82.2 | 0.63 (0.53, 0.74) | 83.7 | 81.2 | 74.3 | 88.4 |
% Pos., % positive cores for reference rater; Se, sensitivity; Sp, Specificity; PPV, positive predictive value; NPV, negative predictive value.
Includes data from CNIO‐BCS, MCBCS, ORIGO, SBCS.
Includes data from SEARCH re‐analysis.
Includes data from ABCS, CNIO‐BCS, KBCP, MCBCS, ORIGO, SBCS.
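The low positive predictive values for CK5/6 and EGFR, alongside high negative predictive values, follow largely from the low proportion of positive cores. As a hedged illustration, the sketch below applies Bayes' rule to the rounded Se, Sp and % Pos. values in two EGFR rows of the table above; the function name `predictive_values` is hypothetical, and because the inputs are rounded the results only approximate the tabulated PPV and NPV.

```python
def predictive_values(se, sp, prevalence):
    """PPV and NPV from sensitivity, specificity and the proportion of
    reference-positive cores (all given as fractions between 0 and 1)."""
    true_pos = se * prevalence
    false_pos = (1 - sp) * (1 - prevalence)
    true_neg = sp * (1 - prevalence)
    false_neg = (1 - se) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# EGFR, Ariol vs rater, N = 1041: Se 100.0%, Sp 86.0%, 1.3% positive.
ppv_low, npv_low = predictive_values(se=1.000, sp=0.860, prevalence=0.013)

# EGFR, Ariol vs rater, N = 342: Se 83.7%, Sp 81.2%, 39.5% positive.
ppv_high, npv_high = predictive_values(se=0.837, sp=0.812, prevalence=0.395)

print(f"1.3% positive:  PPV={ppv_low:.3f}, NPV={npv_low:.3f}")
print(f"39.5% positive: PPV={ppv_high:.3f}, NPV={npv_high:.3f}")
```

With similar specificity in both rows, the PPV rises from under 10% to roughly 74% as the proportion of positive cores increases, which matches the pattern in the table.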