
Performance of automated scoring of ER, PR, HER2, CK5/6 and EGFR in breast cancer tissue microarrays in the Breast Cancer Association Consortium.

William J Howat¹, Fiona M Blows², Elena Provenzano³, Mark N Brook⁴, Lorna Morris⁵, Patrycja Gazinska⁶, Nicola Johnson¹, Leigh-Anne McDuffus¹, Jodi Miller¹, Elinor J Sawyer⁷, Sarah Pinder⁸, Carolien H M van Deurzen⁹, Louise Jones¹⁰, Reijo Sironen¹¹, Daniel Visscher¹², Carlos Caldas¹, Frances Daley¹³, Penny Coulson⁴, Annegien Broeks¹⁴, Joyce Sanders¹⁵, Jelle Wesseling¹⁵, Heli Nevanlinna¹⁶, Rainer Fagerholm¹⁶, Carl Blomqvist¹⁷, Päivi Heikkilä¹⁸, H Raza Ali¹, Sarah-Jane Dawson¹, Jonine Figueroa¹⁹, Jolanta Lissowska²⁰, Louise Brinton¹⁹, Arto Mannermaa¹¹, Vesa Kataja²¹, Veli-Matti Kosma¹¹, Angela Cox²², Ian W Brock²², Simon S Cross²³, Malcolm W Reed²², Fergus J Couch¹², Janet E Olson²⁴, Peter Devillee²⁵, Wilma E Mesker²⁶, Caroline M Seyaneve²⁷, Antoinette Hollestelle²⁷, Javier Benitez²⁸, Jose Ignacio Arias Perez²⁹, Primitiva Menéndez³⁰, Manjeet K Bolla³¹, Douglas F Easton³², Marjanka K Schmidt³³, Paul D Pharoah³², Mark E Sherman¹⁹, Montserrat García-Closas³⁴.

Abstract

Breast cancer risk factors and clinical outcomes vary by tumour marker expression. However, individual studies often lack the power required to assess these relationships, and large-scale analyses are limited by the need for high throughput, standardized scoring methods. To address these limitations, we assessed whether automated image analysis of immunohistochemically stained tissue microarrays can permit rapid, standardized scoring of tumour markers from multiple studies. Tissue microarray sections prepared in nine studies containing 20 263 cores from 8267 breast cancers stained for two nuclear (oestrogen receptor, progesterone receptor), two membranous (human epidermal growth factor receptor 2 and epidermal growth factor receptor) and one cytoplasmic (cytokeratin 5/6) marker were scanned as digital images. Automated algorithms were used to score markers in tumour cells using the Ariol system. We compared automated scores against visual reads, and their associations with breast cancer survival. Approximately 65-70% of tissue microarray cores were satisfactory for scoring. Among satisfactory cores, agreement between dichotomous automated and visual scores was highest for oestrogen receptor (Kappa = 0.76), followed by human epidermal growth factor receptor 2 (Kappa = 0.69) and progesterone receptor (Kappa = 0.67). Automated quantitative scores for these markers were associated with hazard ratios for breast cancer mortality in a dose-response manner. Considering visual scores of epidermal growth factor receptor or cytokeratin 5/6 as the reference, automated scoring achieved excellent negative predictive value (96-98%), but yielded many false positives (positive predictive value = 30-32%). For all markers, we observed substantial heterogeneity in automated scoring performance across tissue microarrays. Automated analysis is a potentially useful tool for large-scale, quantitative scoring of immunohistochemically stained tissue microarrays available in consortia. 
However, continued optimization, rigorous marker-specific quality control measures and standardization of tissue microarray designs, staining and scoring protocols are needed to enhance results.


Keywords: automated scoring; breast tumours; digital pathology; immunohistochemistry; tissue microarrays

Year: 2014 PMID: 27499890 PMCID: PMC4858117 DOI: 10.1002/cjp2.3

Source DB: PubMed Journal: J Pathol Clin Res ISSN: 2056-4538

Introduction

Breast cancer is a biologically heterogeneous disease, which comprises multiple distinctive subtypes that are distinguishable by immunohistochemistry (IHC) 1, 2 or molecular analysis such as transcriptomic profiling 3, 4, 5. Clinically, IHC staining for oestrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2) is routinely performed in most diagnostic laboratories to help select adjuvant treatment and to assess prognosis 6, 7. Research studies demonstrate that expanding this IHC panel to include markers of basal breast cancers, such as cytokeratin 5/6 (CK5/6) and epidermal growth factor receptor 1 (EGFR or HER1), can enable more detailed molecular subtyping, approximating taxonomies based on molecular profiling 1, 8, 9. Evaluating differences across breast cancer subtypes is central to etiological and clinical research. However, such studies require large sample sizes in order to include sufficient numbers of the less common subtypes, many of which are clinically important. Tissue microarrays (TMAs) can be used to assess IHC results for multiple cases in one tissue section 10, enabling standardized IHC staining and facilitating scoring. Given that visual scoring is labour intensive and suffers from imperfect inter‐rater agreement, automated quantitative image analysis has been proposed as an alternative that may offer logistical advantages with good reliability. Automated analysis of pathology images has been in use for more than 20 years 11 and has been applied extensively in recent years in the study of breast cancer with increasingly complex algorithms and improved concordance with visual scores 12, 13, 14, 15, 16, 17, 18. However, most comparisons are based on TMAs of a few hundred to a few thousand tumours constructed and stained in a single pathology laboratory. 
Although centralized construction and staining of TMAs is desirable to obtain comparable data 19, this is not always practical in large collaborative investigations that aggregate pathology samples from multiple studies. This article details the application of fully automated image analysis of 8267 breast cancers collated from nine studies within the Breast Cancer Association Consortium (BCAC) 20. Automated image analysis was applied to score nuclear (ER, PR), membranous (HER2, EGFR) and cytoplasmic (CK5/6) markers to determine the usefulness and pitfalls of this approach and to identify limitations that might be addressed with methodological research.

Materials and methods

Study populations

This report includes nine BCAC studies with formalin‐fixed, paraffin‐embedded tumour blocks that had been previously prepared as TMAs (supplementary material Table 1). Relevant research ethics committees approved all studies; samples were anonymized before being sent to two coordinating centres at Strangeways Research Laboratory (University of Cambridge, Cambridge, UK) and Breakthrough Pathology Core Facility (Institute of Cancer Research, London, UK) for analysis. A total of 8267 cases with information on clinico‐pathological characteristics of the tumour, obtained from clinical records or centralized review of cases, were included in the analyses (supplementary material Table 2).

TMA immunohistochemistry

Three studies (ABCS, PBCS and SEARCH) provided previously stained TMA slides of ER and PR, four studies (ABCS, HEBCS, PBCS and SEARCH) of HER2, three studies (ABCS, KBCP, PBCS) of CK5/6 and three studies (HEBCS, KBCP, PBCS) of EGFR. Studies lacking pre‐existing stained TMAs for specific stains provided unstained TMA slides for centralized staining. Staining centres and protocols are detailed in supplementary material Table 3.

Automated Ariol scanning and scoring of TMAs

All TMA slides were scanned and analysed on the Leica Ariol system (Leica Biosystems, Newcastle upon Tyne, UK) using standard procedures and predefined algorithms tuned by an image analysis expert (see details in supplementary material). A single tuned algorithm was then applied to all TMAs. For ER and PR nuclear staining, we obtained automated measures of average stain intensity and percentage of cells stained. For HER2, the system calculated the HercepTest score 21 (0, 1+, 2+, 3+). For CK5/6 and EGFR, we obtained a continuous automated score (0–300) based on a weighted sum of the percentages of positive cells in three bins: weakly, intermediately and strongly positive cells. Quality control procedures are described in the supplementary material.
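As an illustration of the 0–300 continuous score, the following minimal Python sketch computes a weighted sum over three intensity bins. The weights 1, 2 and 3 are an assumption borrowed from the conventional H‐score; the weights used by the Ariol algorithm are not stated in the text, and the function name is hypothetical.

```python
def weighted_intensity_score(pct_weak: float, pct_intermediate: float, pct_strong: float) -> float:
    """Weighted sum of the percentages of positive cells in three intensity bins.

    Returns a value between 0 (no positive cells) and 300 (all cells strongly positive).
    """
    percentages = (pct_weak, pct_intermediate, pct_strong)
    if any(p < 0 for p in percentages) or sum(percentages) > 100 + 1e-9:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    # Assumed weights (conventional H-score): 1 = weak, 2 = intermediate, 3 = strong.
    return 1 * pct_weak + 2 * pct_intermediate + 3 * pct_strong


# Example: 10% weak, 20% intermediate, 30% strong, 40% negative cells.
example_score = weighted_intensity_score(10, 20, 30)
```

Under these assumed weights, a core scores 300 only when every tumour cell falls in the strongly positive bin.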

Visual scoring of TMAs

Randomly selected cores from each study were re‐arrayed in ‘virtual TMAs’ for visual scoring (see supplementary material). This resulted in a total of 942, 952 and 998 core images being visually scored in duplicate by two pathologists (M.E.S. and E.P.) for ER, PR and HER2, respectively. The Allred scoring system and intensity score were used for ER and PR 22. Stains for ER and PR were considered positive if the Allred score was ≥3. For HER2, the HercepTest scoring system was used for visual scoring. Positive stains for HER2 were defined in two groups as having an intensity score of 2 or 3 (HER2 2+) or 3 only (HER2 3+). TMA slides of CK5/6 from four studies (CNIO‐BCS, MCBCS, ORIGO, SBCS) and slides of EGFR from six studies (ABCS, CNIO‐BCS, MCBCS, KBCP, ORIGO, SBCS) that had been centrally stained at CRUK‐CI were visually scored using the SlidePath system (see supplementary material). Ten scorers scored a total of 5771 cores for CK5/6 and 8259 for EGFR. M.E.S. served as the reference pathologist and scored a random sample of up to 100 cores per study/centre assigned to each of the other scorers to evaluate inter‐scorer agreement. A positive visual score for CK5/6 and EGFR was defined as >10% positive cells. Scorers assigned each core one of the following quality control categories: 1) satisfactory core (invasive tumour), 2) DCIS only, 3) no tumour/few tumour cells, 4) no core and 5) unsatisfactory for other reasons.
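The Allred‐based positivity rule for ER and PR can be sketched as below. The proportion‐score bins are the conventionally published Allred bins and are an assumption here, since the text gives only the positivity threshold (Allred score ≥3); function names are hypothetical.

```python
def allred_proportion_score(pct_positive: float) -> int:
    """Map the percentage of positive tumour cells to an Allred proportion score (0-5)."""
    # Assumed conventional bins: 0, <1%, 1-10%, >10% to 1/3, >1/3 to 2/3, >2/3.
    if pct_positive <= 0:
        return 0
    if pct_positive < 1:
        return 1
    if pct_positive <= 10:
        return 2
    if pct_positive <= 100 / 3:
        return 3
    if pct_positive <= 200 / 3:
        return 4
    return 5


def allred_total(pct_positive: float, intensity: int) -> int:
    """Sum of the proportion score (0-5) and the intensity score (0-3)."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2 or 3")
    proportion = allred_proportion_score(pct_positive)
    # A core with no positive cells, or no staining intensity, scores 0 overall.
    if proportion == 0 or intensity == 0:
        return 0
    return proportion + intensity


def is_receptor_positive(total_score: int, threshold: int = 3) -> bool:
    """Positivity rule stated in the text: Allred score >= 3."""
    return total_score >= threshold
```

With these bins, a core with 5% weakly stained cells has an Allred score of 3 and is classed as positive, whereas a core with fewer than 1% weakly stained cells scores 2 and is classed as negative.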

Statistical methods

The correlation between automated continuous scores and visual ordinal scores was evaluated by the Spearman's correlation coefficient, using data from the virtual TMA. The area under the curve (AUC) of receiver operating characteristic (ROC) graphs was used to evaluate the discriminatory accuracy of the ER, PR combined‐automated scores (intensity*percentage) to distinguish between visual positive and negative scores. The automated score that optimized the sensitivity and specificity in the ROC graph was applied as the cut‐off point to define marker status for all analysed cores (not just the ones in the virtual TMA). We also evaluated an alternative method to define the cut‐off for positive and negative scores, as described by Ali et al 15. Briefly, the cut‐off under this method is determined by the distribution of automated percentage and intensity scores for all cores, ie, it does not use information on visual scores from a subset of tumours in the virtual TMAs to define a cut‐off point. The kappa statistic was used as a measure of agreement between dichotomous or semi‐quantitative scores. Sensitivity and specificity were calculated as measures of validity using the visual score as the reference; positive predictive value (PPV) and negative predictive value (NPV) were calculated as a measure of the value of automated dichotomous scores to predict visual dichotomous scores. Comparisons between automated scores and visual scores were performed at the core level for cores in the virtual TMAs. Subject‐level scores for ER, PR, HER2 were derived by selecting the maximum score of all available cores for a given subject, after having excluded cores identified as having few or no tumour cells or no cores by the pathologist. These were compared to positive/negative status in the BCAC database, based primarily on medical records, or centralized reviews by study centres. Kaplan–Meier survival plots were used to plot survival functions by subject‐level IHC scores. 
Associations with 10‐year breast cancer‐specific survival were assessed using a Cox proportional‐hazards model, providing estimates of hazard ratio (HR) and 95% confidence interval (95% CI). Violations of the proportional‐hazards assumption were accounted for by the T coefficient that varied as a function of log time. We used penalized‐likelihood criteria, ie, Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC), to compare model parsimony and fit of alternative non‐nested Cox regression models including visual versus automated scores. Models with lower values for AIC or BIC have a better balance between model parsimony and fit. All statistical analyses were conducted in Stata/MP version 12.1 (StataCorp, College Station, TX, USA).
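For readers less familiar with the agreement and validity measures used here, the sketch below derives observed agreement, Cohen's kappa, sensitivity, specificity, PPV and NPV from a 2×2 cross‐tabulation of automated versus visual dichotomous scores. The counts in the example are invented for illustration and are not taken from the study; the function name is hypothetical, and the sketch assumes every row and column of the table is non‐empty.

```python
def agreement_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Agreement and validity measures with the visual score as the reference.

    tp: automated positive, visual positive    fp: automated positive, visual negative
    fn: automated negative, visual positive    tn: automated negative, visual negative
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, computed from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "observed_agreement": observed,
        "kappa": (observed - expected) / (1 - expected),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Invented counts for illustration only.
metrics = agreement_metrics(tp=40, fp=10, fn=5, tn=45)
```

Kappa discounts the agreement expected by chance, which is why it can be much lower than observed agreement when one category dominates.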

Results

Differences in TMAs and clinico‐pathological characteristics of cases across studies

The nine studies used different TMA designs, comprising a total of 20 263 tissue cores in 104 TMA blocks from 8267 BCAC breast cancer cases (Table 1 and supplementary material Table 1). The average age at diagnosis was 53 years. There were substantial differences in the distribution of age and clinico‐pathological characteristics across studies (supplementary material Table 2). Across the virtual TMAs for ER, PR and HER2, 75–77% of cores were satisfactory for scoring (including the 5–8% of all cores that contained only a DCIS component), 10–13% had no tumour or few tumour cells, 3–5% were missing and 7–10% were unsatisfactory for other reasons (eg, blurred image, folded cores; see Table 2).

Table 1

Description of study populations and TMA designs used by participating studies

Study Acronym	Country	Cases	Age at diagnosis, mean (range)	TMA blocks	Cores per case	Cores per TMA	Core size (mm)	Total cores per study
ABCS	Netherlands	1000	43 (23–50)	26	1–6	12–241	0.6	3 314
CNIO‐BCS	Spain	171	60 (35–81)	3	2–2	86–148	1.0	342
HEBCS	Finland	1154	56 (22–95)	17	2–8	56–400	0.6	4 880
KBCP	Finland	392	59 (23–92)	12	3–3	96–99	1.0	1 176
MCBCS	USA	348	58 (26–87)	4	4–4	280–400	0.6	1 392
ORIGO	Netherlands	233	56 (27–88)	3	3–9	237–310	0.6	841
PBCS	Poland	1406	56 (27–75)	9	2–7	363–474	0.6	3 790
SBCS	UK	358	60 (30–92)	11	3–8	90–156	0.6	1 320
SEARCH	UK	3205	52 (24–70)	19	1–2	152–172	0.6	3 208
Totals		8267	53 (22–95)	104	1–9	12–474	0.6–1.0	20 263

Table 2

Distribution of quality control measures for tissue cores stained for ER, PR and HER2 in the virtual TMAs

Quality control category	ER		PR		HER2
Quality control category	N	%	N	%	N	%
Satisfactory Core (invasive tumour)	649	69	679	71	672	67
DCIS only	61	6	52	5	82	8
No Tumour, few tumour cells	123	13	98	10	126	13
No core	38	4	32	3	48	5
Unsatisfactory core for other reasons	71	8	91	10	70	7
Total	942		952		998


Core‐level comparison between ER, PR, HER2 automated and visual scores in virtual TMAs

The distributions of continuous automated scores and ordinal Allred visual scores for ER and PR are shown in Figures 1 and 2, respectively. The automated and ordinal visual scores were highly correlated and there was a clear separation of the distribution of automated scores by the visual positive/negative scores (Figures 1D, 1E and 2D, 2E). There were differences in distributions of automated scores across studies that could reflect different clinico‐pathological characteristics of the tumours or staining quality (supplementary material Figures 3 and 4).

Figure 1

Distribution of ER continuous automated scores and ER visual ordinal scores in virtual TMAs. (A) Scatter plot of the intensity and percentage automated scores colour coded according to the Allred score for the corresponding core by visual scoring. The red curve represents the cut‐off point for positive/negative status by the ROC method. (B) Distribution of Allred visual scores (rater 1). (C) Distribution of intensity*percent automated scores used in the ROC method. (D) Boxplot of the distribution of the intensity*percent automated score by categories of the Allred visual score. (E) Boxplot of the distribution of the intensity*percent automated score by visual positive/negative status. Red lines in C–E show the positive/negative cut‐off points for the corresponding automated score.

Figure 2

Distribution of PR continuous automated scores and PR visual ordinal scores in virtual TMAs. (A) Scatter plot of the intensity and percentage automated scores colour coded according to the Allred score for the corresponding core by visual scoring. The red curve represents the cut‐off point for positive/negative status by the ROC method. (B) Distribution of Allred visual scores (rater 1). (C) Distribution of intensity*percent automated scores. (D) Boxplot of the intensity*percent automated score by categories of the Allred visual score. (E) Boxplot of the intensity*percent automated score by visual positive/negative status. Red lines in C–E show the positive/negative cut‐off points for the corresponding automated score.

The AUC for ER and PR showed excellent discrimination (Table 3). For dichotomous scores, there was excellent inter‐rater agreement for ER and PR and substantial agreement between automated and visual scores, which was better for ER than PR (Table 3, see supplementary material Table 4 for cross‐tabulations). The automated system had good sensitivity and specificity. 
The NPV was substantially lower for the automated to rater comparisons than the inter‐rater comparison (∼70% versus 95%). Use of study‐specific cut‐off points for negative versus positive scores did not substantially improve the measures of agreement (data not shown). Measures of relative performance of automated versus visual scoring were similar when we used the Ali et al 15 method to select a cut‐off point for positive and negative automated score (data not shown).
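The ROC‐based choice of a cut‐off point can be sketched as follows. This is a minimal illustration that assumes "optimizing sensitivity and specificity" means maximizing Youden's J (sensitivity + specificity − 1); the exact criterion is not specified in the text. The scores and visual calls are invented, and the function name is hypothetical.

```python
def youden_cutoff(scores, visual_positive):
    """Return (cutoff, sensitivity, specificity) maximizing sensitivity + specificity - 1.

    A core is called positive by the automated method when its score >= cutoff.
    Ties are resolved in favour of the lowest cutoff.
    """
    n_pos = sum(1 for y in visual_positive if y)
    n_neg = len(visual_positive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one visually positive and one visually negative core")
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, visual_positive) if y and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, visual_positive) if not y and s < cutoff)
        sensitivity, specificity = tp / n_pos, tn / n_neg
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    return best[1], best[2], best[3]


# Invented intensity*percentage scores and visual positive/negative calls.
scores = [5, 10, 20, 40, 80, 120, 150, 200]
visual = [False, False, False, True, False, True, True, True]
cutoff, se, sp = youden_cutoff(scores, visual)
```

In the study, a cut‐off derived in this way from the virtual TMA cores was then applied to all analysed cores.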

Table 3

Inter‐rater agreement and agreement between each rater and Ariol automated quantitative ER, PR scores for cores in the virtual TMA

Marker	Comparison	N	% Pos.	Continuous automated score		Dichotomous automated score
Marker	Comparison	N	% Pos.	AUC (95%CI)	Observed agreement	Kappa (95%CI)	Se (%)	Sp (%)	PPV (%)	NPV (%)
ER	Rater 1 vs rater 2	615	76.3	n/a	96.7	0.91 (0.83, 0.99)	98.3	91.8	97.5	94.4
	Ariol vs rater 1	587	75.0	0.97 (0.95, 0.98)	90.1	0.76 (0.68, 0.84)	89.5	91.8	97.0	74.6
	Ariol vs rater 2	636	76.4	0.96 (0.95, 0.98)	90.1	0.75 (0.67, 0.83)	88.9	94.0	98.0	72.3
PR	Rater 1 vs rater 2	655	67.0	n/a	96.8	0.93 (0.85, 1.00)	97.5	95.4	97.7	94.9
	Ariol vs rater 1	624	67.3	0.93 (0.91, 0.95)	83.8	0.65 (0.57, 0.73)	82.9	85.8	92.3	70.9
	Ariol vs rater 2	634	66.6	0.93 (0.91, 0.95)	84.4	0.66 (0.59, 0.74)	83.6	85.8	92.2	72.5

Raters' scores are dichotomous (positive/negative), and Ariol automated scores are considered as continuous and dichotomous.

% Pos., % positive cores for reference rater; Se, sensitivity; Sp, Specificity; PPV, positive predictive value; NPV, negative predictive value.

The kappa statistics for the HER2 HercepTest score showed substantial agreement for both inter‐rater and automated‐to‐visual comparisons (kappa = 0.62–0.71; Table 4). Although the agreement for the HER2 2+ dichotomous classification was substantial for both inter‐rater and rater‐automated comparisons, the agreement for HER2 3+ was only moderate for one of the raters. Sensitivity to identify HER2 3+ cores was low, both in inter‐rater and rater‐automated comparisons (Table 4). When we examined cross‐tabulations to evaluate the sources of disagreement (supplementary material Table 4), extreme discrepancies, ie, Ariol scores of 0 where pathologist scores were 3, were very infrequent. Of the 13 discrepant cores, five were determined to be pathologist errors and were re‐evaluated; four were due to poor tissue or staining quality (folds, high levels of background staining, edge artifact or small tumour fragments) and four were due to Ariol error. The kappa statistics for rater‐automated agreement changed little when pathology errors and staining errors were removed from the analysis (data not shown).

Table 4

Inter‐rater agreement and agreement between each rater and Ariol automated quantitative HER2 scores for cores in the virtual TMA

Comparison (HER2 semi‐quantitative score: 0/1, 2, 3)	N	Observed agreement	Kappa (95% CI)
Rater 1 vs rater 2	660	92.7	0.71 (0.65, 0.78)
Ariol vs rater 1	693	90.7	0.62 (0.56, 0.68)
Ariol vs rater 2	716	93.7	0.71 (0.65, 0.77)

% Pos.=% positive cores for reference rater

Se, sensitivity; Sp, Specificity; PPV, positive predictive value; NPV, negative predictive value.


Subject‐level comparison for ER, PR, HER2 automated scores to positive/negative scores in BCAC database

Figure 3 shows scatter plots and distributions of automated scores for all cases (6424 cases for ER and 6385 cases for PR) by positive/negative status previously assigned by each individual study. The agreement between subject‐level automated scores and marker status was substantial to moderate, generally lower than the core‐level comparisons in the virtual TMAs (Table 5; see supplementary material Table 5 for cross‐tabulations). There were substantial differences in the measures of agreement by study (supplementary material Table 6).
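The derivation of subject‐level scores described in the statistical methods (maximum score over a subject's cores, after excluding cores with few or no tumour cells and missing cores) can be sketched as below. The tuple layout, category labels and function name are assumptions made for illustration.

```python
# Quality control categories excluded before aggregation (labels are illustrative).
EXCLUDED_QC = {"no tumour/few tumour cells", "no core"}


def subject_level_scores(cores):
    """Collapse core-level scores to one score per subject by taking the maximum.

    cores: iterable of (subject_id, automated_score, qc_category) tuples.
    Subjects whose cores are all excluded do not appear in the result.
    """
    best = {}
    for subject_id, score, qc_category in cores:
        if qc_category in EXCLUDED_QC:
            continue
        if subject_id not in best or score > best[subject_id]:
            best[subject_id] = score
    return best


# Invented example: subject B has no valid core and is therefore dropped.
cores = [
    ("A", 10, "satisfactory"),
    ("A", 50, "satisfactory"),
    ("A", 90, "no core"),
    ("B", 5, "no tumour/few tumour cells"),
    ("C", 0, "satisfactory"),
]
per_subject = subject_level_scores(cores)
```

Taking the maximum means that a single high‐scoring core determines the subject's score, so unrecognized artefacts in one core can affect the subject‐level result.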

Figure 3

Distribution of ER (A–C) and PR (D–F) continuous automated scores (subject level) and positive/negative status in BCAC database, including 6424 cases for ER and 6385 cases for PR from nine studies. (A) Scatter plot of the intensity and percentage automated scores colour coded according to the BCAC ER status (red for positive and blue for negative). The red curve represents the cut‐off point for positive/negative status by the ROC method. The smaller inserted plots show ER‐positive and ER‐negative cases separately. (B) Distribution of intensity*percent automated scores. (C) Boxplot of the intensity*percent automated score by BCAC ER status. Red lines show the positive/negative cut‐off points for the corresponding automated score. Panels D–F show similar plots for PR.

Table 5

Agreement between Ariol automated quantitative ER, PR and HER2 scores for each subject and marker status from clinical/study records

Marker	N	% Pos.	Continuous automated score		Dichotomous automated score
Marker	N	% Pos.	AUC (95%CI)	Observed agreement	Kappa (95%CI)	Se (%)	Sp (%)	PPV (%)	NPV (%)
ER	6424	74.5	0.89 (0.89, 0.90)	84.1	0.62 (0.59, 0.64)	84.6	82.7	93.4	64.7
PR	6385	63.6	0.87 (0.86, 0.88)	80.0	0.57 (0.55, 0.60)	82.5	75.7	85.6	71.2
HER2 2+	6322	15.5	–	88.9	0.62 (0.59, 0.64)	77.2	91.0	61.3	95.6
HER2 3+	6322	15.5	–	89.2	0.43 (0.41, 0.44)	31.8	99.7	95.4	88.8

Clinical/study scores are dichotomous (positive/negative), and ER, PR Ariol scores are considered both as continuous and dichotomous.

% Pos., % positive cores for reference rater; Se, sensitivity; Sp, Specificity; PPV, positive predictive value; NPV, negative predictive value.

To evaluate the impact of core quality on measures of agreement, we used automated estimates of the number of tumour nuclei to identify cores with no or few tumour cells. Measures of agreement improved only slightly after these exclusions; however, this resulted in a substantial reduction in the number of subjects with valid scores (data not shown). We, therefore, decided not to make these exclusions in the remaining analyses.

Survival analysis for ER, PR, HER2 automated scores compared to positive/negative scores from individual studies

Kaplan–Meier survival curves drawn from the full subject‐level dataset demonstrated that the automated analysis generated the expected survival associations for ER, PR and HER2 (Figures 4, 5, 6). While estimates of HR for automated data showed weaker associations with survival for dichotomous scores, automated scores allowed classification of cases into meaningful quantitative levels of ER and PR expression. Quintiles of the automated scores resulted in a refinement of the associations with survival (Figures 4 and 5). However, models with automated scores had a worse fit than models with dichotomous visual scores (see AIC/BIC values in Figures 4 and 5).
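The penalized‐likelihood comparison can be made concrete with the standard definitions AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L, where k is the number of estimated parameters and ln L the maximized log (partial) likelihood. The sketch below is illustrative only: the log‐likelihood values are invented, and the choice of n (subjects or events) is an assumption left to the analyst.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k ln(n) - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Invented values: a one-parameter dichotomous model versus a four-parameter
# quintile model fitted to the same 1000 observations.
dichotomous = {"aic": aic(-100.0, 1), "bic": bic(-100.0, 1, 1000)}
quintiles = {"aic": aic(-99.0, 4), "bic": bic(-99.0, 4, 1000)}
```

In this invented example the quintile model has the higher log‐likelihood, yet both criteria favour the dichotomous model because the gain in fit does not offset the penalty for three extra parameters.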

Figure 4

Kaplan–Meier survival curves and hazard ratios (HR) for 10‐year breast cancer survival by ER, based on 6135 subjects and 981 breast cancer specific deaths, using (A) pathologists' data from study sites (positive/negative), (B) dichotomized (positive/negative) automated scores and (C) automated scores classified in quintiles.

Figure 5

Kaplan–Meier survival curves and hazard ratios (HR) for 10‐year breast cancer survival by PR, based on 6115 subjects and 998 breast cancer specific deaths, using (A) pathologists' data from study sites (positive/negative), (B) dichotomized (positive/negative) automated scores and (C) automated scores classified in quintiles.

Figure 6

Kaplan–Meier survival curves and hazard ratios (HR) for 10‐year breast cancer survival by HER2, based on 6039 subjects and 997 breast cancer specific deaths, using (A) pathologists' data from study sites (positive/negative), and (B) semi‐quantitative automated scores.

The HRs for women in the lowest quintiles for ER and PR were similar to those for receptor‐negative cases according to the BCAC database (representing 25.3% of the cases for ER and 36.2% of cases for PR). The percentage of cores classified as negative in the BCAC database included in each of the quintiles for ER and PR is shown in supplementary material Table 8. Automated scores for HER2 also allowed estimation of HR by HER2 semi‐quantitative scores, showing increasing hazard for increasing scores (Figure 6). However, as for ER and PR, the model fit was worse for automated than visual scores (see AIC/BIC values in Figure 6).

Comparison of automated and visual scores for CK5/6 and EGFR

An initial analysis for CK5/6 and EGFR in the entire TMA dataset resulted in very poor performance of automated scoring compared with visual scores by rater 1 or rater 2 (data not shown). A subsequent re‐analysis was performed only in the SEARCH study to determine whether limiting the tuning and analysis to a single study improved performance. Although this resulted in a marked improvement, the PPV was still poor (49.2% for CK5/6 and 36.9% for EGFR), reflecting a large number of false positives (Table 6). Performance was better for ER‐negative than ER‐positive tumours, the former including a higher percentage of CK5/6‐ and EGFR‐positive tumours. Examination of discordant cores showed that the disagreements were primarily related to false positives due to scoring of normal cells by Ariol. We, therefore, visually scored all cores that had not been previously scored by individual studies (ie, 5771 cores stained for CK5/6 and 8259 cores stained for EGFR). The distribution of quality control scores for these TMAs was similar to that seen for ER, PR and HER2 (supplementary material Table 9). Examination of inter‐rater agreement on a subset of 357 CK5/6 cores and 760 EGFR cores scored visually by a reference pathologist for QC showed better agreement than the automated versus visual agreement seen in the SEARCH study; however, the PPV was also relatively low (Table 6). Evaluation of discordant pairs revealed that disagreements between visual scores were primarily due to disagreements between pathologists in identifying whether immunostained cells were normal cells or cancer cells.

Table 6

Inter‐rater agreement and agreement between Ariol automated CK5/6 and EGFR scores (dichotomized using the ROC method) for cores in TMAs from participating studies

Marker	Comparison	N	% Pos.	Observed Agreement	Kappa (95%CI)	Se (%)	Sp (%)	PPV (%)	NPV (%)
CK5/6	Inter‐rater agreement (a)	357	11.9	91.6	0.74 (0.66, 0.83)	96.6	90.6	67.1	99.3
	Ariol vs rater (b), all	1897	10.4	89.4	0.49 (0.44, 0.53)	61.6	92.6	49.2	95.4
	Ariol vs rater (b), ER+	1107	6.4	89.1	0.41 (0.35, 0.46)	71.8	90.3	33.6	97.9
	Ariol vs rater (b), ER−	360	21.1	86.9	0.57 (0.47, 0.67)	56.6	95.1	75.4	89.1
EGFR	Inter‐rater agreement (c)	760	10.5	94.5	0.73 (0.66, 0.81)	90.7	94.9	66.0	98.9
	Ariol vs rater (b), all	1914	9.8	84.1	0.44 (0.40, 0.48)	87.7	83.7	36.9	98.4
	Ariol vs rater (b), ER+	1041	1.3	86.2	0.14 (0.11, 0.17)	100.0	86.0	8.9	100.0
	Ariol vs rater (b), ER−	342	39.5	82.2	0.63 (0.53, 0.74)	83.7	81.2	74.3	88.4

% Pos., % positive cores for reference rater; Se, sensitivity; Sp, Specificity; PPV, positive predictive value; NPV, negative predictive value.

(a) Includes data from CNIO‐BCS, MCBCS, ORIGO, SBCS.

(b) Includes data from SEARCH re‐analysis.

(c) Includes data from ABCS, CNIO‐BCS, KBCP, MCBCS, ORIGO, SBCS.
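The low PPV values in Table 6 follow largely from the low prevalence of CK5/6‐ and EGFR‐positive cores. As a worked illustration, the sketch below recomputes PPV and NPV from the sensitivity, specificity and percentage of positive cores reported in Table 6 using Bayes' rule; small differences from the tabulated values reflect rounding of the inputs. The function name is hypothetical.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at a given prevalence (all as proportions)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# EGFR, Ariol vs rater, all cores (Table 6): Se 87.7%, Sp 83.7%, 9.8% positive.
ppv_all, npv_all = predictive_values(0.877, 0.837, 0.098)

# EGFR, ER-positive tumours (Table 6): Se 100%, Sp 86.0%, 1.3% positive.
ppv_er_pos, npv_er_pos = predictive_values(1.0, 0.860, 0.013)
```

With only 1.3% of cores visually positive, even a specificity of 86% yields a PPV below 10%, which is consistent with using the automated score mainly to rule out positivity.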

Discussion

Automated image analysis of TMAs using many different systems has been shown to perform well for multiple markers 12, 13, 14, 15, 16, 17, 18, 23, 24. However, most studies have been based on relatively small comparisons of TMAs from one or few centres. Our report is a large‐scale evaluation of the performance of automated image analysis in the scoring of TMAs from different source institutions across different countries in a consortium of breast cancer studies. The core‐level measures of agreement between automated and visual scores for the virtual TMAs in our report are most comparable to those in previous reports as they were based on comparisons of exactly the same images. For ER, PR and HER2, they were lower than previously reported by our group using the Ariol system 14, or an automated scoring algorithm adapted from astronomy 15, possibly reflecting the greater variability in tissue preparation related to multiple specimen sources. Bolton et al 14 used TMAs stained for ER, PR and HER2 from PBCS, and Ali et al 15 used ER and HER2 stained TMAs from SEARCH. While these TMAs were also included in our study, the automated image analysis of the TMAs was done independently using different methods. Patient characteristics, modes of tumour detection, pathologic features and tissue handling in this report were likely highly variable because of the inclusion of multiple studies, but representative of ‘real‐world’ population‐based samples collected over many years in international collaborations. As expected, the agreement for subject‐level comparisons was lower than for core‐level comparisons, since the former compare scores based on different pieces of the tumour tissue, and the visual scores came from multiple sources (mainly clinical records and central review of cases by individual studies). Arguably, however, these comparisons are most relevant for answering scientific questions. 
A key advantage of automated image analysis is that it does not use pathologists' time and can be run continuously, including overnight. The analysis time is dependent on the type of stain, size of cores and number of cores per TMA. For instance, the time to score a TMA with 183 cores of 0.6 mm diameter can range from 25 min for a simple nuclear analysis (ER) to 70 min for a cytoplasmic analysis (CK5/6). The entire dataset was analysed over the course of a week using four batch processors. This is in comparison to approximately 35–40 min for a simple manual ER score of a similar TMA by a skilled pathologist using computer‐assisted scoring methods. A limitation of the automated approach is that 30–35% of cores in TMAs are unsatisfactory for scoring, and imaging systems do not perform well in triaging such cores. QC assessment of each core by visual inspection to identify unsatisfactory cores would improve the performance of the automated scoring. Similarly, study‐specific training of algorithms could also improve performance. Although tissue cores are targeted to tumour areas during TMA production, contamination by normal elements is unavoidable. Identification of tumour cells by semi‐automated systems, including manual demarcation of tumour areas prior to automated scoring, could improve the performance of automated systems. However, these additional procedures are time consuming, and the added efforts to improve scoring diminish the relative value of automation. Automated pattern‐recognition software to identify tumour areas, such as Definiens 24, is promising, but it is still difficult to get accurate identification of breast tumour cells, particularly in heterogeneous sets of tissue samples such as those derived from international consortia. The performance of automated scoring for PR was worse than for ER stains, possibly partly explained by a higher regional heterogeneity in positive staining for PR than ER 16, 25.
HER2 scoring performed using FDA‐approved commercial algorithms in brightfield 26, 27, 28 and in fluorescence 29 has demonstrated substantial agreement with visual assessment in studies of varying size and design. We observed substantial agreement for HER2 semi‐quantitative scores for inter‐rater or automated‐rater comparisons, which was similar to that demonstrated previously on the Ariol system in our hands 14. Dichotomous classification of automated scores for ER, PR and HER2 achieved less separation of prognostic groups by marker expression than the clinical/study scores. Because these three markers are routinely determined in most clinical settings, the main advantage of the automated scores was providing quantitative measures of expression that allowed refinement in the groups of patients with different prognosis. Although semi‐quantitative scores can also be obtained from clinical records, the reporting is not homogeneous and this information is not available in many epidemiological studies. The performance of automated analysis of the cytoplasmic CK5/6 and membranous EGFR stains was much worse than for the time‐tested nuclear ER/PR and membranous HER2 antibodies, resulting in many false positive results. Automated scoring of cytoplasmic stains such as CK5/6 is particularly challenging since most systems use colour de‐convolution to separate the nuclear counterstain from the brown stain in order to identify nuclei and determine the cell type. This method reduces resolution, so the accuracy of identification decreases. The poor performance for CK5/6 and EGFR was also an issue for the inter‐rater comparison, although to a lesser extent. Examination of discordant scores revealed that both the inter‐rater and automated‐visual discordances were often due to scoring immunopositive normal cells. Automated image analyses for CK5/6 and EGFR may provide useful triage; negative results may be considered final, whereas positive results require visual confirmation. This would potentially reduce scoring workloads by about 75%, and could be further refined by limiting visual review to ER‐negative or triple‐negative (ER−/PR−/HER2−) cancers expressing basal markers. However, image management could present challenges for targeted visual reviews.
In conclusion, automated image analysis of TMAs stained for ER, PR and HER2 can be a useful tool to obtain quantitative scores for these markers in large collaborative studies including heterogeneous TMAs. However, automated scoring does not result in an improved performance of survival models compared to visual scores. Automated scoring of CK5/6 and EGFR may permit triage of negative cores, but positive results require visual review. Efforts to improve the performance of automated analysis should focus on standardization of specimen handling, TMA construction 30, 31 and use of centralized optimized IHC‐staining protocols. Improved standardization and optimization of key steps in these procedures, combined with technical advances in automated analysis of IHC stains, would facilitate large population‐based studies of breast cancer.
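
The triage strategy proposed for CK5/6 and EGFR (accept automated negatives, visually review automated positives) can be illustrated with a short calculation. The sketch below derives sensitivity, specificity, PPV, NPV and the fraction of cores sent for visual review from a 2×2 table, taking the visual score as the reference. The function name `triage_metrics` and the counts are hypothetical, chosen only so that PPV and NPV fall within the ranges reported here; they are not the study data.

```python
def triage_metrics(tp, fp, fn, tn):
    """Accuracy measures for automated scores, with visual scores as the reference.

    tp/fp: automated-positive cores that are visually positive/negative.
    fn/tn: automated-negative cores that are visually positive/negative.
    """
    if min(tp + fn, tn + fp, tp + fp, tn + fn) == 0:
        raise ValueError("every row and column of the 2x2 table must be non-empty")
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Under the triage rule only automated positives are reviewed visually.
        "review_fraction": (tp + fp) / total,
    }


# Hypothetical counts for 1000 satisfactory cores (illustration only).
metrics = triage_metrics(tp=75, fp=175, fn=18, tn=732)
workload_reduction = 1.0 - metrics["review_fraction"]

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"workload reduction: {workload_reduction:.0%}")
```

With these assumed counts, a quarter of the cores are automated positives, so visual review is limited to 25% of cores while the NPV among the remaining automated negatives stays high. The actual saving depends on the proportion of automated positives in a given TMA.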

Author contributions

WJH, FMB, EP, PDP, MES, MG‐C conceived and carried out the study; WJH, LM, PG, NJ, L‐AMcD, JM, FD carried out the centralized laboratory work; EP, MES, EJS, SP, CHMvanD, LJ, RS, DV performed visual scoring; PC performed data management; MG‐C, MNB analysed data; CC, AB, JS, JW, HN, RF, CB, PH, HRA, S‐JD, JF, JL, LB, AM, VK, V‐MK, AC, IWB, SSC, MWR, FJC, JEO, PD, WEM, CMS, AH, JB, JIAP, PM, MKB, DFE, MKS contributed to data collection and/or data management. All authors were involved in writing the paper and gave final approval of the submitted and published versions.

Supporting Information

The following supplementary material may be found online.
Table S1. Description of study populations included in the analyses.
Table S2. Distribution of clinico‐pathological characteristics by study, for 8267 BCAC breast cancer cases included in the analyses.
Table S3. Staining protocols used by different studies for ER, PR, HER2, CK56 and EGFR.
Table S4. Cross‐classification of visual (rater 1 and rater 2) and Ariol automated scores for ER, PR and HER2 stains in Virtual TMA.
Table S5. Cross‐classification between Ariol automated quantitative ER, PR and HER2 scores for each subject and marker status from clinical/study records. Clinical/study scores are dichotomous (positive/negative), and ER, PR Ariol scores are considered both as continuous and dichotomous.
Table S6. Agreement between Ariol automated quantitative ER, PR and HER2 scores for each subject and marker status from clinical/study records, by study.
Table S7. Inter‐rater agreement of CK56 and EGFR scoring by study.
Table S8. Cross‐classification of subjects by ER and PR status (positive/negative) according to BCAC case data and quintiles of the combined automated Ariol score.
Table S9. Distribution of quality control measures for tissue cores stained for CK56 and EGFR in TMAs from participating studies.
Figure S1. Representative images of ER staining demonstrating the level of variation in DAB and haematoxylin staining across the sample set. (A) SEARCH study; (B) ABCS study (BOOG_E TMA); (C) ABCS study (BOOG_J TMA); (D) KBCP study.
Figure S2. Screengrab images from the Ariol system visualizing the algorithm training for the representative images detailed in Figure S1. (A1, B1, C1, D1) DAB colour recognition (red) and haematoxylin colour recognition (green), demonstrating the effect of cytoplasmic ER staining and dark haematoxylin staining on colour recognition. (A2, B2, C2, D2) Nuclear segmentation, based on the colour recognition. Yellow dots delineate ER‐positive nuclei. Pink dots delineate ER‐negative tumour cells according to the tuned algorithm. (A) SEARCH study; (B) ABCS study (BOOG_E TMA); (C) ABCS study (BOOG_J TMA); (D) KBCP study.
Figure S3. Distribution of Ariol automated intensity (A), percentage (B), and combined (C) scores for ER, by study.
Figure S4. Distribution of Ariol automated intensity (A), percentage (B), and combined (C) scores for ER, by study.

30 in total

1. Systematic analysis of breast cancer morphology uncovers stromal features associated with survival.

Authors: Andrew H Beck; Ankur R Sangoi; Samuel Leung; Robert J Marinelli; Torsten O Nielsen; Marc J van de Vijver; Robert B West; Matt van de Rijn; Daphne Koller
Journal: Sci Transl Med Date: 2011-11-09 Impact factor: 17.956

Review 2. The manufacture and assessment of tissue microarrays: suggestions and criteria for analysis, with breast cancer as an example.

Authors: Sarah E Pinder; John P Brown; Cheryl Gillett; Colin A Purdie; Valerie Speirs; Alastair M Thompson; Abeer M Shaaban
Journal: J Clin Pathol Date: 2012-10-19 Impact factor: 3.411

3. Estrogen receptor status by immunohistochemistry is superior to the ligand-binding assay for predicting response to adjuvant endocrine therapy in breast cancer.

Authors: J M Harvey; G M Clark; C K Osborne; D C Allred
Journal: J Clin Oncol Date: 1999-05 Impact factor: 44.544

4. Molecular portraits of human breast tumours.

Authors: C M Perou; T Sørlie; M B Eisen; M van de Rijn; S S Jeffrey; C A Rees; J R Pollack; D T Ross; H Johnsen; L A Akslen; O Fluge; A Pergamenschikov; C Williams; S X Zhu; P E Lønning; A L Børresen-Dale; P O Brown; D Botstein
Journal: Nature Date: 2000-08-17 Impact factor: 49.962

5. Immunohistochemical and clinical characterization of the basal-like subtype of invasive breast carcinoma.

Authors: Torsten O Nielsen; Forrest D Hsu; Kristin Jensen; Maggie Cheang; Gamze Karaca; Zhiyuan Hu; Tina Hernandez-Boussard; Chad Livasy; Dave Cowan; Lynn Dressler; Lars A Akslen; Joseph Ragaz; Allen M Gown; C Blake Gilks; Matt van de Rijn; Charles M Perou
Journal: Clin Cancer Res Date: 2004-08-15 Impact factor: 12.531

6. Assessment of automated image analysis of breast cancer tissue microarrays for epidemiologic studies.

Authors: Kelly L Bolton; Montserrat Garcia-Closas; Ruth M Pfeiffer; Máire A Duggan; William J Howat; Stephen M Hewitt; Xiaohong R Yang; Robert Cornelison; Sarah L Anzick; Paul Meltzer; Sean Davis; Petra Lenz; Jonine D Figueroa; Paul D P Pharoah; Mark E Sherman
Journal: Cancer Epidemiol Biomarkers Prev Date: 2010-03-23 Impact factor: 4.254

7. Novel markers for differentiation of lobular and ductal invasive breast carcinomas by laser microdissection and microarray analysis.

Authors: Gulisa Turashvili; Jan Bouchal; Karl Baumforth; Wenbin Wei; Marta Dziechciarkova; Jiri Ehrmann; Jiri Klein; Eduard Fridman; Jozef Skarda; Josef Srovnal; Marian Hajduch; Paul Murray; Zdenek Kolar
Journal: BMC Cancer Date: 2007-03-27 Impact factor: 4.430

Review 8. Guidelines and considerations for conducting experiments using tissue microarrays.

Authors: Mohammad Ilyas; Heike Grabsch; Ian O Ellis; Chris Womack; Robert Brown; Dan Berney; Dean Fennell; Manuel Salto-Tellez; Martin Jenkins; Goran Landberg; Richard Byers; Darren Treanor; David Harrison; Andrew R Green; Graham Ball; Peter Hamilton
Journal: Histopathology Date: 2013-04-12 Impact factor: 5.087

9. PREDICT Plus: development and validation of a prognostic model for early breast cancer that includes HER2.

Authors: G C Wishart; C D Bajdik; E Dicks; E Provenzano; M K Schmidt; M Sherman; D C Greenberg; A R Green; K A Gelmon; V-M Kosma; J E Olson; M W Beckmann; R Winqvist; S S Cross; G Severi; D Huntsman; K Pylkäs; I Ellis; T O Nielsen; G Giles; C Blomqvist; P A Fasching; F J Couch; E Rakha; W D Foulkes; F M Blows; L R Bégin; L J van't Veer; M Southey; H Nevanlinna; A Mannermaa; A Cox; M Cheang; L Baglietto; C Caldas; M Garcia-Closas; P D P Pharoah
Journal: Br J Cancer Date: 2012-07-31 Impact factor: 7.640

10. Astronomical algorithms for automated analysis of tissue protein expression in breast cancer.

Authors: H R Ali; M Irwin; L Morris; S-J Dawson; F M Blows; E Provenzano; B Mahler-Araujo; P D Pharoah; N A Walton; J D Brenton; C Caldas
Journal: Br J Cancer Date: 2013-01-17 Impact factor: 7.640
