
Interobserver variability in the assessment of stromal tumor-infiltrating lymphocytes (sTILs) in triple-negative invasive breast carcinoma influences the association with pathological complete response: the IVITA study.

Serdar Altinay¹, Laurent Arnould², Maschenka Balkenhol³, Glenn Broeckx⁴, Octavio Burguès⁵, Cecile Colpaert⁶, Franceska Dedeurwaerdere⁷, Benjamin Dessauvagie^8,9, Valérie Duwel¹⁰, Giuseppe Floris^11,12, Stephen Fox¹³, Clara Gerosa¹⁴, Delfyne Hastir¹⁵, Shabnam Jaffer¹⁶, Eline Kurpershoek¹⁷, Magali Lacroix-Triki¹⁸, Andoni Laka¹⁹, Kathleen Lambein²⁰, Gaëtan Marie MacGrogan²¹, Caterina Marchiò^22,23, Maria-Dolores Martin Martinez²⁴, Sharon Nofech-Mozes²⁵, Dieter Peeters^26,27, Alberto Ravarino¹⁴, Emily Reisenbichler²⁸, Erika Resetkova²⁹, Souzan Sanati³⁰, Anne-Marie Schelfhout³¹, Vera Schelfhout²⁶, Abeer Shaaban³², Renata Sinke¹⁷, Claudia M Stanciu-Pop³³, Carolien H M van Deurzen³⁴, Koen K Van de Vijver³⁵, Anne-Sophie Van Rompuy¹¹, Anne Vincent-Salomon³⁶, Hannah Y Wen³⁷, Serena Wong²⁸, Mieke R Van Bockstal³⁸, Aline François³⁹, Caroline Bouzin⁴⁰, Christine Galant^39,41.

Abstract

High stromal tumor-infiltrating lymphocytes (sTILs) in triple-negative breast cancer (TNBC) are associated with pathological complete response (pCR) after neoadjuvant chemotherapy (NAC). Histopathological assessment of sTILs in TNBC biopsies is characterized by substantial interobserver variability, but it is unknown whether this affects its association with pCR. Here, we aimed to investigate the degree of interobserver variability in an international study, and its impact on the relationship between sTILs and pCR. Forty pathologists assessed sTILs as a percentage in digitalized biopsy slides, originating from 41 TNBC patients who were treated with NAC followed by surgery. Pathological response was quantified by the MD Anderson Residual Cancer Burden (RCB) score. Intraclass correlation coefficients (ICCs) were calculated per pathologist duo and Bland-Altman plots were constructed. The relation between sTILs and pCR or RCB class was investigated. The ICCs ranged from -0.376 to 0.947 (mean: 0.659), indicating substantial interobserver variability. Nevertheless, high sTILs scores were significantly associated with pCR for 36 participants (90%), and with RCB class for eight participants (20%). Post hoc sTILs cutoffs at 20% and 40% resulted in variable associations with pCR. The sTILs in TNBC with RCB-II and RCB-III were intermediate to those of RCB-0 and RCB-I, with lowest sTILs observed in RCB-I. However, the limited number of RCB-I cases precludes any definite conclusions due to lack of power, and this observation therefore requires further investigation. In conclusion, sTILs are a robust marker for pCR at the group level. However, if sTILs are to be used to guide the NAC scheme for individual patients, the observed interobserver variability might substantially affect the chance of obtaining a pCR. Future studies should determine the 'ideal' sTILs threshold, and attempt to fine-tune the patient selection for sTILs-based de-escalation of NAC regimens. 
At present, there is insufficient evidence for robust and reproducible sTILs-guided therapeutic decisions.

Year: 2021 PMID: 34218258 PMCID: PMC8595512 DOI: 10.1038/s41379-021-00865-z

Source DB: PubMed Journal: Mod Pathol ISSN: 0893-3952 Impact factor: 7.842

INTRODUCTION

Triple-negative breast cancers (TNBCs) lack the expression of estrogen receptor (ER), progesterone receptor (PR) and HER2 [1], and are associated with a higher risk of regional recurrence, lower distant recurrence-free survival and lower overall survival in comparison with other molecular subtypes [2,3]. The majority of TNBCs are invasive carcinomas of no special type (NST), and the most frequent special type TNBC is metaplastic carcinoma [4]. TNBC patients who present with clinically node-positive and/or at least T1c disease are generally treated with anthracycline- and taxane-based neoadjuvant chemotherapy (NAC), with optional addition of carboplatin, according to the ASCO guideline [5]. Pathological complete response (pCR) after NAC guides subsequent clinical decision-making, and is defined as the absence of residual invasive carcinoma in the breast and lymph nodes [5]. Achieving a pCR is an independent predictor of better disease-free survival in TNBC [6,7]. Many classification systems were developed to objectify the post-NAC therapeutic response. The well validated MD Anderson Residual Cancer Burden (RCB) applies an equation which contains information on both the cellularity and the size of residual carcinoma in the breast and lymph nodes [7]. It is considered the gold standard for assessment of pathological response in NAC clinical trials, shows excellent interobserver agreement, and is characterized by a highly reproducible long-term prognostic significance [8,9]. Two randomized clinical trials showed that high levels of stromal tumor-infiltrating lymphocytes (sTILs) are predictive for achieving a pCR in TNBC [10,11]. This was confirmed in retrospective studies beyond trial-setting [12-14]. High TILs levels also provide prognostic information, as they are associated with better distant recurrence-free survival in TNBC patients treated with and without NAC [10,15]. 
The International Immuno-oncology Biomarkers Working Group developed a method to quantify the amount of sTILs in the peritumoral stroma of solid tumors such as breast cancer [16,17]. This method evaluates sTILs for the stromal compartment within the borders of the invasive tumor, and the area of stromal tissue serves as the denominator to determine the percentage of sTILs [17]. Small-scale studies on interobserver variability among two to four pathologists reported variable concordance rates, ranging from substantial agreement to a relatively high level of imprecision [18-20]. Larger studies, wherein nine to thirty-two pathologists evaluated sTILs in a predefined set of breast cancers, consistently reported acceptable and moderate agreement [21-23]. However, none of these studies investigated the impact of interobserver variability on the predictive value of sTILs for achieving a pCR. We therefore aimed to investigate the interobserver agreement and association of individual pathologists’ sTILs scores with the therapeutic response, defined as either pCR or RCB class. We organized a large-scale international study on ‘interobserver variability in TILs assessment’ (IVITA), by using a consecutive real-life set of TNBC biopsies outside the randomized clinical trial setting.

MATERIALS & METHODS

Tissue samples & clinico-pathological data

Archived hematoxylin and eosin (H&E) stained slides of the pre-NAC biopsy and post-NAC resection specimen were collected for a consecutive series of TNBC patients at the Cliniques universitaires Saint-Luc (Brussels, Belgium). All patients included in this study were diagnosed with TNBC and underwent surgery between 1 January 2015 and 30 September 2020. Hormone receptor status and HER2 status were defined according to the ASCO/CAP guidelines [24,25]. The standard NAC scheme included anthracyclines and cyclophosphamide, followed by paclitaxel. Patients with poor response after anthracyclines and cyclophosphamide also received carboplatin. Information on patient age at diagnosis, type of surgery, time interval between the biopsy and surgery, post-NAC nodal status, macroscopic and microscopic tumor bed size, hormone receptor status and HER2 status was retrieved from the electronic histopathological reports (LIS DaVinci, MIPS, Ghent, Belgium). The institutional ethics committee approved this study (file number: RETRO-TNBC-15-2019/03JUL/297).

Histopathological central review

All biopsies were immediately fixed in 10% neutral-buffered formalin for 6-72 hours. Macroscopic examination of post-NAC lumpectomy and mastectomy specimens was performed according to the MD Anderson residual cancer burden (RCB) protocol [7]. All resection specimens were sliced at 5 mm intervals and fixed in 10% neutral-buffered formalin for 6-72 hours, in line with the ASCO/CAP guidelines [24]. Histopathological assessment of the biopsies and the resection specimens was performed as previously described [12], and comprised the Nottingham grade, and presence of a ductal carcinoma in situ (DCIS) component and unequivocal lympho-vascular invasion. The H&E stained slides of all resection specimens were reviewed by two pathologists (AF and MRVB). Archived immunohistochemical stains for p63 and smooth muscle myosin heavy chain (SMMHC) were available to discern residual DCIS from invasive carcinoma. The therapeutic response after neoadjuvant chemotherapy was objectified by using an online calculator for the RCB score (http://www3.mdanderson.org/app/medcalc/index.cfm?pagename=jsconvert3) [7]. For each patient, the RCB score and corresponding RCB class were noted. An RCB score of zero (RCB-0) was considered as a pCR.

sTILs assessment

The extent of the stromal inflammatory infiltrate in the pre-NAC biopsy was assessed according to the standardized method as described in detail by the International Immuno-oncology Biomarkers Working Group [16]. The number of sTILs was noted as the percentage of mononuclear inflammatory cells related to the total peri- and intra-tumor stromal surface area, which served as a denominator [16]. The number of fields was not specified: participants had to evaluate the entire area occupied by invasive carcinoma. No training set was provided, but all participants were provided with the appropriate literature [16,17,21], as well as the tutorial of the website www.tilsinbreastcancer.org, which served as a guideline during the sTILs assessment. A similar method has been applied before [21]. All participants evaluated the same set of digitalized pre-NAC core needle biopsy slides. For each patient, one biopsy slide was digitalized by an automated slide scanner with Z-stack feature (NanoZoomer 2.0-RS, Hamamatsu Photonics K.K., Hamamatsu City, Japan). Evaluation of the post-NAC resection specimen was not requested.

Participating pathologists

Participating pathologists with a special interest in breast disease had to actively work as reporting pathologist, either in academic or non-academic laboratories. As an inclusion criterion, all participants had to assess a minimum of 50 primary (oncologic) breast cancer resection specimens per year, in line with the EUSOMA-criteria for dedicated breast pathologists [26]. Most participants previously participated in the digital DCISion study [27]. The following data on the observers were collected via a questionnaire with twenty questions: number of years in practice (including training), the work environment (academic or non-academic laboratory), the daily work method (conventional light microscopy or digital pathology), and the weekly breast pathology work load expressed as a percentage of a fulltime week schedule. Information on the habits of evaluating and reporting sTILs was also collected. All participants had digital access to the 41 scanned H&E slides, which were available on the password-protected Cytomine platform [28]. The identity of each participant was anonymized as P1, P2, P3, etc by one pathologist (MRVB), who collected all participants’ sTILs scores.

Statistical analysis

The questionnaire results were analyzed, and pie charts and radar diagrams were constructed in Excel (Excel Windows 10, Microsoft Corporation, Redmond, WA, USA). Statistical analyses were performed with IBM SPSS statistics 26.0 (IBM Chicago, IL, USA). Tests for normality were performed with the Shapiro-Wilk test, which showed that the sTILs scores of each participant were not normally distributed (p<0.05; Supplementary Table 1). Therefore, the median (instead of the average) sTILs value was selected for each case to serve as the ‘gold standard’, based on the assessment of all participants. This ‘median’ (nonexistent) pathologist was designated ‘Px’, and a histogram and stem-and-leaf plot were constructed to illustrate the non-normal distribution. Associations between the median Px sTILs scores and different histopathological characteristics were investigated by applying Mann-Whitney U and Kruskal-Wallis tests, depending on the number of categories of the characteristic of interest. Mann-Whitney U tests and Kruskal-Wallis tests were also performed to investigate associations between the individual sTILs scores (as a continuous variable) and either pCR or RCB class, respectively. Box-and-whisker plots visualized these associations. Next, all sTILs scores were dichotomized post hoc according to seven different thresholds (5, 10, 20, 30, 40, 50 and 60%), which included previously reported cut-offs for dichotomization [10,16]. Low TILs were defined as sTILs lower than or equaling (≤) each threshold. High TILs were defined as sTILs greater than (>) each threshold. Chi-square tests were performed to investigate associations between these sTILs estimates and pCR, and both absolute numbers and column percentages were reported in cross tables. 
Lastly, the range between the 25th and 75th percentile of the sTILs scores was calculated for each case as a ‘surrogate’ measure for interobserver variability, and the association of this range with the different histopathological features was investigated, by using Mann-Whitney U and Kruskal-Wallis tests. All tests were two-sided and the significance level was set at p<0.05, except for Kruskal-Wallis tests, where we applied a post hoc Bonferroni correction for multiple testing (p<0.0083). Interobserver variability was quantified by calculation of the intraclass correlation coefficients (ICC) for sTILs scores, as previously described [27]. Interpretation was performed according to Koo and Li [29]. ICC settings were: two-way random, single measures, absolute agreement. Bland-Altman plots were constructed to visualize the degree of deviation from the median sTILs score Px, by using both the mean of and the difference between each pathologist’s sTILs scores and Px sTILs scores.
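The ICC settings named above (two-way random, single measures, absolute agreement) correspond to the coefficient often written as ICC(2,1). As a minimal sketch, and not the SPSS procedure used in the study, this coefficient can be computed from the mean squares of a two-way ANOVA. The function name `icc_2_1` and the example scores are hypothetical, and complete data (no missing scores) are assumed.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, single measures, absolute agreement.

    `scores` is a list of rows, one row per case and one column per rater.
    Assumes no missing values (illustrative sketch only).
    """
    n = len(scores)          # number of cases
    k = len(scores[0])       # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for cases (rows), raters (columns) and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical sTILs percentages for three cases scored by one pathologist duo.
# The second rater scores every case 10 points higher than the first.
duo = [[10, 20], [20, 30], [30, 40]]
icc = icc_2_1(duo)
```

In this hypothetical duo both raters rank the cases identically, yet the coefficient is about 0.67 rather than 1, because the absolute-agreement definition penalizes the systematic 10-point offset between the raters.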

RESULTS

Profile of the participants

Forty-one pathologists were invited to participate. All pathologists completed the questionnaire, and forty pathologists (98%) assessed sTILs in the series of digitalized biopsy slides. The participants represented thirty-four laboratories from eleven countries (Australia, Belgium, Canada, France, Italy, Spain, Switzerland, The Netherlands, Turkey, the United Kingdom, and the United States of America). The participants had been practicing pathology for 18.6 years on average (range 3-35 years). Twenty-eight pathologists (68%) worked in academic laboratories; eleven pathologists (27%) worked in non-academic laboratories and two pathologists (5%) worked in both settings. Conventional light microscopy and digital pathology were used on a daily basis by thirty (73%) and four (10%) pathologists, respectively. Seven pathologists (17%) used both techniques in routine practice. The estimated time spent on breast pathology, based on a fulltime working schedule, is shown in Figure 1A. Thirty-five participants (85%) were aware of the ‘International Immuno-Oncology Biomarker Working Group on Breast Cancer’ before their participation in the IVITA study, while five (12%) had not yet heard about the Working Group and one (2%) was uncertain. Thirty-one participants (76%) had already visited the website of the Working Group before participating in IVITA, whereas ten participants (24%) had not. One participant (2%) reported never having assessed the post-NAC therapeutic response in TNBC; four (10%) and two (5%) participants reported using the Pinder regression score or the Miller-Payne system, respectively. Twenty-five participants (61%) applied the MD Anderson RCB score in routine practice. Additionally, three participants (7%) combined the RCB score and the Pinder regression score, and two participants (5%) used both the RCB score and the Miller-Payne system. One participant (2%) mentioned the use of the ‘Residual Disease in Breast and Nodes’ system, whereas two participants (5%) mentioned the EUSOMA recommendations. One participant (2%) indicated ‘other classification system’, without further specifications. None of the participants used the Chevallier classification, Sataloff’s classification or Nottingham Clinico-Pathological Response Index.

Figure 1.

Pie charts.

(a) Distribution of the time spent on breast pathology, as reported by each pathologist based on a fulltime week schedule. (b) Specimens used for sTILs assessment in general, regardless of the molecular subtype, as reported by 33 participants. (c) Specimens used for sTILs assessment in TNBC, as reported by 33 participants.

sTILs reporting practice of the participants

Eight pathologists (20%) never mentioned sTILs in the reports of invasive breast cancer patients. Eighteen (44%) and fifteen (37%) pathologists always or sometimes assessed sTILs in invasive breast cancer, respectively. In this subgroup of 33 pathologists, 25 (76%) reported sTILs for all molecular subtypes. One pathologist (3%) only mentioned sTILs in TNBC, whereas four pathologists (12%) assessed sTILs in both TNBC and HER2-positive breast cancer. Two pathologists (6%) stated that they only mentioned sTILs when the stromal immune infiltrate is marked, regardless of the molecular subtype. The specimen type used for sTILs assessment in general is displayed in Figure 1B. Reporting practices for sTILs in TNBC according to specimen type are shown in Figure 1C. Nineteen pathologists (46%) did not report sTILs in DCIS, fourteen (34%) pathologists sometimes mentioned sTILs in pure DCIS, whereas six (15%) pathologists always reported TILs in DCIS. Twenty-one pathologists (64%) assessed sTILs as a percentage of the stromal surface area, as described by the ‘International Immuno-Oncology Biomarker Working Group on Breast Cancer’ [16]. Ten pathologists (30%) provided a semi-quantitative score based on their own personal interpretation of the degree of stromal inflammation, and two pathologists (6%) only added a comment when the stromal inflammatory infiltrate was marked. When pathologists mentioned sTILs as a percentage, twenty-three participants (82%) did not use a cut-off, whereas five (18%) did use a threshold to indicate whether a particular case has ‘low TILs’, ‘intermediate TILs’ or ‘high TILs’. Each of these five participants used different thresholds, ranging from 5% to 50%.

Perception of sTILs assessment and its consequences

All participants were asked to estimate the difficulty of sTILs assessment on a scale from 0 to 10, which was most often reported to be moderate (Figure 2A). The need for standardization of sTILs assessment in daily routine practice was questioned in a similar way and was estimated to be rather high (Figure 2B).

Figure 2.

Radar diagrams illustrating the perceived difficulty of sTILs assessment (a) and the perceived importance of standardization of sTILs assessment in daily routine practice (b), as reported by 41 pathologists.

Thirty-five participants (85%) reported to regularly attend multidisciplinary meetings to discuss the clinical management of breast cancer patients. Twenty-four participants (59%) indicated that clinicians actively ask for sTILs assessment during these meetings, either on a regular basis or occasionally. Fifteen pathologists (37%) reported that clinicians never ask for sTILs during these multidisciplinary meetings, and three participants had no opinion (7%). According to fourteen participants (34%), sTILs scores never influenced the NAC treatment scheme for TNBC patients, whereas two additional participants (5%) indicated that this was not yet the case, but very likely to happen in the near future. Seven (17%) and fourteen (34%) participants responded that sTILs influenced the NAC treatment scheme in TNBC on a regular basis, or occasionally, respectively.

Histopathological characteristics

The TNBC dataset contained two biopsies (5%) of pleomorphic invasive lobular carcinoma, and 39 cases (95%) of invasive ductal carcinoma of no special type (NST). The mean age at diagnosis was 55 years (range 31-83). The mean interval between the biopsy and the surgical resection was 5.8 months (range 2.5 – 10.3 months). This interval did not significantly correlate with pCR (p=0.262). Ten TNBC (24%) were of grade 2, and thirty-one (76%) were grade 3. Three TNBC (7%) presented with lympho-vascular invasion in the biopsy, and seven TNBC (17%) contained DCIS. The RCB classes in this dataset were as follows: sixteen cases of RCB-0 (39%), five RCB-I (12%), thirteen RCB-II (32%) and seven RCB-III (17%). The sTILs dataset contained three missing values, represented by two cases which were not assessed by two pathologists because they were considered as extensive DCIS without clear invasion. These cases were not excluded from the analysis. Figure 3 contains a histogram and corresponding stem-and-leaf plot that illustrate the non-normal distribution of the median sTILs score (Px) for each biopsy included in this study (Shapiro-Wilk test: p<0.001). Median Px sTILs were not associated with grade (p=0.346), the presence of lympho-vascular invasion (p=0.629), the presence of an in situ component in the biopsy (p=0.176), or age at diagnosis (p=0.775).

Figure 3.

Histogram (a) and stem-and-leaf plot (b) illustrating the non-normal distribution of the median sTILs scores (Px) in this series of 41 TNBC biopsies.

Quantification of interobserver variability

Supplementary Table 2 contains the ICC values for each pathologist duo. The ICCs range from −0.376 to 0.947, with a mean value of 0.659, indicating an overall substantial interobserver variability [29]. Based on the mean of each pathologist’s sTILs scores and Px, as well as the difference between each pathologist’s sTILs scores with Px, Bland-Altman plots were constructed to visualize the degree of discordance (Supplementary Figure 1; Figure 4). Overall, ‘low’ sTILs cases show less variability than cases with ‘intermediate’ or ‘high’ sTILs. TNBC with higher sTILs levels are generally characterized by a wider range among the different sTILs ratings by the participants. However, the observed interobserver variability was not related to any of the histopathological characteristics. For instance, the range between the 25th and 75th percentile of Px was not associated with the presence of a DCIS component (p=0.543) or tumor grade (p=0.394). The interobserver variability was not associated with any of the laboratory settings or sTILs reporting habits (p>0.05).
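The comparison of each pathologist with the median rater Px can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's SPSS workflow: it computes the conventional Bland-Altman quantities (mean difference and 95% limits of agreement, taken as the mean difference plus or minus 1.96 standard deviations), which may differ from the lines drawn in the published plots. The function names and all scores are hypothetical.

```python
import statistics


def median_rater(scores):
    """Per-case median across all raters (the 'Px' reference in the text)."""
    return [statistics.median(row) for row in scores]


def bland_altman(rater, reference):
    """Return per-case means, differences, bias and 95% limits of agreement."""
    means = [(r + p) / 2 for r, p in zip(rater, reference)]
    diffs = [r - p for r, p in zip(rater, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)          # sample standard deviation
    return {
        "means": means,                   # x-axis of the plot
        "diffs": diffs,                   # y-axis of the plot
        "bias": bias,                     # mean difference
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
    }


# Hypothetical scores: rows are cases, columns are three raters.
scores = [[10, 12, 20], [20, 18, 40], [30, 33, 35]]
px = median_rater(scores)                     # median per case
first_rater = [row[0] for row in scores]
summary = bland_altman(first_rater, px)
```

A negative `bias` indicates a rater who tends to score below Px, and a positive `bias` one who tends to score above it.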

Figure 4.

Example of three Bland-Altman plots, showing a substantially lower rating of P8 when compared with Px (a), near-perfect agreement between P9 and Px (b), and a substantially higher rating of P32 when compared with Px (c). Other Bland-Altman plots are shown in Supplementary Figure 1. The full red line is the mean difference, and the dashed and dotted green lines represent the upper and lower limits of the 95% confidence interval of the mean.

Associations between sTILs and therapeutic response

Table 1 contains the descriptive values for the sTILs scores for each individual pathologist and the median Px. We observed a statistically significant association between high sTILs scores and the presence of a pCR for 36 out of forty pathologists (90%). The sTILs scores of one pathologist (2%) were inversely associated with pCR, i.e. high sTILs scores were associated with lack of a pCR. Similar analyses were performed for associations with the RCB class, wherein ‘absent pCR’ was represented by RCB-I, RCB-II and RCB-III. Here, a post hoc Bonferroni correction for multiple testing was applied, i.e. the level of significance was set at 0.0083. sTILs were associated with RCB class in only eight out of forty (20%) pathologists. Box-and-whisker plots (Supplementary Figure 2) show that TNBC with RCB-II and RCB-III usually have sTILs levels that are intermediate to those of RCB-0 and RCB-I, with the highest sTILs levels observed in RCB-0 and the lowest observed in RCB-I. This was also observed for the median Px sTILs (Figure 5).
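The per-pathologist comparison of continuous sTILs scores between pCR and non-pCR cases relies on the Mann-Whitney U test. The sketch below illustrates that test with a normal approximation that includes a tie correction but no continuity correction; this choice is an assumption and may not reproduce SPSS output exactly. The function name and the scores are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U for sample x, approximate two-sided p-value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    n1, n2 = len(x), len(y)

    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j and share their average rank.
        avg_rank = (i + 1 + j) / 2
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for m in range(i, j) if pooled[m][1] == 0)
        i = j

    u = rank_sum_x - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:                       # every value tied: no evidence either way
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical sTILs percentages from one rater, split by response.
stils_pcr = [40, 60, 80, 30, 70]
stils_no_pcr = [5, 10, 20, 10, 30]
u_stat, p_value = mann_whitney_u(stils_pcr, stils_no_pcr)
```

With samples as small as these, an exact test would normally be preferred over the normal approximation; the sketch only shows the mechanics of the rank comparison.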

Table 1.

Descriptive statistics and associations between TILs and either pCR or RCB class per pathologist.

Rater	TILs versus pCR (p-value)	TILs versus RCB class (p-value)	Mean TILs	Min TILs	Max TILs	Range TILs	Pe25 TILs	Pe50 TILs	Pe75 TILs
P01	0.020*	0.073	23	0	80	80	5	20	35
P02	0.003*	0.031	22	0	77	77	5	10	40
P03	0.001*	0.004*	24	2	90	88	5.5	10	42.5
P04	0.001*	0.004*	42	5	100	95	20	40	70
P05	0.101	0.394	36	2	90	88	7.5	30	60
P06	0.002*	0.011	35	5	80	75	10	30	55
P07	0.003*	0.019	43	5	90	85	17.5	40	70
P08	0.003*	0.006*	17	0	80	80	2	10	20
P09	0.001*	0.007*	29	2	80	78	15	20	40
P10	0.004*	0.021	20	1	80	79	5	10	30
P11	0.014*	0.083	31	1	80	79	5	20	30
P12	0.004*	0.021	29	5	90	85	8	15	50
P13	0.007*	0.020	31	1	90	89	10	30	40
P14	0.002*	0.011	17	0	90	90	0	5	40
P15	0.011*	0.045	30	3	80	77	8	18	55
P16	0.003*	0.021	44	5	100	95	20	40	75
P17	0.005*	0.034	38	1	90	89	10	30	65
P18	0.120	0.333	25	4	80	76	15	25	62.5
P19	0.019*	0.081	27	0	90	90	50	10	45
P20	0.009*	0.041	25	1	80	79	5	20	45
P21	0.024*	0.054	26	2	90	88	5	10	30
P22	0.002*	0.018	26	2	90	88	12.5	25	55
P23	0.013*	0.016	20	1	70	69	5	17.5	40
P24	0.001*	0.009	31	5	90	85	10	20	70
P25	0.006*	0.030	31	2	90	88	10	40	70
P26	0.016*	0.036	25	0	80	80	5	10	35
P27	0.006*	0.044	29	5	80	75	5	30	70
P28	0.014*	0.092	38	1	95	94	1	10	80
P29	0.004*	0.018	29	2	95	93	10	30	55
P30	0.001*	0.008*	23	2	90	88	7.5	15	30
P31	0.013$	0.055	30	1	90	89	20	30	75
P32	0.002*	0.006*	27	0	90	90	10	30	60
P33	0.024*	0.049	25	0	80	80	5	20	40
P34	0.002*	0.007*	30	1	90	89	7.5	20	60
P35	0.159	0.561	28	1	90	89	5	10	35
P36	0.005*	0.009	20	5	75	70	10	30	50
P37	0.003*	0.009	32	3	100	97	10	40	75
P38	0.014*	0.075	35	10	80	70	20	30	50
P39	0.002*	0.007*	38	0	100	100	5	10	35
P40	0.001*	0.009	34	1	90	89	7.5	20	70
Px	0.004*	0.020	29	5	80	75	9.5	20	50

* Statistically significant association, with the significance level set at 0.05 (for TILs versus pCR, Mann-Whitney U test) or 0.0083 (for TILs versus RCB class, Kruskal-Wallis test with post hoc Bonferroni correction).

$ Inverse statistically significant association, i.e. pCR was associated with lower TILs levels.

Max: maximum TILs value; Min: minimum TILs value; Pe25: 25th percentile; Pe50: 50th percentile or median; Pe75: 75th percentile; pCR: pathological complete response; Range: difference between maximum and minimum TILs; TILs: tumor-infiltrating lymphocytes

Figure 5.

Box-and-whisker plots illustrating the association between median sTILs (Px) scores and the absence or presence of pCR (a), and the association between median sTILs (Px) scores and the RCB class (b). Circles represent outliers; asterisks represent extremes. The bold line within each box represents the median value (50th percentile), the upper and lower limits of the boxes represent the 75th and 25th percentiles, respectively.

Post hoc dichotomization using different sTILs thresholds

To identify a cut-off that could be used to select patients who are more likely to achieve a pCR in routine clinical practice, seven thresholds were explored. All sTILs scores of each pathologist were dichotomized as low sTILs versus high sTILs. The 5% cut-off resulted in a significant association between sTILs classification and pCR for only 9 pathologists (23%), whereas the 10% cut-off resulted in a similar association for 19 pathologists (48%; Table 2 and Supplementary Table 3). The 20%, 30% and 40% thresholds resulted in a significant association between sTILs and pCR for 30, 31 and 28 out of 40 pathologists, respectively (75%, 78% and 70%). The 50% and 60% cut-off resulted in a similar association for 25 and 22 out of 40 pathologists, respectively (63% and 55%). Overall, pathologists who generally limit their sTILs score in a narrow range in the lower half of the spectrum do not benefit from a high threshold such as the 40% or 50% cut-off, as too many pCR cases are considered to have low TILs. This was the case for pathologists P1, P8, P21, P26, P30, P31 and P33. On the other hand, pathologists who tend to give high sTILs estimates show a correlation with pCR at a higher sTILs threshold, such as pathologists P13, P15, P17, P32 and P36 (Supplementary Table 3), because a low threshold results in few TNBC being designated as having low TILs.
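The dichotomization step can be sketched as follows, using the definition given in the methods (low sTILs: at or below the threshold; high sTILs: above it). This is an illustration under stated assumptions rather than the study's SPSS analysis: it uses Pearson's chi-square on a 2x2 table without continuity correction, and the function name and data are hypothetical.

```python
import math


def chi_square_at_threshold(stils, pcr, threshold):
    """Dichotomize sTILs (low: <= threshold, high: > threshold) and test
    the 2x2 association with pCR using Pearson's chi-square (1 df).

    Returns (chi2, p), or None when a row or column of the table is empty.
    """
    # table[group][outcome]: group 0 = low, 1 = high; outcome 0 = no pCR, 1 = pCR
    table = [[0, 0], [0, 0]]
    for score, response in zip(stils, pcr):
        table[1 if score > threshold else 0][1 if response else 0] += 1

    (a, b), (c, d) = table
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        return None                      # test undefined for this split
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / (
        row_totals[0] * row_totals[1] * col_totals[0] * col_totals[1]
    )
    # Upper tail of the chi-square distribution with one degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical scores from one rater and the matching pCR status (1 = pCR).
stils = [5, 10, 30, 50, 60, 80]
pcr = [0, 0, 0, 1, 1, 1]
results = {t: chi_square_at_threshold(stils, pcr, t)
           for t in (5, 10, 20, 30, 40, 50, 60)}
```

A `None` result corresponds to the situation in which every score falls on one side of the threshold, so that no p-value can be calculated. With cell counts this small, the chi-square approximation is unreliable and serves only to show how the same scores can yield different associations at different cut-offs.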

Table 2.

p-values illustrating the association between sTILs and pCR per pathologist by applying seven different cut-offs to discern low sTILs from high sTILs.

	Applied threshold for low sTILs versus high sTILs
	5%	10%	20%	30%	40%	50%	60%
TILs (P1)	0.010*	0.028*	0.087	0.118	0.305	0.636	0.834
TILs (P2)	0.059	0.007*	0.001*	0.007*	0.002*	0.305	0.305
TILs (P3)	0.030*	0.007*	<0.001*	<0.001*	0.002*	0.305	0.744
TILs (P4)	0.418	0.141	0.033*	0.002*	<0.001*	<0.001*	0.001*
TILs (P5)	0.054	0.124	0.202	0.072	0.323	0.323	0.706
TILs (P6)	0.150	0.154	0.018*	0.001*	<0.001*	0.002*	0.020*
TILs (P7)	0.246	0.086	0.005*	0.015*	0.014*	0.017*	0.020*
TILs (P8)	0.018*	0.001*	0.007*	0.636	0.834	0.834	0.834
TILs (P9)	0.150	0.086	0.003*	0.001*	0.007*	0.133	0.308
TILs (P10)	0.054	0.007*	0.006*	0.017*	0.123	0.008*	0.023*
TILs (P11)	0.096	0.015*	0.003*	0.010*	0.002*	0.020*	0.020*
TILs (P12)	0.922	0.033*	0.001*	0.001*	<0.001*	0.002*	0.025*
TILs (P13)	0.150	0.050	0.051	0.006*	0.007*	0.054	0.120
TILs (P14)	0.007*	0.002*	<0.001*	0.001*	0.025*	0.025*	0.070
TILs (P15)	0.242	0.323	0.055	0.006*	<0.001*	<0.001*	0.005*
TILs (P16)	0.246	0.052	0.018*	0.005*	0.010*	0.029*	0.044*
TILs (P17)	0.224	0.010*	0.121	0.021*	0.005*	0.036*	0.002*
TILs (P18)	0.305	0.819	0.154	0.192	0.058	0.096	0.156
TILs (P19)	0.096	0.021*	0.029*	0.006*	0.021*	0.133	0.305
TILs (P20)	0.096	0.028*	0.002*	0.001*	0.002*	0.007*	0.636
TILs (P21)	0.121	0.029*	0.020*	0.054	0.281	0.133	0.025*
TILs (P22)	0.242	0.156	0.005*	0.010*	<0.001*	0.002*	0.020*
TILs (P23)	0.013*	0.041*	0.140	0.060	0.102	0.278	0.191
TILs (P24)	0.242	0.003*	0.001*	0.001*	0.002*	<0.001*	<0.001*
TILs (P25)	0.545	0.098	0.028*	0.015*	0.003*	<0.001*	0.001*
TILs (P26)	0.192	0.055	0.006*	0.021*	0.129	0.305	0.636
TILs (P27)	0.098	0.058	0.028*	0.003*	0.001*	0.002*	0.007*
TILs (P28)	0.028*	0.003*	0.005*	0.005*	0.006*	0.006*	0.007*
TILs (P29)	0.692	0.236	0.015*	0.001*	<0.001*	0.002*	0.054
TILs (P30)	0.030*	0.033*	0.001*	0.129	0.054	0.003*	0.008*
TILs (P31)	0.045*	0.250	0.002*	0.015*	0.051	0.033*	0.154
TILs (P32)	0.246	0.154	0.058	<0.001*	0.001*	0.001*	0.020*
TILs (P33)	0.098	0.028*	0.010*	0.007*	0.250	0.478	0.819
TILs (P34)	0.030*	0.058	0.007*	0.007*	0.002*	0.002*	0.002*
TILs (P35)	0.051	0.055	0.154	0.942	0.922	0.922	0.757
TILs (P36)	0.056	0.350	0.218	0.001*	0.002*	0.025*	0.025*
TILs (P37)	0.224	0.059	0.033*	0.072	0.021*	0.001*	<0.001*
TILs (P38)	$	0.141	0.028*	0.021*	0.036*	0.054	0.133
TILs (P39)	0.154	0.003*	<0.001*	0.002*	0.007*	0.007*	0.007*
TILs (P40)	0.156	0.009*	0.007*	0.002*	<0.001*	<0.001*	0.001*

* Statistically significant result as determined by Chi Square test.

$ No p-value was calculated as none of the sTILs scores was <5%

pCR: pathological complete response; sTILs: stromal tumor-infiltrating lymphocytes.

DISCUSSION

In the present study, we demonstrate substantial interobserver variability in sTILs assessment, although the ICC values strongly vary among the different participants. As the participating pathologists work in different countries, employ different laboratory settings (academic versus non-academic, digital versus conventional microscopy, etc.) and differ in their reporting habits (quantifying therapeutic response, routine sTILs reporting or not, etc.), several factors might have influenced the observed degree of discordance. The variation in practice of TILs reporting from the survey is an interesting finding and calls for more standardization, as was acknowledged by the participants. Unfortunately, the heterogeneous characteristics of the participants do not allow extensive statistical analysis due to lack of power. Similarly, it was impossible to investigate a potential ‘training center effect’. Additionally, various pitfalls in the sTILs assessment may also have contributed to increased discordance, including crush artifacts, section artifacts due to blunt microtome knives, overstained specimens, extensive tumor necrosis, solid TNBC architecture mimicking pure DCIS, limited intra- and peri-tumoral stroma, and extensive neutrophilic infiltration (Figure 6), as previously described [17]. Although we aimed to obtain a ‘real-life’ biopsy dataset, the evaluation of a single digitalized archived H&E slide does not correspond to the ‘real-life’ setting. In routine practice, deeper levels are available to cope with technical artifacts, and immunohistochemical stains for myoepithelial markers are available to distinguish in situ from invasive components. Most participants did not use digital pathology on a daily basis, which might also have influenced the sTILs scores.

Figure 6.

Photomicrographs of TNBC biopsies, illustrating several potential pitfalls that can hamper sTILs assessment, such as DCIS-like TNBC with solid architecture (a-c), an overstained biopsy specimen with folds (d), section artifacts caused by a blunt microtome knife (e), extensive necrosis (f), extensive neutrophilic infiltration in necrotic areas (g), ample crush artifacts (h) and limited amounts of peri- and intra-tumoral stroma (i).

Interestingly, the individual sTILs scores were statistically significantly associated with therapeutic response for 90% of all participants, despite the presence of substantial interobserver variability and despite the limited size of the evaluated TNBC cohort. This observation indicates that high sTILs are a robust predictive marker for achieving a pCR after NAC in TNBC, at least at the population level. The 2019 Saint Gallen International Consensus Panel recommended that sTILs be routinely assessed in TNBC because of their prognostic value [30], although this has not been widely adopted in international guidelines. The 2021 Saint Gallen International Consensus Panel, however, voted against the routine use of sTILs in early TNBC, as evidence on sTILs for guidance of NAC regimens in TNBC patients is lacking [31,32]. This contrasts with the perception of twenty-one participants in the present study, who inadvertently assumed that sTILs in the pre-NAC biopsy influenced the NAC treatment at least occasionally. The above variation in sTILs assessment to identify patients likely to achieve a pCR might impact clinical decision-making if sTILs were one day used to guide the NAC regimen for individual patients. At present, sTILs are reported as a continuous variable, but any future clinical decision-making will require a particular threshold. Although there is insufficient evidence to de-escalate NAC at present [31,32], future studies should determine this ‘ideal’ sTILs threshold, i.e. what level of sTILs in the pre-NAC biopsy is sufficient to de-escalate the NAC regimen without compromising the chance of achieving a pCR for a significant number of patients? The introduction of a particular threshold to guide clinical decision-making will have to be accompanied by education of pathologists to render sTILs assessment more uniform. Computational assessment using machine learning models might help to objectify sTILs levels in TNBC in the future [33].
In the present study, we explored seven different post hoc thresholds for sTILs assessment, which affect the number of TNBC that are designated as ‘high sTILs’ and ‘low sTILs’, as well as the association with pCR. The total number of statistically significant associations between pCR and individual sTILs assessments did not substantially differ between the 20%, 30% and 40% thresholds: 30, 31 and 28 out of 40 pathologists, respectively. However, the association depended on the ‘stringency’ of the sTILs assessment. For instance, pathologists who gave low sTILs estimates did not benefit from the thresholds above 40%, which assigned too many TNBC cases to the ‘low sTILs’ category. Pathologists who gave high sTILs estimates benefited from the higher sTILs thresholds, as the thresholds below 30% assigned too many non-pCR TNBC to the ‘high sTILs’ category (Table 2; Supplementary Table 3). Of note, the participants were not aware of these thresholds at the time of the assessment, and the use of thresholds specified before the assessment would therefore likely provide different results. Future studies should investigate, with such prespecified thresholds, which sTILs threshold is characterized by acceptable interobserver variability among a large community of pathologists. Simultaneously, the selected threshold should have an acceptable ‘degree of error’, i.e. how many ‘false-negative’ high sTILs TNBC and ‘false-positive’ low sTILs TNBC patients are tolerated? The former will not be treated with a de-escalated NAC regimen and are exposed to potential side effects, whereas the latter are inadvertently undertreated by a de-escalated NAC regimen and have smaller chances of achieving a (near) pCR. Additional research is required to explore this difficult equilibrium.
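The effect of a post hoc threshold on the association with pCR can be illustrated with a short sketch. It dichotomizes one observer's sTILs scores, builds the 2x2 table against pCR, and computes a chi-square p-value with one degree of freedom. Several points are assumptions rather than details taken from the study: scores at or above the threshold are counted as ‘high sTILs’, no continuity correction is applied, and the function name and the ten cases are invented for illustration.

```python
from math import erfc, sqrt


def pcr_association(stils, pcr, threshold):
    """Dichotomize sTILs at `threshold` and test the association with pCR.

    Returns the 2x2 counts (high/pCR, high/no pCR, low/pCR, low/no pCR) and
    a chi-square p-value (1 degree of freedom, no continuity correction).
    The p-value is None when a row or column of the table is empty.
    Assumption: 'high sTILs' means score >= threshold.
    """
    pairs = list(zip(stils, pcr))
    a = sum(1 for s, r in pairs if s >= threshold and r)      # high sTILs, pCR
    b = sum(1 for s, r in pairs if s >= threshold and not r)  # high sTILs, no pCR
    c = sum(1 for s, r in pairs if s < threshold and r)       # low sTILs, pCR
    d = sum(1 for s, r in pairs if s < threshold and not r)   # low sTILs, no pCR

    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return (a, b, c, d), None

    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / margins
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return (a, b, c, d), p_value


# Invented scores (percent sTILs) and outcomes for ten cases; not study data.
stils = [5, 10, 10, 20, 30, 40, 60, 70, 80, 90]
pcr = [False, False, False, False, True, False, True, True, True, True]

for threshold in (5, 20, 30, 40):
    counts, p_value = pcr_association(stils, pcr, threshold)
    print(threshold, counts, p_value)
```

With a threshold so low that no case falls in the ‘low sTILs’ group, the table has an empty row and the sketch returns no p-value. With counts as small as these, an exact test would usually be preferred over the chi-square approximation.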
The interobserver variability observed in sTILs assessment in TNBC shows striking similarities with Ki-67 assessment in early hormone receptor-positive, HER2-negative breast cancer, which shows substantial inter-laboratory and interobserver variability as well [34,35]. Similar to sTILs, Ki-67 was associated with pCR both as a continuous variable and as a dichotomized variable at several thresholds in the neoadjuvant GeparTrio trial [36]. Pathologists and oncologists will have to face similar challenges in sTILs assessment, but the experience with the issues in Ki-67 assessment might provide useful information for the implementation of sTILs as a quantitative biomarker in TNBC.
Although we observed a strong association between high sTILs and high pCR rates in TNBC for most participants, this was not the case when the individual sTILs scores were correlated with the RCB class: a statistically significant association was observed for only 20% of the participants. Heterogeneously distributed sTILs are unlikely to be responsible for this phenomenon, as Cha et al. have shown that sTILs in core needle biopsies strongly correlated with sTILs in subsequent resections [37]. Additionally, Althobiti et al. reported no significant difference between sTILs across different tumor blocks of the same case [38]. In the present cohort, the reduced association with RCB class was mainly due to the RCB-II and RCB-III cases, which showed sTILs levels intermediate to those observed in RCB-0 and RCB-I. This peculiar observation may suggest that pCR is multifactorial. There might be a role for failing immune responses, as several of these RCB-II/III cases contained a number of sTILs almost similar to that of some TNBC with post-NAC pCR. However, the limited size of the present TNBC cohort precludes any strong conclusion regarding sTILs levels in RCB-I cases, due to a lack of power. Our observation requires validation in larger, independent patient cohorts to exclude findings merely due to chance. Although assessment of sTILs in residual disease was beyond the scope of the present study, sTILs in residual post-NAC TNBC could add further prognostic information to RCB class, as high residual sTILs levels are associated with improved recurrence-free and overall survival [39].
Future studies should explore whether additional analyses can fine-tune the prognostic and predictive value of sTILs. Immunohistochemical subtyping of sTILs may elucidate which immune cell subtypes stimulate an anti-tumor response during NAC. For instance, high post-NAC levels of CD4-positive lymphocytes in RCB-II and RCB-III TNBC seem to be associated with longer distant recurrence-free survival, and their prognostic value is independent of the RCB class [40]. High pre-NAC levels of CD4-positive lymphocytes are also associated with higher rates of pCR in a breast cancer cohort containing various molecular subtypes [41]. Inflammatory breast cancer patients with high numbers of intratumor CD20-positive and CD8-positive lymphocytes respond better to treatment (Badr et al., submitted manuscript). New technologies such as multiplex immunofluorescent profiling of the immune microenvironment and whole transcriptome RNA sequencing may also aid the future fine-tuning of sTILs as a predictive marker for pCR. Immunomodulatory mRNA signatures and the PAM50 basal-like profile are associated with significantly higher pCR rates in TNBC [42]. Immune-associated mRNA signatures were associated with pCR after NAC in the GeparNuevo trial, although they were of limited use to predict the response to additional immune-checkpoint blockade by durvalumab [43].
Patients with metastatic or locally advanced TNBC are eligible for treatment with immune checkpoint inhibitors such as atezolizumab, on the condition that the PD-L1 expression on immune cells occupies ≥1% of the tumor area [44]. Atezolizumab represents the first targeted therapy for TNBC patients [45]. Addition of neoadjuvant pembrolizumab to the NAC regimen for stage II/III TNBC patients significantly increased the chance of obtaining a pCR in the phase 3 KEYNOTE-522 trial, regardless of PD-L1 status [46]. Other immune checkpoint inhibitors such as durvalumab are currently being evaluated in a clinical trial setting. Despite the poor reproducibility of PD-L1 assessment in a prospective multi-institutional assessment [47], the interobserver variation seems more limited within a single institution [48]. PD-L1 expression in sTILs might be useful to identify patients at high risk for poor therapeutic response. Consequently, these patients may be eligible for additional immune checkpoint blockade in the neoadjuvant setting. Foldi et al. recently reported promising results in a phase I/II trial, wherein PD-L1-positive TNBC were associated with higher pCR rates than PD-L1-negative TNBC, independent of the pre-NAC sTILs levels [49]. The GeparNuevo trial suggested similar results, as the addition of durvalumab before the start of anthracycline/taxane-based NAC seemed to increase pCR rates in TNBC patients [50]. The International Immuno-Oncology Biomarker Working Group developed a risk management framework for the implementation of combined PD-L1 and TILs assessment in breast cancer [44], as several studies reported a strong correlation between PD-L1-positive immune cells and high sTILs levels [49,51-54]. Biologically, TNBCs require infiltration by sTILs to be designated as PD-L1 positive.
In conclusion, sTILs are a robust marker for pCR at the group level, despite substantial interobserver variability among pathologists. However, if sTILs are to be used to guide de-escalation of the NAC regimen in individual patients, interobserver discordance might significantly impact the chance of obtaining a pCR. Future studies should therefore explore the impact of training, as well as the ‘ideal’ sTILs threshold for dichotomization, as clinical decision-making will demand a particular cut-off. Although sTILs can be considered a prognostic marker, there is currently insufficient evidence to modify NAC regimens based on pre-NAC sTILs levels. Intriguingly, patients with RCB-II and RCB-III in this cohort often had intermediate sTILs, which may suggest failing immune responses. Hence, future research should focus on fine-tuning patient selection for sTILs-based de-escalation of NAC regimens.

53 in total

1. Reproducibility of residual cancer burden for prognostic assessment of breast cancer after neoadjuvant chemotherapy.

Authors: Florentia Peintinger; Bruno Sinn; Christos Hatzis; Constance Albarracin; Erinn Downs-Kelly; Jerzy Morkowski; Rebekah Gould; W Fraser Symmans
Journal: Mod Pathol Date: 2015-05-01 Impact factor: 7.842

2. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research.

Authors: Terry K Koo; Mae Y Li
Journal: J Chiropr Med Date: 2016-03-31

3. Estimating the benefits of therapy for early-stage breast cancer: the St. Gallen International Consensus Guidelines for the primary therapy of early breast cancer 2019.

Authors: H J Burstein; G Curigliano; S Loibl; P Dubsky; M Gnant; P Poortmans; M Colleoni; C Denkert; M Piccart-Gebhart; M Regan; H-J Senn; E P Winer; B Thurlimann
Journal: Ann Oncol Date: 2019-10-01 Impact factor: 32.976

4. The requirements of a specialist Breast Centre.

Authors: A R M Wilson; L Marotti; S Bianchi; L Biganzoli; S Claassen; T Decker; A Frigerio; A Goldhirsch; E G Gustafsson; R E Mansel; R Orecchia; A Ponti; P Poortmans; P Regitnig; M Rosselli Del Turco; E J Th Rutgers; C van Asperen; C A Wells; Y Wengström; L Cataliotti
Journal: Eur J Cancer Date: 2013-08-19 Impact factor: 9.162

5. Measurement of residual breast cancer burden to predict survival after neoadjuvant chemotherapy.

Authors: W Fraser Symmans; Florentia Peintinger; Christos Hatzis; Radhika Rajan; Henry Kuerer; Vicente Valero; Lina Assad; Anna Poniecka; Bryan Hennessy; Marjorie Green; Aman U Buzdar; S Eva Singletary; Gabriel N Hortobagyi; Lajos Pusztai
Journal: J Clin Oncol Date: 2007-09-04 Impact factor: 44.544

6. Histological subtypes in triple negative breast cancer are associated with specific information on survival.

Authors: Maschenka C A Balkenhol; Willem Vreuls; Carla A P Wauters; Suzanne J J Mol; Jeroen A W M van der Laak; Peter Bult
Journal: Ann Diagn Pathol Date: 2020-03-03 Impact factor: 2.090

7. Ki67 levels as predictive and prognostic parameters in pretherapeutic breast cancer core biopsies: a translational investigation in the neoadjuvant GeparTrio trial.

Authors: C Denkert; S Loibl; B M Müller; H Eidtmann; W D Schmitt; W Eiermann; B Gerber; H Tesch; J Hilfrich; J Huober; T Fehm; J Barinoff; C Jackisch; J Prinzler; T Rüdiger; E Erbstösser; J U Blohmer; J Budczies; K M Mehta; G von Minckwitz
Journal: Ann Oncol Date: 2013-08-22 Impact factor: 32.976

8. A randomised phase II study investigating durvalumab in addition to an anthracycline taxane-based neoadjuvant therapy in early triple-negative breast cancer: clinical results and biomarker analysis of GeparNuevo study.

Authors: S Loibl; M Untch; N Burchardi; J Huober; B V Sinn; J-U Blohmer; E-M Grischke; J Furlanetto; H Tesch; C Hanusch; K Engels; M Rezai; C Jackisch; W D Schmitt; G von Minckwitz; J Thomalla; S Kümmel; B Rautenberg; P A Fasching; K Weber; K Rhiem; C Denkert; A Schneeweiss
Journal: Ann Oncol Date: 2019-08-01 Impact factor: 32.976

9. An international Ki67 reproducibility study.

Authors: Mei-Yin C Polley; Samuel C Y Leung; Lisa M McShane; Dongxia Gao; Judith C Hugh; Mauro G Mastropasqua; Giuseppe Viale; Lila A Zabaglo; Frédérique Penault-Llorca; John M S Bartlett; Allen M Gown; W Fraser Symmans; Tammy Piper; Erika Mehl; Rebecca A Enos; Daniel F Hayes; Mitch Dowsett; Torsten O Nielsen
Journal: J Natl Cancer Inst Date: 2013-11-07 Impact factor: 13.506

10. Neoadjuvant durvalumab plus weekly nab-paclitaxel and dose-dense doxorubicin/cyclophosphamide in triple-negative breast cancer.

Authors: Julia Foldi; Andrea Silber; Emily Reisenbichler; Kamaljeet Singh; Neal Fischbach; Justin Persico; Kerin Adelson; Anamika Katoch; Nina Horowitz; Donald Lannin; Anees Chagpar; Tristen Park; Michal Marczyk; Courtney Frederick; Trisha Burrello; Eiman Ibrahim; Tao Qing; Yalai Bai; Kim Blenman; David L Rimm; Lajos Pusztai
Journal: NPJ Breast Cancer Date: 2021-02-08
