Meta-analysis of agreement between MRI and pathologic breast tumour size after neoadjuvant chemotherapy.

M L Marinovich, P Macaskill, L Irwig, F Sardanelli, G von Minckwitz, E Mamounas, M Brennan, S Ciatto, N Houssami.

Abstract

BACKGROUND: Magnetic resonance imaging (MRI) has been proposed to guide breast cancer surgery by measuring residual tumour after neoadjuvant chemotherapy. This study-level meta-analysis examines MRI's agreement with pathology, compares MRI with alternative tests and investigates consistency between different measures of agreement.
METHODS: A systematic literature search was undertaken. Mean differences (MDs) in tumour size between MRI or comparator tests and pathology were pooled by assuming a fixed effect. Limits of agreement (LOA) were estimated from a pooled variance by assuming equal variance of the differences across studies.
RESULTS: Data were extracted from 19 studies (958 patients). The pooled MD between MRI and pathology from six studies was 0.1 cm (95% LOA: -4.2 to 4.4 cm). Similar overestimation for MRI (MD: 0.1 cm) and ultrasound (US) (MD: 0.1 cm) was observed, with comparable LOA (two studies). Overestimation was lower for MRI (MD: 0.1 cm) than mammography (MD: 0.4 cm; two studies). Overestimation by MRI (MD: 0.1 cm) was smaller than underestimation by clinical examination (MD: -0.3 cm). The LOA for mammography and clinical examination were wider than that for MRI. Percentage agreement between MRI and pathology was greater than that of comparator tests (six studies). The range of Pearson's/Spearman's correlations was wide (0.21-0.92; 16 studies). Inconsistencies between MDs, percentage agreement and correlations were common.
CONCLUSION: Magnetic resonance imaging appears to slightly overestimate pathologic size, but measurement errors may be large enough to be clinically significant. Comparable performance by US was observed, but agreement with pathology was poorer for mammography and clinical examination. Percentage agreement can provide supplementary information to MDs and LOA, but Pearson's/Spearman's correlation does not provide evidence of agreement and should be avoided. Further comparisons of MRI and other tests using the recommended methods are warranted.

Year: 2013 PMID: 23963140 PMCID: PMC3776985 DOI: 10.1038/bjc.2013.473

Source DB: PubMed Journal: Br J Cancer ISSN: 0007-0920 Impact factor: 7.640

Magnetic resonance imaging (MRI) has been proposed to have a role in guiding breast cancer surgical extent by measuring the size of the residual tumour after neoadjuvant chemotherapy (NAC), and has been shown to have good sensitivity for detecting residual disease in that setting (Marinovich ). Given that current guidelines for response evaluation recommend assessment of the largest tumour diameter (Eisenhauer ), estimation of the largest diameter by MRI may guide decisions about whether subsequent mastectomy or breast conserving surgery (BCS) should be attempted, as well as assist in the planning of resection volume to achieve clear surgical margins in BCS. Underestimation of tumour size may therefore lead to involved surgical margins and repeat surgery; overestimation may lead to overly radical surgery (including mastectomy when BCS may have been possible) and poorer cosmetic and psychosocial outcomes (Irwig and Bennetts, 1997). The assessment of tumour size before surgery is subject to a number of potential errors (Padhani and Husband, 2000). Reactive inflammation, fibrosis or necrosis in response to NAC may present as areas of enhancement on MRI images, which may be difficult to distinguish from residual tumour (Yeh ; Belli ). Regression of the tumour as multiple, scattered tumour deposits may also make assessment of the longest diameter problematic, with different approaches to measurement that either include (Rosen ; Wright ) or exclude (Cheung ; Bollet ) intervening normal tissue. Ductal carcinoma in situ (DCIS) may not be well visualised (Berg ) or, alternatively, may be indistinguishable from invasive cancer (Partridge ). Imaging artefacts may also introduce errors in tumour size estimation. For example, the placement of markers in or around the tumour may produce areas of increased signal intensity, which are difficult to distinguish from residual foci, or areas of low signal, which may contribute to size underestimation. 
Underestimation may also occur owing to partial volume effects (Lobbes ). Furthermore, the inherently pliable nature of breast tissue means that tumour dimensions may vary, depending on patient positioning (Tucker, 2012). In this systematic review and study-level meta-analysis, we investigate agreement in the measurement of residual tumour size by MRI and pathology (the reference standard) after NAC for breast cancer, as assessed by mean differences (MDs) and 95% limits of agreement (LOA) (Bland and Altman, 1986). We also compare the agreement between pathology and alternative tests which have been used to measure residual tumour before surgery (ultrasound (US), clinical examination and mammography). The consistency of results from different methods to assess agreement is investigated, and recommendations are made about methods for future studies.

Materials and methods

Identification of studies

A systematic search of the biomedical literature up to February 2011 was undertaken to identify studies assessing the accuracy of MRI after NAC in measuring the size of residual tumour. MEDLINE and EMBASE were searched via EMBASE.com; PREMEDLINE, Database of Abstracts of Reviews of Effects, Health Technology Assessment (CLHTA) and the Cochrane databases were searched via Ovid. Search terms were selected to link MRI with breast cancer and response to NAC. Keywords and medical subject headings included ‘breast cancer', ‘nuclear magnetic resonance imaging', ‘MRI', ‘neoadjuvant' and ‘response'. The full search strategy has been reported previously (Marinovich , 2013). Reference lists were also searched and content experts consulted to identify additional studies.

Review of studies and eligibility criteria

All abstracts were screened for eligibility by one author (LM), and a sample of 10% was assessed independently by a second author (NH) to ensure consistent application of the eligibility criteria. Eligible studies were required to have enrolled a minimum of 15 patients with newly diagnosed breast cancer undergoing NAC, with MRI and at least one other test (US, mammography or clinical examination) undertaken after NAC to assess the size of residual tumour before surgery. Pathologically measured tumour size based on surgical excision was the reference standard, but studies were not excluded if alternative reference standards were used in a minority of patients. Potentially eligible citations were reviewed in full (LM or NH). The screening and inclusion process is summarised in Supplementary Information Resource 1 (PRISMA flowchart).

Data extraction

Data relating to tumour size assessment, study design, patient characteristics, tumours, treatment, technical details of MRI, comparator tests and the reference standard were extracted independently by two authors (LM, and either SC, MB or FS). Quality appraisal was undertaken using the Quality Assessment of Diagnostic Accuracy Studies checklist (version 1, modified for this clinical setting; Whiting , 2006). Disagreements were resolved by discussion and consensus, with arbitration by a third author (NH) when required.

Measures of agreement

Bland and Altman (1986) describe appropriate methods to assess agreement between two continuous measures and highlight the inadequacy of the Pearson's correlation coefficient when used for this purpose. Unlike methods such as intraclass correlation (ICC), the Pearson's correlation coefficient measures the degree to which there is a linear, but not necessarily 1 : 1 relationship. Hence, it is possible for a high Pearson's correlation to be observed when there is poor agreement between two measures (e.g., when tests systematically under- or overestimate pathologic size). Spearman's rank correlation is similarly problematic. A commonly reported alternative approach involves calculating the percentage of cases for which there is ‘agreement' between measures within a chosen ‘margin of error'. This approach also has limitations, as the chosen margin of error may be somewhat arbitrary, and tendencies for one measure to under- or overestimate the other within that margin may be obscured. The approach recommended by Bland and Altman (1986) comprises a scatterplot of the differences between the measures (the vertical axis) against their mean (horizontal axis). If the differences are normally distributed and are independent from the underlying size of the measurements, agreement may be quantified by the MD and associated 95% LOA. Hence, MDs and LOA were extracted from studies reporting these outcomes. When LOA were not presented, data were extracted from which the LOA could be derived (e.g., s.d. of the difference or root mean square error). Despite their limitations, percentage agreement within a margin of error (and associated percentages of under/overestimation) and correlation coefficients were also extracted to provide a descriptive summary of these measures.
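The contrast between correlation and agreement described above can be illustrated with a minimal sketch. The data are hypothetical (not drawn from any included study), the function names are illustrative, and the 95% LOA are taken as MD ± 1.96 s.d. of the differences, which assumes the differences are approximately normally distributed.

```python
import math
import statistics


def bland_altman(test_sizes, pathology_sizes):
    """Return (MD, lower 95% LOA, upper 95% LOA) for paired measurements."""
    diffs = [t - p for t, p in zip(test_sizes, pathology_sizes)]
    md = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample s.d. of the paired differences
    return md, md - 1.96 * sd, md + 1.96 * sd


def pearson_r(x, y):
    """Pearson's correlation coefficient, computed from first principles."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sizes (cm): the test always reads twice the pathologic size,
# so the relationship is perfectly linear but far from 1 : 1.
pathology = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [2.0 * p for p in pathology]

md, lower, upper = bland_altman(test, pathology)
r = pearson_r(test, pathology)
print(f"MD = {md:.1f} cm, 95% LOA = {lower:.1f} to {upper:.1f} cm, r = {r:.2f}")
```

In this contrived case the correlation is perfect, yet the test overestimates pathology by 3 cm on average, which is the situation in which a high Pearson's coefficient would be misleading.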

Statistical analysis

MDs between tumour size measurements by MRI or comparator tests and pathology were pooled by the inverse variance method by assuming a fixed effect using RevMan 5.2 (The Nordic Cochrane Centre (Copenhagen), The Cochrane Collaboration, 2012) (http://ims.cochrane.org/sites/ims.cochrane.org/files/uploads/documents/revman/RevMan_5.2_User_Guide.pdf). The Cochran Q statistic was used to assess whether statistically significant heterogeneity was present (significant at P<0.10), and the extent of heterogeneity was quantified by the I² statistic (Higgins ). To estimate the 95% LOA for a pooled MD, a pooled variance was computed under the assumption that the variance of the differences was equal across studies. The pooled variance was calculated as the weighted average of these within-study variances, weighted by the corresponding degrees of freedom for each study (i.e., an extension of the approach used for a two sample Student's t-test (Woodward, 1999)).
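The pooling steps described above can be sketched as follows. This is not the RevMan implementation; it is a minimal illustration that assumes each study's standard error is the s.d. of the differences divided by the square root of its sample size, and the input values and function names are hypothetical.

```python
import math


def pool_fixed_effect(mds, ses):
    """Inverse-variance fixed-effect pooling of study mean differences."""
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    pooled = sum(w * md for w, md in zip(weights, mds)) / total_w
    se_pooled = math.sqrt(1.0 / total_w)
    # Cochran Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (md - pooled) ** 2 for w, md in zip(weights, mds))
    df = len(mds) - 1
    # I-squared: share of variability beyond chance, truncated at zero.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return pooled, ci, q, i2


def pooled_loa(pooled_md, sds, ns):
    """95% LOA from a degrees-of-freedom-weighted average of variances."""
    dfs = [n - 1 for n in ns]
    pooled_var = sum(d * sd ** 2 for d, sd in zip(dfs, sds)) / sum(dfs)
    half_width = 1.96 * math.sqrt(pooled_var)
    return pooled_md - half_width, pooled_md + half_width


# Hypothetical study-level inputs (not taken from the review).
study_mds = [0.2, 0.0, 0.1]  # mean difference, test minus pathology (cm)
study_sds = [2.0, 2.5, 1.5]  # s.d. of the differences (cm)
study_ns = [40, 30, 50]      # patients per study

study_ses = [sd / math.sqrt(n) for sd, n in zip(study_sds, study_ns)]
pooled_md, ci, q, i2 = pool_fixed_effect(study_mds, study_ses)
loa = pooled_loa(pooled_md, study_sds, study_ns)
print(f"Pooled MD {pooled_md:.2f} cm (95% CI {ci[0]:.2f} to {ci[1]:.2f}), "
      f"Q = {q:.2f}, I2 = {i2:.0f}%, 95% LOA {loa[0]:.1f} to {loa[1]:.1f} cm")
```

Note that the confidence interval describes uncertainty in the pooled MD, whereas the LOA describe the spread of individual differences, so the LOA are expected to be much wider.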

Results

Study characteristics

A total of 2108 citations were identified. Nineteen studies were eligible for inclusion in the systematic review (Weatherall ; Balu-Maestro ; Partridge ; Rosen ; Bodini ; Chen ; Londero ; Julius ; Montemurro ; Akazawa ; Bollet ; Segara ; Bhattacharyya ; Moon ; Prati ; Nakahara ; Wright ; Guarneri ), reporting data on 958 patients undergoing MRI and/or comparator tests; MRI data were reported for 953 patients. Studies enrolled patients between 1998 and 2007 (median mid-point of recruitment 2002), and included a median of 38 patients with MRI data (range 12–195). Characteristics of included studies are summarised in Table 1. Study quality appraisal is summarised in Supplementary Information Resource 2.

Table 1

Summary of cohort, tumour, treatment and reference standard characteristics of included studies

	Number providing data
Variable	Studies	Patients	Median estimate	IQR	Range
Cohort characteristics
N (MRI)	19	953	38	21–60	12–195
Recruitment mid-point (year)	12	680	2002	2001–2005	1998–2007
Age, mean (or median) (years)	16	834	48	45–49	42–56
Menopausal status (%)a
Pre	5	254	60.4	59.3–68.8	55.3–75.4
Peri/post	5	118	39.6	31.2–40.7	24.6–44.7
Tumour sizea
Clinical size, mean (or median) (cm)	9	343	4.9	4.7–6.2	4.3–8.2
T stage (%)a
T1	9	50	2.1	0.0–2.6	0.0–50.7
T2	9	323	48.2	10.0–72.9	0.0–84.9
T3	9	166	27.1	12.3–47.9	7.2–68.9
T4	9	93	13.2	2.7–30.0	0.0–43.8
Tx	9	1	0.0	0.0–0.0	0.0–0.5
Stage (%)a
I	7	2	0.0	0.0–0.0	0.0–6.2
II	6	202	81.4	62.5–86.4	47.6–86.7
III	6	55	18.6	13.6–31.2	13.3–52.4
IV	8	0	0.0	0.0–0.0	0.0–0.0
Histology (%)a
IDC	15	552	82.2	71.2–90.0	48.6–96.5
ILC or IDC/ILC	15	79	10.0	5.1–18.8	0.0–26.0
Other	15	31	3.5	0.0–10.2	0.0–16.1
Unknown or NR	15	16	0.0	0.0–0.0	0.0–15.9
Nodal status (%)a
Positive	6	316	62.0	45.8–71.1	38.4–93.8
Negative	6	128	36.5	28.9–54.2	6.2–61.6
Unknown or NR	6	1	0.0	0.0–0.0	0.0–3.1
Grade (%)a
I	5	23	13.3	6.2–18.8	0.0–22.0
II	5	61	37.5	22.2–43.8	15.3–49.2
III	5	95	43.8	25.4–51.1	25.0–78.0
Unknown or NR	5	16	6.8	3.4–13.3	0.0–25.0
ER (%)a
Positive	8	333	63.6	55.9–67.6	40.6–75.0
Negative	7	240	33.9	32.2–48.2	25.0–59.4
Unknown or NR	7	4	0.0	0.0–3.4	0.0–3.4
PR (%)a
Positive	6	155	39.2	34.9–47.5	6.8–68.8
Negative	6	247	53.5	31.2–63.8	27.1–65.1
Unknown or NR	6	41	0.0	0.0–3.4	0.0–66.1
HER2 (%)a
Positive	8	199	29.7	19.6–39.2	12.5–73.9
Negative	8	373	70.3	58.3–78.7	26.1–87.5
Unknown or NR	8	5	0.0	0.0–1.7	0.0–5.1
NAC regimen (%)a
Anthracycline-based	19	316	9.7	0.0–82.7	0.0–100.0
Anthracycline-taxane-based	19	437	20.0	0.0–87.2	0.0–100.0
Other	19	210	1.7	0.0–10.5	0.0–100.0
Studies using Trastuzumab with NACa
Trastuzumab used (%)	5b	80	5.6	4.7–42.4	2.1–57.4
Trastuzumab not used (%)	5b	376	94.4	57.6–95.3	42.6–97.9
Type of surgery (%)a
BCS	13	281	37.3	23.8–58.1	6.0–100.0
Mastectomy	13	281	62.7	41.9–76.2	0.0–94.0
No surgery	13	2c	0.0	0.0–0.0	0.0–6.2
Type of reference standard (%)a
Pathology	19	951	100.0	100.0–100.0	93.8–100.0
Other	19	2c	0.0	0.0–0.0	0.0–6.2
Time from MRI to surgery
Days, mean (or median/estimate)	8	255	22.0	14–28	7–28
Prevalence of pCR (%)a
pCR	19	957	14.3	8.3–18.8	0.0–28.6

Abbreviations: BCS=breast conserving surgery; DCIS=ductal carcinoma in situ; ER=oestrogen receptor; HER2=human epidermal growth factor receptor 2; IDC=invasive ductal carcinoma; ILC=invasive lobular carcinoma; IQR=interquartile range; MRI=magnetic resonance imaging; NAC=neoadjuvant chemotherapy; NR=not reported; pCR=pathologic complete response; PR=progesterone receptor.

a Calculation of values based on total number of patients enrolled, a minority of whom may not have undergone MRI or were excluded from the analysis for other reasons.

b Used in six studies, but figures based on five studies where the proportion of patients receiving Trastuzumab is reported.

c Localisation biopsy showed the absence of residual tumour (i.e., pathologic measurement of 0.0 cm).

MRI details

Technical characteristics of MRI are summarised in Supplementary Information Resource 3. The majority of studies used DCE-MRI (84.2%) with a 1.5-T magnet (73.7%). Dedicated bilateral breast coils were used in all studies in which the coil type was reported. All studies providing detail on contrast employed gadolinium-based materials, most commonly gadopentetate dimeglumine (68.4%), typically at the standard dosage of 0.1 mmol per kg body weight (68.4%).

Reference standard

Pathology from surgical excision was the reference standard for all patients in all but one study (Bhattacharyya ), where the absence of residual tumour (pathologic complete response, pCR) in two patients was verified by localisation biopsy, representing 0.2% of patients included in all studies. Study-specific rates of pCR ranged between 0.0% and 28.6%, with a median 14.3% (Table 1).

Mean differences between MRI and pathology

Six studies (Partridge ; Akazawa ; Segara ; Prati ; Wright ; Guarneri ) reported MDs and LOA between MRI and pathology (Supplementary Information Resource 4). All studies measured the longest tumour diameter, except for a study by Akazawa that measured the diameter along the plane connecting the nipple and the tumour centre. This study is therefore presented descriptively, but has been excluded from pooled analyses. Meta-analysis of MDs between MRI and pathologic tumour measurement (Figure 1) showed a tendency for MRI to slightly overestimate pathologic tumour size, with a pooled MD of 0.1 cm (95% CI: −0.1–0.3 cm). There was no evidence of heterogeneity (I²=0%). Pooled LOA indicated that 95% of pathologic measurements fall between −4.2 cm and +4.4 cm of the MRI measurement.
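As a rough check on the scale of these limits, the s.d. of the differences implied by a pair of 95% LOA can be back-calculated, assuming the limits were constructed as MD ± 1.96 s.d. The helper name below is illustrative, and the inputs are the rounded pooled values reported above, so the result is approximate.

```python
def sd_from_loa(lower, upper, z=1.96):
    """Back-calculate the s.d. of the differences from symmetric 95% LOA."""
    return (upper - lower) / (2 * z)


# Rounded pooled limits for MRI versus pathology (cm).
implied_sd = sd_from_loa(-4.2, 4.4)
implied_md = (-4.2 + 4.4) / 2  # midpoint of the limits
print(f"Implied s.d. of differences: {implied_sd:.2f} cm; midpoint: {implied_md:.1f} cm")
```

An implied s.d. of roughly 2 cm around a near-zero mean is what makes the limits wide even though the average bias is small.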

Figure 1

Forest plot of mean difference (cm) between MRI and pathologic size (all studies).

Within-study comparisons of MRI versus US, clinical examination and mammography are presented in Supplementary Information Resource 4. For all but a single study showing similar, small tendencies for overestimation by MRI (0.16 cm) and US (0.06 cm) (Guarneri ), the absolute values of MDs within studies were lower for MRI than for the alternative tests. Pooled MDs and 95% LOA are summarised in Table 2 and Figures 2, 3, 4. There was no evidence of heterogeneity for MRI in any of the analyses, or for US (all I²=0%). Pooled results from two studies (Segara ; Guarneri ) showed similar small overestimation of pathologic tumour size by MRI and US (MDs of 0.1 cm for both tests), with comparable LOA. Pooled MDs and LOA from two studies (Prati ; Wright ) were larger for mammography (0.4 cm, 95% LOA −7.1 to 8.0 cm) than for MRI (0.1 cm, 95% LOA −6.0 to 6.3 cm), with moderate heterogeneity in MDs for mammography (I²=39%). Pooled estimates for MRI and clinical examination across four studies (Partridge ; Segara ; Prati ; Wright ) resulted in substantial heterogeneity for the latter test (Q=20.59, df=3, P=0.0001; I²=85%); three studies reported that clinical examination underestimated pathologic tumour size, and one study reported the reverse. Pooled MDs showed larger underestimation with wider LOA for clinical examination (−0.3 cm, 95% LOA: −5.3 to 4.7 cm) relative to MRI overestimation (0.1 cm, 95% LOA: −4.5 to 4.6 cm).

Table 2

Pooled MD and LOA (cm) restricted to studies comparing the respective tests (fixed effects)

	N (studies)	MD (95% CI) (cm)	I²	LOA (cm)
MRI	2	0.1 (−0.2, 0.3)	0%	−2.9, 3.0
US	2	0.1 (−0.1, 0.4)	0%	−2.6, 2.9
MRI	4	0.1 (−0.2, 0.3)	0%	−4.5, 4.6
Clinical exam	4	−0.3 (−0.7, 0.0)	85%	−5.3, 4.7
MRI	2	0.1 (−0.5, 0.8)	0%	−6.0, 6.3
Mammography	2	0.4 (−0.5, 1.3)	39%	−7.1, 8.0

Abbreviations: CI=confidence interval; LOA=limits of agreement; MD=mean difference; MRI=magnetic resonance imaging; US=ultrasound.

Figure 2

Forest plots of mean difference (cm) between MRI or US and pathologic size (comparative studies).

Figure 3

Forest plots of mean difference (cm) between MRI or clinical examination and pathologic size (comparative studies).

Figure 4

Forest plots of mean difference (cm) between MRI or mammography and pathologic size (comparative studies).

Percentage agreement

Eight studies (Balu-Maestro ; Rosen ; Julius ; Yeh ; Akazawa ; Segara ; Nakahara ; Guarneri ) reported percentage agreement between tumour size measured by MRI and pathology within a variety of margins of error based on absolute size (±0, 0.5, 1, 2 and 3 cm) or a percentage of the pathologic measurement (±30 and 50%; Supplementary Information Resource 4). One study did not report the margin of error used to calculate agreement (Balu-Maestro ), and two studies reported percentage agreement between MRI and pathology but not the associated percentages of MRI under/overestimation (Julius ; Akazawa ). Studies reporting percentage agreement (plus under/overestimation) for MRI, US and clinical examination by an absolute margin of error are summarised in Figure 5 (no studies reported these data for mammography). As would be expected, percentage agreement between all tests and pathology was observed to be higher for wider margins of error (e.g., ∼20% for exact agreement between MRI and pathologic measurements (Segara ; Guarneri ) vs 92% for ±3 cm (Nakahara )). With the exception of one study showing a tendency for overestimation (Rosen ), MRI appeared equally likely to overestimate and underestimate pathologic tumour size across all absolute margins of error. For US and clinical examination, a tendency towards underestimation can be observed in Figure 5, but the majority of estimates showing that bias were contributed by a single study (Segara ).
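The calculation underlying these percentages can be sketched for an absolute margin of error. The data and function name are hypothetical, and the convention assumed here is that a measurement agrees when its absolute difference from pathology is no larger than the margin; individual studies may have defined agreement differently.

```python
def percentage_agreement(test_sizes, pathology_sizes, margin_cm):
    """Percentages of agreement, under- and overestimation within a margin."""
    agree = under = over = 0
    for t, p in zip(test_sizes, pathology_sizes):
        diff = t - p
        if abs(diff) <= margin_cm:
            agree += 1
        elif diff < 0:
            under += 1  # test smaller than pathology by more than the margin
        else:
            over += 1   # test larger than pathology by more than the margin
    n = len(test_sizes)
    return {"agree": 100 * agree / n,
            "under": 100 * under / n,
            "over": 100 * over / n}


# Hypothetical paired measurements (cm), not taken from any included study.
test = [2.0, 3.5, 1.0, 4.0, 0.0]
pathology = [2.0, 2.0, 1.5, 6.5, 0.0]

for margin in (0.0, 1.0, 2.0):
    print(margin, percentage_agreement(test, pathology, margin))
```

Widening the margin can only move cases from the under- or overestimation groups into the agreement group, which is why agreement rises with the margin and why reporting several margins side by side is more informative than a single one.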

Figure 5

Percentage agreement, underestimation and overestimation for (

Percentage agreement estimates for MRI based on any margin of error were compared with those of alternative tests in six studies (Supplementary Information Resource 4). All six studies compared MRI and US (Balu-Maestro ; Julius ; Yeh ; Akazawa ; Segara ; Guarneri ); MRI was compared with clinical examination in four studies (Balu-Maestro ; Yeh ; Akazawa ; Segara ) and with mammography in three studies (Balu-Maestro ; Julius ; Yeh ). For all but one study and across the range of reported margins of error, percentage agreement estimates for MRI were higher than those for the comparator tests. In the one exception to this pattern of results, a study reporting multiple margins of error (Segara ) found higher percentage agreement for MRI than for US at margins of ±0 and ±1 cm, but percentage agreement at ±2 cm was slightly higher for US (92%) than that for MRI (88%). In one other study (Guarneri ), the difference in percentage agreement favouring MRI over US was relatively small (20% vs 15% at ±0 cm; 54% vs 51% at ±0.5 cm; and 71% vs 68% at ±1 cm).

Correlation coefficients

Sixteen studies (Weatherall ; Partridge ; Rosen ; Bodini ; Chen ; Londero ; Montemurro ; Akazawa ; Bollet ; Segara ; Bhattacharyya ; Moon ; Prati ; Nakahara ; Wright ; Guarneri ) reported correlations between MRI and pathologic tumour size, and similar correlations for at least one alternative test, either by the Pearson's (N=9) or Spearman's (N=5) method (in two studies (Weatherall ; Partridge ), the method was not specified). The range of correlation coefficients was wide (0.21–0.92), with a median value of 0.70 (Supplementary Information Resource 4). Coefficients between 0.20 and 0.39 were reported in two studies, 0.40–0.59 in four studies, 0.60–0.79 in six studies, and 0.80 and above in four studies. One study reported ICC between MRI and pathology (0.48), in addition to Spearman's rank coefficients (Bollet ). Six studies reported correlations with pathology of MRI and mammography (Weatherall ; Bodini ; Londero ; Bollet ; Prati ; Wright ), all of which reported consistently higher correlation coefficients for MRI. However, of the 10 studies that reported correlations with pathology of MRI and clinical examination (Weatherall ; Partridge ; Rosen ; Bodini ; Chen ; Akazawa ; Bollet ; Segara ; Prati ; Wright ), two found correlations favouring the latter test (Prati ; Wright ). Similarly, two (Nakahara ; Guarneri ) of 11 studies that presented correlations for MRI and US with pathology (Weatherall ; Bodini ; Londero ; Montemurro ; Akazawa ; Bollet ; Segara ; Bhattacharyya ; Moon ; Nakahara ; Guarneri ) reported higher correlations for US.

Within-study comparisons of different methods

Six studies (Partridge ; Akazawa ; Segara ; Prati ; Nakahara ; Wright ; Guarneri ) compared the performance of MRI and other tests by more than one method. In four of those, different methods produced results that could potentially lead to inconsistent conclusions regarding agreement, depending on which measure is considered. In two (Prati ; Wright ) of six studies that presented both MDs and correlations, the absolute value of the MD was lower for MRI (⩽0.3 cm) than for clinical examination (1.2 cm), but a higher correlation was observed between clinical examination and pathologic size. The 95% LOA for MRI were wider than for clinical examination, reflecting the lower correlation for MRI. Similarly, in two of three studies presenting MDs and percentage agreement, the methods suggest opposing conclusions. Guarneri found a larger MD and wider LOA for MRI compared with US, but slightly higher percentage agreement, whereas Segara reported the reverse (for agreement within 2 cm only). In addition, the slightly higher percentage agreement for MRI than US reported by Guarneri contrasts with a lower correlation coefficient, and vice versa for Segara (for agreement within 2 cm only).

Discussion

In the neoadjuvant setting, accurate information on the extent of residual malignancy assists in guiding surgical management of breast cancer. We pooled estimates of the MD between residual tumour size measured by MRI and pathology from six studies, and found that on average, MRI had a tendency to slightly overestimate pathologic size after NAC (MD of 0.1 cm; Figure 1). However, the pooled 95% LOA around this estimate suggest that pathologic tumour measurements may lie between −4.2 cm and +4.4 cm of the MRI measurement, indicating that substantial disagreement may exist. Measurement errors within this range may be of clinical importance in terms of their implications for the choice of treatment approach. Our analysis of the relative performance of MRI and alternative tests focused on studies directly comparing the tests against pathology (Bossuyt and Leeflang, 2008). Although only two studies reported MDs with pathologic measurements for both MRI and US, pooled estimates suggested that the tests had a similar tendency to overestimate pathologic size, with comparable LOA. The tendency to overestimate pathologic size was greater for mammography than MRI (two studies). Although significant heterogeneity was present in clinical examination findings, three of four studies reported the same direction of effect (underestimation) for this test. Pooled MDs showed clinical examination's bias towards underestimation to be greater than MRI's bias for overestimation, and within all four studies the absolute values of MDs were larger for clinical examination. Compared with MRI, wider LOA were observed for both clinical examination and mammography, suggesting that those tests had greater variability in terms of agreement with pathologic measurements. The LOA for all of the alternative tests were large enough to be of potential clinical significance. 
Previous summaries of the literature about MRI's accuracy in measuring residual tumour size have quoted correlations between MRI and pathology, and the percentage of cases in which MRI agrees with, underestimates, or overestimates pathologic measurements. Overall, correlations were considered to be ‘good' (Lobbes ), and the statistical significance of those correlations was emphasised (Mclaughlin and Hylton, 2011). The methodological limitations of that approach are well documented (Bland and Altman, 1986, 1990). The variable overestimation and underestimation described in those overviews has led others to attach caveats about inaccurate measurement to conclusions about the value of MRI in measuring residual tumour size (Sardanelli ; Mclaughlin and Hylton, 2011). This inconsistency reflects an evidence base which is extensive but disparate in terms of the methods used to assess agreement, and highlights uncertainty about drawing meaningful conclusions from the literature. Pearson's and Spearman's rank correlation coefficients were the most commonly reported statistics in our review (in contrast to MDs and LOA, the more appropriate statistics, yet the least reported). These correlation coefficients, which do not measure agreement (Bland and Altman, 1986), varied widely and were commonly inconsistent with more appropriate measures reported in the same study. Intraclass correlation, which does assess the degree to which a 1 : 1 relationship between measurements exists, was presented for MRI and pathology in just one study and was not reported for comparator tests (Bollet ). The ICC may be an adjunct to the analyses recommended by Bland and Altman (1986), but this statistic alone is also limited in the extent to which it assesses agreement, as it is dependent on the range of observed values and does not separate systematic from random error (Bland and Altman, 1990). 
The percentage of MRI measurements which ‘agree' with pathology within a ‘margin of error' may provide useful information to supplement MDs and LOA. However, the studies in our review varied considerably in the tolerated discrepancy between measures which was used to define ‘agreement', reflecting the somewhat arbitrary nature of an ‘acceptable' error. Furthermore, studies differed in the methods of calculating that discrepancy (i.e., absolute or relative differences), and accompanying percentages of under- or overestimation by MRI were not universally reported. This lack of consistency between studies renders the body of evidence difficult to interpret; future studies can facilitate comparability by reporting agreement, under- and overestimation for multiple margins of error, starting with exact agreement and increasing at 1 cm increments. In contrast to our pooled analysis of MDs showing that MRI has a tendency to slightly overestimate pathologic size, studies describing an absolute margin of error suggested that MRI was equally likely to under- and overestimate the pathologic measurement, highlighting that this method may obscure small measurement biases. Studies of the agreement between imaging and pathologic size have inherent limitations. Although pathology is considered to be the ‘gold standard', a variety of potential errors in pathologic measurement have been identified (Lagios, 2005; Provencher ; Tucker, 2012), meaning that discrepancies with pathology may occur even when residual tumour size is accurately assessed before surgery. For example, pathologic diameters are likely to be overestimated when measured from a combination of tumour fragments, or excised and re-excised specimens (Lagios, 2005). 
There may also be errors in orientating intact specimens so that tumour diameters on imaging and pathology are measured in the same plane (Provencher ), particularly if three-dimensional imaging data are unavailable to the pathologist (Weatherall ; Tucker, 2012); this could result in pathologic measurements underestimating the longest diameter for irregularly shaped tumours (Lagios, 2005). There also exists the possibility that the process of removal, preparation or measurement of the pathologic specimen may shrink, expand or otherwise distort tumour dimensions (Pritt and Weaver, 2005; Pritt ; Behjatnia ; Provencher ). Furthermore, the inclusion or exclusion of residual DCIS in pathologic measurements has the potential to affect estimates of agreement. Pooled MDs between pathology and MRI or alternative tests (and the associated LOA) must therefore be interpreted with awareness of these issues. However, if errors in the pathologic measurement are random and do not favour MRI over the comparators (or vice versa), these estimates allow for valid comparisons (Glasziou ). Although this assumption may be reasonable when MRI, comparator tests and pathology are undertaken in the same patients, four (Partridge ; Segara ; Prati ; Wright ) of six studies reporting MDs excluded patients from one (or more) testing group(s), with discrepancies ranging from a single patient (2%) to up to 26% of patients with MRI data being excluded from analyses of comparator tests (Supplementary Information Resource 4). Furthermore, differences in test performance may be observed if tumour size is estimated better (or more poorly) in patients selected to (or excluded from) a particular testing group. Authors should be encouraged to present data which allows agreement to be assessed for patients unique to particular analyses vs those common to all testing groups. 
In addition, these issues also highlight the importance of study authors clearly describing the characteristics of patients excluded from particular analyses. The presentation of important study design characteristics in included studies was generally suboptimal, but in particular, reporting of study withdrawals or exclusions (when they did occur) was poor (Supplementary Information Resource 2). An important consideration in the interpretation of pooled MD and LOA estimates is that they may be misleading if the difference between tests is systematically related to underlying tumour size, or if the differences are not normally distributed (Bland and Altman, 1986). Plots of the differences by their mean allow for any underlying relationships to be assessed, but were presented in only half of the studies reporting MDs (Partridge ; Segara ; Wright ). Examination of the plots presented in these studies suggests the possibility that the difference in pathology and MRI (or alternative tests) may be greater for larger tumour sizes. Careful attention should be given to graphical presentation of the data before calculating MDs, and data transformation should be considered when systematic relationships exist (Bland and Altman, 1986). A possible limitation of our analysis is that many studies were not recent, and consequently newer neoadjuvant treatments, including taxanes and trastuzumab, were used in only a minority of patients (Table 1). Agreement between MRI and pathology may vary because of different patterns of tumour regression between taxane-based and non-taxane-based NAC; contrary to previous findings suggesting underestimation when taxanes are used (Denis ), MDs in studies that used predominantly taxane-based NAC (Wright ; Guarneri ) suggest overestimation by MRI relative to studies using non-taxane-based regimens (Segara ; Prati ; Supplementary Information Resource 4). 
Increased rates of pCR owing to modern regimens may also potentially affect MD and LOA estimates, but examination of this issue was not possible owing to the small number of studies reporting those outcomes.
In summary, our meta-analysis is the first to explore and summarise the evidence on agreement between MRI and pathologic tumour measurements after NAC, and to highlight methodological issues which, to date, have precluded conclusions being drawn from the literature. Our work suggests a tendency for MRI to slightly overestimate pathologic tumour size measurements, but LOA are large enough to be of potential clinical importance. Few studies compared MDs between tests and pathology, but the performance of US appeared to be comparable to that of MRI; poorer agreement was observed for mammography and clinical examination. Although a large number of studies have addressed these questions, most have reported Pearson's or Spearman's correlation coefficients. Those measures are inappropriate for assessing agreement, and have contributed to uncertainty about MRI's potential role. Further studies are warranted that adopt the Bland–Altman approach to assessing MRI's agreement with pathology, and that also assess the agreement with pathology of alternative tests; in addition, we have recommended methods of data presentation to assess the validity of comparisons between tests. Percentages of agreement and associated under/overestimation have limitations, but may provide useful data to supplement Bland–Altman analyses. ICCs may similarly supplement these analyses, but Pearson's and Spearman's correlations should be avoided.
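The contrast drawn above between Bland–Altman agreement statistics and correlation coefficients can be illustrated with a minimal Python sketch. It is not the analysis code used in this meta-analysis: the function names (`bland_altman`, `pearson_r`, `difference_vs_mean_slope`) and the tumour sizes are hypothetical and chosen only for illustration. The sketch assumes paired measurements in cm, differences defined as test minus pathology (so a positive MD indicates overestimation), and 95% LOA calculated as MD ± 1.96 SD of the differences.

```python
import math
import statistics


def bland_altman(test, reference):
    """Mean difference (test - reference) and 95% limits of agreement."""
    if len(test) != len(reference) or len(test) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [t - r for t, r in zip(test, reference)]
    md = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return {"md": md, "sd": sd,
            "loa_low": md - 1.96 * sd, "loa_high": md + 1.96 * sd}


def pearson_r(x, y):
    """Pearson correlation coefficient, written out to avoid version-specific helpers."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def difference_vs_mean_slope(test, reference):
    """Least-squares slope of the difference on the pair mean.

    A slope far from zero suggests the difference depends on tumour size,
    the situation in which a single MD and LOA may be misleading.
    """
    means = [(t + r) / 2 for t, r in zip(test, reference)]
    diffs = [t - r for t, r in zip(test, reference)]
    mm, md = statistics.mean(means), statistics.mean(diffs)
    sxy = sum((m - mm) * (d - md) for m, d in zip(means, diffs))
    sxx = sum((m - mm) ** 2 for m in means)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical data: the test reports exactly twice the pathologic size.
    pathology_cm = [1.0, 2.0, 3.0, 4.0, 5.0]
    mri_cm = [2 * p for p in pathology_cm]

    print("Pearson r:", pearson_r(mri_cm, pathology_cm))
    print("Bland-Altman:", bland_altman(mri_cm, pathology_cm))
    print("Slope of difference on mean:",
          difference_vs_mean_slope(mri_cm, pathology_cm))
```

In this contrived example the correlation is perfect (r = 1) even though the test overestimates pathology by 3.0 cm on average, with 95% LOA of roughly -0.1 to 6.1 cm, and the positive slope shows the difference growing with tumour size. That is the pattern in which a plot of differences against means, and possibly a data transformation, would be needed before reporting a single MD.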

41 in total

1. Breast cancer: comparative effectiveness of positron emission mammography and MR imaging in presurgical planning for the ipsilateral breast.

Authors: Wendie A Berg; Kathleen S Madsen; Kathy Schilling; Marie Tartar; Etta D Pisano; Linda Hovanessian Larsen; Deepa Narayanan; Al Ozonoff; Joel P Miller; Judith E Kalinyak
Journal: Radiology Date: 2010-11-12 Impact factor: 11.105

2. Statistical methods for assessing agreement between two methods of clinical measurement.

Authors: J M Bland; D G Altman
Journal: Lancet Date: 1986-02-08 Impact factor: 79.321

3. MRI and conservative treatment of locally advanced breast cancer.

Authors: T Julius; S E G Kemp; P J Kneeshaw; A Chaturvedi; P J Drew; L W Turnbull
Journal: Eur J Surg Oncol Date: 2005-12 Impact factor: 4.424

4. Optimal assessment of residual disease after neo-adjuvant therapy for locally advanced and inflammatory breast cancer--clinical examination, mammography, or magnetic resonance imaging?

Authors: F C Wright; J Zubovits; S Gardner; B Fitzgerald; M Clemons; M L Quan; P Causer
Journal: J Surg Oncol Date: 2010-06-01 Impact factor: 3.454

5. MR and US imaging for breast cancer patients who underwent conservation surgery after neoadjuvant chemotherapy: comparison of triple negative breast cancer and other intrinsic subtypes.

Authors: Hiroshi Nakahara; Yukiko Yasuda; Eiichiro Machida; Yorio Maeda; Hidemi Furusawa; Kansei Komaki; Mayumi Funagayama; Mayumi Nakahara; Shozo Tamura; Futoshi Akiyama
Journal: Breast Cancer Date: 2010-11-18 Impact factor: 4.239

6. Quality of life after breast conservation or mastectomy: a systematic review.

Authors: L Irwig; A Bennetts
Journal: Aust N Z J Surg Date: 1997-11

7. Age and HER2 expression status affect MRI accuracy in predicting residual tumor extent after neo-adjuvant systemic treatment.

Authors: H-G Moon; W Han; J W Lee; E Ko; E-K Kim; J-H Yu; S Y Kang; W K Moon; N Cho; I-A Park; D-Y Oh; S-W Han; S-A Im; D-Y Noh
Journal: Ann Oncol Date: 2009-01-29 Impact factor: 32.976

8. Magnetic resonance imaging in comparison to clinical palpation in assessing the response of breast cancer to epirubicin primary chemotherapy.

Authors: Maria Bodini; Alfredo Berruti; Alberto Bottini; Giovanni Allevi; Carla Fiorentino; Maria Pia Brizzi; Alessandra Bersiga; Daniele Generali; Davide Volpi; Ugo Marini; Sergio Aguggini; Marco Tampellini; Palmiro Alquati; Lucio Olivetti; Luigi Dogliotti
Journal: Breast Cancer Res Treat Date: 2004-06 Impact factor: 4.872

9. Does breast cancer tumor size really matter that much?

Authors: Louise Provencher; Caroline Diorio; Jean-Charles Hogue; Catherine Doyle; Simon Jacob
Journal: Breast Date: 2012-07-25 Impact factor: 4.380

10. Accuracy of clinical evaluation of locally advanced breast cancer in patients receiving neoadjuvant chemotherapy.

Authors: Raquel Prati; Christina A Minami; Jeff A Gornbein; Nanette Debruhl; Debbie Chung; Helena R Chang
Journal: Cancer Date: 2009-03-15 Impact factor: 6.860

17 in total

1. Image Registration for Microwave Tomography of the Breast Using Priors From Nonsimultaneous Previous Magnetic Resonance Images.

Authors: Gregory Boverman; Cynthia E L Davis; Shireen D Geimer; Paul M Meaney
Journal: IEEE J Electromagn RF Microw Med Biol Date: 2017-12-27

2. Breast magnetic resonance imaging: are those who need it getting it?

Authors: S Tan; J David; L Lalonde; M El Khoury; M Labelle; R Younan; E Patocskai; J Richard; I Trop
Journal: Curr Oncol Date: 2017-06-27 Impact factor: 3.677

3. Automated Semi-Quantitative Analysis of Breast MRI: Potential Imaging Biomarker for the Prediction of Tissue Response to Neoadjuvant Chemotherapy.

Authors: Matthias Dietzel; Clemens Kaiser; Katja Pinker; Evelyn Wenkel; Matthias Hammon; Michael Uder; Barbara Bennani Baiti; Paola Clauser; Rüdiger Schulz-Wendtland; Pascal Baltzer
Journal: Breast Care (Basel) Date: 2017-08-29 Impact factor: 2.860

Review 4. Meta-analysis of pre-operative magnetic resonance imaging (MRI) and surgical treatment for breast cancer.

Authors: Nehmat Houssami; Robin M Turner; Monica Morrow
Journal: Breast Cancer Res Treat Date: 2017-06-06 Impact factor: 4.872

5. Prediction of Pathologic Complete Response in Breast Cancer Patients Comparing Magnetic Resonance Imaging with Ultrasound in Neoadjuvant Setting.

Authors: Frederik Knude Palshof; Charlotte Lanng; Niels Kroman; Cemil Benian; Ilse Vejborg; Anne Bak; Maj-Lis Talman; Eva Balslev; Tove Filtenborg Tvedskov
Journal: Ann Surg Oncol Date: 2021-05-27 Impact factor: 5.344

Review 6. Breast MRI: EUSOBI recommendations for women's information.

Authors: Ritse M Mann; Corinne Balleyguier; Pascal A Baltzer; Ulrich Bick; Catherine Colin; Eleanor Cornford; Andrew Evans; Eva Fallenberg; Gabor Forrai; Michael H Fuchsjäger; Fiona J Gilbert; Thomas H Helbich; Sylvia H Heywang-Köbrunner; Julia Camps-Herrero; Christiane K Kuhl; Laura Martincich; Federica Pediconi; Pietro Panizza; Luis J Pina; Ruud M Pijnappel; Katja Pinker-Domenig; Per Skaane; Francesco Sardanelli
Journal: Eur Radiol Date: 2015-05-23 Impact factor: 5.315

Review 7. Clinical Breast MR Using MRS or DWI: Who Is the Winner?

Authors: Francesco Sardanelli; Luca Alessandro Carbonaro; Stefania Montemezzi; Carlo Cavedon; Rubina Manuela Trimboli
Journal: Front Oncol Date: 2016-10-28 Impact factor: 6.244

8. Direct comparison of PET/CT and MRI to predict the pathological response to neoadjuvant chemotherapy in breast cancer: a meta-analysis.

Authors: Lihua Chen; Qifang Yang; Jing Bao; Daihong Liu; Xuequan Huang; Jian Wang
Journal: Sci Rep Date: 2017-08-16 Impact factor: 4.379

9. Agreement between MRI and pathologic breast tumor size after neoadjuvant chemotherapy, and comparison with alternative tests: individual patient data meta-analysis.

Authors: Michael L Marinovich; Petra Macaskill; Les Irwig; Francesco Sardanelli; Eleftherios Mamounas; Gunter von Minckwitz; Valentina Guarneri; Savannah C Partridge; Frances C Wright; Jae Hyuck Choi; Madhumita Bhattacharyya; Laura Martincich; Eren Yeh; Viviana Londero; Nehmat Houssami
Journal: BMC Cancer Date: 2015-10-08 Impact factor: 4.430

10. Thermal tomography for monitoring tumor response to neoadjuvant chemotherapy in women with locally advanced breast cancer.

Authors: Qi Wu; Juanjuan Li; Si Sun; Xiaoli Yao; Shan Zhu; Juan Wu; Qian Liu; Xiaojun Ding; Manman Shi; Kaiyang Li; Shengrong Sun
Journal: Oncotarget Date: 2017-03-25