Literature DB >> 27334105

Evaluation of 12 strategies for obtaining second opinions to improve interpretation of breast histopathology: simulation study.

Joann G Elmore¹, Anna Na Tosteson², Margaret S Pepe³, Gary M Longton⁴, Heidi D Nelson⁵, Berta Geller⁶, Patricia A Carney⁷, Tracy Onega⁸, Kimberly H Allison⁹, Sara L Jackson¹⁰, Donald L Weaver¹¹.

Abstract

OBJECTIVE: To evaluate the potential effect of second opinions on improving the accuracy of diagnostic interpretation of breast histopathology.
DESIGN: Simulation study.
SETTING: 12 different strategies for acquiring independent second opinions. PARTICIPANTS: Interpretations of 240 breast biopsy specimens by 115 pathologists, one slide for each case, compared with reference diagnoses derived by expert consensus. MAIN OUTCOME MEASURES: Misclassification rates for individual pathologists and for 12 simulated strategies for second opinions. Simulations compared accuracy of diagnoses from single pathologists with that of diagnoses based on pairing interpretations from first and second independent pathologists, where resolution of disagreements was by an independent third pathologist. 12 strategies were evaluated in which acquisition of second opinions depended on initial diagnoses, assessment of case difficulty or borderline characteristics, pathologists' clinical volumes, or whether a second opinion was required by policy or desired by the pathologists. The 240 cases included benign without atypia (10% non-proliferative, 20% proliferative without atypia), atypia (30%), ductal carcinoma in situ (DCIS, 30%), and invasive cancer (10%). Overall misclassification rates and agreement statistics depended on the composition of the test set, which included a higher prevalence of difficult cases than in typical practice.
RESULTS: Misclassification rates significantly decreased (P<0.001) with all second opinion strategies except for the strategy limiting second opinions only to cases of invasive cancer. The overall misclassification rate decreased from 24.7% to 18.1% when all cases received second opinions (P<0.001). Obtaining both first and second opinions from pathologists with a high volume (≥10 breast biopsy specimens weekly) resulted in the lowest misclassification rate in this test set (14.3%, 95% confidence interval 10.9% to 18.0%). Obtaining second opinions only for cases with initial interpretations of atypia, DCIS, or invasive cancer decreased the over-interpretation of benign cases without atypia from 12.9% to 6.0%. Atypia cases had the highest misclassification rate after single interpretation (52.2%), remaining at more than 34% in all second opinion scenarios.
CONCLUSION: Second opinions can statistically significantly improve diagnostic agreement for pathologists' interpretations of breast biopsy specimens; however, variability in diagnosis will not be completely eliminated, especially for breast specimens with atypia. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

Entities: Chemical Disease Gene Species

Mesh：

Year: 2016 PMID： 27334105 PMCID： PMC4916777 DOI： 10.1136/bmj.i3069

Source DB: PubMed Journal: BMJ ISSN： 0959-8138

Introduction

Attention to diagnostic errors in the medical literature and mass media has led many to consider obtaining second opinions to prevent errors and improve quality.1 2 3 For example, obtaining second opinions, such as double reading of screening mammograms, has been associated with improved detection rates for cancer.4 5 Obtaining a second opinion is a strategy commonly suggested to improve diagnostic accuracy in breast disease.6 7 8 Interpretation of breast pathology is notoriously difficult and rates of disagreement between pathologists are high, especially in cases of atypia (eg, atypical ductal hyperplasia, ADH) and ductal carcinoma in situ (DCIS).6 9 10 11 A survey of laboratories in the United States noted that 6.6% of all histopathology cases were reviewed before sign out, suggesting second opinions are often obtained in clinical practice, especially in challenging areas such as breast disease.12 Guidelines have also been published for obtaining second opinions in pathology to prevent medical errors,13 and approximately two thirds of US pathology laboratories have policies, with most requiring a second review of new diagnoses of invasive cancer.12 Criteria for when and how to obtain second opinions in breast disease, however, vary considerably.12 14 In our previous study of 252 pathologists, 81% reported requesting second opinions in the absence of institutional policy for at least some of their breast samples, and 96% believed that second opinions improved their diagnostic accuracy.14 Other studies also suggest possible improvements in patient outcomes. For example, in one study, a second review of 405 node negative cases of breast cancer resulted in substantial modifications in treatments,15 potentially decreasing unnecessary interventions and subsequent costs.8 Despite this strong endorsement by practicing pathologists, and multiple studies noting that more than 10% of breast biopsy specimens have important changes in histopathology diagnoses after review,15 16 17 18 19 20 the best guidelines for when to obtain second opinions in pathology are unknown. Comparing the impact of different strategies for obtaining a second opinion on accuracy needs to be evaluated. Many potential strategies exist, ranging from review of every biopsy specimen by multiple pathologists, to obtaining second opinions only for specific case categories (eg, cases interpreted initially as invasive breast cancer) or only from different pathology situations (eg, pathologists who see a high volume of cases weekly (10 or more) versus a low volume). We compared the effect of different criteria for triggering procurement of second opinions on the accuracy of interpretation of breast disease. We designed our study to assess improvements in accuracy in a controlled test situation using data from 6900 individual interpretations by 115 pathologists. We evaluated 12 strategies with different criteria for obtaining second opinions and compared how each approach may affect over-interpretation and under-interpretation rates relative to reference diagnoses.

Methods

Test set cases and consensus reference diagnoses

We used data from the Breast Pathology Study (B-Path), a national study on the accuracy of interpretation of breast tissue.6 21 The 240 biopsy cases were divided into four test sets of 60 cases.6 21 Breast biopsy specimens were selected from two state registries (NH, VT), which are part of the Breast Cancer Surveillance Consortium sponsored by the National Cancer Institute.22 Case selection was stratified by age (49% aged 40-49 years, 51% aged ≥50 years), breast density (51% with heterogeneously or extremely dense breast tissue based on mammography findings), and biopsy type (58% core needle, 42% excisional). Three pathologists experienced in breast disease interpreted each case independently before arriving at a consensus reference diagnosis using a modified Delphi approach.23 Their diagnoses were categorized using the breast pathology assessment tool and hierarchy for diagnosis (BPATH-Dx).6 9 This tool incorporated 14 distinct diagnostic assessments into four main BPATH-Dx categories: benign without atypia (including non-proliferative and proliferative without atypia), atypia (eg, atypical ductal hyperplasia), DCIS, and invasive carcinoma. To improve the statistical precision of accuracy estimates we oversampled cases of proliferative without atypia, atypia, and DCIS. Of the final 240 cases, 72 were benign without atypia (24 non-proliferative and 48 proliferative without atypia), 72 were atypia, 73 were DCIS, and 23 were invasive breast cancer based on the reference consensus diagnoses.21 We randomly assigned the cases within each diagnostic category to the four test sets using stratification to achieve balance on patient age, breast density, and the reference panelists’ difficulty rating.

Participating pathologists

Pathologists from eight US states (Alaska, Maine, Minnesota, New Hampshire, New Mexico, Oregon, Vermont, and Washington) were invited to participate. Details of their identification and recruitment have been described elsewhere.6 24 Pathologists were eligible if they had interpreted breast biopsy specimens in the past year, planned to continue for the next year, and were not residents or fellows in training. A web based survey queried participants about personal characteristics, clinical practices, and interpretive experience.14 24 We randomly assigned pathologists to independently interpret one of the four test sets of 60 cases (each case was represented by a single glass slide), and they recorded their interpretations using the online BPATH-Dx tool.6 9 Participants also indicated whether the case was borderline between two diagnoses and whether they would obtain a second opinion in their usual clinical practice because of laboratory policies, because it was personally desired, or both. A six point Likert scale was used to assess each case for perceived diagnostic difficulty, with results summarized as a binary variable (difficult cases rated as 4, 5, or 6).

Patient involvement

No patients were involved in setting the research question or the outcome measures, nor were they involved in developing plans for recruitment, design, or implementation of the study. No patients were asked to advise on interpretation or writing up of results. There are no plans to disseminate the results of the research to the relevant patient community.

Definitions of interpretations and second opinion strategies

The results of single (initial) interpretations, based on the categorical interpretation by each participating pathologist of each case, have been reported previously.6 We defined interpretations that incorporated second opinions by considering each possible pair of pathologists interpreting the same case and, when disagreement occurred, included resolution by using a third, independent interpretation. Resolution was defined by assigning the case to the BPATH-Dx diagnosis category identified by two of the three pathologists or, if all three disagreed, assigning the middle diagnosis (fig 1).

Fig 1 Algorithm for determination of final biopsy interpretation used in the evaluation of different second opinion policy strategies. Up to three pathologists may be needed to obtain a final interpretation. Data are comprised of 5 145 480 observations each involving three independent pathologist interpretations of a single slide from a breast biopsy specimen and are derived from 115 single pathologists interpreting 60 cases each in four test sets We evaluated 12 strategies for obtaining a second opinion (see box 1), beginning with the strategy where all cases received second opinions. We then evaluated eight selective strategies for obtaining a second opinion, which were determined by criteria based on the initial pathologist’s diagnosis (that is, second opinions were obtained only for cases initially diagnosed as atypia, DCIS, or invasive; DCIS or invasive; or invasive only), determined by the initial pathologist’s assessment of the case (that is, only for cases marked borderline or considered difficult), and determined by whether a second opinion would be required by policy or desired by the pathologist (that is, only for cases where second opinions were required; desired; required or desired). Second opinion applied to all breast biopsy specimens Second opinion obtained only if initial interpretation is atypia, ductal carcinoma in situ (DCIS), or invasive carcinoma Second opinion obtained only if initial interpretation is DCIS or invasive carcinoma Second opinion obtained only if initial interpretation is invasive carcinoma Second opinion obtained only if initial interpretation considered borderline Second opinion obtained only if initial interpretation considered difficult Second opinion obtained only if initial pathologist desired a second opinion Second opinion obtained only if initial diagnosis would prompt a second opinion owing to policy requirements Second opinion obtained only if initial pathologist desired a second opinion or if required by policy Evaluation restricted to interpretations from initial pathologists who had less clinical experience in breast pathology as they report a low volume of cases weekly (<10); second opinion obtained on all cases from another low volume pathologist, with third brought in for any disagreement from a high volume pathologist Evaluation restricted to interpretations from initial low volume pathologists: second opinion obtained on all cases, with second and third opinion from high volume pathologists Evaluation restricted to interpretations from initial pathologists with high volumes: second opinion obtained on all cases, with second and third opinions from high volume pathologists Finally, we assessed how the clinical volume of the interpreting pathologist affected diagnoses in three strategies with designs shaped by previous findings from the B-Path study.6 These strategies included combinations of low volume and high volume pathologists providing first, second, and third opinions, as needed, to resolve discordant diagnoses. We defined low to average volume as dealing with fewer than 10 women with breast biopsies per week and high volume as dealing with 10 biopsies or more per week.

Statistical analyses

We assessed rates of over-interpretation, under-interpretation, and overall misclassification compared with the expert consensus diagnosis as reference. Over-interpretation was defined as cases classified by participants at a hierarchically more severe diagnostic BPATH-Dx category compared with the reference diagnosis category, under-interpretation was defined as cases classified lower than the reference diagnosis category, and misclassification was defined as cases either over-interpreted or under-interpreted compared with the reference diagnosis category. To simulate interpretations that involved obtaining second opinions, we combined the independent interpretations of the study pathologists. For each case, we created an ordered data record of interpretations for every three pathologists who interpreted the case and used the majority or middle interpretation as their final assessment (fig 1). This is analytically equivalent to using the assessment of the first two pathologists if they agree and using the third pathologist for resolution if they disagree. The advantage of creating data records in this manner, resulting in 5 145 480 triple reader data records, is that the correct relative weighting is provided to interpretations of cases where the first two pathologist interpretations agree versus where they do not agree, while allowing data to be included for all potential third readers rather than picking one third reader at random from those available when a third reading was required. The 5 145 480 data records resulted from 29 pathologists interpreting the 60 cases in test set A, 27 for test set B, 30 for test set C, and 29 for test set D, yielding a total of 60×(29×28×27+27×26×25+30×29×28+29×28×27)=5 145 480 triple interpretations. Note that for each case the number of simulated triple readings is n×(n−1)×(n−2), where n is the number of pathologists who interpreted the case. Figure 1 shows how the triple records were used in conjunction with different criteria for procuring second opinions to arrive at final assessments. We compared the final assessments with reference diagnoses to calculate rates of over-interpretation, under-interpretation, and overall misclassification. In calculating confidence intervals for the over-interpretation, under-interpretation, and overall misclassification rates, we used centiles of the bootstrap distribution of each rate where resampling of pathologists was performed 1000 times. We discarded from the bootstrapped estimates second opinion interpretations that included the same pathologist for second or third interpretations as the first pathologist. P values for the Wald test of a difference in rates between the single pathologist and second opinion strategies were based on the bootstrap standard error of the difference in rates. To calculate κ statistics and rates of agreement between single interpretations we used a simple cross tabulation of all pairwise interpretations of the same cases. The computational burden of analogous calculations for the 5 145 480 assessments involving second opinions was not tenable so we paired each triple reading of a case with a random permutation of the triple readings of the same case, and, after excluding pairs where the same reader was included in both sets of triples, we calculated agreement statistics. We replicated this procedure 1000 times and report average agreement and κ statistics. All analyses were conducted using Stata statistical software, version 13.

Results

Table 1 lists the characteristics of the participating pathologists. Forty of 115 pathologists interpreted 10 breast specimens or more weekly and were defined as high volume pathologists. These pathologists were more likely to report spending greater proportions of their clinical time interpreting breast specimens and report that their peers considered them experts in breast pathology.

Table 1

Characteristics of participating pathologists (n=115) by reported weekly volume of breast biopsy specimens

Characteristics	Total No (%)	No (%)		P value‡
		Reported weekly breast caseload
		Low volume*	High volume†
Total sample	115 (100)	75 (100)	40 (100)
Age at survey (years):
33-39	16 (14)	11 (15)	5 (13)	0.98
40-49	41 (36)	27 (36)	14 (35)
50-59	42 (37)	25 (33)	17 (43)
≥60	16 (14)	12 (16)	4 (10)
Women	46 (40)	28 (37)	18 (45)	0.42
Men	69 (60)	47 (63)	22 (55)
Experience of breast pathology
Fellowship training in breast pathology:
No	109 (95)	71 (95)	38 (95)	0.94
Yes	6 (5)	4 (5)	2 (5)
Affiliation with academic medical center:
No	87 (76)	59 (79)	28 (70)	0.50
Adjunct or affiliated	17 (15)	9 (12)	8 (20)
Primary appointment	11 (10)	7 (9)	4 (10)
Breast pathology experience (years):
0-4	22 (19)	17 (23)	5 (13)	0.45
5-9	23 (20)	14 (19)	9 (23)
10-19	34 (30)	21 (28)	13 (33)
≥20	36 (31)	23 (31)	13 (33)
Breast specimens as proportion of total clinical case load (% of total clinical work interpreting breast tissue):
0-9	59 (51)	51 (68)	8 (20)	<0.001
10-24	45 (39)	23 (31)	22 (55)
25-49	8 (7)	1 (1)	7 (18)
≥50	3 (3)	0 (0)	3 (8)
Considered an expert in breast pathology by colleagues:
No	90 (78)	67 (89)	23 (58)	<0.001
Yes	25 (22)	8 (11)	17 (43)

Some column percentages do not sum to 100% owing to rounding.

*<10 breast biopsy specimens weekly.

†≥10 breast biopsy specimens weekly.

‡Based on Wilcoxon rank sum test for difference in age, experience of breast pathology, and composition of breast specimen of total caseload between low volume and high volume caseload groups. Otherwise P values correspond to a Pearson χ2 test for difference between caseload groups.

Characteristics of participating pathologists (n=115) by reported weekly volume of breast biopsy specimens Some column percentages do not sum to 100% owing to rounding. *<10 breast biopsy specimens weekly. †≥10 breast biopsy specimens weekly. ‡Based on Wilcoxon rank sum test for difference in age, experience of breast pathology, and composition of breast specimen of total caseload between low volume and high volume caseload groups. Otherwise P values correspond to a Pearson χ2 test for difference between caseload groups. Among the entire 6900 initial test case interpretations, the pathologists reported that they desired second opinions for 35% (n=2451). Figure 2 shows results by pathologists’ diagnosis of the case. The highest rate of pathologists desiring second opinions was for cases interpreted as atypia (48.4% desired only and 17.7% policy and desired). When second opinions were desired for specific cases, pathologists noted that 71% (1731/2451) would not be required by laboratory policies in their own clinical practices.

Fig 2 Percentage of individual case assessments in which a second opinion was desired or would be required by policy in pathologist’s clinical practice, or both, shown by first `epathologists’ diagnosis of test case (n=115 pathologists, n=6900 individual case assessments). DCIS=ductal carcinoma in situ Table 2 shows the rates of agreement with the reference diagnosis after single interpretations and under different strategies for obtaining a second opinion based on characteristics of the initial interpretation. Table 2 also shows for each strategy, the percentage of cases requiring a second pathologist and requiring a third pathologist for resolution of differences between the first two pathologists. The highest misclassification rate within diagnostic categories after single interpretation was for cases of atypia (52.2%), followed by DCIS (15.9%), benign without atypia (12.9%), and invasive carcinoma (3.9%).6

Table 2

Over-interpretation, under-interpretation, and misclassification rates of single interpretation compared with reference consensus standard and nine second opinion strategies

Strategies	Rate (%)				Overall (95% CI)	P value*
	Reference consensus diagnosis
	Benign	Atypia	DCIS	Invasive
Single interpretation and second opinion applied to all cases
Single interpretation:
% requiring 2nd opinion	0.0	0.0	0.0	0.0	0.0
Over-interpretation	12.9	17.4	2.6	–	9.9 (9.0 to 10.8)
Under-interpretation	–	34.7	13.3	3.9	14.8 (13.8 to 15.9)
Misclassification	12.9	52.2	15.9	3.9	24.7 (23.6 to 25.8)	n/a
1. Second opinion with resolution applied to all cases
% requiring 2nd opinion	100.0	100.0	100.0	100.0	100.0
Requiring 3rd opinion	19.7	55.9	21.6	3.7	29.6
Over-interpretation	8.4	11.1	0.6	–	6.0 (4.7 to 7.5)
Under-interpretation	–	29.9	9.3	3.5	12.1 (10.0 to 14.3)
Misclassification	8.4	40.9	9.9	3.5	18.1 (16.1 to 20.0)	P<0.001
Criterion for obtaining second opinion based on initial diagnosis
2. Second opinion only for initial interpretations considered atypia or DCIS or invasive:
% requiring 2nd opinion	12.9	65.3	93.7	99.6	61.5
% requiring 3rd opinion	10.4	36.5	17.8	3.3	19.8
Over-interpretation	6.0	10.0	0.6	–	5.0 (3.9 to 6.3)
Under-interpretation	–	41.8	11.4	3.9	16.4 (14.4 to 18.4)
Misclassification	6.0	51.9	12.1	3.9	21.4 (19.5 to 23.2)	P<0.001
3. Second opinion only for initial interpretations considered DCIS or invasive:
% requiring 2nd opinion	3.2	17.4	86.7	99.6	42.1
% requiring 3rd opinion	2.9	13.1	12.2	3.3	8.8
Over-interpretation	11.3	7.9	0.6	–	5.9 (4.9 to 7.1)
Under-interpretation	–	35.9	15.2	3.9	15.8 (13.8 to 17.6)
Misclassification	11.3	43.7	15.8	3.9	21.7 (19.8 to 23.5)	P<0.001
4. Second opinion only for initial interpretations considered invasive:
% requiring 2nd opinion	1.0	0.4	2.6	96.1	10.4
% requiring 3rd opinion	0.8	0.3	2.4	1.9	1.2
Over-interpretation	12.5	17.5	0.4	–	9.1 (7.7 to 10.6)
Under-interpretation	–	34.4	13.2	4.7	14.8 (13.0 to 16.5)
Misclassification	12.5	51.9	13.6	4.7	23.9 (22.1 to 25.7)	P=0.25
Second opinion only obtained for cases considered borderline or difficult
5. Second opinion obtained only for initial interpretations considered borderline:
% requiring 2nd opinion	19.0	45.3	21.4	3.5	26.1
% requiring 3rd opinion	7.5	25.6	9.3	0.8	12.8
Over-interpretation	10.0	14.2	1.5	–	7.7 (6.3 to 9.2)
Under-interpretation	–	34.6	9.9	3.8	13.8 (11.8 to 15.7)
Misclassification	10.0	48.9	11.5	3.8	21.5 (19.5 to 23.3)	P<0.001
6. Second opinion obtained only for initial interpretations considered difficult:
% requiring 2nd opinion	23.2	48.2	24.8	11.1	30.0
% requiring 3rd opinion	9.0	27.3	9.8	1.5	14.0
Over-interpretation %	9.2	13.8	1.6	–	7.4 (6.0 to 8.8)
Under-interpretation %	–	34.3	10.1	3.3	13.7 (11.8 to 15.5)
Misclassification	9.2	48.1	11.7	3.3	21.1 (19.3 to 22.8)	P<0.001
Second opinion only obtained for cases when desired or required by policy, or both
7. Second opinion only for cases when desired by pathologist:
% requiring 2nd opinion	26.7	55.9	30.5	15.5	35.5
% requiring 3rd opinion	10.1	31.3	11.2	1.5	16.0
Over-interpretation	9.2	14.0	1.5	–	7.4 (5.8 to 9.2)
Under-interpretation	–	33.7	9.9	3.6	13.4 (11.4 to 15.6)
Misclassification	9.2	47.7	11.4	3.6	20.9 (18.9 to 22.8)	P<0.001
8. Second opinion only when required by policy:
% requiring 2nd opinion	33.8	40.6	55.3	59.9	44.9
% requiring 3rd opinion	7.6	23.0	10.0	2.1	12.4
Over-interpretation %	10.8	13.2	1.6	–	7.7 (6.3 to 9.1)
Under-interpretation %	–	33.9	12.1	4.0	14.2 (12.3 to 16.1)
Misclassification	10.8	47.1	13.7	4.0	21.9 (20.0 to 23.7)	P<0.001
9. Second opinion only when desired or required by policy:
% requiring 2nd opinion	54.0	80.4	75.5	69.8	70.0
% requiring 3rd opinion	14.9	45.3	18.0	3.0	23.8
Over-interpretation %	8.1	11.4	0.9	–	6.1 (4.8 to 7.5)
Under-interpretation %	–	32.8	9.5	3.7	13.1 (11.1 to 15.2)
Misclassification	8.1	44.3	10.3	3.7	19.2 (17.3 to 21.0)	P<0.001

*Based on Wald test for difference in overall misclassification rates between second opinion strategy and single pathologist interpretation. Test statistic uses bootstrap standard error of difference in rates.

Over-interpretation, under-interpretation, and misclassification rates of single interpretation compared with reference consensus standard and nine second opinion strategies *Based on Wald test for difference in overall misclassification rates between second opinion strategy and single pathologist interpretation. Test statistic uses bootstrap standard error of difference in rates. The overall misclassification rate for a single interpretation (24.7%, 95% confidence interval 23.6% to 25.8%) was used to compare performance of the different second opinion strategies. Among the strategies described in table 2, the lowest overall misclassification rate resulted when second opinions were obtained for all cases. In this strategy the rates for over-interpretation decreased from 9.9% to 6.0%, for under-interpretation from 14.8% to 12.1%, and for overall misclassification from 24.7% to 18.1%. The percentage of assessments requiring a third opinion for resolution of the diagnosis ranged from 3.7% for invasive carcinoma to 55.9% for atypia. The fraction of assessments where all three readers disagreed was small: 5.1% overall, 2.0% for cases in the benign without atypia reference category, 12.0% for cases of atypia, 3.1% for DCIS, and 0.1% for invasive carcinoma. Overall misclassification rates compared with the expert consensus reference diagnosis after implementing all of the remaining strategies (table 2) ranged from 19.2% (where second opinion was obtained when desired or required by policy) to 23.9% (where a second opinion was obtained only for cases considered invasive). The only strategy that did not show a statistically significant improvement relative to single reading was obtaining a second opinion exclusively for cases with initial interpretations of invasive breast carcinoma; the overall misclassification rate was reduced from 24.7% on initial interpretation to 23.9% (22.1% to 25.7%, P=0.25). In that scenario, only 10.4% of interpretations required a second opinion, and few (0.4-2.6%) were from reference diagnostic categories other than invasive carcinoma. Misclassification rates for single readings were lower for cases that were not classified as borderline, difficult, or needing a second opinion (fig 3); however, the misclassification rates for these cases were also reduced when a second opinion was obtained, albeit the improvement was more noticeable for cases that were classified as borderline, difficult, or needing a second opinion.

Fig 3 Percentage of cases misclassified based on whether initial pathologist indicated case was borderline, difficult, or would have obtained a second opinion (either desired or because of policy at his or her laboratory). Results are shown for single interpretations and after a second opinion strategy is applied to these cases. (A) Indicated the case was borderline between two diagnoses (26% of 6900 single interpretations) compared with not borderline (74% of 6900 interpretations). (B) Indicated case was difficult (30% of 6900 interpretations) compared with not difficult (70% of 6900 interpretations). (C) Policy or desired second opinion (70% of 6900 interpretations) compared with no policy and no desire for a second opinion (30% of 6900 interpretations) The second opinion strategies in table 3 were evaluated separately for initial pathologists with a low weekly breast pathology volume and for initial pathologists with a high weekly volume. The overall misclassification rate for single interpretations by pathologists with low weekly volumes was 26.4% compared with 21.5% for those by pathologists with high weekly volumes. The second opinion strategies all showed statistically significant reductions in misclassification rates (P<0.001) compared with single interpretations, with improvement noted not only when the initial pathologist had a low weekly case volume but also when the initial pathologist had a high weekly case volume. The lowest overall misclassification rate, 14.3%, was noted when both first and second opinions were obtained from pathologists with high volumes. The greatest reduction in overall misclassification rate was when the first pathologist was from the low volume group and the second and, if needed, third, pathologist was from the high volume group.

Table 3

Over-interpretation, under-interpretation, and misclassification rates associated with three second opinion strategies based on case volume of interpreting pathologist

Strategy	Single interpretation rate (%)					Second opinion strategy rate (%)					P value*
	Reference consensus diagnosis					Reference consensus diagnosis
	Benign	Atypia	DCIS	Invasive		Benign	Atypia	DCIS	Invasive
Strategy 10:	Low volume pathologist					Second opinion from low volume pathologist, third opinion from high volume pathologist
Over-interpretation	14.4	17.9	2.8	–	10.5 (9.4 to 11.7)	8.4	11.2	0.7	–	6.1 (4.7 to 6.7)
Under-interpretation	–	36.9	14.5	4.4	15.9 (14.5 to 17.4)	–	30.0	9.4	3.6	12.2 (10.1 to 14.3)
Misclassification	14.4	54.8	17.2	4.4	26.4 (25.1 to 27.8)	8.4	41.2	10.1	3.6	18.3 (16.2 to 20.1)	P<0.001
Strategy 11:	Low volume pathologist					Second opinion from high volume pathologist, third opinion from high volume pathologist
Over-interpretation	14.4	17.9	2.8	–	10.5 (9.4 to 11.7)	7.3	11.0	0.4	–	5.6 (4.1 to 7.6)
Under-interpretation	–	36.9	14.5	4.4	15.9 (14.5 to 17.4)	–	26.8	7.9	3.1	10.7 (8.5 to 13.0)
Misclassification	14.4	54.8	17.2	4.4	26.4 (25.1 to 27.8)	7.3	37.8	8.2	3.1	16.3 (13.9 to 18.7)	P<0.001
Strategy 12:	High volume pathologist					Second opinion from high volume pathologist, third opinion from high volume pathologist
Over-interpretation	10.1	16.5	2.2	–	8.7 (7.1 to 10.1)	6.1	10.4	0.2	–	5.0 (3.0 to 8.2)
Under-interpretation	–	30.7	11.1	3.0	12.9 (11.3 to 14.3)	–	23.7	6.4	2.5	9.3 (6.1 to 12.8)
Misclassification	10.1	47.2	13.3	3.0	21.5 (19.5 to 23.4)	6.1	34.1	6.6	2.5	14.3 (10.9 to 18.0)	P<0.001

High volume pathologists defined as those who report interpreting an average of 10 or more breast biopsy specimens per week. A lower volume pathologist reports 9 or fewer breast biopsy specimens per week. Study sample comprised 75 lower volume pathologists and 40 high volume pathologists.

Over-interpretation, under-interpretation, and misclassification rates associated with three second opinion strategies based on case volume of interpreting pathologist High volume pathologists defined as those who report interpreting an average of 10 or more breast biopsy specimens per week. A lower volume pathologist reports 9 or fewer breast biopsy specimens per week. Study sample comprised 75 lower volume pathologists and 40 high volume pathologists. *Based on Wald test for difference in overall misclassification rates between second opinion strategy and single pathologist interpretation. Test statistic uses bootstrap standard error of difference in rates As a secondary analysis we evaluated the agreement rate between assessments of the same case by single readers and assessments of the same case that incorporated second opinions. The average between pathologist pairwise agreement rate for single interpretations of the same case was 70.4%, whereas the corresponding agreement rate for interpretations that included second opinions was higher, at 79.3%. The corresponding κ statistics were 0.58 and 0.71.

Discussion

We examined the effects of 12 different strategies for obtaining second opinions by pathologists for breast biopsy specimens. The results support the common belief among clinicians that second opinions should be sought ideally from those with greater clinical experience and especially when the primary reviewing pathologist is uncertain. All strategies showed statistically significant improvements in accuracy except when only obtaining second opinions for cases with initial interpretations of invasive breast cancer. Improvements varied according to diagnostic attributes of the cases and the pathologists’ clinical experience. None of the strategies completely eliminated diagnostic variability, especially for cases of breast atypia, suggesting that approaches beyond obtaining a second opinion should be investigated for these challenging cases. Most US pathology laboratories have policies requiring second opinions for cases of invasive breast carcinoma, yet such diagnoses already have high diagnostic agreement among pathologists. The addition of a second opinion strategy only for cases of invasive breast cancer provided no statistically significant improvement; however, this finding does not indicate there is no clinical value in assuring the highest level of accuracy for invasive carcinoma considering the risks and benefits of treatment. Larger improvements were observed when second opinion strategies included cases with initial diagnoses of atypia and ductal carcinoma in situ (DCIS), two diagnostic categories less often included in laboratory polices mandating second opinions. However, even after applying an array of strategies for obtaining second opinions, the misclassification rates for atypia and DCIS remained high. In actual clinical practice, obtaining second opinions in such diagnostically complex areas might promote, over time, consensus within practices by highlighting diagnostic areas requiring education or expert consultation. Importantly, a small, proportional reduction in the over-interpretation of breast biopsy specimens may have a large absolute effect at a population level. Obtaining second opinions for all cases with an initial diagnosis of atypia, DCIS, or invasive breast cancer substantially reduced over-interpretation of benign cases without atypia; we observed a reduction in over-interpretation of these cases, from 12.9% with single interpretation to 6.0% with second opinion.

Strengths and limitations of this study

This study has potential limitations. The pathologists’ interpretations were independent and only involved a single slide for each case, yet in clinical practice many slides might be reviewed for each case and a second pathologist might be informed of the initial interpretation. It would be impossible and infeasible to design a study of second opinion strategies where full clinical case material for 60 breast biopsy specimens was covertly inserted into the day-to-day practice of more than 100 pathologists from diverse clinical practices. Knowledge of the initial pathologist’s interpretation may influence additional opinions, and this should be studied. Interestingly, about half the pathologists reported that when seeking a second opinion they typically blind the second reviewer to their initial diagnosis.14 We weighted the study cases to include more atypia, DCIS, and proliferative lesions, such as usual hyperplasia, than are typically observed in clinical practice, resulting in higher overall misclassification rates than observed in clinical practice. While the overall misclassification rate is useful for comparing different strategies using second opinions, the results within individual diagnostic categories are more relevant to clinical practice given the weighting of the test cases.25 In addition, future studies should consider differentiating DCIS grade and microinvasion. It has been suggested that the clinical course of the disease is the ideal means for assessment of the accuracy of a pathology diagnosis.26 However, the clinical course (natural history) is altered by diagnostic excision, clinical treatment, and heightened surveillance after breast biopsy; thus we defined our reference standard as the consensus diagnosis of three experienced pathologists in breast histopathology, a standard acceptable to most women undergoing biopsy and their clinicians. We selected the reference standard as defined by the expert consensus panel after comparison with a reference standard that included the majority opinion of the participants.6 27 The strengths of this study include the large number of participating pathologists (n=115), each interpreting 60 cases from the full range of diagnostic categories, providing 5 145 480 group level interpretations. We also assessed the impact of 12 different strategies for obtaining second opinions, including obtaining a second opinion on all cases and for predefined subsets based on the initial interpretation. In typical clinical practice, providers often identify challenging cases and then solicit second opinions from the most knowledgeable pathologist (that is, local expert). We simulated this by having pathologists with a high clinical volume provide second opinions in some strategies. For many study cases, participants indicated a desire for a second opinion prior to finalizing their diagnosis; thus, pathologists are likely already obtaining second opinions for cases they encounter in clinical practice, highlighting the relevance of our data.

Comparison with other studies

We suspect that patient outcomes will be improved by second opinions in clinical practice, but this was not evaluated. Previous studies have noted consistent rates of discrepant diagnoses uncovered by second opinion within surgical pathology in general,28 and within breast pathology specifically, with second reviews reported to identify clinically significant discrepancies in more than 10% of breast biopsy cases.15 16 17 18 19 20

Clinical and policy implications

Providing second opinions for all breast biopsy specimens or requiring that the interpreting pathologists must be experienced high volume clinicians may be unfeasible given the estimated millions of breast biopsies carried out each year.29 30 Our analysis therefore presents strategies that may be more realistic for clinical practice. Possible barriers to the adoption of second opinion strategies in clinical practice include workload constraints,31 uncertainty about the impact of second opinions on clinical outcomes, lack of readily available colleagues with expertise in breast disease, limited reimbursement by third party payers, and concerns about treatment delay. Conversely, the availability of digital whole slide imaging may speed second opinions in the future through telepathology. Though adopting a routine second opinion strategy for some or all breast biopsy specimens may seem daunting, the high rates of disagreement among practicing pathologists on some breast biopsy diagnostic categories is of concern.6 The financial burden of this variability in pathology diagnoses may be substantial.32 This includes unnecessary or incorrect treatment, lost income, morbidity, and death.

Conclusion

Breast biopsy specimens are challenging to interpret,6 and many pathologists seek second opinions in clinical practice through informal routes.14 It might be time for clinical support systems and payment structures to align with and better support clinicians in their current practice. In this study we observed reductions in both over-interpretation and under-interpretation of breast pathology when systematically including second opinion strategies, and we noted that pathologists desire second opinions in a substantial proportion of breast biopsy cases. Improvement was observed regardless of whether a second opinion was or was not desired by the initial pathologist, and it was most notable when the initial interpretation was atypia or DCIS. The feasibility and cost of implementing specific second opinion strategies in clinical practice need further consideration. Studies have documented extensive variability in the interpretation of breast biopsy tissue by pathologists, with resulting concern about harm to patients Though statistically significant changes in diagnosis have been reported in more than 10% of breast biopsy cases on secondary review, no studies have systematically compared different strategies for obtaining second opinions as an approach to reducing errors Second opinions can statistically significantly improve diagnostic accuracy of interpretation of breast histopathology Accuracy improves regardless of pathologists’ confidence in their diagnosis or experience Second opinions improve but do not completely eliminate diagnostic variability, especially in the challenging case of breast atypia

28 in total

1. Changes in breast cancer therapy because of pathology second opinions.

Authors: Valerie L Staradub; Kathleen A Messenger; Nanjiang Hao; Elizabeth L Wiley; Monica Morrow
Journal: Ann Surg Oncol Date: 2002-12 Impact factor: 5.344

2. Interobserver reproducibility in the diagnosis of ductal proliferative breast lesions using standardized criteria.

Authors: S J Schnitt; J L Connolly; F A Tavassoli; R E Fechner; R L Kempson; R Gelman; D L Page
Journal: Am J Surg Pathol Date: 1992-12 Impact factor: 6.394

Review 3. Special report: Consensus conference III. Image-detected breast cancer: state-of-the-art diagnosis and treatment.

Authors: Melvin J Silverstein; Abram Recht; Michael D Lagios; Ira J Bleiweiss; Peter W Blumencranz; Terri Gizienski; Steven E Harms; Jay Harness; Roger J Jackman; V Suzanne Klimberg; Robert Kuske; Gary M Levine; Michael N Linver; Elizabeth A Rafferty; Hope Rugo; Kathy Schilling; Debu Tripathy; Frank A Vicini; Pat W Whitworth; Shawna C Willey
Journal: J Am Coll Surg Date: 2009-08-20 Impact factor: 6.113

4. Expertise vs evidence in assessment of breast biopsies: an atypical science.

Authors: Nancy E Davidson; David L Rimm
Journal: JAMA Date: 2015-03-17 Impact factor: 56.272

5. Diagnostic concordance among pathologists interpreting breast biopsy specimens.

Authors: Joann G Elmore; Gary M Longton; Patricia A Carney; Berta M Geller; Tracy Onega; Anna N A Tosteson; Heidi D Nelson; Margaret S Pepe; Kimberly H Allison; Stuart J Schnitt; Frances P O'Malley; Donald L Weaver
Journal: JAMA Date: 2015-03-17 Impact factor: 56.272

6. Institutional pathology consultation.

Authors: Jeffrey S H Tsung
Journal: Am J Surg Pathol Date: 2004-03 Impact factor: 6.394

7. Screening-detected breast cancers: discordant independent double reading in a population-based screening program.

Authors: Solveig Hofvind; Berta M Geller; Robert D Rosenberg; Per Skaane
Journal: Radiology Date: 2009-09-29 Impact factor: 11.105

8. Variability in Pathologists' Interpretations of Individual Breast Biopsy Slides: A Population Perspective.

Authors: Joann G Elmore; Heidi D Nelson; Margaret S Pepe; Gary M Longton; Anna N A Tosteson; Berta Geller; Tracy Onega; Patricia A Carney; Sara L Jackson; Kimberly H Allison; Donald L Weaver
Journal: Ann Intern Med Date: 2016-03-22 Impact factor: 25.391

9. Understanding diagnostic variability in breast pathology: lessons learned from an expert consensus review panel.

Authors: Kimberly H Allison; Lisa M Reisch; Patricia A Carney; Donald L Weaver; Stuart J Schnitt; Frances P O'Malley; Berta M Geller; Joann G Elmore
Journal: Histopathology Date: 2014-04-02 Impact factor: 5.087

10. Effectiveness and cost-effectiveness of double reading of mammograms in breast cancer screening: findings of a systematic review.

Authors: J Dinnes; S Moss; J Melia; R Blanks; F Song; J Kleijnen
Journal: Breast Date: 2001-12 Impact factor: 4.380

12 in total

1. Second opinion strategies in breast pathology: a decision analysis addressing over-treatment, under-treatment, and care costs.

Authors: Anna N A Tosteson; Qian Yang; Heidi D Nelson; Gary Longton; Samir S Soneji; Margaret Pepe; Berta Geller; Patricia A Carney; Tracy Onega; Kimberly H Allison; Joann G Elmore; Donald L Weaver
Journal: Breast Cancer Res Treat Date: 2017-09-06 Impact factor: 4.872

2. Characteristics associated with requests by pathologists for second opinions on breast biopsies.

Authors: Berta M Geller; Heidi D Nelson; Donald L Weaver; Paul D Frederick; Kimberly H Allison; Tracy Onega; Patricia A Carney; Anna N A Tosteson; Joann G Elmore
Journal: J Clin Pathol Date: 2017-05-02 Impact factor: 3.411

3. Effects of DNA Methylation on Progression to Interstitial Fibrosis and Tubular Atrophy in Renal Allograft Biopsies: A Multi-Omics Approach.

Authors: S V Bontha; D G Maluf; K J Archer; C I Dumur; M G Dozmorov; A L King; E Akalin; T F Mueller; L Gallon; V R Mas
Journal: Am J Transplant Date: 2017-07-08 Impact factor: 8.086

4. Diagnostic Reproducibility: What Happens When the Same Pathologist Interprets the Same Breast Biopsy Specimen at Two Points in Time?

Authors: Sara L Jackson; Paul D Frederick; Margaret S Pepe; Heidi D Nelson; Donald L Weaver; Kimberly H Allison; Patricia A Carney; Berta M Geller; Anna N A Tosteson; Tracy Onega; Joann G Elmore
Journal: Ann Surg Oncol Date: 2016-12-02 Impact factor: 5.344

5. Histopathological diagnosis of cutaneous melanocytic lesions: blinded and nonblinded second opinions offer similar improvement in diagnostic accuracy.

Authors: Kathleen F Kerr; Gary M Longton; Lisa M Reisch; Andrea C Radick; Megan M Eguchi; Hannah L Shucard; Margaret S Pepe; Michael W Piepkorn; David E Elder; Raymond L Barnhill; Joann G Elmore
Journal: Clin Exp Dermatol Date: 2022-06-22 Impact factor: 4.481

6. Complexities of perceived and actual performance in pathology interpretation: A comparison of cutaneous melanocytic skin and breast interpretations.

Authors: Patricia A Carney; Paul D Frederick; Lisa M Reisch; Linda Titus; Stevan R Knezevich; Martin A Weinstock; Michael W Piepkorn; Raymond L Barnhill; David E Elder; Donald L Weaver; Joann G Elmore
Journal: J Cutan Pathol Date: 2018-04-26 Impact factor: 1.587

7. Helicobacter pylori Eradication for Metachronous Gastric Cancer: An Unsuitable Methodology Impeding Broader Clinical Usage.

Authors: Alexios-Fotios A Mentis; Efthimios Dardiotis
Journal: Front Oncol Date: 2019-02-20 Impact factor: 6.244

8. Assessment of Second-Opinion Strategies for Diagnoses of Cutaneous Melanocytic Lesions.

Authors: Michael W Piepkorn; Gary M Longton; Lisa M Reisch; David E Elder; Margaret S Pepe; Kathleen F Kerr; Anna N A Tosteson; Heidi D Nelson; Stevan Knezevich; Andrea Radick; Hannah Shucard; Tracy Onega; Patricia A Carney; Joann G Elmore; Raymond L Barnhill
Journal: JAMA Netw Open Date: 2019-10-02

9. Inter-observer reproducibility of classical lobular neoplasia (B3 lesions) in preoperative breast biopsies: a study of the Swiss Working Group of breast and gynecopathologists.

Authors: Linda Moskovszky; Barbara Berger; Achim Fleischmann; Thomas Friedrich; Birgit Helmchen; Meike Körner; Tilman T Rau; Zsuzsanna Varga
Journal: J Cancer Res Clin Oncol Date: 2020-03-30 Impact factor: 4.553

10. Breast Cancer Prognostic Factors in the Digital Era: Comparison of Nottingham Grade using Whole Slide Images and Glass Slides.

Authors: Tara M Davidson; Mara H Rendi; Paul D Frederick; Tracy Onega; Kimberly H Allison; Ezgi Mercan; Tad T Brunyé; Linda G Shapiro; Donald L Weaver; Joann G Elmore
Journal: J Pathol Inform Date: 2019-04-03