OBJECTIVE: To study random and systematic error in breast cancer grading, to identify the sources of disagreement and to measure the reliability of graders so that appropriate corrective action can be taken. STUDY DESIGN: Five independent observers graded 50 breast carcinoma slides from 50 consecutive breast cancer specimens according to the Nottingham criteria. The polychoric correlation was used to measure association. Stuart-Maxwell and McNemar tests were used to test equality of thresholds. RESULTS: The polychoric correlation among observers was high (mean = 0.803, 0.712, 0.797 and 0.602 for final grade, tubule formation, nuclear pleomorphism and mitotic figures, respectively). However, there were significant differences in thresholds (6, 7, 7 and 9 of the 10 observer pairs showed significant differences in classification of grades/scores for final grade, tubule formation, nuclear pleomorphism and mitotic counts, respectively). CONCLUSION: The high polychoric correlations suggest that random error in grading breast cancers in this study was low, confirming the underlying reliability of grading and graders. However, significant differences in thresholds lower raw agreement. Such a scenario may be rectified by increased intradepartmental discussion.
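As an illustration of the threshold comparison described in the study design, the following is a minimal sketch of a two-sided exact McNemar test applied to one pair of observers at one cut point. It is not the authors' analysis code: the function name `mcnemar_exact`, the observer labels and the toy grades are all hypothetical, and the abstract does not state whether an exact or chi-square form of the test was used.

```python
from itertools import combinations
from math import comb


def mcnemar_exact(a, b, cut):
    """Two-sided exact McNemar test (hypothetical helper).

    Tests whether raters a and b dichotomize cases at `cut`
    (score >= cut versus score < cut) with the same threshold.
    """
    # Discordant cases: one rater is at or above the cut, the other below.
    a_high = sum(1 for x, y in zip(a, b) if x >= cut and y < cut)
    b_high = sum(1 for x, y in zip(a, b) if x < cut and y >= cut)
    n = a_high + b_high
    if n == 0:
        return 1.0  # no discordant cases, no evidence of a threshold shift
    k = min(a_high, b_high)
    # Under H0 the discordant cases split 50/50: Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Five observers give C(5, 2) = 10 pairs, matching "pairs of 10" in the results.
observers = ["A", "B", "C", "D", "E"]  # hypothetical labels
pairs = list(combinations(observers, 2))

# Toy grades (1-3) for ten cases, invented for illustration only.
grades_a = [1, 1, 2, 2, 2, 2, 3, 3, 3, 1]
grades_b = [2, 2, 3, 3, 3, 2, 3, 3, 3, 1]

if __name__ == "__main__":
    print(len(pairs), "observer pairs")
    for cut in (2, 3):
        print("cut >=", cut, "p =", mcnemar_exact(grades_a, grades_b, cut))
```

Two observers can rank cases almost identically (high polychoric correlation) and still differ on a test like this, which is the pattern the results report: low random error alongside systematic threshold differences.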