OBJECTIVES: To investigate the reproducibility of peer ratings of consultant radiologists' reports, as part of the new General Medical Council (GMC) Performance Procedures. DESIGN: An evaluation protocol was piloted and used in a blocked, balanced, randomized generalizability analysis with three blocks of three judges (raters), each rating 30 reports from 10 radiologists; reports were re-rated to estimate intrarater reliability with a conventional statistic (kappa). SETTING: Rating was performed at the Royal College of Radiologists. Volunteers were sampled from 23 departments of radiology in university teaching and district general hospitals. PARTICIPANTS: A nationally drawn non-random sample of 30 consultant radiologists contributing a total of 900 reports. Three trained and six untrained judges took part in the rating analysis. RESULTS: A protocol was generated that was usable by judges. Generalizable results would be obtained with at least three judges all rating the same 60 reports from a radiologist. CONCLUSIONS: Any assessment of performance of technical abilities in this field will need to use multiple assessors, basing judgements on an adequate sample of reports.
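The intrarater reliability statistic named in the design (kappa) corrects observed agreement between two sets of ratings for the agreement expected by chance. As a minimal sketch of how Cohen's kappa is computed for a pair of rating rounds (the function name, rating lists, and categories below are illustrative, not taken from the study):

```python
def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length lists of categorical ratings,
    e.g. a judge's first and second pass over the same reports."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed proportion of exact agreement.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's marginal category frequencies.
    categories = set(ratings_a) | set(ratings_b)
    p_expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n) for c in categories
    )
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical example: a judge re-rates four reports on a pass/fail scale.
first_pass = ["pass", "pass", "pass", "fail"]
second_pass = ["pass", "pass", "fail", "fail"]
print(cohens_kappa(first_pass, second_pass))  # → 0.5
```

A kappa of 1 indicates perfect intrarater agreement, 0 indicates agreement no better than chance; values in between are conventionally read against benchmarks such as Landis and Koch's.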