OBJECTIVE: To demonstrate a method of assessing radiologic diagnostic agreement using multiple blinded external readers. METHODS: Six body CT studies interpreted by one reader (the primary reader) at the host institution were compiled with patient identifiers removed. The brief clinical histories that had been available to the primary reader were provided. Radiologists at 22 centers participated, and their interpretations were analyzed in aggregate, with the consensus majority serving as the surrogate gold standard for each case. RESULTS: A total of 31 radiologists formed the group of secondary readers, with two-thirds in academic practice, averaging 8 years of experience (range: 1-25 years). The average number of findings per reader for cases A to F was 1.9 (range: 1-5), 6.3 (range: 2-10), 10.4 (range: 7-14), 5.7 (range: 3-10), 4.2 (range: 2-8), and 3.8 (range: 1-7), respectively. The primary interpretation agreed with the surrogate gold standard in each case. CONCLUSIONS: The results of our study demonstrate a wide range of interpretation, with wider ranges observed in more complex cases and in cases with vague clinical complaints. Comparison to the primary reader required the use of aggregate analysis and an agreement percentage cutoff to minimize bias and the limitations of this type of study. An intensive evaluation of radiologist performance such as this could be considered in various settings, such as a quality assurance program, intense scrutiny of an individual radiologist whose competency is in question, or for medicolegal purposes to ascertain the standard of care.
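The aggregate analysis described in the METHODS could be sketched roughly as follows. This is a minimal illustration only, not the study's actual procedure: the abstract does not state the cutoff value or how agreement was scored, so the 50% cutoff, the function names `consensus_findings` and `agreement_fraction`, and the example findings are all hypothetical.

```python
from collections import Counter


def consensus_findings(readings, cutoff=0.5):
    """Return the findings reported by more than `cutoff` of the readers.

    `readings` is a list with one set of findings per secondary reader.
    The 0.5 default (simple majority) is an assumption, not a value
    taken from the study.
    """
    if not readings:
        raise ValueError("at least one reading is required")
    # Count each finding at most once per reader.
    counts = Counter(f for reading in readings for f in set(reading))
    n_readers = len(readings)
    return {f for f, c in counts.items() if c / n_readers > cutoff}


def agreement_fraction(primary, consensus):
    """Fraction of consensus findings that also appear in the primary report."""
    if not consensus:
        return 1.0  # nothing to disagree with
    return len(set(primary) & consensus) / len(consensus)


# Hypothetical readings for one case (illustrative, not study data).
secondary = [
    {"appendicitis", "renal cyst"},
    {"appendicitis"},
    {"appendicitis", "hepatic steatosis", "renal cyst"},
    {"appendicitis", "renal cyst"},
]
consensus = consensus_findings(secondary)
print(sorted(consensus))
print(agreement_fraction({"appendicitis", "renal cyst"}, consensus))
```

Here the consensus set plays the role of the surrogate gold standard, and a finding mentioned by only a minority of readers does not count against the primary reader.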