| Literature DB >> 22516804 |
Abstract
Laboratory receiver operating characteristic (ROC) studies, that are often used to evaluate medical imaging systems, differ from 'live' clinical interpretations in several respects which could compromise their clinical relevance. The aim was to develop methodology for quantifying the clinical relevance of a laboratory ROC study. A simulator was developed to generate ROC ratings data and binary clinical interpretations classified as correct or incorrect for a common set of images interpreted under clinical and laboratory conditions. The area under the trapezoidal ROC curve (AUC) was used as the laboratory figure-of-merit and the fraction of correct clinical decisions as the clinical figure-of-merit. Conventional agreement measures (Pearson, Spearman, Kendall and kappa) between the bootstrap-induced fluctuations of the two figures of merit were estimated. A jackknife pseudovalue transformation applied to the figures of merit was also investigated as a way to capture agreement existing at the individual image level that could be lost at the figure-of-merit level. It is shown that the pseudovalues define a relevance-ROC curve. The area under this curve (rAUC) measures the ability of the laboratory figure-of-merit-based pseudovalues to correctly classify incorrect versus correct clinical interpretations. Therefore, rAUC is a measure of the clinical relevance of an ROC study. The conventional measures and rAUC were compared under varying simulator conditions. It was found that design details of the ROC study, namely the number of bins, the difficulty level of the images, the ratio of disease-present to disease-absent images and the unavoidable difference between laboratory and clinical performance levels, can lead to serious underestimation of the agreement as indicated by conventional agreement measures, even for perfectly correlated data, while rAUC showed high agreement and was relatively immune to these details. At the same time rAUC was sensitive to factors such as intrinsic correlation between the laboratory and clinical decision variables and differences in reporting thresholds that are expected to influence agreement both at the individual image level and at the figure-of-merit level. Suggestions are made for how to conduct relevance-ROC studies aimed at assessing agreement between laboratory and clinical interpretations. The method could be used to evaluate the clinical relevance of alternative scalar figures of merit, such as the sensitivity at a predifined specificity.Entities:
Mesh:
Year: 2012 PMID: 22516804 PMCID: PMC3352681 DOI: 10.1088/0031-9155/57/10/2873
Source DB: PubMed Journal: Phys Med Biol ISSN: 0031-9155 Impact factor: 3.609