| Literature DB >> 21935501 |
Amita K Manatunga, José Nilo G Binongo, Andrew T Taylor.
Abstract
BACKGROUND: The accuracy of computer-aided diagnosis (CAD) software is best evaluated by comparison to a gold standard that represents the true status of disease. In many settings, however, the true status of disease cannot be known, and accuracy is instead evaluated against the interpretations of an expert panel. Common statistical approaches to evaluate accuracy include receiver operating characteristic (ROC) and kappa analysis, but both of these methods have significant limitations and cannot answer the question of equivalence: is the CAD performance equivalent to that of an expert? The goal of this study is to show the strength of log-linear analysis over standard ROC and kappa statistics in evaluating the accuracy of computer-aided diagnosis of renal obstruction compared to the diagnosis provided by expert readers.
Entities:
Year: 2011 PMID: 21935501 PMCID: PMC3175375 DOI: 10.1186/2191-219X-1-5
Source DB: PubMed Journal: EJNMMI Res ISSN: 2191-219X Impact factor: 3.138
Number of kidneys rated by RENEX and the consensus readings of experts (n = 185)
| RENEX reading | Consensus: Non-obstructed | Consensus: Equivocal | Consensus: Obstructed |
|---|---|---|---|
| Non-obstructed | 101 | 7 | 1 |
| Equivocal | 14 | 13 | 2 |
| Obstructed | 5 | 9 | 33 |
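For comparison with the log-linear results below, the kappa analysis mentioned in the abstract can be reproduced directly from this table. A minimal sketch in Python (unweighted Cohen's kappa, computed from the counts above):

```python
import numpy as np

# RENEX (rows) vs. consensus (columns): non-obstructed, equivocal, obstructed
table = np.array([[101,  7,  1],
                  [ 14, 13,  2],
                  [  5,  9, 33]], dtype=float)
n = table.sum()                                       # 185 kidneys
p_obs = np.trace(table) / n                           # observed agreement
p_exp = table.sum(axis=1) @ table.sum(axis=0) / n**2  # chance agreement
kappa = (p_obs - p_exp) / (1 - p_exp)
print(f"observed={p_obs:.3f} chance={p_exp:.3f} kappa={kappa:.3f}")
# prints kappa ≈ 0.622: one overall number, no category-level detail
```

The single κ ≈ 0.62 illustrates the abstract's point: kappa condenses the whole table into one summary figure and cannot address category-specific agreement or the equivalence question.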
Agreement between RENEX and consensus for each rating category
| Category | Regression coefficient, δᵃ (SE) | p value* |
|---|---|---|
| Non-obstructed (δ₁) | 1.57 (0.30) | < 0.0001 |
| Equivocal (δ₂) | -0.28 (0.31) | 0.37 |
| Obstructed (δ₃) | 1.82 (0.36) | < 0.0001 |
* p < 0.05 indicates significant beyond-chance agreement between RENEX and consensus readings in the corresponding category. ᵃ The δs reflect the strength of the beyond-chance agreement within each specific category.
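The δs above are per-category diagonal-agreement terms. A minimal sketch of how such a model can be fit as a Poisson GLM with statsmodels, assuming the standard log-linear agreement (quasi-independence) parameterization with one indicator per diagonal cell (variable names are illustrative, not from the paper):

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Cell counts from the RENEX-by-consensus table above (row-major, n = 185)
cats = ["non_obstructed", "equivocal", "obstructed"]
df = pd.DataFrame([(r, c) for r in cats for c in cats],
                  columns=["renex", "consensus"])
df["count"] = [101, 7, 1, 14, 13, 2, 5, 9, 33]

# One agreement indicator per category: agree_k is "on" only in the
# diagonal cell where both RENEX and the consensus choose category k
for k in cats:
    df[f"agree_{k}"] = ((df["renex"] == k) & (df["consensus"] == k)).astype(int)

# Log-linear agreement model fit as a Poisson GLM:
# log(count) ~ rater main effects + per-category diagonal agreement terms
fit = smf.glm("count ~ C(renex) + C(consensus) + agree_non_obstructed"
              " + agree_equivocal + agree_obstructed",
              data=df, family=sm.families.Poisson()).fit()
print(fit.params.filter(like="agree_"))
```

If this matches the paper's parameterization, the three agree_* estimates should land close to the reported δ₁ = 1.57, δ₂ = -0.28, and δ₃ = 1.82.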
Pairwise agreement within experts and between experts and RENEX
Log-linear model coefficients (SE):

| Response category | Between experts: [δ₁₂] | [δ₁₃] | [δ₂₃] | RENEX and expert: [θ₁] | [θ₂] | [θ₃] |
|---|---|---|---|---|---|---|
| Non-obstructed | 1.58* (0.42) | 0.36 (0.40) | 0.91 (0.46) | 0.26 (0.40) | 1.08* (0.41) | 0.84* (0.32) |
| Equivocal | -0.12 (0.40) | 0.89 (0.37) | -0.58 (0.44) | -0.06 (0.37) | -0.28 (0.39) | 0.47 (0.33) |
| Obstructed | 0.78 (0.46) | -0.07 (0.43) | 1.47* (0.45) | 0.48 (0.43) | 1.08 (0.44) | 0.41 (0.38) |

p values for tests of homogeneity:

| Response category | Among expertsᵃ | RENEX and expertᵇ | Experts vs. RENEXᶜ |
|---|---|---|---|
| Non-obstructed | 0.15 | 0.47 | 0.41 |
| Equivocal | 0.08 | 0.34 | 0.95 |
| Obstructed | 0.11 | 0.58 | 0.81 |

* Significant positive pairwise agreement at α = 0.01. ᵃ Tests the hypothesis that the overall pairwise agreement among experts is the same; the δs reflect the strength of the beyond-chance agreement between two raters within each specific category. ᵇ Tests the hypothesis that the overall pairwise agreement between RENEX and each expert is the same; the θs reflect the strength of the beyond-chance agreement between RENEX and an expert within each specific category. ᶜ Tests the hypothesis that the overall pairwise agreement among experts is the same as the overall pairwise agreement between each expert and RENEX.
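A plausible form of the underlying multi-rater model, in our notation rather than the paper's exact parameterization: the full cross-classification of the three experts' ratings (i₁, i₂, i₃) and the RENEX rating (i_R) is modeled with rater main effects plus one pairwise agreement term per rater pair and category,

$$
\log m_{i_1 i_2 i_3 i_R} = \lambda + \sum_{r} \lambda^{(r)}_{i_r}
+ \sum_{r<s} \sum_{k} \delta^{(rs)}_{k}\, I(i_r = i_s = k)
+ \sum_{r} \sum_{k} \theta^{(r)}_{k}\, I(i_r = i_R = k),
$$

where I(·) is the indicator function. The footnoted hypotheses then amount to equality constraints on these coefficients (e.g., δ⁽¹²⁾ₖ = δ⁽¹³⁾ₖ = δ⁽²³⁾ₖ), presumably tested by likelihood-ratio comparisons of nested fits.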
Three-way agreement within experts and between experts and RENEX
Log-linear model coefficients (SE):

| Response category | Experts: [δ₁₂₃] | RENEX and experts: [θ₁₂] | [θ₁₃] | [θ₂₃] |
|---|---|---|---|---|
| Non-obstructed | 0.82* (0.29) | 0.98* (0.30) | 0.60 (0.38) | 0.60 (0.38) |
| Equivocal | 0.64 (0.28) | -0.34 (0.37) | 0.32 (0.28) | -0.21 (0.34) |
| Obstructed | 0.08 (0.97) | 1.48* (0.42) | -0.45 (0.98) | 1.89* (0.35) |

p values for the test of homogeneity:

| Response category | p valueᵃ |
|---|---|
| Non-obstructed | 0.79 |
| Equivocal | 0.03 |
| Obstructed | 0.49 |

* Significant positive three-way agreement at α = 0.01. ᵃ Tests the hypothesis that the overall three-way agreement among experts is the same as the overall three-way agreement between RENEX and any two experts.
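Under the same assumed notation, the three-way coefficients extend the pairwise model sketched above with triple-agreement terms:

$$
\cdots + \sum_{k} \delta^{(123)}_{k}\, I(i_1 = i_2 = i_3 = k)
+ \sum_{r<s} \sum_{k} \theta^{(rs)}_{k}\, I(i_r = i_s = i_R = k),
$$

so that δ₁₂₃ captures all three experts agreeing on category k beyond what the pairwise terms explain, and each θ_rs captures RENEX agreeing with experts r and s. The p value above tests whether these three-way agreement strengths are homogeneous across the expert-only and RENEX-inclusive triples.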