Sarah J Erickson-Bhatt, Douglas G Simpson, Stephen A Boppart.
Abstract
SIGNIFICANCE: Optical coherence tomography (OCT) is widely used as a potential diagnostic tool for a variety of diseases, including various types of cancer. However, sensitivity and specificity analyses of OCT in different cancers yield results ranging from 11% to 100%. Hence, there is a need for more detailed statistical analysis of blinded reader studies. AIM: Extensive statistical analysis is performed on results from a blinded study involving OCT of breast tumor margins to assess the impact of reader variability on sensitivity and specificity. APPROACH: Five readers with varying levels of experience reading OCT images assessed 50 OCT images of breast tumor margins collected using an intraoperative OCT system. Statistical modeling and analysis were performed using the R language to analyze reader experience and variability.
Keywords: cancer; diagnostic accuracy; intraoperative; optical coherence tomography; reader variability; sensitivity and specificity
Year: 2020 PMID: 33179459 PMCID: PMC7657413 DOI: 10.1117/1.JBO.25.11.116002
Source DB: PubMed Journal: J Biomed Opt ISSN: 1083-3668 Impact factor: 3.170
Fig. 1 Box plots comparing the distributions of individual reader scores and mean scores for positive (P) and negative (N) images. The R# labels refer to the reader number (1 to 5).
Fig. 2 ROC curves for (a) mean-score and (b) median-score classification. The scale bar on the right represents the threshold values, from 1 (black) to 4 (light blue). For each curve, the operating point at the optimal threshold is marked on the curve and labeled in red.
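Each ROC curve traces sensitivity against 1 − specificity as the score threshold sweeps from low to high. The caption does not say how the "optimal" threshold was chosen, so the sketch below assumes Youden's J (sensitivity + specificity − 1) and the rule "call positive when score ≥ threshold". It is plain Python rather than the R used in the study, and the scores are hypothetical.

```python
from typing import List, Sequence, Tuple

def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float, float]]:
    """Return (threshold, sensitivity, 1 - specificity) for every distinct score.

    Assumed call rule: an image is called positive when score >= threshold.
    labels: 1 = positive on histology, 0 = negative.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, tp / n_pos, fp / n_neg))
    return points

def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Threshold maximizing Youden's J = sensitivity - (1 - specificity)."""
    best = max(roc_points(scores, labels), key=lambda p: p[1] - p[2])
    return best[0]

# Hypothetical mean scores: first four images negative, last four positive
scores = [1.0, 1.2, 1.6, 2.6, 1.8, 2.4, 3.0, 3.4]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(youden_threshold(scores, labels))  # 1.8
```

Other criteria (for example, the point closest to the top-left corner) can select a different threshold, so this is one plausible reading of "optimal", not necessarily the authors'.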
Fig. 3 Individual reader AUC performance (± standard errors) versus level of experience (ordinal score from 1 to 4). The points for the two readers with experience level 4 were offset horizontally to display their individual AUC values.
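The record does not say how the per-reader AUC standard errors in Fig. 3 were computed. One common approach, sketched here as an assumption, estimates the AUC as the Mann-Whitney probability that a histology-positive image outscores a negative one (ties count one half) and takes the standard error from the Hanley and McNeil approximation. The scores are hypothetical.

```python
import math
from typing import Sequence

def auc_mann_whitney(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUC = P(positive score > negative score), counting ties as 1/2."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def auc_se_hanley_mcneil(auc: float, n_pos: int, n_neg: int) -> float:
    """Standard error of the AUC from the Hanley-McNeil approximation."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)

pos = [3, 4, 4, 2]      # hypothetical scores, histology-positive images
neg = [1, 2, 1, 3, 1]   # hypothetical scores, histology-negative images
a = auc_mann_whitney(pos, neg)
print(a, round(auc_se_hanley_mcneil(a, len(pos), len(neg)), 3))  # 0.9 0.119
```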
Fig. 4 Comparison of individual reader call frequencies and mean-score call frequencies versus histology (P = positive and N = negative), using a threshold of 2.0 for individual reader scores and a threshold of 1.8 for mean scores; the individual reader bar plots are in increasing order of training level.
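The calls in Fig. 4 come from thresholding scores: 2.0 for a single reader, 1.8 for the per-image mean across readers. A minimal sketch, assuming a score at or above the threshold counts as a positive call (the record does not say whether the boundary is inclusive) and using hypothetical scores from three readers on four images. Sensitivity is then TP / (TP + FN) and specificity TN / (TN + FP).

```python
from statistics import mean

def mean_scores(reader_scores):
    """Average each image's score across readers (one list of scores per reader)."""
    return [mean(per_image) for per_image in zip(*reader_scores)]

def confusion_counts(scores, histology, threshold):
    """Tally threshold calls against histology ("P" or "N"); score >= threshold is a positive call."""
    counts = {"TP": 0, "FN": 0, "TN": 0, "FP": 0}
    for score, truth in zip(scores, histology):
        called_positive = score >= threshold
        if truth == "P":
            counts["TP" if called_positive else "FN"] += 1
        else:
            counts["FP" if called_positive else "TN"] += 1
    return counts

reader_scores = [[1, 2, 3, 4], [1, 1, 2, 4], [2, 2, 3, 3]]  # hypothetical: 3 readers x 4 images
histology = ["N", "N", "P", "P"]
print(confusion_counts(reader_scores[0], histology, 2.0))           # one reader, threshold 2.0
print(confusion_counts(mean_scores(reader_scores), histology, 1.8)) # mean score, threshold 1.8
```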
Estimated sensitivity and specificity for individual readers and combined call rules versus histology, in order of increasing reader experience (95% confidence intervals are shown in parentheses).
| Experience level | Call rule | Sensitivity, % (95% CI) | Specificity, % (95% CI) |
|---|---|---|---|
| 1 |  | 37.5 (21.2, 57.3) | 94.7 (87.2, 97.9) |
| 2 |  | 95.8 (79.8, 99.8) | 40.8 (30.4, 52.0) |
| 3 |  | 70.8 (50.8, 85.1) | 84.2 (74.4, 90.7) |
| 4 |  | 83.3 (64.2, 93.3) | 90.8 (82.2, 95.5) |
| 4 |  | 83.3 (64.2, 93.3) | 88.2 (79.0, 93.6) |
| Mixed |  | 91.7 (74.2, 97.7) | 89.5 (80.6, 94.6) |
| Mixed |  | 91.7 (74.2, 97.7) | 92.1 (83.8, 96.3) |
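The confidence intervals in this table appear consistent with the Wilson score interval for a binomial proportion; that the authors used this method is an assumption. The underlying counts are not stated in this record: 9 of 24 (sensitivity) and 72 of 76 (specificity) are inferred only because they reproduce the first row's percentages and intervals.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (95% for z = 1.96)."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width

# Counts inferred from the reported percentages (assumption, not stated in the record)
for label, k, n in [("sensitivity, level 1", 9, 24), ("specificity, level 1", 72, 76)]:
    lo, hi = wilson_interval(k, n)
    print(f"{label}: {100 * k / n:.1f} ({100 * lo:.1f}, {100 * hi:.1f})")
```

This prints 37.5 (21.2, 57.3) and 94.7 (87.2, 97.9), matching the first row. Unlike the simple Wald interval, the Wilson interval stays inside [0, 1], which matters for proportions near 0% or 100% such as the 95.8% sensitivity at level 2.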
Fig. 5 Diagnostic probability estimated from average reader scores (predictor variable) and histology calls (binary response variable) via logistic regression. The red dotted line marks the probability of a positive histology finding that corresponds to the 1.8 mean-score threshold.
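A logistic regression maps the mean score to a probability of positive histology, and inverting the fitted curve maps a probability back to a score, which is how a threshold such as 1.8 corresponds to a probability. The study used R; the sketch below is an independent plain-Python stand-in (not the authors' code) that maximizes the binomial likelihood by Newton-Raphson. The data are hypothetical and deliberately overlapping, since perfectly separated data have no finite maximum-likelihood fit.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Maximum-likelihood fit of logit P(y = 1) = b0 + b1 * x by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0  # score vector and Fisher information
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: (b0, b1) += inverse(information) @ score, 2x2 inverse written out
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def probability(b0, b1, score):
    """Fitted probability of positive histology at a given mean score."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * score)))

def score_for_probability(b0, b1, prob):
    """Invert the fit: the score at which the fitted probability equals prob."""
    return (math.log(prob / (1.0 - prob)) - b0) / b1

# Hypothetical mean scores and histology (1 = positive); not data from the study
mean_score = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 3.0]
histology = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(mean_score, histology)
print(round(probability(b0, b1, 1.8), 3))
```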
Logistic (log-odds) regression analysis of reader experience as a factor in the predictive accuracy of reader scores. “ExpNum” is the numerical experience score, whereas “ExpCat” is the experience level modeled as a categorical variable. The best model (in bold) minimizes the Akaike information criterion (AIC). The LLR chi-square statistics test each model against the most complex model, listed first; a significant test indicates lack of fit of the reduced model.
| Model variables | AIC | LLR | Degrees of freedom | p-value |
|---|---|---|---|---|
| Score, ExpCat, score*ExpCat | 385.82 | Reference | — | — |
| Score, ExpCat, score*ExpNum |  | 1.17 | 2 | 0.556 |
| Score, ExpCat | 386.52 | 6.70 | 3 | 0.082 |
| Score | 406.77 | 32.96 | 6 |  |
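The AIC and LLR columns are linked. With AIC = 2k − 2 log L and LLR = 2(log L_full − log L_reduced) on df = k_full − k_reduced degrees of freedom, a nested model satisfies AIC_reduced = AIC_full + LLR − 2·df. The sketch below checks this identity against the table and computes the chi-square p-values, assuming (as the caption implies) that the LLR statistic is referred to a chi-square distribution. `chi2_sf` is a hypothetical helper using closed forms that hold only for integer degrees of freedom.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: the df = 1 tail plus one series term per additional 2 degrees of freedom
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range(3, df + 1, 2):
        total += term
        term *= x / k
    return total

def reduced_aic(full_aic: float, llr: float, df: int) -> float:
    """AIC of a nested reduced model from the full model's AIC and the LLR statistic."""
    return full_aic + llr - 2 * df

FULL_AIC = 385.82  # reference model: Score, ExpCat, score*ExpCat
rows = [("Score, ExpCat, score*ExpNum", 1.17, 2),
        ("Score, ExpCat", 6.70, 3),
        ("Score", 32.96, 6)]
for name, llr, df in rows:
    print(f"{name}: AIC ~ {reduced_aic(FULL_AIC, llr, df):.2f}, p = {chi2_sf(llr, df):.3g}")
```

The identity reproduces the listed AIC values of 386.52 and 406.77 (to rounding) and the listed p-values of 0.556 and 0.082. For the ExpNum-interaction row, whose AIC is missing from this record, it implies roughly 382.99, which would be the lowest in the table. Treat that figure as an inference from the other columns, not a reported value.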
Fig. 6 Superimposed diagnostic probability curves from the experience-adjusted logistic regression of histology on reader scores and reader training levels. Fitted curves are shown for the four levels of prior training. Also shown are the ideal probability curve of a hypothetical perfect predictor (orange dashed line) and the probability curve from the mean-score prediction model discussed above (red line). An additional plot of diagnostic probability curves for positive histology versus reader score is shown in the Supplementary Material.
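One way to write the full model in the AIC table, in notation that is ours rather than the source's (s = reader score, E = experience level from 1 to 4 with level 1 as the baseline, H = histology):

$$
\operatorname{logit} \Pr(H = 1 \mid s, E) = \beta_0 + \beta_1 s + \sum_{k=2}^{4} \gamma_k \, \mathbf{1}[E = k] + \sum_{k=2}^{4} \delta_k \, s \, \mathbf{1}[E = k]
$$

Replacing the three $\delta_k$ terms with a single $\delta \, s E$ (the ExpNum interaction) removes two parameters, dropping the interaction removes three, and dropping ExpCat as well removes six, which matches the 2, 3, and 6 degrees of freedom listed. Under this parameterization, experience level $k$ has its own curve with intercept $\beta_0 + \gamma_k$ and slope $\beta_1 + \delta_k$ (with $\gamma_1 = \delta_1 = 0$), presumably the per-level curves superimposed in Fig. 6.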
Repeatability of reader scores between original and reversed images.
| Experience level | Reader | Mean absolute score difference | Polychoric correlation |
|---|---|---|---|
| 1 | R5 | 0.26 | 0.662 |
| 2 | R1 | 0.36 | 0.895 |
| 3 | R4 | 0.20 | 0.916 |
| 4 | R2 | 0.08 | 0.994 |
| 4 | R3 | 0.10 | 0.997 |
| Mixed | Median | 0.18 | 0.97 |
| Mixed | Mean | 0.14 | 0.951 |
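The first repeatability statistic is straightforward to reproduce: the mean absolute difference between a reader's score for an image and the same reader's score for its reversed copy. A minimal sketch with hypothetical paired scores follows. The polychoric correlation requires maximum-likelihood estimation under a latent bivariate-normal model and is not reproduced here; in R, the `polycor` package's `polychor` function is one implementation.

```python
from typing import Sequence

def mean_abs_difference(original: Sequence[float], reversed_scores: Sequence[float]) -> float:
    """Mean |original - reversed| over images scored in both orientations."""
    if len(original) != len(reversed_scores) or not original:
        raise ValueError("need equal-length, non-empty paired score lists")
    return sum(abs(a - b) for a, b in zip(original, reversed_scores)) / len(original)

# Hypothetical paired scores for five images
print(mean_abs_difference([1, 2, 3, 4, 2], [1, 3, 3, 4, 1]))  # 0.4
```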