Walter Stummer, Raphael Koch, Ricardo Diez Valle, David W Roberts, Nadar Sanai, Steve Kalkanis, Constantinos G Hadjipanayis, Eric Suero Molina.
Abstract
BACKGROUND: Surgery for gliomas is often confounded by difficulties in distinguishing tumor from surrounding normal brain. For better discrimination, intraoperative optical imaging methods using fluorescent dyes are currently being explored. Understandably, such methods require the demonstration of a high degree of diagnostic accuracy and clinical benefit. Currently, clinical utility is determined by tissue biopsies that are correlated with optical signals and quantified using measures such as sensitivity, specificity, positive predictive value, and negative predictive value. In addition, surgical outcomes, such as extent of resection rates and/or survival (progression-free survival (PFS) and overall survival (OS)), have been measured. These assessments, however, potentially involve multiple biases and confounders, which have to be minimized to ensure reproducibility, generalizability and comparability of test results. Tests should aim for high internal and external validity. The objective of this article is to analyze how diagnostic accuracy and outcomes are utilized in available studies describing intraoperative imaging and, furthermore, to derive recommendations for reliable and reproducible evaluations.
Keywords: 5-ALA; Diagnostic accuracy; Fluorescein; Fluorescence guidance; Glioma; STARD CNS
Year: 2019 PMID: 31363920 PMCID: PMC6739423 DOI: 10.1007/s00701-019-04007-y
Source DB: PubMed Journal: Acta Neurochir (Wien) ISSN: 0001-6268 Impact factor: 2.216
Diagnostic decision matrix for diagnostic accuracy
| | | Disease present | Disease absent |
|---|---|---|---|
| Test result | Positive | True positive (TP): the test is positive and the subject suffers the disease | False positive (FP): the test is positive whereas the subject is healthy |
| Test result | Negative | False negative (FN): the test is negative, yet the patient suffers the disease | True negative (TN): the test is negative and the subject is healthy |

| Measure | Definition | Formula |
|---|---|---|
| Sensitivity | Probability of a positive test result when a subject has the disease | TP/(TP + FN) |
| Specificity | Probability of a test being negative if a patient does not have the disease | TN/(TN + FP) |
| Positive predictive value | Probability that the patient has the disease if the test is positive | TP/(TP + FP) |
| Negative predictive value | Probability that the patient is disease free if the test is negative | TN/(TN + FN) |
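The four formulas in the decision matrix can be written as a short function. The following is a minimal sketch: the names `ConfusionCounts` and `accuracy_measures` and the example counts are illustrative assumptions, not part of the original article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Biopsy-level counts from the decision matrix (hypothetical container)."""
    tp: int
    fp: int
    tn: int
    fn: int


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN when a measure is undefined because its denominator is zero.
    return numerator / denominator if denominator else float("nan")


def accuracy_measures(c: ConfusionCounts) -> dict:
    """Compute the four measures exactly as defined in the table."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }


# Illustrative counts only; they do not come from a real study.
example = accuracy_measures(ConfusionCounts(tp=10, fp=1, tn=5, fn=1))
```

Returning NaN for an empty denominator is a design choice of this sketch; it keeps a study with no negative (or no positive) biopsies from raising an error while still flagging the measure as undefined.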
Fig. 1 (a) Influence of tissue allocation bias type 1 on the NPV and specificity. Since gliomas are infiltrating tumors and the density of infiltrating cells will decrease rapidly with distance from the tumor bulk, the calculated NPV and specificity will be higher the further away from the tumor samples are collected because of the lower likelihood for false negative samples. (b) Influence of tissue allocation bias type 1 on PPV and sensitivity. The likelihood for finding false positive biopsies will depend on the location of biopsies. If samples are collected predominantly in the main tumor mass, the calculated PPV and sensitivity will be high. If samples are collected at the margins and the diagnostic method unreliably detects tumor, the PPV will be lower
Fig. 2 Tissue allocation bias type 3. Intraoperative optical diagnostic information is usually two-dimensional, i.e., only giving superficial information from the exposed tissue. The biopsy, on the other hand, is three-dimensional and assessment of only a part of the biopsy might miss the pathology
Fig. 3 Timing and threshold bias pertinent to fluorochromes that are applied i.v. and either do not have any specific tumor affinity (e.g., fluorescein sodium, Diaz et al. 2015) or are expected to have selective affinity (targeted fluorochromes, e.g., APC analogs, Swanson et al. 2015). This graph illustrates the course of fluorescence in different tissue compartments. (A) After i.v. injection, concentrations will be high in blood vessels and all perfused tissues and will slowly abate. (B) Due to extravasation through BBB disruption within malignant tumor, pseudo-selectivity will ensue; this effect will also pertain to any areas of surgically induced BBB damage, e.g., the resection margin. (C) Meanwhile, extravasated fluorophore propagates with edema into peritumoral tissue in an unspecific manner. The apparent diagnostic accuracy will strongly depend on the definition of thresholds and on time after injection. (D) For targeted fluorochromes, selective retention can be expected after clearance from edema and plasma. These curves directly influence the signal-to-noise ratio, which changes over time
How, with a given diagnostic method, differences in the number of biopsies obtained from certain regions, based on the sampling algorithm chosen by investigator A compared to investigator B, strongly influence the results for the measures of diagnostic accuracy
| | Tumor center | Tumor margin | Normal tissue | Sens | Spec | NPV | PPV |
|---|---|---|---|---|---|---|---|
| Investigator A |
| 1 TP 2 FP | 3 TN 2 FN |
| 0.75 | 0.6 |
|
| Investigator B |
| 1 TP 2 FP | 3 TN 2 FN |
| 0.66 | 0.66 |
|
In this hypothetical example, only the number of true positive samples from the tumor center was varied, causing a relevant difference in sensitivity and positive predictive value (italic entries).
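The effect of the sampling algorithm can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: the margin counts (1 TP, 2 FP) and normal-tissue counts (3 TN, 2 FN) are taken from the table, whereas the tumor-center counts of 5 and 3 true positives are hypothetical values chosen for illustration, and the function names are invented.

```python
def measures(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity, PPV and NPV from pooled biopsy counts."""
    return {
        "sens": tp / (tp + fn),
        "spec": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Counts shared by both investigators (tumor margin and normal tissue).
MARGIN = {"tp": 1, "fp": 2}
NORMAL = {"tn": 3, "fn": 2}


def investigator(center_tp: int) -> dict:
    # center_tp: hypothetical number of true positive biopsies from the tumor center.
    return measures(
        tp=center_tp + MARGIN["tp"],
        fp=MARGIN["fp"],
        tn=NORMAL["tn"],
        fn=NORMAL["fn"],
    )


result_a = investigator(center_tp=5)  # assumed: more biopsies from the tumor center
result_b = investigator(center_tp=3)  # assumed: fewer biopsies from the tumor center
```

Only the measures that contain TP (sensitivity and PPV) change between the two calls; specificity and NPV stay the same because the counts of TN, FP and FN are identical.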
How pooling samples from different patients influences results
| | TP | FP | TN | FN | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| Patient A | 10 | 1 | 5 | 1 | 0.91 | 0.83 |
| Patient B | 1 | 1 | 1 | 1 | 0.50 | 0.50 |
| Average measures | | | | | 0.71 | 0.67 |
| Pooled biopsies | 11 | 2 | 6 | 2 | 0.85 | 0.75 |
The two hypothetical assessments differ only in the number of samples taken by investigators per site with a particular method
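The difference between averaging per-patient measures and pooling all biopsies can be checked directly from the counts in the table. The sketch below uses only those counts; the variable and function names are assumptions made for illustration.

```python
from statistics import mean

# (TP, FP, TN, FN) per patient, as listed in the table.
PATIENTS = {"A": (10, 1, 5, 1), "B": (1, 1, 1, 1)}


def sens_spec(tp: int, fp: int, tn: int, fn: int) -> tuple:
    """Return (sensitivity, specificity) for one set of counts."""
    return tp / (tp + fn), tn / (tn + fp)


# Approach 1: compute the measures per patient, then average over patients.
per_patient = {pid: sens_spec(*counts) for pid, counts in PATIENTS.items()}
averaged = (
    mean(sens for sens, _ in per_patient.values()),
    mean(spec for _, spec in per_patient.values()),
)

# Approach 2: add up the biopsies of all patients, then compute the measures once.
pooled_counts = tuple(sum(column) for column in zip(*PATIENTS.values()))
pooled = sens_spec(*pooled_counts)
```

Pooling weights each patient by the number of biopsies contributed, so the patient with many samples dominates; averaging weights each patient equally regardless of how few samples support that patient's estimate.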
Potential biases and confounders in establishing diagnostic accuracy of intraoperative optical diagnostics
| Bias type | Explanation |
|---|---|
| Tissue allocation bias type A | During intraoperative diagnostic testing in a typical, infiltrating brain tumor, the prevalence of tumor cells is high in the center, whereas at the infiltrating margin the prevalence of tumor cells is lower and decreases with distance from the tumor bulk. If samples are taken immediately beyond the margins of the highlighted tumor, the likelihood of finding unmarked, falsely negative tumor cells (FN) will be higher than if the samples are taken at a distance using the same method. This will directly affect NPV and specificity. Conversely, if marked tissue samples are taken only at the center of the tumor, the prevalence of tumor cells will be high and the rate of false positive samples low, and the calculation of PPV or sensitivity will give high values. When samples are taken at the more critical margin using the same method, it is to be expected that the rate of false positive samples will be higher and the values for sensitivity and PPV lower. However, in practice PPV and sensitivity will not be as susceptible to such strong effects as the NPV and specificity, since invariably the surgeon will primarily target gross tumor, as defined by neuronavigation, ultrasound, or the optical impression under conventional illumination, and (understandably) not adjacent inconspicuous brain. |
| Tissue allocation bias type B | With methods that provide ambiguous signals, investigators are more likely to sample areas of the tumor that are judged to be abnormal with conventional illumination, e.g., by texture or color, than the inconspicuous margins, which might look like normal brain. Thus, the likelihood for true positive samples may be high with such methods despite the limitations of the method for detecting tumor at the margins. In other words, the distribution of the optical signal or tissue characteristics with conventional illumination will influence the surgeon not to adhere to a truly random biopsy regime as he will be guided to take biopsies most likely where he sees the signal or suspects tumor with conventional illumination. |
| Tissue allocation bias type C | Depending on the size of the sample, the |
| Bias from biopsy frequency | In many studies in the brain, the number of tissue samples typically collected is rather low and the number per patient differs. These samples are then pooled for the final analysis of measures of diagnostic accuracy. These measures, however, depend on the number of samples taken in a certain brain region and entered into the calculation. Table |
| Pooling samples from different patients | If a method fails to show a signal in one patient, there will be little sampling in the tumor core. In patients in whom the method works well, it is likely that more samples will be taken. Pooling these samples will skew the results and overestimate the diagnostic accuracy for single patients. Pooling biopsies from different patients without taking the dependencies of biopsies within a patient into account will, in consequence, lead to an underestimation of the variability and of the confidence limits. Also, calculating diagnostic measures per patient and then averaging over all patients will lead to biased results (Table |
| Threshold bias in methods with significant signal-to-noise ratios | Some optical methods do not provide binary or dichotomous information (i.e., signal vs. no signal) but rather continuous optical information with a wide range of values, including low-level signals from normal tissue, that is, a background signal resulting in noise. For such methods, the apparent discrimination between diseased tissue and normal tissue will depend on the threshold that is selected. A high threshold will decrease the likelihood of false positive samples and thus will increase PPV and specificity. A low threshold will reduce the likelihood of false negatives and will therefore increase NPV and sensitivity. For instance, spectrographic methods will return data on a continuous scale and will be subject to this relationship, as demonstrated for 5-ALA derived porphyrin fluorescence (Valdes et al. 2011, Stummer et al. 2014). Fluorochromes, such as fluorescein sodium, which are injected i.v. and present in the plasma, will lead to a background level within normal and peritumoral tissues (Fig. |
| Timing bias | Many intraoperative imaging methods, which rely on dyes, reveal time-dependent staining of tumor tissue and also surrounding signal with a varying |
| Bias from methods for histological assessment | Histological assessments are clearly an important standard of truth (reference standard) for intraoperative optical testing. However, it is difficult even for the experienced neuropathologist to identify individual tumor cells based on conventional stains (e.g., H&E) only. Immunohistochemical approaches, e.g., Ki67, p53 or IDH1 staining, might serve to increase the likelihood of detecting tumor cells in samples. Results regarding sensitivity and specificity will vary depending on the sensitivity of neuropathological assessments and the detection of tumor cells in the peritumoral region. |
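The threshold bias described in the table can be illustrated by classifying a continuous signal at two different cut-offs. This is a minimal sketch: the signal values, the histology labels and the function name `sens_spec_at` are invented for illustration and do not represent measured fluorescence data.

```python
# Each biopsy: (continuous optical signal in arbitrary units, tumor present on histology).
# All values are hypothetical.
BIOPSIES = [
    (0.10, False), (0.20, False), (0.25, False), (0.35, False),
    (0.30, True), (0.50, True), (0.70, True), (0.90, True),
]


def sens_spec_at(threshold: float, biopsies=BIOPSIES) -> tuple:
    """Call a biopsy test-positive when its signal is >= threshold."""
    tp = sum(1 for signal, tumor in biopsies if tumor and signal >= threshold)
    fn = sum(1 for signal, tumor in biopsies if tumor and signal < threshold)
    tn = sum(1 for signal, tumor in biopsies if not tumor and signal < threshold)
    fp = sum(1 for signal, tumor in biopsies if not tumor and signal >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


low_cutoff = sens_spec_at(0.30)   # low threshold: fewer false negatives
high_cutoff = sens_spec_at(0.60)  # high threshold: fewer false positives

# Sweeping the threshold over all observed signal values yields the points of a ROC curve.
roc_points = sorted(
    (1 - spec, sens)
    for sens, spec in (sens_spec_at(signal) for signal, _ in BIOPSIES)
)
```

With these made-up values, the low cut-off favors sensitivity and the high cut-off favors specificity, which is why the chosen threshold and its rationale need to be reported alongside the accuracy measures.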
Fig. 4 PRISMA flow diagram
Frequency of patients and biopsies in studies summarized in Table 2 (for studies with biopsies
| | N | n | n/N |
|---|---|---|---|
| Mean | 25.3 | 103 | 5.20 |
| Standard deviation | 21.0 | 85.8 | 4.29 |
| Minimum | 3 | 4 | 0.83 |
| Median | 21 | 88 | 4 |
| Maximum | 99 | 354 | 22 |
N number of patients in study, n number of biopsies per study, n/N number of biopsies per patient per study
Fig. 5 Hypothetical examples of validation algorithms of a new microscope for visualizing fluorescence in a diffusely infiltrating tumor compared to an established method. The questions to be answered are: does the new method have a similar or better diagnostic accuracy, does the new method detect the same low or lower density of infiltrating cells (biological assessment, left part of the diagram), and does the new method disclose the same visual margins of fluorescence (visual assessment, right). IHC immunohistochemistry, EvG Elastica van Gieson, IDH isocitrate dehydrogenase, GFAP glial fibrillary acidic protein, MGMT O6-methylguanine DNA methyltransferase
STARD-CNS
1. Introduction
As with the STARD initiative [ These recommendations pertain not only to fluorescence methods but to any methods that relate tissues identified intraoperatively to imaging and/or histology, e.g., other forms of non-optical tumor identification such as navigation per se, intraoperative MRI, ultrasound, but also to targeted fluorochromes or narrow field methods such as OCT, Raman, confocal imaging and others. Also, these suggestions may not only be pertinent for gliomas but might be extended to other tumor entities in the brain as well (e.g., metastasis, meningiomas, adenomas) for which intraoperative detection methods are being developed or employed. Furthermore, methods of intraoperative tissue detection are also being explored for the surgery of tumors outside the CNS, where similar considerations regarding the evaluation of such methods are justified, e.g., for mapping of sentinel lymph nodes or identification of solid tumors by near infrared fluorescence (as reviewed in Schaafsma et al. [
2. Recommendations pertaining to the design of a study
• Consider a protocol with intraoperative neuronavigation and postoperative imaging for assessing the extent of the detection signal and how this relates to MRI morphology.
• Consider addressing a particular tissue area first based on navigation, which relates this area to imaging data, then assessing the detection signal and finally collecting a biopsy.
• In protocols containing neuronavigation for correlating tissue signal to imaging, methods should be described that compensate for the influence of brain shift.
• Consider histological assessment of the complete biopsy (the smallest unit of resection) and not only of a part of the biopsy.
• Consider expanding simple H&E histology by immunohistochemistry for better detection of infiltrating tumor cells.
• Consider focusing on the PPV in conjunction with the NPV (giving an exact description of
• Consider using objective methods (e.g., spectrography) to validate subjective optical impressions.
• Consider additional reference standards, i.e., extent of resection and outcome (safety, survival), apart from biopsies.
• Define statistical methods for confirmatory endpoints ex ante. Involve a statistician in the planning stage (see
• If an equivalent and sufficient number of biopsies per patient cannot be collected, consider appropriate statistical methods to adjust for varying numbers of biopsies (see below).
• Consider randomization to analyze the usefulness of the method for improving resection rates on MRI and outcome to achieve independence from non-therapeutic factors, such as resectability, age etc.
• In studies using a method with algorithms for identifying tumor based on a specific tissue characteristic, such as optical properties (reflection, fluorescence), with processing of multiple inputs to give a final algorithm for tissue identification, a validation cohort is required to rule out that the algorithm is valid only for the particular data set used for generating it (e.g., Butte et al. [
3. Checklist for reporting, expanding the STARD Checklist [
Bias reduction:
• What methods were used to reduce rater bias, e.g., blinded assessments by pathologists or radiologists?
• Were optical signals validated by objective detection technology, i.e., spectrography?
• Did multiple raters address the optical signal independently?
Tissue sampling algorithms:
• Describe exactly
• Were the location and the number of biopsies taken per patient documented?
• With time sensitive methods of detection (e.g., fluorochromes injected i.v.): Are the time points at which biopsies were taken described?
• It is recommended that the same number of samples be taken from similarly defined locations in individual patients. How was this handled?
Signal detection:
• If the methodology employs thresholding, were the thresholds and the rationale for the thresholds exactly described? How was the background signal handled? Was ROC analysis employed for continuous data? Were values transformed?
• If the methodology requires image processing, the exact procedure and settings need to be described in a reproducible way.
• How was the technical equipment tested and maintained?
• What factors confound signal detection and how are these handled?
• Was intraobserver variability accounted for?
• If algorithms for tissue detection are constructed using multiple inputs, was an independent cohort for cross-validation included?
Reference standard:
• Describe which types of histological assessment are implemented, e.g., was immunohistochemistry used for identifying tumor cells in low density that infiltrate the brain? Which markers were assessed, e.g., Ki-67/MIB-1 staining, EGFR, GFAP, IDH1, p53, others?
• If other reference standards are used (post-OP imaging, outcome, other optical imaging methods), are these exactly described?
• What methods are used to ensure transparency in non-histological reference standards to allow comparability?
Statistical considerations:
• Was a statistician involved in the planning stage? Was a sample size calculation performed? What are the planned settings (type I error, power, assumed effects)?
• What are the primary endpoints and statistical hypotheses?
• What is the statistical design?
• Were multiple testing procedures used for type I error control?
• Were different diagnostic tools compared? Which statistical method was used?
• Describe exactly how dependent data (biopsy within patient) and independent data (per patient) were handled.
• Are statistical methods applied to account for the clustered data structure and differences in the number of biopsies per patient (e.g., generalized linear mixed models)?
• Describe the applied statistical methods exactly and reproducibly.
• Describe how missing data were handled.
• Report estimates of diagnostic accuracy and measures of statistical uncertainty (e.g., sensitivity, specificity, PPV, NPV, and corresponding 95% confidence intervals). Are CI adjusted for clustered data structure?
• If possible, use dichotomous outcomes for pathology and dichotomous or continuous measures for the diagnostic tool.
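One simple way to obtain a confidence interval that respects the clustering of biopsies within patients is a patient-level (cluster) bootstrap. The sketch below is an illustration only and is not the method named in the checklist, which mentions generalized linear mixed models as an example; the data, the function names and the settings (`n_boot`, `alpha`, `seed`) are all assumptions.

```python
import random

# Hypothetical data: patient id -> list of (test positive, tumor present) per biopsy.
PATIENTS = {
    "P1": [(True, True), (True, True), (False, True), (False, False)],
    "P2": [(True, True), (False, False), (True, False)],
    "P3": [(False, True), (True, True), (False, False), (False, False)],
    "P4": [(True, True), (True, True), (True, True), (False, True)],
}


def sensitivity(biopsies):
    """TP/(TP + FN); returns None if no biopsy contains tumor."""
    tp = sum(1 for test_pos, tumor in biopsies if tumor and test_pos)
    fn = sum(1 for test_pos, tumor in biopsies if tumor and not test_pos)
    return tp / (tp + fn) if (tp + fn) else None


def cluster_bootstrap_ci(patients, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for sensitivity, resampling whole patients with replacement."""
    rng = random.Random(seed)
    ids = list(patients)
    estimates = []
    for _ in range(n_boot):
        # Draw patients (not single biopsies) so within-patient dependence is preserved.
        drawn = rng.choices(ids, k=len(ids))
        resampled = [biopsy for pid in drawn for biopsy in patients[pid]]
        estimate = sensitivity(resampled)
        if estimate is not None:
            estimates.append(estimate)
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[min(len(estimates) - 1, int((1 - alpha / 2) * len(estimates)))]
    return lower, upper


all_biopsies = [biopsy for biopsies in PATIENTS.values() for biopsy in biopsies]
point_estimate = sensitivity(all_biopsies)
ci_lower, ci_upper = cluster_bootstrap_ci(PATIENTS)
```

Resampling patients rather than biopsies is the key design choice: treating every biopsy as independent would typically yield an interval that is too narrow, which is the underestimation of variability the checklist warns about. With as few patients as in this toy data set the interval is only a rough illustration.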