Brian J Smith, John M Buatti, Christian Bauer, Ethan J Ulrich, Payam Ahmadvand, Mikalai M Budzevich, Robert J Gillies, Dmitry Goldgof, Milan Grkovski, Ghassan Hamarneh, Paul E Kinahan, John P Muzi, Mark Muzi, Charles M Laymon, James M Mountz, Sadek Nehmeh, Matthew J Oborski, Binsheng Zhao, John J Sunderland, Reinhard R Beichel.
Abstract
Quantitative imaging biomarkers (QIBs) provide medical image-derived intensity, texture, shape, and size features that may help characterize cancerous tumors and predict clinical outcomes. Successful clinical translation of QIBs depends on the robustness of their measurements. Biomarkers derived from positron emission tomography images are prone to measurement errors owing to differences in image processing factors such as the tumor segmentation method used to define volumes of interest over which to calculate QIBs. We illustrate a new Bayesian statistical approach to characterize the robustness of QIBs to different processing factors. Study data consist of 22 QIBs measured on 47 head and neck tumors in 10 positron emission tomography/computed tomography scans segmented manually and with semiautomated methods used by 7 institutional members of the NCI Quantitative Imaging Network. QIB performance is estimated and compared across institutions with respect to measurement errors and power to recover statistical associations with clinical outcomes. Analysis findings summarize the performance impact of different segmentation methods used by Quantitative Imaging Network members. Robustness of some advanced biomarkers was found to be similar to conventional markers, such as maximum standardized uptake value. Such similarities support current pursuits to better characterize disease and predict outcomes by developing QIBs that use more imaging information and are robust to different processing factors. Nevertheless, to ensure reproducibility of QIB measurements and measures of association with clinical outcomes, errors owing to segmentation methods need to be reduced.
Keywords: FDG PET; head and neck cancer; multi-site performance analysis; radiomics; segmentation
Year: 2020 PMID: 32548282 PMCID: PMC7289247 DOI: 10.18383/j.tom.2020.00004
Source DB: PubMed Journal: Tomography ISSN: 2379-1381
Methods Used to Segment Tumors and Derive Quantitative Imaging Biomarkers in the QIN Segmentation Challenge
| Method | Description | Operator(s) |
|---|---|---|
| Manual | Manual Segmentation | 3 Radiation Oncologists |
| 1 | In-house software based on active contour segmentation | PhD research scientist |
| 2 | In-house software using a graph-based optimized segmentation | Radiation oncologist |
| 3 | Commercial software package Mirada Medical RTx | Imaging physicist |
| 4 | Combination of commercial software packages VCAR and PMOD | Medical physics postdoc |
| 5 | Commercial software package MIM | Imaging physicist |
| 6 | Commercial software package PMOD | Image analyst |
| 7 | In-house software based on 3D level-set segmentation | Medical image analysis graduate student |
Descriptions of the Quantitative Imaging Biomarkers Compared in the QIN Segmentation Challenge
| QIB | Description (Unit) | Type |
|---|---|---|
| Max | Maximum value in region of interest (SUV) | C |
| Peak | Maximum average gray value that is calculated from a 1 cm³ sphere placed within the region of interest ( | C |
| Mean | Mean value in region of interest (SUV) | C |
| MTV | Volume of region of interest (mL) | C |
| TLG | Total lesion glycolysis (mL) | C |
| Min | Minimum value in region of interest (SUV) | I |
| Standard | Standard deviation in region of interest (SUV) | I |
| RMS | Root mean square value in region of interest (SUV) | I |
| First Quartile | 25th percentile value in region of interest (SUV) | I |
| Median | 50th percentile value in region of interest (SUV) | I |
| Third Quartile | 75th percentile value in region of interest (SUV) | I |
| Upper Adjacent | First value in region of interest not greater than 1.5 times the interquartile range (SUV) | I |
| Q1 Distribution | Percent of gray values that fall within the first quarter of the grayscale range within the region of interest (%) | I |
| Q2 Distribution | Percent of gray values that fall within the second quarter (%) | I |
| Q3 Distribution | Percent of gray values that fall within the third quarter (%) | I |
| Q4 Distribution | Percent of gray values that fall within the fourth quarter (%) | I |
| Glycolysis Q1 | Lesion glycolysis calculated from the first quarter of the grayscale range within the region of interest (mL) | I |
| Glycolysis Q2 | Lesion glycolysis calculated from the second quarter (mL) | I |
| Glycolysis Q3 | Lesion glycolysis calculated from the third quarter (mL) | I |
| Glycolysis Q4 | Lesion glycolysis calculated from the fourth quarter (mL) | I |
| SAM | Standardized added metabolic activity ( | I |
| RA | Rim average; mean of uptake in a 2-voxel-wide rim region around region of interest (SUV) | I |
Abbreviations: C, common clinical biomarkers; I, biomarkers provided by the 3D Slicer PET-IndiC extension.
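As a rough illustration of how several of the intensity-based QIBs described above could be computed, the sketch below takes a flat list of SUV values from the voxels inside a region of interest. It is not the PET-IndiC implementation: the function name `basic_qibs`, the `voxel_volume_ml` parameter, the choice of population standard deviation, the inclusive quartile interpolation, and the treatment of values that fall exactly on a quarter boundary are all assumptions made for this example. Spatially defined QIBs such as Peak, SAM, and RA are left out because they need the image geometry rather than a list of values.

```python
import math
import statistics


def basic_qibs(suv_values, voxel_volume_ml):
    """Compute a few ROI statistics from a flat list of SUV voxel values.

    Hypothetical helper for illustration only; not the PET-IndiC code.
    """
    if not suv_values:
        raise ValueError("region of interest is empty")

    n = len(suv_values)
    lo, hi = min(suv_values), max(suv_values)
    mean = statistics.fmean(suv_values)

    # Assumption: inclusive interpolation for the 25th/50th/75th percentiles.
    if n > 1:
        q1, median, q3 = statistics.quantiles(suv_values, n=4, method="inclusive")
    else:
        q1 = median = q3 = suv_values[0]

    # MTV is the ROI volume; TLG is taken here as mean SUV times MTV.
    mtv = n * voxel_volume_ml
    tlg = mean * mtv

    # Q1..Q4 Distribution: share of voxels in each quarter of [min, max].
    # Assumption: the maximum value is counted in the fourth quarter, and a
    # constant ROI (zero range) is counted entirely in the first quarter.
    span = hi - lo
    counts = [0, 0, 0, 0]
    for v in suv_values:
        idx = 0 if span == 0 else min(int(4 * (v - lo) / span), 3)
        counts[idx] += 1

    return {
        "Max": hi,
        "Min": lo,
        "Mean": mean,
        # Assumption: population (not sample) standard deviation.
        "Standard": statistics.pstdev(suv_values),
        "RMS": math.sqrt(sum(v * v for v in suv_values) / n),
        "First Quartile": q1,
        "Median": median,
        "Third Quartile": q3,
        "MTV": mtv,
        "TLG": tlg,
        "Quarter Distribution": [100.0 * c / n for c in counts],
    }
```

Calling `basic_qibs(values, voxel_volume_ml)` once per segmentation of the same lesion makes the dependence on the segmentation visible: every entry except `Max` changes whenever the contour adds or drops low-uptake voxels at the boundary.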
Figure 1. Boxplots showing the distribution of quantitative imaging biomarker (QIB) means calculated for each of the 8 segmentation methods for 47 lesions.
Figure 3. Boxplots showing the distribution of intraclass correlation coefficients (ICCs) across segmentation methods.
Figure 5. Boxplots of QIB root mean square error (RMSE) comparing method-specific odds ratios (ORs) estimated from hypothetical binary clinical outcomes simulated from QIB relationships defined by manual segmentations. RMSE is calculated as the square root of the sum of the squared bias and the variance of the estimated odds ratio.
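The RMSE definition in the Figure 5 caption can be written out as a short sketch. The function names and the idea of passing in a list of simulated estimates are assumptions for illustration; the caption does not say on which scale (OR or log OR) the bias and variance were computed, so the code is scale-agnostic.

```python
import math
import statistics


def rmse_from_parts(bias, variance):
    """RMSE as the square root of squared bias plus variance."""
    return math.sqrt(bias ** 2 + variance)


def rmse_of_estimates(estimates, true_value):
    """RMSE of repeated estimates around a known reference value.

    Hypothetical helper: `estimates` stands in for the odds ratios estimated
    under one segmentation method across simulated outcome sets, and
    `true_value` for the reference defined by manual segmentation.
    """
    bias = statistics.fmean(estimates) - true_value
    # Population variance, so the result equals the root of the mean
    # squared deviation from true_value.
    variance = statistics.pvariance(estimates)
    return rmse_from_parts(bias, variance)
```

Splitting the error this way separates a systematic shift in the estimated association (bias) from its run-to-run scatter (variance), either of which can dominate for a given segmentation method.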
Figure 6. Heatmap summary of method-specific powers to detect OR associations between hypothetical binary clinical outcomes simulated from relationships defined by manual segmentation. QIBs and methods are ordered according to the similarity of their powers as measured by hierarchical clustering.
Summary of Performance Metrics for QIBs Grouped by Segmentation Impact
| QIB by Segmentation Impact | Population Mean CV | Average Absolute Relative Bias | Average wCV | Average ICC | Average Power |
|---|---|---|---|---|---|
| Low | | | | | |
| Max | 0.060 | 0.039 | 0.033 | 0.996 | 0.866 |
| Peak | 0.068 | 0.048 | 0.019 | 0.997 | 0.864 |
| Standard | 0.146 | 0.139 | 0.096 | 0.988 | 0.822 |
| Upper Adjacent | 0.039 | 0.041 | 0.042 | 0.993 | 0.854 |
| Group Mean (SD) | 0.078 (0.047) | 0.067 (0.049) | 0.048 (0.033) | 0.993 (0.004) | 0.851 (0.020) |
| Moderate | | | | | |
| Mean | 0.063 | 0.143 | 0.061 | 0.975 | 0.829 |
| RMS | 0.058 | 0.126 | 0.057 | 0.980 | 0.839 |
| First Quartile | 0.085 | 0.176 | 0.078 | 0.947 | 0.727 |
| Median | 0.070 | 0.144 | 0.067 | 0.967 | 0.788 |
| Third Quartile | 0.049 | 0.098 | 0.054 | 0.984 | 0.841 |
| RA | 0.111 | 0.106 | 0.072 | 0.940 | 0.660 |
| Group Mean (SD) | 0.073 (0.022) | 0.132 (0.029) | 0.065 (0.009) | 0.966 (0.018) | 0.781 (0.074) |
| High | | | | | |
| MTV | 0.559 | 0.370 | 0.367 | 0.910 | 0.703 |
| TLG | 1.054 | 1.542 | 0.528 | 0.861 | 0.623 |
| Glycolysis Q1 | 0.380 | 0.333 | 0.414 | 0.891 | 0.677 |
| Glycolysis Q2 | 0.269 | 0.248 | 0.371 | 0.910 | 0.726 |
| Glycolysis Q3 | 0.284 | 0.254 | 0.341 | 0.920 | 0.747 |
| Glycolysis Q4 | 0.479 | 0.392 | 0.454 | 0.915 | 0.700 |
| SAM | 0.559 | 0.370 | 0.367 | 0.910 | 0.703 |
| Group Mean (SD) | 0.538 (0.281) | 0.52 (0.459) | 0.409 (0.064) | 0.892 (0.031) | 0.677 (0.063) |
| Extreme | | | | | |
| Min | 0.232 | 0.672 | 0.108 | 0.894 | 0.434 |
| Q1 Distribution | 0.459 | 1.191 | 0.268 | 0.521 | 0.237 |
| Q2 Distribution | 0.176 | 0.148 | 0.149 | 0.389 | 0.113 |
| Q3 Distribution | 0.318 | 0.339 | 0.180 | 0.556 | 0.198 |
| Q4 Distribution | 0.198 | 0.203 | 0.253 | 0.655 | 0.362 |
| Group Mean (SD) | 0.277 (0.115) | 0.511 (0.431) | 0.192 (0.068) | 0.603 (0.188) | 0.269 (0.129) |
Abbreviations: CV, coefficient of variation; wCV, within coefficient of variation; ICC, intraclass correlation coefficient.
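The "Group Mean (SD)" rows can be checked against the individual QIB rows. The sketch below does this for the Low group's Population Mean CV column, whose tabulated summary is 0.078 (0.047). The helper name `group_summary` is an assumption, and the use of the sample standard deviation (n − 1 denominator) is inferred from the fact that it reproduces this entry, not stated in the source.

```python
import statistics

# Population Mean CV values for the Low segmentation-impact group,
# copied from the table above.
low_group_population_mean_cv = {
    "Max": 0.060,
    "Peak": 0.068,
    "Standard": 0.146,
    "Upper Adjacent": 0.039,
}


def group_summary(values):
    """Format a group as 'mean (SD)' with three decimals.

    Assumption: SD is the sample standard deviation (n - 1 denominator).
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return f"{mean:.3f} ({sd:.3f})"


print(group_summary(list(low_group_population_mean_cv.values())))
# prints "0.078 (0.047)"
```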
Figure 7. Example head and neck positron emission tomography (PET)/computed tomography (CT) segmentations of a cancerous lymph node. The guidance image provided to challenge participants (A) marks the lesion to segment (labeled “2”), which lies next to a large primary tumor (X) that should be excluded from the segmentation. Substantial differences in derived quantitative imaging biomarkers can result from segmentation methods that correctly distinguish the lesion (B, C, F, and G) versus those that leak into the primary tumor (D and H) or fail to distinguish the lymph node from the primary tumor (E).