
Cleaning radiotherapy contours for radiomics studies, is it worth it? A head and neck cancer study.

Pierre Fontaine, Vincent Andrearczyk, Valentin Oreiller, Daniel Abler, Joel Castelli, Oscar Acosta, Renaud De Crevoisier, Martin Vallières, Mario Jreige, John O Prior, Adrien Depeursinge.

Abstract

The vast majority of studies in the radiomics field are based on contours originating from radiotherapy planning. This kind of delineation (e.g. Gross Tumor Volume, GTV) is often larger than the true tumoral volume, sometimes including parts of other organs (e.g. the trachea in Head and Neck, H&N, studies), and the impact of such over-segmentation has been little investigated so far. In this paper, we evaluate and compare the performance of models using two contour types: those from radiotherapy planning, and those specifically delineated for radiomics studies. For the latter, we modified the radiotherapy contours to fit the true tumoral volume. The two contour types were compared when predicting Progression-Free Survival (PFS) using Cox models based on radiomics features extracted from FluoroDeoxyGlucose-Positron Emission Tomography (FDG-PET) and CT images of 239 patients with oropharyngeal H&N cancer collected from five centers (the data of the 2020 HECKTOR challenge). Dedicated contours led to better performance for predicting PFS: Harrell's concordance indices of 0.61 and 0.69 were achieved for Radiotherapy and Dedicated contours, respectively. Using automatically Resegmented contours based on a fixed intensity range was associated with a C-index of 0.63. These results illustrate the importance of using clean dedicated contours that are close to the true tumoral volume in radiomics studies, even when tumor contours are already available from radiotherapy treatment planning.
© 2022 The Authors.

Keywords:  Head and neck cancer; Radiomics; Survival analysis

Year:  2022        PMID: 35243026      PMCID: PMC8881196          DOI: 10.1016/j.ctro.2022.01.003

Source DB:  PubMed          Journal:  Clin Transl Radiat Oncol        ISSN: 2405-6308


Introduction

With the recent advances in computational science, precision medicine is moving one step closer to clinical practice. Radiomics allows quantitative analyses of radiological and nuclear medicine images through high-throughput feature extraction to obtain prognostic patient information [1]. Unlike biopsies, radiomics does not require invasive sampling inside the tumor. It can provide an exhaustive and quantitative evaluation of the lesion phenotype based on medical images that were acquired during diagnosis and the course of treatment. Established links between the radiomics features and outcomes of interest (e.g. staging, response to treatment) can be leveraged to assist clinical decisions prospectively. Radiomics features quantify the intensity, texture, and shape properties of provided Volumes of Interest (VOI) [2]. VOIs are necessary to focus the radiomics analysis on relevant biological structures, such as the tumoral volume. This contouring process, among others, is known to have a strong impact on the performance (e.g. precision, robustness) of the models [3]. Thus, the VOI must be as close as possible to the true tumoral volume if the latter is considered the main source of information concerning the targeted outcomes.

Related work

Radiomics studies on Head and Neck cancer (H&N) are based on various kinds of delineations to obtain the VOIs, including the direct reutilization of those used for radiotherapy planning, (semi-) automatically generated (e.g. based on metabolic activity thresholding), or dedicated to the study using expert manual contours. Combinations of approaches are also used in some cases, such as manual contouring refined using automatic re-segmentation[2]. Unfortunately, the delineation approach is often not clearly reported in the literature. Table 1 lists the types of delineation methods used in several H&N radiomics studies. The direct reutilization of VOIs created in the context of radiotherapy planning was used in [4], [5], [6], [7], [8]. This allows performing radiomics studies without the need for re-annotating the images specifically for these tasks. The contours made for radiotherapy are, however, very large as compared to the true tumoral volumes and frequently include non-tumoral tissues and parts of other organs (e.g. trachea, see Fig. 1).
Table 1

VOI delineation methods used in H&N radiomics studies.

| Authors | Delineation purpose | Delineation method | Imaging modalities |
|---|---|---|---|
| Castelli et al. 2019 [5] | radiotherapy | manual | PET/CT |
| Leger et al. 2019 [9] | radiotherapy | manual + re-segmentation | CT |
| Parmar et al. 2015 [16] | unknown | manual | CT |
| Zhang et al. 2008 [17] | unknown | semi-auto | Sonograms |
| Bogowicz et al. 2017a [11] | radiotherapy | manual + re-segmentation | CT |
| Leijenaar et al. 2018 [6] | radiotherapy | manual | CT |
| Al Ajmi et al. 2018 [18] | unknown | manual | Dual-energy CT |
| Wang et al. 2018 [19] | radiomics | manual | MRI |
| Zhang et al. 2017 [20] | radiomics | manual | MRI |
| Leijenaar et al. 2015 [7] | radiotherapy | manual | CT |
| Bogowicz et al. 2017b [12] | radiotherapy | manual (CT) + automatic (PET) | PET/CT |
| Vallières et al. 2017 [8] | radiotherapy | manual | PET/CT |
| Ouyang et al. 2017 [21] | radiotherapy | manual | MRI |
| Van Dijk et al. 2018 [4] | radiotherapy | manual | MRI |
| Wenbing et al. 2021 [10] | radiotherapy | manual | PET/CT |

Fig. 1

Example of VOI delineation: Radiotherapy (green), Resegmented (purple), and Dedicated (blue) overlaid on a fused FDG-PET/CT image. The blue contour is closer to the true volume of the primary tumor. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

A few recent studies (e.g. Leger et al. 2019 [9] and Wenbing et al. 2021 [10]) applied a re-segmentation step to the initial VOI to remove air and keep only soft tissue. Moreover, several studies, including Bogowicz et al. 2017a [11] and Bogowicz et al. 2017b [12], performed re-segmentation by manually removing slices that contain artifacts and excluding voxels outside the soft-tissue window based on Hounsfield Units (HU). The use of automatically generated segmentations for building deep and traditional prognostic models was evaluated in [13], [14], [15]. These studies compared manually and automatically generated VOIs and reported that fully automatic prognostic models achieved slightly better performance.

Beyond the specific domain of H&N radiomics, several studies investigated the stability of radiomics features with regard to VOI delineation. The tumor segmentation step is a critical stage of the radiomics workflow [22]. An accurate delineation is crucial for extracting relevant biomarkers within the VOI while avoiding the inclusion of peripheral non-informative regions or information from outside the tumoral site [10]. This matters all the more because most of the features extracted from the VOI are aggregated into a scalar value via an integrative operation [23], with a risk of decreasing the prognostic power of the features by diluting relevant localized patterns with unrelated tissue. In Depeursinge et al. 2015 [24], the authors used artificial contour perturbations and observed that their model for predicting lung adenocarcinoma recurrence remained stable as long as the VOI perturbations were under 4 mm. Other studies investigated the impact of inter-observer delineation variability on radiomics features [25], [26]. Both studies, each based on a single-center dataset, demonstrated that most radiomics features are unstable under delineation variations. Their results show that, for different kinds of tumor (e.g., H&N squamous cell carcinoma, non-small cell lung cancer, or malignant pleural mesothelioma), it is nevertheless possible to find a subset of stable features. However, the prognostic power of this subset was not studied. Huang et al. 2017 [27] observed that both the number of stable features with high prognostic value and their predictive value differed across delineations from three radiologists.

In this study, we evaluate and compare the Progression-Free Survival (PFS) prognostic performance of radiomics models based on two different VOI types. We use the Radiotherapy delineations that were used for treatment planning as well as Dedicated VOIs. The latter result from the manual re-segmentation of the initial Radiotherapy VOIs to fit the primary tumor as closely as possible, based on a fusion of FluoroDeoxyGlucose-Positron Emission Tomography (FDG-PET) and Computed Tomography (CT) images.

Material and methods

Patient data

The dataset used in this work includes the training and test sets of the HEad and NeCK TumOR segmentation in PET/CT images (HECKTOR) 2020 challenge [28], organized as a satellite event of the 23rd International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). The dataset was assembled from five centers and includes 239 cases. It contains PET/CT images of patients with H&N cancer located in the oropharynx region. The clinical characteristics of the dataset are detailed in Table 2.
Table 2

Overview of the dataset. The centers include Hôpital Général Juif (HGJ), Montréal, CA; Centre Hospitalier Universitaire de Sherbrooke (CHUS), Sherbrooke, CA; Hôpital Maisonneuve-Rosemont (HMR), Montréal, CA; Centre Hospitalier de l’Université de Montréal (CHUM), Montréal, CA; Centre Hospitalier Universitaire Vaudois (CHUV), CH.

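| Center | Patients | Male / Female | Age (avg.) | T1 / T2 / T3 / T4 | N0 / N1 / N2 / N3 | Follow-up (avg. days) | Events |
|---|---|---|---|---|---|---|---|
| HGJ | 55 | 43 / 12 | 62 | 12 / 18 / 16 / 9 | 7 / 7 / 39 / 2 | 1339 | 11 |
| CHUS | 71 | 50 / 21 | 62 | 6 / 36 / 17 / 12 | N0–N2: "19445"†; N3: 3 | 1246 | 13 |
| HMR | 18 | 14 / 4 | 69 | T1–T3: "02"†; T4: 8 | N0–N2: "1016"†; N3: 1 | 1274 | 4 |
| CHUM | 55 | 41 / 14 | 64 | 8 / 25 / 17 / 5 | 4 / 8 / 36 / 7 | 1120 | 7 |
| CHUV | 40 | 35 / 5 | 63 | 5 / 14 / 17 / 4 | N0–N2: "1024"†; N3: 3 | 705 | 7 |

† Digits are run together in the source and cannot be split into individual values unambiguously.
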
For each patient, a PET/CT image series and two contours of the primary Gross Tumor Volume (GTVt) are available. We refer to these two types of delineations as Radiotherapy and Dedicated. The former was made for radiotherapy planning by experts in radiotherapy. Details about these annotations can be found in [8], [28]. The Radiotherapy contours are potentially not suitable for radiomics studies as they are often larger than the true tumoral volume and include peripheral tissues and the trachea. For this reason, these contours were re-delineated as close as possible to the true tumoral volume in the context of the HECKTOR 2020 challenge [28]. The re-delineation aims at contouring the entire edges of the morphological anomaly, visualized as a mass effect in the non-enhanced CT, for the corresponding hypermetabolic volume in the PET. The contouring excludes the hypermetabolic activity projecting outside the physical limits of the lesion, e.g., into the lumen of the airway or into bony structures with no morphologic evidence of local invasion.

Feature extraction

In this section, we describe the extraction of features from the PET/CT images prior to model building. To compare the performance obtained with either Radiotherapy or Dedicated contours in the context of survival analysis, we used a classical radiomics pipeline. We first preprocessed both PET and CT images by resampling them to isotropic 2 × 2 × 2 mm voxels using linear interpolation. Following this preprocessing step, we extracted features from both PET and CT image series based on either Radiotherapy or Dedicated VOIs using the PyRadiomics library [29]. In addition, we extracted features from a Resegmented VOI derived from the Radiotherapy VOI. The re-segmentation was achieved by thresholding the CT images to the range [−300, 200] HU to keep only soft tissue. It was used to investigate the importance of expert knowledge when contouring the true tumoral volume, as compared to, e.g., simple removal of air and high-density tissue. An example of this new segmentation is illustrated (in purple) in Fig. 1.

Table 3 details the feature families and extraction parameters used in this study. From each of the two modalities (CT and PET), we extracted features from the first-order (18 features) and second-order (56 features) families. The 56 second-order features are divided into three subfamilies, namely Grey Level Co-occurrence Matrix (GLCM), Grey Level Run Length Matrix (GLRLM), and Grey Level Size Zone Matrix (GLSZM), and were extracted with two different binning strategies, Fixed Bin Number (FBN) and Fixed Bin Size (FBS) (as detailed in Table 3). This yields 18 + 2 × 56 = 130 features per modality. Finally, we computed 14 shape features. For each patient and for each contour type, we therefore computed a total of 2 × 130 + 14 = 274 features.
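The intensity-based re-segmentation described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `resegment_soft_tissue` and the flat-list voxel representation are assumptions made for brevity, and treating the bounds as inclusive is also an assumption. Only the [−300, 200] HU range comes from the text.

```python
def resegment_soft_tissue(ct_hu, mask, lower=-300.0, upper=200.0):
    """Keep only the VOI voxels whose CT value lies in [lower, upper] HU.

    ct_hu : sequence of CT intensities in Hounsfield Units, one per voxel
    mask  : sequence of booleans of the same length, True inside the VOI
    """
    if len(ct_hu) != len(mask):
        raise ValueError("ct_hu and mask must have the same number of voxels")
    # A voxel survives only if it was inside the original VOI AND its
    # intensity falls inside the soft-tissue window (bounds assumed inclusive).
    return [bool(inside) and lower <= hu <= upper
            for hu, inside in zip(ct_hu, mask)]


# Toy example: air, fat-like tissue, soft tissue, bone-like tissue, and one
# soft-tissue voxel that lies outside the original VOI.
ct = [-1000.0, -120.0, 40.0, 700.0, 35.0]
radiotherapy_mask = [True, True, True, True, False]
resegmented_mask = resegment_soft_tissue(ct, radiotherapy_mask)
print(resegmented_mask)  # [False, True, True, False, False]
```

The sketch makes the limitation discussed in the paper concrete: the threshold removes air and dense tissue, but any soft tissue inside the Radiotherapy VOI is kept whether or not it is tumoral.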
Table 3

List of the different combinations of parameters and features.

| Image | Preprocessing | Binning | Features |
|---|---|---|---|
| CT | Iso-resampling 2 × 2 × 2 mm, linear interpolation | FBN = 32; FBS = 50 | GLCM (24), GLRLM (16), GLSZM (16), First Order (18), Shape (14) |
| PET | Iso-resampling 2 × 2 × 2 mm, linear interpolation | FBN = 8; FBS = 1 | GLCM (24), GLRLM (16), GLSZM (16), First Order (18) |
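The feature counts quoted in the text can be checked against Table 3 with a few lines of bookkeeping. The variable names below are purely illustrative; the counts are those listed in the table.

```python
# Second-order (texture) features per binning strategy, as listed in Table 3.
TEXTURE_FEATURES = {"GLCM": 24, "GLRLM": 16, "GLSZM": 16}
FIRST_ORDER = 18
SHAPE = 14                          # computed once per VOI
BINNING_STRATEGIES = ("FBN", "FBS")
MODALITIES = ("CT", "PET")

texture_per_binning = sum(TEXTURE_FEATURES.values())
per_modality = FIRST_ORDER + texture_per_binning * len(BINNING_STRATEGIES)
total = per_modality * len(MODALITIES) + SHAPE

print(texture_per_binning, per_modality, total)  # 56 130 274
```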

Univariate analysis

To compare the two types of delineation, we first performed a univariable analysis to investigate the stability of radiomics features with regard to the type of VOI used. This analysis is independent of the radiomics model workflow. We computed the two-way mixed, single-measure Intraclass Correlation Coefficient (ICC(3,1)) [30] for every feature and for both modalities to assess their stability when extracted from either Dedicated or Radiotherapy VOIs. The ICC is a statistical indicator of the consistency of feature measurements. A value of zero indicates no reliability, whereas a value of one means that the measurements are perfectly stable. This univariable analysis reveals which kinds of features are most affected by a change of VOI. We also computed the univariable C-index of each feature to quantify its association with the PFS outcome, and further used these univariable C-indexes to select features for the multivariable model.
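As an illustration of the stability measure, the sketch below computes ICC(3,1) from its usual ANOVA formulation, ICC(3,1) = (MS_R − MS_E) / (MS_R + (k − 1) MS_E), where MS_R is the between-subject mean square, MS_E the residual mean square, and k the number of "raters" (here the two VOI types). The function name and the toy values are assumptions, and the authors may have used a library routine instead.

```python
def icc_3_1(measurements):
    """ICC(3,1): two-way mixed effects, consistency, single measure.

    measurements: list of rows, one row per patient and one column per
    VOI type (e.g. [radiotherapy_value, dedicated_value]).
    """
    n = len(measurements)       # number of subjects (patients)
    k = len(measurements[0])    # number of raters (VOI types)
    grand = sum(sum(row) for row in measurements) / (n * k)
    row_means = [sum(row) / k for row in measurements]
    col_means = [sum(row[j] for row in measurements) / n for j in range(k)]

    # Two-way ANOVA decomposition of the total sum of squares.
    ss_total = sum((x - grand) ** 2 for row in measurements for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical feature values for three patients under two VOI types.
feature_values = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(round(icc_3_1(feature_values), 3))  # 1.0
```

The toy case returns 1.0 because ICC(3,1) measures consistency rather than absolute agreement: a constant offset between the two VOI types leaves the patient ranking unchanged.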

Multivariable analysis

The pipeline of the multivariable radiomics analysis used to estimate the influence of using Radiotherapy or Dedicated contours on the PFS prediction performance is depicted in Fig. 2.
Fig. 2

Flow chart of the proposed radiomics analysis. Univariable steps are shown in green and multivariable analyses in gray. We repeated those steps 100 times with random splits to define training/validation (80%) and test (20%) sets using a stratified shuffle split method. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
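A stratified shuffle split of the kind mentioned in the caption can be sketched in plain Python as below. The function name, the seed handling, and the toy cohort are assumptions; in practice a library routine such as scikit-learn's `StratifiedShuffleSplit` provides the same behaviour, and the source does not state which implementation was used.

```python
import random


def stratified_shuffle_split(events, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), preserving the event rate in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    # Shuffle and split each outcome class separately.
    for label in sorted(set(events)):
        members = [i for i, e in enumerate(events) if e == label]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy cohort (hypothetical): 50 patients, 10 of whom progressed.
events = [1] * 10 + [0] * 40

# 100 repetitions, each with its own seed so that the same splits can be
# reused when comparing contour types.
splits = [stratified_shuffle_split(events, 0.2, seed=r) for r in range(100)]
train_idx, test_idx = splits[0]
print(len(train_idx), len(test_idx), sum(events[i] for i in test_idx))
```

Fixing the seed per repetition is what allows the two contour types to be evaluated on identical patient partitions, so that their C-indexes can be compared pairwise.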

First (1), we pooled the image data from the five centers and randomly divided them into a training/validation (80%) cohort and a testing (20%) cohort using a stratified shuffling method where the stratification criterion is the PFS outcome. This split was repeated 100 times, and we used the same splits in each repetition to statistically compare the results between the two contour types. Second (2), we computed the univariable C-index [31] of each feature on the training dataset and (3) transformed this value (i.e. |C-index − 0.5|) to keep both concordant and anti-concordant features. (4) We used the transformed C-index to rank the features by their concordance with the outcome and retained the top 20 features. The number 20 was chosen to respect a ten-to-one ratio between the number of patients and the number of features. We then used a grid search (5) to determine the feature correlation threshold t ∈ {0.60, 0.65, 0.70, 0.75, 0.80}; this thresholding avoids basing the models on highly correlated feature sets. Within the grid search, a stratified 5-fold cross-validation divided the training/validation cohort into training (80%) and validation (20%) subsets. Based on the resulting feature set, we trained a Cox proportional hazards model [32] (from scikit-survival [33] V0.14.1 in Python) on the training subset to predict the hazard score and computed the C-index on the validation subset as the performance measure of the survival analysis. After selecting the best performing model during the grid search, (6) we applied it to the test set and (7) computed the test C-index. The code used to run this pipeline is available on GitHub (https://github.com/Pierre1d6/CleanedContours.git).
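Steps (2) to (5) of the pipeline can be illustrated with the sketch below. It is not taken from the authors' repository: the function names are hypothetical, the C-index ignores tied event times for simplicity, and the use of Pearson correlation with a greedy "keep the higher-ranked feature" rule is an assumption, since the text does not specify how the correlation threshold is applied.

```python
import math


def harrell_c_index(times, events, scores):
    """Harrell's C-index; a higher score is taken to mean a higher risk."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def rank_features(features, times, events, top_k=20):
    """Rank feature names by |C-index - 0.5| and keep the top_k."""
    strength = {name: abs(harrell_c_index(times, events, values) - 0.5)
                for name, values in features.items()}
    return sorted(strength, key=strength.get, reverse=True)[:top_k]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var) if var > 0 else 0.0


def drop_correlated(ranked_names, features, threshold):
    """Greedy filter: keep a feature only if |r| < threshold with all kept ones."""
    kept = []
    for name in ranked_names:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data (hypothetical): four patients, one of them censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
features = {"a": [4, 3, 2, 1], "b": [1, 2, 3, 4], "c": [1, 3, 2, 2]}

ranked = rank_features(features, times, events, top_k=20)
selected = drop_correlated(ranked, features, threshold=0.6)
print(ranked, selected)
```

In the toy data, features `a` and `b` are perfectly concordant and anti-concordant with the outcome, so both reach the maximal |C-index − 0.5| of 0.5. Because they are also perfectly anti-correlated with each other, the filter keeps only the first one.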

Results

Influence of VOI types on feature stability

We first compared the stability of the features across the Radiotherapy and Dedicated types of VOIs, grouping features based on their family and image modality. The significance of stability comparisons between feature families, imaging modalities, and VOI types is assessed using a Student t-test. The associated results are detailed in Fig. 3. We observe that features from PET images are more stable than those from CT images (p < 0.001, see Fig. 3a). When further looking at stability differences between feature families, we observe that shape features are the most stable across the five families with a median ICC around 0.7. Fig. 3b confirms the better stability of features regardless of their family when extracted from PET images. GLSZM features achieved the lowest stability (median ICC3 < 0.4) both in PET and CT images. These observations are further interpreted in Section 5.
Fig. 3

Feature stability comparison when extracted from either Radiotherapy or Dedicated VOIs.

Multivariable prognostic models

We applied the multivariable radiomics workflow described in Section 3.4 and report the results in Fig. 4.
Fig. 4

C-index values for the three VOI types. These results are obtained from 100 repetitions of the radiomics pipeline depicted in Fig. 2.

Discussions and conclusion

In this work, we studied the impact of using Dedicated VOIs that are specifically fitted to the GTVt in the context of H&N radiomics studies in PET/CT, as compared to reusing VOIs directly from radiotherapy treatment planning.

We first investigated the stability of the features with regard to their family and imaging modality. Fig. 3a and 3c suggest that the features are overall more stable when computed on PET images. This can be explained by the difference in value range between PET (≈[0, 25] Standardized Uptake Value, SUV) and CT (≈[−1000, 1000] HU when including air from the trachea). Including peritumoral regions therefore has a stronger impact on features extracted from the CT images, because the air contained in the trachea around the GTVt has values that are much lower, relative to the voxel statistics inside the GTVt, in CT (−1000 HU) than in PET (0 SUV). In addition, spatial deviations of the contours result in smaller differences in PET because of its lower resolution when compared to CT.

Fig. 3c reports the stability of features per family and across modalities (PET or CT). In PET, first-order features show a high median value and high variability. When focusing on specific first-order features, we observed that the maximum was the most stable feature (ICC3 = 0.98) because there is no high SUV activation around the tumor and the maximum SUV is almost always contained in both VOIs. However, the minimum was one of the least stable features (ICC3 = 0.2), which can be explained by the fact that the Radiotherapy VOI is generally larger than the Dedicated VOI and therefore includes lower SUV values. Regarding the second-order families, the GLCM, GLRLM, and GLSZM feature sets were all overall unstable (see Fig. 3b). When looking closely at Fig. 3c, however, the stability was higher in PET images, particularly for GLCM and GLRLM features. For GLSZM, the stability was mostly low in both imaging modalities. No specific parameter optimization was performed in the feature extraction step, so the use of default parameters may explain the poor stability of those texture features.

In this context of H&N cancer, we observed that survival models based on Dedicated contours achieved better performance for predicting PFS and led to improved patient risk stratification in comparison to using Radiotherapy contours. It is worth noting that the standard uncorrected Student's t-test yielded a p-value close to 0 (8.51·10⁻⁸). We feel that reporting the latter is important as many studies in the field do not use corrections, although the repeated random splits contain overlapping observations, which breaks the independence assumption of the t-test. Therefore, following Benavoli et al. [34], we used a Bayesian approach to assess the significance of the performance difference between the two models. We computed the probability density function of the difference between the results of each model (C-index with Dedicated contours − C-index with Radiotherapy contours). We then calculated the integral of the posterior over the interval (0, +∞) and obtained a value of 0.893. In other words, the probability of the Dedicated VOI model being more accurate (in terms of C-index) than the Radiotherapy VOI model is 89.3%, suggesting that roughly nine times out of ten, a model based on Dedicated VOIs will outperform a model based on Radiotherapy VOIs. Using this more appropriate approach, we conclude from the statistical analysis that the use of Dedicated VOIs significantly improved the prediction performance.

It is also worth noting that the cleaning process was based on manual re-segmentation and may not be suitable for large-scale studies. We estimated a duration of 20 to 30 min to perform the VOI cleaning for one patient. Moreover, adding an automatic re-segmentation step (Resegmented VOIs) based on fixed ranges of values did not improve the overall performance. The average C-index was higher than with the Radiotherapy VOIs, but the Inter Quartile Range (IQR) was almost two times larger and the average was lower.

We also recognize some limitations of this work. First, the workflow proposed in this study may not be fully optimized for this task. As an example, we did not explore filter-based radiomics features [35], [36]. Liu et al. [37] and other studies reported better predictive performance when modeling PFS in H&N cancer. However, while the performance cannot be directly compared, the goal of this study was not to find the best model to predict PFS but to focus on the performance comparison between Dedicated and Radiotherapy contours using the classical radiomics approach. In future work, we will apply this workflow to combine clinical patient data (e.g. age, gender, smoking status, tumor site) with radiomics features in order to further improve the prognostic performance of the model.
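The Bayesian comparison described in the discussion can be illustrated with the Bayesian correlated t-test from the line of work of Benavoli et al. It is an assumption that this exact variant matches what the authors computed; they only state that they followed [34] and integrated the posterior over (0, +∞). In this formulation, the posterior of the mean difference is a Student t distribution with n − 1 degrees of freedom, centred on the sample mean, with squared scale (1/n + ρ/(1 − ρ))·σ̂², where ρ is the fraction of data used for testing. The function names and the toy differences below are hypothetical.

```python
import math
from statistics import mean, variance


def student_t_pdf(x, dof):
    """Density of the standard Student t distribution."""
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(dof * math.pi))
    return math.exp(log_norm - (dof + 1) / 2 * math.log1p(x * x / dof))


def student_t_cdf(x, dof, steps=2000):
    """CDF via Simpson's rule on [0, x], using the symmetry CDF(0) = 0.5."""
    h = x / steps  # steps must be even for Simpson's rule
    total = student_t_pdf(0.0, dof) + student_t_pdf(x, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_t_pdf(i * h, dof)
    return 0.5 + total * h / 3


def prob_first_better(differences, test_fraction):
    """Posterior probability that the mean of `differences` is positive."""
    n = len(differences)
    x_bar = mean(differences)
    s2 = variance(differences)  # sample variance (n - 1 denominator)
    if s2 == 0:
        raise ValueError("differences have zero variance")
    rho = test_fraction
    # Correlation-corrected scale: overlapping training sets make the
    # repeated results dependent, which inflates the posterior variance.
    scale = math.sqrt((1 / n + rho / (1 - rho)) * s2)
    return student_t_cdf(x_bar / scale, n - 1)


# Hypothetical per-split differences (C-index Dedicated - C-index Radiotherapy).
differences = [0.08, 0.05, -0.02, 0.10, 0.07, 0.01, 0.09, 0.04]
print(round(prob_first_better(differences, test_fraction=0.2), 3))
```

Compared with an uncorrected t-test, the additional ρ/(1 − ρ) term widens the posterior, which is one way to see why a very small uncorrected p-value can coexist with a more moderate posterior probability.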

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
  30 in total

1.  Parotid gland fat related Magnetic Resonance image biomarkers improve prediction of late radiation-induced xerostomia.

Authors:  Lisanne V van Dijk; Maria Thor; Roel J H M Steenbakkers; Aditya Apte; Tian-Tian Zhai; Ronald Borra; Walter Noordzij; Cherry Estilo; Nancy Lee; Johannes A Langendijk; Joseph O Deasy; Nanna M Sijtsema
Journal:  Radiother Oncol       Date:  2018-06-26       Impact factor: 6.280

2.  Computed Tomography Radiomics Predicts HPV Status and Local Tumor Control After Definitive Radiochemotherapy in Head and Neck Squamous Cell Carcinoma.

Authors:  Marta Bogowicz; Oliver Riesterer; Kristian Ikenberg; Sonja Stieb; Holger Moch; Gabriela Studer; Matthias Guckenberger; Stephanie Tanadini-Lang
Journal:  Int J Radiat Oncol Biol Phys       Date:  2017-06-15       Impact factor: 7.038

3.  Comparison of PET and CT radiomics for prediction of local tumor control in head and neck squamous cell carcinoma.

Authors:  Marta Bogowicz; Oliver Riesterer; Luisa Sabrina Stark; Gabriela Studer; Jan Unkelbach; Matthias Guckenberger; Stephanie Tanadini-Lang
Journal:  Acta Oncol       Date:  2017-08-18       Impact factor: 4.089

4.  Spectral multi-energy CT texture analysis with machine learning for tissue classification: an investigation using classification of benign parotid tumours as a testing paradigm.

Authors:  Eiman Al Ajmi; Behzad Forghani; Caroline Reinhold; Maryam Bayat; Reza Forghani
Journal:  Eur Radiol       Date:  2018-01-02       Impact factor: 5.315

5.  Radiomics Features of Multiparametric MRI as Novel Prognostic Factors in Advanced Nasopharyngeal Carcinoma.

Authors:  Bin Zhang; Jie Tian; Di Dong; Dongsheng Gu; Yuhao Dong; Lu Zhang; Zhouyang Lian; Jing Liu; Xiaoning Luo; Shufang Pei; Xiaokai Mo; Wenhui Huang; Fusheng Ouyang; Baoliang Guo; Long Liang; Wenbo Chen; Changhong Liang; Shuixing Zhang
Journal:  Clin Cancer Res       Date:  2017-03-09       Impact factor: 12.531

6.  Computational Radiomics System to Decode the Radiographic Phenotype.

Authors:  Joost J M van Griethuysen; Andriy Fedorov; Chintan Parmar; Ahmed Hosny; Nicole Aucoin; Vivek Narayan; Regina G H Beets-Tan; Jean-Christophe Fillion-Robin; Steve Pieper; Hugo J W L Aerts
Journal:  Cancer Res       Date:  2017-11-01       Impact factor: 12.701

7.  CT-based radiomic signatures for prediction of pathologic complete response in esophageal squamous cell carcinoma after neoadjuvant chemoradiotherapy.

Authors:  Zhining Yang; Binghui He; Xinyu Zhuang; Xiaoying Gao; Dandan Wang; Mei Li; Zhixiong Lin; Ren Luo
Journal:  J Radiat Res       Date:  2019-07-01       Impact factor: 2.724

8.  The importance of feature aggregation in radiomics: a head and neck cancer study.

Authors:  Pierre Fontaine; Oscar Acosta; Joël Castelli; Renaud De Crevoisier; Henning Müller; Adrien Depeursinge
Journal:  Sci Rep       Date:  2020-11-12       Impact factor: 4.379

9.  Radiomics: Images Are More than Pictures, They Are Data.

Authors:  Robert J Gillies; Paul E Kinahan; Hedvig Hricak
Journal:  Radiology       Date:  2015-11-18       Impact factor: 11.105

10.  Radiomics strategies for risk assessment of tumour failure in head-and-neck cancer.

Authors:  Martin Vallières; Emily Kay-Rivest; Léo Jean Perrin; Xavier Liem; Christophe Furstoss; Hugo J W L Aerts; Nader Khaouam; Phuc Felix Nguyen-Tan; Chang-Shu Wang; Khalil Sultanem; Jan Seuntjens; Issam El Naqa
Journal:  Sci Rep       Date:  2017-08-31       Impact factor: 4.379
