Glioblastoma is an aggressive and fast-growing brain tumor with poor prognosis. Predicting the expected survival of patients with glioblastoma is a key task for efficient treatment and surgery planning. Survival predictions could be enhanced by means of a radiomic system. However, these systems demand high numbers of multicontrast images, the acquisitions of which are time consuming, giving rise to patient discomfort and low healthcare system efficiency. Synthetic MRI could favor deployment of radiomic systems in the clinic by allowing practitioners not only to reduce acquisition time, but also to retrospectively complete databases or to replace artifacted images. In this work we analyze the replacement of an actually acquired MR weighted image by a synthesized version to predict survival of glioblastoma patients with a radiomic system. Each synthesized version was realistically generated from two acquired images with a deep learning synthetic MRI approach based on a convolutional neural network. Specifically, two weighted images were considered for the replacement one at a time, a T2w and a FLAIR, which were synthesized from the pairs T1w and FLAIR, and T1w and T2w, respectively. Furthermore, a radiomic system for survival prediction, which can classify patients into two groups (survival >480 days and ≤ 480 days), was built. Results show that the radiomic system fed with the synthesized image achieves similar performance compared with using the acquired one, and better performance than a model that does not include this image. Hence, our results confirm that synthetic MRI does add to glioblastoma survival prediction within a radiomics-based approach.
Abbreviations: AUC, area under the curve; BraTS, multimodal brain tumor segmentation; CNN, convolutional neural network; ED, edema; ET, enhancing tumor; ICC, intra-class correlation coefficient; IQR, interquartile range; IR-SE, inversion recovery spin echo; KPS, Karnofsky performance status; LR, logistic regression; MAE, mean absolute error; mRMR, maximum relevance–minimum redundancy; MSE, mean squared error; NET, non-enhancing tumor; PD, proton density; PET, positron emission tomography; PSNR, peak signal to noise ratio; ROI, region of interest; RS, radiomic system; SE, spin echo; SP, survival prediction; SSIM, structural similarity index; SVM, support vector machine; TC, tumor core; TCIA, The Cancer Imaging Archive; T1w, T1 weighted; T1w-c, T1w post-contrast; T2w, T2 weighted; WT, whole tumor; XGB, extreme gradient boosting.
INTRODUCTION
Glioblastoma is an aggressive and fast-growing brain tumor, and the most common kind of malignant brain tumor.
Recently, several advances have been achieved in precision oncology and immunotherapy.
However, the overall survival remains poor, with approximately 40% survival in the first year after diagnosis and 17% in the second year.
Thus, survival prediction (SP) in glioblastoma prognosis is a key task for efficient treatment and surgery planning.

Factors commonly used for SP are age, sex, extent of resection, or Karnofsky performance status (KPS)—a functional status metric for activities of daily living.
Although these factors have shown ability to predict survival rates, their results are still poor and nowadays there is no clinical standard based on these metrics.
Moreover, some of these factors are subjective (e.g., KPS), so clinicians are forced to rely on their previous experience and intuition. Therefore, a quantitative predictive tool is still an unmet need.

Radiomics and medical images are the two major actors needed to achieve such a quantitative predictive tool. Radiomics is a discipline that consists of the extraction of a large number of quantitative features from the images, the selection of a subset of them according to some quality criteria, and the design of an inference engine that carries out the clinical predictions with the selected features as inputs. Radiomics has proven a useful technique, pioneered in the oncology field, for quantitative decision making in diagnosis, prognosis, and therapeutic response.
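To make the feature-extraction step concrete, the sketch below computes a few typical first-order intensity features from a segmented ROI with NumPy. This is a minimal illustration under assumed conventions: the feature names, the toy image, and the histogram-based entropy are illustrative and do not correspond to the actual feature set used in this work.

```python
import numpy as np

def first_order_features(image, mask, n_bins=32):
    """Compute a few illustrative first-order radiomic features from a
    segmented ROI. `image` is an intensity volume and `mask` a boolean ROI."""
    roi = image[mask.astype(bool)]
    hist, _ = np.histogram(roi, bins=n_bins)
    p = hist / hist.sum()          # discrete intensity distribution
    p = p[p > 0]                   # drop empty bins before taking logs
    return {
        "mean": float(roi.mean()),
        "std": float(roi.std()),
        "skewness": float(((roi - roi.mean()) ** 3).mean() / roi.std() ** 3),
        "entropy": float(-(p * np.log2(p)).sum()),  # histogram entropy
    }

# Toy example: a 2D "image" with a bright square lesion as the ROI
img = np.zeros((16, 16))
img[4:10, 4:10] = np.random.default_rng(0).normal(2.0, 0.3, (6, 6))
msk = img > 0
feats = first_order_features(img, msk)
```

Real radiomic systems extract hundreds of such features (first-order, shape, and texture) per ROI and per weighted image, which is what makes the subsequent feature-selection step necessary.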
In particular, radiomic systems (RSs) for SP in glioblastoma could enhance patient management by treatment personalization, giving rise to better outcomes.

Regarding the second actor, MRI is a widely used medical imaging technique in different fields—and, particularly, in neuro-oncology—for non-invasive diagnosis and evaluation of disease progression. An MRI scan protocol typically consists of several pulse sequences that provide images with different contrasts, each of which is referred to as a weighted image; collectively, they are referred to as multicontrast images. Each image weighting provides complementary information for diagnosis, since different tissue properties may be more clearly visible in each of them.
The use of multicontrast images is also crucial for radiomics. However, multicontrast acquisitions are time consuming, which results in patient discomfort and artifact-prone protocols. For protocol shortening, the emerging field of synthetic MRI lends itself to become a cornerstone.

Synthetic MRI comprises methodologies pursued to computationally synthesize realistic MR images from a set of actually acquired images. This discipline has been boosted by deep learning techniques. Synthetic MRI has many potential applications in image database management, such as retrospective completion of databases with missing weighted images or replacement of artifacted images. Additional applications, such as data harmonization or data augmentation for segmentation or classification algorithms, have recently been proposed. As for multicontrast acquisitions, synthetic MRI may be applied to replace some acquisitions with their synthesized versions, leading to MR protocol shortening, improved patient wellbeing, and higher efficiency. To make this possible, synthesized MR images should be of sufficient quality, which is usually assessed both visually and with specific image quality assessment metrics. However, it is also essential to measure how these synthesized images perform with quantitative algorithms.

In this work we propose and quantify the application of synthetic MRI to improve a radiomics approach for SP in glioblastoma; both the RS and the image synthesis method are original, and the details of their design are fully described. Our purpose is to show that an RS that incorporates an input channel fed by a synthesized image (a) behaves similarly to the same system fed with an acquired image and (b) clearly outperforms an RS that does not have this channel. Hence, we validate an MR protocol shortening procedure by means of a glioblastoma SP radiomics-based application. Two weighted images are considered for the synthesis, namely, FLAIR and T2 weighted (T2w). We synthesize these images by means of an improved version of our previous deep learning approach for relaxometry map synthesis.
Our results allow us to state that synthetic MRI does add to glioblastoma SP within a radiomics‐based approach.
Related work
The quality assessment of synthesized images generally focuses on their usage in qualitative applications, and it remains to be verified whether quantitative algorithms can reliably work with these synthesized images. In particular, to the best of our knowledge, the performance of synthesized images with a clinical endpoint in radiomics applications has not been thoroughly tested.

Recent works make use of synthesized images to improve subsequent quantitative image analysis algorithms. In Reference an MRI synthesis method was presented, and the quality of a tumor segmentation algorithm was used as a benchmark to compare different synthesis procedures. Furthermore, in Reference a generative adversarial network was trained for MRI synthesis and then used for data augmentation in Parkinson's disease classification. The usage of synthesized data showed an improvement in classification performance. Moreover, Sikka et al proposed the synthesis of positron emission tomography (PET) images from MRI. The synthesized PET images were then included in an Alzheimer's disease classification task, for which a relevant accuracy improvement was also shown.

Regarding the application of synthesized images in glioblastoma, we are aware of two contributions that have used synthesized images as input. In Reference image synthesis is used to fill missing contrasts in MRI glioma databases. The overall performance is measured by means of tumor segmentation accuracy, as well as by the merit figures of two RSs, one for tumor grading and the other for isocitrate dehydrogenase-1 (IDH1) status prediction. Completion of the databases with synthesized images produced an improvement in the results of these tasks. However, synthesized images are apparently used for both training and testing of the RSs, which implies a coupling between these systems and the corresponding synthesis method.

Islam et al proposed a radiogenomics system for overall SP in glioblastoma using an MRI synthesis method to complete the missing images in the dataset. Tumor segmentation and overall SP were tested with the synthesized data. Nevertheless, the classifiers themselves were trained on the synthesized images; hence, this methodology is a data augmentation procedure, and benefits in SP may arise not only from the quality of the synthesized images but also from the fact that more data are used, as pursued in data augmentation. Hence, this methodology does not provide sufficient evidence that acquired images can be replaced by their synthesized versions in a quantitative radiomic application, since the two effects are coupled. Furthermore, the authors predominantly use morphological characteristics rather than intensity or texture-based features, although the latter are more directly related to the image intensity values and thus better suited to measure the impact of the synthesized images. These limitations are overcome in the present work, since our RS is solely trained with acquired images, while synthesized images are exclusively used for testing. In addition, intensity and texture features are also considered in the RS that we propose.
MATERIALS AND METHODS
In this work we propose the application of synthetic MRI to improve an RS for SP in glioblastoma. Following the workflow provided in Figure 1, Section 2.1 describes the datasets used in this work and Section 2.2 the data preprocessing stage. Feature extraction and selection, as well as classifier training, are described in Section 2.3. Section 2.4 describes the procedure for the synthesis of weighted images, which enter the RS exclusively at test time through the feature calculation block. Finally, Section 2.5 describes the three experiments we have carried out to obtain Results I, II, and III listed in Figure 1.
FIGURE 1
Workflow of the proposed approach. Initially, patients are divided into training (175 patients) and testing (24 patients) sets. The preprocessing pipeline segments tumors and normalizes the contrast intensity. Features are retrieved from the segmented regions of interest (ROIs). After feature selection, relevant features are retained. Five machine learning models for SP were examined. Three different experiment configurations (referred to throughout the text as Experiments/Results I, II, and III) were defined for comparative performance assessment
Datasets and MR image acquisition
Four different datasets of glioblastoma patients were used in this work. Two of them are publicly available, namely, the BraTS2020 (multimodal brain tumor segmentation) Challenge dataset and the datasets reachable through TCIA (The Cancer Imaging Archive)—which, in turn, consist of three sources, namely, the Ivy Glioblastoma Atlas Project (Ivy-GAP), the Clinical Proteomic Tumor Analysis Consortium Glioblastoma Multiforme (CPTAC), and The Cancer Genome Atlas Glioblastoma Multiforme (TCGA). The other two (Dataset22 and Dataset24) are private datasets acquired in Hospital Universitario Río Hortega, Valladolid, Spain, and Hospital Universitario 12 de Octubre, Madrid, Spain, respectively. Details of the datasets can be found in Table 1. From all the datasets, we only included those patients (199 in total) in whom gross total resection (100% of the enhancing tumor (ET) volume) or near-total resection (>95% of the ET volume) could be performed. The cases selected from the public datasets are referenced in Supporting Information Table S1.
TABLE 1
Datasets used in this work. BraTS2020 and TCIA are public datasets, whereas Dataset22 and Dataset24 are private datasets. The number of patients in each dataset is denoted by n. Age is shown as mean ± standard deviation. Survival is defined as the time in days from diagnosis to death (censored = 0) or to the last date the patient was known to be alive (censored = 1). The percentages of patients with survival less than 16 months (survival < 16 M; 16 months or, equivalently, 480 days) are also displayed
| Dataset       | n   | Age     | Survival (IQR) | % Censored = 1 | Survival < 16 M |
|---------------|-----|---------|----------------|----------------|-----------------|
| BraTS2020     | 119 | 62 ± 12 | 374 (364)      | 0%             | 65.6%           |
| TCIA          | 34  | 60 ± 10 | 521 (482)      | 5.9%           | 58.8%           |
| Dataset22     | 22  | 65 ± 10 | 451 (307)      | 22.7%          | 59.1%           |
| Dataset24     | 24  | 57 ± 13 | 552 (218)      | 29.2%          | 54.2%           |
| Whole dataset | 199 | 60 ± 11 | 447 (346)      | 7.0%           | 62.3%           |
For each patient, four MR structural weighted images—T1 weighted (T1w), T2w, FLAIR, and T1w post-contrast (T1w-c)—were available. All the acquisitions of the private datasets were performed with IRB approval and informed written consent. See Table 2 for details of the acquisition parameters.
TABLE 2
All MRI sessions are composed of four structural weighted images, namely, a T1w, a T2w, a FLAIR, and a T1w-c. Details of the scanner and the acquisition parameters are provided if available. The acquisition parameters are echo time (TE), repetition time (TR), and inversion time (TI)
| Dataset   | Scanner                    | T1w                              | T2w                              | FLAIR                                            | T1w-c                            |
|-----------|----------------------------|----------------------------------|----------------------------------|--------------------------------------------------|----------------------------------|
| BraTS2020 | 19 institutions            | NA                               | NA                               | NA                                               | NA                               |
| TCIA      | 8 institutions             | TE = 2.75–19 ms, TR = 352–3379 ms | TE = 15–120 ms, TR = 700–6370 ms | TE = 34.6–155 ms, TR = 6000–11 000 ms            | TE = 2.1–20 ms, TR = 4.9–3285 ms |
| Dataset22 | 1.5 T GE and 1.5 T Philips | TE = 6.33–12 ms, TR = 360–800 ms | TE = 99–110 ms, TR = 2680–8480 ms | TE = 120–127 ms, TR = 6000–8000 ms, TI = 2000 ms | TE = 2.56 ms, TR = 7.96 ms       |
| Dataset24 | 1.5 T GE                   | TE = 1.83 ms, TR = 5.98 ms       | TE = 122 ms, TR = 4162 ms        | TE = 142 ms, TR = 9350 ms, TI = 2200 ms          | TE = 2.18 ms, TR = 6.85 ms       |
Data preprocessing
All weighted images were first co-registered to 1 mm³ isotropic resolution and skull-stripped following the BraTS preprocessing in CaPTk. Then, denoising followed by N4 bias correction was performed in order to obtain the white matter and tumor segmentations. nnU-Net was utilized to segment the tumor into three distinct regions (ET, non-enhancing tumor (NET), and edema (ED)) for feature extraction. In parallel, N4 bias correction was applied to the skull-stripped images, which were then normalized by dividing each by the mean intensity of the white matter region contralateral to the tumor. This latter pipeline produces the images input to the RS and to the synthesis procedure. Figure 2 depicts the preprocessing pipeline.
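The contralateral white matter normalization can be sketched as follows. This is a simplified NumPy illustration under loud assumptions (the left-right axis is axis 0 and the midline splits the volume in half), not the actual implementation, which works on the registered atlas geometry.

```python
import numpy as np

def normalize_by_contralateral_wm(image, wm_mask, tumor_mask):
    """Divide image intensities by the mean intensity of white matter
    contralateral to the tumor. Assumes axis 0 is the left-right axis
    and that the midline splits the volume in half (illustrative only)."""
    midline = image.shape[0] // 2
    # Determine which hemisphere contains most of the tumor
    tumor_left = tumor_mask[:midline].sum() >= tumor_mask[midline:].sum()
    contra = np.zeros_like(wm_mask, dtype=bool)
    if tumor_left:
        contra[midline:] = wm_mask[midline:]   # use right-hemisphere WM
    else:
        contra[:midline] = wm_mask[:midline]   # use left-hemisphere WM
    wm_mean = image[contra].mean()
    return image / wm_mean

# Toy 2D example: uniform "image", WM everywhere, tumor in the top half
img = np.full((4, 4), 2.0)
wm = np.ones((4, 4), dtype=bool)
tumor = np.zeros((4, 4), dtype=bool)
tumor[0, 0] = True
normalized = normalize_by_contralateral_wm(img, wm, tumor)
```

Normalizing by healthy contralateral tissue avoids biasing the reference intensity with tumor-infiltrated white matter.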
FIGURE 2
Preprocessing pipeline. Initial images are first co‐registered and skull‐stripped following the CaPTk pipeline. Afterwards, the different contrast images are denoised and bias‐corrected before obtaining the white matter and tumor segmentations. Finally, skull‐stripped images are bias‐corrected and intensity normalized using the segmentations
RS for SP of glioblastoma patients
Our RS was trained to classify patients according to the survival criterion (survival > 480 days vs. ≤ 480 days). The threshold of 480 days (i.e., 16 months) was chosen in order to achieve a balance between group sizes in the test dataset. BraTS2020, TCIA, and Dataset22 (175 patients in total, see Table 1) were used to train the RS. Dataset24 (24 patients) was used for testing in coordination with the synthesis method described in Section 2.4.

Starting from a total of 117 088 handcrafted features extracted from the weighted images, we trained the RS following a nested cross-validation scheme (outer, 5-fold; inner, 10-fold). Feature selection methods were repeated in each outer split to reduce the possible bias produced if training were done on a single cross-validation split. For each outer split, the model with the lowest Brier loss in the inner split was chosen. Note that five models were selected for the subsequent screening due to the fivefold decision of the outer split. Each of these models was then validated with the validation data corresponding to its outer split. Finally, the model with the best performance, in terms of area under the curve (AUC), was selected. The complete pipeline is outlined in Appendix A.

Using the methodology previously outlined, the best model for each of the three scenarios described below was chosen:

- When the four weighted images (i.e., T1w, T2w, FLAIR, and T1w-c) were used as input of the RS, the selected model turned out to be an extreme gradient boosting (XGB) classifier with 17 features, two of which are from FLAIR and two from T2w.
- When the channel fed with FLAIR was discarded, the resulting model was a logistic regression (LR) classifier with 16 features.
- When the channel fed with T2w was discarded, a support vector machine (SVM) classifier with 16 features was selected.

Hereinafter, these three models are termed XGB17, LR16, and SVM16, respectively. The features selected for each of these models are listed in Supporting Information Tables S4, S5, and S6, respectively.
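The nested selection scheme described above can be sketched schematically with NumPy. This is not the actual pipeline (which tunes XGB, LR, and SVM classifiers over thousands of features): the two constant-probability "models" below are purely illustrative stand-ins, and the point of the sketch is the outer/inner split structure and the Brier-loss criterion.

```python
import numpy as np

def brier_loss(y_true, p_pred):
    """Brier loss: mean squared difference between predicted
    probabilities and binary outcomes (lower is better)."""
    return float(np.mean((p_pred - y_true) ** 2))

def kfold_indices(n, k, seed=0):
    """Shuffled k-fold split; returns (train_idx, val_idx) pairs."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    return [(np.concatenate(folds[:i] + folds[i + 1:]), folds[i])
            for i in range(k)]

# Toy balanced labels (survival > 480 days or not) and two candidate
# constant-probability predictors standing in for the real classifiers
y = np.array([0, 1] * 50)
candidates = {"p=0.3": 0.3, "p=0.5": 0.5}

selections = []
for outer_train, _outer_val in kfold_indices(len(y), k=5):
    y_tr = y[outer_train]
    inner_losses = {}
    for name, p in candidates.items():
        losses = [brier_loss(y_tr[val], np.full(len(val), p))
                  for _, val in kfold_indices(len(y_tr), k=10, seed=1)]
        inner_losses[name] = float(np.mean(losses))
    # Keep the candidate with the lowest inner Brier loss for this fold
    selections.append(min(inner_losses, key=inner_losses.get))
```

In the real pipeline each outer-fold winner is then screened on its held-out outer validation split, and the final model is chosen by AUC, as described above.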
Synthesis method using a self‐supervised CNN
In Reference we proposed a synthetic MRI approach for the computation of the T1, T2, and proton density (PD) relaxometry maps and the synthesis of different weighted images from only a pair of inputs. A U-Net convolutional neural network (CNN) trained with synthetic data was employed. However, some synthesized weightings, such as FLAIR, presented relatively low quality, presumably due to the exclusively synthetic training. Thus, if only a few image contrasts are of interest, this issue can be overcome by extending the CNN into a self-supervised CNN trained with acquired images with the desired contrast. Such an extension has been performed in this work and is graphically represented in Figure 3.
FIGURE 3
Overview of the self‐supervised CNN. The inputs of the network are T1w and T2w for the synthesis of FLAIR, and T1w and FLAIR for the synthesis of T2w. Note that all the switches change depending on the weighting we want to synthesize. The lambda layers implement the ideal equations indicated in Appendix B
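The lambda layers implement ideal signal equations relating the relaxometry maps to weighted-image intensities. As a hedged sketch, the functions below encode the textbook spin-echo (SE) and inversion-recovery spin-echo (IR-SE) expressions commonly used for this purpose; the exact equations in Appendix B may differ in detail, and the tissue parameters in the example are illustrative values only (TE/TR/TI follow the Dataset24 FLAIR protocol of Table 2).

```python
import numpy as np

def se_signal(pd, t1, t2, te, tr):
    """Ideal spin-echo (SE) signal:
    S = PD * (1 - exp(-TR/T1)) * exp(-TE/T2)."""
    return pd * (1.0 - np.exp(-tr / t1)) * np.exp(-te / t2)

def ir_se_signal(pd, t1, t2, te, tr, ti):
    """Ideal inversion-recovery spin-echo (IR-SE) signal, the form
    commonly used for FLAIR, where TI is chosen to null the CSF:
    S = PD * |1 - 2*exp(-TI/T1) + exp(-TR/T1)| * exp(-TE/T2)."""
    return pd * np.abs(1.0 - 2.0 * np.exp(-ti / t1)
                       + np.exp(-tr / t1)) * np.exp(-te / t2)

# Illustrative tissue parameters (times in ms): white matter vs. CSF.
# With a FLAIR-like TI, the long-T1 CSF signal is strongly attenuated.
flair_wm = ir_se_signal(pd=0.7, t1=800.0, t2=80.0,
                        te=142.0, tr=9350.0, ti=2200.0)
flair_csf = ir_se_signal(pd=1.0, t1=4000.0, t2=2000.0,
                         te=142.0, tr=9350.0, ti=2200.0)
```

Because these layers are fixed analytic functions of the decoder outputs and the acquisition parameters, they carry no trainable weights, which is what allows the network to be trained directly against acquired weighted images.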
The original CNN was configured with two encoders for the two input weighted images, namely, T1w and T2w or T1w and FLAIR (in the case of synthesizing FLAIR or T2w, respectively). Then, the latent representations of each encoder were fused using a pixel-wise max function. Finally, we configured three decoders for the generation of the T1, T2, and PD relaxometry maps. In order to extend the CNN into a self-supervised CNN (see Figure 3) we have included a non-trainable lambda layer after the decoders' output. This lambda layer implements ideal equations that describe the MR intensity of the output weighted image as a function of the relaxometry maps and the acquisition parameters. These well known equations can be found in Appendix B.

The loss function used to train the self-supervised CNN is computed in the weighted image domain as the average of the mean absolute error (MAE) between each acquired image and its synthesized counterpart. Specifically, let $y_i(\mathbf{r})$ denote the intensity value of the $i$th acquired image at pixel $\mathbf{r}$, defined in some domain $\Omega_i$, and let $\hat{y}_i(\mathbf{r})$ be the synthesized image at that pixel location. Then,

$$\mathrm{MAE}(\mathbf{y}_i, \hat{\mathbf{y}}_i) = \frac{1}{|\Omega_i|} \sum_{\mathbf{r} \in \Omega_i} \left| y_i(\mathbf{r}) - \hat{y}_i(\mathbf{r}) \right| \tag{1}$$

with $\mathbf{y}_i$ and $\hat{\mathbf{y}}_i$ vectors that represent the image intensity values of the acquired and synthesized images, respectively, in all the pixels belonging to domain $\Omega_i$, with cardinality $|\Omega_i|$. Then, the loss function is defined as

$$\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{MAE}(\mathbf{y}_i, \hat{\mathbf{y}}_i)$$

with $N$ the overall number of images entering the average (i.e., the batch size).

Dataset24 (see Table 1) was used to train the self-supervised CNN following a leave-one-out scheme (i.e., a total of 24 models were trained). To train each model, one patient was used for testing and the remaining 23 patients were randomly split between training (18 patients, approximately 80%) and early-stopping validation (5 patients, approximately 20%).

During training, the loss function was optimized using the Adam algorithm with a learning rate of 1 × 10⁻⁴. Further, we empirically fixed the batch size to 32 images. We ran the code using the TensorFlow backend on a single NVidia GeForce GTX 1070. The total learning took approximately 1 h of computation time for each model, but execution reduces to a few seconds once the network is fully trained.
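The loss just described can be sketched in plain NumPy (the actual implementation is a TensorFlow loss operating on batched tensors): the per-image MAE of Equation (1) is computed over a boolean foreground mask standing in for the domain Ω_i, and the batch loss averages the per-image MAEs.

```python
import numpy as np

def masked_mae(acquired, synthesized, mask):
    """Per-image MAE of Equation (1), restricted to the pixels of the
    domain Omega_i (here represented by a boolean foreground mask)."""
    diff = np.abs(acquired[mask] - synthesized[mask])
    return float(diff.mean())

def batch_loss(acquired_batch, synthesized_batch, masks):
    """Training loss: average of the per-image MAEs over the batch."""
    maes = [masked_mae(a, s, m)
            for a, s, m in zip(acquired_batch, synthesized_batch, masks)]
    return float(np.mean(maes))

# Toy example: constant acquired image, constant offset in the synthesis
acq = np.ones((4, 4))
syn = np.full((4, 4), 1.5)
fg = np.ones((4, 4), dtype=bool)
```

Restricting the error to the foreground keeps the large skull-stripped background from dominating the loss.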
Experiments
We carried out three test experiments to assess the performance of using synthesized images as input of an RS for SP. In all of them the RS was tested with Dataset24. These experiments are detailed next:

- Experiment I: XGB17 was tested with the acquired T1w, T2w, FLAIR, and T1w-c images as inputs.
- Experiment II: XGB17 was tested replacing one of the acquired inputs (FLAIR and T2w, one at a time) by its synthesized version. Therefore, in this experiment three of the inputs of the RS were acquired and one was synthesized. As previously stated, these synthesized images were the test images from the leave-one-out scheme of the synthesis method, so no overlap between the training and testing splits occurs either in the synthesis or in the RS.
- Experiment III: models LR16 and SVM16 were tested without considering FLAIR or T2w, respectively, as input. Note that the RSs used in this third experiment have been built with only three input channels.

Performance assessment is twofold. On the one hand, we evaluated the quality of the synthesized images. In addition to visual assessment, we also carried out a quantitative analysis using the well known measures mean squared error (MSE), structural similarity index (SSIM), and peak signal to noise ratio (PSNR). These metrics were computed within a 3D domain between the synthesized and the acquired images, specifically, within the smallest cube that comprises the foreground of each volume; hence $\Omega_i$ in Equation (1) is the intersection between this domain and the $i$th acquired image. Thereafter, the mean and standard deviation values across patients were calculated.

On the other hand, in order to compare the performance of the RS across the different experiment configurations, we computed AUC, accuracy, precision, recall, and F1-score as classifier performance metrics. These metrics were reported as the average value over the two classification classes. Additionally, for the sake of completeness, we analyzed the predicted probabilities of survival obtained at the output of the RS for the pairs Experiments I–II and I–III. To this end, we calculated the R² value, customarily used in linear regression, to measure how the predicted probabilities of Experiments II and III deviated from the identity function at abscissae equal to the probabilities of Experiment I. The intra-class correlation coefficient (ICC) between these pairs was also measured, and boxplots of the probability differences for these pairs were constructed.
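For illustration, the NumPy sketch below shows how MSE and PSNR can be computed between a synthesized and an acquired volume, and how an R²-against-the-identity value can be obtained from paired predicted probabilities. SSIM and the ICC are omitted for brevity, and these helpers are illustrative rather than the exact implementations used in this work.

```python
import numpy as np

def mse(x, y):
    """Mean squared error between two (flattened) image volumes."""
    return float(np.mean((x - y) ** 2))

def psnr(x, y, data_range=None):
    """Peak signal to noise ratio in dB: 10*log10(MAX^2 / MSE)."""
    if data_range is None:
        data_range = float(x.max() - x.min())  # intensity range of reference
    return float(10.0 * np.log10(data_range ** 2 / mse(x, y)))

def r2_identity(p_ref, p_test):
    """Coefficient of determination of p_test against the identity line
    y = x evaluated at the reference probabilities p_ref."""
    ss_res = np.sum((p_test - p_ref) ** 2)           # deviation from identity
    ss_tot = np.sum((p_test - np.mean(p_test)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Identical probability vectors lie exactly on the identity, so R^2 = 1
p = np.linspace(0.1, 0.9, 9)
assert abs(r2_identity(p, p) - 1.0) < 1e-12
```

Note that, unlike a fitted regression, `r2_identity` penalizes any systematic offset between the two experiments, which is precisely the behavior wanted when testing whether a synthesized input leaves the predicted probabilities unchanged.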
RESULTS
Figure 4 shows a representative slice of the synthesized and the corresponding acquired weighted images for several test glioblastoma patients. Overall, the synthesized images are close to the acquired versions regarding structural information and contrast between tissues in both healthy and pathological regions. In particular, the contours and intensities of the different lesion areas are similar in both. Note that in Patient 3 the synthesized FLAIR image does not suffer from motion artifacts, which are present in the acquired image.
FIGURE 4
A representative axial slice of the images synthesized by the self‐supervised CNN for different test patients of Dataset24. A, Synthesized FLAIR images. B, Corresponding actually acquired FLAIR images. C, Synthesized T2w images. D, Corresponding actually acquired T2w images
Additionally, Table 3 shows the mean and standard deviation values, computed across patients, of the synthesis quality metrics between synthesized and acquired images for FLAIR and T2w. SSIM ranges between 0 and 1, and the value 1 is only achievable for two identical images. The high values of PSNR and the low values of MSE show low error between the synthesized and the acquired weighted images for both FLAIR and T2w. All the metrics improve considerably with respect to those obtained by Moya-Sáez et al for FLAIR. Note that the comparison of the values obtained for T2w is not representative, since this weighted image was input to the CNN in that work.
TABLE 3
Synthesis quality metrics used to evaluate the capability to synthesize FLAIR and T2w weighted images. These metrics are the MSE, SSIM, and PSNR. Mean and standard deviation (between parentheses) computed across patients are reported. The metrics were calculated between the synthesized and the acquired images
|       | MSE             | SSIM            | PSNR              |
|-------|-----------------|-----------------|-------------------|
| FLAIR | 0.0163 (0.0104) | 0.7595 (0.0474) | 23.7975 (2.0011)  |
| T2w   | 0.0742 (0.0319) | 0.7845 (0.0616) | 25.8934 (1.9032)  |
Figure 5 shows the AUC, accuracy, precision, recall, and F1-score achieved by the RS for the three different experiment configurations (Experiments I, II, and III defined in Section 2.5). All the metrics are substantially better when using a synthesized image than when using a system without that weighted image, for both FLAIR and T2w. The comparison between using an acquired and a synthesized image shows that the performance of the system does not diminish in terms of AUC, and only suffers a slight degradation in terms of the other performance metrics for FLAIR. Such degradation is not observed with the synthesized T2w.
FIGURE 5
AUC, accuracy, precision, recall, and F1-score of the RS tested with Dataset24 when FLAIR (A) and T2w (B) are acquired, synthesized, or ignored in the whole pipeline
Figure 6 shows scatter plots of the output predicted probabilities of Experiments I–II and I–III. The ground-truth labels are also displayed. Additionally, R² values from the identity linear regressions are provided. A better agreement of points can be observed in the plots in the upper row (Experiments I–II) compared with the plots in the lower row (Experiments I–III). This better agreement is also confirmed by the higher R² values obtained. Additionally, the measured ICC values are 0.983 and 0.292 for FLAIR, and 0.964 and 0.027 for T2w, for the pairs Experiments I–II and I–III, respectively.
FIGURE 6
Scatter plots of the predicted probabilities obtained at the output of the RS for the experiment with the acquired versus the synthesized images (top row) and versus an RS trained from scratch without considering this weighted image as input (bottom row). The plots are shown for FLAIR (A) and T2w (B). Dashed lines represent the threshold fixed in the RS to classify survival (>480 days). Each point represents one glioblastoma patient and its color corresponds to the ground-truth label. R² values from the identity linear regressions are provided
Finally, Figure 7 shows boxplots of the probability differences for the pairs Experiments I–II and I–III, for both FLAIR and T2w. As can be seen, the median of the boxplots is closer to zero in Pair I–II than in Pair I–III for both weighted images. A lower interquartile range (IQR) can also be observed in the boxplots of Experiments I–II compared with Experiments I–III, together with a median shift from zero in the I–III experiment, an effect which is more prominent for T2w.
FIGURE 7
Boxplots of the differences between the probabilities obtained at the output of the RS for Experiments I–II and I–III. Each point represents the probability difference for each glioblastoma patient and the dashed line corresponds to zero difference
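The boxplot statistics discussed above (median and IQR of the per-patient probability differences) are straightforward to reproduce; the following numpy sketch uses hypothetical probabilities for six patients, not the study's data:

```python
import numpy as np

# Hypothetical paired RS output probabilities for six patients:
# Experiment I (acquired image) vs. Experiment II (synthesized replacement).
p_exp1 = np.array([0.82, 0.31, 0.67, 0.15, 0.74, 0.48])
p_exp2 = np.array([0.80, 0.34, 0.64, 0.18, 0.70, 0.50])

diff = p_exp1 - p_exp2                 # per-patient probability differences
median = np.median(diff)               # boxplot center line
q1, q3 = np.percentile(diff, [25, 75])
iqr = q3 - q1                          # height of the boxplot box
print(f"median = {median:+.3f}, IQR = {iqr:.3f}")
```

A median near zero with a small IQR, as in the I–II pair, means the synthesized replacement introduces neither a systematic bias nor large per-patient swings in the RS output.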
DISCUSSION
In this work, we have thoroughly analyzed the replacement of an actually acquired weighted image with a synthesized version for predicting survival of glioblastoma patients with a completely independent RS. Starting from two acquired weighted images, we synthesized a new weighted image using a CNN‐based method. The RS was trained using acquired images only as input. The system was then tested using acquired images as input on one side, and replacing one acquired image with a synthesized image on the other. We also compared performance with a system trained from scratch that ignores this additional channel.

Results show that multicontrast‐demanding quantitative applications, such as radiomics, can be leveraged by synthesized images. Synthesized images may allow widespread usage of these RSs in clinical practice, by retrospectively completing databases with missing modalities and/or replacing artifacted images. Further, these synthesized images have the potential to speed up acquisition protocols by replacing some acquired images with their synthesized versions. In particular, removing FLAIR or T2w from an average brain protocol may reduce the overall scan duration by on the order of 20%, and both are artifact‐prone sequences due to their sensitivity to motion. Thus, our results allow us to state that synthetic MRI does add to glioblastoma SP within a radiomics‐based approach. Indeed, the network described in Reference is prepared to synthesize more than one contrast, so for protocols with more sequences than those used in this paper further reductions could potentially be achieved.

Our work shows that synthesized weighted images are visually similar to the acquired images in both healthy and pathological tissues. Additionally, the values of the image quality metrics confirm the agreement between the synthesized and the actually acquired images. The performance achieved with the synthesized images in the RS is not only close to the performance achieved using the acquired images, but also substantially better than that of a model trained without this weighting. This is confirmed by the classifier performance metrics (i.e., AUC, accuracy, precision, recall, and F1‐score) for both FLAIR and T2w. The R² of the identity linear regression and the ICC values also support this finding. Note that an ICC value above 0.9 is considered excellent.
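All of the classifier metrics listed above can be derived from the test labels and the RS output probabilities. The sketch below uses made-up labels and probabilities (not the paper's data), a plain-Python pairwise AUC, and a fixed 0.5 decision threshold, which is an assumption for illustration:

```python
def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def threshold_metrics(y_true, scores, thr=0.5):
    """Accuracy, precision, recall, and F1-score at a fixed threshold."""
    y_pred = [1 if s > thr else 0 for s in scores]
    tp = sum(p == 1 and t == 1 for p, t in zip(y_pred, y_true))
    fp = sum(p == 1 and t == 0 for p, t in zip(y_pred, y_true))
    fn = sum(p == 0 and t == 1 for p, t in zip(y_pred, y_true))
    tn = sum(p == 0 and t == 0 for p, t in zip(y_pred, y_true))
    acc = (tp + tn) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1

# Made-up example: label 1 = survival > 480 days
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
probs  = [0.9, 0.2, 0.7, 0.45, 0.6, 0.1, 0.8, 0.3]
print(roc_auc(y_true, probs), threshold_metrics(y_true, probs))
```

Comparing these figures computed once with the acquired image and once with its synthesized replacement is exactly the Experiment I versus Experiment II comparison reported above.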
It is worth noting that the synthesized images input to the XGB model improve the AUC compared with the acquired images. This might be caused by the implicit filtering undergone during the synthesis procedure. Moreover, the accuracy values obtained from the RS are on par with those of other RSs that rely exclusively on acquired images and that, similarly to us, set the survival threshold for classification to achieve balance between group sizes.

Our work shows several differences from that described by Islam et al
that are worth highlighting. First, our feature selection procedure turned out to rely on radiomic features mainly based on textures; in contrast, Islam et al
propose a radiogenomic model with 51 features, 43 of which are genetic and 8 radiomic. Most of the latter are morphological, i.e., features that do not depend directly on the image intensity or texture but only indirectly through a segmentation process; consequently, they may not be the optimal features to capture the effects of the synthesized images on the classification. On the other hand, in their work image synthesis and classification are coupled, since the classifiers are trained with synthetic images. Moreover, the results provided there are based on validation data, whereas our performance figures are calculated on a separate dataset, which safeguards against overly optimistic results.

One might argue that there is no need to synthesize weighted images to feed an RS, since relaxometry maps are generated as a previous step to synthesize such weighted images; certainly, the RS could have been designed directly on these maps, and predictions along these lines have indeed been proposed.
However, we have two reasons for our design choice. First, the glioblastoma databases we have used do not include relaxometry maps. Second, the approach proposed in Moya‐Sáez et al
has been trained with glioblastoma‐free images, both from synthetic data and with a small cohort of real patients. Hence, we needed to extend our original method
to accommodate glioblastoma information. This was achieved by means of a self‐supervised approach trained with weighted images, the ones to which we had access for this type of pathology.

This work has several limitations. The test experiments were carried out on Dataset24, composed of a cohort of 24 glioblastoma patients. We made this design decision because Dataset24 is the only one in which the pulse sequence and the acquisition parameters remained constant across patients, and the self‐supervised synthesis method depends on these parameters. This dependence has the advantage of making the process specific to this parameter setting, so higher synthesis quality can be expected. The downside is the inherent limitation to this particular setting. Nevertheless, the self‐supervised method can be easily extended to accommodate more parameter values for which acquisitions are available. In any case, since 24 patients is not a very large number, experiments on a larger cohort would be advisable to further support our conclusions. Further, a multi‐institutional study could be necessary to analyze the system's generalization capability.

As future work, performing experiments synthesizing post‐contrast weighted images might be of interest from a clinical point of view, in order to avoid the administration of contrast agents to patients. Recently, two such attempts have been reported, although they still have some limitations: first, a lack of versatility, since neither the pulse sequence nor the contrast evolution can be controlled in these methodologies; second, physical limitations due to the fact that contrast‐related information might not be fully embedded in all kinds of pre‐contrast images.
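As a schematic of the two-to-one contrast mapping that such synthesis performs, the toy numpy sketch below passes a stacked (T1w, FLAIR) pair through a single convolutional layer with random weights. This is purely illustrative of the input/output shapes; the network used in this work is a far deeper CNN, and nothing here reflects its actual weights or architecture.

```python
import numpy as np

def conv2d(x, w, b):
    """Naive 'same'-padding 2-D convolution: x (C,H,W), w (C,3,3) -> (H,W)."""
    c, h, wdt = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero-pad spatial dims only
    out = np.zeros((h, wdt))
    for i in range(h):
        for j in range(wdt):
            out[i, j] = np.sum(xp[:, i:i + 3, j:j + 3] * w) + b
    return out

# Toy 8x8 'images' standing in for two acquired contrasts
rng = np.random.default_rng(0)
t1w, flair = rng.random((2, 8, 8))

# Hypothetical single-layer 'network': (T1w, FLAIR) -> T2w-like map, ReLU output
weights = rng.standard_normal((2, 3, 3)) * 0.1
t2w_syn = np.maximum(conv2d(np.stack([t1w, flair]), weights, 0.0), 0)
print(t2w_syn.shape)
```

A real synthesis network stacks many such layers with learned weights, but the data flow, two co-registered input contrasts producing one output contrast of the same spatial size, is the same.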
On the other hand, for glioblastoma SP there is still wide room for improvement. New data harmonization strategies, image resolution upgrades, and the combination of other data sources, such as histologic samples, diffusion tensor imaging, or genomics, may better characterize the broad heterogeneity of this disease.

In conclusion, in this work we assessed the performance of an RS when an actually acquired input image was replaced with a synthesized version. To this end, we synthesized realistic FLAIR and T2w images in a glioblastoma dataset with a deep learning approach. Furthermore, an RS for SP, which can classify patients into two groups (survival >480 days and ≤480 days), was built. We evaluated the effects of the synthesized weighted images on the RS performance. Results support the utility of using synthesized images to feed an RS for SP of glioblastoma patients.
REFERENCES
Lu H, Nagae-Poetscher LM, Golay X, Lin D, Pomper M, van Zijl PCM. J Magn Reson Imaging. 2005.
Pérez-Beteta J, Molina-García D, Ortiz-Alhambra JA, et al. Radiology. 2018.
Chaddad A, Kucharczyk MJ, Daniel P, et al. Front Oncol. 2019.
Menze BH, Menzel MI, Hernandez-Tamames JA, et al. Neuroradiology. 2021.
Moya-Sáez E, Navarro-González R, Cepeda S, Pérez-Núñez Á, de Luis-García R, Aja-Fernández S, Alberola-López C. NMR Biomed. 2022.