Quantitative Evaluation of Therapeutic Response by FDG-PET-CT in Metastatic Breast Cancer.

Dorothée Goulon¹, Hatem Necib², Brice Henaff³, Caroline Rousseau⁴, Thomas Carlier⁵, Françoise Kraeber-Bodere⁵.

Abstract

PURPOSE: To assess the therapeutic response for metastatic breast cancer with (18)F-FDG positron emission tomography (PET), this retrospective study aims to compare the performance of six different metabolic metrics with PERCIST, PERCIST with optimal thresholds, and an image-based parametric approach.
METHODS: Thirty-six metastatic breast cancer patients underwent 128 PET scans and 123 lesions were identified. In a per-lesion and per-patient analysis, the performance of six metrics [maximum standardized uptake value (SUVmax), SUVpeak, standardized added metabolic activity (SAM), SUVmean, metabolic volume (MV), and total lesion glycolysis (TLG)] and of a parametric approach (SULTAN) was determined and compared to the gold standard (defined by clinical assessment and biological and conventional imaging according to RECIST 1.1). The evaluation was performed using PERCIST thresholds (for per-patient analysis only) and optimal thresholds (determined by the Youden criterion from the receiver operating characteristic curves).
RESULTS: In the per-lesion analysis, 210 pairs of lesion evolutions were studied. Using the optimal thresholds, SUVmax, SUVpeak, SUVmean, SAM, and TLG were significantly correlated with the gold standard. SUVmax, SUVpeak, and SUVmean reached the best sensitivity (91, 88, and 83%, respectively), specificity (93, 95, and 97%, respectively), and negative predictive value (NPV, 90, 88, and 83%, respectively). For the per-patient analysis, 79 pairs of PET were studied. The optimal thresholds compared to the PERCIST threshold did not improve performance for SUVmax, SUVpeak, and SUVmean. Only SUVmax, SUVpeak, SUVmean, and TLG were correlated with the gold standard. SULTAN performed comparably: 83% sensitivity, 88% specificity, and NPV 86%.
CONCLUSION: This study showed that SUVmax and SUVpeak were the best parameters for PET evaluation of metastatic breast cancer lesions. Parametric imaging is helpful in evaluating serial studies.

Keywords: FDG; PERCIST; PET; SULTAN; breast cancer; parametric analysis; therapeutic evaluation

Year: 2016 PMID: 27243012 PMCID: PMC4861036 DOI: 10.3389/fmed.2016.00019

Source DB: PubMed Journal: Front Med (Lausanne) ISSN: 2296-858X

Introduction

Metastatic breast cancer is initially diagnosed in 6–10% of cases and during follow-up in 30% of cases (1). The treatment strategy in this situation is mainly based on chemotherapy, hormonal therapy, targeted therapies, and possibly external radiotherapy. The accurate and early assessment of therapeutic efficacy represents a major challenge but is crucial for limiting toxicity and reducing the use of expensive treatments. Therapeutic responses for solid tumors are conventionally assessed using the international standard RECIST 1.1 (2). However, RECIST has a number of intrinsic limitations, such as moderate reproducibility of tumor measurement (3), late occurrence of morphological response compared to early metabolic changes, and inapplicability to non-measurable morphological lesions (bone lesions, lymphangitis, and effusions) and to targeted cytostatic therapies. Functional imaging by positron emission tomography (PET) with 18-fluorodeoxyglucose (18FDG) represents a potential alternative (4, 5). Specific evaluation criteria for metabolic responses have been previously defined. These include measures of quantitative metrics and visual analysis tools to classify tumor progression and response, as defined by the European Organization for Research and Treatment of Cancer (EORTC) (6) or PERCIST (3). 18FDG-PET has proven useful in breast cancer management (7): for initial staging of locally advanced cancers (stages II–III) and/or inflammatory lesions (8), detection of recurrence with better performance than conventional imaging (7, 9, 10), evaluation of therapeutic response to neo-adjuvant therapy in inoperable locally advanced cancers or before conservative surgery or inflammatory lesions (7, 11), and therapy evaluation in metastatic disease (5, 12–18). However, although 18FDG-PET has shown value in several clinical studies, it is not used in clinical practice for therapy assessment because of the lack of standardization of imaging interpretation (12). 
Some studies suggested a benefit of using semi-quantitative analysis (mainly the change in SUVmax or SUVmean between two PET scans) rather than visual analysis only. However, the best metric and optimal threshold were not clearly defined. Moreover, it is worth noting that none of these studies were based on the PERCIST approach proposed by Wahl et al. (3). Semi-quantitative methods (3–6) have been proposed for therapeutic evaluation using PET to improve reproducibility based on the percentage variation of a metric (SUVmax for EORTC and SULpeak for PERCIST). Yet, they have not been validated in the context of specific tumors, especially breast cancer (12). Moreover, some requirements of PERCIST (mainly the need for a tumor size >2 cm and no difference in liver signal between the two PET scans) may be difficult to achieve in clinical practice. New evaluation methods based on parametric analysis are also being developed, although the best metrics and optimal thresholds have not been clearly defined (19). The SULTAN (longitudinal monitoring in tomography using factor analysis) method, for example, proposes a novel semi-automatic method to assist in tumor response assessment by studying the metabolic change at the voxel level (20, 21). SULTAN provides a parametric map of the tumor metabolic change using two or more PET scans and allows the heterogeneity of response within the tumor to be determined. The first objective of this retrospective study was to compare the performance of different metabolic metrics on a per-lesion and per-patient basis in the assessment of therapeutic response in metastatic breast cancer. The second objective was to assess the benefit of parametric imaging (SULTAN) in this population.

Materials and Methods

Patients and Imaging Protocols

For this single center study conducted from September 2009 to July 2014, 36 patients (median age 63.5 years, range: 39–85 years) with breast cancer of any histological grade and metastatic involvement (i.e., initially metastatic or metastatic following diagnostic evaluation) underwent at least two 18FDG-PET scans using the same PET system in the course of their therapy. Tumor phenotypes were classified as 26 invasive ductal carcinomas, 6 invasive lobular carcinomas, 3 intraductal carcinomas, and 1 colloid carcinoma. Twenty-eight tumors were estrogen receptor (ER) positive, 21 were progesterone receptor (PR) positive, 4 showed HER2 over-expression, and 6 were triple-negative. Treatments consisted of adjuvant chemotherapy, hormonal therapies, targeted therapies, Herceptin, and/or radiotherapy. A total of 128 PET scans were acquired (median of 3 PET/patient, range: 2–9) with a median time interval of 3.7 months between two PET (range: 1.1–19.6). A total of 123 lesions were analyzed: 44 lymph nodes, 43 bone lesions, 17 liver lesions, 10 breast lesions, 5 lung lesions, and 4 peritoneal carcinomatosis lesions. A total of 79 pairs of PET scans were analyzed in 36 patients. Positron emission tomography scans were conducted in patients fasted for at least 6 h, with normal blood glucose <10 mmol/L, 1 h after injection of 3 or 7 MBq/kg of 18FDG (depending on the PET system used), using either a Siemens Biograph mCT 40 camera (Siemens Healthcare Molecular Imaging USA, Inc.) or a General Electric Discovery LS (GE Medical Systems, Waukesha, WI, USA). The low-dose computed tomography acquisition was performed first without injection of iodinated contrast agent, followed by PET acquisition using 3 min per bed position (Siemens Biograph mCT) or 5 min for the GE Discovery LS. 
The following acquisition constraints according to the PERCIST framework were respected: similar activity between each PET scan (±20%), standardization against normal liver, and a similar delay between injection and acquisition (50–70 min after injection).

Image Analysis Using Semi-Quantitative Metrics

Six PET-based metrics were derived, for a maximum of five tumor targets (maximum of two targets per organ) as recommended by PERCIST (3): SUVmax, SUVpeak, SUVmean, metabolic volume (MV), total lesion glycolysis (TLG = SUVmean × MV), and standardized added metabolic activity (SAM) (22). SAM was proposed to overcome the partial volume effect. The segmentation approach proposed by Schaefer was used for computing SUVmean, MV, and TLG (23). The gold standard was defined by clinical assessment, and biological and conventional imaging by CT and MRI, performed 3 weeks after the PET evaluation. RECIST 1.1 (2) was used in these assessments. Each evolution was classified as either a responder or non-responder according to the gold standard. A “responder” as assessed by PET was defined as a metric decrease greater than the threshold, while a “non-responder” was defined as a decrease of less than the threshold or an increase in the metrics. The four different types of response were true positive (TP), responder according to PET and the gold standard; true negative (TN), non-responder according to PET and the gold standard; false negative (FN), non-responder according to PET but responder according to the gold standard; and false positive (FP), responder according to PET but non-responder according to the gold standard.
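
The response definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function names are hypothetical, thresholds are written as negative percentages (a decrease), and a change exactly equal to the threshold is treated as non-responder, which is an assumption the text does not settle.

```python
def percent_change(baseline, follow_up):
    """Relative change of a metric between two scans, in percent."""
    return 100.0 * (follow_up - baseline) / baseline


def total_lesion_glycolysis(suv_mean, metabolic_volume):
    """TLG = SUVmean x MV, as defined in the text."""
    return suv_mean * metabolic_volume


def pet_responder(change_pct, threshold_pct):
    """True if the metric decreased by more than the threshold.

    threshold_pct is negative (e.g. -30.0 for a 30% decrease).
    Assumption: a change exactly at the threshold counts as non-responder.
    """
    return change_pct < threshold_pct


def outcome(pet_is_responder, gold_is_responder):
    """Label one evolution as TP, TN, FN, or FP against the gold standard."""
    if pet_is_responder and gold_is_responder:
        return "TP"
    if not pet_is_responder and not gold_is_responder:
        return "TN"
    if not pet_is_responder and gold_is_responder:
        return "FN"
    return "FP"
```

For instance, `percent_change(11.8, 3.4)` gives a decrease of about 71%, which `pet_responder(..., -30.0)` classifies as a response.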

Image Analysis Using Parametric Imaging (SULTAN)

SULTAN is a parametric approach that compares two or more PET scans acquired before and during therapy (20, 21). In the context of this study, pairs of PET volumes acquired for the same patient were considered. This new approach involves a rigid registration between the two PET scans, followed by a factor analysis as briefly described in the following sections.

Registration of PET Volumes

To compare two PET images at a voxel level, these scans first need to be registered so that a given voxel corresponds to the same volume element in each of the two scans. The method used was described in Ref. (24). Briefly, the CT volumes were used to determine the appropriate transformation for aligning the PET images, as they include far more anatomical details for guiding registration than the PET images. The two CT volumes of interest (VOIs) were registered using a rigid transformation (three translation and three rotation parameters) derived from block-matching registration (19) as implemented in the Planet Onco software (Dosisoft). A local rigid transformation was assumed, as only the region including the mass was actually registered. The transformation mapping the second CT volume onto the first CT volume was then applied to the second PET scan so as to align it with the first PET scan, assuming the PET and CT of a given scan were perfectly registered.
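
A rigid transformation with three translation and three rotation parameters can be illustrated as follows. The sketch only applies a given transformation to a point; it does not estimate the parameters (the block-matching registration of Planet Onco is not reproduced here). The rotation order Rz·Ry·Rx and the function names are assumptions chosen for illustration.

```python
import math


def rotation_matrix(rx, ry, rz):
    """3x3 rotation for angles (radians) about x, y, z.

    Composed as Rz @ Ry @ Rx (assumed convention, not stated in the source).
    """
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def apply_rigid(point, rotation, translation):
    """Map a 3D point p to R p + t (six degrees of freedom in total)."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# Example: quarter turn about z, then a shift of (10, 0, -5) mm.
moved = apply_rigid((1.0, 0.0, 0.0),
                    rotation_matrix(0.0, 0.0, math.pi / 2),
                    (10.0, 0.0, -5.0))
```

Because the same transformation is applied to every voxel coordinate of the second scan, distances within the registered region are preserved, which is what distinguishes a rigid from an elastic registration.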

Calculation of Parametric Image of Significant Tumor Changes

The two registered PET scans, denoted as PET1 and PET2, were analyzed using a factor analysis of dynamic sequences (FADS) approach (25) as implemented in the software Pixies [Apteryx, 2004]. The algorithm assumes that the two-component vector S(v, t) measured in each voxel (one value for the first scan and one value for the second scan) is a weighted sum of K basis functions. In this algorithm, the number K is constrained by the number of PET scans, hence is equal to 2. Let S(v, t) be the signal recorded at the voxel v for the time t (t = 1, 2). Then,

$$S(v, t) = I_b(v)\,C_b(t) + I_e(v)\,C_e(t) + e(v, t) \qquad (1)$$

where Cb(t) and Ce(t) are two basis kinetics, Ib is the spatial distribution of the voxel component following the Cb time course, Ie is the spatial distribution of the voxel component following the Ce time course, and e(v, t) is an additive error term. Factor analysis estimates the two functions Cb(t) and Ce(t), called factors, and their associated images Ib(v) and Ie(v), called factor images. Equation 1 is solved using a principal component analysis followed by an oblique rotation under a constant function constraint representing the constant voxels of the background (Cb) and without any other constraint on Ib(v) and Ie(v). The algorithm iteratively estimates the two factors, Cb and Ce, and the associated factor images, Ib and Ie (25). Therefore, the voxels that evolved between the two scans followed the Ce factor. A new image (SULTAN image) is then created whereby each voxel v is equal to Ie(v) if |Ie(v)| > 1 or 0 otherwise. Hence, each voxel reflects its evolution over time following the Ce factor (Ie > 0) or the opposite direction of Ce (Ie < 0). Finally, each lesion was classified as responder (main factor decreasing with Ie > 0 or main factor increasing with Ie < 0) or non-responder (main factor increasing with Ie > 0 or main factor decreasing with Ie < 0). A patient was considered a responder if all lesions were responders and a non-responder otherwise. The results were then classified as TP, TN, FP, and FN by comparison with the gold standard.
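
The thresholding and classification rules described above can be sketched as follows. This is not the Pixies implementation and does not perform the factor analysis itself: it starts from an already estimated evolving factor Ce and factor image Ie. The function names are hypothetical, and taking the dominant sign of Ie from the sum of retained voxels is an assumption, since the text does not say how the sign is aggregated over a lesion.

```python
def sultan_image(ie_values, cutoff=1.0):
    """Keep Ie(v) where |Ie(v)| > cutoff and set all other voxels to 0."""
    return [ie if abs(ie) > cutoff else 0.0 for ie in ie_values]


def lesion_is_responder(ce, ie_values, cutoff=1.0):
    """Classify one lesion from the evolving factor and its Ie voxels.

    ce: (Ce(1), Ce(2)), the evolving factor at the two scans.
    Assumption: the lesion's sign is the sign of the summed retained voxels;
    a lesion with no net change is treated as non-responder.
    """
    net = sum(sultan_image(ie_values, cutoff))
    factor_decreasing = ce[1] < ce[0]
    factor_increasing = ce[1] > ce[0]
    return (factor_decreasing and net > 0) or (factor_increasing and net < 0)


def patient_is_responder(lesion_flags):
    """A patient is a responder only if every lesion is a responder."""
    return all(lesion_flags)
```

The per-patient rule is deliberately strict: a single non-responding lesion makes the whole patient a non-responder.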

Statistical Analysis

The study was performed using a per-lesion and a per-patient analysis. For each analysis, the metrics were compared using the area under the curve (AUC) determined with receiver operating characteristic (ROC) analysis. The optimal thresholds were derived using the Youden criterion [max (sensitivity + specificity − 1)] through the ROC analysis for the per-lesion and per-patient studies. The per-lesion analysis was performed using the percentage change using optimal threshold of each metabolic metric. Each lesion was then compared with the gold standard. The per-patient analysis was performed using the PERCIST criteria (the percentage change of each metabolic metric for the most intense lesion in each PET between two scans). The percentage change was interpreted as responder or non-responder using previously optimized thresholds but also using PERCIST threshold (30% for each metric, except 45% for TLG). Each pair of PET scans was then compared with the gold standard. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy were then calculated for each index. Pearson’s chi-squared analysis with a type I error of 0.05 and 1 degree of freedom was performed to determine significant associations between the different quantitative metrics and the gold standard. Statistical significance was set to p < 0.05. Statistical analysis was performed using MedCalc Statistical Software version 14.12.0 (MedCalc Software, Ostend, Belgium; http://www.medcalc.org; 2014). We obtained informed consent from all patients allowing the use of their clinical data for research purposes under a protocol approved in our institution.
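
As an illustration of the Youden criterion and of the performance indices listed above, the following sketch scans candidate cut-offs and keeps the one maximizing sensitivity + specificity − 1. It is a simplified stand-in for the ROC analysis performed in MedCalc: the function names are hypothetical, candidate cut-offs are taken as midpoints between observed values (an assumption), and the data are invented toy values, not study data.

```python
def confusion_counts(changes, gold_responder, threshold):
    """Count TP, FP, TN, FN when PET calls 'responder' for change < threshold."""
    tp = fp = tn = fn = 0
    for change, gold in zip(changes, gold_responder):
        pet = change < threshold
        if pet and gold:
            tp += 1
        elif pet and not gold:
            fp += 1
        elif not pet and gold:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def ratio(num, den):
    """Safe division; undefined ratios are returned as NaN."""
    return num / den if den else float("nan")


def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV, NPV, and accuracy as fractions."""
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


def youden_optimal_threshold(changes, gold_responder):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    values = sorted(set(changes))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_threshold, best_j = None, float("-inf")
    for threshold in candidates:
        m = diagnostic_metrics(*confusion_counts(changes, gold_responder, threshold))
        j = m["sensitivity"] + m["specificity"] - 1
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Toy percentage changes and gold-standard labels (invented for illustration).
changes = [-60, -45, -35, -25, -10, 5, 20]
gold = [True, True, True, False, True, False, False]
threshold, j = youden_optimal_threshold(changes, gold)
```

On these toy values the selected cut-off separates the three largest decreases from the rest; with real data the same scan would be run separately for each metric and for the per-lesion and per-patient analyses.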

Results

Per-Lesion Analysis Using Quantitative Metrics

A total of 123 lesions and 210 pairs of lesion evolutions, followed on two to nine scans, were analyzed with 111 considered as responders and 99 as non-responders according to the gold standard. Figure 1 shows the results of the ROC study for the six metrics. The AUC values (Table 1) ranged from 0.55 for MV to 0.96 for SUVmax. The AUC intercomparison study distinguished three significantly distinct groups: SUVmax/SUVpeak/SUVmean, SAM/TLG, and MV (Figure 2).

Figure 1

ROC curves of metabolic indices for per-lesion analysis.

Table 1

Metabolic metrics AUC for per-lesion analysis.

Metrics	SUVmax	SUVpeak	SUVmean	SAM	MV	TLG
AUC	0.960	0.958	0.937	0.775	0.554	0.822

Figure 2

Synthetic scheme of the results of the intercomparison per-lesion study. Indices lying in the same circle were not significantly different.

The optimal thresholds defined by the Youden criterion were 21% for SUVmax, 23% for SUVpeak, 29% for SUVmean, 48% for SAM, 33% for MV, and 20% for TLG. Sensitivity, specificity, PPV, NPV, accuracy values, and Youden correlation coefficients were calculated at these optimal thresholds (Table 2).

Table 2

Comparison of metabolic metrics for per-lesion analysis with optimal thresholds.

Metrics	SUVmax	SUVpeak	SUVmean	SAM	MV	TLG	SULTAN
Threshold (%)	−21	−23	−29	−48	−33	−20	NA
Sensitivity (%)	91	88	83	66	27	68	86
Specificity (%)	93	95	97	83	89	86	75
PPV (%)	94	95	97	81	73	84	79
NPV (%)	90	88	83	68	52	70	83
Accuracy (%)	92	91	90	74	56	76	81
Youden correlation coefficient	0.84	0.83	0.80	0.49	0.16	0.53	0.61
Significance (χ² p < 0.05)	S	S	S	S	NS	S	S

Significance: correlation between the metric and the gold standard according to Youden correlation coefficient.

Five metrics (SUVmax, SUVpeak, SUVmean, SAM, and TLG) significantly correlated with the gold standard (p < 0.05), but the analysis of correlation coefficients (Youden index) showed that SUVmax, SUVpeak, and SUVmean led to the best performance in terms of sensitivity (91, 88, and 83%, respectively), specificity (93, 95, and 97%, respectively), and NPV (90, 88, and 83%, respectively).
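
The Youden values in Table 2 can be checked against the sensitivities and specificities in the same table, using the definition given in the Statistical Analysis section (sensitivity + specificity − 1). The sketch below is only a consistency check on the published, rounded figures; the names `TABLE_2`, `youden_index`, and `max_discrepancy` are hypothetical.

```python
# (sensitivity %, specificity %, reported Youden value), copied from Table 2.
TABLE_2 = {
    "SUVmax": (91, 93, 0.84),
    "SUVpeak": (88, 95, 0.83),
    "SUVmean": (83, 97, 0.80),
    "SAM": (66, 83, 0.49),
    "MV": (27, 89, 0.16),
    "TLG": (68, 86, 0.53),
    "SULTAN": (86, 75, 0.61),
}


def youden_index(sens_pct, spec_pct):
    """Youden index J = sensitivity + specificity - 1, from percentages."""
    return sens_pct / 100 + spec_pct / 100 - 1


def max_discrepancy(table):
    """Largest absolute gap between recomputed and reported Youden values."""
    return max(abs(youden_index(s, p) - j) for s, p, j in table.values())
```

The recomputed values agree with the reported ones to within rounding of the percentages; the largest gap is 0.01, for TLG.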

Per-Patient Analysis Using Quantitative Metrics

A total of 79 pairs of PET scans were analyzed using the PERCIST criteria (the most intense lesion in each PET between two scans) with 36 responders and 43 non-responders. The AUC (Figure 3; Table 3) ranged from 0.61 for MV to 0.95 for SUVpeak. The AUC of SUVpeak, SUVmax, SUVmean, TLG, and SAM were significantly different from MV (p < 0.05) but not between each other (Figure 4).

Figure 3

ROC curves of metabolic indices for per-patient analysis.

Table 3

Metabolic metrics AUC for per-patient analysis.

Metrics	SUVmax	SUVpeak	SUVmean	SAM	MV	TLG
AUC	0.928	0.952	0.914	0.851	0.606	0.876

Figure 4

Synthetic scheme of the results of the intercomparison per-patient study. Indices lying in the same circle were not significantly different.

The percentage change of each metabolic metric was also interpreted as responder or non-responder according to the choice of the threshold (PERCIST or optimal) and then compared with the gold standard. With PERCIST thresholds (30% for each metric, except 45% for TLG), only SUVmax, SUVpeak, and SUVmean were significantly correlated with the gold standard (p < 0.05) (Table 4).

Table 4

Comparison of metabolic metrics for per-patient analysis according to PERCIST threshold.

Metrics	SUVmax	SUVpeak	SUVmean	SAM	MV	TLG
Threshold (%)	−30	−30	−30	−30	−30	−45
Sensitivity (%)	75	67	67	39	8	36
Specificity (%)	95	98	95	95	100	100
PPV (%)	93	96	92	88	100	100
NPV (%)	82	78	77	65	57	65
Accuracy (%)	86	84	82	70	58	71
Youden correlation coefficient	0.70	0.64	0.62	0.34	0.08	0.36
Significance (χ² p < 0.05)	S	S	S	NS	NS	NS
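
The rows of Table 4 are linked through the underlying 2 × 2 counts. As a worked check, the sketch below rebuilds integer counts from the reported sensitivity and specificity and the 36 responders and 43 non-responders of the per-patient analysis, then recomputes PPV, NPV, and accuracy. The counts are a reconstruction obtained by rounding, not figures reported in the study, and the function names are hypothetical.

```python
def reconstruct_counts(sens_pct, spec_pct, n_responders, n_non_responders):
    """Return (TP, FP, TN, FN) implied by rounded sensitivity and specificity.

    Assumption: counts are recovered by rounding to the nearest integer.
    """
    tp = round(sens_pct / 100 * n_responders)
    tn = round(spec_pct / 100 * n_non_responders)
    return tp, n_non_responders - tn, tn, n_responders - tp


def derived_metrics(tp, fp, tn, fn):
    """PPV, NPV, and accuracy in percent (assumes non-empty denominators)."""
    return {
        "ppv": 100 * tp / (tp + fp),
        "npv": 100 * tn / (tn + fn),
        "accuracy": 100 * (tp + tn) / (tp + fp + tn + fn),
    }


# SUVmax with the PERCIST threshold: sensitivity 75%, specificity 95%.
counts = reconstruct_counts(75, 95, 36, 43)
metrics = derived_metrics(*counts)
```

For SUVmax this gives 27 true positives, 2 false positives, 41 true negatives, and 9 false negatives, from which PPV, NPV, and accuracy round to the 93, 82, and 86% listed in the table.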

The best thresholds were 36% for SUVmax, 26% for SUVpeak, 29% for SUVmean, 54% for SAM, 58% for MV, and 27% for TLG. After applying these optimized thresholds, four metrics (SUVpeak, SUVmax, SUVmean, and TLG) were correlated with the gold standard (Table 5). Threshold optimization left the sensitivity of SUVmax unchanged (75%) and barely changed its specificity (98 vs. 95%). The sensitivity using SUVpeak was slightly improved (72 vs. 67%) with a similar NPV (81 vs. 78%). The sensitivity, NPV, and accuracy of TLG were improved (53 vs. 36%, 72 vs. 65%, and 78 vs. 71%, respectively).

Table 5

Comparison of metabolic metrics and SULTAN for per-patient analysis according to optimized thresholds.

Metrics	SUVmax	SUVpeak	SUVmean	SAM	MV	TLG	SULTAN
Threshold (%)	−36	−26	−29	−54	−58	−27	NA
Sensitivity (%)	75	72	67	39	8	53	83
Specificity (%)	98	98	95	98	100	100	88
PPV (%)	96	96	92	93	100	100	86
NPV (%)	82	81	77	66	57	72	86
Accuracy (%)	87	86	82	71	58	78	86
Youden correlation coefficient	0.73	0.70	0.62	0.37	0.08	0.53	0.72
Significance (χ² p < 0.05)	S	S	S	NS	NS	S	S

Figure 5 highlights the benefit of using quantitative PET-derived metrics in a patient with bone metastases. CT images failed to correctly classify the therapeutic response, with the persistence of an osteo-condensation even though there was a primary tumor response, thus highlighting the fact that bone lesions cannot be evaluated using RECIST 1.1.

Figure 5

Example of metabolic assessment in a patient with metastatic bone evolution. (A) First examination: initial evaluation with multiple bone lesions (SUVmax = 11.8; SUVpeak = 7.1); (B) second examination: partial metabolic response on bone (SUVmax = 3.4 or 71% decrease; SUVpeak = 1.5 or 78% decrease); and (C) third examination: disease progression with new lesions and recurrence of some initial hypermetabolic lesions (SUVmax = 6.4 or 46% increase; SUVpeak = 4.3 or 65% increase). Persistence of sclerosis on all CT images does not allow the response to be evaluated.

Per-Lesion and Per-Patient Analysis Using SULTAN

For the per-lesion analysis, results obtained with SULTAN (longitudinal monitoring in tomography using factor analysis) were compared with those obtained using SUVmax, SUVpeak, and SUVmean. The assessment of therapeutic response by SULTAN was significantly correlated with the gold standard (p < 0.05). For the per-patient PET analysis, SULTAN was compared with SUVmax, SUVpeak, and SUVmean, which appeared to be the only metrics significantly correlated to the gold standard. SULTAN presented no significant difference with SUVmax, SUVpeak, and SUVmean results using the optimized thresholds (sensitivity: 83 vs. 75, 72, and 67%; NPV: 86 vs. 82, 81, and 77%, respectively). However, specificity and PPV were found to be lower than those of the quantitative metrics (specificity: 88 vs. 98, 98, and 95%; PPV: 86 vs. 96, 96, and 92%) (Table 6). Figures 6 and 7 show an example of a non-responder and a responder patient using SULTAN.

Table 6

Comparison of best metabolic metrics according to PERCIST and optimized thresholds and SULTAN for per-patient analysis.

Metrics	SULTAN	SUVmax	SUVmax	SUVpeak	SUVpeak	SUVmean	SUVmean
Threshold	NA	−30% (PERCIST)	−36% (optimal)	−30% (PERCIST)	−26% (optimal)	−30% (PERCIST)	−29% (optimal)
Sensitivity (%)	83	75	75	67	72	67	67
Specificity (%)	88	95	98	98	98	95	95
PPV (%)	86	93	96	96	96	92	92
NPV (%)	86	82	82	78	81	77	77
Accuracy (%)	86	86	87	84	86	82	82
Youden correlation coefficient	0.72	0.70	0.73	0.64	0.70	0.62	0.62
Significance (χ² p < 0.05)	S	S	S	S	S	S	S

Figure 6

(A) Example of non-responder patient classified by SULTAN. First PET showed right hilar hypermetabolism, and second PET performed in therapeutic monitoring (exam 2) showed a progression with persistence of right hilar hypermetabolism and the appearance of a hypermetabolic right lung uptake. The evolution was classified as non-responder. Factorial image obtained by SULTAN was superimposed on the CT-scan 1 (B). Associated curves (C) represented the growing trend (red) or stable (blue) voxels. The developments described by factor analysis were similar to those of SUVmax (D) with a stability of hilar fixation and the appearance of a right pulmonary uptake.

Figure 7

(A) Example of responder patient classified by SULTAN. First PET showed right axillary lymph node hypermetabolism, and the second PET, performed during therapeutic monitoring (exam 2), showed a disappearance of the right axillary hypermetabolism. Factorial image obtained by SULTAN was superimposed on the CT-scan 1 (B). Associated curves (C) represented the downward trend (green) or stable (blue) voxels. The developments described by factor analysis were similar to those of SUVmax (D) with a loss of the right axillary uptake.

Discussion

Considering the limitations of morphological criteria and the subjectivity of visual analysis of metabolic imaging in the field of therapeutic evaluation, the use of quantitative PET-based metrics has gained interest in recent years (3, 7, 11–17, 26, 27). Depending on the disease studied, various metrics and thresholds have been established. In breast cancer, the majority of studies evaluating therapeutic response by metabolic metrics have been conducted in a neo-adjuvant setting, with histological confirmation, the true gold standard. In the adjuvant setting, the overall therapeutic response is usually assessed using morphological and metabolic imaging, and biological and clinical exams (7). The preferred performance measure differs between neo-adjuvant and adjuvant settings. In the neo-adjuvant setting, with a curative intent, the NPV is the most relevant statistic for the early detection of non-responders before a change of therapy. In the adjuvant setting for metastatic patients, a false-negative PET may lead to an unwarranted treatment change, and this must be balanced against a false-positive, which may lead to reduced survival. In this situation, choosing the best sensitivity–specificity pair may be considered an acceptable compromise. It has been reported that a decrease of SUVmax or SUVmean after one or two cycles of chemotherapy was significantly correlated with a successful therapeutic response in the neo-adjuvant setting (7, 28–32). The optimal thresholds reported in these studies for discriminating responder and non-responder in per-patient analysis varied from 26 to 58%. 
These differences can be partly explained by the lack of consensus for the definition of responder and non-responder status (decrease in tumor mass >50% by histology or residual microscopic lesions), the population heterogeneity between studies (presence of hormone receptors, HER2 amplification, etc.), the time of PET completion (one, two, or three cycles of chemotherapy), and the criteria used to determine the best threshold. However, only a few studies have used PET scans for evaluating the treatment response in the context of adjuvant therapy. Couturier et al. (15) showed that a decrease of SUVmax or SUVmean was predictive of therapeutic response after three cycles of chemotherapy using the same gold standard considered in our study. They speculated that response assessment using metabolic metrics appeared to be superior to visual analysis. The SUV decrease ranged from 52 to 56% for responders and 16 to 26% for non-responders. Dose Schwarz et al. (17) found that a SUVmax reduction of 72 ± 21% after one cycle and 54 ± 16% after two cycles of chemotherapy was predictive of response to treatment. Furthermore, Specht et al. (16) and Tateishi et al. (18) concluded that a decrease of SUVmean, and to a lesser extent of TLG for bone metastases, was predictive of the duration of response to treatment. In the study of Tateishi (18), a SUVmean decrease ≥8.5% was a factor significantly related to the duration of response, whereas the TLG decrease was not. Huyge et al. (33) highlighted the significant heterogeneity of the metabolic response for the same patient when considering the types of metastases (bone or visceral). Using the change of SUVmax, according to the EORTC criteria, they highlighted a poorer therapeutic response for bone lesions. Finally, Quon and Gambhir (34) have warned that the “paradoxical metabolic flare,” which corresponds to an increase of SUV in the first 10 days after commencement of hormone therapy, may be misconstrued as a sign of an early metabolic reaction. 
In our study, SUVmax, SUVpeak, and SUVmean were the most efficient metrics in the per-lesion and per-patient analysis. These observations are consistent with previously published results, which suggest the use of SUVmax (EORTC) or SULpeak (PERCIST). The SUVmax measurement is susceptible to noise because it is determined from a single voxel (35). The use of SUVpeak may overcome this limitation and has been recommended as a more robust alternative because its fixed volume of 1 cm³ makes it less susceptible to noise than SUVmax. However, several definitions of SUVpeak are found in the literature, differing in the shape, size, and location of the ROIpeak (36). As outlined in the Introduction, many requirements imposed by the PERCIST criteria may be considered too restrictive and difficult to apply in routine clinical situations. This is why we evaluated a “PERCIST-like” method with SUV normalization to the body mass of the patient (SUVpeak) rather than the lean body mass normalization (SULpeak) recommended by PERCIST. The small size of the majority of measured lesions in our study, less than 2 cm, leads to a calculation of SUVpeak heavily weighted on SUVmax, thus explaining the high similarity of the results of the two indices. The SUVmean index gave results similar to SUVmax and SUVpeak for the per-patient analysis, which is also explained by the small size of the measured lesions. The SAM index was less efficient in our study and did not demonstrate benefit in our population. This index corresponds to the total excess SUV above the tumor background, reducing the impact of partial volume effect and lesion segmentation errors. Yet, Mertens et al. (22) reported good results with no significant difference with SUVmax in patients with colorectal cancer with progression to liver metastasis. The optimal threshold for differentiating responders and non-responders was set at 94.5% for SAM vs. 25.3% for SUVmax, which differs from our results (54 vs. 36%). 
Additionally, we showed that MV and TLG failed to correctly classify patients. In this respect, MV performance was variable: the approach to this calculation differs among centers, with the use of gradients, thresholds, or adaptive methods. In the neo-adjuvant therapy evaluation of breast cancer by 18F-FDG, Hatt et al. (37) found that TLG or MV, determined by a fuzzy locally adapted Bayesian algorithm, were better predictors than SUVmax, but the lesions they considered were larger than in our study. In our study, an adaptive method based on that described by Schaefer (23) was used, but it failed to correctly delineate the lesion when the signal-to-noise ratio was poor, explaining the poor performance of volume-based metrics. Parametric imaging was found to be relevant in assessing the therapeutic response in breast cancer, with similar performance to SUVmax or SUVpeak. SULTAN has already been successfully assessed in patients with colorectal cancer and non-small cell lung carcinoma (20). SULTAN appears to be a valuable visual tool in routine clinical practice because of the otherwise tedious nature of measuring numerous lesions. Furthermore, using a single series of images, SULTAN provides a summary of all tumor evolutions from various scans without arbitrary threshold adjustment.

Conclusion

Although our study has limitations (a heterogeneous population with patients in either first-line or advanced treatment, with varied histological and phenotypic characteristics and different treatments), the results underline the importance of the choice of metric for PET evaluation. SUVmax, SUVpeak, and to a lesser extent SUVmean appeared to be the most relevant metrics. In addition, parametric analysis using the SULTAN approach is a reliable tool to guide visual interpretation. The poor performance of the volumetric metrics underlines the need to develop and validate a robust delineation method that can be applied to small lesions with a poor signal-to-noise ratio. In the future, a comparison of metrics could be conducted in a prospective study performed in a homogeneous population.

Author Contributions

DG: data measurement and paper writing; HN: statistical analysis and parametric imaging; BH: statistical analysis; CR: patient recruitment; and TC and FK-B: study conception and paper revision.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

34 in total

Review 1. FDG-PET and beyond: molecular breast cancer imaging.

Authors: Andrew Quon; Sanjiv S Gambhir
Journal: J Clin Oncol Date: 2005-03-10 Impact factor: 44.544

Review 2. A systematic review of positron emission tomography (PET) and positron emission tomography/computed tomography (PET/CT) for the diagnosis of breast cancer recurrence.

Authors: M Pennant; Y Takwoingi; L Pennant; C Davenport; A Fry-Smith; A Eisinga; L Andronis; T Arvanitis; J Deeks; C Hyde
Journal: Health Technol Assess Date: 2010-10 Impact factor: 4.014

3. Monitoring primary breast cancer throughout chemotherapy using FDG-PET.

Authors: Garry M McDermott; Andrew Welch; Roger T Staff; Fiona J Gilbert; Lutz Schweiger; Scott I K Semple; Tim A D Smith; Andrew W Hutcheon; Iain D Miller; Ian C Smith; Steven D Heys
Journal: Breast Cancer Res Treat Date: 2006-08-09 Impact factor: 4.872

4. Impact of the definition of peak standardized uptake value on quantification of treatment response.

Authors: Matt Vanderhoek; Scott B Perlman; Robert Jeraj
Journal: J Nucl Med Date: 2012-01 Impact factor: 10.057

5. Detection and characterization of tumor changes in 18F-FDG PET patient monitoring using parametric imaging.

Authors: Hatem Necib; Camilo Garcia; Antoine Wagner; Bruno Vanderlinden; Patrick Emonts; Alain Hendlisz; Patrick Flamen; Irène Buvat
Journal: J Nucl Med Date: 2011-03 Impact factor: 10.057

6. Measurement of clinical and subclinical tumour response using [18F]-fluorodeoxyglucose and positron emission tomography: review and 1999 EORTC recommendations. European Organization for Research and Treatment of Cancer (EORTC) PET Study Group.

Authors: H Young; R Baum; U Cremerius; K Herholz; O Hoekstra; A A Lammertsma; J Pruim; P Price
Journal: Eur J Cancer Date: 1999-12 Impact factor: 9.162

7. Sequential positron emission tomography using [18F]fluorodeoxyglucose for monitoring response to chemotherapy in metastatic breast cancer.

Authors: Olivier Couturier; Guy Jerusalem; Jean-Michel N'Guyen; Roland Hustinx
Journal: Clin Cancer Res Date: 2006-11-01 Impact factor: 12.531

8. Metabolic monitoring of breast cancer chemohormonotherapy using positron emission tomography: initial evaluation.

Authors: R L Wahl; K Zasadny; M Helvie; G D Hutchins; B Weber; R Cody
Journal: J Clin Oncol Date: 1993-11 Impact factor: 44.544

9. Effects of noise, image resolution, and ROI definition on the accuracy of standard uptake values: a simulation study.

Authors: Ronald Boellaard; Nanda C Krak; Otto S Hoekstra; Adriaan A Lammertsma
Journal: J Nucl Med Date: 2004-09 Impact factor: 10.057

Review 10. ¹⁸F-FDG PET/CT for Monitoring of Treatment Response in Breast Cancer.

Authors: Stefanie Avril; Raymond F Muzic; Donna Plecha; Bryan J Traughber; Shaveta Vinayak; Norbert Avril
Journal: J Nucl Med Date: 2016-02 Impact factor: 10.057
