Literature DB >> 34619573

Machine learning to predict individual patient-reported outcomes at 2-year follow-up for women undergoing cancer-related mastectomy and breast reconstruction (INSPiRED-001).

André Pfob¹, Babak J Mehrara², Jonas A Nelson², Edwin G Wilkins³, Andrea L Pusic⁴, Chris Sidey-Gibbons⁵.

Abstract

BACKGROUND: Women undergoing cancer-related mastectomy and reconstruction are facing multiple treatment choices where post-surgical satisfaction with breasts is a key outcome. We developed and validated machine learning algorithms to predict patient-reported satisfaction with breasts at 2-year follow-up to better inform the decision-making process for women with breast cancer.
METHODS: We trained, tested, and validated three machine learning algorithms (logistic regression (LR) with elastic net penalty, Extreme Gradient Boosting (XGBoost) tree, and neural network) to predict clinically important differences in satisfaction with breasts at 2-year follow-up using the validated BREAST-Q. We used data from 1553 women undergoing cancer-related mastectomy and reconstruction who were followed-up for two years at eleven study sites in North America from 2011 to 2016. 10-fold cross-validation was used to train and test the algorithms on data from 10 of the 11 sites which were further validated using the additional site's data. Area-under-the-receiver-operating-characteristics-curve (AUC) was the primary outcome measure.
RESULTS: Of 1553 women, 702 (45.2%) experienced an improved satisfaction with breasts and 422 (27.2%) a decreased satisfaction. In the validation set (n = 221), the algorithms showed equally high performance to predict improved or decreased satisfaction with breasts (all P > 0.05): For improved satisfaction AUCs were 0.86-0.87 and for decreased satisfaction AUCs were 0.84-0.85.
CONCLUSION: Long-term, individual patient-reported outcomes for women undergoing mastectomy and breast reconstruction can be accurately predicted using machine learning algorithms. Our algorithms may be used to better inform clinical treatment decisions for these patients by providing accurate estimates of expected quality of life.

Entities: Chemical

Keywords: Breast reconstruction; Breast surgery; Decision-making; INSPiRED; Individualized treatment; Machine learning

Mesh：

Year: 2021 PMID： 34619573 PMCID： PMC8551470 DOI： 10.1016/j.breast.2021.09.009

Source DB: PubMed Journal: Breast ISSN： 0960-9776 Impact factor: 4.380

confidence interval logistic regression local interpretable model agnostic explanation patient-reported outcome patient-reported outcome measurement Shapley Additive Explanations minimal clinically important difference

Background

Breast cancer surgery has undergone many changes over the past decades. Foremost among there is the trend towards de-escalation of surgical severity; (radical) mastectomy was once standard of care but many women now receive breast conserving therapy [1]. In spite of de-escalation trends still about 35% of all breast cancer patients have to undergo primary, secondary, or prophylactic mastectomy [2]. The negative impact of mastectomy on quality of life is well documented and reconstruction is often offered to minimize such impacts [3,4]. However, women undergoing cancer-related mastectomy and reconstruction face difficult treatment choices like implant-based vs. autologous reconstruction or timing of radiotherapy. While several trials and observational studies have provided evidence to inform these treatment decisions, tailoring multi-disciplinary care to the unique needs and preferences of individuals undergoing breast reconstruction remains complex with many knowledge gaps [[5], [6], [7]]. Modern predictive modeling using machine learning techniques may help reduce complexity, overcome knowledge gaps, and improve clinical care and outcomes for patients undergoing cancer-related breast reconstruction. Machine learning algorithms identify complex patterns in data to make accurate outcome predictions of future events at an individual level [[8], [9], [10]]. Such algorithms have shown great performance in other areas of breast cancer treatment like identifying exceptional responders to neoadjuvant treatment or patients at risk of experiencing financial toxicity related to their cancer treatment [11,12]. As post-surgical satisfaction with breasts is a recommended key outcome for women undergoing cancer-related mastectomy and breast reconstruction [13], we hypothesized that machine learning algorithms may allow accurate, individualized predictions of long-term satisfaction with reconstructed breasts prior to the initiation of the breast reconstruction process to better inform the decision-making process for these women. In this study, we developed and validated machine learning algorithms to predict individual, patient-reported satisfaction with breasts at 2-year follow-up to better inform decision-making for women undergoing cancer-related mastectomy and subsequent implant-based or autologous breast reconstruction.

Methods

Patient recruitment and selection

Patients were recruited as part of an international multicenter trial (Mastectomy Reconstruction Outcomes Consortium (MROC) study, NCT01723423). MROC patient recruitment was from 2012 to 2015, while data collection went from 2012 to 2017. The trial was conducted at 11 study sites across the US and Canada and included women who underwent cancer-related mastectomy and subsequent breast reconstruction. Study participants were followed-up for two years after surgery. Choice of breast reconstructive procedure was not randomly assigned but based on surgeon and patient preferences. Further details are published elsewhere [5,14]. Ethics approval was obtained from all study sites and all patients gave their written informed consent. Inclusion criteria of the MROC study were patient age 18 years or older, prophylactic or therapeutic, bilateral or unilateral, immediate or delayed, implant-based or autologous breast reconstruction. Exclusion criteria included previously failed breast reconstruction attempts.

Design and definitions

In this analysis, we developed and validated machine learning algorithms to predict changes in individual patient-reported satisfaction with reconstructed breasts at 2-year follow-up prior to the initiation of the breast reconstruction process. Patient-reported satisfaction with breasts was measured using the BREAST-Q ‘Satisfaction with Breasts’ subscale, a gold-standard measure for assessing patient-reported outcomes for women with breast cancer [15,16]. As described previously, we defined three types of outcome: improved, worsened, or stable satisfaction [17]. Changes greater than or equal to the minimal clinically important difference (MCID) were used as the outcome for both worsened (negative clinically-meaningful change in post-operative scores compared to baseline) and improved satisfaction with breasts (positive change compared to pre-operative score) [18]. For our predictive models, we used known clinical, patient, and patient-reported variables influencing the post-operative outcome after breast reconstruction [19,20]. A full list of co-variables and the outcome variable is shown in Table 1. In line with recent recommendations to avoid racial bias in predictive models we did not include socio-economic and ethnic information into the models but rather compared the predictive performance among different ethnic groups to ensure a fair algorithm performance [21,22]. The surgeon was not included as a predictive factor due to lacking objectivity and consensus with regard to suitable metrics for assessing a surgeon's performance (years of practice vs. caseload vs. patient satisfaction) and thus limited generalizability for future model iterations.

Table 1

Baseline demographic and clinical characteristics of participating women.

	Whole cohort (n = 1553)	Development set (n = 1332)	Validation set (n = 221)	P valuec
Patient variables
Agee, mean (SD), years	50.19 (9.98)	50.22 (10.10)	49.99 (9.22)	0.732^#
BMIe, mean (SD), kg/m²	26.49 (5.39)	26.61 (5.48)	25.72 (4.76)	0.012^#
Diabetese, no (%)				0.577a
No, no. (%)	1482 (95.4)	1269 (95.3)	213 (96.4)
Yes, no. (%)	71 (4.6)	63 (4.7)	8 (3.6)
Smokere
Never, no. (%)	1032 (66.5)	888 (66.7)	144 (65.2)	0.726a
Previous, no. (%)	482 (31.0)	409 (30.7)	73 (33.0)	0.534a
Current, no. (%)	26 (1.7)	24 (1.8)	2 (0.9)	0.498a
Unknown, no. (%)	13 (0.8)	11 (0.8)	2 (0.9)	1
Pre-operative patient-reported outcome data
BREAST-Q satisfaction with breaste, mean (SD), 0–100	59.96 (22.12)	59.76 (22.10)	61.33 (22.22)	0.330b
BREAST-Q psychosocial well-beinge, mean (SD), 0–100	69.57 (18.08)	69.49 (18.13)	70.04 (17.81)	0.673b
BREAST-Q physical well-being chest and upper bodye, mean (SD), 0–100	78.87 (14.55)	78.50 (14.56)	81.10 (14.31)	0.013b
BREAST-Q physical well-being abdomen, mean (SD)e, 0–100	89.53 (13.46)	89.27 (14.56)	91.11 (11.37)	0.033b
BREAST-Q sexual well-beinge, mean (SD), 0–100	54.90 (20.73)	55.03 (20.59)	54.11 (21.55)	0.562b
Clinical variables
Radiatione
After reconstruction, no. (%)	293 (18.9)	252 (18.9)	41 (18.6)	0.971a
Before reconstruction, no. (%)	220 (14.2)	186 (14.0)	34 (15.4)	0.648a
None, no. (%)	1040 (67.0)	894 (67.1)	146 (66.1)	0.817a
Mastectomye
Nipple-sparing, no. (%)	170 (10.9)	149 (11.2)	21 (9.5)	0.531a
Simple, no. (%)	1377 (88.7)	1178 (88.4)	199 (90.0)	0.560a
Other, no. (%)	6 (0.4)	5 (0.4)	1 (0.5)	1
Reconstruction techniquee
Tissue expander (TE), no. (%)	831 (53.5)	699 (52.5)	132 (59.7)	0.054a
Direct-to-implant (DTI), no. (%)	71 (4.6)	62 (4.7)	9 (4.1)	0.833a
Transverse rectus abdominis (TRAM) flap, no. (%)	121 (7.8)	100 (7.5)	21 (9.5)	0.374a
Deep inferior epigastric perforator (DIEP) flap, no. (%)	291 (18.7)	251 (18.8)	40 (18.1)	0.865a
Latissimus dorsi (LD) flap, no. (%)	49 (3.2)	46 (3.5)	3 (1.4)	0.149a
Gluteal artery perforator (GAP) flap, no. (%)	8 (0.5)	7 (0.5)	1 (0.5)	1a
Superficial inferior epigastric artery (SIEA) flap, no. (%)	48 (3.1)	48 (3.1)	0 (0.0)	0.008a
Crossover flap, no. (%)	60 (3.9)	52 (3.9)	8 (3.6)	0.988a
Mixed flaps, no. (%)	46 (3.0)	39 (2.9)	7 (3.2)	1a
Mixed implant and autologous, no. (%)	28 (1.8)	28 (2.1)	0 (0.0)	0.057a
Chemotherapye				0.584
Received, no. (%)	442 (28.5)	383 (28.8)	59 (26.7)
Not received, no. (%)	1111 (71.5)	949 (71.2)	162 (73.3)
Reconstruction lateralitye				0.390a
Unilateral, no. (%)	700 (45.1)	594 (44.6)	106 (48.0)
Bilateral, no. (%)	853 (54.9)	738 (55.4)	115 (52.0)
Mastectomy indicatione				0.706a
Therapeutic, no. (%)	1398 (90.0)	1197 (89.9)	201 (91.0)
Prophylactic, no. (%)	155 (10.0)	135 (10.1)	20 (9.0)
Axillary interventione
Axillary lymph node dissection (ALND), no. (%)	402 (25.9)	358 (26.9)	44 (19.9)	0.035a
Sentinel lymph node biopsy (SLNB), no. (%)	698 (44.9)	584 (43.8)	114 (51.6)	0.039a
None, no. (%)	453 (29.2)	390 (29.3)	63 (28.5)	0.878a
Socioeconomic and ethnic data
Marital status
Single, no. (%)	109 (7.1)	100 (7.5)	9 (4.1)	0.084a
Living with significant other, no. (%)	67 (4.3)	59 (4.5)	8 (3.6)	0.701a
Married, no. (%)	1176 (76.1)	1004 (75.8)	172 (77.8)	0.564a
Separated, no. (%)	27 (1.7)	23 (1.7)	4 (1.8)	1a
Divorced, no. (%)	125 (8.1)	104 (7.8)	21 (9.5)	0.483a
Widowed, no. (%)	42 (2.7)	35 (2.6)	7 (3.2)	0.825a
Education level
Some high school, no. (%)	31 (2.0)	29 (2.2)	2 (0.9)	0.319a
High school degree, no. (%)	121 (7.8)	112 (8.4)	9 (4.1)	0.036a
Some college/trade school, no. (%)	255 (16.5)	226 (17.0)	29 (13.1)	0.179a
College/trade school degree, no. (%)	602 (38.8)	517 (38.9)	85 (38.5)	0.960a
Some masters/doctoral, no. (%)	61 (3.9)	54 (4.1)	7 (3.2)	0.655a
Masters/doctoral degree, no. (%)	480 (31.0)	391 (29.4)	89 (40.3)	0.002a
Working status
Unable to work, no. (%)	37 (2.4)	29 (2.2)	8 (3.6)	0.295a
Unemployed, no. (%)	31 (2.0)	28 (2.1)	3 (1.4)	0.627a
Student, no. (%)	10 (0.7)	9 (0.7)	1 (0.5)	1a
Volunteer, no. (%)	8 (0.5)	7 (0.5)	1 (0.5)	1a
Retired, no. (%)	141 (9.2)	130 (9.9)	11 (5.0)	0.028a
Homemaker, no. (%)	179 (11.6)	152 (11.5)	27 (12.3)	0.842a
Part time employed, no. (%)	216 (14.1)	175 (13.3)	41 (18.6)	0.044a
Full time employed, no. (%)	863 (56.1)	746 (56.6)	117 (53.2)	0.376a
Other, no. (%)	52 (3.4)	41 (3.1)	11 (5.0)	0.218a
Household income per year
<25,000$, no. (%)	81 (5.4)	74 (5.8)	7 (3.2)	0.171a
25,000$ to 49,999$, no. (%)	163 (10.9)	149 (11.6)	14 (6.5)	0.032a
50,000$ to 74,999$, no. (%)	269 (17.9)	232 (18.1)	37 (17.1)	0.787a
75,000$ to 99,999$, no. (%)	233 (15.5)	207 (16.1)	26 (12.0)	0.144a
>100,000$, no. (%)	754 (50.3)	621 (48.4)	133 (61.3)	<0.001a
Ethnical background
Caucasian, no. (%)	1398 (90.9)	1199 (90.8)	199 (91.7)	0.750a
African American, no. (%)	69 (4.5)	61 (4.6)	8 (3.7)	0.662a
Asian, no. (%)	63 (4.1)	54 (4.1)	9 (4.1)	1a
American Indian, no. (%)	8 (0.5)	7 (0.5)	1 (0.5)	1a
Outcome – patient-reported satisfaction with breasts at 2-year follow-up compared to baseline
Improvedd, no. (%)	702 (45.2)	602 (45.2)	100 (45.2)	1a
Decreasedd, no. (%)	422 (27.2)	357 (26.8)	65 (29.4)	0.468a
Stable, no. (%)	429 (27.6)	373 (28.0)	56 (25.3)	0.460a

ALND = axillary lymph node dissection; SLNB = sentinel lymph node biopsy.

P values < 0.05 highlighted in bold.

P values refer to Chi-square tests for binary feature evaluation (feature true vs. feature not true).

P values refer to t-tests to evaluate mean differences of continuous data.

P values refer to differences in the development and validation set.

Increase or decrease equal or larger to minimal clinically important difference.

Variable used in the predictive models.

Baseline demographic and clinical characteristics of participating women. ALND = axillary lymph node dissection; SLNB = sentinel lymph node biopsy. P values < 0.05 highlighted in bold. P values refer to Chi-square tests for binary feature evaluation (feature true vs. feature not true). P values refer to t-tests to evaluate mean differences of continuous data. P values refer to differences in the development and validation set. Increase or decrease equal or larger to minimal clinically important difference. Variable used in the predictive models.

Algorithm development

Choice of algorithms, algorithm development, and reporting on them was informed by recent guidelines on how to use machine learning in medicine [23], how to report findings of diagnostic tests [24] and multivariate prediction models [25], and previously published research by our group [9,11,12,17]. We developed and validated three algorithms to predict clinically meaningful changes in satisfaction with reconstructed breasts: Logistic Regression (LR) with Elastic Net Penalty: We chose this algorithm because of its known ability to attenuate the influence of certain predictors on the model, leading to greater generalizability to new datasets [26,27]. Extreme Gradient Boosting (XGBoost) Tree: Decision trees are commonly used because of their ability to identify more complex, non-linear relations between variables while still being readily interpretable. Gradient boosting, a machine learning technique where the final prediction model consists of an ensemble of several stepwise built models, has been shown to further improve the predictive performance of decision trees [28]. We used Shappley Additive Explanation (SHAP) values to provide insights into the model predictions [29]. Neural network: Neural networks are state-of-the-art algorithms that mimic the structure of the mammalian cortex and that have shown great ability to detect even the most complex patterns in data. Neural networks are, however, prone to overfitting and can be difficult to interpret. We used Local Interpretable Model Interpretation (LIME) to provide insights into the model predictions [30,31]. According to guidelines for multivariable risk prediction models, validation of such a model is recommended in a dataset of at least 100 events [25]. For improved satisfaction with breasts, this requirement was satisfied by three trial sites of this large, international, multicenter trial of which the one with a maximum amount of events for decreased satisfaction was chosen as an independent validation set on which the final model was (externally) validated. The other 10 trial sites were used as development set (Fig. 1).

Fig. 1

Flow of participants.

Flow of participants. Data preparation steps (separately applied to every fold of the cross-validation process) included imputation of missing data using k-nearest neighbors (5 neighbors), removal of zero-variance variables, one hot-encoding for categorical variables, as well as feature scaling and centering for continuous variables. We used 10-fold cross-validation to train and tune the algorithms on the development set; we used a hypergrid-search for hyperparameter tuning (see Supplementary Appendix for optimal hyperparameters). To address possible class-imbalances we used the Kappa performance metric (mean value over the 10 folds) to select the final model. To avoid overfitting and to improve generalizability of our models we applied a tolerance threshold of 3% meaning that the simplest model within a 3% tolerance of the empirically optimal model was chosen as the final model [32]. A more detailed description of the algorithm development and compliance with the above-mentioned guidelines [[23], [24], [25]] can be found in the online Supplementary Appendix.

Statistical analysis

We used descriptive statistics including absolute and relative frequencies as well as chi-square tests for categorical data and mean and standard deviation alongside t-tests for continuous data to compare the distribution of baseline and outcome variables in the development and validation set. To assess the diagnostic performance of machine learning algorithms predicting clinically meaningful changes in patient-reported satisfaction with breasts, area under the receiver-operating characteristics curve (AUC) and accompanying 95% confidence intervals were calculated for every model using 2000 bootstrap replicates that were drawn from the validation dataset and stratified for the outcome variable (clinically meaningful increase/decrease). Additionally, model overall accuracy was computed by comparing the model predictions to the actual patient-reported outcome at 2-year follow up; 95% Clopper-Pearson confidence intervals were computed. We used calibration plots (observed vs. predicted probabilities [33]) and Spiegelhalter's Z statistic [34] to evaluate model calibration. No multiplicity adjustments against type-I-error inflation were performed. All P values should be interpreted descriptively and have no confirmatory value. A P value smaller than 0.05 was considered statistically significant. Analysis was conducted using R software, Version 3.6.1; the “caret” package of R was used for the model development.

Results

Clinical and demographic characteristics

Of 3058 women enrolled, 1503 returned both baseline and 2-year follow-up information for satisfaction with breasts. Of those 1503 patients, 1332 patients from 10 trial sites were included in the development set for the algorithm training and initial testing and one complete trial site (n = 221) was set aside as validation set (Fig. 1). Clinical and demographic characteristics in the whole cohort as well as in the development and validation datasets are listed in Table 1. A total of 702 (45.2%) women experienced a clinically meaningful improved satisfaction with breasts two years after surgery, 422 (27.2%) a clinically meaningful decreased satisfaction, and 429 (27.6%) a stable satisfaction with breasts. In the validation set, women had a significantly lower BMI (26.6 vs. 25.7), a higher pre-operative physical well-being for chest and upper body (78.5 points vs. 81.1) as well as abdomen (89.3 points vs. 91.1), they underwent autologous reconstructions with superficial inferior epigastric artery flaps less often (0% vs. 3.1%), and received SLNB (51.6% vs. 43.8%) instead of ALND (19.9% vs. 26.9%) more often.

Algorithm performance

Table 2 summarizes the performance of the LR with elastic net penalty, the XGBoost tree, and the neural network algorithm in the prediction of improved or worsened satisfaction with breasts at 2-year follow up.

Table 2

Evaluation of algorithms trained to predict satisfaction with breasts at two-year follow-up.

	2-year follow-up satisfaction lower than baseline		2-year follow-up satisfaction higher than baseline
	Accuracy (95% CI)	AUC (95% CI)	Accuracy (95% CI)	AUC (95% CI)
Logistic regression with elastic net penalty
Test set (n = 1332)	0.84 (0.83–0.85)	0.85 (0.84–0.87)	0.77 (0.76–0.78)	0.85 (0.84–0.86)
Additional validation set (n = 221)	0.83 (0.78–0.88)	0.84 (0.78–0.90)	0.78 (0.72–0.83)	0.87 (0.82–0.91)
XGBoost Tree
Test set (n = 1332)	0.84 (0.82–0.85)	0.85 (0.84–0.87)	0.76 (0.75–0.78)	0.85 (0.83–0.86)
Additional validation set (n = 221)	0.83 (0.77–0.88)	0.84 (0.78–0.90)	0.77 (0.71–0.83)	0.86 (0.81–0.91)
Neural network
Test set (n = 1332)	0.83 (0.82–0.84)	0.86 (0.85–0.87)	0.76 (0.74–0.77)	0.84 (0.83–0.86)
Additional validation set (n = 221)	0.84 (0.78–0.88)	0.85 (0.78–0.90)	0.78 (0.72–0.84)	0.87 (0.83–0.92)

AUC = Area under the Receiver Operating Characteristic Curve.

Evaluation of algorithms trained to predict satisfaction with breasts at two-year follow-up. AUC = Area under the Receiver Operating Characteristic Curve. For the prediction of worsened satisfaction in the validation set, the LR with elastic net penalty, the XGBoost tree, and the neural network showed an AUC of 0.84 (95%CI 0.78–0.90), 0.84 (95%CI 0.78–0.90), and 0.85 (95%CI 0.78–0.90), respectively. For the prediction of improved satisfaction in the validation set, the three algorithms showed an AUC of 0.87 (95%CI 0.82–0.91), 0.86 (95%CI 0.81–0.91), and 0.87 (95%CI 0.83–0.92) for improved satisfaction. When compared against each other, the performance of the different algorithms did not differ significantly (Fig. 2). Accompanying ROC curves are illustrated in Fig. 3.

Fig. 2

Performance comparison between the models to predict improved and decreased satisfaction with breasts at 2-year follow-up.

Fig. 3

Receiver Operating Characteristic Curves of the models to predict improved and decreased satisfaction with breasts at 2-year follow-up.

Performance comparison between the models to predict improved and decreased satisfaction with breasts at 2-year follow-up. Receiver Operating Characteristic Curves of the models to predict improved and decreased satisfaction with breasts at 2-year follow-up. Calibration plots of all algorithms in the validation set yielded good calibration (Fig. 4). Spiegelhalter's Z statistic confirmed a well-calibrated model for the XGBoost tree algorithm to predict worsened or improved satisfaction (z score 0.127 and −0.152, P value 0.449 and 0.440) but not for the other algorithms.

Fig. 4

Calibration Plots of the Machine Learning Models in the Validation Set. 4a. Decreased satisfaction – Logistic Regression with Elastic Net Penalty.4b. Decreased satisfaction – XGBoost Tree.4c. Decreased satisfaction – neural network.4d. Improved satisfaction - Logistic Regression with Elastic Net Penalty. 4e. Improved satisfaction – XGBoost Tree.4f. Improved satisfaction – neural network.

Predictive coefficients and insights into variable importance

The predictive coefficients of the LR with elastic net penalty (Table 3) illustrate that baseline satisfaction with breasts was most strongly associated with the outcomes at 2-year follow up followed by type of reconstruction, timing of radiotherapy, smoker status, mastectomy indication, and laterality. Specifically, a low baseline satisfaction with breasts, autologous reconstruction with TRAM or DIEP flaps, radiation before reconstruction, and never having smoked were associated with improved satisfaction at 2-year follow up. Contrary, a high baseline satisfaction with breasts, implant-based reconstruction using tissue expanders, autologous reconstruction using mixed flaps, radiation after reconstruction, a current smoker status, and unilateral reconstruction were associated with worsened outcomes at 2-year follow up.

Table 3

Regularized coefficients from the logistic regression with elastic net penalty.

	Regularized coefficient for lower satisfaction at 2-year follow-up (positive values indicate a positive correlation with low satisfaction)	Regularized coefficient for higher satisfaction at 2-year follow-up (positive values indicate a positive correlation with high satisfaction)
Patient variables
Age	0.01	0.0
BMI	0.05	0.0
Diabetes	0.0	0.0
Smoker
Never	−0.13	0.14
Previous	0.0	0.0
Current	0.42	−0.37
Patient-reported outcomes at baseline
Satisfaction with breasts	1.44	−1.26
Psychosocial well-being	0.0	0.0
Physical well-being chest and upper body	−0.04	0.0
Physical well-being abdomen	0.01	0.0
Sexual well-being	0.0	0.0
Clinical variables
Radiation
After reconstruction	0.52	−0.42
Before reconstruction	−0.22	0.02
None	0.0	0.0
Mastectomy
Nipple-sparing	0.0	−0.01
Simple	0.0	0.0
Other	0.11	0.0
Reconstruction – Implant-based
Tissue expander	0.40	−0.28
Direct-to-implant	0.0	0.0
Reconstruction – Autologous (flap)
TRAM	−0.76	0.16
DIEP	−0.22	0.23
LD	0.0	0.0
GAP	0.0	0.0
SIEA	−1.00	0.0
Crossover	0.0	0.0
Mixed flaps	0.28	0.0
Reconstruction – Mixed implants and autologous	−0.11	0.01
Chemotherapy
Received	−0.12	−0.03
Not received	0.12	0.03
Laterality
Unilateral reconstruction	0.16	−0.05
Bilateral reconstruction	−0.16	0.05
Mastectomy indication
Therapeutic	0.19	0.0
Prophylactic	−0.19	0.0
Axillary intervention
Axillary lymph node dissection	0.0	−0.13
Sentinel lymph node biopsy	0.01	0.0
No axillary intervention	0.0	0.21

Regularized coefficients from the logistic regression with elastic net penalty. Fig. 5, Fig. 6 provide insights into the variable importance of the XGBoost tree and the neural network using local interpretable methods. Generally, the same effects as for the logistic regression with elastic net penalty could be observed.

Fig. 5

Shapley Additive Explanations (SHAP) Value Summary Plot of the Extreme Gradient Boosting (XGBoost) Tree Model. 5a. Decreased satisfaction.5b. Improved satisfaction.

Fig. 6

Local interpretable model-agnostic explanations of the neural network. 6a. Decreased satisfaction.6b. Improved satisfaction.

Shapley Additive Explanations (SHAP) Value Summary Plot of the Extreme Gradient Boosting (XGBoost) Tree Model. 5a. Decreased satisfaction.5b. Improved satisfaction. Local interpretable model-agnostic explanations of the neural network. 6a. Decreased satisfaction.6b. Improved satisfaction. For comparison, the results of a traditional multivariable regression with decreased satisfaction at 2-year follow up as outcome variable are listed in Table 4.

Table 4

Multivariable logistic regression for decreased satisfaction with reconstructed breasts.

	Odds ratio (95% CI)	P value
Patient variables
Age	1.01 (0.99–1.02)	0.550
BMI	1.02 (0.99–1.06)	0.202
Diabetes
No	1 [reference]	–
Yes	1.74 (0.77–3.73)	0.166
Smoker
Never	1 [reference]	–
Previous	1.41 (1.01–1.98)	0.045
Current	2.06 (0.56–6.98)	0.256
Patient-reported outcomes at baseline
Satisfaction with breasts	1.11 (1.09–1.12)	<0.001
Psychosocial well-being	0.99 (0.98–1.00)	0.033
Physical well-being chest and upper body	0.99 (0.98–1.00)	0.067
Physical well-being abdomen	1.00 (0.99–1.01)	0.922
Sexual well-being	0.99 (0.98–1.00)	0.097
Clinical variables
Radiation
After reconstruction	2.62 (1.70–4.08)	<0.001
Before reconstruction	0.95 (0.49–1.78)	0.878
None	1 [reference]	–
Mastectomy
Nipple-sparing	1.09 (0.65–1.81)	0.745
Simple	1 [reference]	–
other	0.33 (0.01–3.35)	0.416
Reconstruction
Tissue expander	1 [reference]	–
Direct-to-implant	0.81 (0.40–1.60)	0.548
TRAM	0.23 (0.10–0.49)	<0.001
DIEP	0.35 (0.22–0.55)	<0.001
LD	0.43 (0.16–1.11)	0.093
GAP	0.40 (0.04–3.03)	0.417
SIEA	0.08 (0.02–0.25)	<0.001
Crossover	0.82 (0.38–1.73)	0.614
Mixed flaps	1.07 (0.40–2.60)	0.891
Mixed implants and autologous	0.23 (0.03–1.36)	0.137
Chemotherapy
Not received	1 [reference]	–
Received	1.31 (0.92–1.87)	0.138
Laterality
Unilateral reconstruction	1 [reference]	–
Bilateral reconstruction	0.72 (0.51–1.00)	0.052
Mastectomy indication
Therapeutic	1 [reference]	–
Prophylactic	0.45 (0.21–0.95)	0.042
Axillary intervention
Axillary lymph node dissection	0.69 (0.40–1.20)	0.192
Sentinel lymph node biopsy	0.90 (0.56–1.45)	0.662
No axillary intervention	1 [reference]	–

Multivariable logistic regression for decreased satisfaction with reconstructed breasts.

Subgroup analysis

To assess potential racial bias of the predictive models, we evaluated the algorithms' performance among different ethnic groups. The logistic regression with elastic net penalty, the XGboost tree, and the neural network showed significantly better performance among African American women compared to Caucasian women (AUC 1 vs. 0.84; P < 0.001). No significant differences could be observed when comparing the algorithms’ performance among Asian and Caucasian women or Asian and African American women (all P > 0.05).

Discussion

In this study, we demonstrate that machine learning algorithms can accurately predict individual, long-term patient-reported outcomes for women undergoing cancer-related mastectomy and subsequent implant-based or autologous breast reconstruction. The strength of these algorithms is that they may better inform clinical treatment decisions (e.g., autologous vs. implant-based reconstruction) for these patients by using contextualizing clinical, patient, and patient-reported variables to provide patient-relevant outcome predictions tailored to the individual patient's situation. Insights into the predictions made by the algorithms showed that baseline patient-reported variables were more important than clinical treatment decisions. Identifying and recommending optimal treatment decisions for women undergoing cancer-related mastectomy and breast reconstruction has been a major focus of clinical research during the past years [7]. For example, previous clinical trials have found that autologous reconstruction is generally associated with better outcomes compared to implant-based reconstruction, that radiotherapy is generally associated with poorer quality of life outcomes (but the optimal timing, before or after reconstruction, remains unclear), and that nipple-sparing mastectomy may offer no benefit over simple mastectomy [6,19,20,35,36]. However, clinical applicability of these findings may have been limited so far due to inherently limited group-level inferences of traditional statistics [37]. For example, while autologous reconstruction may have statistically better outcomes compared to implants in a prospective study such a finding does not mean that all women will have better outcomes if they receive an autologous reconstruction. Traditional primary endpoint evaluations are performed under the assumption that all other co-variables (e.g. radiation, type of mastectomy, baseline satisfaction etc.) are held constant to eliminate their influence on the outcome. Thus, while it is important to know that autologous reconstruction was generally associated with better outcomes in prospective studies if all other influences were statistically eliminated, this may not fully represent the complex interactions between different treatment choices and the individual patient's situation within the breast reconstruction process. Accurate predictive models like our algorithms may better inform the decision-making process for patients and clinicians. A clinical trial comparing the outcomes of “intelligent” (algorithm-supported) decision-making against traditional decision-making for women undergoing mastectomy and breast reconstruction seems warranted. Our group has previously reported the results of predicting short-term patient-reported outcomes for women undergoing breast reconstruction [17]. Our present analysis does not only demonstrate that longer-term outcomes can be predicted with great accuracy but also that the drivers for predicting short- or long-term outcomes are different. For example, direct-to-implant reconstruction was strongly associated with worse short-term but not long-term outcomes (regularized β = 0.34 vs. 0.0) whereas the negative impact of implant-based reconstruction with tissue expanders on patient-reported satisfaction with breasts worsened in the long-term follow up (regularized β = 0.08 vs. 0.40). For autologous reconstruction, TRAM, DIEP, and SIEA flaps were associated with a decreased risk of a poorer short-term and long-term outcome. GAP and crossover flaps were previously found to be associated with worsened short-term follow up; this effect could not be observed for 2-year outcomes. Interestingly, also the importance of patient and patient-reported variables changed over time: A higher patient age was strongly associated with poorer short-term outcomes but not with long-term outcomes (regularized β = 0.66 vs. 0.01) and patient-reported baseline variables other than satisfaction with breasts (psychosocial, physical, and sexual well-being) were less important in predicting long-term outcomes. In interpreting our findings, some limitations of our study should be considered. First, although we used the largest dataset of its kind to develop and validate our algorithms, some procedures showed a limited sample size in the development and validation set. Moreover, most of our study sites were high-volume academic centers located in North America and although the algorithms performed equally well among different ethnic groups, our study population showed limited ethnic diversity. Future research may verify our findings in more diverse populations. Second, only about 50% of study participants completed long-term follow-up of the patient-reported outcome satisfaction with breasts. Possible bias arises from this loss to follow up but previous research in the field of patient-reported outcomes indicates similar completion rates [17,19,20,35]. The initiation of digitized patient-reported outcome assessments may allow improving engagement with those assessments in future trials [38,39]. Third, although satisfaction with reconstructed breasts is a key outcome for women undergoing cancer-related mastectomy and breast reconstruction it is not the only relevant outcome. Future research may look into developing additional prediction models for other patient-reported outcomes like psychological well-being, physical well-being, and sexual well-being after breast reconstruction to inform even more comprehensively about expected quality of life after breast reconstruction.

Conclusion

Long-term, individual patient-reported outcomes for women undergoing cancer-related mastectomy and breast reconstruction can be accurately predicted using machine learning algorithms. Our algorithms may be used to better inform clinical treatment decisions for these patients by providing accurate estimates of expected quality of life prior to the initiation of the breast reconstruction process.

Role of the funding source

The founding party played no role in analyzing or interpreting the data. Writing assistance No writing assistance was provided.

Ethics committee approval

Appropriate institutional review board or research ethics board approval was obtained from all sites. Clinical trial information: NCT01723423.

Declaration of competing interest

André Pfob No relationship to disclose. Babak J. Mehrara No relationship to disclose. Jonas A. Nelson No relationship to disclose. Edwin G. Wilkins No relationship to disclose. Andrea L. Pusic Patents, Royalties, Other Intellectual Property: I am co-developer of BREAST-Q and receive royalty payments when it is used in industry-sponsored trials. Chris Sidey-Gibbons No relationship to disclose.

3 in total

1. Developing Machine Learning Algorithms to Support Patient-centered, Value-based Carpal Tunnel Decompression Surgery.

Authors: Conrad J Harrison; Luke Geoghegan; Chris J Sidey-Gibbons; Paul H C Stirling; Jane E McEachan; Jeremy N Rodrigues
Journal: Plast Reconstr Surg Glob Open Date: 2022-04-18

Review 2. Breast and axillary surgery after neoadjuvant systemic treatment - A review of clinical routine recommendations and the latest clinical research.

Authors: André Pfob; Joerg Heil
Journal: Breast Date: 2022-01-22 Impact factor: 4.254

3. Development and validation of a clinicoradiomic nomogram to assess the HER2 status of patients with invasive ductal carcinoma.

Authors: Aqiao Xu; Xiufeng Chu; Shengjian Zhang; Jing Zheng; Dabao Shi; Shasha Lv; Feng Li; Xiaobo Weng
Journal: BMC Cancer Date: 2022-08-10 Impact factor: 4.638

3 in total