
Development and Validation of Nomograms for Predicting Overall and Breast Cancer-Specific Survival in Young Women with Breast Cancer: A Population-Based Study.

Yue Gong¹, Peng Ji¹, Wei Sun², Yi-Zhou Jiang², Xin Hu², Zhi-Ming Shao³.

Abstract

INTRODUCTION: The objective of the current study was to develop and validate comprehensive nomograms for predicting the survival of young women with breast cancer.
METHODS: Women aged <40 years diagnosed with invasive breast cancer between 1990 and 2010 were selected from the Surveillance, Epidemiology, and End Results database and randomly divided into training (n = 12,465) and validation (n = 12,424) cohorts. A competing-risks model was used to estimate the probability of breast cancer-specific survival (BCSS). We identified and integrated significant prognostic factors for overall survival (OS) and BCSS to construct nomograms. The performance of the nomograms was assessed with respect to calibration, discrimination, and risk group stratification.
RESULTS: The entire cohort comprised 24,889 patients. The 5- and 10-year probabilities of breast cancer-specific mortality were 11.6% and 20.5%, respectively. Eight independent prognostic factors for both OS and BCSS were identified and integrated for the construction of the nomograms. The calibration curves showed optimal agreement between the predicted and observed probabilities. The C-indexes of the nomograms in the training cohort were higher than those of the TNM staging system for predicting OS (0.724 vs 0.694; P < .001) and BCSS (0.733 vs 0.702; P < .001). Additionally, significant differences in survival were observed in patients stratified into different risk groups within respective TNM categories.
CONCLUSIONS: We developed and validated novel nomograms that can accurately predict OS and BCSS in young women with breast cancer. These nomograms may help clinicians in making decisions on an individualized basis.


Year: 2018 PMID: 30189361 PMCID: PMC6126433 DOI: 10.1016/j.tranon.2018.08.008

Source DB: PubMed Journal: Transl Oncol ISSN: 1936-5233 Impact factor: 4.243

Introduction

Breast cancer is the most frequently diagnosed malignancy and the leading cause of cancer death among women worldwide [1]. However, breast cancer is relatively uncommon in young women: approximately 7% of all breast cancers are diagnosed in women under 40 years of age [2]. The incidence of breast cancer in women younger than 40 years has been stable for the past 20 years in most countries [3]. Some studies have demonstrated that breast cancer in younger women is correlated with a more aggressive biology and poorer outcomes than breast cancer in older women [4], [5], [6], [7]. Young women have relatively high proportions of estrogen receptor (ER)–negative and progesterone receptor (PR)–negative cancers, human epidermal growth factor receptor 2 (HER2)–positive cancers, and high-grade cancers [8], [9]. They are also more likely to have a positive family history and TP53-positive tumors [10], [11]. The benefit of chemotherapy in treating breast cancer in women younger than 50 years has been confirmed [12]. However, many questions remain unanswered regarding the selection of therapeutic measures for young women with breast cancer, including whether all young women with breast cancer should receive chemotherapy and whether they are candidates for breast-conserving surgery. Therefore, it is important to stratify patients into risk subgroups to guide treatment selection. Nomograms have been widely used to estimate a numeric probability of death or recurrence in each patient by combining important prognostic factors [13], [14], [15]. To the best of our knowledge, nomograms for predicting overall survival (OS) and breast cancer–specific survival (BCSS) of young women with breast cancer have not been reported. In this study, we aimed to construct comprehensive and practical nomograms for young women with breast cancer based on a large population from the Surveillance, Epidemiology, and End Results (SEER) database. In addition, we compared our nomograms with the traditional TNM staging system to assess their predictive accuracy.

Material and Methods

Study Population

Data for this study were obtained from the current SEER database, which consists of 18 population-based cancer registries. This database represents approximately 28% of the total population in the United States. SEER*Stat Version 8.3.4 (http://www.seer.cancer.gov/seerstat) from the National Cancer Institute was used to identify eligible patients [16]. We included female patients, aged 18 to 39 years, who had been diagnosed with breast cancer as a first primary malignancy between 1990 and 2010. Patients diagnosed before 1990 were not included because ER and PR status was not recorded in the SEER database until 1990. Additionally, to ensure adequate follow-up time, patients diagnosed after 2010 were not included. Only histologically confirmed unilateral breast cancer cases were included. Cases diagnosed at autopsy or by death certificate only were excluded. All variables included in the analysis had fewer than 10% missing values. Patients were also excluded if they had unknown race information, an unknown specific surgical treatment (mastectomy or breast-conserving surgery), unknown histological grade or grade IV disease, unknown tumor size or number of positive lymph nodes, stage IV breast cancer, unknown ER or PR status, a diagnosis of inflammatory breast cancer or Paget's disease, or incomplete survival data. After the exclusion criteria were applied, a total of 24,889 women were eligible for analysis. The flow chart for data selection is shown in Supplementary Figure 1.

Supplementary Figure 1

Flow diagram for selection of the study cohort.

Construction of the Nomograms

To establish and validate competing-risks nomograms, the eligible patients were randomly divided into a training cohort (n = 12,465) and a validation cohort (n = 12,424). Race/ethnicity in the SEER database was classified into four major groups: white, black, Asian or Pacific Islander, and American Indian/Alaska Native. Given the small number of American Indian/Alaska Native patients, we combined these patients with Asian or Pacific Islander patients into an “others” group. Thus, we classified race/ethnicity into three mutually exclusive groups: 1) white, 2) black, and 3) others. The median follow-up was estimated as the median observed survival time. OS was calculated from the date of diagnosis to the date of death due to any cause, the date of last follow-up, or December 31, 2014. In the training cohort, univariate prognostic factors were determined using Kaplan-Meier plots and compared using log-rank tests. Variables that achieved significance at P < .05 were entered into the multivariable analysis via the Cox proportional-hazards model. The independent prognostic factors determined by the multivariate analysis were used to construct a nomogram for OS. BCSS was measured as the time from the date of diagnosis to the date of death attributed to breast cancer, the date of last follow-up, or December 31, 2014. Deaths from other causes were considered competing risks. We used the cumulative incidence function (CIF) to assess the probability of death. Gray's test was conducted to test the difference in CIF among groups [17]. A subdistribution analysis of competing risks was performed to construct a competing-risks model [18]. In the Cox regression model for disease-specific survival, patients who died from other causes were considered censored at the date of last follow-up. Thus, a nomogram was developed by integrating the associated risk factors to predict 5- and 10-year BCSS of young patients with breast cancer.
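The cumulative incidence function described above can be illustrated with a short sketch. The following is a minimal nonparametric (Aalen-Johansen type) estimator written in plain Python. It is not the authors' code, which used the R package cmprsk; the function name and the cause coding (0 = censored, 1 = death from breast cancer, 2 = death from other causes) are assumptions made for illustration.

```python
from collections import defaultdict


def cumulative_incidence(times, causes, cause_of_interest):
    """Nonparametric cumulative incidence in the presence of competing risks.

    times  : follow-up time of each patient
    causes : assumed coding, 0 = censored, 1 = breast cancer death,
             2 = death from other causes
    Returns a list of (time, CIF) pairs, one per distinct death time.
    """
    # Count how many patients leave follow-up at each time, by cause.
    per_time = defaultdict(lambda: defaultdict(int))
    for t, c in zip(times, causes):
        per_time[t][c] += 1

    n_at_risk = len(times)
    event_free = 1.0  # probability of being alive just before the current time
    cif = 0.0
    curve = []
    for t in sorted(per_time):
        counts = per_time[t]
        deaths_any = sum(n for c, n in counts.items() if c != 0)
        if deaths_any:
            # Increment = P(alive just before t) * cause-specific hazard at t.
            cif += event_free * counts.get(cause_of_interest, 0) / n_at_risk
            event_free *= 1.0 - deaths_any / n_at_risk
            curve.append((t, cif))
        # Deaths and censored patients both leave the risk set after t.
        n_at_risk -= sum(counts.values())
    return curve
```

Unlike one minus a Kaplan-Meier estimate that censors competing deaths, the increments here are weighted by all-cause survival, so the cause-specific curves never sum to more than 1.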

Validation and Calibration of the Nomograms

The nomograms were subjected to 1000 bootstrap resamples for internal validation in the training cohort and to external validation in the validation cohort. The concordance index (C-index) between the predicted probability and response was used to assess the discrimination performance of the nomograms [19]. For a prognostic model, the value of the C-index typically ranges from 0.5 to 1.0, with 0.5 indicating discrimination no better than random chance and 1.0 indicating perfect discrimination. Comparison of the C-index of two different models was based on previously described methods [20]. Calibration is the ability of a model to make unbiased estimates of outcome. Marginal estimate versus average predictive probability of the models was used to construct calibration curves. The predictions were expected to fall on a 45° diagonal line in a well-calibrated model.
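As an illustration of the discrimination measure, the sketch below computes Harrell's C-index for right-censored data in plain Python. It is a simplified version written for clarity rather than speed, not the routine the authors used (the analysis relied on R); the function name and the convention that a higher score means a worse predicted prognosis are assumptions.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times       : observed follow-up times
    events      : 1 if death was observed, 0 if censored
    risk_scores : model output, higher = worse predicted prognosis (assumed)
    """
    concordant = 0.0
    usable_pairs = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is usable only if the shorter time is a death
        for j in range(n):
            if times[i] < times[j]:
                usable_pairs += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier death had the higher risk score
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if usable_pairs == 0:
        raise ValueError("no usable pairs: at least one death with a later observation is required")
    return concordant / usable_pairs
```

A model that assigns every patient the same score returns 0.5, which is why 0.5 is read as chance-level discrimination.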

Risk Group Stratification Based on the Nomogram beyond TNM Staging

In addition to numerically comparing the discrimination ability based on the C-index, we sought to illustrate the independent discrimination ability of the nomogram for OS beyond standard TNM staging. To this end, we determined cutoff values by evenly dividing patients in the training cohort into different risk groups within a certain TNM category according to the total risk scores (from highest to lowest) from the nomogram for OS prediction. These values were then applied to the validation cohort, and the respective Kaplan-Meier survival curves were delineated.
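The stratification step can be sketched as follows: cutoffs are derived from the training cohort only and then applied unchanged to the validation cohort. This is a minimal sketch assuming three equally sized groups, as reported in the Results (total scores of 0 to 78, 79 to 116, and ≥117); the function names are hypothetical, and the exact tie-handling rule the authors used is not stated.

```python
def tertile_cutoffs(training_scores):
    """Return (lower bound of intermediate group, lower bound of high group)."""
    ordered = sorted(training_scores)
    n = len(ordered)
    return ordered[n // 3], ordered[(2 * n) // 3]


def assign_risk_group(total_score, cutoffs):
    """Classify a patient using cutoffs learned on the training cohort."""
    intermediate_from, high_from = cutoffs
    if total_score < intermediate_from:
        return "low"
    if total_score < high_from:
        return "intermediate"
    return "high"


# The cut points reported in the Results correspond to cutoffs of (79, 117).
reported_cutoffs = (79, 117)
example_group = assign_risk_group(100, reported_cutoffs)
```

Fixing the cutoffs on the training data before looking at the validation cohort keeps the validation result from being tuned to the data it is tested on.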

Statistical Analysis

All statistical analyses were performed using R software, version 3.4.0 (http://www.r-project.org) and SPSS software, version 22.0 (SPSS, Chicago, IL). The R packages cmprsk [21] and rms [22] were used for modeling and developing the nomograms. Two-sided P values less than .05 were considered statistically significant.

Results

Patient Characteristics

The entire cohort comprised 24,889 young women with histologically confirmed malignant breast cancer, with 12,465 patients in the training cohort and 12,424 patients in the validation cohort. The demographic and clinical characteristics of the study cohort are shown in Table 1. The majority of tumors were infiltrating ductal carcinoma (84.1%), and most of the patients were non-Hispanic whites (73.1%). The median survival time was 99 months (interquartile range, 63-149 months). By the end of the last follow-up, 5501 patients (22.1%) had died, including 4897 patients (19.7%) who died from breast cancer and 604 patients (2.4%) who died from other causes.

Table 1

Demographic and Clinical Characteristics of the Study Cohort

Demographic and Clinical Characteristic	All Patients (N = 24,889)	Training Cohort (N = 12,465)	Validation Cohort (N = 12,424)
Year of diagnosis
1990-1996	3294 (13.2%)	1676 (13.4%)	1618 (13.0%)
1997-2003	8270 (33.3%)	4197 (33.7%)	4073 (32.8%)
2004-2010	13,325 (53.5%)	6592 (52.9%)	6733 (54.2%)
Race
White	18,202 (73.1%)	9128 (73.2%)	9074 (73.0%)
Black	3695 (14.9%)	1853 (14.9%)	1842 (14.9%)
Others*	2992 (12.0%)	1484 (11.9%)	1508 (12.1%)
Laterality
Left	12,378 (49.7%)	6203 (49.8%)	6175 (49.7%)
Right	12,511 (50.3%)	6262 (50.2%)	6249 (50.3%)
Histology
IDC	20,935 (84.1%)	10,511 (84.3%)	10,424 (83.9%)
ILC	1805 (7.3%)	903 (7.2%)	902 (7.3%)
Others†	2149 (8.6%)	1051 (8.4%)	1098 (8.8%)
Grade
I	1866 (7.5%)	905 (7.3%)	961 (7.7%)
II	8288 (33.3%)	4178 (33.5%)	4110 (33.1%)
III	14,735 (59.2%)	7382 (59.2%)	7353 (59.2%)
Tumor size (cm)
≤2	11,833 (47.5%)	5970 (47.9%)	5863 (47.2%)
2-5	10,577 (42.5%)	5275 (42.3%)	5302 (42.7%)
> 5	2479 (10.0%)	1220 (9.8%)	1259 (10.1%)
No. of positive LNs
0	12,824 (51.5%)	6400 (51.3%)	6424 (51.7%)
1-3	7796 (31.3%)	3922 (31.5%)	3874 (31.2%)
4-9	2875 (11.6%)	1449 (11.6%)	1426 (11.5%)
≥ 10	1394 (5.6%)	694 (5.6%)	700 (5.6%)
ER status
Positive	15,746 (63.3%)	7858 (63.0%)	7888 (63.5%)
Negative	9143 (36.7%)	4607 (37.0%)	4536 (36.5%)
PR status
Positive	14,215 (57.1%)	7161 (57.4%)	7054 (56.8%)
Negative	10,674 (42.9%)	5304 (42.6%)	5370 (43.2%)
Surgery
BCS	11,119 (44.7%)	5500 (44.1%)	5619 (45.2%)
Mastectomy	13,770 (55.3%)	6965 (55.9%)	6805 (54.8%)
Survival months
Median (IQR)	99 (63-149)	99 (63-150)	99 (63-148)

Abbreviations: BCS, breast-conserving surgery; ER, estrogen receptor; IDC, infiltrating ductal carcinoma; ILC, infiltrating lobular carcinoma; IQR, interquartile range; LN, lymph node; PR, progesterone receptor.

* Including American Indian/Alaskan native and Asian/Pacific Islander.

† Including other histology of invasive breast cancer except IDC and ILC.


Overall Survival

The results of the univariate and multivariate analyses are listed in Table 2. All variables except for laterality of breast cancer were significantly correlated with OS (P < .001 for all). The significant factors in the univariate analysis were subjected to a multivariate analysis based on a Cox proportional-hazards regression model. Race, histology, tumor grade, tumor size, number of positive lymph nodes, ER status, and surgery type were confirmed to be independently associated with OS (P < .05 for all).

Table 2

Univariate and Multivariate Analysis of OS in the Training Cohort

	Univariate Analysis	Multivariate Analysis
Variable	P Value	HR	95% CI	P Value
Race	<.001			<.001
White		Reference
Black		1.555	1.416-1.707	<.001
Others*		1.040	0.921-1.175	.526
Laterality	.973
Left
Right
Histology	<.001			<.001
IDC		Reference
ILC		1.071	0.930-1.233	.342
Others†		0.741	0.634-0.866	<.001
Grade	<.001			<.001
I		Reference
II		1.648	1.304-2.083	<.001
III		1.986	1.574-2.507	<.001
Tumor size (cm)	<.001			<.001
≤2		Reference
2-5		1.383	1.267-1.510	<.001
>5		1.857	1.646-2.095	<.001
No. of positive LNs	<.001			<.001
0		Reference
1-3		1.806	1.641-1.987	<.001
4-9		3.195	2.859-3.571	<.001
≥10		5.361	4.725-6.084	<.001
ER status	<.001			<.001
Positive		Reference
Negative		1.221	1.088-1.370	<.001
PR status	<.001			.425
Positive		Reference
Negative		1.046	0.936-1.170	.425
Surgery	<.001			.004
BCS		Reference
Mastectomy		1.129	1.040-1.226	.004

Abbreviation: HR, hazard ratio.

* Including American Indian/Alaskan native and Asian/Pacific Islander.

† Including other histology of invasive breast cancer except IDC and ILC.
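To show how the multivariate hazard ratios in Table 2 combine for an individual patient, the sketch below multiplies the ratios for a given profile to obtain a hazard relative to the all-reference patient (white, IDC, grade I, tumor ≤2 cm, node negative, ER positive, PR positive, BCS). It assumes a Cox model without interaction terms, which the text does not state explicitly; the dictionary keys and function names are hypothetical. The result is a relative hazard only, not a survival probability.

```python
import math

# Multivariate hazard ratios copied from Table 2 (reference levels = 1.0).
HAZARD_RATIOS = {
    "race": {"white": 1.0, "black": 1.555, "others": 1.040},
    "histology": {"IDC": 1.0, "ILC": 1.071, "others": 0.741},
    "grade": {"I": 1.0, "II": 1.648, "III": 1.986},
    "tumor_size_cm": {"<=2": 1.0, "2-5": 1.383, ">5": 1.857},
    "positive_lns": {"0": 1.0, "1-3": 1.806, "4-9": 3.195, ">=10": 5.361},
    "er": {"positive": 1.0, "negative": 1.221},
    "pr": {"positive": 1.0, "negative": 1.046},
    "surgery": {"BCS": 1.0, "mastectomy": 1.129},
}


def linear_predictor(profile):
    """Sum of log hazard ratios (Cox coefficients) for the listed levels.

    Variables omitted from the profile are treated as the reference level.
    """
    return sum(math.log(HAZARD_RATIOS[var][level]) for var, level in profile.items())


def relative_hazard(profile):
    """Hazard relative to a patient at the reference level of every variable."""
    return math.exp(linear_predictor(profile))


# Example profile: black patient, grade III tumor, 4-9 positive lymph nodes.
example = relative_hazard({"race": "black", "grade": "III", "positive_lns": "4-9"})
```

Working on the log scale mirrors how a nomogram operates: each variable contributes an additive amount, and the total is mapped to a prediction.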


Breast Cancer–Specific Survival

Estimates of the probabilities of death resulting from breast cancer and from other causes according to clinical characteristics are listed in Table 3. The 5- and 10-year probabilities of death from breast cancer were 11.6% and 20.5%, respectively, while the 5- and 10-year cumulative incidences of death from other causes were 1.1% and 2.6%, respectively. Young black patients exhibited a higher cumulative incidence of death than white and “other” patients (P < .001 for all outcomes). Laterality was not significantly associated with either outcome. Tumor grade, tumor size, number of positive lymph nodes, and surgery type were significantly associated with the probabilities of death (P < .05 for all outcomes). Histology, ER status, and PR status were significantly associated with the cumulative incidence of death from breast cancer (P < .001) but not with death from other causes. All variables significantly correlated with the cumulative incidence of death resulting from breast cancer were used to construct the nomogram to predict 5- and 10-year BCSS.

Table 3

Five- and Ten-year Cumulative Incidences of Death Among Patients in the Training Cohort

	Cumulative Incidence of Death Resulting From Breast Cancer			Cumulative Incidence of Death Resulting From Other Causes
Variable	5 y	10 y	P Value	5 y	10 y	P Value
All patients	0.116	0.205		0.011	0.026
Race			<0.001			<0.001
White	0.105	0.192		0.009	0.022
Black	0.185	0.295		0.020	0.042
Others*	0.099	0.171		0.012	0.032
Laterality			0.881			0.753
Left	0.115	0.202		0.011	0.025
Right	0.118	0.208		0.010	0.026
Histology			<0.001			0.602
IDC	0.120	0.208		0.011	0.026
ILC	0.111	0.245		0.011	0.030
Others†	0.087	0.141		0.013	0.018
Grade			<0.001			<0.001
I	0.015	0.051		0.009	0.022
II	0.069	0.170		0.005	0.018
III	0.156	0.243		0.014	0.031
Tumor size (cm)			<0.001			<0.001
≤2	0.059	0.123		0.009	0.022
2-5	0.148	0.256		0.010	0.026
>5	0.269	0.401		0.027	0.050
No. of positive LNs			<0.001			<0.001
0	0.057	0.112		0.006	0.017
1-3	0.117	0.214		0.013	0.028
4-9	0.245	0.399		0.020	0.039
≥10	0.397	0.604		0.033	0.091
ER status			<0.001			0.085
Positive	0.079	0.190		0.008	0.024
Negative	0.181	0.232		0.015	0.028
PR status			<0.001			0.106
Positive	0.079	0.186		0.009	0.024
Negative	0.167	0.231		0.014	0.028
Surgery			<0.001			0.009
BCS	0.084	0.151		0.009	0.019
Mastectomy	0.142	0.251		0.012	0.032

* Including American Indian/Alaskan native and Asian/Pacific Islander.

† Including other histology of invasive breast cancer except IDC and ILC.
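Because death from breast cancer and death from other causes are mutually exclusive, the cause-specific cumulative incidences in the “All patients” row of Table 3 add up to the all-cause probability of death. The short check below reproduces that arithmetic; the variable and function names are illustrative only.

```python
# "All patients" row of Table 3, keyed by years since diagnosis.
cif_breast_cancer = {5: 0.116, 10: 0.205}
cif_other_causes = {5: 0.011, 10: 0.026}


def all_cause_probability(years):
    """All-cause probability of death = sum of the cause-specific CIFs."""
    return round(cif_breast_cancer[years] + cif_other_causes[years], 3)


def breast_to_other_ratio(years):
    """How many times more likely death from breast cancer is than other death."""
    return cif_breast_cancer[years] / cif_other_causes[years]


five_year = all_cause_probability(5)    # 0.127, i.e. 12.7%
ten_year = all_cause_probability(10)    # 0.231, i.e. 23.1%
ten_year_ratio = breast_to_other_ratio(10)  # a little under 8
```

These sums match the overall 5- and 10-year probabilities of death quoted in the Discussion, and the 10-year ratio is the basis for the “nearly eight-fold” comparison made there.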

Nomograms were constructed based on the Cox regression model to predict 5- and 10-year OS and BCSS (Figure 1). The point assignment of nomograms for OS and BCSS is shown in Supplementary Table 1. Based on the nomograms, tumor grade, tumor size, and number of positive lymph nodes made the largest contributions to prognosis, followed by race and histology. By adding up all points and locating the total on the bottom scales, we were easily able to calculate the estimated 5- and 10-year survival probabilities.

Figure 1

Nomogram for predicting 5- and 10-year probabilities of (A) OS and (B) BCSS of breast cancer in young women. Draw a vertical straight line from the variable value to the axis labeled “Points” to identify points for each variable. Add up all points, and the total points projected on the bottom scales correspond to the 5- and 10-year survival. Abbreviations: BCS, breast-conserving surgery; ER, estrogen receptor; IDC, infiltrating ductal carcinoma; ILC, infiltrating lobular carcinoma; LN, lymph node.
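The step from total points to a survival probability can be written out explicitly. The expressions below are a sketch of the standard relationships for a Cox model and for a subdistribution (Fine-Gray type) competing-risks model, not formulas taken from the article. The symbols are assumptions introduced here: $\beta$ and $\gamma$ are regression coefficients, $x$ is the patient's covariate vector, $S_0(t)$ is the baseline survival, and $F_{1,0}(t)$ is the baseline cumulative incidence of breast cancer death. None of the baseline quantities are reported in the text.

```latex
% Overall survival from a Cox proportional-hazards model
\[
  \mathrm{lp}(x) = \sum_{k} \beta_k x_k ,
  \qquad
  S(t \mid x) = S_0(t)^{\exp\{\mathrm{lp}(x)\}}
\]

% Cumulative incidence of breast cancer death from a subdistribution hazards model
\[
  F_1(t \mid x) = 1 - \bigl\{ 1 - F_{1,0}(t) \bigr\}^{\exp(\gamma^{\top} x)}
\]
```

In a nomogram, the points for each variable are a rescaled version of its contribution to the linear predictor, so the total-points axis is a linear transformation of $\mathrm{lp}(x)$ and the bottom scales evaluate these expressions at $t = 5$ and $t = 10$ years.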

Calibration and Validation of the Nomograms

The calibration plots for the OS and BCSS nomograms in the training cohort (Supplementary Figure 2) and validation cohort (Figure 2) demonstrated acceptable agreement between the nomogram predictions and observed estimates for 5- and 10-year OS and BCSS. As shown in Supplementary Table 2, in the training cohort, Harrell's C-indexes of the nomograms for the prediction of OS and BCSS were 0.724 [95% confidence interval (CI), 0.714-0.733] and 0.733 (95% CI, 0.723-0.743), respectively, which were significantly higher than those of the TNM staging system for OS (0.694; 95% CI, 0.684-0.704; P < .001) and BCSS (0.702; 95% CI, 0.692-0.713; P < .001). The C-indexes of the nomograms were similar in the validation cohort: 0.722 (95% CI, 0.712-0.732) for OS and 0.733 (95% CI, 0.723-0.743) for BCSS. These values were also significantly greater than those of the TNM staging system in the validation cohort, which were 0.699 (95% CI, 0.689-0.709) for OS and 0.710 (95% CI, 0.700-0.720) for BCSS.

Supplementary Figure 2

Calibration curves for predicting (A) 5-year and (B) 10-year OS and (C) 5-year and (D) 10-year disease-specific survival (BCSS) in the training cohort. Nomogram-predicted survival is plotted on the x-axis, and actual survival is plotted on the y-axis. Vertical bars represent 95% CIs measured by Kaplan-Meier analysis. Dashed lines along the 45° line through the origin point represent a perfect calibration model.

Figure 2

Calibration curves for predicting (A) 5-year and (B) 10-year OS and (C) 5-year and (D) 10-year BCSS in the validation cohort. Nomogram-predicted survival is plotted on the x-axis, and actual survival is plotted on the y-axis. Vertical bars represent 95% CIs measured by Kaplan-Meier analysis. Dashed lines along the 45° line through the origin point represent a perfect calibration model.

Performance of the Nomograms in Stratifying Patients According to Risk Scores

We calculated the total points of the OS nomogram for every patient in the training cohort and determined the cutoff values by dividing the patients evenly into three subgroups based on total score (0 to 78, 79 to 116, and ≥117). Supplementary Table 3 and Supplementary Figure 3 show that the low-risk subgroup had the best prognosis and the high-risk subgroup had the worst survival. Furthermore, in the validation cohort, patients stratified into different risk subgroups based on these cutoff values within each TNM category also exhibited significant differences in survival (Figure 3).

Supplementary Figure 3

Kaplan-Meier curves for overall survival within each TNM stage (A, all patients; B-G, stage I-IIIC) according to risk group stratification in the training cohort. Subgroups with fewer than 20 patients were omitted from the graphs.

Figure 3

Kaplan-Meier curves for overall survival within each TNM stage (A, all patients; B-G, stage I-IIIC) according to risk group stratification in the validation cohort. Subgroups with fewer than 20 patients were omitted from the graphs.

Discussion

Breast cancer in young women has several characteristics that differentiate it from breast cancer in other populations [3]. Although several nomograms have been previously reported to predict prognoses in some specific subtypes of breast cancer, no comprehensive nomogram has been developed for young patients with breast cancer [23], [24], [25]. In this study, we developed and validated nomograms to predict 5- and 10-year OS and BCSS for breast cancer in young women. Because the SEER database represents approximately 28% of the US population, the nomograms we developed are highly generalizable and provide individualized estimates of OS and BCSS that can be used by patients and clinicians in making personalized treatment decisions and designing clinical studies. Although most deaths among young women with breast cancer are attributable to breast cancer, some of these patients die from other cancers or noncancer causes. Non–breast cancer-related death might preclude the possibility of death resulting from breast cancer, and censoring those events might lead to biased results [26], [27]. Therefore, we introduced a competing-risks model in this study. Competing-risks models have been published in recent years for predicting prognoses in thyroid cancer, breast cancer, prostate cancer, and localized renal cell carcinoma [23], [25], [28], [29], [30]. In this study, the 5- and 10-year probabilities of death were 12.7% and 23.1%, respectively. In addition, the 5- and 10-year cumulative incidences of death resulting from breast cancer were 11.6% and 20.5%, respectively, indicating a nearly eight-fold higher risk of death from breast cancer than from other causes at 10 years (20.5% vs 2.6%). Using log-rank tests, Cox proportional-hazards regression analyses, and a competing-risks model, we identified race, histology, tumor grade, tumor size, number of positive lymph nodes, ER status, PR status, and surgery type as independent prognostic factors for both OS and BCSS. 
These findings were highly concordant with the results of previous studies [31], [32], [33], [34], [35]. Previous data have highlighted that young black women have a higher probability of death than young white women [35], [36]. Our study confirmed that, after adjustment for other risk factors identified for breast cancer, young white patients have better OS and BCSS than young black patients. However, there are still some other prognostic markers and molecular profiles that the SEER database did not offer that could be used to predict the survival of breast cancer patients. According to the AJCC 8th edition, HER2 status and multigene panel (such as Oncotype DX) status should also be considered as biological factors that affect the prognosis of breast cancer [37]. Furthermore, a higher proportion of young patients with breast cancer carry a pathogenic BRCA1 or BRCA2 mutation compared with patients with onset of breast cancer at an older age [38], [39]. The cumulative risk of developing breast cancer is relatively high for BRCA1 or BRCA2 carriers [40]. Although it is unclear whether a germline BRCA1 or BRCA2 mutation has independent prognostic implications after an initial cancer diagnosis, genetic factors should be considered when applying the nomograms. In addition, adjuvant therapies including chemotherapy and radiotherapy were not selected as candidate factors because of the lack of complete data on treatment history in the SEER database. Thus, it is difficult to accurately distinguish between the categories “no treatment” and “unknown if patients received treatment.” Another reason for not selecting treatment as a candidate factor is that adjuvant therapies are recommended for patients who have a potentially high risk of disease recurrence or death. Thus, including adjuvant therapies in the nomograms might introduce a certain degree of bias. 
In our study, calibration plots showed optimal agreement between predicted and actual probabilities of 5- and 10-year OS and BCSS, thereby demonstrating the reliability of the established nomograms. The C-indexes of our nomograms for OS and BCSS were significantly higher than those for the TNM staging system in both training and validation cohorts, demonstrating good discrimination power. We also separated patients in both cohorts into groups with distinct survival outcomes by stratifying them into three risk groups using the total prognostic score. We believe that the identification of subgroups of patients at different risks might inform treatment or care options. Nevertheless, several limitations should be considered while interpreting our results. First, we excluded a proportion of patients because of missing data for some important variables such as tumor grade, tumor size, and ER and PR status. This might have resulted in some bias in our models. Second, genetic factors and some prognostic parameters including HER2 status, multigene panel status, Ki-67 positivity, body mass index, and smoking status were not recorded in the SEER database between 1990 and 2010, but these factors might improve the robustness and effectiveness of the nomograms [41], [42], [43], [44], [45]. Third, the long duration of our study period (1990-2010) may affect the results because of changes in therapeutic strategies, including the establishment of breast-conserving surgery and sentinel lymph node biopsy, improvement of chemotherapy, and application of endocrine therapy and targeted therapy. Although information on radiation therapy and chemotherapy could be accessed from the SEER database, according to the SEER program these variables are not recommended for the analysis of survival because of their incompleteness and the biases associated with who receives treatment. Fourth, young age at diagnosis is a risk factor for recurrence [46], [47]. 
However, the SEER database does not provide information on disease recurrence; thus, we were unable to determine an individualized estimate of the risk of recurrence. Fifth, our models are limited by the retrospective nature of data collection, and thus, these nomograms must be further validated in a prospective cohort before being applied for clinical use.

Conclusion

Using a large, population-based cohort, we established and validated novel nomograms for predicting the probability of OS and BCSS in young patients with breast cancer. The nomograms performed well in both the training and validation cohorts. Thus, these nomograms can help clinicians estimate the survival of individual patients more precisely and identify patients at high risk of death who need more individualized and specialized treatment strategies. The following are the supplementary data related to this article.

Supplementary Table 1

Point Assignment of Nomograms for OS and BCSS

Supplementary Table 2

The Harrell’s C-Index for the Nomograms to Predict OS and BCSS

Supplementary Table 3

Risk Group and Estimated Survival in the Training Cohort

Declarations

Ethics Approval and Consent to Participate

Our study was approved by the Shanghai Cancer Center Ethical Committee. Because cancer is a reportable disease in every state in the United States, informed patient consent is not required for the data released by the SEER database.

Consent for Publication

Not applicable.

Availability of Data and Material

The datasets generated and analyzed during the current study are available from Surveillance, Epidemiology, and End Results (SEER) Program (www.seer.cancer.gov) SEER*Stat Database: Incidence-SEER 18 Regs Research Data + Hurricane Katrina Impacted Louisiana Cases, Nov 2016 Sub (1973-2014 varying), National Cancer Institute, DCCPS, Surveillance Research Program, released April 2017, based on the November 2016 submission.

Competing Interests

The authors declare no competing financial interests.

Funding

This study was supported by a grant from the Ministry of Science and Technology of China (MOST2016YFC0900300, National Key R&D Program of China), grants from the National Natural Science Foundation of China (81672601, 81602311), and grants from the Shanghai Committee of Science and Technology Funds (15410724000, 15411953300). The funders had no role in the study design, collection and analysis of the data, decision to publish, or manuscript preparation.

Authors' Contributions

Conception and design: Y. G., P. J., X. H., and Z. M. S. Development of methodology: Y. G., P. J., W. S., X. H., and Z. M. S. Acquisition of data: Y. G. and P. J. Analysis and interpretation of data: Y. G., P. J., X. H., and Z. M. S. Writing, review, and/or revision of manuscript: Y. G., P. J., Y. Z. J., X. H., and Z. M. S. Study supervision: X. H. and Z. M. S. All authors read and approved the final manuscript.

