Literature DB >> 29096247

A Long Noncoding RNA Signature That Predicts Pathological Complete Remission Rate Sensitively in Neoadjuvant Treatment of Breast Cancer.

Gen Wang¹, Xiaosong Chen², Yue Liang², Wei Wang², Kunwei Shen³.

Abstract

BACKGROUND: Mounting evidence suggests that long noncoding RNAs (lncRNAs) are closely related to pathological complete response (pCR) in neoadjuvant treatment of breast cancer. Here, we construct lncRNA associated models to predict pCR rate.
METHODS: LncRNA expression profiles of breast cancer patients treated with neoadjuvant chemotherapy (NAC) were obtained from Gene Expression Omnibus by repurposing existing microarray data. The prediction model was firstly built by analyzing the correlation between pCR and lncRNA expression in the discovery dataset GSE 25066 (n=488). Another three independent datasets, GSE20194 (n=278), GSE20271 (n=178), and GSE22093 (n=97), were integrated as the validation cohort to assess the prediction efficiency.
RESULTS: A novel lncRNA signature (LRS) consisting of 36 lncRNAs was identified. Based on this LRS, patients with NAC treatment were divided into two groups: LRS-high group and LRS-low group, with positive correlation of pCR rate in the discovery dataset. In the validation cohort, univariate and multivariate analyses both demonstrated that high LRS was associated with higher pCR rate. Subgroup analysis confirmed that this model performed well in luminal B [odds ratio (OR)=5.4; 95% confidence interval (CI)=2.7-10.8; P=1.47e-06], HER2-enriched (OR=2.5; 95% CI=1.1-5.7; P=.029), and basal-like (OR=5.5; 95% CI=2.3-16.2; P=5.32e-04) subtypes. Compared with other preexisting prediction models, LRS demonstrated better performance with higher area under the curve. Functional annotation analysis suggested that lncRNAs in this signature were mainly involved in cancer proliferation process.
CONCLUSION: Our findings indicated that our lncRNA signature was sensitive to predict pCR rate in the neoadjuvant treatment of breast cancer, which deserves further evaluation.

Entities: Chemical Disease Gene Species

Year: 2017 PMID： 29096247 PMCID： PMC5671410 DOI： 10.1016/j.tranon.2017.09.005

Source DB: PubMed Journal: Transl Oncol ISSN： 1936-5233 Impact factor: 4.243

Introduction

Breast cancer, one of the most common cancers, is still a fatal malignant tumor that could lead to nearly half a million of deaths in a year [1]. Neoadjuvant chemotherapy (NAC) has emerged as a new and effective option for locally advanced breast cancer. By shrinking tumor size before surgery, NAC is able to improve the feasibility of conventional breast operation and conservative surgery in locally advanced breast cancer patients [2]. In addition, numerous studies have shown that a pathological complete response (pCR) after NAC is significantly correlated with good further clinical outcome [3]. However, other data also demonstrated that only less than 40% to 50% breast cancer patients can achieve pCR after NAC [4]. Therefore, clinicians usually need to make a decision as to whether NAC is necessary or not for breast cancer patients by evaluating the pros and cons of chemotherapy before the operation according to patients' tumor stage, histologic grade, age, estrogen receptor (ER), and human epidermal growth factor receptor 2 (HER2) status as these clinicopathological characteristics are associated with the probability of pCR [5]. Meanwhile, utilizing different mRNA signatures to predict pCR rate when making a decision is also attracting researchers' attention. Over the past decades, breast cancer has been classified into five different “intrinsic subtypes”, including luminal A, luminal B, HER2 enriched, basal-like, and normal-like according to its mRNA expression pattern [6]. To date, much data have exhibited that different subtypes of breast cancer have distinct biological behaviors as well as responses to NAC [7]. Specifically, basal-like tumors generally achieve much higher pCR rate than luminal A [8]. Oncotype DX, Gene70, and Gene Expression Grade Index (GGI) signature are previously established mRNA signatures employed to predict breast cancer patient survival [9], [10], [11]. It has also been shown that high expression level of these mRNA signatures is associated with higher probability of pCR [12]. As a matter of fact, at least 98% of the human genome is transcribed into noncoding RNAs rather than protein-coding mRNAs, implying that these novel moleculars also play a vital role in biological processes and are potential biomarkers to powerfully predict NAC response in breast cancer in addition to mRNA [13]. Long noncoding RNAs (lncRNAs) are generally non–protein-coding transcripts longer than 200 nucleotides in length [14]. These long RNAs function directly at RNA level rather than being translated into functional proteins. Recent studies have revealed that lncRNAs are involved in varieties of biological processes including tumorigenesis, invasion, metastasis, and drug resistance of breast cancer at both transcriptional and posttranscriptional levels [15], [16], [17]. What's more, a growing number of lncRNAs are validated to be associated with patients' outcomes and pCR in NAC, thus providing a new option for model construction to predict pCR of NAC besides mRNA signature [18]. However, there is no lncRNA signature built based on a large number of breast cancer patients treated with NAC up to now. In this study, we aimed at identifying an lncRNA signature to fully investigate the relationship between lncRNA expression pattern and pCR rate in breast cancer patients with NAC treatment. By reannotating previously published Affymetrix HG-U133A array profiles from Gene Expression Omnibus (GEO), we established an lncRNA signature comprising of 36 lncRNAs from NAC-treated patients. Then, the prediction efficiency of this signature was further assessed in a validation cohort with three independent datasets. Our final finding indicated that this signature could be potentially utilized as a novel biomarker to predict pCR rate in addition to traditional clinicopathological markers and mRNA signature, which needs further evaluation.

Materials and Methods

Gene Expression Profiles of NAC-Treated Patients Obtained from GEO

We searched for qualified gene expression datasets in regard to breast cancer with NAC treatment in GEO database, which is publically available. Only datasets that met the following criteria were selected: First, gene expression data were assayed using Affymetrix HG-U133 A platform. Second, pCR in the study was defined as the absence of invasive tumor in both breast and lymph nodes [19]. Third, nearly or more than 100 cases were included in the dataset. Finally, 1041 NAC-treated breast cancer patients in total were collected from GSE25066 [20], GSE20194 [21], GSE20271 [22], and GSE22093 [23] datasets. GSE25066, which consists of 488 samples, was used as the discovery dataset for model construction, while another 553 patients from GSE20194, GSE20271, and GSE22093 were combined and used as the validation dataset. Patients' characteristics of both discovery and validation datasets are listed in Table 1.

Table 1

NAC Patients' Clinicopathological Characteristics Included in the Discovery and Validation Cohort

Characteristics	DiscoveryGSE25066(n = 488)	Validation
Characteristics	DiscoveryGSE25066(n = 488)	All Trials(n = 553)	GSE20194(n = 278)	GSE20271(n = 178)	GSE22093(n = 97)
Age
≤50	268	287	133	100	54
>50	220	264	144	78	42
Unknown	0	2	1	0	1
ER
Negative	200	249	114	80	55
Positive	287	304	164	98	42
Unknown	1	0	0	0	0
HER2
Negative	466	463	219	151	97
Positive	5	85	59	26	0
Unknown	17	1	0	1	0
cT
T0-1	29	42	26	13	3
T2	245	274	147	76	51
T3	140	113	50	37	26
T4	74	121	53	51	17
Unknown	0	3	2	1	0
cN
N0	154	159	79	59	21
N1	231	212	125	71	16
N2	64	79	31	38	10
N3	39	54	42	9	3
Unknown	0	49	1	1	47
Grade
1-2	201	225	117	76	32
3	252	271	152	72	47
Unknown	35	57	9	30	18
pCR
Yes	99	110	56	26	28
No	389	443	222	152	69

cT, clinical tumor stage; cN, clinical nodal status.

NAC Patients' Clinicopathological Characteristics Included in the Discovery and Validation Cohort cT, clinical tumor stage; cN, clinical nodal status.

Interrogate lncRNA Expression by Repurposing Microarray Probes

All raw data were normalized by Robust Multichip Average algorithm [24]. ComBat was utilized to remove the potential batch effects when combining batches of gene expression dataset [25]. LncRNA expression of the NAC-treated patients was obtained by remapping the probes of the array to human genome (GRCh38/hg38) using SeqMap [26] and then matching the probes to the lncRNA chromosomal positions from GENCODE (release 24, GRCh38) [27], [28], [29]. A total of 616 probes corresponding to 463 lncRNAs were obtained in the end. LncRNA expression values of multiple probes that target the same lncRNA were averaged arithmetically. Figure 1A demonstrated the order of the analysis and model construction.

Figure 1

The construction and optimization of the LRS model for pCR prediction. (A) The diagram of the construction and validation of the lncRNA pCR prediction model. (B) The 44 lncRNAs significantly different between non-pCR and pCR cases in the discovery dataset were employed for constructing the predictive DLDA model. The accuracy, sensitivity, specificity, PPVs, NPVs, and F1 score were calculated with each number of lncRNAs. This LRS ranked the highest F1 score when comprising the first 36 lncRNAs.

Diagonal Linear Discriminant Analysis Prediction Model Construction

Of the aforementioned 463 lncRNAs, 44 lncRNAs in the discovery dataset were found to be differentially expressed between the non-pCR and pCR patients by Welch's t test. (P < .001). These 44 lncRNAs were further used to construct the predictive model with Diagonal Linear Discriminant Analysis (DLDA) [30]. According to a previously described method, we employed the forward filtering method to optimize the model [31]. In the end, the DLDA model of 36 lncRNAs ranked the highest F1 score in the discovery dataset and was therefore defined as the lncRNA signature (LRS) for pCR prediction of NAC.

Comparison of the Efficiency of LRS with Other Signatures

Pam50 intrinsic subtypes, Oncotype DX, Gene70, and GGI score were obtained by using “genefu” package in R [32]. The receiver operating characteristic (ROC) curves of LRS and other signatures were plotted and compared by R package ROCR [33]. The area under the curve (AUC) was calculated correspondingly.

Function Annotation of the lncRNA Signature

Integrative analysis of lncRNA-mRNA association was employed to infer the potential function of the lncRNAs included in the LRS. We calculated Pearson correlation between lncRNAs and mRNAs in the discovery cohort to identify mRNAs that positively coexpressed with lncRNAs in LRS (Pearson correlation coefficient > 0.4 and ranked top 0.5%) [29], [34]. Then these mRNAs were further annotated by the Database for Annotation, Visualization, and Integrated Discovery (DAVID) using the functional annotation clustering option (version 6.8) [35], [36]. Clusters with enrichment score higher than 3.0 and functional annotations with P value lower than .001 were considered to be statistically significant. Finally, Cytoscape with the Enrichment Map plugin was used to visualize the significant enrichment results [37].

Statistical Analysis

The microarray datasets downloaded from GEO database were analyzed by R software (version 3.3.1) and Bioconductor. The clinicopathological parameters in the datasets were analyzed using two-sided χ2 test and Fisher's exact test with a P value < .05 as the cutoff. The logistic regression model was employed to perform the univariate and multivariate analyses.

Result

Establishment and Validation of the lncRNA Prediction Model for pCR

By reannotating microarray probes in the discovery dataset, 616 probes that corresponded to 463 lncRNAs were identified in total. Of the 463 lncRNAs we found, 44 lncRNAs differentially expressed between non-pCR (n = 389) and pCR (n = 99) patients with P < .01, which were further used for model construction. Next, through DLDA with model optimization by leaving one out cross-validation and forward filtering method, we finally observed that the DLDA model of 36 lncRNAs ranked the highest F1 score, which divided patients into two groups: chemotherapy-sensitive group (n = 202) with high LRS score and chemotherapy-insensitive group (n = 286) with low LRS score (Figure 1B). The accuracy, sensitivity, specificity, positive predictive values (PPVs), negative predictive values (NPVs), and F1 score were 69.9%, 77.8%, 67.9%, 38.1%, 92.3%, and 51.2%, respectively. The heat map illustrated that these two groups based on LRS score had distinct lncRNA expression patterns (Figure 2A). More importantly, the high–LRS score group was more likely to achieve a higher pCR rate when compared with the other one (Figure 2C, P = 4.733e-16). Multivariate analysis showed that only tumor grade (P = .04922), LRS score (P = .00373), and intrinsic subtype (P = .02463) were independent factors for pCR in the discovery set (Table 2).

Figure 2

Table 2

Univariate and Multivariate Analysis for Parameters Associated with pCR in the Discovery Dataset

Characteristics	n	pCR	Univariate Analysis			Multivariate Analysis
Characteristics	n	pCR	OR	95% CI	P	OR	95% CI	P
Age
≤50	268	60	1
>50	220	39	0.75	0.47-1.17	.204
Unknown
ER
Negative	200	69	1
Positive	287	30	0.22	0.14-0.35	6.20e-10	0.70	0.38-1.28	.25115
Unknown	1
HER2
Negative	466	93	1
Positive	5	2	2.67	0.35-16.36	.285
Unknown	17	4
cT
T0-2	274	58	1
T3-4	214	41	0.88	0.56-1.38	.584
Unknown	0
cN
Negative	154	28	1
Positive	334	71	1.21	0.75-2.00	.433
Unknown
Grade
1-2	201	13	1
3	252	77	6.36	3.53-12.36	5.73e-09	2.07	1.02-4.40	.04922
Unknown	35
lncRNA score
Low score	286	22	1
High score	202	77	7.39	4.47-12.68	4.41e-14	2.74	1.40-5.51	.00373
Intrinsic subtype
Luminal A	124	2	1
Others	364	97	25.78	7.99-157.88	6.76e-06	6.91	1.50-50.6	.02463
Oncotype DX
Low & medium score	80	2	1
High score	408	97	12.16	3.74-74.79	5.73e-04	1.41	0.33-9.76	.67782
Gene70
Low score	82	3	1
High score	406	96	8.15	2.96-33.77	4.63e-04	0.92	0.25-4.49	.91205
GGI
Low score	195	13	1
High score	293	86	5.82	3.25-11.24	2.15e-08	0.94	0.41-2.23	.88345

The lncRNA expression profile and pCR prediction were analyzed in the discovery and validation dataset. NAC patients were classified into LRS-high (predicted pCR) and LRS-low (predicted non-pCR) groups. lncRNA expression profile and pCR distribution are shown in (A) the discovery dataset and (B) the validation dataset. Then, patients' pCR statuses in different LRS groups were further compared in (C) the discovery dataset and (D) the validation dataset. Univariate and Multivariate Analysis for Parameters Associated with pCR in the Discovery Dataset To further evaluate the prediction efficiency of pCR with the LRS, we collected a total of 553 breast cancer patients with NAC treatment by integrating three independent datasets (GSE20194, GSE20271, and GSE22093) as the validation cohort. As expected, the results gained by using validation dataset (Figure 2, B and D) shared similar findings with those of the discovery dataset, with distinct lncRNA expression patterns and higher pCR rate in the high–LRS score group (P = 6.026e-16). The accuracy, sensitivity, specificity, PPV, NPV, and F1 score for validation dataset were 68.2%, 77.3%, 65.9%, 36.0%, 92.1%, and 49.1%, respectively. Multivariate analysis demonstrated that only ER status (P = 3.79e-05), HER2 status (P = .01147), and LRS score (P = 2.97e-05) were independent factors for predicting pCR in the validation dataset (Table 3).

Table 3

Univariate and Multivariate Analysis for Predicting Parameters Associated with pCR in the Validation Dataset

Characteristics	n	pCR	Univariate Analysis			Multivariate Analysis
Characteristics	n	pCR	OR	95% CI	P	OR	95% CI	P
Age
≤50	287	58	1
>50	264	52	0.97	0.64-1.47	.881
Unknown	2	0
ER
Negative	249	83	1			1
Positive	304	27	0.19	0.12-0.31	1.51e-11	0.28	0.15-0.50	3.79e-05
Unknown	0	0
HER2
Negative	463	81	1			1
Positive	85	29	2.00	1.25-3.26	4.33e-3	2.18	1.19-4.00	.01147
Unknown	1	0
cT
T0-2	316	66	1
T3-4	234	44	0.88	0.57-1.34	.546
Unknown	3	0
cN
Negative	159	21	1
Positive	345	66	1.56	0.93-2.71	.101
Unknown	49	23
Grade
1-2	225	18	1			1
3	271	80	4.84	2.86-8.60	1.67e-08	1.58	0.82-3.11	.17662
Unknown	57	12
lncRNA score
Low score	317	25	1			1
High score	236	85	6.57	4.10-10.89	3.6e-14	4.17	2.18-8.36	2.97e-05
Intrinsic Subtype
Luminal A	122	5	1			1
Others	431	105	7.54	3.31-21.73	1.74e-05	0.99	0.28-3.96	.98660
Oncotype DX
Low & medium score	81	3	1			1
High score	472	107	7.62	2.77-31.51	6.87e-04	1.07	0.27-5.42	.92960
Gene70
Low score	101	5	1			1
High score	452	105	5.81	2.54-16.80	1.93e-04	0.86	0.27-3.04	.80190
GGI
Low score	226	21	1			1
High score	327	89	3.65	2.23-6.22	6.74e-07	1.35	0.65-2.89	.43185

Univariate and Multivariate Analysis for Predicting Parameters Associated with pCR in the Validation Dataset

Relationship between Clinicopathological Features and LRS Model

We uncovered that high LRS score was significantly associated with ER negativity (P < 2.2e-16), T3 and T4 tumors’ size (P < .017), lymph node positivity (P < .026), grade 3 tumor (P < 2.2e-16), and non–luminal A subtype (P < .026) in discovery dataset. In the validation set, high–LRS score patients were more likely to be ER negative (P < 2.2e-16), HER2 positive (P = .047), grade 3 (P < 2.2e-16), and non–luminal A subtype (P < 2.2e-16). (Table 4).

Table 4

Association between Clinicopathological Parameters and LRS in the Discovery and Validation Dataset

Characteristics	Discovery Set			Validation Set
Characteristics	LRS Low	LRS High	P	LRS Low	LRS High	P
n	286	202		317	236
Age
≤50	154	114		168	119
>50	132	88	.6356	148	116	.6165
Unknown	0	0		1	1
ER
Negative	43	157		85	164
Positive	242	45	<2.2e-16	232	72	<2.2e-16
Unknown	1	0		0	0
HER2
Negative	275	191		277	190
Positive	1	4	.1651	40	45	.0474
Unknown	10	7		0	1
cT
T0-2	174	100		186	130
T3-4	112	102	.01673	128	106	.3749
Unknown	0	0		3	0
cN
Negative	102	52		104	55
Positive	184	150	.02615	202	143	.1717
Unknown	0	0		10	39
Grade
1-2	174	27		176	49
3	97	155	<2.2e-16	101	170	<2.2e-16
Unknown	15	20		40	17
Intrinsic subtype
Luminal A	135	2		117	5
Others	151	200	<2.2e-16	200	231	<2.2e-16
Oncotype DX
Low & medium score	79	1		77	4
High score	207	201	4.211e-15	240	232	2.647e-13
Gene70
Low score	81	1		96	5
High score	205	201	1.527e-15	221	231	<2.2e-16
GGI
Low score	180	15		188	38
High score	106	187	<2.2e-16	129	198	<2.2e-16
pCR status
pCR	22	77		25	85
Non-pCR	264	125	<2.2e-16	292	151	<2.2e-16

Association between Clinicopathological Parameters and LRS in the Discovery and Validation Dataset

Evaluation of the Prediction of LRS in Different Intrinsic Subtypes

By using PAM50 subtypes classification, 259 cases of luminal A, 298 cases of luminal B, 135 cases of HER2-enriched, 311 cases of basal-like, and 38 cases of normal-like subtype were identified in the whole 1041 NAC-treated patients. The normal-like subtype was excluded in the subgroup analysis because of its small sample size. In luminal A subtype, which is less likely to undergo NAC, LRS failed to predict pCR rate. However, in luminal B, HER2-enriched, and basal-like subtypes, high LRS score was significantly associated with a higher chance to achieve pCR (Figure 3A).

Figure 3

The evaluation of the predictive power of LRS in the subgroup analysis and comparison of ROCs with other different prediction models. As determined by Pam50, luminal A, luminal B, HER2-enriched, and basal-like subtypes of breast cancer were subjected to subgroup analysis to evaluate the predictive power of LRS in all NAC patients (A). Then, ROCs of the LRS, Oncotype DX, Gene70, and GGI were compared in (B) the discovery and (C) validation datasets. The AUC was also calculated for each curve. In detail, for luminal B subtype, LRS-high patients achieved a higher pCR rate compared with LRS-low patients [32.9% vs 8.3%; odds ratio (OR) = 5.4; 95% confidence interval (CI) = 2.7-10.8; P = 1.47e-06]. That is quite similar for HER2-enriched and basal-like subtypes, with 35.1% versus 18%; OR = 2.5; 95% CI = 1.1-5.7; P = .029 and 40.6% versus 11.1%; OR = 5.5; 95% CI = 2.3-16.2; P = 5.32e-04, respectively.

Comparison of LRS Predictive Power with Other Preexisting Signatures

We then compared the predictive capability of LRS with other preexiting signatures by ROC curves (Figure 3, B and C). The AUC value of LRS was approximately 0.8, indicating that it can effectively distinguish pCR patients from non-pCR patients. Compared with other preexisting predictive signatures, our LRS performed better with higher AUC values in both discovery and validation datasets when used to predict pCR response of patients undergoing NAC.

Combination of LRS with Other Different pCR Predictors

In order to inspect the clinical significance of LRS, we then integrated LRS with age, ER status, HER2 status, tumor size, lymph node metastasis status, tumor grade, Gene21 index, Gene70 index, and GGI which divided patients into four groups respectively. Breast cancer patients with high LRS score and younger age (≤50 years old), negative ER status, positive HER2 status, smaller tumor size (≤5 cm), positive lymph node, tumor of grade 3, Gene21 index high, Gene70 index high, and GGI high were significantly more likely to achieve pCR after NAC (Figure 4), which implied that LRS could be potentially utilized to improve pCR prediction in combination with clinicopathological biomarkers and preexisting predictive models in clinical practice.

Figure 4

The results of the analysis combining LRS and other different pCR predictors performed for the whole group of patients. In order to inspect the clinical significance of LRS, we combined LRS with other factors which may be related to pCR. Age (A), ER status (B), HER2 status (C), tumor size (D), lymph node metastasis status (E), tumor grade (F), Gene21 index (G), Gene70 index (H), and GGI (I) which were available in our cohort were integrated, respectively, with LRS and divided patients into four groups. The pCR rates were compared among each of the four groups. Enrichment analysis of the mRNAs positively correlated with lncRNAs in LRS. mRNAs that positively correlated with lncRNAs in LRS were analyzed by DAVID functional annotation tool. And results were organized by functional enrichment map in Cytoscape (A). Each node represents a functional term annotated by the DAVID tool. The size of the node represents the number of genes in the terms. (B) Then, the significant GO terms in the functional annotation were illustrated with barplot according to their P value.

Identification of Biological Processes Related to the lncRNA Signature

We conducted Gene Ontology (GO) enrichment analysis to determine the biological processes associating with LRS [38]. To this end, we firstly employed Pearson correlation to identify the most correlated mRNAs with each of our lncRNAs (correlation effecient > 0.4 and ranked top 0.5%). Then, using functional annotation clustering option from DAVID (https://david.ncifcrf.gov/, version 6.8), we collected annotation clusters with enrichment score > 3.0 and P value less than .001 (supplementary file). Finally, Cytoscape was employed to visualize the significantly enriched clusters based on similar functions (Figure 5, A), and statistically significant GO processes were also plotted by P value (Figure 5, B). Surprisingly, cancer-related functional clusters such as DNA replication, cell cycle, cell-cell adhesion, and microtubule metabolism were clearly identified, indicating that the LRS signature might mainly be related to cancer proliferation process.

Figure 5

Enrichment analysis of the mRNAs positively correlated with lncRNAs in LRS. mRNAs that positively correlated with lncRNAs in LRS were analyzed by DAVID functional annotation tool. And results were organized by functional enrichment map in Cytoscape (A). Each node represents a functional term annotated by the DAVID tool. The size of the node represents the number of genes in the terms. (B) Then, the significant GO terms in the functional annotation were illustrated with barplot according to their P value.

Discussion

During the past decades, lots of effort have been devoted by clinicians to develop tools for the prediction of the response to NAC in order that the candidate patients could undergo the most suitable therapy. Since the response to NAC is related to patients' clinicopathological features/characteristics, a pCR prediction nomogram has been built based on clinical stage, ER status, histologic grade, and number of preoperative chemotherapy cycles [39]. Moreover, some researchers also utilize Oncotype DX and Gene70 score (Mamaprint score), which are the most widely accepted mRNA signatures, for prediction of response to NAC for luminal subtype breast cancer. Until now, a number of models have been utilized to predict breast cancer survival and/or pCR in NAC. In this study, we identified a 36-lncRNA signature to predict pCR with high specificity by using lncRNA remapping approach in 488 breast cancer patients treated with NAC. Furthermore, LRS signature was demonstrated to be an independent factor that highly related to the pCR of NAC-treated patients by univariate and multivariate analysis in both discovery and validation sets. It has also been shown that the combination of LRS with other pCR predictors significantly improves the prediction of pCR in NAC. This LRS was the first lncRNA signature built on a large scale of NAC-treated patients. It could be potentially utilized as a prediction tool with high specificity to assess the possibilities of pCR for patients who undergo NAC, in addition to traditional clinicopathological biomarkers and mRNA signatures. Oncotype DX score and Gene70 score have been originally built to predict patients' prognosis, while GGI has been developed for better assessing histologic grade firstly. Although these mRNA signatures can also predict the pCR rate despite their original goal, they were not specifically developed for patients treated with NAC. Until now, there is still no gene signature specifically designed for pCR prediction to NAC. It is noteworthy that our LRS was the first gene model developed for pCR rate prediction to NAC patients specifically. With this end of LRS, clinicians could select the most suitable candidates for NAC to better realize individualized therapy. Luminal B, HER2-enriched, and basal-like are the three major subtypes of breast cancers that are more likely to go through NAC. In these three subgroups, LRS distinguished non-pCR from pCR patients significantly, although it did not work well for HER2-enriched subgroup which might be mainly due to its distinct biological behaviors. More importantly, it should be noted that all patients in our datasets treated with NAC regiments did not receive trastuzumab. HER2, which drives breast cancer progression, invasion, and metastasis, was reported to have an unneglectable influence on NAC in varieties of ways [40], [41], [42]. Without the treatment of trastuzumab, lower pCR rate might be induced by HER2, which led to the relatively inaccuracy of our signature in the HER2-enriched subtype. Meanwhile, we also included luminal A subtype in the subgroup analysis, which usually does not undergo NAC. LRS was not able to predict NAC responses effectively in luminal A subtype, which is also a common issue for many other predictive models with unknown reasons [43]. In both discovery and validation datasets, patients with high LRS score were more likely to be ER negative and grade 3. Previous studies suggested that ER-negative breast cancer especially triple-negative breast cancer is generally more aggressive or grows faster than ER-positive [44]. On the other hand, grade 3 tumors typically also have a higher proliferation rate than lower grades [45]. What's more, gene enrichment analysis demonstrated that these 36 lncRNAs were mainly associated with tumor proliferation. Therefore, we hypothesize that patients with high LRS score were more sensitive to NAC because of the high proliferation rate of their tumors. Chemotherapeutic agents are usually cytotoxic by means of interfering with cell division [46]. Cancer cells are more susceptible to these agents because they have a higher proliferation rate than normal cells. In other words, high cell proliferation rate leads to a high sensitivity of chemotherapy [47]. Patients with high LRS score were more likely to be with high proliferation tumors, which caused their being more sensitive to NAC. It must be acknowledged that there are some limitations in this study. First, lncRNAs in our signature came from the reannotation of probes in microarray platform. LncRNAs that could not be identified by repurposing microarray data could be omitted and thus might affect the sensitivity and specificity of the analysis results. Second, only a handful of lncRNAs identified in this study have been functionally characterized before. Experimental studies on these lncRNAs are desperately needed to provide important information to understand their functional roles. What's more, further validation of the signature in clinical trials will be a better option to finally turn it into clinical practice. In conclusion, our study suggests an important role of the lncRNA signature in predicting pCR rate of NAC in addition to traditional clinicopathological markers and mRNA signature, which might deserve further evaluation. What's more, these findings also imply a therapeutic strategy through targeting these lncRNAs to improve patients' clinical outcomes in the future. The following are the supplementary data related to this article.

Competing Interests

The authors declare that they have no competing interests. All authors read and approved the final manuscript.

46 in total

1. Neoadjuvant therapy with paclitaxel followed by 5-fluorouracil, epirubicin, and cyclophosphamide chemotherapy and concurrent trastuzumab in human epidermal growth factor receptor 2-positive operable breast cancer: an update of the initial randomized study population and data of additional patients treated with the same regimen.

Authors: Aman U Buzdar; Vicente Valero; Nuhad K Ibrahim; Deborah Francis; Kristine R Broglio; Richard L Theriault; Lajos Pusztai; Marjorie C Green; Sonja E Singletary; Kelly K Hunt; Aysegul A Sahin; Francisco Esteva; William F Symmans; Michael S Ewer; Thomas A Buchholz; Gabriel N Hortobagyi
Journal: Clin Cancer Res Date: 2007-01-01 Impact factor: 12.531

2. Gene modules and response to neoadjuvant chemotherapy in breast cancer subtypes: a pooled analysis.

Authors: Michail Ignatiadis; Sandeep K Singhal; Christine Desmedt; Benjamin Haibe-Kains; Carmen Criscitiello; Fabrice Andre; Sherene Loi; Martine Piccart; Stefan Michiels; Christos Sotiriou
Journal: J Clin Oncol Date: 2012-04-16 Impact factor: 44.544

Review 3. Neoadjuvant treatments for triple-negative breast cancer (TNBC).

Authors: G von Minckwitz; M Martin
Journal: Ann Oncol Date: 2012-08 Impact factor: 32.976

4. Enrichment map: a network-based method for gene-set enrichment visualization and interpretation.

Authors: Daniele Merico; Ruth Isserlin; Oliver Stueker; Andrew Emili; Gary D Bader
Journal: PLoS One Date: 2010-11-15 Impact factor: 3.240

Review 5. Long noncoding RNAs and tumorigenesis: genetic associations, molecular mechanisms, and therapeutic strategies.

Authors: Fan Zhang; Liang Zhang; Caiguo Zhang
Journal: Tumour Biol Date: 2015-11-19

6. Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network.

Authors: Qi Liao; Changning Liu; Xiongying Yuan; Shuli Kang; Ruoyu Miao; Hui Xiao; Guoguang Zhao; Haitao Luo; Dechao Bu; Haitao Zhao; Geir Skogerbø; Zhongdao Wu; Yi Zhao
Journal: Nucleic Acids Res Date: 2011-01-18 Impact factor: 16.971

7. Response and survival of breast cancer intrinsic subtypes following multi-agent neoadjuvant chemotherapy.

Authors: Aleix Prat; Cheng Fan; Aranzazu Fernández; Katherine A Hoadley; Rossella Martinello; Maria Vidal; Margarita Viladot; Estela Pineda; Ana Arance; Montserrat Muñoz; Laia Paré; Maggie C U Cheang; Barbara Adamo; Charles M Perou
Journal: BMC Med Date: 2015-12-18 Impact factor: 8.775

8. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists.

Authors: Da Wei Huang; Brad T Sherman; Richard A Lempicki
Journal: Nucleic Acids Res Date: 2008-11-25 Impact factor: 16.971

9. Comparison of prognostic gene expression signatures for breast cancer.

Authors: Benjamin Haibe-Kains; Christine Desmedt; Fanny Piette; Marc Buyse; Fatima Cardoso; Laura Van't Veer; Martine Piccart; Gianluca Bontempi; Christos Sotiriou
Journal: BMC Genomics Date: 2008-08-21 Impact factor: 3.969

10. Integrative genomic analyses reveal clinically relevant long noncoding RNAs in human cancer.

Authors: Zhou Du; Teng Fei; Roel G W Verhaak; Zhen Su; Yong Zhang; Myles Brown; Yiwen Chen; X Shirley Liu
Journal: Nat Struct Mol Biol Date: 2013-06-02 Impact factor: 15.369

6 in total

1. Long Noncoding RNA Signature and Disease Outcome in Estrogen Receptor-Positive Breast Cancer Patients Treated with Tamoxifen.

Authors: Gen Wang; Xiaosong Chen; Yue Liang; Wei Wang; Yan Fang; Kunwei Shen
Journal: J Breast Cancer Date: 2018-09-12 Impact factor: 3.588

2. LncRNA Profile Study Reveals a Three-LncRNA Signature Associated With the Pathological Complete Response Following Neoadjuvant Chemotherapy in Breast Cancer.

Authors: Ying Zeng; Guo Wang; Cheng-Fang Zhou; Hai-Bo Zhang; Hong Sun; Wei Zhang; Hong-Hao Zhou; Rong Liu; Yuan-Shan Zhu
Journal: Front Pharmacol Date: 2019-05-28 Impact factor: 5.810

3. Single-cell long noncoding RNA (lncRNA) transcriptome implicates MALAT1 in triple-negative breast cancer (TNBC) resistance to neoadjuvant chemotherapy.

Authors: Hibah Shaath; Radhakrishnan Vishnubalaji; Ramesh Elango; Shahryar Khattak; Nehad M Alajez
Journal: Cell Death Discov Date: 2021-01-25

4. Machine learning models based on immunological genes to predict the response to neoadjuvant therapy in breast cancer patients.

Authors: Jian Chen; Li Hao; Xiaojun Qian; Lin Lin; Yueyin Pan; Xinghua Han
Journal: Front Immunol Date: 2022-07-22 Impact factor: 8.786

5. Dissecting the Role of N6-Methylandenosine-Related Long Non-coding RNAs Signature in Prognosis and Immune Microenvironment of Breast Cancer.

Authors: Jinguo Zhang; Benjie Shan; Lin Lin; Jie Dong; Qingqing Sun; Qiong Zhou; Jian Chen; Xinghua Han
Journal: Front Cell Dev Biol Date: 2021-10-06

6. LncRNA BCAR4 expression predicts the clinical response to neoadjuvant chemotherapy in patients with locally advanced breast cancer.

Authors: Feng-Jiao Gan; Yi Li; Meng-Xi Xu; Tie Zhou; Shun Wu; Kang Hu; Yan Li; Su-Hong Sun; Qing Luo
Journal: Cancer Biomark Date: 2021 Impact factor: 4.388

6 in total