Literature DB >> 31432096

Identification of an eight-lncRNA prognostic model for breast cancer using WGCNA network analysis and a Cox‑proportional hazards model based on L1-penalized estimation.

Zhenbin Liu¹, Menghu Li¹, Qi Hua¹, Yanfang Li¹, Gang Wang¹.

Abstract

An ever‑increasing number of long noncoding (lnc)RNAs has been identified in breast cancer. The present study aimed to establish an lncRNA signature for predicting survival in breast cancer. RNA expression profiling was performed using microarray gene expression data from the National Center for Biotechnology Information Gene Expression Omnibus, followed by the identification of breast cancer‑related preserved modules using weighted gene co‑expression network (WGCNA) network analysis. From the lncRNAs identified in these preserved modules, prognostic lncRNAs were selected using univariate Cox regression analysis in combination with the L1‑penalized (LASSO) Cox‑proportional Hazards (Cox‑PH) model. A risk score based on these prognostic lncRNAs was calculated and used for risk stratification. Differentially expressed RNAs (DERs) in breast cancer were identified using MetaDE. Gene Set Enrichment Analysis pathway enrichment analysis was conducted for these prognostic lncRNAs and the DERs related to the lncRNAs in the preserved modules. A total of five preserved modules comprising 73 lncRNAs were mined. An eight‑lncRNA signature (IGHA1, IGHGP, IGKV2‑28, IGLL3P, IGLV3‑10, AZGP1P1, LINC00472 and SLC16A6P1) was identified using the LASSO Cox‑PH model. Risk score based on these eight lncRNAs could classify breast cancer patients into two groups with significantly different survival times. The eight‑lncRNA signature was validated using three independent cohorts. These prognostic lncRNAs were significantly associated with the cell adhesion molecules pathway, JAK‑signal transducer and activator of transcription 5A pathway, and erbb pathway and are potentially involved in regulating angiotensin II receptor type 1, neuropeptide Y receptor Y1, KISS1 receptor, and C‑C motif chemokine ligand 5. The developed eight‑lncRNA signature may have clinical implications for predicting prognosis in breast cancer. Overall, this study provided possible molecular targets for the development of novel therapies against breast cancer.

Entities: Chemical Disease Gene Species

Mesh：

Substances：

Year: 2019 PMID： 31432096 PMCID： PMC6713414 DOI： 10.3892/ijmm.2019.4303

Source DB: PubMed Journal: Int J Mol Med ISSN： 1107-3756 Impact factor: 4.101

Introduction

Breast cancer is the most common cancer in females worldwide and affects ~12% of women (1). Treatments for cancer usually include surgery, chemotherapy and radiation therapy (2). In addition, hormone-blocking therapy is preferred by hormone receptor-positive cancers (2). The prognosis of breast cancer varies according to a number of factors, such as stage of cancer, grade of cancer and age (3). Better prognosis prediction can potentially improve patient survival and guide tailored therapy of patients. Therefore, identifying prognostic biomarkers has become an urgent and highly active area of research. Long non-coding RNAs (lncRNAs) are non-coding transcripts that are longer than 200 bp. Although previously believed to be junk DNA, studies have increasingly recognized the critical roles of lncRNAs in regulating cell machinery, cell cycle, differentiation and apoptosis (4,5). Emerging evidence have proved that lncRNAs are involved in the regulation of diverse genes and pathways involved in the development of breast cancer and endocrine resistance (6,7). LncRNA HOX transcript antisense intergenic RNA (HOTAIR) is not only an independent biomarker for metastasis in estrogen receptor (ER)-positive breast cancer patients (8), but is also known to strengthen ER signaling and facilitate tamoxifen resistance (9). Recently, Li et al (10) reported that lncRNA Angelman syndrome chromosome region represses the invasion and metastatic capability of breast cancer cells by modulating the degradation of Enhancer of Zeste Homolog 2. Furthermore, Tracy et al (11) identified a list of lncRNAs that participate in breast cancer progression based on transcriptome-wide sequencing of breast cancer cell lines. Although a few lncRNAs have been implicated in the biology of breast cancer, the prognostic potential of lncRNAs in breast cancer has not been fully elucidated. Meng et al (12) developed a four-lncRNA signature that is predictive for breast cancer prognosis. However, their results were only based on lncRNA expression profiling in 887 breast cancer patients from Gene Expression Omnibus (GEO). Likewise, Sun et al (13) identified an eight-lncRNA signature which could serve as an independent biomarker for prediction of overall survival of breast cancer using case and control datasets only downloaded from The Cancer Genome Atlas (TCGA) database. In the present study, an integrated analysis was performed using publicly available microarray expression profiles of breast cancer patients from the GEO, TCGA and Breast Cancer Molecular Taxonomy of Breast Cancer International Consortium (BRCA METABRIC) repositories to identify prognostic lncRNAs through weighted gene co-expression network (WGCNA) network analysis, univariate Cox regression analysis, and Cox-proportional Hazards (PH) model based on L1-penalized (LASSO) estimation. Moreover, the biological significance of these lncRNAs in breast cancer was explored by constructing lncRNA-mRNA networks and performing pathway enrichment analysis.

Materials and methods

Data sources

The microarray expression profiles of at least 100 human breast cancer tissue samples were searched in the National Center for Biotechnology Information GEO (http://www.ncbi.nlm.nih.gov/geo/) based on the Affymetrix-GPL570 platform. The search retrieved the datasets GSE21653 (n=266), GSE76124 (n=198), GSE5460 (n=129) and GSE58812 (n=107). The four datasets were used for WGCNA network analysis. GSE21653 was also used as the training set for the survival analysis of this study. The GSE20685 (Affymetrix-GPL570 platform) dataset comprising 327 breast cancer samples was downloaded from the GEO database. RNA-seq expression data of 1,063 breast cancer samples were obtained from the TCGA database (https://portal.gdc.cancer.gov/projects/TCGA-BRCA) and RNA-seq expression data of 1,904 breast cancer samples were acquired from BRCA METABRIC (Illumina High-seq 2000 platform). The GSE20685, TCGA and BRCA METABRIC datasets were used as the validation sets. The clinical characteristics of patients in the four datasets are shown in Table I.

Table I

Summary of clinical characteristics in GSE21653, GSE20685, TCGA and BRCA METABRIC datasets.

Clinical characteristics	GSE21653 (N=266)	GSE20685 (N=327)	TCGA (N=1,063)	BRCA-METABRIC (N=1,904)
Age (years)	54.48±14.06	47.89±10.69	58.24±13.19	61.09±12.98
Molecular subtype
Basal	75	-	141	199
ERBB2	24	-	64	220
LuminalA	89	-	413	679
LuminalB	49	-	190	461
-	29	-	255	345
Histology
IDC	213	-	771	1,727
ILC	22	-	190	141
TUB	6	-	0	0
Others	21	-	100	36
-	4	-	2	0
pN
Stage 0	120	-	-	-
Stage 1	140	-	-	-
-	6	-	-	-
pT
Stage 1	59	-	-	-
Stage 2	126	-	-	-
Stage 3	68	-	-	-
-	13	-	-	-
SBR grade
Grade 1	45	-	-	-
Grade 2	89	-	-	-
Grade 3	125	-	-	-
-	7	-	-	-
ER status
Positive	150	-	784	1,445
Negative	113	-	236	429
-	3	-	43	30
erbB2 status
Positive	29	-	-	188
Negative	216	-	-	1,512
-	21	-	-	204
Ki67 status
Positive	144	-	-	-
Negative	58	-	-	-
-	64	-	-	-
p53 status
Positive	69	-	-
Negative	125	-	-
-	72	-	-	-
PR status
Positive	136	-	682	-
Negative	127	-	337	-
-	3	-	44	-
Death
Dead	83	83	139	1,103
Alive	169	244	930	801
-	14	0	0	0
Overall survival time (months)	60.03±41.38	94.71±38.45	36.27±34.78	125.03±76.33

Age and over survival time were expressed as the mean ± standard deviation; ERBB2, epidermal growth factor receptor 2; IDC, invasive ductal carcinoma; ILC, invasive lobular carcinoma; TUB, tubular carcinoma; pN, category: regional lymph nodes; pT, pathologic tumor size; SBR, Scarff-Bloom-Richardson; ER, estrogen-receptor; PR, progesterone receptor; -, information unavailable; TCGA, The Cancer Genome Atlas; BRCA METABRIC, Breast Cancer Molecular Taxonomy of Breast Cancer International Consortium.

In addition, microarray expression data was retrieved from NCBI GEO based on the following criteria: Both human breast cancer tissue samples and paired normal tissues samples were included; the total number of samples was >100; the platform used was Affymetrix-GPL570 platform. Three datasets, including GSE65194 (14) (153 breast cancer samples and 11 normal samples), GSE29044 (15) (73 breast cancer samples and 36 normal samples), and GSE42568 (16) (104 breast cancer samples and 17 normal samples), were retrieved for the identification of consensus differentially expressed RNAs (DERs) between breast cancer and normal samples using MetaDE analysis.

Data preprocessing

Raw CEL profiles of the datasets generated using the Affymetrix-GPL570 platform were subjected to median normalization, background normalization and quantile normalization using the oligo (17) package (version1.41.1, http://www.bioconductor.org/packages/release/bioc/html/oligo.html) in R (version 3.4.1). Fragments per kilobase of exon per million reads mapped (FPKM) expression values of the datasets downloaded from the BRCA METABRI and TCGA repositories were subjected to quantile normalization using the preprocessCore package (18) (version1.40.0, http://bioconductor.org/packages/release/bioc/html/preprocessCore.html) in R (version 3.4.1). For all datasets used in the study, the probe sets with RefSeq IDs were identified according to Affymetrix-GPL570 annotation files. From all probe sets with RefSeq transcript IDs, the probe sets were selected that were annotated as non-coding RNAs in the Refseq database (19). Moreover, the sequencing reads provided by Affymetrix-GPL570 were mapped to the GRCh38 human genome assembly using Clustal 2 (20) (http://www.clustal.org/clustal2/). The resulting lncRNAs combined with the annotated lncRNAs in the Refseq database were used for subsequent analysis.

WGCNA network analysis

Using GSE21653 as the training set, GSE76124, GSE5460 and GSE58812 as validation sets, a WGCNA network was constructed as previously described (21,22) using the WGCNA package (23) (version 1.61, https://cran.r-project.org/web/packages/WGCNA/index.html) to identify breast cancer-related modules. Briefly, the four datasets were compared and analyzed by correlation analysis. The soft threshold power of β was calculated using scale free topology criterion, followed by generation of the weighted adjacency matrix. Modules with >80 RNAs at the minimum cut height of 0.99 were selected using the dynamic tree cut method. Among these selected modules, the preserved modules were then determined using the WGCNA package. In addition, functional annotation analysis of the preserved modules was performed using the userListEnchment function of the WGCNA package. In addition, the correlations of these modules with clinical factors of patients in GSE21653 were investigated using the WGCNA package.

Survival analysis

For survival analysis, GSE21653 was used as a training set, whereas GSE20685, the TCGA and BRCA METABRIC datasets were used as the test sets. Based on the survival information time in GSE21653, univariate Cox regression analysis was performed to analyze the associations of the lncRNAs included in the preserved WGCNA modules with prognosis with the survival package in R. The lncRNAs with log-rank P<0.05 were identified as prognosis-related lncRNAs. Based on these prognosis-related lncRNAs, a Cox-PH model was applied based on LASSO estimation to select the optimal panel of prognostic lncRNAs as previously described (24,25). The optimal lambda was determined after running 1,000 stimulations through cross-validation likelihood. The risk score is the logarithm of hazard ratio from the fitted Cox-PH model to dichotomize the samples (24,25). Using the Cox-PH coefficients and the optimal group of these prognostic lncRNAs, a risk scoring model was generated for prognosis prediction as follows: Risk-score = Σ (βlncRNAn × exprlncRNAn). βlncRNAn denotes Cox-PH coefficient of lncRNAn while exprlncRNA denotes the lncRNAn expression levels. Using this risk scoring model, the risk score was calculated for each sample in GSE21653 (training set). All patients in this set were divided into the high and low risk groups based on the median risk score. Moreover, the robustness of this prognostic model was evaluated in the GSE20685, TCGA and BRCA METABRIC datasets (validation sets). For this purpose, all samples were dichotomized in each set into two different risk groups with the median risk score as cutoff. The two risk groups for survival were compared using a Kaplan-Meier curve with Wilcoxon log rank test.

Selection of consensus DERs

As mentioned above, the GSE22866, GSE50161 and GSE4290 comprised both breast cancer and normal tissue samples. Consensus DERs between breast cancer and normal samples across the three sets were screened using the MetaDE package (26) (https://cran.r-project.org/web/packages/MetaDE/). The strict threshold was tau2=0, Q pval>0.05, P<0.05 and FDR<0.05.

Pathway enrichment analysis

The DERs associated with the prognostic lncRNAs were focused on in the preserved modules identified from the WGCNA network. lncRNA-mRNA networks were constructed using these DERs and the prognostic lncRNAs. To explore the potential biological roles of these prognostic lncRNAs selected by the LASSO Cox-PH model in breast cancer, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis was performed for these lncRNA-mRNA networks using gene set enrichment analysis (27) (http://software.broadinstitute.org/gsea/index.jsp) software. Nominal P<0.05 was chosen as the cutoff value.

Results

WGCNA network module mining

The GSE21653, GSE76124, GSE5460 and GSE58812 datasets were used for WGCNA network analysis. After data preprocessing, 15,998 mRNAs were identified to be shared among the GSE21653, GSE76124, GSE5460 and GSE58812 datasets, out of which 851 were lncRNAs. Correlation analysis showed good correlation among all pairs of the four datasets based on the expression levels of the shared RNAs (correlation coefficients >0.9, P<1×10-200; Fig. 1).

Figure 1

Correlation analysis between the GSE21653, GSE76124, GSE5460, and GSE58812 datasets.

A WGCNA network was constructed using GSE21653 (training set). With WGCNA applying scale free topology criterion, the soft threshold power of β was 5 when scale-free topology model fit R2 was maximized (0.9) and the mean connectivity for the network was 5. A total of seven modules were identified (module size ≥80 and cut height ≥0.99) in the network (Blue, Brown, Green, Turquoise, Yellow, Red, and Grey; Fig. 2A). In addition, module mining was separately conducted for GSE76124, GSE5460 and GSE58812 (validation sets; Fig. 2B-D). For the three sets, genes were processed in the same manner as for GSE21653. A multi-dimensional scaling plot was generated to analyze the expression of genes in the seven modules of GSE21653. Results revealed that the genes within the same module cluster together (Fig. 3A). Hierarchical clustering analysis of the seven modules was independently performed for each of the four datasets. The modules on the same branches showed similar gene expression patterns (Fig. 3B).

Figure 2

Identification of weighted gene co-expression network modules in the datasets. (A) GSE21653, (B) GSE58812, (C) GSE76124 and (D) GSE5460. The RNAs are organized into various modules that are marked with different colors. Blue, Cell adhesion; Brown, Immune response; Green, Cell cycle phase; Grey, Epidermis development; Red, Vesicle-mediated transport; Yellow, Response to estrogen stimulus; Turquoise, Ectoderm development.

Figure 3

Analysis of WGCNA modules. (A) A multi-dimensional scaling plot showing the expression of genes in the seven modules of GSE21653. Dimension 1 and 2, respectively represents the first and second principal component. (B) Hierarchical clustering analysis of WGCNA modules for GSE21653, GSE58812, GSE76124, and GSE5460. Blue, Cell adhesion; Brown, Immune response; Green, Cell cycle phase; Grey, Epidermis development; Red, Vesicle-mediated transport; Yellow, Response to estrogen stimulus; Turquoise, Ectoderm development. WGCNA, weighted gene co-expression network.

According to the results of module preservation analysis, the Blue, Brown, Green, Turquoise and Yellow modules were highly preserved (preservation Z-score >10, Table II), thereby indicating that the five modules are breast cancer-related modules. According to results of module annotation, each of the five modules were associated with cell adhesion, immune response, cell cycle phase, response to estrogen stimulus and ectoderm development (Table II).

Table II

Analysis of weighted gene co-expression network modules.

Module	Color	Module size	Number of mRNAs	Number of lncRNAs	Preservation Z-score	Module annotation
MEblue	Blue	423	417	6	21.8833	Cell adhesion
MEbrown	Brown	413	374	39	39.0982	Immune response
MEgreen	Green	209	205	4	20.3943	Cell cycle phase
MEgrey	Grey	1,215	1,171	44	2.6106	Epidermis development
MEred	Red	82	80	2	0.8947	Vesicle-mediated transport
MEturquoise	Turquoise	623	601	22	15.1238	Response to estrogen stimulus
MEyellow	Yellow	242	240	2	10.1586	Ectoderm development

Module size, number of mRNAs and lncRNAs; preservation Z-score >10, modules with highly preservation. Lnc, long noncoding.

Identification of a prognostic signature of lncRNAs and development of prognostic scoring model

The five preserved modules included 73 lncRNAs. Of these, 39 lncRNAs were found to be related to prognosis in GSE21653 (training set) based on univariate cox regression analysis (P<0.05; Table III). With expression of the 39 prognosis-related lncRNAs as input, Cox-PH model based on LASSO penalization identified eight prognostic lncRNAs, including IGHA1, IGHGP, IGKV2-28, IGLL3P, IGLV3-10, AZGP1P1, LINC00472 and SLC16A6P1 (Table IV). Based on cox-PH coefficients and expression values of the eight lncRNAs, the risk score for each patient was calculated as follows: Risk-score = (−0.7451) × ExpIGHA1 + (−0.2334) × ExpIGHGP + (−0.0358) × ExpIGKV2-28 + (0.1031) × ExpIGLL3P + (−0.9832) × ExpIGLV3-10 + (−0.3707) × ExpAZGP1P1 + −1.6801) × ExpLINC00472 + (−1.6795) × Exp SLC16A6P1.

Table III

Prognosis-related lncRNAs identified by univariate Cox regression analysis.

LncRNAs	Module-color	P-value
SNHG14	Blue	0.0047
IGHA1	Brown	0.0003
IGLL3P	Brown	0.0005
IGKV1OR2-108	Brown	0.0010
IGLV3-10	Brown	0.0010
IGKV1-39	Brown	0.0013
IGKV3-20	Brown	0.0015
IGKV1-37	Brown	0.0019
IGKV2-28	Brown	0.0019
IGKV4-1	Brown	0.0021
IGHGP	Brown	0.0023
TRBC2	Brown	0.0029
IGHV3-72	Brown	0.0032
TRDC	Brown	0.0037
IGKV1OR2-2	Brown	0.0038
IGKV1-13	Brown	0.0042
IGHV3-20	Brown	0.0048
IGHV3-7	Brown	0.0048
HCP5	Brown	0.0062
IGKV1-17	Brown	0.0074
IGHV3-23	Brown	0.0078
IGHV4-61	Brown	0.0080
GBP1P1	Brown	0.0083
BNIP3P1	Green	0.0004
CKS1BP2	Green	0.0022
LINC00472	Turquoise	0.0005
SLC16A6P1	Turquoise	0.0010
AZGP1P1	Turquoise	0.0062
GSTT2	Turquoise	0.0098
RNU1-123P	Turquoise	0.0130
RNU4-46P	Turquoise	0.0150
RNU6-564P	Turquoise	0.0170
RASA4CP	Turquoise	0.0200
RN7SL494P	Turquoise	0.0260
CYP2B7P	Turquoise	0.0270
CYP21A1P	Turquoise	0.0280
GOLGA2P5	Turquoise	0.0300
FABP5P2	Yellow	0.0140
RNU6-146P	Yellow	0.0170

Prognosis-related lncRNAs were identified by univariate Cox and LncRNAs with P<0.05 were retained. Lnc, long noncoding.

Table IV

Information of optimal panel of prognostic lncRNAs.

LncRNAs	Coef	Hazard ratio	P-value
IGHA1	−0.7451	0.8667	0.0030
IGHGP	−0.2334	0.8775	0.0220
IGKV2-28	−0.0358	0.8779	0.0183
IGLL3P	−0.1031	0.7999	0.0051
IGLV3-10	−0.9832	0.8309	0.0095
AZGP1P1	−0.3706	0.8855	0.0462
LINC00472	−1.6801	0.6592	0.0044
SLC16A6P1	−1.6795	0.8025	0.0101

Optimal panel of prognostic lncRNAs were screened using Cox-proportional Hazards model based on L1-penalized estimation. Coef, Cox-PH coefficient; HR, Hazard Ratio; Lnc, long noncoding.

Based on the median risk score, all patients in GSE21653 were classified into a high risk group (n=126; >median risk score) and a low-risk group (n=126; BRCA METABRIC datasets). Similarly, risk scores were determined for each set. Samples with the risk scores larger than the median risk score were classified under the high-risk group, while samples with risk score smaller than median risk score were classified under the low risk group. As shown in Fig. 4B-D, all patients in each dataset were split into two risk groups with significantly different survival times (GSE20685, logRank P=7.97×10−05; TCGA, logRank P=2.07×10−04; BRCA METABRIC, logRank P=1.87×10−08). These observations highlighted the prognostic value of the eight lncRNAs in breast cancer.

Figure 4

Kaplan-Meier survival curves for low-risk and high-risk groups in the datasets. (A) GSE21653 set, (B) GSE20685 set, (C) TCGA set and (D) BRCA METABRIC set. Black and red curves are for low risk and high risk groups, respectively. TCGA, The Cancer Genome Atlas; BRCA METABRIC, Breast Cancer Molecular Taxonomy of Breast Cancer International Consortium.

Construction of lncRNA-mRNA networks and pathway enrichment analysis

All eight prognostic lncRNAs were included in the Brown and Turquoise modules. As mentioned in Methods section, consensus DERs between breast cancer and normal samples were screened across GSE22866, GSE50161, and GSE4290. Consequently, 1,372 consensus DERs (tau2=0, Qpval>0.05, P <0.05 and FDR<0.05) were obtained, including 55 lncRNAs and 1,317 coding RNAs. As shown in Fig. 5, lncRNA-mRNA networks were constructed. These prognostic lncRNAs and the DERs related to these prognostic DERs are shown in the Brown and Turquoise modules.

Figure 5

LncRNA-mRNA networks. RNAs from Turquoise or Brown modules are colored in turquoise or brown, respectively. Square nodes represent lncRNAs. Regular or inverted triangles indicate upregulated or downregulated consensus DERs identified by the MetaDE method, respectively. Brown, Immune response; Turquoise, Ectoderm development. Round nodes represent non-consensus DERs. Black or green link denotes positive or negative correlation between two nodes.

KEGG pathway enrichment analysis was performed using the generated lncRNA-mRNA networks. Enrichment analysis revealed that genes in modules related to AZGP1P1, IGLL3P, IGHA1, IGLV3-10, IGHGP, LINC00472, IGKV2-28 and SLC16A6P1 were associated with several pathways, such as cell adhesion molecules (CAMS) pathway, T cell receptor pathway, JAK-signal transducer and activator of transcription pathway, and erbb pathway (Fig. 6). Moreover, a number of genes enriched in these pathways, such as angiotensin II receptor type (AGTR)1, neuropeptide Y receptor Y1 (NPY1R), KISS1 receptor (KISS1R) and C-C motif chemokine ligand (CCL) 5 were identified.

Figure 6

Results of pathway enrichment analysis on the lncRNA-mRNA networks. Horizontal axis indicates the genes, and vertical axis indicates the pathways. A red diamond indicates that a gene is significantly enriched in a pathway. Deeper red symbols indicate stronger correlations. Blank diamond indicates that a gene is not significantly enriched in a pathway.

Discussion

Although lncRNAs are well recognized as playing important roles in the biology of tumorigenesis (28), additional studies focusing on the involvement of lncRNAs in breast cancer should be conducted. Based on the analysis of 1,222 breast cancer cases and control datasets downloaded from the TCGA database, Sun et al (13) reported that an eight-lncRNA signature consisted of AC007731.1, AL513123.1, C10orf126, WT1-AS, ADAMTS9-AS1, SRGAP3-AS2, TLR8-AS1, and HOTAIR was an independent prognostic factor associated with overall survival by WGCNA analysis and multivariate Cox hazard model. By mining the microarray gene expression data in the GEO, TCGA and BRCA METABRIC repositories, an eight-lncRNA signature (IGHA1, IGHGP, IGKV2-28, IGLL3P, IGLV3-10, AZGP1P1, LINC00472, SLC16A6P1) was determined for prognosis prediction using WGCNA network analysis, univariate Cox regression analysis, and a LASSO PH model. The eight-lncRNA signature found in the present study is different from the study of Sun et al (13). Moreover, risk scores were calculated based on the eight-lncRNA signature, which could dichotomize patients into two risk groups with different survival times. The predictive capability of this eight-lncRNA signature was successfully confirmed in three independent datasets. The above findings suggested that the eight lncRNAs identified could serve as prognostic biomarkers for breast cancer. To the best of our knowledge, all eight prognostic lncRNAs, except for LINC00472, have not been reported in breast cancer to date. Shen et al (29) provided evidence that LINC00472 plays a tumor suppressive role in breast cancer and could thus serve as a prognostic biomarker. Similarly, Shen et al (30) revealed that LINC00472 is significantly linked to disease-free survival in patients with grade 2 breast cancer. In addition, Lu et al (31) showed that the inhibition of LINC00472 in breast cancer progression is mediated by miR-141 and mRNA programmed cell death 4. Another highlight of the present study was that the lncRNA-mRNA networks were constructed with these prognostic lncRNAs and the DERs associated with these prognostic lncRNAs in breast cancer. Pathway enrichment analysis was performed for these lncRNAs and DERs to elucidate the underlying mechanisms. The results of the present study showed that these lncRNAs were significantly associated with the CAMS pathway, JAK-STAT pathway and erbb pathway. CAMS have been established to participate in breast cancer cell angiogenesis, migration, invasion and metastasis (32,33). The JAK-STAT pathway plays an important role in inflammation and carcinogenesis (34) and has been regarded as a novel therapeutic target in breast cancer (35). The ErbB receptor tyrosine kinase family is comprised of ErbB1, ErbB2, ErbB3 and ErbB4. A large body of evidence demonstrated that the ErbB2 gene and protein serve as biomarkers for prognosis and therapy response (36,37). Furthermore, HER kinase domain mutations enhance tumor progression could potentially serve as prognostic markers (38). In addition, De Cola et al (39) showed that downregulation of ErbB receptors contributes to targeted therapy resistance. The above findings confirmed that the CAMS, JAK-STAT and erbb pathways are important for breast cancer and indicated that these prognostic lncRNAs participate in the regulation of these pathways in breast cancer. In the present study, a list of genes that were enriched in these significant pathways for these prognostic lncRNAs, such as AGTR1, NPY1R, KISS1R and CCL5 were also identified. An in vitro study by Oh et al (40) showed that AGTR1 promotes tumor growth and angiogenesis. There is evidence that NPY1R in peripheral blood is significantly linked to tumor metastasis and prognosis of breast cancer patients (41). KISS1R, a G protein coupled receptor, promotes invadopodia formation and invasion in breast cancer cells (42). CCL5 facilitates cell proliferation and survival of breast cancer cells by regulating metabolism (43). These genes are potentially involved in the mechanisms underlying the predictive value of this eight-lncRNA signature and could serve as molecular biomarkers for breast cancer. The present study has certain limitations. Although the eight-lncRNA prognostic signature was validated in three independent datasets, further testing on clinical data is warranted. In addition, the present study focused solely on microarray expression datasets. Therefore, experimental studies are required to verify the findings of the present study. Additionally, further studies should be conducted to elucidate the mechanisms underlying the actions of these prognostic lncRNAs. In conclusion, the present study recommends an eight-lncRNA signature for survival prediction in breast cancer. These prognostic lncRNAs could affect cancer development partly by regulating the CAMS, JAK-STAT, erbb pathways, as well as AGTR1, NPY1R, KISS1R and CCL5. These findings hold promise for the identification of promising therapeutic targets for breast cancer.

37 in total

1. HER family kinase domain mutations promote tumor progression and can predict response to treatment in human breast cancer.

Authors: Delphine R Boulbes; Stefan T Arold; Gaurav B Chauhan; Korina V Blachno; Nanfu Deng; Wei-Chao Chang; Quanri Jin; Tzu-Hsuan Huang; Jung-Mao Hsu; Samuel W Brady; Chandra Bartholomeusz; John E Ladbury; Steve Stone; Dihua Yu; Mien-Chie Hung; Francisco J Esteva
Journal: Mol Oncol Date: 2014-11-11 Impact factor: 6.603

Review 2. Breast cancer management: quality-of-life and cost considerations.

Authors: Davide Radice; Alberto Redaelli
Journal: Pharmacoeconomics Date: 2003 Impact factor: 4.981

3. KISS1R signaling promotes invadopodia formation in human breast cancer cell via β-arrestin2/ERK.

Authors: Cameron G Goertzen; Magdalena Dragan; Eva Turley; Andy V Babwah; Moshmi Bhattacharya
Journal: Cell Signal Date: 2015-12-23 Impact factor: 4.315

4. Long non-coding RNA HOTAIR is an independent prognostic marker of metastasis in estrogen receptor-positive primary breast cancer.

Authors: Kristina P Sørensen; Mads Thomassen; Qihua Tan; Martin Bak; Søren Cold; Mark Burton; Martin J Larsen; Torben A Kruse
Journal: Breast Cancer Res Treat Date: 2013-11-21 Impact factor: 4.872

5. Relevance of BCAR4 in tamoxifen resistance and tumour aggressiveness of human breast cancer.

Authors: M F E Godinho; A M Sieuwerts; M P Look; D Meijer; J A Foekens; L C J Dorssers; T van Agthoven
Journal: Br J Cancer Date: 2010-09-21 Impact factor: 7.640

6. LINC00472 expression is regulated by promoter methylation and associated with disease-free survival in patients with grade 2 breast cancer.

Authors: Yi Shen; Zhanwei Wang; Lenora W M Loo; Yan Ni; Wei Jia; Peiwen Fei; Harvey A Risch; Dionyssios Katsaros; Herbert Yu
Journal: Breast Cancer Res Treat Date: 2015-11-13 Impact factor: 4.872

7. The degradation of EZH2 mediated by lncRNA ANCR attenuated the invasion and metastasis of breast cancer.

Authors: Zhongwei Li; Pingfu Hou; Dongmei Fan; Meichen Dong; Musong Ma; Hongyuan Li; Ruosi Yao; Yuxin Li; Guannan Wang; Pengyu Geng; Adhanom Mihretab; Dongxu Liu; Yu Zhang; Baiqu Huang; Jun Lu
Journal: Cell Death Differ Date: 2016-10-07 Impact factor: 15.828

8. Breast cancer cell migration is regulated through junctional adhesion molecule-A-mediated activation of Rap1 GTPase.

Authors: Elaine A McSherry; Kieran Brennan; Lance Hudson; Arnold D K Hill; Ann M Hopkins
Journal: Breast Cancer Res Date: 2011-03-23 Impact factor: 6.466

Review 9. Cell cycle regulation by long non-coding RNAs.

Authors: Masatoshi Kitagawa; Kyoko Kitagawa; Yojiro Kotake; Hiroyuki Niida; Tatsuya Ohhata
Journal: Cell Mol Life Sci Date: 2013-07-24 Impact factor: 9.261

10. Transcriptome analysis of Wnt3a-treated triple-negative breast cancer cells.

Authors: Sylvie Maubant; Bruno Tesson; Virginie Maire; Mengliang Ye; Guillem Rigaill; David Gentien; Francisco Cruzalegui; Gordon C Tucker; Sergio Roman-Roman; Thierry Dubois
Journal: PLoS One Date: 2015-04-07 Impact factor: 3.240

16 in total

1. Identification of HMG-box family establishes the significance of SOX6 in the malignant progression of glioblastoma.

Authors: Lan Jiang; Hui Yang; Tianbing Chen; Xiaolong Zhu; Jingjing Ye; Kun Lv
Journal: Aging (Albany NY) Date: 2020-05-10 Impact factor: 5.682

2. Identification of key modules and genes associated with breast cancer prognosis using WGCNA and ceRNA network analysis.

Authors: Xin Yin; Pei Wang; Tianshu Yang; Gen Li; Xu Teng; Wei Huang; Hefen Yu
Journal: Aging (Albany NY) Date: 2020-12-09 Impact factor: 5.682

3. Genetic co-expression networks contribute to creating predictive model and exploring novel biomarkers for the prognosis of breast cancer.

Authors: Yuan-Kuei Li; Huan-Ming Hsu; Meng-Chiung Lin; Chi-Wen Chang; Chi-Ming Chu; Yu-Jia Chang; Jyh-Cherng Yu; Chien-Ting Chen; Chen-En Jian; Chien-An Sun; Kang-Hua Chen; Ming-Hao Kuo; Chia-Shiang Cheng; Ya-Ting Chang; Yi-Syuan Wu; Hao-Yi Wu; Ya-Ting Yang; Chen Lin; Hung-Che Lin; Je-Ming Hu; Yu-Tien Chang
Journal: Sci Rep Date: 2021-03-31 Impact factor: 4.379

4. Identification of genes that correlate clear cell renal cell carcinoma and obesity and exhibit potential prognostic value.

Authors: Jiale Zhou; Zhaolin Yang; Xiaorong Wu; Jin Zhang; Wei Zhai; Yonghui Chen
Journal: Transl Androl Urol Date: 2021-02

5. Identification of the hub genes in gastric cancer through weighted gene co-expression network analysis.

Authors: Chunyang Li; Haopeng Yu; Yajing Sun; Xiaoxi Zeng; Wei Zhang
Journal: PeerJ Date: 2021-03-05 Impact factor: 2.984

6. Construction of a Co-Expression Network for lncRNAs and mRNAs Related to Urothelial Carcinoma of the Bladder Progression.

Authors: Yeqing Mao; Chao Wen; Zitong Yang
Journal: Front Oncol Date: 2022-02-25 Impact factor: 6.244

7. LINC00472 suppresses oral squamous cell carcinoma growth by targeting miR-455-3p/ELF3 axis.

Authors: Xiu Liu; Xinrong Ma; Hongyu Li; Yu Wang; Minghui Mao; Chao Liang; Ying Hu
Journal: Bioengineered Date: 2022-01 Impact factor: 3.269

8. Prognostic Models for Nonmetastatic Triple-Negative Breast Cancer Based on the Pretreatment Serum Tumor Markers with Machine Learning.

Authors: Huihui Chen; Shijie Wu; Jun Hu; Kun Zhang; Kaimin Hu; Yuexin Lu; Jiapan He; Tao Pan; Yiding Chen
Journal: J Oncol Date: 2021-05-15 Impact factor: 4.375

9. Screening of Biological Target Molecules Related to Glucocorticoid-Induced Cataract (GIC) on the Basis of Constructing ceRNA Network.

Authors: Zhan Shi; Xiumei Zhao; Ying Su; Chao Wang; Ping Liu; Hongyan Ge
Journal: Biochem Genet Date: 2021-06-02 Impact factor: 1.890

10. Identification of a nomogram based on long non-coding RNA to improve prognosis prediction of esophageal squamous cell carcinoma.

Authors: Wenli Li; Jun Liu; Hetong Zhao
Journal: Aging (Albany NY) Date: 2020-01-24 Impact factor: 5.682