Jieqiong Liu1,2, Yali Yao2, Zheyu Hu3, Hui Zhou3, Meizuo Zhong1. 1. Department of Oncology, Xiangya Hospital, Central South University, Changsha, China. 2. The First Hospital of Changsha City, Changsha, China. 3. Hunan Cancer Hospital/the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China.
Abstract
BACKGROUND: Long intergenic noncoding RNAs (lincRNAs) are a class of novel transcribed regions expressed in cancers that may represent candidate biomarkers for lung squamous cell carcinoma (LSqCC) treatment. In this study, we evaluated the lincRNA profile in LSqCC patients and screened for lincRNAs of diagnostic and prognostic value. METHODS: Transcriptome profiles of 549 samples derived from 501 LSqCC patients were identified in the TCGA database. Forty-eight patients had paired primary tumor (PT) and solid normal (SN) tissue samples, while 453 patients had only PT samples. A total of 1,771 lincRNA candidates were evaluated. A paired test (Wilcoxon paired signed rank test) was performed in the paired PT and SN samples. Logistic regression analysis was performed in 453 independent PT samples and 48 independent SN samples to screen for lincRNA candidates significantly associated with malignancy. The 501 independent PT samples were further used to screen for lincRNA candidates significantly associated with prognosis. RESULTS: Among the 1,771 lincRNAs, 10 were risk candidates significantly overexpressed in PT samples, and 10 were protective candidates significantly underexpressed in PT samples. Among the 10 overexpressed risk lincRNAs, a small panel of LINC00487, LINC01927, and C10orf143 (LINC00959) could effectively predict malignancy in paired samples (AUC = 0.7274, 95% CI = (0.6264, 0.8285)). When combined with the protective lincRNA candidates LINC02315, LINC00491, and LINC01697, the predictive efficiency was greatly improved in both paired samples (AUC = 0.8030, 95% CI = (0.7250, 0.8810)) and independent samples (AUC = 0.7481, 95% CI = (0.6642, 0.8320)). Additionally, three overexpressed risk lincRNAs, LINC01031, LINC01088, and LINC01931, were significantly associated with poor prognosis in PT samples, suggesting potential targets for anti-LSqCC treatment. CONCLUSION: LincRNAs could be promising biomarkers for predicting malignancy and potential anti-LSqCC targets for drug development.
Lung squamous cell carcinoma (LSqCC) is historically the most common type of lung cancer, particularly in developing countries (Stone & Zhou, 2016). The majority of stage I/II LSqCC patients undergo surgery with or without chemotherapy and/or radiotherapy, while the majority of stage III/IV patients receive only chemotherapy and/or radiation therapy (Miller et al., 2016). Based on Surveillance, Epidemiology, and End Results (SEER) data for 2009–2013, the 5-year survival rate for lung/bronchus cancer patients is 18.1%. For stage I/II LSqCC patients, the 5-year survival is 34% (Takita et al., 1991). For unresectable advanced LSqCC patients, the outcome is much worse.
Targeted therapy agents, such as epidermal growth factor receptor (EGFR) inhibitors and angiogenesis inhibitors, are important in the treatment of nonsmall cell lung cancer (NSCLC) (Yang et al., 2016). In lung adenocarcinoma, patients with EGFR mutations could obtain survival benefits from EGFR inhibitors (Takano et al., 2008), especially "never smokers" (Sun et al., 2010). However, targeted therapies are not efficacious for LSqCC patients, because LSqCC has genetic alterations and biomarkers distinct from those of lung adenocarcinoma (Campbell et al., 2016; Paz-Ares et al., 2016; Seto et al., 2014).
To explore potential targeting candidates for LSqCC treatment, genome-wide analysis is recommended. Genome-wide analysis has revealed that the majority of the genome is transcribed into noncoding RNAs (ncRNAs); ncRNAs regulate the expression of protein-coding genes, but they themselves are not translated into proteins (Deniz & Erman, 2017; Fu, 2014). According to transcript length, ncRNAs are categorized into small ncRNAs (sncRNAs) and long ncRNAs (lncRNAs) (Brosnan & Voinnet, 2009). Classically, transfer RNAs, ribosomal RNAs, small nuclear RNAs, microRNAs, short interfering RNAs, and piwi-interacting RNAs are sncRNAs (Taft, Pang, Mercer, Dinger, & Mattick, 2010); exonic lncRNAs, intronic lncRNAs, overlapping lncRNAs, and intergenic lncRNAs (lincRNAs) are lncRNAs (Shi, Sun, Liu, Yao, & Song, 2013).
LincRNAs play significant roles in human cancers (Deniz & Erman, 2017; White et al., 2014). For example, TP53COR1 (LincRNAp21) functions as a tumor suppressor; it is down-regulated in colorectal tumors (Zhai et al., 2013), hepatocellular carcinoma (Yang et al., 2015), prostate cancer (Isin et al., 2015) and breast cancer (Chen et al., 2017). CYTOR (LINC00152) is overexpressed in pancreatic cancer (Muller et al., 2015) and gastric cancer (Pang et al., 2014). LINC00673 regulates PTPN11 and inhibits the proliferation of pancreatic cancer cells (Zheng et al., 2016). In NSCLC cells, LINC00673 promotes metastasis through EZH2 (Ma et al., 2017). In this study, we used LSqCC samples from the TCGA database to perform a genome-wide analysis of lincRNA transcripts. We found a panel of lincRNAs as candidates for predicting malignancy and poor prognosis, representing promising targets for anti-LSqCC treatment.
Methods
Database and cohort definition
The National Cancer Institute (NCI)-supported harmonized cancer dataset TCGA-LUSC transcriptome profiling was the data source in the present study. Sample descriptions were extracted from the metadata. A total of 501 patients diagnosed with primary lung squamous cell carcinoma (sample site: C34.0-C34.9; ICD-O-3 histology/behavior code: 8070/3, 8071/3, 8072/3, 8073/3, 8052/3 and 8083/3) with positive histological diagnostic confirmation were identified from the TCGA-LUSC project. As shown in Figure S1, of the 501 included patients, 48 had paired primary tumor (PT) and solid normal (SN) tissue samples. The other 453 patients had only PT samples. To compare the transcriptional profiles of lincRNAs between SN and PT, a paired test was performed in the paired SN and PT samples. Additionally, 453 independent PT samples were compared with 48 independent SN samples. Definite tumor stage was reported for 497 patients, and 91 of them were in an advanced stage (stage III/IV).
Statistical analysis for demographic and clinical‐pathological features
Information on demographic features (age, gender, race, smoking history) and clinical-pathological features (tumor stage, morphology, vital status, time to death, and time to last contact) was extracted from the metadata and evaluated for the included patients. Numeric variables were summarized as the mean (standard deviation) and median (interquartile range). Categorical variables were reported as counts (percentage). An analysis of variance was used to compare continuous variables with symmetric distributions across subgroups. Chi-square tests or Fisher's exact tests (n < 5) were used to compare categorical variables between clinical/pathological subgroups.
Transcriptional profiling
In the RNA-seq database, gene expression levels were recorded as fragments per kilobase of transcript per million mapped reads (FPKM). Expression data for a total of 60,483 genes from 549 histologically confirmed samples were extracted from the TCGA-LUSC project RNA-seq database. Using the Bioconductor package “org.Hs.eg.db”, 36,095 genes were identified as protein-coding genes, ncRNA genes, or pseudogenes with approved unique symbols and names in the HGNC (HUGO (Human Genome Organisation) Gene Nomenclature Committee) database. Of these, 1,771 genes were recognized as lincRNA genes.
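As a reminder of what the FPKM unit means, the following minimal Python sketch computes it from raw counts. The function name and the example numbers are hypothetical illustrations; the values used in this study were taken directly from the TCGA-LUSC RNA-seq files rather than recomputed.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    FPKM = fragments / (length in kb) / (mapped fragments in millions)
         = fragments * 1e9 / (length in bp * total mapped fragments)
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2,000 bp transcript
# in a library of 20 million mapped fragments.
example_value = fpkm(500, 2000, 20_000_000)
```

Because the count is normalized by both transcript length and library size, a weakly expressed lincRNA can easily have an FPKM far below 1.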
Predictive value of lincRNAs in diagnosis
To screen for potential candidates for discriminating primary tumor samples from solid normal samples, the Wilcoxon paired signed rank test was used to compare the expression of lincRNAs between the 48 paired samples. In independent PT and SN samples, logistic regression analysis was performed to identify risk and protective lincRNAs for malignancy. The odds ratio (OR) with 95% confidence interval (CI) was calculated for each tested lincRNA. For the selected candidate lincRNAs, the receiver operating characteristic (ROC) curve and the area under the curve (AUC) were used to measure the efficiency in predicting malignancy.
To screen for lincRNAs that could predict advanced tumors (stage III/IV), 497 independent PT samples with definite stage information were investigated. Logistic regression analysis was performed to evaluate the risk of advanced stage for the tested lincRNAs.
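The paired comparison described above can be sketched in Python as follows. This is an illustrative stand-alone implementation, not the routine used in the study (which relied on R and SAS): it uses the large-sample normal approximation without tie or continuity corrections, and the function name is hypothetical.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon paired signed rank test (normal approximation).

    Returns (W+, z, p), where W+ is the sum of ranks of positive differences.
    Zero differences are dropped; tied |differences| receive average ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return w_plus, z, p


# Hypothetical FPKM values for one lincRNA in paired tumor/normal samples.
tumor = [float(k) for k in range(2, 12)]
normal = [1.0] * 10
w_plus, z, p = wilcoxon_signed_rank(tumor, normal)
```

With only a handful of pairs an exact test would be preferable; with 48 pairs, as in this cohort, the normal approximation is a common choice.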
Prognosis analysis
Among the 501 LSqCC patients, 285 were alive and censored. To identify risk lincRNAs for cancer-related death, Cox proportional hazards regression was performed to estimate the hazard ratio (HR). A receiver operating characteristic (ROC) curve was calculated to determine the optimal cut-off expression level of each risk lincRNA, that is, the level that maximized sensitivity and specificity in predicting death. According to these cut-off values, patients were divided into a lincRNA high-expression subgroup and a low-expression subgroup. Life tests with Kaplan-Meier curves were performed to compare the survival probabilities between the high-level and low-level subgroups.
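One common way to pick a cut-off that jointly maximizes sensitivity and specificity is Youden's J statistic (sensitivity + specificity - 1). The text does not name the exact criterion, so the Python sketch below is an assumption about how such a cut-off could be chosen; the function name and data are hypothetical, and death is treated as a simple binary label, ignoring censoring times.

```python
def youden_cutoff(values, died):
    """Return (cutoff, J, sensitivity, specificity) maximizing Youden's J.

    A patient is classified as 'high expression' when value >= cutoff.
    `died` holds 1 for deceased patients and 0 for patients who are alive.
    """
    n_pos = sum(died)
    n_neg = len(died) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes are required")
    best = None
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, d in zip(values, died) if d and v >= cutoff)
        fp = sum(1 for v, d in zip(values, died) if not d and v >= cutoff)
        sensitivity = tp / n_pos
        specificity = 1 - fp / n_neg
        j = sensitivity + specificity - 1
        if best is None or j > best[1]:
            best = (cutoff, j, sensitivity, specificity)
    return best


# Hypothetical expression levels (FPKM) and vital status.
expression = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
vital_status = [0, 0, 0, 1, 1, 1]
cutoff, j, sens, spec = youden_cutoff(expression, vital_status)
```

Patients with expression at or above the returned cut-off would form the high-expression subgroup used in the Kaplan-Meier comparison.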
Statistic tools
Bioconductor packages were applied for RNA-seq analysis. The packages “RColorBrewer” and “ggplot” were used for plotting. All tests of hypotheses were two-tailed and conducted at a significance level of 0.05. Statistical analyses were conducted using R 3.3.2 and SAS 9.4.
RESULTS
Demographic and clinical‐pathological features
As shown in Table 1, among the 501 LSqCC patients, 371 (74.05%) were male and 130 (25.95%) were female; 349 (89.95%) were white and 30 (7.73%) were African American; 222 (44.31%) patients had a smoking history, and the average length of smoking (LOS) was 39.77 (±12.13) years. A total of 406 (81.69%) patients were in early stages (stage I/II) and 91 (18.31%) were in late stages (stage III/IV). Compared with living patients, deceased patients were older (67.84 ± 11.61 vs. 65.25 ± 13.42, p = .02), and more of them were African American (11.70% vs. 4.61%, p = .03).
Table 1
Demographic and clinical-pathological features of LSqCC patients
| Covariate | Level | Overall patients (n = 501) | Survival status: Alive (n = 285) | Survival status: Dead (n = 216) | p-value |
| --- | --- | --- | --- | --- | --- |
| Age | | 66.37 ± 12.72, 68 (61,73) | 65.25 ± 13.42, 67 (60,73) | 67.84 ± 11.61, 69 (63,74) | .02 |
| Gender | Male | 371 (74.05%) | 204 (71.58%) | 167 (77.31%) | .17 |
| | Female | 130 (25.95%) | 81 (28.42%) | 49 (22.69%) | |
| Race | Asian | 9 (2.34%) | 5 (2.30%) | 4 (2.34%) | .03 |
| | Black | 30 (7.73%) | 10 (4.61%) | 20 (11.70%) | |
| | White | 349 (89.95%) | 202 (93.09%) | 147 (85.96%) | |
| Smoke | Yes | 222 (44.31%) | 132 (46.32%) | 90 (41.67%) | .30 |
| | No | 279 (55.69%) | 153 (53.68%) | 126 (58.33%) | |
| LOS* | | 39.77 ± 12.13, 40 (30,50) | 40 ± 11.36, 40 (32,50) | 39.43 ± 13.23, 41 (30,50) | .73 |
| Stage | I | 244 (49.09%) | 144 (50.88%) | 100 (46.73%) | .10 |
| | II | 162 (32.60%) | 97 (34.28%) | 65 (30.37%) | |
| | III | 84 (16.90%) | 40 (14.13%) | 44 (20.56%) | |
| | IV | 7 (1.41%) | 2 (0.71%) | 5 (2.34%) | |
| Morphology (ICD-O-3) | Squamous cell carcinoma, NOS (8070/3) | 466 (93.01%) | 268 (94.04%) | 198 (91.67%) | .07 |
| | Squamous cell carcinoma, keratinizing (8071/3) | 13 (2.59%) | 3 (0.30%) | 10 (4.61%) | |
| | Squamous cell carcinoma, large cell, nonkeratinizing (8072/3) | 3 (0.60%) | 2 (0.70%) | 1 (0.46%) | |
| | Squamous cell carcinoma, small cell, nonkeratinizing (8073/3) | 1 (0.20%) | 0 (0.00%) | 1 (0.46%) | |
| | Papillary squamous cell carcinoma (8052/3) | 4 (0.80%) | 2 (0.70%) | 2 (0.93%) | |
| | Basaloid squamous cell carcinoma (8083/3) | 14 (2.79%) | 10 (3.51%) | 4 (1.85%) | |
LOS* indicates the length of smoking in years among patients with a smoking history.
Cox proportional hazards regression analyses were performed to evaluate the risk factors for mortality. Late stage (stage III/IV) was a significant risk factor for mortality (HR (95% CI) = 1.527 (1.110, 2.103), p = .009, Table S1). Besides stage, African American patients had a marginally higher hazard of mortality than white patients (HR (95% CI) = 1.578 (0.985, 2.527), p = .06, Table S1).
LincRNAs transcriptional profile of LSqCC samples
In independent LSqCC PT samples (n = 453) and SN tissue samples (n = 48), logistic regression analysis showed that 72 (4%) lincRNAs were significant risk candidates for malignancy (OR > 1, p < .05), while 34 (2%) lincRNAs were significant protective candidates against malignancy (OR < 1, p < .05) (Figure 1a). Figure 1b listed the 34 significant protective lincRNAs, and Figure 1c listed the 72 significant risk lincRNAs. Here, the OR (odds ratio) represented the multiplicative change in the odds of malignancy per 1-unit (FPKM) increase in gene expression. However, the expression levels of some lincRNAs in LSqCC samples were as low as 0.001 or even lower, so relative to such low expression levels a 1-unit (FPKM) increase could yield an extremely large or small OR. Therefore, we used the natural logarithm of the OR and its 95% CI [logOR with log(95% CI lower limit) and log(95% CI upper limit)] to narrow the scale.
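The scale issue described above follows from the form of the logistic model, in which the OR per unit of expression is exp(beta); rescaling the expression increment by a factor delta therefore raises the OR to the power delta. The Python sketch below illustrates both the log transformation and this rescaling. The helper names are hypothetical, and the input values are the LINC01931 estimates as printed in Table 2.

```python
import math


def log_or_interval(odds_ratio, lower, upper):
    """Natural-log transform of an OR and its 95% CI limits."""
    return tuple(math.log(v) for v in (odds_ratio, lower, upper))


def or_per_delta(or_per_unit, delta):
    """OR for a `delta`-unit increase, given the OR per 1-unit increase.

    In a logistic model OR(delta) = exp(beta * delta) = OR(1) ** delta.
    """
    return math.exp(delta * math.log(or_per_unit))


# LINC01931, outcome "primary tumor" (Table 2): OR 2.22E+30 (5.41E+3, 7.28E+66).
log_or, log_lower, log_upper = log_or_interval(2.22e30, 5.41e3, 7.28e66)

# The same association expressed per 0.001 FPKM instead of per 1 FPKM.
or_small_step = or_per_delta(2.22e30, 0.001)
```

On the log scale the estimate and its limits fall on a range that can be plotted alongside the other candidates, and expressed per 0.001 FPKM the same OR is only slightly above 1.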
Figure 1
Transcriptional profile of lincRNAs in LSqCC tumor samples and solid normal tissue samples. (a) Transcriptional profiles of 1,771 lincRNAs were compared between independent primary LSqCC tumor samples and solid normal tissue samples. 72 (4%) lincRNAs were significant risk candidates for malignancy (OR > 1, p < .05), while 34 (2%) lincRNAs were significant protective candidates against malignancy (OR < 1, p < .05). (b) Natural logarithm of the OR and its 95% CI, logOR (95% CI), of the 34 significant protective lincRNAs. (c) Natural logarithm of the OR and its 95% CI, logOR (95% CI), of the 72 significant risk lincRNAs. (d) Transcriptional profile in paired primary LSqCC tumor samples and solid normal tissue samples. The expression levels of 23 (1.30%) lincRNAs were significantly higher in primary LSqCC tumor samples than in matched solid normal tissue samples (p < .05). The expression levels of 47 (2.65%) lincRNAs were significantly lower in primary tumors than in matched solid normal tissues (p < .05). (e) Natural logarithm of the expression ratio of PT versus SN samples (log(PT/SN)) for the significant lincRNAs that were overexpressed (red) or underexpressed (blue) in primary LSqCC tumors compared with their paired solid normal tissue samples
To further validate the risk/protective lincRNAs in predicting LSqCC, we compared the transcriptional profiles of lincRNAs in the 48 paired LSqCC PT and SN tissue samples using the Wilcoxon paired signed rank test. As shown in Figure 1d, the expression levels of 23 (1.30%) lincRNAs were significantly higher in LSqCC PT samples than in the matched SN tissue samples (p < .05); the expression levels of 47 (2.65%) lincRNAs were significantly lower in PT samples than in the matched SN tissues (p < .05). Figure 1e listed all the significant lincRNAs that were overexpressed (red) or underexpressed (blue) in LSqCC PT samples, with the natural logarithm of their expression ratio of PT versus SN (log(PT/SN)).
Among the 23 lincRNAs overexpressed in primary tumor samples, 10 were significant risk candidates for malignancy in logistic regression (Figure 1c). These validated risk lincRNAs were LINC00487, LINC01927, C10orf143 (LINC00959), LINC01031, LINC01088, LINC01352, LINC01504, LINC01931, LINC01985, and LINC02395. Among the 47 lincRNAs underexpressed in PT samples (blue, Figure 1e), 10 were significant protective candidates against malignancy in logistic regression (Figure 1b). These validated protective lincRNAs were LINC00307, LINC00491, LINC01063, LINC01385, LINC01448, LINC01524, LINC01697, LINC01719, LINC02026, and LINC02315. The ORs (95% CIs) of the above risk and protective lincRNAs for malignancy were listed in Table 2.
Table 2
Risk of lincRNAs for primary LSqCC tumor
| LincRNAs | Outcome: primary tumor, Odds Ratio (95% CI) | p-value | Outcome: advanced stage (stage III/IV), Odds Ratio (95% CI) | p-value |
| --- | --- | --- | --- | --- |
| LINC00487 | 6.34E+5 (109, 4.43E+10) | .008 | 0.78 (0.09, 2.10) | .73 |
| LINC00959 | 4.85 (1.35, 21.78) | .03 | 0.84 (0.38, 1.62) | .62 |
| LINC01031 | 65.71 (1.28, 1.97E+4) | .04 | 0.66 (0.08, 2.35) | .63 |
| LINC01088 | 1.87 (1.02, 6.86) | .05 | 0.99 (0.82, 1.09) | .88 |
| LINC01352 | 10.77 (1.66, 151.5) | .04 | 0.65 (0.24, 1.46) | .34 |
| LINC01504 | 2.24 (1.01, 5.80) | .05 | 0.86 (0.49, 1.42) | .57 |
| LINC01927 | 3.39E+3 (16.00, 6.20E+6) | .01 | 1.18 (0.16, 6.40) | .86 |
| LINC01931 | 2.22E+30 (5.41E+3, 7.28E+66) | <.0001 | 0.45 (5.83E-11, 6.60E+4) | .91 |
| LINC01985 | 73.30 (1.85, 2.03E+4) | .03 | 0.76 (0.13, 3.38) | .74 |
| LINC02395 | 5.89E+36 (7.67E+4, 8.80E+82) | .02 | 0.01 (3.46E-15, 15.44) | .53 |
| LINC00307 | 0.02 (0.00, 0.55) | .02 | 0.05 (1.97E-5, 7.14) | .35 |
| LINC00491 | 0.77 (0.62, 0.96) | .02 | 0.85 (0.70, 1.03) | .12 |
| LINC01063 | 0.85 (0.76, 0.97) | .01 | 0.90 (0.76, 1.05) | .21 |
| LINC01385 | 0.40 (0.18, 0.92) | .02 | 0.86 (0.32, 2.03) | .75 |
| LINC01448 | 0.22 (0.07, 0.79) | .01 | 0.45 (0.05, 2.27) | .40 |
| LINC01524 | 0.07 (0.01, 0.59) | .01 | 0.38 (0.01, 4.91) | .52 |
| LINC01697 | 0.71 (0.52, 1.00) | .05 | 0.52 (0.26, 0.88) | .03 |
| LINC01719 | 0.42 (0.24, 0.75) | .003 | 0.61 (0.27, 1.23) | .20 |
| LINC02026 | 0.48 (0.24, 0.99) | .04 | 0.46 (0.15, 1.12) | .13 |
| LINC02315 | 0.84 (0.76, 0.92) | .0002 | 1.01 (0.92, 1.11) | .81 |
Predictive values of candidate lincRNAs in LSqCC diagnosis
To investigate the predictive value of the lincRNA candidates in LSqCC diagnosis, logistic regression with ROC curve analysis was performed for the twenty lincRNA candidates. The three risk lincRNAs with the largest area under the ROC curve (AUC) values were selected and integrated into a small joint panel A (LINC00487 + LINC01927 + C10orf143 (LINC00959)). The three protective lincRNAs with the largest AUC values were selected and integrated into a small joint panel B (LINC02315 + LINC00491 + LINC01697). The ROC curves were compared among the joint full model (all 20 lincRNA candidates), the risk joint model (LINC00487 + LINC01927 + C10orf143 (LINC00959)), the protective joint model (LINC02315 + LINC00491 + LINC01697), and the risk/protective combined model** (LINC00487 + LINC01927 + C10orf143 (LINC00959) + LINC02315 + LINC00491 + LINC01697). Figure 2a showed that the AUC (95% CI) values for the above four models were 0.8121 (95% CI = (0.7479, 0.8764)), 0.6885 (95% CI = (0.6209, 0.7561)), 0.6549 (95% CI = (0.5714, 0.7384)), and 0.7481 (95% CI = (0.6642, 0.8320)), respectively, in independent samples (453 nonpaired PT samples and 48 SN samples). Figure 2b showed that the AUC values for the above four models were 0.9067 (95% CI = (0.8485, 0.9648)), 0.7274 (95% CI = (0.6264, 0.8285)), 0.7049 (95% CI = (0.6006, 0.4097)), and 0.8030 (95% CI = (0.7250, 0.8810)), respectively, in the 48 pairs of PT and SN tissue samples. The sensitivities and specificities of the above models in the independent and paired samples were listed in Table S2. The gene expression levels (FPKM) of LINC02315, LINC00491, and LINC01697 in independent PT samples (blue) and SN tissue samples (yellow) were shown in Figure 2c. Boxplots in Figure 2d showed the gene expression levels (FPKM) of LINC00487, LINC01927, and C10orf143 (LINC00959) in independent PT samples (red) and SN tissue samples (yellow). Table S3 listed the statistical summary of the actual gene expression levels (unit: FPKM).
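For readers less familiar with the AUC values compared above, the Python sketch below computes the AUC as the probability that a randomly chosen tumor sample receives a higher model score than a randomly chosen normal sample (ties counting one half). The function name and scores are hypothetical; this is not the routine used to produce Figure 2, and it does not reproduce the reported confidence intervals.

```python
def auc(scores_positive, scores_negative):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    if not scores_positive or not scores_negative:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical predicted probabilities of malignancy from a lincRNA panel.
tumor_scores = [0.9, 0.8, 0.4]
normal_scores = [0.5, 0.3, 0.2]
panel_auc = auc(tumor_scores, normal_scores)
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is the scale on which the four models are compared.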
Figure 2
Predictive values of candidate lincRNAs in LSqCC diagnosis. (a) ROC curves for full model (20 lincRNA candidates), a small joint panel of risk lincRNAs (LINC00487 + LINC01927 + LINC00959), a small joint panel of protective lincRNAs (LINC02315 + LINC00491 + LINC01697), and joint model of risk and protective lincRNAs (model**: LINC00487 + LINC01927 + LINC00959 + LINC02315 + LINC00491 + LINC01697) in independent samples (nonpaired 453 PT samples and 48 SN tissue samples). (b) ROC curves for full model (20 lincRNA candidates), a small joint panel of risk lincRNAs (LINC00487 + LINC01927 + LINC00959), a small joint panel of protective lincRNAs (LINC02315 + LINC00491 + LINC01697), and joint model of risk and protective lincRNAs (model**: LINC00487 + LINC01927 + LINC00959 + LINC02315 + LINC00491 + LINC01697) in paired PT samples and 48 SN tissue samples. (c) The gene expression level of LINC02315, LINC00491 and LINC01697 in independent PT samples (blue) and SN tissue samples (yellow). (d) The gene expression level of LINC00487, LINC01927, and LINC00959 in independent PT samples (red) and SN tissue samples (yellow). (e) Predictive values of 10 protective lincRNA candidates in LSqCC staging. LogOR (95% CI) represents the natural logarithm transformation of OR (95% CI). (f) Predictive values of 10 risk lincRNA candidates in LSqCC staging. LogOR (95% CI) represents the natural logarithm transformation of OR (95% CI)
Predictive values of LincRNAs in LSqCC staging
For patients in early stages (I/II), surgery provides the best chance of cure (Brunelli, Kim, Berger, & Addrizzo-Harris, 2013; Handforth et al., 2015). For inoperable LSqCC patients (stage III/IV), the prognosis remains poor (Strom, Bremnes, Sundstrom, Helbekkmo, & Aasebo, 2015). To evaluate the predictive value of the lincRNA candidates in LSqCC staging, logistic regression was performed and ORs were calculated in the 497 PT samples with definite tumor stage. Figure 2e and f demonstrated that none of the risk and protective lincRNA candidates was a significant indicator of advanced stage, except LINC01697. The ORs (95% CI) and p-values for the lincRNA candidates were listed in the right panel of Table 2 (LINC01697, p = .03). Figure 2e showed LINC01697 as a significant negative indicator of advanced stage (logOR (95% CI) < 0). When evaluating the full lincRNA profile, we found that, apart from the above 20 risk/protective lincRNA candidates, 12 other lincRNAs were significant risk factors for advanced stage (logOR (95% CI) > 0); LINC01697 and 23 other lincRNAs were significant protective factors against advanced stage (logOR (95% CI) < 0) (Figure S2).
Potential lincRNAs related to LSqCC prognosis
The prognosis for LSqCC patients was poor. The KM curve for survival of the 501 included LSqCC patients was shown in Figure S3. The 1-year, 5-year, 10-year, and 14-year survival rates were 69.59%, 47.65%, 24.27%, and 15.02%, respectively (Table S4). Because all patients in the TCGA-LSqCC project underwent tumor surgery and had tumor biopsies, their 5-year survival rate (47.65%) was much higher than the 18.1% 5-year survival rate for lung/bronchus cancer in the SEER database. Among the 10 protective lincRNA candidates, none showed a protective effect against mortality; LINC01524 was even a significant risk factor for poor survival (HR (95% CI) > 1, Table 3). Among the 10 risk lincRNA candidates, three (LINC01031, LINC01088, and LINC01931) were also significant risk factors for poor survival (HR (95% CI) > 1, Table 3). Patients were divided into a lincRNA high-level subgroup and a low-level subgroup according to the optimal cut-off expression level that maximized sensitivity and specificity in predicting mortality. The KM curves and life test did not show significantly poorer OS for patients in the LINC01524 high-level subgroup (Logrank p = .1134, Figure 3a). The 1-year, 5-year, and 10-year survival rates of the LINC01524 low-level subgroup were 0.7489, 0.5194, and 0.2897, respectively, which were not significantly higher than the corresponding rates of 0.6750, 0.4589, and 0.2252 in the LINC01524 high-level subgroup (Logrank p-value = .1134, Wilcoxon p-value = .2085, Table S5A). KM curves and life tests confirmed significantly poorer OS for patients with high levels of LINC01031 (Logrank p = .0244, Figure 3b), LINC01088 (Logrank p = .0379, Figure 3c), and LINC01931 (Logrank p = .0204, Figure 3d), respectively.
The 1-year, 5-year, and 10-year survival rates of the LINC01031 low-level subgroup were 0.7056, 0.4890, and 0.2495, respectively, which were significantly higher than the corresponding rates of 0.5892, 0.3549, and 0 in the LINC01031 high-level subgroup (Logrank p-value = .0244, Wilcoxon p-value = .0073, Table S5B). The 1-year, 5-year, and 10-year survival rates of the LINC01088 low-level subgroup were 0.7268, 0.5003, and 0.3248, respectively, which were significantly higher than the corresponding rates of 0.6503, 0.4413, and 0.0984 in the LINC01088 high-level subgroup (Logrank p-value = .0379, Wilcoxon p-value = .0857, Table S5C). The 1-year, 5-year, and 10-year survival rates of the LINC01931 low-level subgroup were 0.7189, 0.5080, and 0.2755, respectively, which were significantly higher than the corresponding rates of 0.6319, 0.3955, and 0 in the LINC01931 high-level subgroup (Logrank p-value = .0204, Wilcoxon p-value = .0693, Table S5D).
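The subgroup survival rates reported above are Kaplan-Meier estimates. As an illustration of how such estimates arise from follow-up times and censoring indicators, the following Python sketch implements the product-limit estimator; the function name and the toy data are hypothetical, and the study itself used SAS and R rather than this code.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate.

    times  : follow-up time for each patient
    events : 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up (years) and vital status for five patients.
follow_up = [1, 2, 2, 3, 4]
died = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(follow_up, died)
```

Censored patients contribute to the number at risk until their last contact, which is why the estimate can differ from the crude proportion of patients alive at a given time.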
Table 3
Risk of lincRNAs for mortality (univariate Cox regression)
| Risky LincRNAs (PT > SN) | HR (95% CI) | p-value | Protective LincRNAs (SN > PT) | HR (95% CI) | p-value |
| --- | --- | --- | --- | --- | --- |
| LINC00487 | 0.14 (0.03, 0.74) | .003 | LINC00307 | 4.76 (0.66, 34.20) | .12 |
| LINC00959 | 0.97 (0.69, 1.38) | .88 | LINC00491 | 0.97 (0.87, 1.08) | .59 |
| LINC01031 | 1.92 (1.07, 3.45) | .04 | LINC01063 | 0.98 (0.91, 1.07) | .71 |
| LINC01088 | 1.05 (1.01, 1.10) | .05 | LINC01385 | 1.23 (0.75, 2.02) | .41 |
| LINC01352 | 1.05 (0.64, 1.70) | .86 | LINC01448 | 1.16 (0.56, 2.40) | .69 |
| LINC01504 | 0.95 (0.69, 1.32) | .77 | LINC01524 | 3.83 (1.26, 11.64) | .03 |
| LINC01927 | 0.79 (0.28, 2.25) | .66 | LINC01697 | 1.03 (0.83, 1.28) | .77 |
| LINC01931 | 114.42 (1.05, 1.25E+4) | .04 | LINC01719 | 0.76 (0.52, 1.13) | .17 |
| LINC01985 | 1.61 (0.63, 4.10) | .33 | LINC02026 | 0.82 (0.52, 1.28) | .37 |
| LINC02395 | 1.19 (0.10, 14.95) | .89 | LINC02315 | 1.04 (0.98, 1.10) | .18 |
Figure 3
Potential lincRNAs related to LSqCC prognosis. (a) HR (95% CI) of the 10 protective lincRNA candidates (blue); LINC01524 was a significant risk factor for mortality (HR (95% CI) > 1). (b) HR (95% CI) of the 10 risk lincRNA candidates (red); LINC01031, LINC01088, and LINC01931 were significant risk factors for mortality (HR (95% CI) > 1). (c) The gene expression level of the protective candidate LINC01524 in independent primary tumor samples and solid normal tissue samples. (d) The gene expression levels of the risk candidates LINC01031, LINC01088, and LINC01931 in independent primary tumor samples and solid normal tissue samples. (e) KM curves for the LINC01524, LINC01031, LINC01088, and LINC01931 high-expression and low-expression subgroups
DISCUSSION
Lung cancer is the main cause of cancer‐related deaths worldwide, accounting for 19.4% of all cancer deaths. About 1.82 million new lung cancer cases occurred globally in 2012 (Varghese, Carlos, & Shin, 2014). Lung cancer is characterized by many malignant traits, such as tumor heterogeneity, aggressive proliferation, a high propensity for distant metastasis, and metabolic disorders (Ang et al., 2001). Late‐stage diagnosis and the lack of effective biomarkers and therapeutic targets attribute to the low survival rate (Brown et al., 2013). According to the histopathological presentation, more than eighty percent of lung cancer cases are nonsmall cell lung cancer (NSCLC) (Brown, Eraut, Trask, & Davison, 1996). Despite the recent advances in multimodal treatments, the outcome of lung cancer remains unfavorable. Based on SEER database 2009–2013, the 5‐year overall survival rate of lung cancer is around 18.1%. Gene expression profiling offers a comprehensive molecular understanding of lung cancer that may grant insights into its pathophysiology and yield relevant information for subtype classification, staging, prognosis, and therapeutic decision‐making (Yu et al., 2015).While only 1%–2% of human genome contains the blueprint for protein‐coding transcripts, up to 70%–90% of human genome is transcribed into RNA (Carninci et al., 2005). LincRNAs are typically co‐expressed with their neighboring gene (Cabili et al., 2011). Till now, multiple strategies have been developed to target lncRNAs (Haemmerle & Gutschner, 2015), including gene knockout or replacement (Sauvageau et al., 2013), promoter removal or stop signal integration (Gutschner, 2015; Hung et al., 2011), and the use of RNA destabilization elements, such as Zinc finger nuclease (Gutschner, Baas, & Diederichs, 2011), etc. For cancer treatment, there have been many successful preclinical trials. 
In gastric cancer, knockdown of lncRNA GHET1 by shRNA could suppress tumor proliferation, invasion, and migration, and enhance apoptosis (Huang, Liao, Zhu, Liu, & Cai, 2017). In cancer cells, lncRNA HOTAIR (HOX transcript antisense RNA) modulates the cancer epigenome and facilitates metastasis (Gupta et al., 2010); silencing HOTAIR is a viable way to suppress cancer cells (Li et al., 2013). In breast and lung cancers, a high level of lncRNA BC200 (BCYRN1) indicates a poor prognosis (Booy, McRae, Koul, Lin, & McKenna, 2017; Hu & Lu, 2015). Knockdown of BCYRN1 by siRNA could suppress cell proliferation in a broad spectrum of tumors (Booy et al., 2017).
In this study, high levels of LINC01031, LINC01088, and LINC01931 were not only valuable in tumor diagnosis, but also predicted poor survival in LSqCC patients. Since lincRNAs are typically co‐expressed with their neighboring genes and may regulate the transcription of these genes (Cabili et al., 2011), LINC01031, LINC01088, and LINC01931 might promote LSqCC through their neighboring genes. LINC01031 is located at 1q31.2, and the nearby protein‐coding genes include B3GALT2 (β‐1,3‐galactosyltransferase 2), CDC73 (cell division cycle 73) (Walls et al., 2017), Grx2 (glutaredoxin‐2) (Lundberg et al., 2001), and F13B (coagulation factor XIII B subunit) (Webb, Coggan, Ichinose, & Board, 1989). LINC01088 is located at 4q21.21; the adjacent genes include NAA11 (Pang, Clark, Chan, & Rennert, 2011), GK2, and ANTXR2 (Burgi et al., 2017). LINC01931 is located at 2q23.3; this region has been suggested to be associated with human colorectal cancer in Hispanics (Schmit et al., 2016). Therefore, future work should focus on lincRNA pathway investigation and treatment strategy development.
Overall, our study provided a comprehensive evaluation of the lincRNA transcriptional profiles of TCGA‐LSqCC samples. We identified risk and protective lincRNA candidates for LSqCC diagnosis and prognosis. 
LINC01031, LINC01088, and LINC01931 might be promising treatment targets for LSqCC patients.
CONCLUSION
Compared with normal lung tissues, LSqCC primary tumors had a distinct lincRNA transcriptional profile. Some lincRNAs could effectively predict lung malignancies. In LSqCC patients, high levels of LINC01031, LINC01088, and LINC01931 were significantly associated with poor prognosis, suggesting potential targets for anti‐LSqCC treatment.
CONFLICT OF INTEREST
The authors declare that they have no competing interests.
AUTHORS’ CONTRIBUTIONS
JL and ZH designed the study and interpreted the patient data regarding diagnosis and survival. ZH performed the data analysis. JL and ZH were major contributors to the literature search and to writing the manuscript. All authors read and approved the final manuscript.
CONSENT FOR PUBLICATION
Not applicable.