MicroRNA (miR) signatures may aid the diagnosis and prediction of cancer; therefore, miRs associated with the prognosis of esophageal squamous cell carcinoma (ESCC) were screened. miR‑sequencing (seq) and mRNA‑seq data from early‑stage ESCC samples were downloaded from The Cancer Genome Atlas (TCGA) database, and samples from subjects with a >6‑month survival time were assessed with Cox regression analysis to identify prognosis‑associated miRs. A further two miR expression datasets of ESCC samples, GSE43732 and GSE13937, were downloaded from the Gene Expression Omnibus database. miRs common to the prognosis‑associated miRs and the GSE43732 and GSE13937 datasets were used to calculate a risk score for each sample, and the median risk score was applied to stratify samples into low‑ and high‑risk groups. A prognostic scoring system of signature miRs was subsequently constructed and used for survival analysis of low‑ and high‑risk samples. Differentially‑expressed genes (DEGs) corresponding to all miRs were screened and functional annotation was performed. A total of 34 prognostic miRs were screened and a scoring system was created using 10 signature miRs (hsa‑miR‑140, ‑33b, ‑34b, ‑144, ‑486, ‑214, ‑129‑2, ‑374a and ‑412). Using this system, low‑risk samples were associated with longer survival compared with high‑risk samples in the TCGA and GSE43732 datasets. Age, alcohol and tobacco use, and radiotherapy were prognostic factors for samples with different risk scores and the same clinical features. There were 168 DEGs; the top 20 DEGs positively correlated with risk scores and the top 20 DEGs negatively correlated with risk scores were significantly enriched for six and 10 functional terms, respectively. 'Tight junction' and 'melanogenesis' were two significantly enriched pathways of DEGs. miR‑214, miR‑129‑2, miR‑374a and miR‑486 may predict ESCC patient survival, although further studies to validate this hypothesis are required.
Esophageal squamous cell carcinoma (ESCC) is the 8th most common type of cancer and the 6th most common cancer-associated cause of premature mortality (1). Globally, ~450,000 people are affected by ESCC and this incidence is growing (2), with ~500,000 new cases diagnosed each year (3). The 5-year survival for patients with ESCC remains low (15–20%) (4). The cure rate of early stage ESCC is as high as 50% following surgical resection (5), although a number of patients with ESCC are not candidates for surgery due to comorbid conditions, including advanced age. In these cases, the 30-day mortality is 2–10% (6).

Numerous studies have revealed that smoking and pre-diagnosis alcohol consumption are risk factors for ESCC, and that the surgical technique, biological behavior, postoperative treatment and response to chemoradiotherapies contribute to improving prognosis (7,8). There are additional genetic alterations that contribute to the prognosis of ESCC, including somatic mutations, copy number variations and gene expression alterations (9). MicroRNAs (miRs) are useful diagnostic and prognostic indicators for human cancer (10), and miR-377 suppresses the initiation and progression of ESCC by inhibiting cluster of differentiation 133 and vascular endothelial growth factor (11). miR-1290 and miR-613 are prognostic factors for patients with ESCC (12,13), and high expression of miR-103/107 is associated with poor survival in patients with ESCC (14). Nevertheless, miRs may cooperate to drive the progression and prognosis of esophageal carcinoma.

miR signatures may aid in the diagnosis and prognosis of cancer (15). Feber et al (16) assessed the association of miR expression with patient survival and lymph node metastasis by evaluating miR expression in 45 primary tumors. This previous study identified that miR profiles have prognostic value for staging patients with ESCC.
The present study screened signature miRs involved in predicting the prognosis of ESCC using miR-sequencing (seq) and mRNA datasets from The Cancer Genome Atlas (TCGA; gdc-portal.nci.nih.gov) and the Gene Expression Omnibus (GEO; www.ncbi.nlm.nih.gov) database. Subsequently, a prognostic scoring system was created to identify predictive miRs using sample risk scores. All cancer samples were divided into high- and low-risk categories using the scoring system, which was then validated, and the differentially-expressed genes (DEGs) associated with miRs were functionally annotated.
Materials and methods
Microarray data
miR-seq and mRNA-seq data from early stage ESCC samples were downloaded from TCGA on March 18, 2017, and 89 samples with miR and mRNA expression data were obtained by matching barcodes. These were early stage (stage I and II) cancer samples. This dataset was used as a test dataset.

A further two miR expression datasets of ESCC samples, GSE43732 and GSE13937, were downloaded from the GEO database. The GSE43732 dataset was based on the platform of GPL16543, and contained 53 early stage cancer samples. The GSE13937 dataset was based on the platform of GPL8835, and contained 31 early stage cancer samples. These two datasets were used as validation datasets. Clinical feature data for all downloaded datasets were also collected (Table I).
Table I.
Clinical features of cancer samples downloaded from TCGA and the Gene Expression Omnibus.
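| Clinical feature | TCGA (n=89) | GSE43732 (n=53) | GSE13937 (n=31) |
| --- | --- | --- | --- |
| Age, mean ± standard deviation | 63.02±12.44 | 59.21±9.26 | – |
| Gender | | | |
| Male | 62 | 43 | – |
| Female | 17 | 10 | – |
| Pathologic_M | | | |
| M0 | 79 | – | – |
| M1 | 0 | – | – |
| Pathologic_N (/N1) | | | |
| N0 | 64 | 43 | – |
| N1 | 24 | 10 | – |
| Pathologic_T | | | |
| T0 | 1 | – | – |
| T1 | 24 | 7 | – |
| T2 | 33 | 15 | – |
| T3 | 31 | 31 | – |
| Alcohol | | | |
| Yes | 66 | 32 | 23 |
| No | 22 | 21 | 7 |
| Smoking | | | |
| Yes | 17 | 34 | 23 |
| No | 31 | 19 | 7 |
| Reformed | 35 | – | – |
| New tumor | | | |
| Yes | 27 | – | – |
| No | 60 | – | – |
| Radiation therapy | | | |
| Yes | 18 | – | 14 |
| No | 65 | – | 17 |
| Mortality | | | |
| Succumbed | 27 | 24 | 14 |
| Survived | 67 | 29 | 17 |
| Overall survival time (months), mean ± standard deviation | 20.47±20.59 | 44.7±24.05 | 29.78±20.89 |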
TCGA, The Cancer Genome Atlas.
Prognostic miRs
The overall prognosis of patients with early stage ESCC is comparatively good, so samples with a <6-month censor time are not representative for analyzing prognostic factors. Therefore, miR-seq data samples from TCGA with a survival time of <6 months were removed to avoid introducing confounding factors, and the remaining 77 samples were assessed with Cox regression analysis using the survival package in R (17) to identify prognostic miRs (threshold of P<0.01 for the log rank test).
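The screening step described above can be illustrated with a minimal sketch. It is not the authors' R code: it assumes a univariate Cox model per miR and uses the score test at β = 0 (the statistic to which the log rank test corresponds in a Cox model), written in plain Python. The function names `cox_score_test` and `screen_mirs`, the input layout, and the choice to keep samples with exactly 6 months of follow-up are all assumptions made for illustration.

```python
import math


def cox_score_test(times, events, x):
    """Score test at beta = 0 for a single covariate in a Cox model.

    times  : follow-up time per sample (months)
    events : 1 if the patient succumbed, 0 if censored
    x      : covariate per sample (e.g. expression of one miR)
    Returns (chi2, p) for a chi-square statistic with 1 degree of freedom.
    """
    score = 0.0        # sum over events of (x_i - mean of x in the risk set)
    information = 0.0  # sum over events of the variance of x in the risk set
    for t_i, d_i, x_i in zip(times, events, x):
        if not d_i:
            continue
        # Risk set: every sample still under observation at time t_i.
        risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
        mean = sum(risk) / len(risk)
        score += x_i - mean
        information += sum((r - mean) ** 2 for r in risk) / len(risk)
    if information == 0.0:
        return 0.0, 1.0  # covariate carries no information
    chi2 = score * score / information
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


def screen_mirs(times, events, expression, min_months=6.0, alpha=0.01):
    """Drop short follow-up samples, then keep miRs with P < alpha.

    expression maps a miR name to a list of per-sample expression values.
    Keeping samples with exactly min_months of follow-up is an assumption.
    """
    keep = [i for i, t in enumerate(times) if t >= min_months]
    t = [times[i] for i in keep]
    e = [events[i] for i in keep]
    hits = {}
    for name, values in expression.items():
        _, p = cox_score_test(t, e, [values[i] for i in keep])
        if p < alpha:
            hits[name] = p
    return hits
```

In practice a dedicated survival library would be preferable, since it also reports regression coefficients and handles tied event times more carefully than this sketch does.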
Prognostic scoring system
Prognostic miRs were matched with miRs in the GSE43732 and GSE13937 datasets, and the common miRs were collected. The selected miRs were ranked according to their log rank P-values to construct a prognostic scoring system. Starting from the first three, miRs were added one at a time until the highest P-value representing the significance of the correlation between samples and overall survival time was obtained. The miRs included when the P-value was greatest were considered to be signature miRs, and the scoring system was created using these miRs.

Risk scores are used to assess risk factors for large samples (18). Signature miRs were used to calculate risk scores for samples in the TCGA dataset using the following formula:

$$\text{Risk score} = \beta_{\text{gene}_1} \times \text{expr}_{\text{gene}_1} + \beta_{\text{gene}_2} \times \text{expr}_{\text{gene}_2} + \dots + \beta_{\text{gene}_n} \times \text{expr}_{\text{gene}_n}$$

where $\beta_{\text{gene}_i}$ indicates the regression coefficient of gene $i$ and $\text{expr}_{\text{gene}_i}$ indicates its expression level.

The risk scores of the validation samples (GSE43732 and GSE13937) were computed in the same way, and the median risk score was applied to stratify low- and high-risk samples. Subsequently, survival correlation coefficients between low- and high-risk samples in the TCGA and GEO datasets, and correlations among risk scores, were assessed. In addition, correlations between clinical features and sample prognosis were analyzed via Cox regression.
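The risk score formula and the median split described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the dictionary layout and the rule that a sample lying exactly on the median is assigned to the low-risk group are assumptions, since the text does not specify how ties at the median were handled.

```python
from statistics import median


def risk_score(coefficients, expression):
    """Weighted sum of expression values, one term per signature miR.

    coefficients : miR name -> Cox regression coefficient (beta)
    expression   : miR name -> expression level in one sample
    """
    return sum(beta * expression[name] for name, beta in coefficients.items())


def stratify_by_median(scores):
    """Label each sample 'high' or 'low' relative to the median risk score.

    scores : sample id -> risk score
    Samples exactly at the median are labelled 'low' (an assumption).
    """
    cutoff = median(scores.values())
    return {
        sample: ("high" if value > cutoff else "low")
        for sample, value in scores.items()
    }
```

A typical use would compute `risk_score` for every sample with the same coefficient dictionary, collect the results in a dictionary keyed by sample identifier, and pass that dictionary to `stratify_by_median` before comparing survival between the two groups.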
Functional annotation of samples with different prognosis risks
The matched RNA-seq data was downloaded from TCGA according to the barcodes of the samples used in the prognostic miRNA analysis. The RNA-seq data was used to screen the DEGs between high- and low-risk samples using the limma package in R (bioconductor.org/packages/release/bioc/html/limma.html) (19). A false discovery rate (FDR) of <0.05 was set as the threshold. Correlation coefficients for gene expression and risk scores were computed, and positively and negatively-correlated genes were annotated with respect to significant functional terms, and Kyoto Encyclopedia of Genes and Genomes (KEGG; www.genome.jp/kegg) pathway terms, using DAVID (david.ncifcrf.gov) (20).
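Two of the steps above, the FDR threshold and the correlation between gene expression and risk scores, can be illustrated with a short Python sketch. It does not reproduce limma's moderated statistics. It assumes that per-gene P-values are already available, applies the Benjamini-Hochberg adjustment (one common way to control the FDR; the text does not name the method), and uses the Pearson coefficient as an assumed choice of correlation measure.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down so that monotonicity is enforced.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_by_correlation(gene_expression, risk_scores):
    """Split genes by the sign of their correlation with risk scores.

    gene_expression : gene name -> per-sample expression values
    risk_scores     : per-sample risk scores, in the same sample order
    Returns (positive, negative), each sorted from the strongest correlation.
    """
    corr = {g: pearson(v, risk_scores) for g, v in gene_expression.items()}
    positive = sorted((g for g in corr if corr[g] > 0), key=lambda g: -corr[g])
    negative = sorted((g for g in corr if corr[g] < 0), key=lambda g: corr[g])
    return positive, negative
```

Genes whose adjusted P-value falls below 0.05 would be retained as DEGs, and the leading entries of the two lists returned by `split_by_correlation` would then be passed to an enrichment tool.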
Results
Using Cox regression analysis on samples with a survival time of >6 months, 34 prognostic miRs were screened from the miR-seq dataset, and 16 of these were common to the GSE43732 and GSE13937 datasets (Table II).
Table II.
Common miRs between prognosis-associated miRs, and miRs in GSE43732 and GSE13937.
| miR | P-value |
| --- | --- |
| hsa-miR-129-2 | 3.29×10−05 |
| hsa-miR-34b | 1.86×10−04 |
| hsa-miR-374a | 1.92×10−04 |
| hsa-miR-412 | 1.99×10−04 |
| hsa-miR-140 | 4.66×10−04 |
| hsa-miR-214 | 5.15×10−04 |
| hsa-miR-144 | 1.57×10−03 |
| hsa-miR-376b | 1.59×10−03 |
| hsa-miR-486 | 1.67×10−03 |
| hsa-miR-33b | 3.99×10−03 |
| hsa-let-7f-1 | 6.22×10−03 |
| hsa-miR-494 | 6.24×10−03 |
| hsa-miR-33a | 6.37×10−03 |
| hsa-miR-432 | 6.73×10−03 |
| hsa-miR-219-1 | 7.88×10−03 |
| hsa-miR-188 | 9.87×10−03 |
miR, microRNA.
To create a prognostic scoring system, miRs common to the prognostic miRs and the GEO datasets were added one at a time following the first three, until the highest P-value representing the significance of the association between samples and overall survival time was obtained.

A prognostic scoring system was created using the 10 signature miRs with the greatest P-values, and low-risk samples had greater survival in the TCGA and GSE43732 datasets (Fig. 1A and B). No notable difference was observed in the GSE13937 dataset (Fig. 1C). Regression analysis revealed that risk scores were correlated with prognosis (P=0.0141; Table III). Differences in the expression of the 10 signature miRs in samples stratified by clinical features were noted, and Table IV shows the clinical features for which risk scores were prognostic (P<0.05). Survival curves are presented in Figs. 2–5. Risk scores for samples, survival time and expression clustering heatmaps of the 10 signature miRs from the TCGA, GSE13937 and GSE43732 datasets are presented in Fig. 6.
Figure 1.
Survival curves for patients with early stage esophageal carcinoma stratified by low- and high-risk. Samples from (A) The Cancer Genome Atlas, and (B) GSE43732 and (C) GSE13937 datasets. **P<0.05.
Table III.
Cox regression results for the prognosis-associated clinical features.
| Clinical feature | P-value | Hazard ratio (confidence interval) |
| --- | --- | --- |
| Age, >60 years vs. <60 years | 0.97 | 1.016 (0.438–2.356) |
| Sex, male vs. female | 0.615 | 1.325 (0.442–3.97) |
| Alcohol, yes vs. no | 0.916 | 0.943 (0.318–2.793) |
| Tobacco, yes vs. no vs. reformed | 0.56 | 0.872 (0.551–1.382) |
| New tumor, yes vs. no | 0.726 | 1.168 (0.491–2.778) |
| Radiation therapy, yes vs. no | 0.9302 | 0.951 (0.3113–2.907) |
| Risk score, high vs. low | 0.0141 | 1.21 (1.005–1.458) |
Table IV.
Prognostic factors in high- and low-risk samples under the same clinical features.
| Clinical feature | P-value |
| --- | --- |
| Age | |
| ≥60, n=39 | 0.0119 |
| ≤60, n=38 | 0.1315 |
| Gender | |
| Male, n=6 | 0.0731 |
| Female, n=15 | 0.07537 |
| Alcohol | |
| Yes, n=59 | 0.002 |
| No, n=18 | 0.548 |
| Smoker | |
| Yes, n=15 | 0.193 |
| No, n=25 | 0.0253 |
| Reformed, n=31 | 0.166 |
| New tumor | |
| Yes, n=27 | 0.166 |
| No, n=48 | 0.0175 |
| Radiation therapy | |
| Yes, n=17 | 0.945 |
| No, n=54 | 0.000642 |
Figure 2.
Survival curves of high- and low-risk samples of different ages. (A) Samples <60 years of age. High-risk samples are red and low-risk samples are black. (B) Samples > 60 years of age. High-risk samples are purple and low-risk samples are blue. (C) Combined survival curves of samples with age groups above and below the median age. Curves crossed with P>0.05 represent different samples which cannot be distinguished by risk score, while curves with P<0.05 represent samples that may be distinguished by risk score. **P<0.05.
Figure 5.
Survival curves of high- and low-risk samples with/without radiation therapy. (A) Samples with no radiation therapy. High-risk samples are red, and low-risk samples are black. (B) Samples with radiation therapy. High-risk samples are purple, and low-risk samples are blue. (C) Combined survival curves from those with/without radiation therapy. **P<0.05.
Figure 6.
Risk scores, survival and expression clustering heatmap of the 10 signature microRNAs of all early stage esophageal carcinoma samples. Samples from (A) The Cancer Genome Atlas, and (B) GSE13937 and (C) GSE43732 datasets.
In total, 168 DEGs were identified, and 58 were negatively-associated with risk scores, with 110 positively-associated with risk scores. The expression pattern of the top 20 DEGs positively- and negatively-associated with risk scores differed significantly between low and high-risk samples (Fig. 7A). The GO enrichment of the DEGs is presented in Fig. 7B. The top 20 positively-associated DEGs were significantly enriched in six KEGG pathways, including: hsa05217-Basal cell carcinoma, hsa04916-Melanogenesis, hsa04610-Complement and coagulation cascades, hsa04530-Tight junction, hsa04340-Hedgehog signaling pathway and hsa03320-PPAR signaling pathway (Fig. 7C).
Figure 7.
Expression pattern and functional annotation of the DEGs positively- and negatively-associated with risk scores. (A) Expression pattern of the top 20 DEGs positively- and negatively-associated with risk scores. The X-axis represents the samples in the TCGA dataset, with risk scores increasing from left to right. The Y-axis represents the expression levels of the DEGs. (B) GO enrichment of the DEGs. (C) KEGG pathway enrichment of the top 20 positively-associated DEGs.
Discussion
In order to screen miRs involved in the prognosis of ESCC, miR-seq and mRNA-seq data for early stage ESCC samples were downloaded from TCGA, with a further two miR expression datasets, GSE43732 and GSE13937, downloaded from the GEO database. miR-seq data samples with a survival time of >6 months were subjected to Cox regression analysis to assess prognostic value. miRs common to the prognostic miRs and the GSE43732 and GSE13937 datasets were used for risk score calculations, and a median risk score was used to stratify low- and high-risk samples. A prognostic scoring system of 10 signature miRs was constructed according to survival analysis between low- and high-risk samples. It was noted that low-risk samples had greater survival compared with high-risk samples in the TCGA and GSE43732 datasets. Age, alcohol and tobacco use, and radiotherapy were prognostic factors for samples with different risk scores. The present study identified 168 DEGs for all miRs, 110 of which were positively correlated with risk scores. The top 20 positively-correlated and top 20 negatively-correlated DEGs were significantly enriched in six and 10 functional terms, respectively. There were six significantly enriched KEGG pathways, including ‘tight junction’ and ‘melanogenesis’.

Prognostic scoring is used to predict survival and disease recurrence for a number of types of cancer (21). Wang et al (17) established a 53-gene expression system to predict overall survival for gastric cancer. Mao et al (22) created a 12-gene prognostic scoring system to guide adjuvant therapy for breast cancer. Yang et al (23) created a miR signature to stratify patients with Barrett's esophagus with different prognostic risks for targeted chemoprevention.

A number of miRs in the prognostic system used in the present study have been previously implicated in ESCC or other malignant tumors.
miR-214, a miR that regulates cancer cell proliferation, migration and invasion by targeting phosphatase and tensin homolog in gastric cancer, has been reported to reduce cell survival via downregulation of Bcl2l2 in cervical cancer cells (24,25). The predictive value of miR-214 for prognosis and multidrug resistance has been implicated in ESCC (26). Overexpression of miR-214 has been reported to enhance cisplatin sensitivity in ESCC by targeting survivin directly, and indirectly through CUG triplet repeat RNA binding protein 1 (27). miR-129-2 suppresses the proliferation and migration of ESCC via downregulation of SRY-related HMG box 4, and miR-129 is hypothesized to be a novel therapeutic target and biomarker in gastrointestinal cancer (28,29). miR-374a is a prognostic marker for patient survival in early-stage non-small cell lung cancer (30), and has been implicated in cell proliferation, migration and invasion in gastric cancer by targeting SRC kinase signaling inhibitor 1 (31). miR-486-5p expression is frequently decreased in human cancer. Low or unaltered expression of miR-486-5p compared with neighboring normal tissues has been demonstrated to be associated with a poor prognosis, and high expression with a good prognosis, in gastric cancer (32). miR-486 was observed to be downregulated in ESCC tissues (33). In patients with ESCC, miR-486-3p was highly expressed following chemotherapy treatment (34).

In conclusion, miR-214, miR-129-2, miR-374a and miR-486 may predict survival in patients with ESCC, although these data require validation with larger studies.