Hongyan Zhao1, Peng Wang2, Gang Wang3, Shuo Zhang4, Feng Guo3. 1. Department of Critical Care Medicine, The Second Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China. 2. Department of Critical Care Medicine, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China. 3. Department of Pediatric Surgery, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China. 4. Department of Hand and Foot Surgery, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China.
Abstract
BACKGROUND: Wilms tumor (WT) is the most frequent malignancy of the kidney in children, and a subset of patients remains with a poor prognosis. This study aimed to identify key long non-coding RNAs (lncRNAs) related to prognosis and establish a genomic-clinicopathologic nomogram to predict survival in children with WT. METHODS: Clinical data of 124 WT patients and the relevant RNA sequencing data including lncRNAs expression signature of primary WT samples were obtained from the Therapeutically Applicable Research to Generate Effective Treatment (TARGET) Data Matrix. Then, lncRNAs associated with overall survival (OS) were identified through univariate Cox, least absolute shrinkage and selection operator (LASSO), and multivariate Cox regression analyses. The risk scores of 124 participants were calculated, and survival analyses were performed between low- and high-risk groups. A genomic-clinicopathologic nomogram was then developed and evaluated by time-dependent receiver operating characteristic (ROC) curves, including the area under the curve (AUC), calibration curve, and decision curve analysis. Subsequently, bioinformatics analyses were performed to explore the potential molecular mechanisms that affect the prognosis of WT. The package "DESeq2" was used to identify differentially expressed protein-coding genes (DEPCGs) between groups. Gene Set Enrichment Analysis (GSEA) was applied to explore the differences in pathways enrichment. The analytical tools CIBERSORTx and ESTIMATE were used to investigate the discrepancies of the immune microenvironment. RESULTS: A total of 10 lncRNAs were selected as independent predictors associated with OS (P<0.05). Participants in the high-risk group had a significantly worse OS and event-free survival (EFS) than those in the low-risk group (P<2E-16 and P=2.03E-04, respectively). The risk score and 3 clinicopathological features (gender, cooperative group protocol, and stage) were identified to construct the nomogram (combined model) (P=5.11E-17). The combined model (1-year AUC: 0.9272, 3-year AUC: 0.9428, 5-year AUC: 0.9259) and risk score model (1-year AUC: 0.9285, 3-year AUC: 0.9399, 5-year AUC: 0.9266) displayed higher predictive accuracy than that of the other models. Subsequently, 105 DEPCGs were identified. The GSEA revealed 4 significant pathways. Analysis with CIBERSORTx demonstrated that monocytes, macrophages M1, activated dendritic cells, and resting mast cells had significant infiltration differences between groups. CONCLUSIONS: This study constructed a genomic-clinicopathologic nomogram, which might present a novel and efficient method for treating patients with WT. 2021 Translational Pediatrics. All rights reserved.
BACKGROUND: Wilms tumor (WT) is the most frequent malignancy of the kidney in children, and a subset of patients remains with a poor prognosis. This study aimed to identify key long non-coding RNAs (lncRNAs) related to prognosis and establish a genomic-clinicopathologic nomogram to predict survival in children with WT. METHODS: Clinical data of 124 WT patients and the relevant RNA sequencing data including lncRNAs expression signature of primary WT samples were obtained from the Therapeutically Applicable Research to Generate Effective Treatment (TARGET) Data Matrix. Then, lncRNAs associated with overall survival (OS) were identified through univariate Cox, least absolute shrinkage and selection operator (LASSO), and multivariate Cox regression analyses. The risk scores of 124 participants were calculated, and survival analyses were performed between low- and high-risk groups. A genomic-clinicopathologic nomogram was then developed and evaluated by time-dependent receiver operating characteristic (ROC) curves, including the area under the curve (AUC), calibration curve, and decision curve analysis. Subsequently, bioinformatics analyses were performed to explore the potential molecular mechanisms that affect the prognosis of WT. The package "DESeq2" was used to identify differentially expressed protein-coding genes (DEPCGs) between groups. Gene Set Enrichment Analysis (GSEA) was applied to explore the differences in pathways enrichment. The analytical tools CIBERSORTx and ESTIMATE were used to investigate the discrepancies of the immune microenvironment. RESULTS: A total of 10 lncRNAs were selected as independent predictors associated with OS (P<0.05). Participants in the high-risk group had a significantly worse OS and event-free survival (EFS) than those in the low-risk group (P<2E-16 and P=2.03E-04, respectively). The risk score and 3 clinicopathological features (gender, cooperative group protocol, and stage) were identified to construct the nomogram (combined model) (P=5.11E-17). The combined model (1-year AUC: 0.9272, 3-year AUC: 0.9428, 5-year AUC: 0.9259) and risk score model (1-year AUC: 0.9285, 3-year AUC: 0.9399, 5-year AUC: 0.9266) displayed higher predictive accuracy than that of the other models. Subsequently, 105 DEPCGs were identified. The GSEA revealed 4 significant pathways. Analysis with CIBERSORTx demonstrated that monocytes, macrophages M1, activated dendritic cells, and resting mast cells had significant infiltration differences between groups. CONCLUSIONS: This study constructed a genomic-clinicopathologic nomogram, which might present a novel and efficient method for treating patients with WT. 2021 Translational Pediatrics. All rights reserved.
Wilms tumor (WT), also known as nephroblastoma, is the most frequent malignancy of the kidney in children, constituting nearly 90% of childhood renal tumors (1). Despite the development and advancement of various treatment approaches, 15% of WT patients undergo metastasis or relapse, suggesting poor prognosis after initial treatment (2). Therefore, it is of great clinical benefit to explore prognostic factors in children with different WT stages, which could provide valuable guidance on formulating an accurate therapeutic regime for each patient. Previous studies that were committed to identifying potential prognostic markers for WT patients relied on very few factors, mainly histology analysis and characterization of tumor stage, which has been deemed insufficient as WT in children was shown to be affected by a variety of factors, including genetic and epigenetic changes that underlie WT pathogenesis (1). The prognosis for WT children in clinical practice incorporates a diverse set of measurements, including age, tumor size, loss of heterozygosity (LOH) for chromosomes 16q and 1p, and sensitivity to medications (3-6). Moreover, radiation (7), surgery, microscopic residual disease, diffuse anaplasia (8), lymph node involvement (9), and a combination of lymph node and LOH status (10) have been associated with the prognosis of WT in children. Therefore, it is important to build a prognostic model using multiple variables that underlie a broad spectrum of genetic and clinicopathological factors.Long non-coding RNAs (lncRNAs) are characterized as non-coding transcripts longer than 200 nucleotides (11). Next-generation and high-throughput sequencing techniques have enabled a significant breakthrough in lncRNA identification and characterization in recent years. Gene expression of various lncRNAs is dysregulated in numerous cancers, and it has been reported to correlate with cancer recurrence, metastasis, and poor prognosis (12-24). Previous studies have suggested that several lncRNAs, such as LINC00473 (25), MIAT (26), SOX21-AS1 (27), and LINP1 (28), play important roles in the pathogenesis of WT. However, it remains largely unknown whether lncRNAs could be used as prognostic markers of WT in children. The present study sought to characterize lncRNA signature as a robust prediction for WT children’s prognosis and construct a prognostic risk index based on comprehensive RNA-sequencing analysis. We established a prognosis model using RNA-sequencing data and subsequently developed and validated a genomic- clinicopathological nomogram that integrated risk scores and traditional clinicopathological factors. Moreover, Gene Set Enrichment Analysis (GSEA) revealed molecular pathways significantly enriched in different risk groups (29,30). Additionally, CIBERSORTx and ESTIMATE algorithms were applied to explore the immune microenvironment discrepancies between the high-risk and low-risk groups (31-33). Altogether, the current study presents a novel and efficient method for the prognosis of patients with WT and provides insight into the molecular mechanisms that affect WT’s prognosis. We present the following article in accordance with the TRIPOD reporting checklist (available at http://dx.doi.org/10.21037/tp-20-318).
Methods
Study population and RNA-sequencing data processing
The level 3 RNAseq data of primary WT samples, including raw read counts and reads per kilobase million (RPKM), were obtained from the Illumina HiSeq 2000 platform (Illumina, San Diego, CA, USA), and relevant clinical information of the WT patients was downloaded from the Therapeutically Applicable Research to Generate Effective Treatment (TARGET) Data Matrix in April 2020. This study was conducted following the TARGET publication guidelines. The inclusion criteria of this study were as follows: (I) pathological diagnosis of primary WT; (II) clinical information including age at diagnosis, gender, cooperative group protocol, stage, histologic classification, the first event, event-free survival (EFS), vital status, and overall survival (OS) was documented; and (III) complete RNAseq data. The exclusion criteria were as follows: (I) patients with a history of other malignancies; (II) tumor tissues were not used for RNA sequencing analysis; (III) clinical information not available. Thus, 124 WT patients were selected and enrolled in this study (Tables S1,S2). Participant clinicopathological characteristics are listed in . The geneset annotation GENCODE Human (GRCh38.p13) was used to annotate protein-coding genes and lncRNAs were selected for further study (34). The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).
Table 1
Clinicopathological characteristics of 124 WT patients
Characteristics
Median (range) or n (%)
Age at diagnosis in days
1,539 (156–5,698)
Gender
Female
70 (56.45)
Male
54 (43.55)
First event
None
27 (21.77)
Relapse or progression
97 (78.23)
Vital status
Alive
74 (59.68)
Dead
50 (40.32)
Protocol
NWTS-4
2 (1.61)
NWTS-5
122 (98.39)
Stage
I–II
65 (52.42)
III–IV
59 (47.58)
Histologic classification
FHWT
82 (66.13)
DAWT
42 (33.87)
WT, Wilms tumor; NWTS-4, the fourth National Wilms’ Tumor Study; NWTS-5, the fifth National Wilms’ Tumor Study; FHWT, favorable histology Wilms tumor; DAWT, diffuse anaplastic Wilms tumor.
WT, Wilms tumor; NWTS-4, the fourth National Wilms’ Tumor Study; NWTS-5, the fifth National Wilms’ Tumor Study; FHWT, favorable histology Wilms tumor; DAWT, diffuse anaplastic Wilms tumor.
Identification and validation of the prognostic lncRNA signature
The “survival” package in the R software (version 3.6.2; https://www.R-project.org/) was used to analyze RPKM data [transformed to log2(RPKM+1)] through univariate Cox regression analysis, and P<0.05 was used as the cut-off criterion. Next, through the “glmnet” package (version 3.0-2) in R, the least absolute shrinkage and selection operator (LASSO) Cox regression was performed to select key lncRNAs among the significant lncRNAs from univariate Cox analysis (35,36). We used 10 times cross-validations to determine the best penalty parameter lambda. The key lncRNAs selected by the LASSO method were considered as candidate variables for inclusion in the multivariate Cox analysis. Then, the independently prognostic lncRNAs were identified in the multivariate Cox analysis to obtain their coefficients (β values), with which the prognostic formula was constructed as follows: (β1 × log2(RPKM+1) value of gene1) + (β2 × log2(RPKM+1) value of gene2) + ... + (βn × log2(RPKM+1) value of genen). Subsequently, the risk score of each WT patient was calculated based on the prognostic formula. Then, the “surv_cutpoint” function in the “survminer” R package (version 0.4.6) was utilized to achieve the optimal cutoff values for risk scores. According to the cutoff values, we divided the 124 WT participants in the cohort into a low‐risk group and a high‐risk group. Between the low- and high-risk groups, differences in OS and EFS were examined by log-rank test, and Kaplan-Meier survival curves were constructed by the “survival” package in R software. A P value <0.05 was considered statistically significant.
Development and assessment of a predictive nomogram
Univariate and multivariate Cox regression analyses were conducted to ascertain whether the risk score of lncRNAs and the clinicopathological variables could be prognostic markers related to OS for WT patients, and a P<0.05 was deemed significant. Then, according to the results of Cox regression analyses, we constructed a genomic-clinicopathologic nomogram to predict 1‐, 3‐, and 5‐year OS through the “rms” package (version 5.1-4) in R software. Using the same R package, we generated a calibration curve to verify the consistency between the nomogram-predicted probability of OS and the actual OS. Furthermore, to assess the accuracy of the predictive nomogram, we performed time-dependent receiver operating characteristic (ROC) curve analyses and calculated the area under the curve (AUC) through the “timeROC” package (version 0.4) in R (37). Additionally, we conducted a decision curve analysis (DCA) to examine the clinical value with the “stdca.R” statistical code (38) in R.
Identification of differentially expressed genes and construction of protein-protein interaction network
Using the “DESeq2” package (version 1.26.0) in R, the differentially expressed genes, including protein-coding genes (DEPCGs) and lncRNAs (DELs) were identified by comparing the low-risk group and high-risk group of primary WT, with the screening conditions of |log2 fold change| >1, and a false discovery rate (FDR) or adjusted P-value <0.05. These DEPCGs were then used to construct the protein-protein interaction (PPI) network through the Search Tool for the Retrieval of Interacting Genes (STRING) online database (39). Required interaction score >0.4 was used as the cut-off criterion. Then, the data downloaded from STRING were used to model the PPI network through Cytoscape software (version 3.7.1, https://cytoscape.org) (40), and the densely connected regions were identified through the Molecular Complex Detection (MCODE) plug-in Cytoscape with the default criteria (degree cutoff =2, node score cutoff =0.2, Haircut = true, Fluff = false, K-score =2, and Max depth =100) (41).
GSEA
Before running GSEA, the normalization of the raw read counts of protein-coding genes was performed by “DESeq2”. Then, GSEA (version 4.0.3) was conducted to demonstrate significant differences in protein-coding genes between the low-risk and high-risk groups (29,30). Canonical pathway gene sets (c2.cp.v7.1.symbols.gmt) and Gene Ontology (GO) gene sets including biological process (BP) (c5.bp.v7.1.symbols.gmt), molecular function (MF) (c5.mf.v7.1.symbols.gmt), and cellular component (CC) (c5.cc.v7.1.symbols.gmt) were selected from the Molecular Signatures Database (MSigDB) as the reference gene sets (42-44). The number of permutations was set at 1,000, and the nominal P value <0.01 was considered statistically significant.
Analysis of immune infiltration level
The analytical tool CIBERSORTx provides an estimation of member cell types’ abundances in a mixed cell population, using gene expression data (31,32). We utilized CIBERSORTx to assess the proportions of 22 immune cell subtypes in the primary WT samples, using the RPKM data of protein-coding genes. The permutations for significance analysis were set at 1000. Then, we performed the 2-sided Wilcoxon test through the “ggpubr” package (version 0.2.5) in R, to compare differences in 22 immune cell subtypes between the low- and high-risk groups. Subsequently, through the “estimate” package (version 1.0.13), the immune and stromal scores were calculated based on the ESTIMATE algorithm (33), and Welch’s t-test performed the comparisons of the stromal and immune scores between the 2 groups with “ggstatsplot” package (version 0.4.0) in R. A P value < 0.05 denoted statistical significance.
Statistical analysis
The R software version 3.6.2 and several R packages were used for statistical analyses, and a 2-tailed P value <0.05 denoted statistical significance. The univariate and multivariate Cox regression analyses were performed by the “survival” package. The LASSO Cox regression analyses were calculated by the “glmnet” package (35,36), and 10 times cross-validations were used to determine the best penalty parameter lambda. Kaplan-Meier analyses and survival curves were constructed through the “survival” package. The “rms” package generated the nomogram and the calibration curve. The time-dependent ROC curve analyses were performed by the “timeROC” package (37).
Results
After filtering of low-expression lncRNAs, 2,927 lncRNAs were generated for further study. Subsequently, univariate Cox analysis was performed to obtain 244 lncRNAs associated with OS of WT patients (P<0.05). Among the 244 prognosis-associated lncRNAs, 24 were identified using the LASSO Cox method (). Then, through multivariate Cox analysis, ten lncRNAs (TENM3-AS1, AC022098.3, EMX2OS, AC099811.1, AL359710.1, ADAMTSL4-AS1, AC005944.1, MBNL1-AS1, AC002451.1, and AC120498.3) were selected from 24 candidates of lncRNAs as independent predictors associated with OS (P<0.05), and the coefficients of the 10 lncRNAs were obtained (). The formula of the lncRNA-based prognostic index model was imputed as follows: [1.3×log2(RPKM+1) value of TENM3-AS1] + [2.627×log2(RPKM+1) value of AC022098.3] + [‒0.5553×log2(RPKM+1) value of EMX2OS] + [‒7.114 × log2(RPKM+1) value of AC099811.1] + [19.99×log2(RPKM+1) value of AL359710.1] + [3.866 × log2(RPKM+1) value of ADAMTSL4-AS1] + [‒10.4 × log2(RPKM+1) value of AC005944.1] + [1.517 × log2(RPKM+1) value of MBNL1-AS1] + [‒1.431×log2(RPKM+1) value of AC002451.1] + [2.215×log2(RPKM+1) value of AC120498.3]. Then, the risk score for each participant was calculated, and all participants were divided into high‐risk (n=54) and low‐risk (n=70) groups based on the optimal cutoff value (1.5069) for risk scores (). Participants in the high‐risk group had a significantly worse OS and EFS than those in the low‐risk group (P<2E-16 and P=2.03E-04, respectively) (, ). Importantly, subgroup analyses based on the clinicopathological variables (i.e., gender, cooperative group protocol, stage, and histologic classification) indicated that the prognostic value of the risk scores for OS and EFS was independent of all these clinicopathological variables, except for EFS in the subgroup of stage I-II and favorable histology Wilms tumor (FHWT) (P=0.348 and P=0.326; ,
).
Figure 1
Candidate lncRNAs selection through LASSO-penalized Cox regression. (A) 10-fold cross-validation with lambda.min. (B) Coefficient profiles of the candidate lncRNAs. lncRNAs, long non-coding RNAs; LASSO, least absolute shrinkage, and selection operator.
Table 2
The results of multivariate Cox analysis to identify the independently predictive lncRNAs from 24 candidates
Gene symbol
Ensemble gene id
Coefficient (β values)
Hazard ratio
P value
TENM3-AS1
ENSG00000177822
1.300197
3.670021
0.000577
AC022098.3
ENSG00000267783
2.62672
13.82834
0.000852
EMX2OS
ENSG00000229847
‒0.55528
0.573914
0.001632
AC099811.1
ENSG00000236194
‒7.11422
0.000813
0.001753
AL359710.1
ENSG00000237461
19.98674
4.79E+08
0.002196
ADAMTSL4-AS1
ENSG00000203804
3.866465
47.77323
0.014825
AC005944.1
ENSG00000267469
‒10.4027
3.03E‒05
0.015802
MBNL1-AS1
ENSG00000229619
1.516977
4.558426
0.017516
AC002451.1
ENSG00000231170
‒1.43106
0.239055
0.031301
AC120498.3
ENSG00000260403
2.215427
9.165325
0.034148
AL139260.1
ENSG00000228436
1.229792
3.420518
0.053541
RABGAP1L-DT
ENSG00000227373
3.674629
39.434
0.054951
SH3RF3-AS1
ENSG00000259863
0.621723
1.862133
0.075875
LINC00630
ENSG00000223546
1.359918
3.895875
0.083571
AC245140.1
ENSG00000231830
1.778615
5.921647
0.083843
AC233266.2
ENSG00000261600
‒0.40106
0.669608
0.094665
TMEM72-AS1
ENSG00000224812
1.521191
4.577673
0.130971
AC009831.1
ENSG00000263823
1.992977
7.337346
0.203017
AC079922.2
ENSG00000237753
‒0.89755
0.407569
0.287507
AC091180.6
ENSG00000262039
‒2.26552
0.103776
0.303579
LCMT1-AS1
ENSG00000260448
0.651954
1.919288
0.414851
AC226119.1
ENSG00000253917
‒0.39665
0.67257
0.468275
POU6F2-AS1
ENSG00000224122
0.142447
1.153092
0.59815
AC018653.3
ENSG00000256967
0.038999
1.03977
0.957213
Figure 2
Risk score analyses of 124 WT patients in low- and high-risk groups. (A) Risk score distribution against the rank of the risk score. (B) OS status of patients. (C) Heatmap of the expression profiles. of the 10 lncRNAs. WT, Wilms tumor; OS, overall survival
Figure 3
Kaplan-Meier OS and EFS analyses between the high- and low-risk groups represented respectively by the red and blue curves. (A) All patients’ OS. (B) All patients’ EFS. (C) Male patients’ OS. (D) Male patients’ EFS. (E) Female patients’ OS. (F) Female patients’ EFS. (G) NWTS-5 patients’OS. (H) NWTS-5 patients’ EFS. (I) Stage I-II patients’ OS. (J) Stage I-II patients’ EFS. (K) Stage III-IV patients’ OS. (L) Stage III-IV patients’ EFS. (M) FHWT patients’ OS. (N) FHWT patients’ EFS. (O) DAWT patients’ OS. (P) DAWT patients’ EFS. OS, overall survival; EFS, event-free survival; NWTS-5, the fifth National Wilms’ Tumor Study; FHWT, favorable histology Wilms tumor; DAWT, diffuse anaplastic Wilms tumor.
Table 3
The results of OS and EFS analyses for WT patients through log-rank test
Candidate lncRNAs selection through LASSO-penalized Cox regression. (A) 10-fold cross-validation with lambda.min. (B) Coefficient profiles of the candidate lncRNAs. lncRNAs, long non-coding RNAs; LASSO, least absolute shrinkage, and selection operator.Risk score analyses of 124 WT patients in low- and high-risk groups. (A) Risk score distribution against the rank of the risk score. (B) OS status of patients. (C) Heatmap of the expression profiles. of the 10 lncRNAs. WT, Wilms tumor; OS, overall survivalKaplan-Meier OS and EFS analyses between the high- and low-risk groups represented respectively by the red and blue curves. (A) All patients’ OS. (B) All patients’ EFS. (C) Male patients’ OS. (D) Male patients’ EFS. (E) Female patients’ OS. (F) Female patients’ EFS. (G) NWTS-5 patients’OS. (H) NWTS-5 patients’ EFS. (I) Stage I-II patients’ OS. (J) Stage I-II patients’ EFS. (K) Stage III-IV patients’ OS. (L) Stage III-IV patients’ EFS. (M) FHWT patients’ OS. (N) FHWT patients’ EFS. (O) DAWT patients’ OS. (P) DAWT patients’ EFS. OS, overall survival; EFS, event-free survival; NWTS-5, the fifth National Wilms’ Tumor Study; FHWT, favorable histology Wilms tumor; DAWT, diffuse anaplastic Wilms tumor.OS, overall survival; EFS, event-free survival; WT, Wilms tumor; NWTS-5, the fifth National Wilms’ Tumor Study; FHWT, favorable histology Wilms tumor; DAWT, diffuse anaplastic Wilms tumor.Based on the univariate Cox regression analysis results, the risk score of lncRNAs and 3 clinicopathological factors (i.e., gender, cooperative group protocol, and stage) were all significantly related to OS for WT patients (). The subsequent multivariate Cox analysis (P value for Wald test: 5.11E-17) further demonstrated that the risk score remained a powerful and independent prognostic factor (P=9.22E-16) (). Through integrating the 4 prognostic factors identified by univariate Cox analysis, a genomic-clinicopathologic nomogram was developed. This combined model is depicted in . The concordance index of the combined OS prediction model was 0.8780 (95% CI: 0.8407‒0.9154). The nomograms’ predictive accuracy was further evaluated by time-dependent ROC curves of 1‐, 3‐, and 5‐year OS among the models. The results suggested that the combined model and the risk score model had a significantly greater AUC than that of the gender, cooperative group protocol, and stage models (; ). Additionally, there was no significant difference in AUC between the combined model and the risk score model (1-year P=0.8610, 3-year P=0.7024, 5-year P=0.9354) (). Furthermore, the calibration curve also demonstrated high consistency between the actual proportion of 500-day OS and the nomogram-predicted probability of 500-day OS (). Meanwhile, DCA illustrated that the net benefit of the 500-day decision curve for the combined model was greater than that for the gender, cooperative group protocol, and stage model, further supporting the clinical value of the genomic-clinicopathologic nomogram ().
Table 4
The results of univariate and multivariate Cox analyses of OS for WT patients
Risk factors
N
Univariate Cox
Multivariate Cox (P= 5.11 E-17)
HR
95% CI
P value
HR
95% CI
P value
Age
124
1
0.9997–1
0.7487
Gender
Female
70
1
0.04857
1
0.5577
Male
54
1.751
1.004–3.055
1.2060
0.6447–2.2561
Protocol
NWTS-4
2
1
0.0369
1
0.1381
NWTS-5
122
0.2194
0.05277–0.9119
0.3222
0.0721–1.4393
Stage
I–II
65
1
1.60E-04
1
0.1072
III–IV
59
3.151
1.736–5.718
1.6741
0.8943–3.1336
Histologic classification
FHWT
82
1
0.7081
DAWT
42
1.118
0.6228–2.008
Risk score
124
2.62
2.124–3.231
2.17E-19
2.5147
2.0083–3.1489
9.22E-16
OS, overall survival; WT, Wilms tumor; HR, hazard ratio; CI, confidence interval; NWTS-4, the fourth National Wilms’ Tumor Study; NWTS-5, the fifth National Wilms’ Tumor Study; FHWT, favorable histology Wilms tumor; DAWT, diffuse anaplastic Wilms tumor.
Figure 4
Construction and validation of the nomogram in WT patients. (A) The nomogram integrating the risk score of lncRNAs and 3 clinicopathological factors (gender, cooperative group protocol, and stage) to predict 1‐, 3‐, and 5‐year OS. The time-dependent ROC curves for predicting probabilities of patients with 1-year (B), 3-year (C), and 5-year (D) OS. The calibration curve (E) and the DCA curve (F) of the nomogram for predicting probabilities of patients with 500-day OS. WT, Wilms tumor; lncRNAs, long non-coding RNAs; OS, overall survival; ROC, receiver operating characteristic; DCA, decision curve analysis.
Table 5
1-, 3-, and 5-year AUCs (95% CI) among models
1–year
3–year
5–year
AUC
95% CI
AUC
95% CI
AUC
95% CI
Gender
0.5776
0.4394–0.7158
0.5834
0.4889–0.6779
0.5702
0.4732–0.6672
Protocol
0.5311
0.4627–0.5996
0.5241
0.4914–0.5569
0.5212
0.4923–0.5502
Stage
0.6754
0.5578–0.793
0.6725
0.5835–0.7616
0.6603
0.567–0.7535
Risk score
0.9285
0.8816–0.9755
0.9399
0.8935–0.9863
0.9266
0.8742–0.979
Combined
0.9272
0.878–0.9764
0.9428
0.8968–0.9887
0.9259
0.872–0.9799
AUC, the area under the time-dependent receiver operating characteristic curve; CI, confidence interval.
Table 6
The results (P values) of comparing AUCs between two models
1-year
3-year
5-year
Combined vs. Gender
1.19E-06
1.63E-11
9.02E-11
Combined vs. Protocol
1.68E-24
1.59E-51
5.89E-41
Combined vs. Stage
9.74E-06
1.50E-10
1.91E-09
Combined vs. Risk score
0.8610
0.7024
0.9354
Risk score vs. Gender
4.27E-07
3.75E-11
1.97E-10
Risk score vs. Protocol
2.34E-22
1.04E-46
3.09E-41
Risk score vs. Stage
3.24E-05
6.66E-09
2.72E-08
AUC, the area under the time-dependent receiver operating characteristic curve.
OS, overall survival; WT, Wilms tumor; HR, hazard ratio; CI, confidence interval; NWTS-4, the fourth National Wilms’ Tumor Study; NWTS-5, the fifth National Wilms’ Tumor Study; FHWT, favorable histology Wilms tumor; DAWT, diffuse anaplastic Wilms tumor.Construction and validation of the nomogram in WT patients. (A) The nomogram integrating the risk score of lncRNAs and 3 clinicopathological factors (gender, cooperative group protocol, and stage) to predict 1‐, 3‐, and 5‐year OS. The time-dependent ROC curves for predicting probabilities of patients with 1-year (B), 3-year (C), and 5-year (D) OS. The calibration curve (E) and the DCA curve (F) of the nomogram for predicting probabilities of patients with 500-day OS. WT, Wilms tumor; lncRNAs, long non-coding RNAs; OS, overall survival; ROC, receiver operating characteristic; DCA, decision curve analysis.AUC, the area under the time-dependent receiver operating characteristic curve; CI, confidence interval.AUC, the area under the time-dependent receiver operating characteristic curve.
Identification of differentially expressed genes and construction of PPI network
A total of 105 DEPCGs (24 upregulated and 81 downregulated) and 14 DELs (7 upregulated and 7 downregulated) were identified in the high-risk group (). Furthermore, the PPI network of the 105 DEPCGs was generated (). Following MCODE analysis, 2 densely connected regions in the PPI network were identified. The first region consisted of 15 target protein-coding genes, including MYL4, MYBPC2, ACTC1, MYLPF, DES, TRIM63, NEB, TTN, TNNI2, TNNC2, MYH3, UNC45B, ACTN2, MYH8, and TNNT2 (). The second region consisted of 4 target protein-coding genes, including ATP1A2, SLC1A2, SLC17A7, and GRM5 ().
Figure 5
Identification of differentially expressed genes and construction of PPI network. Volcano plots of differentially expressed protein-coding genes (A) and lncRNAs (B) between the low-risk group and high-risk group. (C) PPI network of differentially expressed protein-coding genes in the high-risk group. (D,E) Two densely connected regions recognized by MCODE. The upregulated and downregulated genes in the high-risk group were represented by red and blue, respectively. PPI, protein-protein interaction; lncRNAs, long non-coding RNAs; MCODE, Molecular Complex Detection.
Identification of differentially expressed genes and construction of PPI network. Volcano plots of differentially expressed protein-coding genes (A) and lncRNAs (B) between the low-risk group and high-risk group. (C) PPI network of differentially expressed protein-coding genes in the high-risk group. (D,E) Two densely connected regions recognized by MCODE. The upregulated and downregulated genes in the high-risk group were represented by red and blue, respectively. PPI, protein-protein interaction; lncRNAs, long non-coding RNAs; MCODE, Molecular Complex Detection.Through canonical pathway gene sets, GSEA identified energy-dependent regulation of mTOR by LKB1-AMPK from the Reactome subset, which was significantly enriched in the high-risk group [(nominal P=0.007859, normalized enrichment score (NES) = 1.728] (). By contrast, 3 other pathways, including the HNF3B pathway (nominal P=0.009434, NES = ‒1.631) from the PID subset (), olfactory transduction (nominal P=0.009634, NES = ‒1.599) from the Kyoto Encyclopedia of Gnese and Genomes (KEGG) subset (), and crosslinking of collagen fibrils (nominal P=0.009940, NES = ‒1.730) from the Reactome subset (), were significantly enriched in the low-risk group. Through GO gene sets, GSEA identified 11 GO terms consisted of 5 BP, 1 CC, and 5 MF, which were significantly enriched in the high-risk group (nominal P<0.01) (), and 19 GO terms consisted of 16 BP, 2 CC, and 1 MF, which were significantly enriched in the low-risk group (nominal P<0.01; ).
Figure 6
The results of GSEA between the low- risk group and high-risk group. (A) Energy-dependent regulation of mTOR by LKB1-AMPK from the Reactome subset, which was significantly enriched in the high-risk group. (B,C,D) Three pathways were significantly enriched in the low-risk group. (E) Histogram including nominal P-values of GO enrichment analysis. GSEA, Gene Set Enrichment Analysis; GO, Gene Ontology.
The results of GSEA between the low- risk group and high-risk group. (A) Energy-dependent regulation of mTOR by LKB1-AMPK from the Reactome subset, which was significantly enriched in the high-risk group. (B,C,D) Three pathways were significantly enriched in the low-risk group. (E) Histogram including nominal P-values of GO enrichment analysis. GSEA, Gene Set Enrichment Analysis; GO, Gene Ontology.The CIBERSORTx analysis demonstrated that the proportion of monocytes activated dendritic cells and resting mast cells were significantly reduced in the high-risk group compared to that in the low-risk group (P=0.0464, P=0.0241, and P=0.00266), and the proportion of M1 macrophages (pro-inflammatory) in the low-risk group was substantially lower than that in the high-risk group (P=0.00944; ). However, the stromal and immune scores showed no significance between groups (P=0.1339 and P=0. 8536; ).
Figure 7
Analysis of immune cell infiltration levels. Comparisons of the percentage of 22 immune cells (A), the stromal scores (B), and the immune scores (C) between the low- and high-risk groups of WT patients. WT, Wilms tumor.
Analysis of immune cell infiltration levels. Comparisons of the percentage of 22 immune cells (A), the stromal scores (B), and the immune scores (C) between the low- and high-risk groups of WT patients. WT, Wilms tumor.
Discussion
In clinical practice, it is apparent that tumor progression and prognosis of patients are affected by multiple factors, yet many studies have only focused on single or few clinicopathologic factors for WT patients’ prognosis (7-9). For instance, Tang et al. constructed nomograms to predict OS of children with WT based on 5 clinicopathologic factors, but AUCs of 3- and 5-year OS were not more than 0.74 (45). Nonetheless, prognostic tumor biomarkers also play a vital role in the diagnosis and treatment of tumors. For example, LOH for chromosomes 16q and 1p was demonstrated as a specific marker for increased relapse risk in favorable-histology WT (46). Several retrospective studies of heterogeneously-treated patients suggested an association between a gain of chromosomes 1q and tumor recurrence (47,48). Our previous study also indicated that astrocyte elevated gene-1 overexpression in histologically favorable WT was associated with poor patient prognosis (49). Also, Gong et al. have reported a 5-microRNA signature model to predict WT patients’ prognosis based on the TARGET database, but AUC was only 0.767 (50). Very few studies currently focus on the development of prognostic models that integrate the clinicopathologic features and biomarkers based on comprehensive sequencing analysis for WT patient prognosis.In this study, we performed multiple analyses to identify the prognostic lncRNA signature, including filtering of low-expression lncRNAs, identification of lncRNAs associated with OS through univariate Cox analysis and LASSO method, and identification of independently predictive lncRNAs through multivariate Cox analysis. As a result, the potential minimum number of robust lncRNAs to predict WT patients’ prognosis was obtained. Accordingly, a risk-score formula was constructed, and the subsequent log-rank tests and Kaplan-Meier analyses further confirmed the predictive value of the risk score for OS and EFS in the low- and high-risk groups of WT patients. Among the 10 prognostic lncRNAs in the present study, the low expression of EMX2OS (also known as EMX2-AS1 or NCRNA00045) was correlated with a poor prognosis in participants with WT [β=‒0.55528, hazard ratio (HR) =0.573914; ]. Interestingly, Gu et al. also reported that downregulation of EMX2OS was an independent prognosis factor for shorter recurrence-free survival of classical papillary thyroid cancer (51).Nomograms also called nomographs or alignment charts, which are widely used as prognostic devices in the field of medicine and oncology, can generate the individual probability of clinical events by integrating multiple prognostic variables (52). In this study, a genomic- clinicopathologic predictive nomogram was developed through incorporating multiple prognostic factors including gender, cooperative group protocol, stage, and risk score to predict 1-, 3-, and 5-year OS of children with WT, which has shown stronger prediction with better AUCs (1-, 3-, and 5-year OS in the combined model of 0.9272, 0.9428, and 0.9259, respectively) (45,50). The combined model and the risk score model demonstrated significantly higher predictive accuracy, which was further verified by the time-dependent ROC curve and calibration curve analyses. Importantly, there was no significant difference in AUC between the combined and the risk score models, which indicated that the risk score played a critical role in the genomic-clinicopathologic nomogram. Hence, as a powerful and independent prognostic factor, the risk score achieved extremely high accuracy without combining clinicopathologic factors.Nonetheless, several limitations need to be addressed in the present study. First, WT’s sample size was relatively small, which might lead to bias that could reduce the accuracy of the prognostic model. Second, an external validation cohort for the prognostic model was not available, and all the data analyzed were collected from the TARGET Data Matrix, which might also result in biased discovery. Moreover, the results of bioinformatics were not verified through reverse transcription polymerase chain reaction (RT-qPCR). Finally, besides mRNA sequencing data, other potential biomarkers such as LOH at 1p/16q, gain of 1q, microRNA, and DNA methylation were not integrated into the study.
Conclusions
In summary, through constructing a genomic-clinicopathologic nomogram, we presented a novel and efficient method for the prognosis of patients with WT and provided insight into the molecular mechanisms that affect WT’s prognosis.The article’s supplementary files as
Authors: Paul Shannon; Andrew Markiel; Owen Ozier; Nitin S Baliga; Jonathan T Wang; Daniel Ramage; Nada Amin; Benno Schwikowski; Trey Ideker Journal: Genome Res Date: 2003-11 Impact factor: 9.043
Authors: Aravind Subramanian; Pablo Tamayo; Vamsi K Mootha; Sayan Mukherjee; Benjamin L Ebert; Michael A Gillette; Amanda Paulovich; Scott L Pomeroy; Todd R Golub; Eric S Lander; Jill P Mesirov Journal: Proc Natl Acad Sci U S A Date: 2005-09-30 Impact factor: 11.205
Authors: Sue C Kaste; Jeffrey S Dome; Paul S Babyn; Norbert M Graf; Paul Grundy; Jan Godzinski; Gill A Levitt; Helen Jenkinson Journal: Pediatr Radiol Date: 2007-11-17
Authors: Jeffrey S Dome; Norbert Graf; James I Geller; Conrad V Fernandez; Elizabeth A Mullen; Filippo Spreafico; Marry Van den Heuvel-Eibrink; Kathy Pritchard-Jones Journal: J Clin Oncol Date: 2015-08-24 Impact factor: 44.544
Authors: Aaron M Newman; Chloé B Steen; Chih Long Liu; Andrew J Gentles; Aadel A Chaudhuri; Florian Scherer; Michael S Khodadoust; Mohammad S Esfahani; Bogdan A Luca; David Steiner; Maximilian Diehn; Ash A Alizadeh Journal: Nat Biotechnol Date: 2019-05-06 Impact factor: 54.908
Authors: Andrea Franceschini; Damian Szklarczyk; Sune Frankild; Michael Kuhn; Milan Simonovic; Alexander Roth; Jianyi Lin; Pablo Minguez; Peer Bork; Christian von Mering; Lars J Jensen Journal: Nucleic Acids Res Date: 2012-11-29 Impact factor: 16.971