Literature DB >> 35498426

Identification and Validation of a Prognostic Prediction Model in Diffuse Large B-Cell Lymphoma.

Jiaqin Yan¹, Wei Yuan^1,2,3, Junhui Zhang⁴, Ling Li¹, Lei Zhang¹, Xudong Zhang¹, Mingzhi Zhang¹.

Abstract

Background: Diffuse large B-cell lymphoma (DLBCL) is a heterogeneous group with varied pathophysiological, genetic, and clinical features, accounting for approximately one-third of all lymphoma cases worldwide. Notwithstanding that unprecedented scientific progress has been achieved over the years, the survival of DLBCL patients remains low, emphasizing the need to develop novel prognostic biomarkers for early risk stratification and treatment optimization. Method: In this study, we screened genes related to the overall survival (OS) of DLBCL patients in datasets GSE117556, GSE10846, and GSE31312 using univariate Cox analysis. Survival-related genes among the three datasets were screened according to the criteria: hazard ratio (HR) >1 or <1 and p-value <0.01. Least Absolute Shrinkage and Selection Operator (LASSO) and multivariate Cox regression analysis were used to optimize and establish the final gene risk prediction model. The TCGA-NCICCR datasets and our clinical cohort were used to validate the performance of the prediction model. CIBERSORT and ssGSEA algorithms were used to estimate immune scores in the high- and low-risk groups.
Results: We constructed an eight-gene prognostic signature that could reliably predict the clinical outcome in training, testing, and validation cohorts. Our prognostic signature also performed distinguished areas under the ROC curve in each dataset, respectively. After stratification based on clinical characteristics such as cell-of-origin (COO), age, eastern cooperative oncology group (ECOG) performance status, international prognostic index (IPI), stage, and MYC/BCL2 expression, the difference in OS between the high- and low-risk groups was statistically significant. Next, univariate and multivariate analyses revealed that the risk score model had a significant prediction value. Finally, a nomogram was established to visualize the prediction model. Of note, we found that the low-risk group was enriched with immune cells.
Conclusion: In summary, we identified an eight-gene prognostic prediction model that can effectively predict survival outcomes of patients with DLBCL and built a nomogram to visualize the perdition model. We also explored immune alterations between high- and low-risk groups.

Entities: Chemical

Keywords: diffuse large B-cell lymphoma; immune cell infiltration; nomogram; prediction model; stratification analyses

Mesh：

Year: 2022 PMID： 35498426 PMCID： PMC9048048 DOI： 10.3389/fendo.2022.846357

Source DB: PubMed Journal: Front Endocrinol (Lausanne) ISSN： 1664-2392 Impact factor: 6.055

Introduction

Lymphoma is the fourth most common cancer and the sixth leading cause of cancer death in the United States (1). Diffuse large B-cell lymphoma (DLBCL) accounts for approximately one-third of all lymphoma cases worldwide (2–4). In the current World Health Organization (WHO) lymphoma classification, about 80% of DLBCL cases are designated as not otherwise specified (NOS) (2). Three molecularly distinct forms of DLBCL have been identified by gene expression patterns, specifically an activated B cell-like (ABC) and germinal center B-cell-like (GCB) types and a small amount were unclassified DLBCL (UC) (5–7). Lymphomas with rearrangements of MYC with BCL2 and/or BCL6 are called “double-hit lymphomas”(DHL) or “triple-hit lymphomas”(THL) (8). There also exist one subtype called “double-expressor lymphomas” (DELs), defined as co-expression of MYC and BCL2 (9). DLBCL comprises a heterogeneous group with pathophysiological, genetic and clinical features (4). Albeit significant efforts have been made to better understand lymphomas, the overall survival (OS) of DLBCL patients remains dismal (5, 10). Accordingly, developing novel prognostic biomarkers for early risk stratification and treatment optimization is imperative. It is well established that clinical prognosis systems for DLBCL, including the rituximab international prognostic index (IPI), age-adjusted IPI, and NCCN-IPI, use clinical factors for risk stratification of patients (4). Although IPI is easy to apply during clinical practice, it does not fully account for disease heterogeneity (11, 12). An increasing body of evidence suggests that patients with the ABC disease subtype have significantly poorer outcomes with standard up-front rituximab-containing chemoimmunotherapy than patients with GCB disease (13). A survival-related gene prognostic model, in combination with other prognostic indicators such as IPI and cell of origin (COO), might improve our assessment of patient prognosis for individualized treatment. It is widely acknowledged that the tumor microenvironment (TME) of patients with lymphoma comprises endothelial cells, fibroblasts, adipocytes, and immune cells and is a key factor for tumor initiation and metastasis (14, 15). Several studies have focused on the potential role of the TME, especially the immune status in DLBCL pathogenesis (16, 17). Therefore, it is critical to better characterize the TME to develop the treatments for DLBCL patients (18). Thanks to high-throughput genome sequencing technique, there had been several studies exploring potential prognostic biomarkers of DLBCL patients on genomic level. Xie et al. for example, investigated the prognostic value of m6A regulators and established an m6A-based prognostic gene signature for DLBCL (19). Feng et al. constructed a 14-gene prognostic signature deriving from immune-related genes for 216 DLBCL patients (20). Luo et al. identified the aging-related genes associated with prognostic value in DLBCL patients (21). In this study, we integrated the transcriptome data from the Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA), and our clinical cohort and constructed an eight-gene signature-based prediction model. Furthermore, we explored the immune alterations in high- and low-risk groups.

Methods and Materials

Collection of Clinical DLBCL Specimens

DLBCL specimens were obtained through biopsy in the First Affiliated Hospital of Zhengzhou University and frozen at −80°C for storage. All participants provided written informed consent for the use of their specimens in this study. Clinicopathological features of 45 DLBCL patients from the First Affiliated Hospital of Zhengzhou University are performed in . The study protocol was approved by the ethics committee of the First Affiliated Hospital of Zhengzhou University (ethics number 2021-KY-0835-001).

Selection of DLBCL Gene Expression Datasets

We systematically explored publicly available DLBCL gene expression datasets with corresponding clinical information of patients from the GEO (https://www.ncbi.nlm.nih.gov/geo/) and TCGA (https://portal.gdc.cancer.gov/) databases. For this study, we gathered a total of 2,335 patients with DLBCL from four cohorts, including GSE117556 (n = 928), GSE10846 (n = 420), GSE31312 (n = 498), and TCGA-NCICCR (n = 489) ( ) and our clinical cohort (n = 45). Patients with incomplete transcriptomic data and clinical data were excluded. Ultimately, we included 928 patients from GSE117556, 414 from GSE10846, 470 from GSE31312, and 234 from TCGA-NCICCR. The GSE10846 and GSE31312 datasets used the GPL570 platform, while the GSE117556 dataset used the GPL14951 platform.

Table 1

Characteristics of the included datasets.

Dataset ID	Country	Number of samples	GPL ID	Number of rows per platform
GSE117556	UK	928	GPL14951	29,377
GSE10846	USA	420	GPL570	54,675
GSE31312	USA	498	GPL570	54,675
TCGA-NCICCR	USA	489	NA	56,753
Clinical cohort	China	45	NA	NA

GSE, Gene Expression Omnibus Series; GPL, Gene Expression Omnibus Platform; UK, United Kingdom; USA, United States of America; NA, Not Available.

Characteristics of the included datasets. GSE, Gene Expression Omnibus Series; GPL, Gene Expression Omnibus Platform; UK, United Kingdom; USA, United States of America; NA, Not Available.

Construction and Validation of Prediction Model

In this study, we used univariate Cox analysis to screen genes related to the OS of DLBCL patients in the GSE117556, GSE10846, and GSE31312 datasets. Genes with hazard ratio (HR) >1 and HR <1 were defined as the risk and protective genes. A p-value <0.01 was the cutoff point. The risk genes and protective genes shared by the three datasets were intersected and combined with Least Absolute Shrinkage and Selection Operator (LASSO) regression and multivariate Cox regression to build the final gene risk prediction model (22). We conducted univariate and multivariate Cox regression analyses using the R package “survival.” Another R package “glmnet” was used for the LASSO Cox regression analysis (23). The risk score (RS) of each sample was calculated by multivariate Cox regression analysis. The correlation analysis was based on the R package “corrplot,” and the forest plots for univariate and multivariate analyses were constructed by R package “forestplot.”

TME Characterization Analysis

We employed two algorithms to assess immune infiltration in DLBCL. CIBERSORT (http://cibersort.stanford.edu/) algorithm was used to obtain the proportion of 22 immune cell types with a threshold of p < 0.05 (24). We applied Single-Sample Gene Set Enrichment Analysis (ssGSEA) to assess the infiltration level of 28 different immune cells in DLBCL expression profile data by the “GSVA” package (25). The R package “ggpubr” was used to visualize differences in the distributions of immune-related cells in the low- and high-risk patient groups from the overall cohort. ”***,” “**,” “*,” and “ns” indicate p < 0.001, p < 0.01, p < 0.05, and not significant, respectively, for the Kruskal–Wallis test.

Quantitative Real-Time Polymerase Chain Reaction

Total RNA was isolated from the specimens harvested using TRIzol reagent (Invitrogen Corporation, Carlsbad, CA, # A33250). The Prime Script RT reagent kit with genomic DNA eraser (TaKaRa, Tokyo, Japan, #RR037A) was used to synthesize complementary DNA (cDNA). Quantitative real-time polymerase chain reaction (qRT-PCR) analyses were detected by SYBR Green Master Mix (TaKaRa), and the primers for qRT-PCR analyses are listed in . The 2−ΔCT method was utilized to calculate the relative mRNA expression of each gene.

Statistical Analysis

Statistical analyses were performed with R (version 3.6.3). The Kaplan–Meier method was used to assess the differences in survival time, and the log-rank test was used to determine the statistical significance. Time-dependent receiver operating characteristic (ROC) curve analysis was used to measure the prognostic performance by comparing the areas under curves (AUC). The nomogram was plotted using the “rms” package (26). The difference in immune infiltration levels between high- and low-risk groups was calculated by Kruskal–Wallis test, and a p-value < 0.05 was statistically significant.

Results

Identification and Validation of a Prognostic Signature

To identify the prognostic signature of DLBCL, we performed a multiple-step analysis ( ). We first screened the GEO database and selected three datasets for univariate Cox proportional hazards regression to identify candidate genes significantly related to OS. We conducted univariate Cox analysis in the GSE117556, GSE10846, and GSE31312 datasets with p < 0.01 as the cutoff value. In total, 1,426, 1,904, and 1,788 candidate protective genes (with hazard ratios (HR) <1) and 890, 2,958, and 2,525 candidate risk genes (with HR >1) were identified in GSE117556, GSE10846, and GSE31312, respectively. A total of 11 genes were candidate protective genes after intersecting the candidate protective genes ( ). Similarly, after matching the candidate risk genes identified in the three datasets, 13 common genes are retained ( ).

Figure 1

Multistep analysis of the study.

Figure 2

Construction and validation of prediction model. (A, B) Venn diagram shows the common protective and risk genes identified by three datasets. (C–E) Kaplan–Meier analysis for the eight-gene signature in the GSE117556, TCGA-NCICCR, and our clinical cohort, respectively. (F–H) Time-dependent ROC curve analysis of 1, 3, and 5 years in GSE117556, TCGA-NCICCR, and our clinical cohort.

Multistep analysis of the study. Construction and validation of prediction model. (A, B) Venn diagram shows the common protective and risk genes identified by three datasets. (C–E) Kaplan–Meier analysis for the eight-gene signature in the GSE117556, TCGA-NCICCR, and our clinical cohort, respectively. (F–H) Time-dependent ROC curve analysis of 1, 3, and 5 years in GSE117556, TCGA-NCICCR, and our clinical cohort. The detailed information on these genes is listed in . Next, GSE117556 was assigned as the training dataset, and the LASSO was applied to screen the candidate genes, yielding eight genes ( ). Ultimately, the HRs of the eight genes were acquired by conducting multivariate Cox regression analysis. The forest plot ( ) showed that HK2, GAB1, GRPEL1, RCSD1, PLAC8L1, and RASAL1 were risk factors (HR >1), and CAPG and PDPN were protective factors (HR <1) for OS. Finally, the following risk score model was established: risk score = 0.271 × HK2 expression + 0.182 × GAB1 expression + 0.172 × RASAL1 expression − 0.254 × CAPG expression − 0.358 × PDPN expression + 0.362 × GRPEL1 expression + 0.370 × RCSD1 expression + 0.177 × PLAC8L1 expression. Each patient was assigned a risk score with the prognostic model. According to the median risk score, patients in the GSE117556 dataset were stratified into high-and low-risk groups. Of the eight-gene prognostic signature, HK2 have the most significant correlation with worse survival. Through analyzing the RT-PCR results deriving from our clinical specimens, We found HK2 had high expressed in high stage DLBCL ( ). To further explore the function of HK2, we first identified genes that correlated with HK2 in the GSE117556 dataset ( ). We then utilized positive and negative correlated genes to perform GO and KEGG enrichment analyses. Results revealed that positive correlated genes were involved in cell cycle ( ) and negative correlated genes were involved in immune response ( ). These results partly reflect genomic differences between high- and low-risk groups.

Table 2

Detail information of selected common genes in three datasets.

Gene symbol	Gene name	Datasets (HR/p-value)
Gene symbol	Gene name	GSE117556	GSE31312	GSE10864
COL1A1	Collagen type I alpha 1 chain	0.763 (p < 0.001)	0.043 (p = 0.009)	0.829 (p = 0.004)
ST6GALNAC5	ST6N-Acetylgalactosaminide alpha-2,6-sialyltransferase 5	0.727 (p < 0.001)	0.184 (p = 0.009)	0.796 (p < 0.001)
CAPG	Capping actin protein, gelsolin like	0.728 (p < 0.001)	0.145 (p < 0.001)	0.793 (p < 0.001)
LRRC15	Leucine-rich repeat containing 15	0.656 (p < 0.001)	0.166 (p = 0.001)	0.790 (p < 0.001)
PDPN	Podoplanin	0.656 (p < 0.001)	0.214 (p = 0.005)	0.841 (p < 0.001)
NEK6	NIMA-related kinase 6	0.610 (p < 0.001)	0.205 (p = 0.004)	0.660 (p < 0.001)
PTPN14	Protein tyrosine phosphatase nonreceptor type 14	0.738 (p = 0.002)	0.183 (p = 0.002)	0.813 (p < 0.001)
LOX	Lysyl oxidase	0.656 (p = 0.008)	0.161 (p = 0.001)	0.862 (p < 0.001)
RBP5	Retinol-binding protein 5	0.754 (p = 0.006)	0.215 (p = 0.001)	0.870 (p = 0.005)
NRP2	Neuropilin 2	0.747 (p = 0.002)	0.237 (p = 0.003)	0.773 (p = 0.003)
DST	Dystonin	0.792 (p = 0.005)	0.042 (p < 0.001)	0.748 (p = 0.001)
MSL1	MSL complex subunit 1	1.592 (p = 0.004)	12.721 (p < 0.001)	1.459 (p = 0.002)
GRPEL1	GrpE-like 1, mitochondrial	1.797 (p < 0.001)	4.534 (p = 0.008)	1.877 (p < 0.001)
RCSD1	RCSD domain-containing 1	1.807 (p < 0.001)	8.005 (p < 0.001)	1.300 (p < 0.001)
PLAC8L1	PLAC8-like 1	1.277 (p = 0.004)	5.033 (p = 0.001)	1.449 (p < 0.001)
PRC1	Protein regulator of cytokinesis 1	1.745 (p = 0.001)	3.910 (p = 0.005)	1.330 (p = 0.001)
RASAL1	RAS protein activator-like 1	1.368 (p < 0.001)	3.056 (p = 0.007)	1.220 (p = 0.001)
LARS	Leucyl-TRNA synthetase 1	2.088 (p = 0.001)	5.419 (p = 0.007)	1.599 (p < 0.001)
SNHG7	Small nucleolar RNA host gene 7	1.238 (p = 0.006)	4.498 (p = 0.002)	1.439 (p < 0.001)
WDR12	WD repeat domain 12	1.812 (p = 0.001)	8.049 (p = 0.001)	1.526 (p = 0.003)
PI4K2B	Phosphatidylinositol 4-kinase type 2 beta	1.637 (p = 0.004)	3.868 (p = 0.004)	1.537 (p = 0.006)
MMACHC	Metabolism of cobalamin-associated C	1.430 (p < 0.001)	5.033 (p = 0.009)	1.770 (p < 0.001)
HK2	Hexokinase 2	1.320 (p = 0.004)	4.455 (p < 0.001)	1.438 (p < 0.001)
GAB1	GRB2-associated binding protein 1	1.277 (p = 0.008)	2.634 (p = 0.007)	1.235 (p = 0.008)

Detail information of selected common genes in three datasets. To further verify the predicting model, we analyzed the survival of high- and low-risk groups. Patients in the low-risk group demonstrated a longer survival time than those in the high-risk group ( ). Consistently, the Kaplan–Meier curve indicated that patients in the high-risk group had significantly worse prognoses than low-risk patients (log-rank test p < 0.001) ( ). We further validated the risk score model in the testing dataset GSE 31312 (log-rank test p < 0.001) ( ) and GSE 10864 (log-rank test p < 0.001) ( ), TCGA-NCICCR dataset (log-rank test p = 0.0052) ( ), and our clinical specimens (log-rank test p = 0.024) ( ), the results were consistent with the training dataset findings. The predictive power of the risk score model was assessed by time-dependent ROC, which yielded good performance in the above datasets. In dataset GSE117556, the AUC values for 1-, 3-, and 5-year overall survival were 0.7, 0.74, and 0.89, respectively. In TCGA-NCICCR dataset, the AUC values for 1-, 3-, and 5-year overall survival predictions were 0.61, 0.60, and 0.62, respectively. Our clinical cohort yielded AUC values of 0.74, 0.79, and 0.78 for the 1-, 3-, and 5-year overall survival, respectively ( ). In dataset GSE31312, the AUC values for 1-, 3-, and 5-year overall survival were 0.64, 0.7, and 0.74, respectively. In dataset GSE10864, the AUC values for 1-year overall survival were 0.56 ( ). The above AUC curves provided an objective validation of the clinical application value of our model.

Validation of the Accuracy of the Risk Score Model

We used GSE117556 as the training dataset to detect the correlation and interdependence between the eight risk genes, which proved that our prognostic signature could minimize data bias caused by gene collinearity ( ). Using the median risk score as the cutoff point, the training set was classified into low- and high-risk groups. Our results suggest that the risk score model has significant value for evaluating characteristic genes. The eight genes were differentially expressed between the two groups, indicating their role in contributing to the prognosis of DLBCL ( ). Analysis of validation dataset TCGA-NCICCR yielded consistent results ( ).

Figure 3

Gene expression in high- and low-risk group. (A) Corrplot shows correlation of eight genes in GSE117556 dataset. (B) Heatmap shows gene expression of eight genes and clinical parameters in high- and low-risk group in GSE117556 dataset. (C) Corrplot shows correlation of eight genes in TCGA-NCICCR dataset. (D) Heatmap shows gene expression of eight genes and clinical parameters in high- and low-risk groups in TCGA-NCICCR dataset.

Validation of Risk Score Model Based on Different Clinical Parameters and Subgroups

It has been established that DLBCL exhibits a significant heterogeneity in cell origin, clinical manifestations, gene expression profiles, and so on. To verify the effectiveness of the risk signature in the existing clinical subgroups, we conducted a series of subgroup analyses on dataset GSE117556. Stratification based on clinical characteristics such as COO, ECOG, MYC/BCL2 double expression, lactate dehydrogenase (LDH) value, age, stage, and gender was conducted. Among those subtypes, patients in the high-risk groups had worse survival outcomes than patients in the low-risk group (log-rank test p < 0.05; and ). These findings validated that our risk model yielded good predictive performance after stratifying for different clinicopathological characteristics.

Figure 4

KM survival stratification analyses in the GSE117556 dataset. (A) COO subgroup. (B) ECOG stage. (C) Age. (D) Gender. (E) Clinical stage.

The Landscape of Immune Cell Infiltration in the TME of DLBCL

To further explore the potential survival mechanisms related to the risk score model, mRNA data from dataset GSE117556 were first used to detect the proportion of 22 immune cell types in each sample via the CIBERSORT algorithm. As shown in , the proportion of immune cells was significantly different between the high- and low-risk score groups. Compared with the low-risk group, the high-risk group exhibited increased B-cell infiltrations, with less-naive CD4 T cell, T follicular helper cell, M0 macrophages, M1 macrophages, and other proinflammatory cells. Consistent results were obtained when ssGSEA was applied ( ). In contrast with the high-risk group, tumor-infiltrating lymphocytes (TIL), antigen-presenting cells were significantly enriched in the low-risk groups. We further applied the above two algorithms to the validation dataset TCGA-NCICCR. Similar results were obtained from GSE117556 analysis results. Immune signatures between the high- and low-risk groups were different, and the low-score group was significantly infiltrated with proinflammatory immune cells ( ). In addition, we analyzed immune-related genes between the two groups and further approved the above findings ( ).

Figure 5

Immune estimation in high- and low-risk groups. (A, B) The difference of immune infiltration in high- and low-risk groups estimated by CIBERSORT and ssGSEA in the GSE117556 datasets. (C, D) The difference of immune infiltration in high- and low-risk groups estimated by CIBERSORT and ssGSEA in TCGA-NCICCR datasets. ns, Not Significant; * P < 0.05; ** P < 0.01; *** P < 0.001; **** P < 0.0001.

Validation of the Performance of Our Prediction Model

To evaluate the performance of our risk score model on the prognosis of DLBCL patients, we integrated the clinicopathological characteristics with risk score signatures in different algorithms. As shown in , univariate Cox regression analysis demonstrated that the risk score model was a significant predictor of OS in patients with DLBCL (p < 0.0001, HR = 1.377), compared with other clinicopathological characteristics. Multivariate Cox regression analysis showed that the risk score model was an independent prognostic factor for poor prognosis (p < 0.0001, HR = 1.380) ( ). Compared with other indicators, the risk score was superior for predicting the patient prognosis (AUC = 0.820) ( ). Finally, we established a nomogram based on the above clinical parameters to predict patient prognosis quantitatively. Accordingly, our nomogram has huge prospects for clinical application for predicting the OS of individual DLBCL patients ( ).

Figure 6

Risk score is a superior biomarker for evaluating the prognosis of DLBCL. (A, B) Forest plot summary of the univariate and multivariable analyses of risk score and other clinical parameters. (C) ROC analysis of risk score and other clinical parameters. (D) Nomogram integrating the risk score and clinical parameters for predicting the probability of patient mortality at 1, 3, and 5 years of OS.

Discussion

Notwithstanding that unprecedented scientific progress has been achieved over the years, the survival of DLBCL patients remains relatively low. In this regard, the cure rate of DLBCL ranges from 40% to 60% following standard frontline immunochemotherapy (5). However, a poor prognosis has been reported for patients with refractory disease, those who relapse after salvage chemotherapy and autologous stem cell transplant or chimeric antigen receptor T-cell therapy, highlighting the need for novel therapeutic approaches (27, 28). Indeed, good prediction models, like good therapies, are best compared head-to-head in novel patient populations (29). Therefore, developing a novel prognostic model in combination with other prognostic indicators IPI and COO might be necessary to assess the patient prognosis for individualized treatment. In recent years, an increasing number of studies have been conducted to identify novel prognostic indicators. For instance, Schmitz et al. applied exome and transcriptome sequencing methods on 574 DLBCL biopsy samples to construct a new genetic subtype for DLBCL classification to guide therapy (30). Han et al. demonstrated that piRNA-30473, which promotes DLBCL progression by regulating m6A RNA methylation in DLBCL, can improve the prognostic stratification and therapeutic approach (31). With the development of next-generation sequencing, many prognostic cancer models have been established in recent years (32–34) based on public transcriptomic databases such as the GEO and TCGA datasets. The present study explored DLBCL gene expression datasets with corresponding clinical information of patients from the GEO to identify candidate genes that were significantly related to OS. After performing univariate Cox proportional hazards regression analysis, 11 protective and 13 risk genes were identified. Finally, we constructed an eight-gene prognostic signature through the LASSO method and multivariate Cox regression analysis. There had been several previous studies that were consistent and corroborated with the prognostic value of our risk score model. For example, HK2, GAB1, and RASAL1 were risk genes in our model. HK2 is known to be a key metabolic enzyme by promoting glucose uptake in cells and facilitating the Warburg effect. HK2 had been explored as a major player in helping maintain the highly malignant state in many types of cancer (35–37). Bhalla et al. also provided strong support for the direct contribution of HK2 in B-cell lymphoma development and suggested that HK2 is a key metabolic driver of the DLBCL phenotype (38). GAB1, which is widely distributed in various body tissues, is capable of promoting cell proliferation, and its expression may enhance the carcinogenesis and cancer progression (39, 40). Chang et al. clarified that RASAL1 was increased in ovarian adenocarcinoma tumorous tissues and HEY cells, which correlated with poor prognosis in ovarian adenocarcinoma patients (41). Kaplan–Meier analysis demonstrated that the risk model could predict the outcome for patients with DLBCL in training, testing, and validating datasets. Similar conclusions were reached when our clinical cohort data were applied. ROC curve analysis consistently indicated the good performance of our risk model. We then conducted correlation analysis to evaluate the collinearity among the eight prognostic genes. Importantly, we found that the correlation among these genes was low, suggesting that the regression coefficients of this model were reliable and stable (42). At present, different approaches are adopted in clinical practice to evaluate the occurrence and development of DLBCL at different levels, including gene expression patterns (ABC, GCB, UC), ECOG, IPI, DELs, LDH, age-adjusted IPI, gender, stages, and so on. Patients with the GCB subtype, for example, usually have a better prognosis than the ABC subtype (6, 43). It has been established that DELs are generally aggressive and respond poorly to currently available therapies (9, 44, 45). Moreover, IPI and age-adjusted IPI have been developed as models for predicting outcomes based on clinical factors from more than 4,000 patients (46, 47). However, despite overall improvements in DLBCL patient outcomes, 30%–40% of patients develop relapsed or refractory disease (48). In the present study, after stratification based on clinical characteristics such as gene expression patterns (ABC, GCB, UC), ECOG, IPI, DELs, LDH, age-adjusted IPI, gender, and Ann Arbor stages, the difference in OS between the high- and low-risk groups was still statistically significant. This finding suggests that our model can be combined with existing clinical parameters to reduce false positives and negatives, improve diagnostic accuracy, and provide effective treatment. An increasing body of evidence suggests that the TME affects the prognosis of DLBCL patients. Lenz et al. analyzed gene expression in 181 pretreatment biopsy specimens derived from DLBCL patients and found that the survival of patients with DLBCL was affected by immune cells, fibrosis, and angiogenesis in the tumor microenvironment (49). Mueller et al. also demonstrated that DLBCL recruited T cells and monocytes via CCL5 to support B-cell survival and proliferation (50). By immunohistochemical staining, Chang et al. showed the presence of CD1a+ dendritic cells (DCs) and increased granzyme B+ T cells within tumors was associated with a favorable prognosis (51). It has been established that M1 cells play a proinflammatory and anticancer role in the TME of DLBCL, while M2 type plays an immunosuppressive role to promote cancer progression (47, 52, 53). Herein, we used two algorithms to evaluate the TME of patients with DLBCL and found that patients in the high-risk score group exhibited significant B-cell infiltration with mild infiltration of M0, M1, CD8+ T cells, and DCs. Our results suggest the presence of an immunosuppressive TME in patients from the high-risk group leading to cancer progression. Finally, we evaluated the potential value of applying our risk score model to clinical practice. Based on the results of univariate regression analysis, multivariate regression analysis, and nomogram, our model has huge prospects for application in clinical practice. However, our study was significantly limited by its retrospective nature as DLBCL samples were from different platforms, which may be a source of sampling bias. Well-designed prospective clinical trials should be conducted in the future to highlight the role of our prediction model in DLBCL progression and metastasis.

Conclusions

In summary, this study identified an eight-gene prognostic signature that can effectively predict DLBCL patient outcomes. The eight-gene prognostic model related to TME in combination with other prognostic indicators IPI and COO might be useful to clinicians when evaluating the prognosis of patients for individualized treatment.

Data Availability Statement

The datasets GSE117556, GSE10846 and GSE31312 for this study can be found in the GEO (https://www.ncbi.nlm.nih.gov/geo).The datasets NCICCR for this study can be found at TCGA (https://portal.gdc.cancer.gov/repository).

Ethics Statement

The studies involving human participants were reviewed and approved by the ethics committee of the First Affiliated Hospital of Zhengzhou University. The patients/participants provided their written informed consent to participate in this study.

Author Contributions

MZ, JY, and LL contributed to the design and implementation of the research. WY and JZ contributed to the analysis of the results and to the writing of the manuscript. JY, LZ, and XZ designed the figures. All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.

Funding

This work was supported by the National Natural Science Foundation of China (Grant no. 81500174).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

52 in total

1. Regularization Paths for Generalized Linear Models via Coordinate Descent.

Authors: Jerome Friedman; Trevor Hastie; Rob Tibshirani
Journal: J Stat Softw Date: 2010 Impact factor: 6.440

2. Confirmation of the molecular classification of diffuse large B-cell lymphoma by immunohistochemistry using a tissue microarray.

Authors: Christine P Hans; Dennis D Weisenburger; Timothy C Greiner; Randy D Gascoyne; Jan Delabie; German Ott; H Konrad Müller-Hermelink; Elias Campo; Rita M Braziel; Elaine S Jaffe; Zenggang Pan; Pedro Farinha; Lynette M Smith; Brunangelo Falini; Alison H Banham; Andreas Rosenwald; Louis M Staudt; Joseph M Connors; James O Armitage; Wing C Chan
Journal: Blood Date: 2003-09-22 Impact factor: 22.113

3. Classifying DLBCL Subtypes for Optimal Treatment.

Authors: Jennifer Crombie
Journal: Oncology (Williston Park) Date: 2019-10-28 Impact factor: 2.990

4. Molecular subtypes of diffuse large B cell lymphoma are associated with distinct pathogenic mechanisms and outcomes.

Authors: Bjoern Chapuy; Chip Stewart; Andrew J Dunford; Jaegil Kim; Atanas Kamburov; Robert A Redd; Mike S Lawrence; Margaretha G M Roemer; Amy J Li; Marita Ziepert; Annette M Staiger; Jeremiah A Wala; Matthew D Ducar; Ignaty Leshchiner; Ester Rheinbay; Amaro Taylor-Weiner; Caroline A Coughlin; Julian M Hess; Chandra S Pedamallu; Dimitri Livitz; Daniel Rosebrock; Mara Rosenberg; Adam A Tracy; Heike Horn; Paul van Hummelen; Andrew L Feldman; Brian K Link; Anne J Novak; James R Cerhan; Thomas M Habermann; Reiner Siebert; Andreas Rosenwald; Aaron R Thorner; Matthew L Meyerson; Todd R Golub; Rameen Beroukhim; Gerald G Wulf; German Ott; Scott J Rodig; Stefano Monti; Donna S Neuberg; Markus Loeffler; Michael Pfreundschuh; Lorenz Trümper; Gad Getz; Margaret A Shipp
Journal: Nat Med Date: 2018-04-30 Impact factor: 53.440

5. Genetics and Pathogenesis of Diffuse Large B-Cell Lymphoma.

Authors: Roland Schmitz; George W Wright; Da Wei Huang; Calvin A Johnson; James D Phelan; James Q Wang; Sandrine Roulland; Monica Kasbekar; Ryan M Young; Arthur L Shaffer; Daniel J Hodson; Wenming Xiao; Xin Yu; Yandan Yang; Hong Zhao; Weihong Xu; Xuelu Liu; Bin Zhou; Wei Du; Wing C Chan; Elaine S Jaffe; Randy D Gascoyne; Joseph M Connors; Elias Campo; Armando Lopez-Guillermo; Andreas Rosenwald; German Ott; Jan Delabie; Lisa M Rimsza; Kevin Tay Kuang Wei; Andrew D Zelenetz; John P Leonard; Nancy L Bartlett; Bao Tran; Jyoti Shetty; Yongmei Zhao; Dan R Soppet; Stefania Pittaluga; Wyndham H Wilson; Louis M Staudt
Journal: N Engl J Med Date: 2018-04-12 Impact factor: 91.245

6. Stromal gene signatures in large-B-cell lymphomas.

Authors: G Lenz; G Wright; S S Dave; W Xiao; J Powell; H Zhao; W Xu; B Tan; N Goldschmidt; J Iqbal; J Vose; M Bast; K Fu; D D Weisenburger; T C Greiner; J O Armitage; A Kyle; L May; R D Gascoyne; J M Connors; G Troen; H Holte; S Kvaloy; D Dierickx; G Verhoef; J Delabie; E B Smeland; P Jares; A Martinez; A Lopez-Guillermo; E Montserrat; E Campo; R M Braziel; T P Miller; L M Rimsza; J R Cook; B Pohlman; J Sweetenham; R R Tubbs; R I Fisher; E Hartmann; A Rosenwald; G Ott; H-K Muller-Hermelink; D Wrench; T A Lister; E S Jaffe; W H Wilson; W C Chan; L M Staudt
Journal: N Engl J Med Date: 2008-11-27 Impact factor: 91.245

7. Robust enumeration of cell subsets from tissue expression profiles.

Authors: Aaron M Newman; Chih Long Liu; Michael R Green; Andrew J Gentles; Weiguo Feng; Yue Xu; Chuong D Hoang; Maximilian Diehn; Ash A Alizadeh
Journal: Nat Methods Date: 2015-03-30 Impact factor: 28.547

8. Expression of N⁶-methyladenosine (m⁶A) regulators correlates with immune microenvironment characteristics and predicts prognosis in diffuse large cell lymphoma (DLBCL).

Authors: Zucheng Xie; Meiwei Li; Haoyuan Hong; Qingyuan Xu; Zhendong He; Zhigang Peng
Journal: Bioengineered Date: 2021-12 Impact factor: 3.269

9. A Novel Risk Model Based on Lipid Metabolism-Associated Genes Predicts Prognosis and Indicates Immune Microenvironment in Breast Cancer.

Authors: Zhimin Ye; Shengmei Zou; Zhiyuan Niu; Zhijie Xu; Yongbin Hu
Journal: Front Cell Dev Biol Date: 2021-06-14

Review 10. The Tumor Microenvironment of DLBCL in the Computational Era.

Authors: Giuseppina Opinto; Maria Carmela Vegliante; Antonio Negri; Tetiana Skrypets; Giacomo Loseto; Stefano Aldo Pileri; Attilio Guarini; Sabino Ciavarella
Journal: Front Oncol Date: 2020-03-31 Impact factor: 6.244