
Novel prognostic genes of diffuse large B-cell lymphoma revealed by survival analysis of gene expression data.

Chenglong Li, Biao Zhu, Jiao Chen, Xiaobing Huang

Abstract

OBJECTIVE: This study aimed to identify prognostic genes for diffuse large B-cell lymphoma (DLBCL), using bioinformatic methods.
METHODS: Five gene expression data sets were downloaded from the Gene Expression Omnibus database. The significance analysis of microarrays algorithm was used to identify differentially expressed genes (DEGs) from two data sets. Functional enrichment analysis was performed for the DEGs with the Database for Annotation, Visualization and Integrated Discovery (DAVID). For the other three data sets, survival analysis was performed with the Kaplan-Meier method using the function survfit from the R package survival. Cox univariate regression analysis was used to further screen out prognostic genes.
RESULTS: Thirty-one common DEGs were identified in the two data sets, mainly enriched in the regulation of lymphocyte activation, immune response, and interleukin-mediated signaling pathway. Combined with 47 DLBCL-related genes acquired by literature retrieval, a total of 78 potential prognostic genes were obtained. Cases from the other three data sets were used in hierarchical clustering, and the 78 genes clustered them into several subtypes with significant differences in survival curves. Cox univariate regression analysis revealed 45, 33, and 11 prognostic genes in the three data sets, respectively. Five prognostic genes were common to all three data sets, namely LCP2, TNFRSF9, FUT8, IRF4, and TLE1, among which LCP2, FUT8, and TLE1 were novel prognostic genes.
CONCLUSION: Five prognostic genes of DLBCL were identified in this study. They could not only be used for molecular subtyping of DLBCL but also be potential targets for treatment.


Keywords:  differentially expressed genes; diffuse large B-cell lymphoma; gene expression profile; subtype; survival analysis

Year:  2015        PMID: 26604798      PMCID: PMC4655963          DOI: 10.2147/OTT.S90057

Source DB:  PubMed          Journal:  Onco Targets Ther        ISSN: 1178-6930            Impact factor:   4.147


Introduction

Diffuse large B-cell lymphoma (DLBCL) is one of the most common types of non-Hodgkin lymphoma and occurs primarily in older individuals. It is an aggressive tumor. R-CHOP, an improved form of cyclophosphamide, doxorubicin, vincristine, and prednisone (CHOP) with the addition of rituximab, is a standard treatment for DLBCL. Many subtypes of the lymphoid neoplasms are established based on the World Health Organization classification system, and DLBCL is the most common type in Asians.1 However, classification based merely on morphology and clinical information is difficult, and thus a considerable percentage of cases are not classified. Gene expression profiling studies have attempted to distinguish heterogeneous groups of DLBCL from each other.2–4 For instance, gene expression profiling identified the germinal center B-cell-like and the activated B-cell-like groups as two DLBCL subtypes in the current World Health Organization classification.5 The study by Lenz et al6 provides genetic evidence that the DLBCL subtypes are distinct diseases that use different oncogenic pathways. DNA microarrays thus provide a better understanding of the biology of DLBCL and advance the development of novel diagnostic tools.7 Meanwhile, many genes with prognostic effect have been reported in DLBCL, such as BCL2 and BCL6.8,9 Hu et al10 suggested that MYC/BCL2 coexpression, rather than cell-of-origin classification, is a better predictor of prognosis in patients with DLBCL treated with R-CHOP. Additionally, Gratzinger et al11 reported the prognostic value of vascular endothelial growth factor and vascular endothelial growth factor receptors in DLBCL patients treated with anthracycline-based chemotherapy. In addition, Hussain et al12 found that X-linked inhibitor of apoptosis expression is a poor prognostic factor for DLBCL. Due to the heterogeneity of DLBCL, more work is necessary to advance molecular subtyping as well as to discover prognostic genes.
In this study, two gene expression data sets were analyzed to identify differentially expressed genes (DEGs), which were regarded as potential prognostic genes for DLBCL, and to ascertain whether these genes could reliably distinguish the subtypes of DLBCL in three other expression profile data sets.

Methods

Gene expression data

All five gene expression data sets were downloaded from the Gene Expression Omnibus. Data set GSE32918 comprised gene expression profiles of 172 DLBCL samples.13,14 It was generated on the Illumina GPL8432 platform (Illumina HumanRef-8 WG-DASL v3.0) and included a total of 294 expression profiles because some samples were assayed repeatedly. Data set GSE10846 included gene expression profiles of 181 clinical samples from chemotherapy-treated patients and 233 clinical samples from rituximab–chemotherapy-treated patients.15,16 The platform was Affymetrix GPL570 (Affymetrix Human Genome U133 Plus 2.0 Array). A total of 416 gene expression profiles were included. Data set GSE11318 consisted of gene expression profiles of 203 DLBCL samples, also based on the Affymetrix GPL570 platform.6 Data set GSE9327 comprised gene expression profiles of 36 DLBCL samples and eight reactive lymph node samples, which were used as controls.17 The GPL6011 platform (CNIO Human Oncochip 1.0, 1.2, and 2.0) was used. Data set GSE30881 contained gene expression profiles of 23 DLBCL samples and ten healthy controls, collected to investigate changes in NF-κB pathway activation.18 The platform was Affymetrix GPL3738 (Affymetrix Canine Genome 2.0 Array).

Pretreatment of raw data

Probes were mapped to genes according to the annotation files. For a gene corresponding to more than one probe, the average probe value was taken as the expression value of that gene.19 Subsequently, log2 conversion and quantile normalization20 were applied to the data. A total of 4,356 and 16,454 unique genes were identified in GSE9327 and GSE30881, respectively. Both GSE10846 and GSE11318 were obtained using GPL570, and a total of 20,693 unique genes were acquired. In addition, 18,403 unique genes were identified in GSE32918.
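The pretreatment steps described above (probe averaging, log2 conversion, quantile normalization) can be sketched in plain Python. This is only an illustration under stated assumptions: the probe names, gene names, and expression values are hypothetical toy data, ties in quantile normalization are broken by order rather than averaged, and the study does not state which software it used for this step.

```python
import math
from collections import defaultdict

def average_probes(probe_values, probe_to_gene):
    """Collapse probe-level rows to gene-level rows by averaging probes of one gene."""
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:              # probes without annotation are dropped
            grouped[gene].append(values)
    return {gene: [sum(col) / len(col) for col in zip(*rows)]
            for gene, rows in grouped.items()}

def log2_transform(gene_values):
    """Apply log2 to every expression value (values must be positive)."""
    return {g: [math.log2(v) for v in vals] for g, vals in gene_values.items()}

def quantile_normalize(gene_values):
    """Give every sample (column) the same distribution: each value is replaced
    by the mean, across samples, of the values holding the same rank."""
    genes = list(gene_values)
    n_samples = len(next(iter(gene_values.values())))
    columns = [[gene_values[g][j] for g in genes] for j in range(n_samples)]
    rank_means = [sum(vals) / n_samples
                  for vals in zip(*(sorted(col) for col in columns))]
    normalized = {g: [0.0] * n_samples for g in genes}
    for j, col in enumerate(columns):
        order = sorted(range(len(genes)), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[genes[i]][j] = rank_means[rank]
    return normalized

# Hypothetical toy data: four probes, two samples.
probes = {"p1": [4.0, 8.0], "p2": [12.0, 8.0], "p3": [2.0, 32.0], "p4": [1.0, 2.0]}
annotation = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B", "p4": "GENE_C"}

genes = average_probes(probes, annotation)
logged = log2_transform(genes)
normalized = quantile_normalize(logged)
print(normalized)
```

After normalization every sample column contains the same set of values, which is what makes expression levels comparable across arrays.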

Clinical information

The expression profiles of GSE10846 and GSE11318 provided clinical information such as age, sex, stage, lactate dehydrogenase (LDH) level, extranodal versus nodal presentation, treatment, subtype, survival time, and survival status. GSE32918 described age, sex, treatment, subtype, survival time, and survival status. In these three data sets, we found that “stage” separated samples into groups with distinct survival times, while “age”, “sex”, and “treatment” did not.

Screening of DEGs

The significance analysis of microarrays (SAM) algorithm21 was adopted to screen out DEGs. It reduces the false-positive rate in multiple testing by controlling the false discovery rate. The relative difference (statistic d) is calculated as follows:

$$d = \frac{\bar{x}_1 - \bar{x}_2}{s + s_0}$$

Statistic d measures the relative difference in gene expression levels and is a corrected t statistic. $\bar{x}_1$ represents the average expression level of a gene under one state, $\bar{x}_2$ represents the average expression level of the gene under the other state, $s$ represents the gene-specific standard deviation, and $s_0$ is the small positive constant that SAM adds to stabilize d for genes with low variability. An adjusted P-value <0.05 and |log fold change| >1.5 were set as the thresholds to select the DEGs.
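As a rough illustration of the relative difference, the sketch below computes d = (mean1 − mean2) / (s + s0) for one gene, where s is the pooled standard error of the difference used by SAM. It makes several assumptions: the expression values are hypothetical, s0 is passed in as a fixed number (SAM itself chooses it from the data), and the permutation step that SAM uses to estimate the false discovery rate is not shown.

```python
import math

def sam_d(group1, group2, s0=0.1):
    """Relative difference d for one gene between two states.

    group1, group2: log-scale expression values of the gene in each state.
    s0: small positive constant (assumed value here, not estimated from data).
    """
    n1, n2 = len(group1), len(group2)
    mean1 = sum(group1) / n1
    mean2 = sum(group2) / n2
    # Pooled sum of squared deviations around each group mean.
    ss = sum((x - mean1) ** 2 for x in group1) + sum((x - mean2) ** 2 for x in group2)
    # Gene-specific scatter: standard error of the difference in means.
    s = math.sqrt((1.0 / n1 + 1.0 / n2) * ss / (n1 + n2 - 2))
    return (mean1 - mean2) / (s + s0)

# Hypothetical log2 expression values for one gene.
tumor = [5.0, 6.0, 7.0]
control = [1.0, 2.0, 3.0]

d_plain_t = sam_d(tumor, control, s0=0.0)   # s0 = 0 gives the ordinary t statistic
d_corrected = sam_d(tumor, control, s0=0.1)
print(d_plain_t, d_corrected)
```

With s0 = 0 the statistic reduces to the usual two-sample t; a positive s0 shrinks d, most strongly for genes whose scatter s is small.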

Functional enrichment analysis

Gene ontology enrichment analysis and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis were performed for the DEGs with DAVID22 to examine the potential altered functions and pathways of these DEGs. False discovery rate <0.05 was set as the cutoff.

Survival analysis

The Kaplan–Meier method (K–M method; product-limit method) is suitable for analysis with small sample sizes. The analysis procedure is as follows: 1) Put the samples in ascending order of survival time and rank them i=1, 2, …, n. 2) List the number of patients surviving at the beginning of each time point (in practice, a short interval). 3) Calculate the probability of death q and the survival probability p (p=1−q) at each time point. 4) Calculate the survival rate S(ti) for each time point, which equals the product of the survival probabilities from the starting point to ti: S(ti)=p1×p2×p3×…×pi. Finally, plot the survival curves with survival time on the abscissa and survival rate on the ordinate. Survival analysis was performed with the function survfit from the R package survival.23 Differences between the survival curves of two groups were analyzed with the log-rank method using the function survdiff from the package survival.24
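The product-limit procedure and the two-group log-rank comparison can be sketched in plain Python as follows. This is an illustration only and not the R code used in the study (which called survfit and survdiff): the function names kaplan_meier and logrank_test and all survival times are hypothetical, and the P-value uses the chi-square distribution with one degree of freedom.

```python
import math

def kaplan_meier(times, events):
    """Product-limit estimate: list of (t_i, S(t_i)) at each distinct death time.

    events[i] is 1 for an observed death and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:   # handle tied times together
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            q = deaths / at_risk        # probability of death at t
            survival *= 1.0 - q         # S(t_i) = p_1 x p_2 x ... x p_i
            curve.append((t, survival))
        at_risk -= leaving              # deaths and censored cases leave the risk set
    return curve

def logrank_test(times1, events1, times2, events2):
    """Two-group log-rank test; returns (chi-square statistic, P-value)."""
    pooled = ([(t, e, 0) for t, e in zip(times1, events1)]
              + [(t, e, 1) for t, e in zip(times2, events2)])
    observed1 = expected1 = variance = 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n = sum(1 for u, _, _ in pooled if u >= t)                  # at risk, both groups
        n1 = sum(1 for u, _, g in pooled if u >= t and g == 0)      # at risk, group 1
        d = sum(1 for u, e, _ in pooled if u == t and e)            # deaths, both groups
        d1 = sum(1 for u, e, g in pooled if u == t and e and g == 0)
        observed1 += d1
        expected1 += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (observed1 - expected1) ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))   # chi-square survival function, 1 df

# Hypothetical survival times (months) and event indicators.
curve = kaplan_meier([2, 3, 3, 5, 8, 9], [1, 1, 0, 1, 0, 1])
chi2, p_value = logrank_test([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [1] * 5)
print(curve)
print(chi2, p_value)
```

Censored cases never lower the curve themselves; they only shrink the number at risk for later time points.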

Screening of risk factors

Cox univariate regression analysis was carried out using the function coxph from the package survival to screen out risk factors related to survival.25 The formula is as follows:

$$h(t, X) = h_0(t)\exp(\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_m X_m)$$

h0(t) is the baseline risk function, that is, the risk function when all covariates X1, X2, …, Xm are 0 or under standard conditions, and it is generally unknown. h(t, X) represents the risk function when each covariate X is given a fixed value, and it is proportional to h0(t). Therefore, the model is also known as the proportional hazards model. X1, X2, …, Xm are covariates while β1, β2, …, βm are regression coefficients. When the regression coefficient β>0, that is, the risk ratio >1, the covariate is a risk factor: the greater the covariate, the shorter the survival time. When the regression coefficient β<0, that is, the risk ratio <1, the covariate is a protective factor: the greater the covariate, the longer the survival time.
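For a single covariate, the regression coefficient β can be estimated by maximizing the Cox partial likelihood. The sketch below does this with Newton-Raphson iterations in plain Python. It is an illustration under assumptions, not the study's code: the function name cox_univariate and the data are hypothetical, ties are handled with the Breslow approximation (R's coxph defaults to the Efron method), and the P-value is a Wald test.

```python
import math

def cox_univariate(times, events, x, n_iter=50, tol=1e-9):
    """Fit h(t, x) = h0(t) * exp(beta * x) for one covariate.

    Returns (beta, hazard ratio exp(beta), Wald P-value).
    """
    beta = 0.0
    info = 0.0
    for _ in range(n_iter):
        score = info = 0.0
        for t_i, e_i, x_i in zip(times, events, x):
            if not e_i:
                continue                      # censored cases add no term of their own
            # Risk set: everyone still under observation at time t_i.
            risk = [(xj, math.exp(beta * xj)) for tj, xj in zip(times, x) if tj >= t_i]
            s0 = sum(w for _, w in risk)
            s1 = sum(xj * w for xj, w in risk)
            s2 = sum(xj * xj * w for xj, w in risk)
            score += x_i - s1 / s0                 # first derivative of log partial likelihood
            info += s2 / s0 - (s1 / s0) ** 2       # negative second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    z = beta * math.sqrt(info)                     # beta / standard error
    p_value = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal P-value
    return beta, math.exp(beta), p_value

# Hypothetical data: survival time, event indicator, expression level of one gene.
times = [5, 8, 12, 3, 9, 15, 7, 2]
events = [1, 1, 0, 1, 1, 0, 1, 1]
expression = [2.1, 1.5, 0.7, 2.8, 1.9, 0.4, 1.2, 2.5]

beta, hazard_ratio, p_value = cox_univariate(times, events, expression)
print(beta, hazard_ratio, p_value)
```

In this toy example patients with higher expression tend to die earlier, so the fitted β is positive and the hazard ratio exceeds 1. Running the same function once per gene gives a univariate screen of the kind described above.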

Results

Differentially expressed genes and enriched biological functions

According to the aforementioned criteria, a total of 437 DEGs were identified in DLBCL from the data set GSE9327 and 1,457 DEGs from the data set GSE30881. Thirty-one genes overlapped between the two sets. Functional enrichment analysis showed that these genes are mainly involved in the regulation of lymphocyte activation, immune response, and interleukin-mediated signaling pathway (Figure 1), suggesting that the 31 DEGs were closely associated with the development of DLBCL.
Figure 1

Functional enrichment analysis result for the 31 differentially expressed genes (DEGs) (top 20 gene ontology [GO] terms ranked by the significance).

Notes: X-axis represents the adjusted P-value transformed by log2, and Y-axis denotes the enriched GO terms.

Abbreviation: IL, interleukin.

Moreover, 47 DLBCL-related genes were acquired via literature retrieval.2,15,26–31

Survival analysis result

The 31 DEGs and 47 DLBCL-related genes were combined, giving a total of 78 potential prognostic genes, which were used to classify samples with different survival times from the other three data sets. In the data set of GSE10846, 71 out of the 78 genes were detected. Using hierarchical clustering, the 71 genes clustered the 416 DLBCL samples into four subtypes (Figure 2A). The differences in survival curves of the four subtypes were found to be significant (P=7.65e−11; Figure 2B).
Figure 2

Subtyping of diffuse large B-cell lymphoma (DLBCL) in three gene data sets using the 78 predicted and curated DLBCL-related genes.

Notes: (A, C, and E) Hierarchical clustering that denotes the subtypes of DLBCL clustered by the 78 genes in the gene data sets of GSE10846, GSE11318, and GSE32918, respectively; (B, D, and F) Kaplan–Meier survival curves of the subtypes in the gene data sets of GSE10846, GSE11318, and GSE32918, respectively.

In the data set of GSE11318, 71 out of the 78 genes were detected. Using hierarchical clustering, the 71 genes classified the 203 DLBCL samples into three subtypes (Figure 2C). The difference in survival curves of the three subtypes was found to be significant (P=7.5e−05; Figure 2D). In the data set of GSE32918, 69 out of the 78 genes were detected. Some samples were assayed repeatedly, and thus average expression levels were calculated as the final values. Using hierarchical clustering, the 69 genes clustered the 172 DLBCL samples into three subtypes (Figure 2E). The difference in survival curves of the three subtypes was found to be significant (P=0.013; Figure 2F).
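The clustering step can be illustrated with a minimal agglomerative procedure. The text does not state which distance measure or linkage rule was used, so the Euclidean distance and average linkage below are assumptions, as are the function names and the toy expression vectors.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hierarchical_clusters(samples, k):
    """Agglomerative clustering with average linkage.

    samples: {sample name: expression vector over the selected genes}.
    Starts with one cluster per sample and merges the closest pair of
    clusters until only k clusters remain.
    """
    clusters = [[name] for name in samples]

    def linkage(c1, c2):
        # Average of all pairwise distances between the two clusters.
        total = sum(euclidean(samples[a], samples[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = linkage(clusters[i], clusters[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical expression of three genes in five samples.
samples = {
    "s1": [1.0, 1.1, 0.9],
    "s2": [1.2, 0.9, 1.0],
    "s3": [5.0, 5.2, 4.9],
    "s4": [5.1, 4.8, 5.0],
    "s5": [9.0, 9.1, 8.8],
}
subtypes = hierarchical_clusters(samples, k=3)
print(subtypes)
```

Each resulting cluster can then be treated as a candidate subtype, and the survival curves of the clusters compared with the log-rank method.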

Prognostic genes

The correlation between each gene and the survival of DLBCL patients was calculated with Cox univariate regression analysis to further screen out genes with prognostic value. In the data set of GSE10846, 45 genes were found to have a significant prognostic effect, while in GSE11318, 33 genes had a prognostic effect, and in GSE32918, 11 genes showed prognostic value. Five prognostic genes were common among the three data sets (Figure 3; Table 1). According to the coefficient, lymphocyte cytosolic protein 2 (LCP2) and tumor necrosis factor receptor superfamily member 9 (TNFRSF9) might be related to poor prognosis while fucosyltransferase 8 (FUT8), interferon regulatory factor 4 (IRF4), and transducin-like enhancer of split 1 (TLE1) might be associated with favorable prognosis.
Figure 3

Venn diagram of the prognostic genes from three gene expression data sets (GSE10846, GSE11318, and GSE32918).

Table 1

Five common prognostic genes

Gene name  GSE10846                 GSE11318                 GSE32918
           P-value    Coefficient   P-value    Coefficient   P-value    Coefficient
FUT8       1.07E–05   0.340027      0.010986   0.247543      0.035247   0.276876
IRF4       0.000575   0.249417      0.00936    0.261503      0.039336   0.235775
LCP2       0.027791   −0.18334      0.026033   −0.24189      0.009313   −0.50978
TLE1       0.00152    0.222874      3.28E–05   0.354751      0.001212   0.357821
TNFRSF9    3.65E–08   −0.38752      0.005578   −0.24641      0.045842   −0.23852
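Because the Cox model is exponential in the covariate, each coefficient in Table 1 corresponds to a risk ratio of exp(coefficient) per unit increase in expression. The short sketch below converts the tabulated coefficients and checks that each gene has the same sign in all three data sets. The coefficients are copied from Table 1; the helper names are hypothetical.

```python
import math

datasets = ["GSE10846", "GSE11318", "GSE32918"]

# Cox regression coefficients from Table 1, in the order of `datasets`.
table1 = {
    "FUT8":    [0.340027, 0.247543, 0.276876],
    "IRF4":    [0.249417, 0.261503, 0.235775],
    "LCP2":    [-0.18334, -0.24189, -0.50978],
    "TLE1":    [0.222874, 0.354751, 0.357821],
    "TNFRSF9": [-0.38752, -0.24641, -0.23852],
}

def risk_ratios(coefficients):
    """Risk ratio per unit increase of the covariate: exp(beta)."""
    return [math.exp(b) for b in coefficients]

def same_sign(coefficients):
    """True when every coefficient is positive or every coefficient is negative."""
    return all(b > 0 for b in coefficients) or all(b < 0 for b in coefficients)

for gene, coefficients in table1.items():
    ratios = ", ".join(f"{name}: {ratio:.2f}"
                       for name, ratio in zip(datasets, risk_ratios(coefficients)))
    print(f"{gene:8s} same sign in all data sets: {same_sign(coefficients)}; {ratios}")
```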

Discussion

In this study, five gene expression data sets were downloaded from the Gene Expression Omnibus. Thirty-one common DEGs were identified from two gene expression data sets; they were mainly enriched in the regulation of lymphocyte activation, immune response, and interleukin-mediated signaling pathway, which are closely associated with the development of DLBCL. Combined with 47 DLBCL-related genes acquired by literature retrieval, 78 potential prognostic genes were obtained, which successfully clustered the DLBCL samples from the other three gene expression data sets into several subtypes with significant differences in survival. Prognostic genes were screened out via Cox univariate regression analysis, and five common genes were identified, namely LCP2, TNFRSF9, FUT8, IRF4, and TLE1. TNFRSF9 and IRF4 are two known prognostic genes of DLBCL.32,33 TNFRSF9 is a member of the TNF-receptor superfamily that can induce proliferation in peripheral monocytes. Alizadeh et al32 indicate that expression levels of LIM domain only 2 (LMO2) and TNFRSF9 powerfully predict the overall survival in patients with DLBCL. TNFRSF9 can also serve as a target for treating DLBCL. The study by Houot et al34 demonstrates that anti-CD137 therapy (CD137 is the protein encoded by TNFRSF9) has potent antilymphoma activity in a mouse model. IRF4 belongs to the interferon regulatory factor (IRF) family of transcription factors. Salaverria et al35 report that translocations activating IRF4 identify a subtype of germinal center-derived B-cell lymphoma affecting predominantly children and young adults. Therefore, it may be a therapeutic target in DLBCL.36 LCP2, FUT8, and TLE1 may be novel prognostic genes of DLBCL. LCP2 plays a positive role in promoting T-cell development and activation as well as mast cell and platelet function. FUT8 is an enzyme belonging to the family of fucosyltransferases.
It may contribute to the malignancy of cancer cells and to their invasive and metastatic capabilities.37 Chen et al38 found that FUT8 is upregulated during epithelial–mesenchymal transition via the transactivation of β-catenin/lymphoid enhancer-binding factor (LEF)-1. Based on these findings, we speculate that FUT8 might play a similar role in DLBCL and thus contribute to the metastasis of DLBCL. TLE1 is a multitasked transcriptional corepressor that acts through the acute myelogenous leukemia 1, Wnt, and Notch signaling pathways. Promoter CpG island hypermethylation-associated inactivation of TLE1 has been observed in DLBCL.39 Fraga et al40 further point out that TLE1 epigenetic inactivation contributes to the development of hematologic malignancies by disrupting critical differentiation and growth-suppressing pathways. However, the exact role of TLE1 in DLBCL remains to be explored. We suppose that further research may unveil clinical applications of the three genes. Overall, five critical genes with prognostic effect were disclosed in DLBCL via bioinformatic analysis of existing gene expression data. Two of the five genes have been reported previously, while the other three are novel predictors. Further research on these genes can benefit molecular subtyping and also provide potential therapeutic targets for DLBCL. A set of 31 common DEGs was identified from two gene expression data sets. In total, 78 potential prognostic genes were suggested for use in subtyping of DLBCL. Five prognostic genes, including three novel ones, were identified in DLBCL.
References (10 of 39 listed)

1.  Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning.

Authors:  Margaret A Shipp; Ken N Ross; Pablo Tamayo; Andrew P Weng; Jeffery L Kutok; Ricardo C T Aguiar; Michelle Gaasenbeek; Michael Angelo; Michael Reich; Geraldine S Pinkus; Tane S Ray; Margaret A Koval; Kim W Last; Andrew Norton; T Andrew Lister; Jill Mesirov; Donna S Neuberg; Eric S Lander; Jon C Aster; Todd R Golub
Journal:  Nat Med       Date:  2002-01       Impact factor: 53.440

2.  A gene expression-based method to diagnose clinically distinct subgroups of diffuse large B cell lymphoma.

Authors:  George Wright; Bruce Tan; Andreas Rosenwald; Elaine H Hurt; Adrian Wiestner; Louis M Staudt
Journal:  Proc Natl Acad Sci U S A       Date:  2003-08-04       Impact factor: 11.205

3.  Distinct types of primary cutaneous large B-cell lymphoma identified by gene expression profiling.

Authors:  Juliette J Hoefnagel; Remco Dijkman; Katia Basso; Patty M Jansen; Christian Hallermann; Rein Willemze; Cornelis P Tensen; Maarten H Vermeer
Journal:  Blood       Date:  2004-08-12       Impact factor: 22.113

4.  [Nonparametric method of estimating survival functions containing right-censored and interval-censored data].

Authors:  Yonghong Xu; Xiaohuan Gao; Zhengxi Wang
Journal:  Sheng Wu Yi Xue Gong Cheng Xue Za Zhi       Date:  2014-04

5.  Translocations activating IRF4 identify a subtype of germinal center-derived B-cell lymphoma affecting predominantly children and young adults.

Authors:  Itziar Salaverria; Claudia Philipp; Ilske Oschlies; Christian W Kohler; Markus Kreuz; Monika Szczepanowski; Birgit Burkhardt; Heiko Trautmann; Stefan Gesk; Miroslaw Andrusiewicz; Hilmar Berger; Miriam Fey; Lana Harder; Dirk Hasenclever; Michael Hummel; Markus Loeffler; Friederike Mahn; Idoia Martin-Guerrero; Shoji Pellissery; Christiane Pott; Michael Pfreundschuh; Alfred Reiter; Julia Richter; Maciej Rosolowski; Carsten Schwaenen; Harald Stein; Lorenz Trümper; Swen Wessendorf; Rainer Spang; Ralf Küppers; Wolfram Klapper; Reiner Siebert
Journal:  Blood       Date:  2011-04-12       Impact factor: 22.113

6.  High microvessel density determines a poor outcome in patients with diffuse large B-cell lymphoma treated with rituximab plus chemotherapy.

Authors:  Teresa M Cardesa-Salzmann; Luis Colomo; Gonzalo Gutierrez; Wing C Chan; Dennis Weisenburger; Fina Climent; Eva González-Barca; Santiago Mercadal; Leonor Arenillas; Sergio Serrano; Ray Tubbs; Jan Delabie; Randy D Gascoyne; Joseph M Connors; Jose L Mate; Lisa Rimsza; Rita Braziel; Andreas Rosenwald; Georg Lenz; George Wright; Elaine S Jaffe; Louis Staudt; Pedro Jares; Armando López-Guillermo; Elias Campo
Journal:  Haematologica       Date:  2011-05-05       Impact factor: 9.941

7.  Prognostic significance of VEGF, VEGF receptors, and microvessel density in diffuse large B cell lymphoma treated with anthracycline-based chemotherapy.

Authors:  Dita Gratzinger; Shuchun Zhao; Robert J Tibshirani; Eric D Hsi; Christine P Hans; Brad Pohlman; Martin Bast; Abraham Avigdor; Ginette Schiby; Arnon Nagler; Gerald E Byrne; Izidore S Lossos; Yasodha Natkunam
Journal:  Lab Invest       Date:  2007-11-12       Impact factor: 5.662

8.  Therapeutic effect of CD137 immunomodulation in lymphoma and its enhancement by Treg depletion.

Authors:  Roch Houot; Matthew J Goldstein; Holbrook E Kohrt; June H Myklebust; Ash A Alizadeh; Jack T Lin; Jonathan M Irish; James A Torchia; Arne Kolstad; Lieping Chen; Ronald Levy
Journal:  Blood       Date:  2009-07-29       Impact factor: 22.113

9.  Comparative gene expression profiling identifies common molecular signatures of NF-κB activation in canine and human diffuse large B cell lymphoma (DLBCL).

Authors:  Manikhandan A V Mudaliar; Ross D Haggart; Gino Miele; Grant Sellar; Karen A L Tan; John R Goodlad; Elspeth Milne; David M Vail; Ilene Kurzman; Daniel Crowther; David J Argyle
Journal:  PLoS One       Date:  2013-09-04       Impact factor: 3.240

10.  SPIB and BATF provide alternate determinants of IRF4 occupancy in diffuse large B-cell lymphoma linked to disease heterogeneity.

Authors:  Matthew A Care; Mario Cocco; Jon P Laye; Nicholas Barnes; Yuanxue Huang; Ming Wang; Sharon Barrans; Ming Du; Andrew Jack; David R Westhead; Gina M Doody; Reuben M Tooze
Journal:  Nucleic Acids Res       Date:  2014-05-29       Impact factor: 16.971

