S Blenk, J Engelmann, M Weniger, J Schultz, M Dittrich, A Rosenwald, H K Müller-Hermelink, T Müller, T Dandekar.
Abstract
Aiming to find key genes and events, we analyze a large data set on diffuse large B-cell lymphoma (DLBCL) gene expression (248 patients, 12196 spots). Applying the loess normalization method to these raw data yields improved survival predictions, in particular for the clinically important group of patients with medium survival time. Furthermore, we identify a simplified prognosis predictor, which stratifies different risk groups about as well as complex signatures. We identify specific genes that distinguish activated B cell-like (ABC) from germinal center B cell-like (GCB) DLBCL. These include early (e.g. CDKN3) and late (e.g. CDKN2C) cell cycle genes. Independently of previous classification by marker genes, we confirm a clear binary class distinction between the ABC and GCB subgroups. An earlier suggested third entity is not supported. A key regulatory network, distinguishing marked over-expression in ABC from that in GCB, is built by: ASB13, BCL2, BCL6, BCL7A, CCND2, COL3A1, CTGF, FN1, FOXP1, IGHM, IRF4, LMO2, LRMP, MAPK10, MME, MYBL1, NEIL1 and SH3BP5. It predicts and supports the aggressive behaviour of the ABC subgroup. These results help to understand target interactions and to improve subgroup diagnosis, risk prognosis and therapy in the ABC and GCB DLBCL subgroups.
Keywords: cancer; gene expression; immunity; prognosis; regulation
Year: 2007 PMID: 19455257 PMCID: PMC2675856
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Figure 2. Prognosis prediction applying a molecular predictor of 6 gene spots after improved normalization. Kaplan-Meier plots show large differences in the survival rate for all risk groups. They are estimated by a Cox-Regression Hazard model of the genes listed in Table 1. Normalization was improved applying the “loess” method. x-axis: time (years); y-axis: probability of survival, predicted for the risk groups “low”, “medium” and “high”.
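As an illustration of how a Kaplan-Meier curve like those in this figure is computed, the following minimal sketch implements the product-limit estimator in plain Python. The function name `kaplan_meier` and the toy follow-up data are assumptions made for this example; they are not taken from the study, and the authors' own software is not described here.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed death.

    times  -- follow-up time per patient (e.g. in years)
    events -- 1 if death was observed at that time, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    survival, curve = 1.0, []
    for t in sorted(deaths):
        # Patients still under observation just before time t
        at_risk = sum(1 for u in times if u >= t)
        # Product-limit step: multiply by the fraction surviving time t
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve

# Toy data (hypothetical, not from the DLBCL data set)
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(toy_times, toy_events):
    print(f"t={t}: S={s:.2f}")
```

To reproduce a plot with one curve per risk group, the same function would be applied separately to the patients assigned to the “low”, “medium” and “high” groups.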
Figure 1. DLBCL splits into sub-groups independent of signatures. Optimal bipartitions of patients are calculated by ISIS based on optimal bipartition subsets of genes (50). Every column of the x-axis represents a patient. On the bottom, the DLBCL-type of the patient is labelled. On the y-axis every row shows the bipartitions ranked in increasing score of separation quality. The three best bipartitions show a very consistent and clear signal separating the ABC- from the GCB-patients. The unsupervised method ISIS reveals the ABC-GCB classification independent of proliferation signatures. No evidence for a previously suggested third group “Type 3” was found. Only a few patients are falsely assigned if compared to the DLBCL gene signature assignment.
Multivariate Cox regression hazard models.
| 1 | HGAL | Germ-S | ACTa1 | HLA-DRA |
| 2 | HGAL | CD54(2) | ACTa1 | HLA-DRA |
| 3 | HGAL | CD54(2) | HLA-DRA(2) | ACTa1 |
| 4 | HGAL | CD54(2) | HLA-DRA(3) | ACTa1 |
| 5 | HGAL | ACTa1 | HLA-DRA | CD54 |
| 6 | HGAL | MHCIIDQa1 | CD54(2) | ACTa1 |
| 7 | HGAL | CD54(2) | MHCIIDRb | ACTa1 |
| 8 | HGAL | Germ-S | MHCIIDRb | ACTa1 |
| 9 | HGAL | Germ-S | HLA-DRA(2) | ACTa1 |
| 10 | HGAL | Germ-S | HLA-DRA(3) | ACTa1 |
A heuristic search of multivariate Cox regression hazard models revealed these 10 best-fitting models. All possible multivariate Cox regression hazard models of four genes from 36 important genes for diffuse large B-cell lymphoma and the metabolic genes LDH, IDH and PDH were calculated, and these ten gene sets fit best. Genes are abbreviated according to GenBank nomenclature.
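The enumeration of all four-gene models can be sketched as follows. This is only an outline under stated assumptions: `best_subsets` is a hypothetical helper, and `toy_score` is a placeholder for whatever goodness-of-fit criterion is used to rank the Cox models (the caption does not name it). A real analysis would fit a Cox model for each subset and return its fit criterion instead.

```python
from itertools import combinations
from math import comb

def best_subsets(genes, score, size=4, keep=10):
    """Score every `size`-gene subset and return the `keep` best, highest score first."""
    ranked = sorted(combinations(genes, size), key=score, reverse=True)
    return ranked[:keep]

# Hypothetical per-gene weights standing in for a real model-fit criterion
toy_weight = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1, "F": 0}

def toy_score(subset):
    # Placeholder: a real scorer would fit a Cox model on these genes
    return sum(toy_weight[g] for g in subset)

top = best_subsets(list(toy_weight), toy_score, size=4, keep=3)
print(comb(len(toy_weight), 4))  # number of candidate four-gene models
print(top[0])                    # best-scoring subset
```

Because the number of subsets grows as a binomial coefficient, `math.comb` gives a quick check of how many models an exhaustive search has to fit before it is started.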
Next best multivariate Cox regression hazard models.
| 1 | CD10 | IRF4 | HLA-DRb5 | LDH(2) |
| 2 | IRF4(2) | BCL7A | HLA-DRb5 | LDH(2) |
| 3 | MYC | IRF4(2) | HLA-DRb5 | LDH |
| 4 | MYC | IRF4(2) | HLA-DQa1 | LDH |
| 5 | PLAU | IRF4 | BCL7A | HLA-DRb5 |
| 6 | IRF4 | BCL7A | HLA-DRb5 | LDH(2) |
| 7 | PLAU | IRF4(2) | BCL7A | HLA-DRb5 |
| 8 | IRF4 | BCL6 | BCL7A | HLA-DRb5 |
| 9 | CD10 | IRF4(2) | HLA-DRb5 | LDH(2) |
| 10 | MYC | IRF4(2) | HLA-DRb5 | LDH(2) |
If the genes appearing in Table S1 are removed and the heuristic search of multivariate Cox regression hazard models is redone, these ten models are the next best fitting. The genes are represented by their GenBank abbreviation. The metabolic marker LDH from the IPI score occurs in the four best-fitting models as well as in the majority of the models.
Figure 3. Early and late cell cycle genes are overrepresented in the best separating cell cycle gene set. The density plot compares the distribution of different cell cycle gene sets. x-axis: cell cycle states (from 0 to 99; complete cell cycle). y-axis: relative frequencies. Black line: density of all mapped cell cycle genes of de Lichtenberg et al. (2005) in the data set. The area under this line is coloured for easier comparison. Blue line: Optimal separating subset of cell cycle genes (77 spots). Two peaks in the early and late cell cycle states show cell cycle gene expression differences between the subgroups ABC and GCB.
Genes which distinguish best between ABC and GCB according to PAM analysis.
| 1 | MYBL1 |
| 2 | *Centerin |
| 3 | FOXP1 |
| 4 | LOC96597 |
| 5 | SH3BP5 |
| 6 | KIAA0864 |
| 7 | IRF4 |
| 8 | ASB13 |
| 9 | *Similar to human endogenous retrovirus-4 Clone=417048 |
| 10 | NEIL1 |
| 11 | MME |
| 12 | IGHM |
| 13 | LMO2 |
| 14 | LOC152137 |
| 15 | KIAA1039 |
| 16 | LRMP |
| 17 | FLJ12363 |
| 18 | CCND2 |
Of all twelve thousand spots on the lymphoma chip, the listed genes distinguish best between ABC and GCB according to PAM analysis. The best separating genes are listed at the top.
Classical lymphoma genes.
| 1 | BCL6 |
| 2 | BRAF |
| 3 | ARAF1 |
| 4 | RAF1 |
| 5 | RAS |
| 6 | MEK |
| 7 | MAP |
| 8 | HLA-DPα |
| 9 | HLA-DQα |
| 10 | HLA-DRα |
| 11 | HLA-DRβ |
| 12 | α-Actinin |
| 13 | COL3A1 |
| 14 | Connective-tissue growth factor |
| 15 | FN1 |
| 16 | KIAA0233 |
| 17 | PLAUR |
| 18 | E2IG3 |
| 19 | NPM3 |
| 20 | BMP6 |
| 21 | CASP10 |
| 22 | POU2AF1 |
| 23 | CDKN2A |
| 24 | MYC |
| 25 | BCL2 |
| 26 | FCGR2B |
| 27 | CyclinD1 |
| 28 | NFKB2 |
| 29 | PAX5 |
| 30 | BCL10 |
| 31 | CDK6 |
| 32 | DDX6 |
| 33 | BCL7A |
| 34 | CyclinD2 |
| 35 | IL-10 |
| 36 | LDH |
| 37 | IDH |
| 38 | PDH |
Lymphoma-associated genes were collected from the literature and were also found in the data set. Furthermore, we added the metabolic enzymes “lactate dehydrogenase” (LDH), “isocitrate dehydrogenase” (IDH) and “pyruvate dehydrogenase” (PDH). The latter are represented in the data by the genes PDHB, PDHA1, IDH3A, IDH3G, IDH3B, IDH1, IDH3B, IDH3A, LDHB and LDHA.
Classical marker genes of lymphoma disease distinguish between ABC and GCB lymphoma subtype (PAM analysis; error rates for this gene set: TR: 10%; VAL: 15.38%; F:CV: 14%).
| 1 | FN1 |
| 2 | BCL6 |
| 3 | CTGF |
| 4 | BCL2 |
| 5 | MAPK10 |
| 6 | CCND2 |
| 7 | COL3A1 |
| 8 | KIAA0233 |
| 9 | BCL7A |
Lymphochip spots of known lymphoma genes.
| 19384 | MAPK10 |
| 24787 | CCND2 |
| 15914 | MAPK10 |
| 24429 | BCL6 |
| 28472 | MAPK10 |
| 19268 | BCL6 |
| 16858 | CCND2 |
| 17646 | BCL2 |
| 16789 | BCL2 |
| 19361 | COL3A1 |
| 26535 | BCL6 |
| 28859 | BCL2 |
| 24367 | BCL2 |
| 17791 | FN1 |
| 16016 | FN1 |
| 16732 | FN1 |
| 31398 | FN1 |
| 19379 | FN1 |
| 27499 | KIAA0233 |
| 24415 | BCL7A |
| 29222 | CTGF |
180 spots known to be associated with lymphoma were tested by PAM analysis for their ability to distinguish between the ABC and GCB subtypes. Successful genes are given in descending order (gene set error rates: TR: 10%; VAL: 15.38%; F:CV: 14%).
Combined classifier for lymphoma subtypes.
| 24376 | *Centerin |
| 17496 | MYBL1 |
| 28014 | MYBL1 |
| 19326 | IGHM |
| 19254 | MME |
| 33991 | FOXP1 |
| 19384 | MAPK10 |
| 19375 | FOXP1 |
| 16049 | IGHM |
| 26454 | SH3BP5 |
| 22118 | KIAA0864 |
| 24787 | CCND2 |
| 24787 | CCND2 |
| 28979 | LMO2 |
| 15914 | MAPK10 |
| 19346 | SH3BP5 |
| 15864 | MME |
| 19238 | LMO2 |
| 30263 | ASB13 |
| 19291 | MYBL1 |
| 19312 | NEIL1 |
| 25036 | FLJ12363 |
| 26385 | MME |
| 19227 | LOC96597 |
| 22122 | IRF4 |
| 16886 | LRMP |
| 24480 | KIAA1039 |
| 27378 | LRMP |
| 27379 | LRMP |
| 24729 | IRF4 |
| 27673 | LRMP |
| 19348 | *Similar to |
| 24429 | BCL6 |
| 28472 | MAPK10 |
| 26516 | *Similar clone=417048 |
| 19268 | BCL6 @Homo sapiH08 (LOC152137) Sur_clone=232 |
| 32529 | 2321 |
| 17646 | BCL2 |
The resulting gene list distinguishes ABC and GCB when the PAM analysis is performed only on the 31 best spots merged with the well-known lymphoma genes. Marked in grey are the 31 best spots from all twelve thousand spots compared. Remarkably, the two classical lymphoma marker genes MAPK10 and CCND2 reach a similar quality in distinguishing ABC and GCB as the best separating ones.
Cell cycle gene set that best distinguishes ABC and GCB subgroup. The genes are annotated by their spot ID, Ensembl gene ID and their gene name. Additionally the cell cycle states are given. The latter parameter shows a strong signal in the early and late cell cycle states compared with all available cell cycle states in the data set.
| 24927 | ENSG00000165810 | 85 | BTNL9 |
| 33929 | ENSG00000165810 | 85 | BTNL9 |
| 26913 | ENSG00000138764 | 72 | CCNG2 |
| 24750 | ENSG00000136244 | 80 | IL6 |
| 32430 | ENSG00000162783 | 56 | IER5 |
| 24491 | ENSG00000165810 | 85 | BTNL9 |
| 30172 | ENSG00000138764 | 72 | CCNG2 |
| 24930 | ENSG00000187837 | 69 | HIST1H1C |
| 24725 | ENSG00000011007 | 59 | TCEB3 |
| 24908 | ENSG00000118515 | 83 | SGK |
| 30355 | ENSG00000164330 | 84 | EBF |
| 32096 | ENSG00000164330 | 84 | EBF |
| 31931 | ENSG00000164543 | 18 | STK17A |
| 26081 | ENSG00000180447 | 80 | GAS1 |
| 19374 | ENSG00000124762 | 21 | CDKN1A |
| 24969 | ENSG00000164330 | 84 | EBF |
| 24647 | ENSG00000164330 | 84 | EBF |
| 34708 | ENSG00000118515 | 83 | SGK |
| 27774 | ENSG00000134058 | 92 | CDK7 |
| 26401 | ENSG00000118515 | 83 | SGK |
| 26725 | ENSG00000164330 | 84 | EBF |
| 28881 | ENSG00000163918 | 52 | RFC4 |
| 17786 | ENSG00000102804 | 1 | TSC22D1 |
| 24613 | ENSG00000102804 | 1 | TSC22D1 |
| 33901 | ENSG00000100644 | 2 | HIF1A |
| 27538 | ENSG00000171656 | 96 | ETV5 |
| 27952 | ENSG00000179583 | 76 | CIITA |
| 34557 | ENSG00000052841 | 2 | TTC17 |
| 30021 | ENSG00000099953 | 95 | MMP11 |
| 27704 | ENSG00000164330 | 84 | EBF |
| 26992 | ENSG00000102804 | 1 | TSC22D1 |
| 26344 | ENSG00000138764 | 72 | CCNG2 |
| 24832 | ENSG00000163918 | 52 | RFC4 |
| 26080 | ENSG00000163739 | 76 | CXCL1 |
| 33329 | ENSG00000179583 | 76 | CIITA |
| 17290 | ENSG00000134058 | 92 | CDK7 |
| 30922 | ENSG00000185658 | 5 | BRWD1 |
| 26162 | ENSG00000135541 | 91 | AHI1 |
| 34288 | ENSG00000134884 | 48 | NA |
| 33646 | ENSG00000185658 | 5 | BRWD1 |
| 26951 | ENSG00000102804 | 1 | TSC22D1 |
| 24977 | ENSG00000153936 | 92 | HS2ST1 |
| 16661 | ENSG00000123080 | 75 | CDKN2C |
| 25942 | ENSG00000145050 | 49 | ARMET |
| 22163 | ENSG00000169926 | 6 | KLF13 |
| 17405 | ENSG00000178573 | 30 | MAF |
| 27275 | ENSG00000100644 | 2 | HIF1A |
| 30415 | ENSG00000164330 | 84 | EBF |
| 34484 | ENSG00000151150 | 50 | ANK3 |
| 33221 | ENSG00000065809 | 2 | FAM107B |
| 32218 | ENSG00000179583 | 76 | CIITA |
| 29637 | ENSG00000145632 | 99 | PLK2 |
| 27939 | ENSG00000179583 | 76 | CIITA |
| 27328 | ENSG00000108984 | 44 | MAP2K6 |
| 28792 | ENSG00000099326 | 53 | ZNF42 |
| 30725 | ENSG00000175455 | 65 | CCDC14 |
| 16736 | ENSG00000136244 | 80 | IL6 |
| 30874 | ENSG00000081320 | 77 | STK17B |
| 28707 | ENSG00000123080 | 75 | CDKN2C |
| 33336 | ENSG00000175455 | 65 | CCDC14 |
| 15871 | ENSG00000168310 | 7 | IRF2 |
| 28640 | ENSG00000100526 | 0 | CDKN3 |
| 28748 | ENSG00000136244 | 80 | IL6 |
| 28430 | ENSG00000168310 | 7 | IRF2 |
| 26084 | ENSG00000128590 | 38 | DNAJB9 |
| 30859 | ENSG00000117650 | 93 | NEK2 |
| 28674 | ENSG00000138061 | 66 | CYP1B1 |
| 16127 | ENSG00000138061 | 66 | CYP1B1 |
| 24868 | ENSG00000012963 | 52 | C14orf130 |
| 30508 | ENSG00000081320 | 77 | STK17B |
| 34108 | ENSG00000169926 | 6 | KLF13 |
| 16053 | ENSG00000173757 | 83 | STAT5B |
| 16091 | ENSG00000100526 | 0 | CDKN3 |
| 33594 | ENSG00000179583 | 76 | CIITA |
| 32924 | ENSG00000185658 | 5 | BRWD1 |
| 32766 | ENSG00000135164 | 74 | DMTF1 |
| 16597 | ENSG00000109971 | 0 | HSPA8 |
The cell cycle genes, which were chosen to distinguish the ABC and the GCB group.
| ENSG00000011007 | 59 | TCEB3 |
| ENSG00000012963 | 52 | C14orf130 |
| ENSG00000052841 | 2 | TTC17 |
| ENSG00000065809 | 2 | FAM107B |
| ENSG00000081320 | 77 | STK17B |
| ENSG00000099326 | 53 | ZNF42 |
| ENSG00000099953 | 95 | MMP11 |
| ENSG00000100526 | 0 | CDKN3 |
| ENSG00000100644 | 2 | HIF1A |
| ENSG00000102804 | 1 | TSC22D1 |
| ENSG00000108984 | 44 | MAP2K6 |
| ENSG00000109971 | 0 | HSPA8 |
| ENSG00000117650 | 93 | NEK2 |
| ENSG00000118515 | 83 | SGK |
| ENSG00000123080 | 75 | CDKN2C |
| ENSG00000124762 | 21 | CDKN1A |
| ENSG00000128590 | 38 | DNAJB9 |
| ENSG00000134058 | 92 | CDK7 |
| ENSG00000134884 | 48 | NA |
| ENSG00000135164 | 74 | DMTF1 |
| ENSG00000135541 | 91 | AHI1 |
| ENSG00000136244 | 80 | IL6 |
| ENSG00000138061 | 66 | CYP1B1 |
| ENSG00000138764 | 72 | CCNG2 |
| ENSG00000145050 | 49 | ARMET |
| ENSG00000145632 | 99 | PLK2 |
| ENSG00000151150 | 50 | ANK3 |
| ENSG00000153936 | 92 | HS2ST1 |
| ENSG00000162783 | 56 | IER5 |
| ENSG00000163739 | 76 | CXCL1 |
| ENSG00000163918 | 52 | RFC4 |
| ENSG00000164330 | 84 | EBF |
| ENSG00000164543 | 18 | STK17A |
| ENSG00000165810 | 85 | BTNL9 |
| ENSG00000168310 | 7 | IRF2 |
| ENSG00000169926 | 6 | KLF13 |
| ENSG00000171656 | 96 | ETV5 |
| ENSG00000173757 | 83 | STAT5B |
| ENSG00000175455 | 65 | CCDC14 |
| ENSG00000178573 | 30 | MAF |
| ENSG00000179583 | 76 | CIITA |
| ENSG00000180447 | 80 | GAS1 |
| ENSG00000185658 | 5 | BRWD1 |
| ENSG00000187837 | 69 | HIST1H1C |
The cell cycle genes annotated by their Ensembl gene ID and their gene name. Additionally the cell cycle states are annotated. The latter parameter shows a strong signal in the early and late cell cycle states compared with all available cell cycle states in the data set.
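The distribution of cell cycle states in this gene table can be tabulated with a short sketch. The state values are copied from the table (one per gene); the bin width of 20 and the helper name `bin_counts` are assumptions chosen for illustration. Note that this counts genes rather than spots and shows no background distribution, so it is not a reproduction of the density plot in Figure 3.

```python
from collections import Counter

# Cell cycle state per gene, copied from the gene table (one value per gene)
states = [59, 52, 2, 2, 77, 53, 95, 0, 2, 1, 44, 0, 93, 83, 75, 21, 38, 92,
          48, 74, 91, 80, 66, 72, 49, 99, 50, 92, 56, 76, 52, 84, 18, 85,
          7, 6, 96, 83, 65, 30, 76, 80, 5, 69]

def bin_counts(values, width=20, upper=100):
    """Count values per bin [0, width), [width, 2*width), ... up to `upper`."""
    counts = Counter(v // width for v in values)
    return [counts.get(b, 0) for b in range(upper // width)]

counts = bin_counts(states)
for b, n in enumerate(counts):
    # Simple text histogram, one row per bin
    print(f"{b * 20:2d}-{b * 20 + 19:2d}: {'#' * n} ({n})")
```

With this binning, the first and last bins hold the most genes, which is in line with the early and late signal described in the caption.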
Gene expression values of the main regulatory network distinguishing ABC and GCB.
| ASB13 | − | + |
| MYBL1 | − | + |
| MME | − | + |
| MAPK10 | − | + |
| LRMP | − | + |
| LMO2 | − | + |
| FN1 | − | + |
| CTGF | − | + |
| COL3A1 | − | + |
| BCL6 | − | + |
| BCL7A | − | + |
| NEIL1 | − | + |
| SH3BP5 | + | − |
| BCL2 | + | − |
| CCND2 | + | − |
| IRF4 | + | − |
| IGHM | + | − |
| FOXP1 | + | − |
Genes from Figure 2 and their gene expression values in the subgroups ABC and GCB are shown. The symbol “−” indicates a lower gene expression than “+”. In this network, most genes have a lower expression in the more aggressive ABC type than in the GCB type.
Regulatory network of genes best distinguishing ABC and GCB.
| Proliferation | CCND2 | cyclin D2, regulates G1 to S transition of CDK4/CDK6; CTGF, fibroblast growth factor |
| | MAPK10 | MAP kinase 10 |
| | MYBL1 | transcriptional activator in the proliferation of neurons, spermatogenic and B-lymphoid cells (recognition sequence: 5′-YAAC(GT)G-3′) |
| | ASB13 | ankyrin repeat and SOCS box-containing protein 13, mediates protein-protein interactions; the SOCS box couples suppressors of cytokine signalling and binding partners with the elongin B and C complex to target them for degradation |
| | SH3BP5 | SH3 domain binding protein, targets protein-protein interaction |
| Block of proliferation | MME | synonym CALLA (common acute lymphocytic leukemia antigen); the synonym CD10 stresses its properties as a tumor suppressor gene |
| | BCL7A | putative tumor suppressor gene in T-cell lymphoma |
| Apoptosis | BCL2 | integral outer mitochondrial protein to block apoptosis |
| | BCL6 | transcriptional repressor, necessary for germinal center formation in lymph nodes |
| Differentiation | CTGF | fibroblast differentiation |
| | FOXP1 | forkhead box P1 |
| | LMO2 | LIM domain only 2 transcription factor for hematopoietic development |
| | LRMP | expressed in lymphoid cells during development |
| | COL3A1 | collagen type III |
| | FN1 | fibronectin 1, cell adhesion |
| | NEIL1 | base excision repair |
| Immune cell specific | IGHM | immunoglobulin heavy chain gene |
| | IRF4 | interferon regulatory factor 4 |
The genes of the network in Figure 4 (suppl.) are associated to the functional categories “Proliferation”, “Block of proliferation”, “Apoptosis”, “Differentiation” and “Immune cell specific”, by their annotation. Most of them are part of the antagonists “Proliferation” and “Block of proliferation”. This indicates the complex regulation and importance of proliferation in the determination of ABC and GCB lymphomas. Classical lymphoma genes (see Table S4) known previously are given in italics.
T-test result of network genes in another data set.
| CCND2 | 6.260705e-06 | 5.56939706 |
| BCL6 | 2.490035e-02 | −2.34449786 |
| BCL2 | 1.843571e-03 | 3.43618678 |
| IRF4 | 2.082072e-07 | 6.49044833 |
| LMO2 | 3.820841e-07 | −6.66162303 |
| MAPK10 | 3.888633e-02 | −2.15403094 |
The genes from the proposed STRING network in Figure 4 were used to apply a t-test between the ABC and the GCB group in the gene expression data of Shipp et al. Wright et al. found some evidence for these DLBCL groups in that data set. The null hypothesis is rejected most clearly for IRF4, LMO2, CCND2, BCL2, BCL6 and MAPK10, which are also part of the predictor of Wright et al.
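A two-sample t statistic of the kind reported in this table can be computed as in the sketch below. It uses Welch's unequal-variance form, which is an assumption: the caption does not say which t-test variant was applied. The function name `welch_t` and the toy expression values are likewise illustrative only.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    na, nb = len(group_a), len(group_b)
    # Squared standard error contributed by each group
    va, vb = variance(group_a) / na, variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Toy expression values for one gene (hypothetical, not from Shipp et al.)
abc = [5.0, 6.0, 7.0]
gcb = [1.0, 2.0, 3.0]
t, df = welch_t(abc, gcb)
print(f"t = {t:.3f}, df = {df:.1f}")
```

With the ABC group passed first, a positive t means higher expression in ABC. Turning t and df into a p-value needs the t distribution, for which a statistics library such as SciPy (`scipy.stats.ttest_ind`) would typically be used.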
List of potential Notch target transcripts.
| ENSG00000156136 | ENST00000286648 | Deoxycytidine kinase |
| ENSG00000148158 | ENST00000277244 | Sorting nexin family member 30 |
| ENSG00000179388 | ENST00000317216 | Early growth response protein 3 |
| ENSG00000198833 | ENST00000361212 | Ubiquitin-conjugating enzyme E2 J1 |
| ENSG00000198833 | ENST00000361333 | Ubiquitin-conjugating enzyme E2 J1 |
| ENSG00000065308 | ENST00000182527 | Translocation associated membrane protein 2 |
| ENSG00000170584 | ENST00000302764 | NudC domain containing protein 2 |
| ENSG00000074706 | ENST00000265198 | phosphoinositide-binding protein PIP3-E |
| ENSG00000134108 | ENST00000256496 | ADP-ribosylation factor-like 10C |
For all genes of the Lymphochip, all available transcripts annotated in Ensembl were screened for the GY, Brd and K boxes. Only these transcripts bear all three boxes, GY, Brd and K, in their 3′-UTRs. They are possible candidates to be regulated by the Notch signalling pathway. Moreover, the Deoxycytidine kinase (ENSG00000156136) and the Translocation associated membrane protein 2 (ENSG00000065308) show different gene expression values between the ABC and GCB subgroups.
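The screening step can be sketched as a plain substring search over 3′-UTR sequences. The motif sequences below are assumptions: the caption does not give them, and the values used are the core sequences commonly cited for the GY box (GTCTTCC), Brd box (AGCTTTA) and K box (TGTGAT). The helper names and the toy UTR sequences are hypothetical.

```python
# Assumed DNA-sense core motifs (not stated in the caption above)
MOTIFS = {"GY": "GTCTTCC", "Brd": "AGCTTTA", "K": "TGTGAT"}

def boxes_present(utr):
    """Return the set of box names whose motif occurs in the UTR sequence."""
    # Normalise case and accept RNA input by mapping U to T
    seq = utr.upper().replace("U", "T")
    return {name for name, motif in MOTIFS.items() if motif in seq}

def has_all_boxes(utr):
    """True only if the GY, Brd and K motifs are all present."""
    return boxes_present(utr) == set(MOTIFS)

# Toy 3'-UTR sequences (hypothetical, for illustration only)
toy_utrs = {
    "transcript_1": "aaGTCTTCCttAGCTTTAccTGTGATgg",  # carries all three boxes
    "transcript_2": "aaGTCTTCCttttttccTGTGATgg",     # lacks the Brd box
}
hits = [name for name, utr in toy_utrs.items() if has_all_boxes(utr)]
print(hits)
```

An exact substring match is the simplest possible criterion. A real screen might also allow degenerate positions or require a minimum spacing between boxes, which this sketch does not attempt.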
Optimal molecular survival predictor applying six genes.
| HLA-DPa | Major histocompatibility complex, class II, DP alpha 1 |
| HLA-DQa | Major histocompatibility complex, class II, DQ alpha 1 |
| HLA-DRb5 | Major histocompatibility complex, class II, DR beta 1 |
| SEPT1 | Serologically defined breast cancer antigen NY-BR-24=Similar to DIFF6 |
| EIF2S2 | Eukaryotic translation initiation factor 2 subunit 2 |
| IDH3A | Isocitrate dehydrogenase 3 (NAD+) alpha |
The gene symbol (left side) is followed by the gene description. Three of these genes are major histocompatibility complex (HLA) genes.