Integrative prognostic models predict long-term survival after immunochemotherapy in chronic lymphocytic leukemia patients.

Johannes Bloehdorn¹, Julia Krzykalla², Karlheinz Holzmann³, Andreas Gerhardinger³, Billy Michael Chelliah Jebaraj¹, Jasmin Bahlo⁴, Kathryn Humphrey⁵, Eugen Tausch¹, Sandra Robrecht⁴, Daniel Mertens⁶, Christof Schneider¹, Kirsten Fischer⁴, Michael Hallek⁴, Hartmut Döhner¹, Axel Benner², Stephan Stilgenbauer⁷.

Abstract

Chemoimmunotherapy with fludarabine, cyclophosphamide and rituximab (FCR) can induce long-term remissions in patients with chronic lymphocytic leukemia. Treatment efficacy with Bruton's tyrosine kinase inhibitors was found to be similar to that of FCR in untreated chronic lymphocytic leukemia patients with a mutated immunoglobulin heavy chain variable (IGHV) gene. In order to identify patients who specifically benefit from FCR, we developed integrative models including established prognostic parameters and gene expression profiling (GEP). GEP was conducted on n=337 CLL8 trial samples; "core" probe sets were summarized on the gene level and RMA normalized. Prognostic models were built using penalized Cox proportional hazards models with the smoothly clipped absolute deviation penalty. We identified a prognostic signature of fewer than a dozen genes, which substituted for established prognostic factors, including TP53 and IGHV gene mutation status. Independent prognostic impact was confirmed for treatment, β2-microglobulin and del(17p) regarding overall survival and for treatment, del(11q), del(17p) and SF3B1 mutation for progression-free survival. The combination of independent prognostic and GEP variables performed equally to models including only established non-GEP variables. GEP variables showed higher prognostic accuracy for patients with long progression-free survival compared to categorical variables like the IGHV gene mutation status and reliably predicted overall survival in CLL8 and an independent cohort. GEP-based prognostic models can help to identify patients who specifically benefit from FCR treatment. The CLL8 trial is registered under EudraCT 2004-004938-14 and ClinicalTrials.gov Identifier: NCT00281918.

Year: 2022 PMID： 33730841 PMCID： PMC8883563 DOI： 10.3324/haematol.2020.251561

Source DB: PubMed Journal: Haematologica ISSN： 0390-6078 Impact factor: 9.941

Introduction

Chemoimmunotherapy with fludarabine, cyclophosphamide and rituximab (FCR) was defined as the standard first-line therapy for patients with chronic lymphocytic leukemia (CLL) who are eligible for intensive treatment.[1,2] Recurrent genetic alterations have prognostic impact, and NOTCH1 mutations were identified as a predictive marker for reduced benefit of FCR over FC.[3,4,5,6,7] While substantial treatment benefit has been established for FCR in distinct patient populations,[1] high efficacy of novel targeted compounds such as the Bruton's tyrosine kinase (BTK) inhibitor ibrutinib was recently reported in previously untreated patients,[8,9] and for cohorts with genetic high-risk subgroups or refractory populations.[10,11,12,13,14] However, progression-free survival (PFS) in previously untreated patients ≤70 years old with a mutated immunoglobulin heavy chain variable (IGHV) gene was similar for treatment with BTK inhibition or FCR.[14] Therefore, identification of young and fit patients who specifically benefit from treatment with FCR is needed to optimize long-term outcomes, in particular in light of the toxicity and cost associated with lifelong ibrutinib treatment. Additional biological characterization, such as gene expression profiling (GEP), may help to refine prognostic models, increasing prognostic accuracy and more precisely segregating patients in whom FCR is highly effective. Established markers mostly constitute categorical variables or, in the case of IGHV mutation status, consensus cut-offs, and therefore may not fully reflect the underlying biology. In addition, established prognostic markers may lose some of their impact with novel treatments. Since such large-scale studies on randomized trials are scarce, we performed GEP on 337 baseline patient samples from the CLL8 trial and modeled different scenarios for the combined use with established prognostic factors. We identified fewer than a dozen genes substituting for the prognostic impact of distinct recurrent alterations for PFS and overall survival (OS). Our results provide the basis for refined prognostic models and rational treatment selection.

Methods

Patients and samples

The study was conducted on peripheral blood samples from 337 previously untreated CLL patients (Table 1) collected at enrolment on the CLL8 trial, a prospective, international, multi-center trial comparing first-line treatment with FC or FCR in a 1:1 randomized fashion. Further details for the study are provided online at the ClinicalTrials.gov (CTG) homepage (www.clinicaltrials.gov#NCT00281918).[1] Ficoll density gradient centrifugation for isolation of mononuclear cells followed by an immunomagnetic tumor cell enrichment via CD19 (Midi MACS, Miltenyi Biotec®, Bergisch Gladbach, Germany) was performed on all samples. Data on genomic aberrations del(13q), trisomy 12, del(11q), del(17p) and mutation status for IGHV, TP53, SF3B1 and NOTCH1 were assessed as previously described.[5] Informed consent and ethics committee approval were obtained in accordance with the Declaration of Helsinki for all patients.

RNA isolation, quality assessment and gene expression profiling on Exon 1.0 ST arrays

Total RNA was extracted from whole cell lysate using the Allprep DNA/RNA mini kit (Qiagen). Quality control was performed using the Agilent 2100 Bioanalyzer with the RNA 6000 Nano LabChip (Agilent Technologies). In order to ensure accuracy and reproducibility, samples with an RNA integrity number (RIN) less than 7.0 were excluded from further analysis. Samples were analyzed for mRNA expression using the Affymetrix GeneChip® Human Exon 1.0 ST Array (Affymetrix, Santa Clara, CA, USA). Further details are provided in the Online Supplementary Appendix.

Normalization of expression data

Raw Affymetrix data files were preprocessed by the robust multichip average (RMA) algorithm using the aroma.affymetrix R package (2008).[15] Within RMA normalization, background correction and quantile normalization were conducted. Aroma.affymetrix was applied to generate GEP values summarized on the exon/probe set level and on the transcript level using the ‘core’ probe set definition according to Affymetrix. ‘Core’ refers to probe sets that are supported by the most reliable evidence from RefSeq and full-length mRNA GenBank records containing complete coding sequence information. We further assessed and excluded the presence of potential batch effects induced by external factors such as time point and location of sampling as well as time point of labeling and hybridization. Quality control was further conducted with “Relative Log Expression” (RLE) and “Normalized Unscaled Standard Errors” (NUSE), where we also did not find any abnormalities indicating potential batch effects.
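The quantile normalization step mentioned above can be illustrated with a minimal sketch. It is not the aroma.affymetrix implementation (which is an R package and additionally handles background correction and probe-set summarization). The function name `quantile_normalize` and the toy intensities are hypothetical, and Python is used here purely for illustration.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length intensity lists (one list per array).

    Illustrative only: each value is replaced by the mean, across arrays,
    of the values that share its within-array rank.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must contain the same number of probes")

    # Sort every array and average across arrays at each rank position.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(col[i] for col in sorted_samples) / len(samples) for i in range(n)
    ]

    normalized = []
    for s in samples:
        # Indices of this array ordered by intensity (ties broken by position).
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two hypothetical arrays with three probes each.
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(arrays))
```

After this step every array shares the same empirical distribution of values, which is what makes expression levels comparable across arrays.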

Statistical analyses

Data were analyzed to evaluate improvement of prognostication for PFS and OS by using GEP in addition to prognostic factors del(17p), del(11q), trisomy 12, del(13q), IGHV mutation status, SF3B1, NOTCH1, TP53 mutations, β2-microglobulin (β2-m), thymidine kinase (TK), white blood cell count (WBC), Eastern Cooperative Oncology Group (ECOG) performance status, study medication (FC or FCR), sex and age. For the following analyses, missing values in the clinical data were imputed using chained equations.[16] The algorithm imputes the missing values using a model with all other clinical variables as predictors, thus generating ‘plausible’ synthetic values. As the percentage of missingness for each variable was low (maximum of 16 missing values in 337 patients), a single imputation method was adequate. Furthermore, a non-specific filtering was performed selecting the 500 genes with highest variability over all samples. The final model was built by a sparse Cox proportional hazards model using the smoothly clipped absolute deviation (SCAD) penalty.[17] The “reference model” for our analysis is a Cox proportional hazards model including variables with confirmed prognostic impact: age (continuous), sex (male or female), study medication (FC or FCR), ECOG performance status (1 or 2 vs. 0), WBC, TK and β2-m (all continuous), IGHV/NOTCH1/SF3B1 mutation status (all unmutated vs. mutated), del(11q), del(13q), del(17p), trisomy 12 and TP53 mutation (all present or absent). The analysis is based on updated results from the CLL8 trial.[1]
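Two of the steps described above, the non-specific variability filter and the SCAD penalty, can be sketched as follows. This is an illustration under stated assumptions, not the authors' R code: variance is assumed as the measure of "variability", the SCAD shape parameter `a = 3.7` is a commonly used default that the text does not report, and the function names and toy data are hypothetical.

```python
from statistics import pvariance


def top_variable_genes(expression, k):
    """Return the k gene names with the largest variance across samples.

    `expression` maps gene name -> list of expression values (one per sample).
    Using variance as the variability measure is an assumption.
    """
    ranked = sorted(expression, key=lambda g: pvariance(expression[g]), reverse=True)
    return ranked[:k]


def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty for a single coefficient.

    Behaves like the lasso (lam * |beta|) for small coefficients, then
    flattens so that large coefficients receive a constant penalty.
    """
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


# Hypothetical toy data: three genes measured in four samples.
toy = {
    "GENE_A": [1.0, 1.1, 0.9, 1.0],
    "GENE_B": [0.0, 4.0, 0.0, 4.0],
    "GENE_C": [2.0, 3.0, 2.0, 3.0],
}
print(top_variable_genes(toy, 2))
print(scad_penalty(0.5, lam=1.0), scad_penalty(10.0, lam=1.0))
```

Because the penalty is constant beyond `a * lam`, strong predictors are shrunk less than under a lasso penalty, which is the usual motivation for choosing SCAD when a sparse model with nearly unbiased large effects is wanted.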
Models investigated for possible improvement of prognostication using GEP included, first: the combination of all above-mentioned confirmed prognostic variables without penalization and a subset of the GEP data selected by SCAD penalization (referred to as “fixed model”), and secondly: the combination of confirmed prognostic variables and GEP data in which all variables were equally penalized (“equally penalized model”) allowing for substitution of the confirmed prognostic variables with equally strong prognostic GEP variables. For internal validation, bootstrap subsampling with 1,000 subsamples equal to 63.2% of the original sample size was used.[18] The prognostic value of the final model was evaluated on the basis of the time-dependent Brier score (as implemented in the R-package pec).[19] The Brier score was used to estimate the prediction error at a given time point. Resulting prediction error curves show the time-dependent Brier score over 60 months of follow-up and the integrated Brier score (IBS) was used to summarize prediction accuracy. For external validation the apparent error was calculated. For visualization purposes, survival curves were calculated by means of the Stone-Beran estimator[20] using symmetrical nearest neighborhoods around the lowest, the median, and the highest observed values of the prognostic variable combinations using the R-package prodlim,[21] both for OS and PFS. Statistical analysis was performed with the R environment for statistical computing, version 3.3.1, using the R packages survival, version 2.39-5, prodlim, version 1.5.7, mice, version 2.25, ncvreg, version 3.6-0, pec, version 2.4.9 and bootstrap, version 2015.2. For validation, the prognostic gene signature established on the CLL8 cohort was tested in an array-based GEP training set of an independent cohort (n=149 unsorted CLL samples from treatment-naive [83%] and pretreated [17%] patients).[22] Unmutated IGHV was reported in 49.3% and del(17p) in 8.6% of tested samples. Further details on cohort characteristics are provided in a previous publication.[22]
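The validation quantities described above can be sketched in simplified form. This is not the pec implementation: the Brier score below simply skips patients censored before the evaluation time, whereas pec corrects for censoring by weighting, and the integrated score uses a plain trapezoidal rule. All function names and toy values are hypothetical.

```python
import random


def subsample_indices(n, fraction=0.632, seed=0):
    """Draw round(fraction * n) distinct patient indices without replacement."""
    k = round(fraction * n)
    return sorted(random.Random(seed).sample(range(n), k))


def brier_at(t, times, events, surv_probs):
    """Simplified Brier score at time t.

    times: follow-up time per patient; events: 1 if the event was observed,
    0 if censored; surv_probs: predicted probability of being event-free at t.
    Patients censored before t have unknown status and are skipped here
    (a simplification; no censoring weights are applied).
    """
    total, used = 0.0, 0
    for time, event, prob in zip(times, events, surv_probs):
        if time > t:
            observed = 1.0  # still event-free at t
        elif event:
            observed = 0.0  # event occurred by t
        else:
            continue  # censored before t
        total += (observed - prob) ** 2
        used += 1
    if used == 0:
        raise ValueError("no patient with known status at time t")
    return total / used


def integrated_brier(grid, scores):
    """Trapezoidal average of Brier scores over an increasing time grid."""
    area = sum(
        (scores[i] + scores[i + 1]) / 2 * (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    )
    return area / (grid[-1] - grid[0])


# 63.2% of the 337 CLL8 samples gives the size of one subsample.
print(len(subsample_indices(337)))
print(brier_at(15, [10, 20, 30, 5], [1, 1, 0, 0], [0.2, 0.8, 0.9, 0.5]))
```

In a subsampling scheme of this kind, the model would be refitted on each subsample and the prediction error evaluated on the patients left out, then averaged over all subsamples. Lower Brier scores indicate better predictions.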

Results

Gene expression profiling variables substitute established prognostic markers in multivariate models

We first established multivariate models of variables whose prognostic impact was confirmed in previous studies, herein referred to as the “reference model”. Results are shown in the Online Supplementary Table S1A for OS and in Online Supplementary Table S1B for PFS, respectively. In order to evaluate the impact for OS including a signature consisting of GEP variables selected in the penalized Cox model (Online Supplementary Table S2A), we tested various combinations of confirmed prognostic variables and GEP. Only model combinations including genetic markers with prognostic impact achieved prediction error estimates similar to the confirmed prognostic variables used in the “reference model” (Figure 1A). Using the “fixed model”, penalization of GEP resulted in selection of only one GEP variable (PITPNC1, phosphatidylinositol transfer protein cytoplasmic 1) and no further improvement as compared to the reference model (IBS: reference model 0.092; fixed model 0.092) (Figure 1A).

Figure 1.

Prediction error estimates for prognostic model combinations. Prediction error curves for combinations of prognostic variables in models are shown for overall survival (OS) (A) and progression-free survival (PFS) (B). Combinations of prognostic variables contain the confirmed prognostic variables, as used in the reference model (age, sex, study medication, Eastern Cooperative Oncology Group [ECOG], log white blood cells [WBC], β2-microglobulin [β2-m], log thymidine kinase [TK], IGHV mutation status, del(11q), del(13q), del(17p), trisomy 12, TP53 mutation, NOTCH1 mutation, SF3B1 mutation) and gene expression profiling (GEP) variables. Prognostic GEP variables were selected in addition to (fixed model) or instead of (equally penalized model) the confirmed prognostic variables. In a separate approach prognostic GEP variables were selected in addition to (fixed model) or instead of (equally penalized model) non-genetic prognostic variables (only age, sex, study medication, ECOG, log WBC, log TK, β2-m). GEP variables selected in the fixed or equally penalized model largely overlap with the full prognostic gene signature (Online Supplementary Table S2), which is separately used in the “GEP data only” prediction error curve. Combination of prognostic variables selected in the equally penalized model performed highly similar to the model containing only confirmed prognostic variables. Strong overlap was found for prediction error curves represented by the red and blue solid lines.

In contrast, using the “equally penalized model” on all variables from the reference model and GEP data resulted in selection of only three confirmed prognostic markers (FCR, β2-m, del(17p)) along with ten GEP variables comprising the genes CLEC2B, RGS1, LDOC1, L3MBTL4, PRKCA, FHL1, SGCE, DCLK2, VSIG1, CD72 (Online Supplementary Table S3A). When assessing the prediction accuracy, this model performed similarly to the reference model (IBS: reference model 0.092; equally penalized model 0.096) (Figure 1A). When analyzing PFS by prediction models including a signature of selected GEP variables for PFS (Online Supplementary Table S2B) with the same approach, the “fixed model” did not lead to selection of GEP variables besides the confirmed prognostic variables. Conversely, only four confirmed prognostic markers (FCR, del(11q), del(17p), SF3B1 mutation) were selected in the “equally penalized model”, together with 11 GEP variables including the genes RGS1, EIF1AY, LDOC1, L3MBTL4, DCAF12, PLD5, GTSF1L, NIPAL2, CYBRD1, ANXA1 (Online Supplementary Table S3B). Again, variables selected in the “equally penalized model” performed similarly to the “reference model” as demonstrated by prediction error estimates (IBS: reference model 0.160; equally penalized model 0.166; fixed model 0.160) (Figure 1B). Of note, strong prognostic markers like TP53 and IGHV mutation status (Online Supplementary Table S1) were substituted in both models by prognostic GEP variables (Online Supplementary Table S3). For the prognostication of PFS, inclusion of GEP data alone or in addition to non-genetic variables (β2-m, TK, WBC, ECOG, study medication, sex and age) compensated for missing genetic information in patients with late disease progression (Figure 1B). In such models, GEP reliably increased prediction accuracy for patients over time as prediction error curves converged with those of the reference model. Prediction accuracy was comparable with the reference model at 60 months. 
The overall number of prognostic variables remained similar for either model (“reference model”: OS/PFS 15 variables vs. “equally penalized”: OS 13 and PFS 15 variables) and although chromosomal gains or losses covered multiple genes, these variables were substituted by the expression of a few genes only. Furthermore, expression variables selected along with clinical variables in the penalized models for OS and PFS were not derived from genes localized in the recurrently deleted or amplified chromosomal regions (Online Supplementary Table S3A and B).

Gene expression profiling signatures refine prognostic estimation and retain strong prognostic value in an independent cohort of unselected patients

In order to illustrate the distribution for OS and PFS within the different prediction models, conditional Kaplan-Meier estimates were generated and survival curve estimates are shown for lowest, median, and highest values of the prognostic variable combinations (Figure 2A to F).

Figure 2.

Conditional Kaplan-Meier survival estimates illustrate the distribution for overall survival and progression-free survival within the different prediction models. Kaplan-Meier estimates were generated for the lowest, the median, and the highest observed values of the prognostic variable combinations. Kaplan-Meier estimates illustrate overall survival (OS) (A, C and E) and progression-free survival (PFS) (B, D and F) with regard to the “reference model” (confirmed prognostic variables only, A and B), the “equally penalized model” (confirmed prognostic variables and GEP equally penalized, C and D) and prognostic GEP signatures only (as represented in the Online Supplementary Table S2A and B) (E and F).

GEP variables are especially suitable to predict cases with late progression, while established prognostic factors compensate in the remaining cases with early progression (Figure 1A and B). Specifically, patients with long-term PFS were more accurately identified with models using prognostic GEP signatures (Figure 2D and F) when compared with models using established prognostic variables only (Figure 2B) or single genetic characteristics. This aspect was further exemplified in a subgroup analysis for patients <60 years and those receiving FCR (Online Supplementary Figure S1A and B). In order to validate the results, we tested our prognostic gene signature in an independent cohort.[22] This cohort was selected to differ as much as possible from CLL8 in order to confirm the strength and independence of our prognostic score for OS (Online Supplementary Table S2A; Figure 2E and F). While the CLL8 cohort consisted of treatment-naive patients receiving FC/FCR and GEP was derived from CD19+ purified tumor cells, the validation cohort contained samples with heterogeneous tumor cell purity from both treatment-naive and pretreated patients. The CLL8-based signature was estimated on the validation cohort and evaluated for individual performance. For comparison, we used the gene signature established for the validation cohort with respective weights as provided.[22] Notably, the CLL8-derived gene signature performed highly similar to the gene signature originally established for this dataset (Online Supplementary Figure S2).[22]

Gene expression profiling variables balance prognostic inaccuracy of established markers

GEP variables selected both for OS and PFS contained the genes RGS1 (regulator of G protein signaling 1), LDOC1 (LDOC1 regulator of NF-κB signaling) and L3MBTL4 (L3MBTL histone methyl-lysine binding protein 4). While RGS1 was homogeneously distributed across the expression range, LDOC1 and L3MBTL4 expression showed a bimodal distribution (Online Supplementary Figure S3). When evaluating expression level distributions of RGS1, LDOC1 and L3MBTL4 in relation to genetic variables, we could not identify an exclusive association with known prognostic factors (Figure 3; Online Supplementary Table S4A to D). In order to elucidate the biologic context from which the prognostic impact of these three genes may derive, we dichotomized patient samples regarding the upper and lower quartile of RGS1, LDOC1 and L3MBTL4 expression and assessed the differential expression of associated genes. Differentially expressed genes with a false discovery rate (FDR) of <0.01 and a fold-change (FC) of >1.5 were assessed for overlaps of the respective expression signatures (Figure 4A). Only 12 genes were overlapping between all three gene-specific comparisons (Figure 4A). Expression signatures associated with RGS1 were highly distinct from the other profiles and showed only nine of 341 genes exclusively overlapping with the LDOC1 specific signature. Conversely, 51 of 69 genes contained in the L3MBTL4 signature exclusively overlapped with the LDOC1 signature and therefore support a similar biologic context. Genes contained in different signatures showed highly correlated expression profiles (Figure 4B). 
LDOC1[23] and other genes overlapping for the L3MBTL4 and LDOC1 signature, such as LPL or CRY1, were previously reported as surrogate markers for the IGHV mutation status.[24,25] We specifically investigated ZAP70 in this context, since it has also been identified as a surrogate marker for the IGHV mutation status.[25,26,27] While ZAP70 had a fold-change lower than the previously set cut-off (FC>1.5), we found a highly significant (q<1×10⁻⁷) association with LDOC1 and L3MBTL4 (Figure 4C). Given that LDOC1 and L3MBTL4 expression levels did not show an exclusive association with the IGHV mutation status (Figure 3; Online Supplementary Table S4A to D), we wondered if the combined status of these two genes may explain the observed similarities. Notably, expression of LDOC1 and L3MBTL4 was highly correlated with each other and the combination of both variables reliably identified the majority of cases with IGHV homology <98% (Figure 5). However, we observed several “discordant” cases with mutated IGHV and high expression levels of LDOC1 and L3MBTL4 or IGHV unmutated cases with low expression levels (Figure 3; Figure 5). Given that these continuous variables were selected instead of the categorical IGHV mutation status because of their higher prognostic accuracy, these markers better mirror the prognostic effects and the related biology of a variable sequence homology, especially in “discordant” cases.
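The signature comparison described above (genes with FDR <0.01 and fold-change >1.5, then overlaps between gene-specific signatures) can be sketched as follows. The text does not state how the FDR was computed or whether the fold-change cut-off was applied in both directions. The Benjamini-Hochberg adjustment and the symmetric cut-off used here are assumptions, and all function names and toy values are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def signature(results, fdr_cutoff=0.01, fc_cutoff=1.5):
    """Select genes passing both cut-offs.

    `results` maps gene name -> (log2 fold-change, q-value). Up- and
    down-regulated genes are both kept (an assumption).
    """
    threshold = math.log2(fc_cutoff)
    return {
        gene
        for gene, (log2_fc, q) in results.items()
        if q < fdr_cutoff and abs(log2_fc) > threshold
    }


def exclusive_overlap(sig_a, sig_b, sig_other):
    """Genes shared by sig_a and sig_b but absent from sig_other."""
    return (sig_a & sig_b) - sig_other


# Hypothetical signatures for three comparisons.
sig_x = {"G1", "G2", "G3"}
sig_y = {"G2", "G3", "G4"}
sig_z = {"G3", "G5"}
print(sig_x & sig_y & sig_z)                 # shared by all three
print(exclusive_overlap(sig_x, sig_y, sig_z))  # shared by x and y only
```

The two set operations correspond to the central region of a three-set Venn diagram and to one of its pairwise-only regions.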

Figure 4.

Assessment of genes showing concordant or discordant expression with (A) Venn diagram illustrating overlaps for differentially expressed genes (fold-change [FC] >1.5; false discovery rate [FDR] <0.01) between patient samples with either high or low expression (upper vs. lower quartile) for RGS1, LDOC1 and L3MBTL4. (B) Heatmap showing clustered expression pattern (Pearson correlation and average linkage) of 12 genes found in all three gene specific signatures and heatmap showing expression pattern of 51 genes found in gene specific signatures of LDOC1 and L3MBTL4. (C) Scatter plots for ZAP70 expression with regard to groups showing high and low LDOC1 and L3MBTL4 expression (upper vs. lower quartile).

Figure 5.

Combined status of The figure highlights the correlation between expression levels of LDOC1 (x-axis), L3MBTL4 (y-axis) and the immunoglobulin heavy chain variable (IGHV) gene sequence homology (color coded). Cases with IGHV sequence homology <98% are indicated in blue, cases with IGHV sequence homology ≥98% are indicated in red. LDOC1 and L3MBTL4 expression identifies “discordant” cases with mutated IGHV but poor clinical course (high expression of LDOC1 and/or L3MBTL4) and vice versa.

Discussion

In the present study, we evaluated the significance of GEP as a means for prognostic modeling in CLL. The CLL8 study cohort provides a valid basis for this as it was designed as a large international, multi-center phase III study defining current standard treatment, with full genetic characterization and long follow-up. Importantly, CD19+ purified tumor cells were procured at enrollment allowing valid GEP analysis. While GEP was unable to improve prediction when used in addition to confirmed prognostic variables, GEP substituted for many of these variables when tested in direct comparison in the equally penalized model and reliably predicted OS and PFS, similar to models integrating only confirmed prognostic variables. Furthermore, for the prognostication of PFS, GEP was able to compensate for missing genetic information in the subgroup with late progression events. High prediction accuracy for late progression and confirmation of the independent prognostic value for previously reported high-risk markers,[4,5,28] which were selected in the equally penalized model, implies that GEP-based prognostication can primarily substitute for intermediate and low-risk prognostic variables. However, GEP-based prognostic modeling was also able to substitute for “unmutated IGHV”, one of the most important variables with negative prognostic impact on OS and PFS.[1,6,7,28] GEP variables selected for PFS and OS in the equally penalized models were largely heterogeneous, a finding that may reflect both methodological and biological differences when modeling these endpoints. Conversely, we identified RGS1, LDOC1 and L3MBTL4 to have prognostic value both for PFS and OS. While the combined expression of LDOC1 and L3MBTL4 was highly associated with IGHV homology and therefore may be viewed as surrogate marker of the IGHV mutation status at first, one has to consider that both genes were selected in the prognostic model instead of the IGHV mutation status. 
This indicates that these genes and the associated biology have a considerable impact on the prognosis and do not merely substitute for the IGHV mutation status. This study further demonstrates the potential of GEP to reduce biologic dimensionality. As such, chromosomal aberrations affecting a multitude of genes, even if only minimally deleted regions are considered, can be replaced by fewer than a dozen genes. The fact that the genes contained in the prognostic GEP scores were not located on recurrently affected chromosomal regions indicates that the deregulated expression does not derive from a mere gene dosage effect but represents a convergence of various biologic traits. Genes of the identified signatures likely constitute important elements in overactive signaling cascades impacting on the clinical course. In addition, GEP variables represent continuous variables and therefore may hold more potential to fine-tune prognostic modeling in contrast to categorical variables such as aberrations and mutations. 
The efficacy resulting from the addition of rituximab to FC treatment and substantial benefit for patients with distinct genetic features leading to long-term disease control and OS has been confirmed recently in a long-term follow-up analysis.[1] Notably, prognostic variables selected in the equally penalized model or the GEP signature estimated the clinical course of long-term PFS within this cohort better compared to the model using only genetic factors or parameters previously identified to characterize such patients.[1] Future studies will show whether prognostic models including GEP also hold an advantage over recently reported prognostic models using epigenetic subgrouping.[29,30,31] Patients with DNA methylation profiles reflecting memory B-cell-like CLL were reported to strongly benefit from treatment with chemoimmunotherapy on two phase II trials.[31] A major strength of our study was the possibility to exclusively use CD19+ sorted patient samples from a randomized phase III trial and extensive characterization for established prognostic variables, including availability of the TP53, SF3B1 and NOTCH1 mutation status in >95% of cases. Future comparative studies assessing the prognostic impact of methylation markers need to include a comprehensive genetic characterization since SF3B1 and NOTCH1 mutations were found to have independent prognostic and predictive impact for chemoimmunotherapy[5] and show a heterogeneous distribution within epigenetic subgroups.[29,31] In addition, the CLL8 trial design provided an ideal basis to differentiate between the prognostic and predictive value of markers and therefore to specifically assess for the prognostic strength of established and GEP variables. 
Notably, GEP variables selected in our model also reliably substituted for IGHV mutation status and showed strong prognostic impact irrespective of treatment for both PFS and OS in contrast to the epigenetic subgrouping.[31]

Figure 3.

Association of Boxplots showing distribution for log2 expression of genes selected for both overall survival (OS) and progression-free survival (PFS), namely RGS1, LDOC1 and L3MBTL4. LDOC1 and L3MBTL4 show a bimodal distribution. Distribution of the three genes was not exclusively associated with distinct genetic variables.

While storage and workup conditions were found to change expression levels of multiple transcripts in an RNA sequencing-based study on healthy donor samples, prognostic GEP variables selected in our study largely represented transcripts with low reported variability.[32] Stable expression of our prognostic GEP variables selected for the respective clinical endpoints is further supported since prognostic markers unaffected by surrounding conditions (e.g., chromosomal aberrations, gene mutation status) were reliably substituted in the multivariate analysis. Validation of the prognostic impact of selected GEP variables was achieved in an independent data set differing with regard to storage conditions, workup and sorting of samples from a patient cohort with heterogeneous treatment,[22] further demonstrating the prognostic robustness of selected GEP variables. While novel compounds have revolutionized the landscape of CLL treatment, in particular for high-risk patients,[10,11,12,13] the long-term benefit and treatment-related toxicities still remain to be evaluated. Further, the significant economic burden may limit access in some healthcare systems.[33] In this study, we were able to confirm that GEP variables can achieve a higher prognostic accuracy, better reflect IGHV sequence homology and reliably identify “discordant” patients with mutated IGHV but poor clinical course and vice versa. 
This is especially promising since treatment with BTK inhibitors and FCR was reported to yield similar PFS in patients with mutated IGHV.[14] Although the depth of biological characterization has reached a new dimension with the use of RNA sequencing, both array and RNA sequencing-based prognostic modeling were found to perform equally well for the prediction of major clinical endpoints.[34] Studies evaluating FCR and BTK inhibitor treatment in a randomized fashion[14] would provide an ideal basis for marker validation using RNA sequencing and easy-to-apply quantitative real-time polymerase chain reaction-based approaches in parallel. The prognostic models used here may therefore hold promise for future selection, substitution and harmonization of prognostic markers, which show variable prognostic value within the respective treatment context.

Table 1.

Patient characteristics of the CLL8 gene expression profiling cohort.

28 in total

1. ZAP-70 compared with immunoglobulin heavy-chain gene mutation status as a predictor of disease progression in chronic lymphocytic leukemia.

Authors: Laura Z Rassenti; Lang Huynh; Tracy L Toy; Liguang Chen; Michael J Keating; John G Gribben; Donna S Neuberg; Ian W Flinn; Kanti R Rai; John C Byrd; Neil E Kay; Andrew Greaves; Arthur Weiss; Thomas J Kipps
Journal: N Engl J Med Date: 2004-08-26 Impact factor: 91.245

2. An eight-gene expression signature for the prediction of survival and time to treatment in chronic lymphocytic leukemia.

Authors: T Herold; V Jurinovic; K H Metzeler; A-L Boulesteix; M Bergmann; T Seiler; M Mulaw; S Thoene; A Dufour; Z Pasalic; M Schmidberger; M Schmidt; S Schneider; P M Kakadia; M Feuring-Buske; J Braess; K Spiekermann; U Mansmann; W Hiddemann; C Buske; S K Bohlander
Journal: Leukemia Date: 2011-05-31 Impact factor: 11.528

3. Surrogate molecular markers for IGHV mutational status in chronic lymphocytic leukemia for predicting time to first treatment.

Authors: Fortunato Morabito; Giovanna Cutrona; Laura Mosca; Marianna D'Anca; Serena Matis; Massimo Gentile; Ernesto Vigna; Monica Colombo; Anna Grazia Recchia; Sabrina Bossio; Laura De Stefano; Francesco Maura; Martina Manzoni; Fiorella Ilariucci; Ugo Consoli; Iolanda Vincelli; Caterina Musolino; Agostino Cortelezzi; Stefano Molica; Manlio Ferrarini; Antonino Neri
Journal: Leuk Res Date: 2015-05-19 Impact factor: 3.156

4. On stability issues in deriving multivariable regression models.

Authors: Willi Sauerbrei; Anika Buchholz; Anne-Laure Boulesteix; Harald Binder
Journal: Biom J Date: 2014-12-15 Impact factor: 2.207

5. Evaluating Random Forests for Survival Analysis using Prediction Error Curves.

Authors: Ulla B Mogensen; Hemant Ishwaran; Thomas A Gerds
Journal: J Stat Softw Date: 2012-09 Impact factor: 6.440

6. DNA methylation dynamics during B cell maturation underlie a continuum of disease phenotypes in chronic lymphocytic leukemia.

Authors: Christopher C Oakes; Marc Seifert; Yassen Assenov; Lei Gu; Martina Przekopowitz; Amy S Ruppert; Qi Wang; Charles D Imbusch; Andrius Serva; Sandra D Koser; David Brocks; Daniel B Lipka; Olga Bogatyrova; Dieter Weichenhan; Benedikt Brors; Laura Rassenti; Thomas J Kipps; Daniel Mertens; Marc Zapatka; Peter Lichter; Hartmut Döhner; Ralf Küppers; Thorsten Zenz; Stephan Stilgenbauer; John C Byrd; Christoph Plass
Journal: Nat Genet Date: 2016-01-18 Impact factor: 38.330

7. Ibrutinib plus obinutuzumab versus chlorambucil plus obinutuzumab in first-line treatment of chronic lymphocytic leukaemia (iLLUMINATE): a multicentre, randomised, open-label, phase 3 trial.

Authors: Carol Moreno; Richard Greil; Fatih Demirkan; Alessandra Tedeschi; Bertrand Anz; Loree Larratt; Martin Simkovic; Olga Samoilova; Jan Novak; Dina Ben-Yehuda; Vladimir Strugov; Devinder Gill; John G Gribben; Emily Hsu; Chih-Jian Lih; Cathy Zhou; Fong Clow; Danelle F James; Lori Styles; Ian W Flinn
Journal: Lancet Oncol Date: 2018-12-03 Impact factor: 41.316

8. Unmutated Ig V(H) genes are associated with a more aggressive form of chronic lymphocytic leukemia.

Authors: T J Hamblin; Z Davis; A Gardiner; D G Oscier; F K Stevenson
Journal: Blood Date: 1999-09-15 Impact factor: 22.113

9. An international prognostic index for patients with chronic lymphocytic leukaemia (CLL-IPI): a meta-analysis of individual patient data.

Authors:
Journal: Lancet Oncol Date: 2016-05-13 Impact factor: 41.316

10. Targeting BTK with ibrutinib in relapsed chronic lymphocytic leukemia.

Authors: John C Byrd; Richard R Furman; Steven E Coutre; Ian W Flinn; Jan A Burger; Kristie A Blum; Barbara Grant; Jeff P Sharman; Morton Coleman; William G Wierda; Jeffrey A Jones; Weiqiang Zhao; Nyla A Heerema; Amy J Johnson; Juthamas Sukbuntherng; Betty Y Chang; Fong Clow; Eric Hedrick; Joseph J Buggy; Danelle F James; Susan O'Brien
Journal: N Engl J Med Date: 2013-06-19 Impact factor: 91.245