A clinically applicable gene expression-based score predicts resistance to induction treatment in acute myeloid leukemia.

Christian Moser¹, Vindi Jurinovic^1,2, Sabine Sagebiel-Kohler³, Bianka Ksienzyk¹, Aarif M N Batcha^4,5, Annika Dufour¹, Stephanie Schneider^1,6, Maja Rothenberg-Thurley¹, Cristina M Sauerland⁷, Dennis Görlich⁷, Wolfgang E Berdel⁸, Utz Krug⁹, Ulrich Mansmann^4,5,10, Wolfgang Hiddemann^1,10, Jan Braess¹¹, Karsten Spiekermann^1,10, Philipp A Greif^1,10, Sebastian Vosberg^1,10, Klaus H Metzeler^1,10,12, Jörg Kumbrink^3,10, Tobias Herold^1,13,10.

Abstract

Prediction of resistant disease at initial diagnosis of acute myeloid leukemia (AML) can be achieved with high accuracy using cytogenetic data and 29 gene expression markers (Predictive Score 29 Medical Research Council; PS29MRC). Our aim was to establish PS29MRC as a clinically usable assay by using the widely implemented NanoString platform and further validate the classifier in a more recently treated patient cohort. Analyses were performed on 351 patients with newly diagnosed AML intensively treated within the German AML Cooperative Group registry. As a continuous variable, PS29MRC performed best in predicting induction failure in comparison with previously published risk models. The classifier was strongly associated with overall survival. We were able to establish a previously defined cutoff that allows classifier dichotomization (PS29MRCdic). PS29MRCdic significantly identified induction failure with 59% sensitivity, 77% specificity, and 72% overall accuracy (odds ratio, 4.81; P = 4.15 × 10⁻¹⁰). PS29MRCdic was able to improve the European Leukemia Network 2017 (ELN-2017) risk classification within every category. The median overall survival with high PS29MRCdic was 1.8 years compared with 4.3 years for low-risk patients. In multivariate analysis including ELN-2017 and clinical and genetic markers, only age and PS29MRCdic were independent predictors of refractory disease. In patients aged ≥60 years, only PS29MRCdic remained as a significant variable. In summary, we confirmed PS29MRC as a valuable classifier to identify high-risk patients with AML. Risk classification can still be refined beyond ELN-2017, and predictive classifiers might facilitate clinical trials focusing on these high-risk patients with AML.

© 2021 by The American Society of Hematology. Licensed under Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), permitting only noncommercial, nonderivative use with attribution. All other rights reserved.

Year: 2021 PMID: 34535016 PMCID: PMC8759116 DOI: 10.1182/bloodadvances.2021004814

Source DB: PubMed Journal: Blood Adv ISSN: 2473-9529

Introduction

Despite recent advances and the introduction of novel drugs, most patients with acute myeloid leukemia (AML) who are treated with curative intent receive physically demanding intensive chemotherapy consisting of cytarabine and anthracyclines.[1] The majority of these patients achieve complete remission (CR), but 20% to 40% of younger patients and 40% to 60% of older patients do not respond to the initial treatment.[1,2] About half of patients with primary refractory disease (RD) die within 6 months.[3] Because patient outcomes remain poor, even with salvage therapy followed by allogeneic stem cell transplantation, treatment of patients with RD is extremely challenging.[4] The ability to predict primary RD would prevent patients with AML from undergoing ineffective intensive treatment. Several prognostic markers for patients with AML that provide information about the overall outcome and help to guide treatment decisions are routinely used in the clinic. The 2017 European Leukemia Network (ELN-2017) guidelines classify patients into “favorable,” “intermediate,” and “adverse” risk groups.[1] With regard to the effect of a therapeutic intervention, some predictive classifiers are primarily geared toward forecasting RD. The model published by Walter et al integrates clinical information, laboratory data, and molecular genetic analysis of patients at initial diagnosis.[5] Age, cytogenetics according to the Medical Research Council (MRC), and NPM1/FLT3-ITD status were the most significant predictive covariates.[5,6] Another predictive model by Ng et al is derived from the prognostic 17-gene leukemia stem cells score (LSC17).[7] Gene expression analysis of 6 retrained response LSC17 genes was performed using the NanoString platform.[7] The retrained response LSC17 signature proved to be of predictive value.[5,7] However, these existing predictors are not refined or specific enough to be sufficient for clinical use. 
Therefore, more precise classifiers are necessary to guide treatment decisions and facilitate clinical trials aimed at this high-risk population of AML patients. We recently published a predictive classifier based on the analysis of cytogenetic data and 29 gene expression markers (Predictive Score 29 Medical Research Council; PS29MRC).[8] Prediction of RD at initial diagnosis of AML can be achieved with high accuracy using PS29MRC (77%). The classifier was developed using cohorts analyzed by gene expression microarrays (n = 856) and validated in a cohort measured by RNA sequencing (n = 250). Because prompt and reproducible gene expression analysis is vital for PS29MRC to be incorporated into trials or the clinical routine, we identified the 29 gene expression markers in this study using the fast and automated NanoString platform, which is already used for the risk calculation of recurrence in breast cancer.[9] The NanoString method is based on direct digital detection of messenger RNA (mRNA) molecules of interest using target-specific color-coded probe pairs.[10] Even mRNA samples with less-than-ideal quality can be measured precisely and in a short period of time.[10] We set out to transfer PS29MRC to a clinically applicable platform and validate its predictive performance and prognostic value in an independent multicenter cohort of patients who recently underwent intensive treatment.

Patients and methods

Patients and inclusion criteria

This study included 384 intensively treated patients who were enrolled in the multicenter German AML Cooperative Group (AMLCG) Registry (DRKS00020816) between 2009 and 2019. Only adult patients (≥18 years) with newly diagnosed AML (de novo or secondary to myelodysplastic syndromes or therapy-related) and material available for analysis were included. The diagnosis of AML was made according to the World Health Organization (WHO) 2008 criteria.[11] Patients with acute promyelocytic leukemia or extramedullary disease without systemic involvement were excluded. All patients were treated with intensive front-line induction therapy: sequential high-dose cytarabine and mitoxantrone (n = 226, 59%), cytarabine and anthracyclines (7 + 3; n = 121, 31%), and other intensive regimens such as thioguanine, cytarabine, and daunorubicin and/or cytarabine and mitoxantrone [TAD-HAM, HAM(-HAM); n = 37, 10%].[12] Second induction or salvage treatment following AMLCG recommendations was given whenever possible in case of RD after the first induction cycle. Cytogenetic and genetic analyses and measurement of the FLT3-ITD allelic ratio were performed centrally, as recently reported.[12] Following its approval, 15 patients received the FLT3 inhibitor midostaurin during induction treatment.[13] The AMLCG Registry was approved by the ethics committee of Technische Universität Dresden (EK 98032010) and is registered in the German Clinical Trials Register (DRKS00020816). Written informed consent was obtained from all participants. Ethics committees of the participating institutions approved all protocols, and patients were treated according to the Declaration of Helsinki.

Sample collection, RNA purification, and measurement of gene expression

Pretreatment leukemic marrow samples (n = 320; 83%) or blood samples (n = 64, 17%) were processed using a Ficoll-Paque gradient and stored at −80°C at the Laboratory for Leukemia Diagnostics, University Hospital of Munich. The median percentage of marrow and blood blasts was 68.5% (range, 9-97%) and 11.5% (range, 0-98%), respectively. Samples with detectable blasts in marrow or blood were processed further. RNA was isolated using a QIAcube robotic workstation, according to the RNeasy protocol (QIAGEN, Hilden, Germany). The quality and concentration of total RNA were assessed using a NanoDrop ND-1000 spectrophotometer (Thermo Fisher Scientific, Wilmington, DE). Samples with a concentration > 20 ng/µL and purity with an A260/A280 ratio of 1.5 to 2.3 were used for further analysis (n = 373). For NanoString gene expression profiling, a customized set of barcoded probes containing the 29 genes of interest was used for analysis (CodeSet). Six positive controls, spiked-in at fixed proportional concentrations, and 8 negative controls, used to assess background and nonspecific binding, were included as recommended by the manufacturer (NanoString Technologies, Seattle, WA). The 4 housekeeping genes (ABL, GAPDH, PGK1, RPS27) were also part of the CodeSet (for details see supplemental Table 1). The analyses were performed on an nCounter FLEX Analysis System, which is approved for clinical diagnostics applications, such as the US Food and Drug Administration–approved Prosigna test.[9] Hybridization between target mRNA and reporter-capture probe pairs was performed according to the NanoString protocol. 
The hybridized samples were then processed by a fully automated nCounter Prep station robot.[10] After placing the cartridge in the nCounter Digital Analyzer, data were collected by taking magnified images of the immobilized fluorescent reporters with a CCD camera.[10] Results from digital data acquisition were processed using nSolver 4.0 Analysis Software (NanoString Technologies). Raw data were evaluated using several quality control metrics to measure imaging quality, oversaturation, and overall signal/noise ratio. Gene expressions of all samples meeting quality control metrics (n = 368) were log transformed and normalized using the default settings. In a pilot study, we performed gene expression analysis using NanoString on a cohort of 48 pretreatment leukemic samples from our previous study, allowing us to compare gene expression values measured by Affymetrix microarrays with those analyzed by the novel platform on the same data set (supplemental Figure 1).[8] To transfer the previously defined optimal cutoff value onto the score calculated using NanoString expression values, we created dichotomous scores for NanoString data using different cutoff values and compared their classification concordance with the original score. The optimal cutoff was chosen as the one maximizing the concordance between the 2 scores (new cutoff = 0.4). Of note, the outcome of patients was not used for recalculating the score; only gene expression data measured by the 2 platforms on the same patient cohort were compared.
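The cutoff transfer described above can be sketched as a simple scan over candidate cutoffs. This is a minimal illustration, not the authors' code: the function names, the toy data, the candidate grid, and the convention that a score strictly above the cutoff counts as high risk are all assumptions. As in the text, patient outcome is not used; only the two platform-specific classifications of the same samples are compared.

```python
def concordance(labels_a, labels_b):
    """Fraction of samples that two dichotomous classifications label identically."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    agree = sum(a == b for a, b in zip(labels_a, labels_b))
    return agree / len(labels_a)


def transfer_cutoff(original_high, new_scores, candidate_cutoffs):
    """Return (cutoff, concordance) maximizing agreement with the original calls.

    original_high: booleans, True if the original (microarray) score was high risk
    new_scores: continuous scores recomputed on the new platform, same samples
    candidate_cutoffs: cutoffs to try on new_scores
    """
    best = None
    for cutoff in candidate_cutoffs:
        # Assumption: "high risk" means strictly above the cutoff.
        new_high = [s > cutoff for s in new_scores]
        c = concordance(original_high, new_high)
        # Strict ">" keeps the first cutoff that reaches the best concordance.
        if best is None or c > best[1]:
            best = (cutoff, c)
    return best


# Toy data, invented for illustration only (not from the study).
original_high = [False, False, False, True, True, True]
new_scores = [-1.2, -0.3, 0.2, 0.6, 1.1, 2.4]
candidates = [x / 10 for x in range(-10, 21)]  # -1.0, -0.9, ..., 2.0

best_cutoff, best_concordance = transfer_cutoff(original_high, new_scores, candidates)
print(best_cutoff, best_concordance)
```

When several cutoffs give the same concordance, this sketch keeps the smallest one; the source does not say how such ties were resolved.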

Statistical analyses

For the terms “prognostic” and “predictive” we used the definitions proposed by Clark et al.[14] Primary outcome was treatment failure: RD, partial remission, death in aplasia, or death due to indeterminate cause. Response criteria were defined according to ELN-2017 (for a more detailed discussion of end point definition see supplemental Appendix).[1] Patients without cytogenetic data (n = 9) or evaluation of induction response (n = 8) were excluded. Overall survival (OS) was defined as the time from AML diagnosis to death from any cause and was censored at the time of the last follow-up. PS29MRC was calculated as the weighted linear sum of 29 gene expression values and cytogenetic classification, according to the MRC.[6,8] The formula for PS29MRC is given in the supplemental Appendix. The χ2 test was used to compare categorical variables, whereas the Mann-Whitney U test was applied for continuous variables. Adjustment for multiple hypothesis testing was performed using the Benjamini-Hochberg procedure.[15] Time to event variables were analyzed with the Kaplan-Meier method and Cox proportional hazards regression model. Logistic regression was applied to analyze the association of variables with the treatment outcome. All statistical analyses were performed with statistical software R (version 4.0.3; R Foundation for Statistical Computing, Vienna, Austria).
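For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned above, the following is a minimal pure-Python sketch of the standard procedure. The study itself used R; the function name and the example P values here are illustrative assumptions and do not come from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented P values for illustration only.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
print(adj)
```

Each raw P value is scaled by the number of tests divided by its rank, and the running minimum from the largest rank downward keeps the adjusted values ordered like the raw ones.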

Results

Patient characteristics

A flowchart of the study is given in Figure 1. Treatment outcome and the predictive score were available for 351 patients, with 249 (71%) responses (197 CRs, 52 CRs with incomplete hematologic recovery) and 102 patients (29%) showing evidence of therapy failure (68 RDs, 11 partial remissions, 10 deaths in aplasia, 13 deaths due to indeterminate cause). Most patients (292, 83%) were diagnosed with de novo AML, according to WHO criteria.[16] The cohort was evenly distributed between younger patients (<60 years; 184, 52%) and older patients (≥60 years; 167, 48%). The median age was 58 years (range, 18-87). Induction failure was observed more frequently in older patients (n = 67, 40%) than in younger patients (n = 35, 19%). Several studies did not show any significant differences in outcome between the treatment regimens included in this analysis.[17,18] However, an analysis of patients receiving different treatment regimens is provided in the results section of the supplemental Appendix. The median follow-up was 3.3 years. The patients’ baseline characteristics are presented in supplemental Table 2.

Figure 1.

CONSORT diagram. APL, acute promyelocytic leukemia; MDS, myelodysplastic syndrome; MPS, myeloproliferative syndrome.

Predictive value of the continuous score in the total cohort and risk subgroups

The continuous score (PS29MRCcont) was predictive of treatment failure with an odds ratio (OR) of 2.37 (95% confidence interval [CI], 1.82-3.18; P = 1.20 × 10⁻⁹) and an area under the receiver operating characteristic curve (AUC) of 0.75 (Figure 2). Because the score includes the MRC classification of AML, we wanted to test the independent predictive value of its gene expression components by evaluating the score within different MRC groups.[6] MRC classification was available for 351 patients (favorable: n = 33, 10%; intermediate: n = 250, 71%; adverse: n = 68, 19%). The score was predictive for induction failure in the intermediate-risk (OR, 2.75; 95% CI, 1.82-4.36; P = 5.66 × 10⁻⁶) and adverse-risk (OR, 1.87; 95% CI, 1.15-3.38; P = .021) groups. It did not reach significance in the favorable-risk group (OR, 18.13; 95% CI, 0.93-12366.72; P = .22), likely because of the small number of treatment failures (n = 3). Furthermore, we tested the score in the risk groups defined by the ELN-2017 classification.[1] ELN-2017 classification, PS29MRCcont, and outcome were available for 301 patients. The categories were more evenly distributed (favorable: n = 129, 43%; intermediate: n = 68, 23%; adverse: n = 104, 35%). The score was significantly predictive of treatment failure in the favorable-risk group (OR, 3.45; 95% CI, 1.50-9.12; P = 6.59 × 10⁻³) and the adverse-risk group (OR, 1.73; 95% CI, 1.18-2.71; P = 9.92 × 10⁻³). It reached borderline significance in the intermediate subgroup (OR, 1.85; 95% CI, 1.06-4.01; P = .064). An overview of the subgroup analysis is given in supplemental Table 3. PS29MRCcont was able to identify patients at high risk for treatment failure in various risk subgroups. The score was well calibrated for the first half of the predicted values, but it overestimated the risk for patients with very high scores.
However, the number of patients with very high predicted risk was rather small, which might have influenced the poor calibration for these values (supplemental Figure 2).

Figure 2.

Comparison of different predictive classifiers of induction failure in AML. Receiver operating characteristic curves (A) and precision-recall curves (B) comparing the prediction of induction failure of PS29MRC, the clinical score of Walter et al, and the retrained response LSC17 score.

Performance of PS29MRC in comparison with other predictive classifiers

We compared the PS29MRCcont with the clinical model of Walter et al[5] and the gene expression–based retrained response LSC17 score of Ng et al.[7] To examine the predictive ability of each model, we analyzed the AUC accordingly and compared the values with PS29MRCcont (Figure 2). The clinical score of Walter et al and an assessment of the induction response were available for 342 of 384 patients. The score reached an AUC of 0.53 (OR, 1.00; 95% CI, 0.98-1.01; P = .66). The individual score components that significantly predicted induction outcomes were age (OR, 1.04; 95% CI, 1.02-1.06; P = 1.37 × 10⁻⁴), favorable cytogenetics (OR, 0.23; 95% CI, 0.05-0.66; P = .016), adverse cytogenetics (OR, 2.33; 95% CI, 1.35-4.01; P = 2.22 × 10⁻³), and NPM1 mutations in the absence of FLT3-ITD (OR, 0.45; 95% CI, 0.23-0.83; P = .015). The diagnosis of secondary AML (OR, 1.70; 95% CI, 0.96-2.94; P = .063) was slightly above the level of significance. The Eastern Cooperative Oncology Group Performance Status (ECOG), sex, white blood count, platelets, bone marrow blasts, and a mutated FLT3-ITD status did not reach the level of significance (supplemental Table 4). The retrained response LSC17 score is calculated as a weighted linear sum of 6 LSC17 gene expressions (MMRN1, KIAA0125, CD34, GPR56, LAPTM4B, NYNRIN) and was available for 360 patients.[7] The score reached an AUC of 0.56 (OR, 1.001; 95% CI, 1.000-1.002; P = .046). When comparing the ability to predict induction failure, PS29MRCcont was superior to the clinical score of Walter et al and the retrained response LSC17 score in univariate and multivariable analyses.
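To make the AUC comparison concrete, the sketch below computes the AUC in its rank-based form: the probability that a randomly chosen patient with induction failure has a higher score than a randomly chosen responder, with ties counted as one half. The function name and the six toy patients are assumptions for illustration; they are not study data, and the authors' actual R routines are not described in the text.

```python
def roc_auc(scores, failed):
    """AUC as the chance that a random failure outscores a random responder.

    Ties count as one half (the Mann-Whitney formulation of the AUC).
    """
    pos = [s for s, y in zip(scores, failed) if y]
    neg = [s for s, y in zip(scores, failed) if not y]
    if not pos or not neg:
        raise ValueError("need at least one patient in each outcome group")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented scores and outcomes for illustration only.
scores = [2.1, 0.9, 0.3, -0.2, 0.1, -1.0]
failed = [True, True, False, False, True, False]
auc = roc_auc(scores, failed)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to a score that ranks failures and responders no better than chance, which is the reference point for the values of 0.53, 0.56, and 0.75 reported above.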

Predictive value of the dichotomous score in risk subgroups

When applying the cutoff defined in our previous study, PS29MRCdic was highly significant in the prediction of treatment failure (OR, 4.81; 95% CI, 2.95-7.93; P = 4.15 × 10⁻¹⁰). Although the specificity for RD was high (77%), the sensitivity of PS29MRCdic was only moderate (59%), reaching an overall accuracy of 72% (Table 1). When excluding patients with death in aplasia or death due to indeterminate cause (n = 23), the sensitivity of the classifier improved slightly from 59% to 62% (for a more detailed analysis see supplemental Appendix).

Table 1.

Diagnostic validity contingency table and parameter estimates of PS29MRC

PS29MRCdic	Induction failure	Induction response	Total	Measure (95% CI)
High	60	57	117	PPV: 0.51 (95% CI, 0.42-0.61)
Low	42	192	234	NPV: 0.82 (95% CI, 0.77-0.87)
Total	102	249	351
	SEN: 0.59 (95% CI, 0.49-0.68)	SPE: 0.77 (95% CI, 0.71-0.82)		DOR: 4.81 (95% CI, 2.94-7.88)

CI, confidence interval; DOR, diagnostic OR; NPV, negative predictive value; PPV, positive predictive value; SEN, sensitivity; SPE, specificity.
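The point estimates in Table 1 follow directly from the four cell counts. The short sketch below recomputes them; the function and key names are illustrative, and the confidence intervals are not reproduced because the method used to derive them is not stated in this section.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Standard measures for a 2x2 diagnostic table.

    tp: high score, induction failure     fp: high score, induction response
    fn: low score, induction failure      tn: low score, induction response
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
        "diagnostic_odds_ratio": (tp * tn) / (fp * fn),
    }


# Cell counts taken from Table 1.
measures = diagnostic_measures(tp=60, fp=57, fn=42, tn=192)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

Rounded to 2 decimals, these values match the sensitivity, specificity, PPV, NPV, and DOR listed in Table 1, and the accuracy matches the 72% quoted in the text.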

Of 351 patients with available induction results and MRC classification, 117 (33%) were considered PS29MRCdic high risk. The median OS for PS29MRCdic high-risk patients was 1.8 years, whereas it was 4.3 years for PS29MRCdic low-risk patients. The classifier showed an accuracy of 91%, 73%, and 59% within the favorable-risk (n = 33; nonresponder: n = 0/3; responder: n = 30/30), intermediate-risk (n = 250; nonresponder: n = 33/69; responder: n = 149/181), and adverse-risk (n = 68; nonresponder: n = 27/30; responder: n = 13/38) MRC subgroups, respectively (supplemental Table 5). Furthermore, we tested the dichotomized score in the risk groups as defined by ELN-2017. ELN-2017 and outcome variables were available for 301 patients. In the favorable-risk (n = 129; nonresponder: n = 4/17, responder: n = 106/112), intermediate-risk (n = 68; nonresponder: n = 10/21, responder: n = 39/47), and adverse-risk (n = 104; nonresponder: n = 37/48, responder: n = 24/56) ELN-2017 subgroups, PS29MRCdic reached an accuracy of 85%, 72%, and 59%, respectively (supplemental Table 6). The score was predictive of treatment failure in the favorable-risk (OR, 5.44; 95% CI, 1.26-21.72; P = .017), intermediate-risk (OR, 4.43; 95% CI, 1.43-14.44; P = .011), and adverse-risk (OR, 2.52; 95% CI, 1.09-6.11; P = .034) groups. The dichotomous score significantly predicted induction failure in younger patients (<60 years; OR, 3.10; 95% CI, 1.41-6.80; P = 4.57 × 10⁻³) and older patients (≥60 years; OR, 5.26; 95% CI, 2.72-10.46; P = 1.25 × 10⁻⁶) (supplemental Table 3).
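The subgroup accuracies can be checked from the counts given in the text, assuming that "nonresponder: n = x/y" means x of y nonresponders were correctly classified as high risk and "responder: n = x/y" means x of y responders were correctly classified as low risk. That reading is an interpretation, but it reproduces every reported percentage. The names in this sketch are illustrative.

```python
def subgroup_accuracy(correct_nonresponders, correct_responders, n_total):
    """Share of patients in a subgroup whose outcome was classified correctly."""
    return (correct_nonresponders + correct_responders) / n_total


# (correctly classified nonresponders, correctly classified responders, subgroup size)
mrc = {"favorable": (0, 30, 33), "intermediate": (33, 149, 250), "adverse": (27, 13, 68)}
eln = {"favorable": (4, 106, 129), "intermediate": (10, 39, 68), "adverse": (37, 24, 104)}

mrc_pct = {group: round(100 * subgroup_accuracy(*counts)) for group, counts in mrc.items()}
eln_pct = {group: round(100 * subgroup_accuracy(*counts)) for group, counts in eln.items()}
print(mrc_pct)
print(eln_pct)
```

The counts also show where the accuracy comes from in each subgroup: in the favorable groups it is driven almost entirely by correctly classified responders, whereas in the adverse groups most nonresponders but fewer than half of the responders are classified correctly.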

Prognostic value of PS29MRC

RD is associated with inferior survival. Since PS29MRC was predictive of RD, we performed survival analysis. The continuous classifier (hazard ratio [HR], 1.38; 95% CI, 1.21-1.58; P = 2.62 × 10⁻⁶) and the dichotomous classifier (HR, 1.73; 95% CI, 1.27-2.37; P = 5.12 × 10⁻⁴) were significant prognostic markers. We observed an inferior OS among PS29MRCdic high-risk patients, particularly within the older subgroup (HR, 1.59; 95% CI, 1.06-2.39; P = .025) (Figure 3A-C). A prognostic analysis of relapse-free survival is provided in the supplemental Appendix.

Figure 3.

PS29MRCdic identifies patients with AML with inferior prognosis. Kaplan-Meier curves showing outcomes of patients according to the PS29MRC risk groups. (A) Outcomes of all patients. (B) Outcomes of patients younger than 60 years. (C) Outcomes of patients ≥60 years of age. (D) Comparison of patients with TP53 mutations and patients without a TP53 mutation, but with high PS29MRC values (top 10%). (E) Proportions of RD in groups defined by 4 risk factors (ELN-2017, complex karyotype, age, TP53), and the PS29MRCdic high-risk group within each risk category. The striped bar represents the high-risk category for each risk factor. adv, adverse risk; fav, favorable risk; int, intermediate risk; KT, karyotype; mut, mutated; wt, wild-type.

TP53 mutations in AML are associated with a dismal outcome; therefore, this very high–risk subgroup of patients requires special attention.[12] In our cohort, patients with mutated TP53 (n = 14) had a median OS of only 1.4 years. Patients who were among the 10% with the highest PS29MRC score, but who did not have a TP53 mutation, had a survival comparable to patients with a TP53 mutation (Figure 3D). Moreover, patients without a TP53 mutation, but with a high-risk PS29MRCdic, had a higher probability for RD than did patients with a TP53 mutation (Figure 3E). Likewise, patients with other risk factors, such as a complex karyotype or older age, had a similar risk for RD as did patients without the risk factor, but with a high-risk PS29MRCdic (Figure 3E).

Individual risk prediction

To help guide decision making for physicians, as well as for patients, we calculated the individual risk of induction failure using PS29MRCcont (Figure 4). Each score is associated with a percentage of the patient’s risk of not responding to intensive chemotherapy at the time point of their initial diagnosis. PS29MRCcont ranges from −5.91 to 5.72 (median, −0.01). Although patients with high PS29MRCcont scores tend to fare poorly, more favorable outcomes are observed in patients with low PS29MRCcont scores. Patients with a PS29MRCcont ≥ 4.0 have a ≥90% risk for RD, and patients with a PS29MRCcont score of −1.0 or less have a probability of induction failure that is <10%.
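The mapping from score to individual risk can be illustrated with a logistic curve. The text does not state how the curve in Figure 4 was fitted, so this sketch assumes a univariate logistic model; the function names are illustrative, and the intercept is not reported in this section. As a rough plausibility check only, the sketch treats the two quoted thresholds (score 4.0 at 90% risk, score −1.0 at 10% risk) as exact points on the curve and compares the implied slope with the slope implied by the reported per-unit odds ratio of 2.37.

```python
import math


def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1.0 - p))


def failure_probability(score, intercept, slope):
    """Predicted probability of induction failure under a logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


# Slope implied by the reported odds ratio per unit of PS29MRCcont (OR, 2.37).
slope_from_or = math.log(2.37)

# Assumption: treat the two thresholds quoted in the text as exact curve points.
slope_from_anchors = (logit(0.9) - logit(0.1)) / (4.0 - (-1.0))
intercept_from_anchors = logit(0.9) - slope_from_anchors * 4.0

print(round(slope_from_or, 3), round(slope_from_anchors, 3))
```

The two slopes are close but not identical, which is expected because the text gives the thresholds as bounds (≥90% and <10%) rather than as exact values; the derived intercept is therefore only a hypothetical placeholder and should not be used for real risk estimates.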

Figure 4.

Individual risk prediction in patients with AML. Plot showing the probability of induction failure with a cutoff at PS29MRCcont = 0.4 (blue dashed line).

Multivariate analysis

Because PS29MRCcont and the dichotomous classifier were highly significant in predicting induction failure in univariate models, we also performed multivariable analysis with predictive variables having a P value ≤ .10 in univariate analysis (Table 2). In multivariate analysis, only PS29MRCdic and age remained significant in the model. Additionally, we used these variables to perform forward and backward selection with the Akaike information criterion as the selection criterion. Forward and backward selection chose the optimal model as the one consisting of PS29MRCdic, age, and ELN-2017. In forward selection, PS29MRCdic was the first variable to enter the model. When analyzing the subgroup of older patients (≥60 years), the dichotomous score was the only variable that was significantly associated with RD (Table 3).

Table 2.

Univariate and multivariable analysis of induction failure

Variable	Multivariable analysis, n = 227		Model selection		Univariate analysis		Evaluable patients, n
Variable	OR (95% CI)	P	OR (95% CI)	P	OR (95% CI)	P	Evaluable patients, n
PS29MRCdic	3.47 (1.65-7.39)	1.09 × 10⁻³	3.54 (1.74-7.33)	.00054	4.81 (2.95-7.93)	4.15 × 10⁻¹⁰	351
Retrained response LSC17	1.00 (1.00-1.00)	.65			1.001 (1.000-1.002)	.046	360
Age continuous	1.03 (1.00-1.06)	.044	1.03 (1.00-1.05)	.043	1.04 (1.02-1.06)	1.37 × 10⁻⁴	375
Secondary AML	1.06 (0.45-2.43)	.89			1.70 (0.96-2.94)	.063	375
NPM1mut	0.81 (0.34-1.93)	.64			0.49 (0.29-0.79)	4.80 × 10⁻³	372
RUNX1mut	1.27 (0.46-3.44)	.64			2.58 (1.24-5.35)	.011	238
TP53mut	0.67 (0.16-2.75)	.58			2.62 (0.93-7.43)	.065	237
ASXL1mut	0.97 (0.34-2.75)	.96			2.21 (0.98-4.87)	.051	237
ELN-2017fav	0.41 (0.12-1.39)	.15	0.33 (0.13-0.81)	.016	0.17 (0.09-0.32)	5.72 × 10⁻⁸	319
ELN-2017int	0.94 (0.33-2.70)	.91	0.88 (0.41-1.92)	.75	0.50 (0.26-0.94)	.033	319

fav, favorable risk; int, intermediate risk; mut, mutated; P-values marked in bold indicate numbers that are significant (P < .05).

Table 3.

Univariate and multivariable analysis of induction failure among older patients

Variable	Multivariable analysis, n = 82		Univariate analysis		Evaluable patients, n
Variable	OR (95% CI)	P	OR (95% CI)	P	Evaluable patients, n
PS29MRCdic	4.41 (1.55-13.41)	6.62 × 10⁻³	5.26 (2.72-10.46)	1.25 × 10⁻⁶	145
Retrained response LSC17	1.00 (1.00-1.00)	.39	1.00 (1.00-1.00)	.12	152
Age continuous	1.01 (0.91-1.11)	.90	1.01 (0.96-1.07)	.60	161
Secondary AML	0.68 (0.22-1.93)	.48	1.25 (0.62-2.47)	.53	161
NPM1mut	0.76 (0.22-2.76)	.67	0.31 (0.15-0.60)	7.47 × 10⁻⁴	159
RUNX1mut	1.50 (0.36-6.46)	.58	1.91 (0.71-5.12)	.19	91
TP53mut	0.58 (0.08-3.97)	.58	1.44 (0.39-5.11)	.57	90
ASXL1mut	2.31 (0.54-10.44)	.26	2.12 (0.78-5.85)	.14	90
ELN-2017fav	1.06 (0.15-7.39)	.95	0.20 (0.08-0.44)	1.40 × 10⁻⁴	128
ELN-2017int	2.14 (0.42-11.18)	.36	0.81 (0.33-1.98)	.65	128

fav, favorable risk; int, intermediate risk; mut, mutated; P-values marked in bold indicate numbers that are significant (P < .05).

Comparable results were seen with PS29MRCcont (supplemental Table 7). A multivariable model with prognostic variables is provided in supplemental Table 8.

Discussion

In this study, PS29MRC was successfully transferred to the NanoString platform and independently validated in a multicenter AML patient cohort that was treated between 2009 and 2019. This analysis further confirms the predictive and prognostic value of PS29MRC. The NanoString platform is routinely used in stratifying the risk of breast cancer recurrence and is widely available.[9,19] Gene expression measurements using NanoString are highly robust, reproducible, and fast.[20] Automated RNA preparation and measurement can be conducted within 2 days. The platform allows physicians to immediately apply PS29MRC and presents a method for the translation of the classifier into clinical trials and practice. We also transferred and validated a previously defined threshold to the novel platform. The threshold can be used to identify patients with a high risk for induction failure, although larger patient cohorts would probably be necessary to find a more optimal cutoff for the NanoString platform. Although the sensitivity for predicting patients with induction failure was only moderate (59%), the specificity was high (77%). A more refined cutoff may improve sensitivity. Furthermore, the classifier reached a fair predictive performance with an AUC of 0.75, which is remarkable in this field, although there is still room for improvement. Additional factors not captured by the classifier seem to influence response to treatment. Important risk factors, such as age or gene mutations, are not reflected in the score. Additional omics data (eg, methylation profiling) or more recently identified prognostic factors (eg, splicing profiles) may have the potential to refine our models.[21,22] Clinical classifiers must always be viewed in connection with the analyzed end point. 
Several classifiers, such as the AML score or PINA score, help to estimate complete remission and early death rate in patients ≥60 years of age or the probabilities of OS.[23,24] We decided to focus our analysis on the important end point RD and compared our classifier with 2 of the most important and well-known models: the clinical score of Walter et al and the retrained response LSC17 score. PS29MRC outperformed these clinical or gene expression–based classifiers.[5,7] The reasons for this are speculative, but some possibilities are discussed below. The clinical score of Walter et al and the retrained response LSC17 score were developed using data sets from patients who were primarily treated in the 1990s or early 2000s. Since then, substantial improvements in supportive care have been included in clinical management. The patients in our cohort were treated within the last 10 years, most of them within the last 5 years (n = 227; 65% of patients treated between 2015 and 2019), which may account for some differences. Another factor might be that PS29MRC combines the prognostic information of cytogenetics with gene expression variables. Previous classifiers relied only on gene expression analysis or a combination of cytogenetics, a few mutations, and clinical variables. It seems that the combination of gene expression data and cytogenetics, as achieved in PS29MRC, summarizes information from 2 worlds and results in a more powerful predictor. In the context of different end points only achieved after CR (eg, relapse-free survival), PS29MRC performed far less effectively (supplemental Appendix). It is tempting to speculate that the mechanisms of resistance and relapse differ and are not represented equally by the classifier designed to specifically identify RD. In several analyses, we were able to demonstrate that PS29MRC added predictive and prognostic information to subgroups defined by MRC, ELN-2017, or age. 
In particular, the score significantly predicted RD in older patients and was the only predictive variable left in the multivariate model. In addition, PS29MRC identifies very high-risk patients who are not identified by current classification approaches and whose prognosis is as dismal as that of patients with TP53 mutations. Older and very high–risk patients represent subgroups of high clinical relevance, and clinicians are familiar with discussions about whether a patient benefits from intensive induction treatment. This discussion gained further relevance with the implementation of alternative treatment regimens that showed promising results.[25] As an example, the combination of azacitidine and venetoclax proved to be effective and might be a valuable option in older or very high–risk patients with a low probability of achieving CR with cytarabine and anthracycline–based induction treatment.[25] PS29MRC may facilitate clinical decision making within this subgroup of patients. Unfortunately, we were not able to analyze a patient cohort of relevant size that was treated with a combination of azacitidine and venetoclax. Future evaluation of PS29MRC must focus on this alternative or other recently approved regimens, such as CPX-351 or standard 7 + 3 chemotherapy with gemtuzumab-ozogamicin or FLT3 inhibitors.[26,27] Of note, when analyzing the small group of patients (n = 15) who received the FLT3 inhibitor midostaurin after approval in the European Union, PS29MRCcont indicated a trend toward a possible prediction of RD. Of 2 PS29MRCdic high-risk patients, 1 patient experienced treatment failure. Of 13 PS29MRCdic low-risk patients, 9 patients achieved CR/CR with incomplete hematologic recovery (P = .091). However, these data are too preliminary to allow any conclusions, and analyses of larger cohorts of patients are warranted.
Informed consent is critical when talking to patients with cancer about their treatment options.[28] By establishing a model for individual risk prediction, PS29MRC provides additional information on the risks and benefits of induction therapy, which may facilitate communication between patients and physicians.
In summary, we further confirmed PS29MRC as a valuable classifier to identify high-risk patients with AML. The score was successfully transferred to a widely available platform. The analysis can be conducted quickly and may help to guide decision making. Risk classification of patients with AML can still be refined beyond ELN-2017, and concerted efforts are needed to improve the prognosis of the large proportion of patients with very high–risk AML.

Supplementary Material

The full-text version of this article contains a data supplement.

27 in total

1. Seeking informed consent to cancer clinical trials: describing current practice.

Authors: R F Brown; P N Butow; P Ellis; F Boyle; M H N Tattersall
Journal: Soc Sci Med Date: 2004-06 Impact factor: 4.634

2. Digital multiplexed gene expression analysis using the NanoString nCounter system.

Authors: Meghana M Kulkarni
Journal: Curr Protoc Mol Biol Date: 2011-04

3. Effect of gemtuzumab ozogamicin on survival of adult patients with de-novo acute myeloid leukaemia (ALFA-0701): a randomised, open-label, phase 3 study.

Authors: Sylvie Castaigne; Cécile Pautas; Christine Terré; Emmanuel Raffoux; Dominique Bordessoule; Jean-Noel Bastie; Ollivier Legrand; Xavier Thomas; Pascal Turlure; Oumedaly Reman; Thierry de Revel; Lauris Gastaud; Noémie de Gunzburg; Nathalie Contentin; Estelle Henry; Jean-Pierre Marolleau; Ahmad Aljijakli; Philippe Rousselot; Pierre Fenaux; Claude Preudhomme; Sylvie Chevret; Hervé Dombret
Journal: Lancet Date: 2012-04-05 Impact factor: 79.321

4. Midostaurin plus Chemotherapy for Acute Myeloid Leukemia with a FLT3 Mutation.

Authors: Richard M Stone; Sumithra J Mandrekar; Ben L Sanford; Kristina Laumann; Susan Geyer; Clara D Bloomfield; Christian Thiede; Thomas W Prior; Konstanze Döhner; Guido Marcucci; Francesco Lo-Coco; Rebecca B Klisovic; Andrew Wei; Jorge Sierra; Miguel A Sanz; Joseph M Brandwein; Theo de Witte; Dietger Niederwieser; Frederick R Appelbaum; Bruno C Medeiros; Martin S Tallman; Jürgen Krauter; Richard F Schlenk; Arnold Ganser; Hubert Serve; Gerhard Ehninger; Sergio Amadori; Richard A Larson; Hartmut Döhner
Journal: N Engl J Med Date: 2017-06-23 Impact factor: 91.245

5. Acute Myeloid Leukemia (AML): different treatment strategies versus a common standard arm--combined prospective analysis by the German AML Intergroup.

Authors: Thomas Büchner; Richard F Schlenk; Markus Schaich; Konstanze Döhner; Rainer Krahl; Jürgen Krauter; Gerhard Heil; Utz Krug; Maria Cristina Sauerland; Achim Heinecke; Daniela Späth; Michael Kramer; Sebastian Scholl; Wolfgang E Berdel; Wolfgang Hiddemann; Dieter Hoelzer; Rüdiger Hehlmann; Joerg Hasford; Verena S Hoffmann; Hartmut Döhner; Gerhard Ehninger; Arnold Ganser; Dietger W Niederwieser; Markus Pfirrmann
Journal: J Clin Oncol Date: 2012-09-10 Impact factor: 44.544

Review 6. An update of current treatments for adult acute myeloid leukemia.

Authors: Hervé Dombret; Claude Gardin
Journal: Blood Date: 2015-12-10 Impact factor: 22.113

7. Development and verification of the PAM50-based Prosigna breast cancer gene signature assay.

Authors: Brett Wallden; James Storhoff; Torsten Nielsen; Naeem Dowidar; Carl Schaper; Sean Ferree; Shuzhen Liu; Samuel Leung; Gary Geiss; Jacqueline Snider; Tammi Vickery; Sherri R Davies; Elaine R Mardis; Michael Gnant; Ivana Sestak; Matthew J Ellis; Charles M Perou; Philip S Bernard; Joel S Parker
Journal: BMC Med Genomics Date: 2015-08-22 Impact factor: 3.063

8. CPX-351 (cytarabine and daunorubicin) Liposome for Injection Versus Conventional Cytarabine Plus Daunorubicin in Older Patients With Newly Diagnosed Secondary Acute Myeloid Leukemia.

Authors: Jeffrey E Lancet; Geoffrey L Uy; Jorge E Cortes; Laura F Newell; Tara L Lin; Ellen K Ritchie; Robert K Stuart; Stephen A Strickland; Donna Hogge; Scott R Solomon; Richard M Stone; Dale L Bixby; Jonathan E Kolitz; Gary J Schiller; Matthew J Wieduwilt; Daniel H Ryan; Antje Hoering; Kamalika Banerjee; Michael Chiarella; Arthur C Louie; Bruno C Medeiros
Journal: J Clin Oncol Date: 2018-07-19 Impact factor: 44.544

9. A 29-gene and cytogenetic score for the prediction of resistance to induction treatment in acute myeloid leukemia.

Authors: Tobias Herold; Vindi Jurinovic; Aarif M N Batcha; Stefanos A Bamopoulos; Maja Rothenberg-Thurley; Bianka Ksienzyk; Luise Hartmann; Philipp A Greif; Julia Phillippou-Massier; Stefan Krebs; Helmut Blum; Susanne Amler; Stephanie Schneider; Nikola Konstandin; Maria Cristina Sauerland; Dennis Görlich; Wolfgang E Berdel; Bernhard J Wörmann; Johanna Tischer; Marion Subklewe; Stefan K Bohlander; Jan Braess; Wolfgang Hiddemann; Klaus H Metzeler; Ulrich Mansmann; Karsten Spiekermann
Journal: Haematologica Date: 2017-12-14 Impact factor: 9.941

10. Sequential high-dose cytarabine and mitoxantrone (S-HAM) versus standard double induction in acute myeloid leukemia-a phase 3 study.

Authors: Jan Braess; Susanne Amler; Karl-Anton Kreuzer; Karsten Spiekermann; Hans Walter Lindemann; Eva Lengfelder; Ullrich Graeven; Peter Staib; Wolf-Dieter Ludwig; Harald Biersack; Yon-Dschun Ko; Michael J Uppenkamp; Maike De Wit; Stefan Korsten; Rudolf Peceny; Tobias Gaska; Xaver Schiel; Dirk M Behringer; Michael G Kiehl; Bettina Zinngrebe; Gerald Meckenstock; Eva Roemer; Dirk Medgenberg; Ernst Spaeth-Schwalbe; Gero Massenkeil; Heidrun Hindahl; Rainer Schwerdtfeger; Guido Trenn; Cristina Sauerland; Raphael Koch; Martin Lablans; Andreas Faldum; Dennis Görlich; Stefan K Bohlander; Stephanie Schneider; Annika Dufour; Christian Buske; Michael Fiegl; Marion Subklewe; Birgit Braess; Michael Unterhalt; Anja Baumgartner; Bernhard Wörmann; Dietrich Beelen; Wolfgang Hiddemann
Journal: Leukemia Date: 2018-10-01 Impact factor: 11.528