Adrián Mosquera Orgueira1,2,3, Andrés Peleteiro Raíndo1,2,3, Miguel Cid López1,2,3, José Ángel Díaz Arias1,2, Marta Sonia González Pérez1,2, Beatriz Antelo Rodríguez1,2,3, Natalia Alonso Vence1,2,3, Laura Bao Pérez1,2,3, Roi Ferreiro Ferro1,2, Manuel Albors Ferreiro1,2, Aitor Abuín Blanco1,2, Emilia Fontanes Trabazo1,2, Claudio Cerchione4, Giovanni Martinnelli4, Pau Montesinos Fernández5, Manuel Mateo Pérez Encinas1,2,3, José Luis Bello López1,2,3.
Abstract
Acute Myeloid Leukemia (AML) is a heterogeneous neoplasm characterized by cytogenetic and molecular alterations that drive patient prognosis. Currently established risk stratification guidelines show a moderate predictive accuracy, and newer tools that integrate multiple molecular variables have proven to provide better results. In this report, we aimed to create a new machine learning model of AML survival using gene expression data. We used gene expression data from two publicly available cohorts in order to create and validate a random forest predictor of survival, which we named ST-123. The most important variables in the model were age and the expression of KDM5B and LAPTM4B, two genes previously associated with the biology and prognostication of myeloid neoplasms. This classifier achieved high concordance indexes in the training and validation sets (0.7228 and 0.6988, respectively), and predictions were particularly accurate in patients at the highest risk of death. Additionally, ST-123 provided significant prognostic improvements in patients with high-risk mutations. Our results indicate that survival of patients with AML can be predicted to a great extent by applying machine learning tools to transcriptomic data, and that such predictions are particularly precise among patients with high-risk mutations.
Keywords: acute myeloid leukemia; cancer; gene expression; machine learning; prognosis; survival
Year: 2021 PMID: 33854980 PMCID: PMC8040929 DOI: 10.3389/fonc.2021.657191
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 6.244
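The concordance indexes quoted in the abstract measure how often a model ranks two patients in the same order as their observed survival. The following is a minimal sketch of Harrell's C-index in plain Python, not the authors' implementation; the function name `concordance_index` and its argument names are hypothetical, and pairs with tied follow-up times are skipped for simplicity.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    times       -- follow-up time per patient
    events      -- 1 if death was observed, 0 if censored
    risk_scores -- higher value means higher predicted mortality
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter follow-up.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        if times[a] == times[b]:
            continue  # simplification: ignore pairs with tied times
        if not events[a]:
            continue  # shorter follow-up is censored: order unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk died first: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported 0.7228 and 0.6988 mean that roughly 70% of comparable patient pairs were ordered correctly.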
Figure 1. Predicted individual survival curves according to the best random forests model. (A) Out-of-bag survival curves predicted for patients within the training cohort. The thick red line represents overall ensemble survival and the thick green line indicates the Nelson-Aalen estimator. (B) Individual survival curves predicted for patients within the test cohort. The thick red line represents overall ensemble survival. (C) Out-of-bag continuous ranked probability score (CRPS) over time. The red line is the overall CRPS; CRPS stratified by quartiles of predicted ensemble mortality is also shown. Vertical lines above the x axis represent death events.
Figure 2. Kaplan-Meier plots of patient survival according to the quartile of predicted survival assigned by ST-123 in the training (A) and validation (B) cohorts. (C) Kaplan-Meier plots of the outcomes of patients with high-risk mutations (TP53 mutation/deletion, ASXL1 mutation or RUNX1 mutation) according to whether their mortality predicted by ST-123 fell above or below the median in the validation set.
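The quartile grouping used in Figure 2 can be illustrated with a short sketch. It assumes each patient has a single predicted mortality score; the function name `assign_quartiles` is hypothetical, and the cut points come from the default method of Python's `statistics.quantiles`, which may differ from the quantile definition the authors used.

```python
from bisect import bisect_right
from statistics import quantiles


def assign_quartiles(predicted_mortality):
    """Return a quartile label (1 = lowest risk, 4 = highest) per patient."""
    # Three cut points split the scores into four groups.
    cuts = quantiles(predicted_mortality, n=4)
    # bisect_right counts how many cut points lie at or below each score.
    return [bisect_right(cuts, score) + 1 for score in predicted_mortality]


# Illustrative scores only, not data from the study.
example_scores = [8, 1, 5, 3, 2, 7, 4, 6]
example_groups = assign_quartiles(example_scores)
```

Each group can then be passed to a Kaplan-Meier estimator to draw one survival curve per quartile.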
C-indexes of ST-123 after restricting the analysis to different time points since diagnosis.
| Days since diagnosis | C-index (Training set) | C-index (Test set) |
|---|---|---|
| | 68.65 | 59.66 |
| | 70.13 | 59.54 |
| | 71.88 | 60.19 |
| | 73.86 | 62.13 |
| | 74.06 | 63.11 |
| | 71.94 | 69.17 |
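The source does not state how the analysis was restricted to each time point. One common approach, shown here purely as an assumption, is administrative censoring: follow-up is cut at a chosen horizon and any patient still under observation at that point is treated as censored. The helper name `truncate_at` is hypothetical.

```python
def truncate_at(times, events, horizon):
    """Censor all follow-up at `horizon` (same time unit as `times`).

    Patients whose time exceeds the horizon are recorded as censored at
    the horizon; earlier observations are returned unchanged.
    """
    new_times = []
    new_events = []
    for t, e in zip(times, events):
        if t > horizon:
            new_times.append(horizon)
            new_events.append(0)  # outcome after the horizon is ignored
        else:
            new_times.append(t)
            new_events.append(e)
    return new_times, new_events


# Illustrative values only, not data from the study.
short_times, short_events = truncate_at([5, 10, 15], [1, 1, 0], horizon=8)
```

The truncated times and event indicators can then be passed to any C-index routine to obtain a horizon-specific value.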