Adrián Mosquera Orgueira, José Ángel Díaz Arias, Miguel Cid López, Andrés Peleteiro Raíndo, Beatriz Antelo Rodríguez, Carlos Aliste Santos, Natalia Alonso Vence, Ángeles Bendaña López, Aitor Abuín Blanco, Laura Bao Pérez, Marta Sonia González Pérez, Manuel Mateo Pérez Encinas, Máximo Francisco Fraga Rodríguez, José Luis Bello López.
Abstract
BACKGROUND: Thirty to forty percent of patients with Diffuse Large B-cell Lymphoma (DLBCL) have an adverse clinical evolution. The increased understanding of DLBCL biology has shed light on the clinical evolution of this pathology, leading to the discovery of prognostic factors based on gene expression data, genomic rearrangements and mutational subgroups. Nevertheless, additional efforts are needed in order to enable survival predictions at the patient level. In this study we investigated new machine learning-based models of survival using transcriptomic and clinical data.
Keywords: DLBCL; Lymphoma; Prediction; Survival; Transcriptomics
Year: 2020 PMID: 33087075 PMCID: PMC7579992 DOI: 10.1186/s12885-020-07492-y
Source DB: PubMed Journal: BMC Cancer ISSN: 1471-2407 Impact factor: 4.430
Patient characteristics
| Cohort | GSE10846 | GSE23501 | |
|---|---|---|---|
Fig. 1 Kaplan-Meier plots of both 4-gene expression based clusters in the training (a) and test (b) cohorts. The blue line represents patients in the high-risk cluster (cluster 1), and the red line represents the remaining group of patients (cluster 2). Survival probability is represented on the y axis. Time scale (in years) is represented on the x axis.
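The Kaplan-Meier curves described in Fig. 1 can be illustrated with a minimal sketch. This is not the authors' code: the function name `kaplan_meier` and the toy follow-up data are assumptions made for illustration only. It implements the standard product-limit estimator, in which survival drops by a factor of (1 - deaths / patients at risk) at each observed death time.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times  : follow-up time of each patient (e.g. in years)
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)          # deaths plus censorings per time point
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]         # everyone leaving at t exits the risk set
    return curve


# Hypothetical toy cohort, not data from the study.
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(toy_times, toy_events)
```

Computing one such curve per cluster and plotting both as step functions gives a figure of the kind described in the caption.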
Fig. 2 Scatterplot matrix representing the distribution of patients according to the expression of TNFRSF9, BIRC3, BCL2L1 and G3BP2. Separate plots are provided for the training (a) and test (b) cohorts. Red dots represent patients in the high-risk cluster (cluster 1), whereas black dots represent the remaining patients (cluster 2).
Patient characteristics by subgroups using 4-gene based clusterization
| Cohort | GSE10846 | GSE23501 | |||
|---|---|---|---|---|---|
| 184 | 49 | 51 | 13 | ||
| 60.32 | 46.94 | 74.51 | 61.53 | ||
| 61 | 63 | 62 | 71 | ||
| 41.30% | 63.26% | 27.45% | 38.46% | ||
| 42.93% | 28.57% | 56.86% | 61.54% | ||
| 15.76% | 8.16% | 15.69% | 0% | ||
Random Forest models for overall survival prediction. C-index results are presented for each combination of variables in the training and test cohorts
| Training Cohort | Test Cohort | |
|---|---|---|
| 0.5934 | 0.6301 | |
| 0.7530 | 0.6649 | |
| 0.7783 | 0.7415 | |
| 0.6340 | 0.6202 | |
| 0.6761 | 0.6837 | |
| 0.6725 | 0.6971 | |
| 0.7059 | 0.7221 | |
| 0.7792 | 0.7558 | |
| 0.7784 | 0.7487 | |
| 0.7788 | 0.7522 | |
| 0.7889 | 0.7416 | |
| 0.7854 | 0.7538 | |
| 0.7896 | 0.7596 | |
| 0.8051 | 0.7615 | |
| 0.8404 | 0.7942 |
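The C-index values reported in the table above measure how well predicted risk ranks patients by survival time. As a rough illustration, and not the authors' implementation, the sketch below computes Harrell's concordance index in plain Python. The function name `concordance_index` and the toy inputs are assumptions; a higher risk score is assumed to mean a worse predicted outcome, and ties in survival time are ignored for simplicity.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index.

    A pair (i, j) is comparable when patient i died (events[i] == 1)
    and did so before patient j's last follow-up time. The pair is
    concordant when the patient who died earlier has the higher risk
    score; tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue                # censored patients cannot anchor a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: risk decreases as survival time increases.
c_index = concordance_index([1, 2, 3, 4], [1, 1, 0, 1], [4, 3, 2, 1])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the training and test columns should be read.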
Fig. 3 Predicted individual survival curves according to the most accurate random forest model (see text). a) Out-of-bag survival curves predicted for patients within the training cohort (discontinuous black lines). The thick red line represents overall ensemble survival and the thick green line indicates the Nelson-Aalen estimator. b) Individual survival curves predicted for patients within the test cohort (discontinuous black lines). The thick red line represents overall ensemble survival. Time scale is in years.
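The Nelson-Aalen estimator mentioned in Fig. 3 accumulates the hazard over time rather than multiplying survival fractions. The sketch below is a minimal illustration under assumed names (`nelson_aalen`, toy data), not the authors' code. It also returns exp(-H(t)), the survival estimate commonly derived from the cumulative hazard H(t), which is presumably what allows the estimator to be drawn alongside survival curves.

```python
import math
from collections import Counter


def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard.

    times  : follow-up time of each patient
    events : 1 if death was observed, 0 if censored
    Returns (time, cumulative hazard, exp(-cumulative hazard)) tuples,
    one per distinct death time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)
    at_risk = len(times)
    cum_hazard = 0.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            cum_hazard += d / at_risk   # hazard increment at this death time
            curve.append((t, cum_hazard, math.exp(-cum_hazard)))
        at_risk -= exits[t]
    return curve


# Hypothetical toy cohort, not data from the study.
na_curve = nelson_aalen([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```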